Senior Software Engineer (AI/ML)

Senior • On-site • Remote

$119,800 - $234,700/yr

Overview

The Worldwide Fleet Resources Lifecycle Management team is dedicated to revolutionizing the management and optimization of Microsoft's global fleet resources. In addition to enhancing operational efficiency, reducing costs, and improving sustainability, the team is responsible for automating how new hardware is verified, managed, and delivered to Microsoft datacenters. This includes supporting Azure, High-Performance Computing, Office, and Edge Computing products within Microsoft. By enabling the seamless expansion of capacity for all Microsoft services, the team operates at the forefront of integrating cutting-edge hardware platforms into the cloud. Through advanced technologies and data-driven insights, our team sets new standards in fleet resource management.

As a Senior Software Engineer (AI/ML) on the Worldwide Fleet Resources Lifecycle Management team, you will play a pivotal role in supporting the onboarding of new hardware into the Azure cloud and in driving the integration of intelligence into tools, processes, and resources across the entire organization. In this role, you will also be expected to understand requirements, create designs, and implement the features needed to enable new technologies. This opportunity will allow you to grow your skills in both software and hardware by collaborating with various Azure teams, learning about emerging technologies in the industry, and driving meaningful change in Azure, all through the seamless integration of intelligence into your work.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required Qualifications:

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
1+ years of experience in designing, deploying, and scaling AI/ML solutions on cloud platforms (Azure, AWS, GCP).
1+ years of experience in robust production operations, reliability engineering, and lifecycle management of AI/ML systems using MLOps/LLMOps best practices.

Other Requirements:

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
Proven experience in designing and implementing end-to-end AI/ML solutions, integrating them seamlessly into both new and existing products and services across the full technology stack.
Demonstrated expertise in compliance requirements, data governance, security best practices, and responsible AI principles.
Strong foundation in core machine learning principles and algorithms, along with knowledge of deep learning architectures, natural language processing (NLP), and generative AI techniques.
Experience with frameworks and libraries for ML (PyTorch, TensorFlow, scikit-learn, Keras) and multi-agent AI applications (AutoGen, LangChain).
Experience with containers (Docker) and orchestration tools like Kubernetes.
Experience in handling large datasets and working with data processing frameworks (Apache Spark, Hadoop).
Excellent computer science fundamentals in algorithmic design, data structures, and complexity analysis.
Experience mentoring junior engineers and data scientists, providing technical guidance and code reviews.
Excellent cross-functional and interpersonal skills, with the ability to articulate solutions clearly and effectively.
Ability to balance competing demands and adapt to changing priorities.

Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until July 31, 2025.


Responsibilities

Collaborates with stakeholders to identify user requirements, create design documents, and develop scalable systems and services. Works with product managers, engineers, and infrastructure teams to deliver impactful solutions.
Utilizes strong software engineering fundamentals, including clean architecture, modular design, thorough testing, and peer reviews for reliable codebases. Develops and optimizes code to enhance performance, maintainability, effectiveness, and return on investment (ROI).
Develops and deploys scalable Artificial Intelligence (AI)-driven tools, algorithms, and machine learning (ML) models to enhance efficiency, reliability, and productivity.
Collaborates with data scientists and product teams to align solutions with business objectives and deliver measurable value. Optimizes AI/ML models for performance and ensures seamless production integration.
Breaks down larger features into work items and supports planning, ensuring alignment with business priorities. Estimates engineering effort and tracks progress to ensure that all tasks are completed efficiently and effectively.
Serves as the Designated Responsible Individual (DRI) for monitoring, troubleshooting, and restoring production systems during on-call rotations. Leads live-site incident response, conducts root cause analysis, and implements long-term improvements to enhance system reliability and operational readiness.
Demonstrates a commitment to continuous learning, staying up to date with evolving technologies and best practices. Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns to improve product availability, reliability, efficiency, observability, and performance. Actively shares knowledge and contributes to a team culture that values technical excellence and growth.
Follows organizational policies to ensure security, privacy, safety, and accessibility standards. Demonstrates ownership and promotes a learning-oriented, inclusive team environment. Practices secure coding, data governance, and respectful collaboration within a mission-driven workplace.

Microsoft

Microsoft is a global technology company focused on empowering every person and every organization to achieve more. It delivers the core infrastructure and foundational technologies behind Microsoft's online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, in support of its 'Intelligent Cloud' mission. Microsoft values respect, integrity, and accountability, and fosters a culture of inclusion where employees can thrive. The company is committed to continuous improvement and collaboration to drive innovation and deliver high-quality results to customers. With a diverse and inclusive workforce, Microsoft is an equal opportunity employer.