September 16, 2025
Senior • On-site
$184,000 - $287,500/yr
Santa Clara, CA
NVIDIA DGX Cloud is a fully managed, cloud-based AI supercomputing platform that provides organizations with direct access to NVIDIA's advanced GPU clusters, software, and AI expertise for developing and deploying AI workloads. It offers an enterprise-grade, full-stack solution, optimized for large-scale AI training and inference, and is available through partnerships with leading cloud service providers.
NVIDIA Mission Control powers every aspect of AI factory operations, from developer workloads to infrastructure to facilities, with the skills of a world-class operations team delivered as software. It powers NVIDIA Blackwell data centers for the newest frontiers of AI, bringing instant agility to inference and training workloads and full-stack intelligence that delivers world-class infrastructure resiliency.
We are looking for a Senior Software Engineer with experience in building highly agile and reliable software to join us. We are building and improving a powerful platform (NVIDIA Mission Control) that will automate diagnosis and repair of GPU and CPU clusters on public clouds, private clouds, and virtual and physical hardware.
What you'll be doing:
Making the existing cluster automation platform more fault-tolerant, agile, hardware/networking aware, and resource-efficient
Enabling AI capabilities in the platform to enhance the user experience and accelerate automation, diagnosis, and remediation of issues
Integrating with ecosystem tools to enable a rich, unified user experience with full end-to-end capabilities
Collaborating with various stakeholders across NVIDIA to understand business context, influence the product roadmap, help with adoption of the automation platform, and reduce toil for managing clusters
Operating critical software services with high availability and reliability
Programming in systems languages like Rust and Go
Driving engineering best practices, mentoring engineers, and fostering an inclusive team culture
What we need to see:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience)
Keen interest in driving agentic AI projects
10 years of equivalent experience
Demonstrated ability in building scalable, agile, and robust distributed systems
Successful product rollouts and collaboration with early adopters
Technical leadership and ownership of projects across the organization
Hands-on approach, passion for continuous improvement, and willingness to get involved in all aspects of development
Experience working with ambiguity and driving clarity in complex technical decisions
Ways to stand out from the crowd:
Skilled in using AI to scale team productivity and agility
Experience with revamping complex systems with existing customers to take them to the next level
Experience with SRE, DevOps, CI/CD, and a variety of platforms
With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most forward-thinking and versatile people in the world working with us, and our engineering teams are growing fast in some of the most impactful fields of our generation: Cloud Engineering and Cloud Functions. If you're a creative engineer who enjoys autonomy and shares our passion for technology, we want to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.