June 2, 2025
Senior • Hybrid • On-site • Remote
$284,000 - $425,500/yr
Santa Clara, CA
NVIDIA is seeking a strategic and technically grounded Director of Engineering to lead a high-impact organization focused on core compute cloud infrastructure for AI factories. This organization is a key pillar in NVIDIA’s DGX Cloud ecosystem, building shared automation and reliability tooling that supports a sizable portion of our GPU-accelerated compute fleet.
You will further develop and scale an organization of engineers focused on running production software for large-scale GPU-accelerated infrastructure. This organization partners closely with storage, networking, and several other teams across NVIDIA. You will be the engineering leader responsible for interfacing with some of our NVIDIA Cloud Partners to continuously meet our production excellence goals.
What You’ll Be Doing:
Build and grow a team of software engineers and leaders focused on automating day 0, day 1, and day 2 operations for large-scale GPU clusters running on bare metal and public clouds, with service levels of various kinds.
Lead the design and continuous delivery of shared automation frameworks aligned with SLOs and error budgets.
Liaise with some of our NVIDIA Cloud Partners to ensure aligned priorities and sustained production excellence.
Drive clarity and execution through high ambiguity, translating broad, ever-evolving objectives into iterative delivery milestones.
Enable internal teams by reducing operational friction and improving automation coverage across the stack.
What We Need To See:
Proven experience leading software engineering teams (incl. SRE and/or DevOps) responsible for infrastructure automation and distributed systems.
Demonstrated ability to build software engineering organizations, drive continuous incremental execution across teams, and operate effectively in highly ambiguous environments with ever-evolving objectives.
Hands-on experience designing, running, or automating cloud infrastructure atop bare metal platforms and/or VMs.
Experience deploying cloud-native services on public clouds.
Track record of representing your company or division in external partnerships with public clouds and infrastructure vendors, and to internal partner teams.
Strong foundation in incremental delivery and technical program execution.
Excellent written and verbal communication skills, with the ability to influence across levels and disciplines.
Bachelor of Science (or equivalent experience) or Master of Science degree in Computer Science or a related field, with 10+ years of overall experience developing and leading cloud infrastructure teams, and 5+ years of management experience.
Ways to stand out from the crowd:
Relevant experience developing organizations at public cloud companies.
Background leading teams running large-scale GPU clusters.
Familiarity with technologies like Linux, NVIDIA BCM, Slurm, InfiniBand, Kubernetes, distributed storage, or BlueField DPUs.
Experience developing both internal-facing platform teams and customer-facing infrastructure-as-a-service teams.
Track record of collaboration with security or compliance teams, including in regulated environments.
Familiarity with AI/ML platform workloads and their reliability or performance characteristics.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, hardworking, and self-motivated, we want to hear from you! The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. NVIDIA leads the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing (HPC), and Visualization. DGX Cloud provides serverless generative AI infrastructure to the world, enabling NVIDIA’s AI supercomputer technologies to be used by anyone.
The base salary range is 284,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.