October 7, 2025
Senior • Hybrid • On-site • Remote
$272,000 - $425,500/yr
Santa Clara, CA
NVIDIA’s Data Center MODS organization is looking for an Engineering Manager to help Cloud Service Providers (CSPs) and OEMs scale out current and next-generation data center products. You will be responsible for validating and scaling NVIDIA’s GPU products at the system level, pushing hardware to its limits to ensure adaptability and reliability across diverse environments, from internal validation labs to hyperscale data centers. Our organization partners closely with architecture, ASIC, operations, and data center teams to build methodologies that stress every subsystem of the GPU and server platform. The team also supports diagnostics for customer deployments, tailoring stress workloads to specific configurations and use cases.
What you'll be doing:
Lead and mentor a high-performing engineering team, fostering technical growth and leadership.
Collaborate with architecture and hardware teams to drive development of stress and diagnostic software targeting GPUs, CPUs, memory, storage, and interconnects.
Lead multiple concurrent projects, balancing long-term strategy with short-term execution.
Work with CSPs, OEMs, and data center operators to support deployment and customization of diagnostics.
Champion continuous improvement in product quality, debug efficiency, and operational scalability.
What we need to see:
Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
10+ years of overall experience in system software development, with 4+ years in engineering management.
Experience with C/C++/Python.
Deep understanding of operating systems, kernel drivers, and hardware-software interaction.
Experience with PC/server architecture, including PCIe, NVLink, InfiniBand, or Ethernet.
Consistent track record of leading feature development and multi-team debugging efforts.
Ways to Stand Out from the Crowd:
Experience with diagnostics or stress testing in large-scale data center environments.
Familiarity with GPU compute, graphics, memory subsystems, or high-speed interfaces.
Prior experience working with CSPs or OEMs on system-level validation and deployment.
Strong communication and multi-functional leadership skills.
Passion for building tools that ensure product excellence and customer success.
You will also be eligible for equity and benefits.