August 8, 2025
Senior • On-site
$208,000 - $333,500/yr
Santa Clara, CA
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.
Colossus Cloud is at the heart of the GPU bring-up infrastructure strategy and is used for all of NVIDIA's software development and QA. The cloud service offers many resource types to support various use cases, such as bare metal for development and a managed Kubernetes (k8s) service for CI/CD. As we grow and expand into new datacenters for both new product bring-up and scale, we are looking to hire a Cloud Efficiency Architect.
This position involves crafting, implementing, and maintaining strong models for total cost of ownership (TCO), return on investment (ROI), and usage. The efficiency insights you provide to infrastructure teams, collaborators, and finance will enable data-driven decisions that optimize Colossus investments. The candidate must demonstrate strong business and technical competence with cloud concepts.
What you'll be doing:
Colossus Utilization & Cost Model Development: Design, build, and maintain comprehensive cost models for private cloud services, including compute, storage, network, and platform services.
Predictive Modeling: Develop predictive models for Colossus resource consumption and demand, applying historical data and future projections to guide TCO predictions.
Build/Test Job Costing: Create granular cost models specifically for build and test jobs within Colossus, attributing costs to individual pipelines, projects, or teams.
Organizational (OrgN) Level Cost Allocation: Develop and refine cost allocation strategies to provide clear, actionable cost breakdowns by organizational unit, department, or business function (OrgN level).
Data Analysis & Reporting: Analyze large datasets from various Colossus sources to identify cost anomalies, optimization opportunities, and trends. Develop and automate reports and dashboards to visualize key cost and utilization metrics for different collaborators.
Tooling & Automation: Evaluate, implement, and leverage FinOps and cloud cost management tools to improve reporting, forecasting, and optimization capabilities. Automate data collection and reporting processes where feasible.
Collaborator Communication: Present utilization models and insights in a clear, concise, and actionable manner to technical and non-technical audiences, including senior leadership.
What we need to see:
12+ years of proven experience, including 5+ years in cloud TCO: billing, utilization, and TCO analysis.
Willing to adapt quickly and learn new skills; eager to dive into new opportunities while leading collaborative initiatives across departments.
Deep familiarity with cloud-native product and services environments.
Experience building Power BI or similar dashboards that drive action.
Experience working in large-scale cloud environments.
Familiarity with AI and ML infrastructure, vibe coding, and cloud services.
MBA/MS or equivalent experience.
Ways to stand out from the crowd:
Expertise in optimizing cloud infrastructure for TCO.
Experience with cloud-native cost tools such as AWS Cost Explorer, GCP Billing, and Azure Cost Management.
Solid collaborative and interpersonal skills, specifically a proven track record of effectively guiding and influencing others within a dynamic environment.
MySQL and Splunk knowledge is a plus.
Deep knowledge and hands-on experience with one or more major cloud providers (AWS, Azure, GCP).
You will also be eligible for equity and benefits.