November 12, 2025

Product Quality Engineer, AI/ML, Hardware, Google Cloud

Senior • On-site

$165,000 - $245,000/yr

Atlanta, GA, +1


Minimum qualifications:

  • Bachelor's degree in Electrical Engineering, Computer Engineering, Materials Science, Industrial Engineering, a related technical field, or equivalent practical experience.
  • 10 years of experience in Hardware Quality, Reliability, Product Engineering, or a similar role focused on electronic systems (e.g., servers, accelerators, networking equipment).
  • 8 years of experience leading cross-functional teams to solve technical problems and drive quality improvements.
  • Experience with AI/ML system architectures, including TPU/GPU based platforms, key components (e.g., high-speed interconnects, power delivery), and characteristic failure modes.

Preferred qualifications:

  • Master's or PhD degree in Electrical Engineering or a related field.
  • Certified Reliability Engineer (CRE) or Certified Quality Engineer (CQE) certification.
  • 12 years of experience in Quality/Reliability, with substantial direct experience in GPU/TPU or other AI/ML accelerator hardware.
  • Experience in a technical leadership role defining quality strategy and collaborating with executive stakeholders.
  • Experience in a customer-facing quality role managing executive communications and escalations for technical issues, with the ability to travel as required.
  • Excellent hardware and software debugging skills, with experience in analyzing system logs, manufacturing test data, and diagnostic outputs to pinpoint root causes.

About the job

Be part of a team that pushes boundaries, developing custom silicon solutions that power the future of Google's direct-to-consumer products. You'll contribute to the innovation behind products loved by millions worldwide. Your expertise will shape the next generation of hardware experiences, delivering unparalleled performance, efficiency, and integration.

Google Cloud is powered by advanced compute, network, storage, and Artificial Intelligence (AI) platforms, built on one of the world’s largest and most sophisticated Technical Infrastructures (TI). The Cloud Supply Chain and Operations (CSCO) teams are responsible for the fast and efficient deployment of this infrastructure.

The Global Hardware Quality and Reliability (GHQR) team ensures predictable quality and reliability across all hardware components and systems, including Tensor Processing Unit/Graphics Processing Unit (TPU/GPU) AI platforms and data center infrastructure. This hardware is the foundation of Google Cloud and its AI/ML capabilities, directly contributing to Google's engaged edge.

In this role, you will own the quality and reliability strategy for Google's TPU/GPU-based AI/ML platforms. You will be the quality expert, collaborating with cross-functional partners in Design, Manufacturing, and Operations to embed quality into every product. You will also analyze data, drive root cause analysis, and influence process improvements.

The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.

We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

The US base salary range for this full-time position is $165,000-$245,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Define and own the quality and reliability strategy for TPU/GPU hardware across its entire lifecycle, from design through field support.
  • Lead the resolution of systemic quality issues in manufacturing and the field, driving Root Cause and Corrective Actions (RCCA) using structured methodologies.
  • Collaborate with engineering teams to influence design specifications, qualification plans, and test coverage to ensure product robustness and mitigate early risks.
  • Establish and monitor key quality KPIs (e.g., Average Severity Rate (ASR), Average Failure Rate (AFR)). Analyze manufacturing and field data to develop predictive models and drive improvements in design and processes.
  • Act as the primary point of contact for customer quality, managing escalations and integrating feedback. Oversee quality and corrective actions with suppliers, including Return Material Authorizations (RMA) and Process Change Notifications (PCN) qualification.