March 24, 2025

Senior Hardware Qualification Engineer, Machine Learning, Google Cloud

Mid • On-site

$147,000 - $216,000/yr

Sunnyvale, CA

Minimum qualifications:

  • Bachelor’s degree in Electrical Engineering, Computer Engineering, Physics, a related field, or equivalent practical experience.
  • 4 years of experience working in a data center technical environment, or 3 years of experience with an advanced degree.
  • Experience with testing requirements, best practices, and algorithms for data center hardware or systems, and using scripting and automation to execute test algorithms and analyze results.
  • Experience with lab equipment (e.g., scopes, analyzers) to collect traces.

Preferred qualifications:

  • Master's degree or PhD in Electrical Engineering, Computer Engineering, Physics, or a related field.
  • 1 year of experience in technical leadership.
  • Experience using scripting and automation to execute test algorithms and analyze results, with knowledge of lab equipment such as scopes, Bit Error Rate Testers (BERTs), and analyzers.
  • Experience determining the appropriate data to collect to facilitate failure analysis by vendors.
  • Experience with Machine Learning/Graphics Processing Unit (ML/GPU) hardware components, memory systems, and interconnect technologies.
  • Knowledge of signal integrity (SI) and high-speed interfaces, and of how to collect accurate SI data.

About the job

Google's custom-designed machines make up one of the largest computing infrastructures in the world. The Hardware Qualification team ensures that this equipment is reliable. In the Research and Development lab, you plan for and execute the most effective way to test at scale.

In this role, you will lead initiatives to test machine learning hardware deployed in Google's data centers. You will provide design input to improve our hardware until you are confident it meets Google's standards of quality, reliability, and security.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $147,000-$216,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Create test plans and coordinate resources across test environments.
  • Perform electrical, functional, performance and reliability testing for Google's machine learning trays and solutions. Analyze results to ensure that they meet Google's requirements.
  • Report bugs and drive corrective action where needed with external suppliers and internal Supplier Quality Engineers (SQEs), commodity teams, and developers.
  • Use scopes, protocol analyzers, Bit Error Rate Testers (BERTs), and other lab equipment to collect precision data.
  • Organize results, communicate findings and incorporate learnings to maintain and improve the qualification process.