October 23, 2025

Data Center Operations Manager, Machine Learning, Travel

Mid • On-site

$105,000 - $151,000/yr

Reston, VA


Minimum qualifications:

  • Bachelor's degree in a technical field, or equivalent practical experience.
  • 5 years of experience in computing infrastructure, networking, operating systems, or hardware.
  • 2 years of experience managing technical teams, or in vendor or contract management and delivery.
  • Ability to work non-standard hours, including working weekends, night shifts, holidays and on shift-based schedules as required.

Preferred qualifications:

  • Experience working in data center environments, including building and operating large-scale infrastructure, network and compute architecture, and their life cycle, and Linux/Unix system administration.
  • Experience building and leading a collaborative team environment, with an ability to implement and drive the safety culture.
  • Experience with data gathering, analysis, and presentation.
  • Excellent problem-solving skills.

About the job

Google isn't just a software company. The Hardware Operations team is responsible for monitoring the state-of-the-art physical infrastructure behind Google's powerful search technology. As a Hardware Operations Manager, you will manage a team of Data Center Technicians. You will oversee the quality installation of server hardware and components and take charge of complicated installations/troubleshooting.

Your team will install, configure, test, troubleshoot and maintain hardware (like servers and their components) and server software (like Google's Linux cluster). They will also take on the configuration of more complex components such as networks, routers, hubs, bridges, switches and networking protocols. They may lead small project teams on larger installations and develop project contingency plans.

The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.

We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

The US base salary range for this full-time position is $105,000-$151,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Lead a team of individuals and communicate individual and team priorities that support organizational goals, including repairing and performing preventive maintenance on equipment, servers, machines, or infrastructure as issues arise.
  • Partner with teams to meet goals, and with stakeholders to manage facility activities and set and implement strategies.
  • Maintain, monitor, and execute security and operational procedures, and analyze trends to identify opportunities for improvement, ensuring alignment with organizational policies.
  • Support and contribute to the implementation of Environmental Health and Safety (EHS) and other compliance programs and initiatives in collaboration with other teams to ensure environmental and safety incidents are investigated, resolved, and reported.
  • Manage a team of Machine Learning (ML) Travelers remotely, and contribute to and support 24/7 initiatives.