October 16, 2025

Senior Customer Reliability Engineer, Reliability Incident Management

Senior • On-site

$144,000 - $211,000/yr

New York, NY, +1


Minimum qualifications:

  • Bachelor’s degree in Science, Technology, Engineering, Mathematics, or equivalent practical experience.
  • 9 years of experience troubleshooting and advocating for customer needs, and triaging technical issues (e.g., hardware, software, application, operational, process).
  • Experience in reading/debugging code written in a general purpose coding language (e.g., Java, C, C++, Python, Shell, Go, or JavaScript) and in virtualization and orchestration frameworks.
  • Experience troubleshooting/advocating in a customer-facing environment, and triaging technical issues (e.g., hardware, software, application, operational, process).
  • Experience with incident management and response in a distributed systems environment.

Preferred qualifications:

  • 5 years of experience in a technical role such as Site Reliability Engineering, Technical Solutions Engineering, Software Engineering, Customer Engineering, or professional services.
  • Experience in applying SRE principles to improve the reliability and performance of systems.
  • Familiarity with facilitating resilience exercises and post-incident reviews.
  • Experience in managing technical escalations and communicating with executive stakeholders.
  • Ability to lead and influence in a collaborative, cross-functional environment and to analyze systems, specifically Non-Abstract Large-Scale System Design (NALSD).
  • Excellent communication skills with the ability to convey technical concepts to a variety of audiences.

About the job

The Google Cloud Platform team helps customers transform and build what's next for their business, all with technology built in the cloud. Our products are developed for security, reliability and scalability, running the full stack from infrastructure to applications to devices and hardware. Our teams are dedicated to helping our customers (developers, small and large businesses, educational institutions and government agencies) see the benefits of our technology come to life. As part of an entrepreneurial team in this rapidly growing business, you will play a key role in understanding the needs of our customers and help shape the future of how businesses of all sizes use technology to connect with customers, employees and partners.

As a Senior Customer Reliability Engineer, you will be a pivotal individual contributor on the CRIM team. You will be entrusted with providing technical leadership directly to customers. This role requires a unique blend of technical expertise, exceptional customer-facing skills, and a proactive mindset to prevent incidents before they occur. You will leverage your experience to guide customers in adopting SRE principles, navigating incidents, and ultimately building more reliable systems on Google Cloud.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $144,000-$211,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Provide education directly to customers on Google's reliability principles and best practices.
  • Lead high-level architectural reviews and risk analyses to help customers identify and mitigate potential system vulnerabilities.
  • Advocate for customers within Google by providing feedback to Product and Engineering teams to improve reliability and supportability.
  • Ensure lessons are learned from incidents by supporting and guiding the postmortem review process.
  • Coordinate remediation actions in the customer's environment post-incident by working with frontline Technical Support Engineers.