October 16, 2025

Customer Reliability Engineer, Reliability and Incident Management

Senior • On-site

$171,000 - $254,000/yr

New York, NY, +1


Minimum qualifications:

  • Bachelor’s degree in Science, Technology, Engineering, Mathematics, or equivalent practical experience.
  • 13 years of experience troubleshooting and advocating for customer needs, and triaging technical issues (e.g., hardware, software, application, operational, process).
  • Experience reading and debugging code written in a general-purpose programming language (e.g., Java, C, C++, Python, Shell, Go, or JavaScript), and experience with virtualization and orchestration frameworks.
  • Experience troubleshooting and advocating in a customer-facing environment, and triaging technical issues (e.g., hardware, software, application, operational, process).
  • Experience with incident management and response in a distributed systems environment.

Preferred qualifications:

  • 5 years of experience in a technical role such as Site Reliability Engineering, Technical Solutions Engineering, Software Engineering, Customer Engineering, or professional services.
  • Experience in applying SRE principles to improve the reliability and performance of systems.
  • Experience in managing technical escalations and communicating with executive stakeholders.
  • Familiarity with facilitating resilience exercises and post-incident reviews.
  • Ability to lead and influence in a collaborative, cross-functional environment, and to analyze systems, specifically through Non-Abstract Large-Scale System Design (NALSD).
  • Excellent communication skills with the ability to convey technical concepts to a variety of audiences.

About the job

As a Customer Reliability Engineer, you will be a pivotal individual contributor on the CRIM team. You will be entrusted with providing technical leadership directly to customers. This role requires a unique blend of technical expertise, exceptional customer-facing skills, and a proactive mindset to prevent incidents before they occur. You will leverage your experience to guide customers in adopting SRE principles, navigating incidents, and ultimately building reliable systems on Google Cloud.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $171,000-$254,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Provide education on Google's reliability principles and best practices by engaging directly with customers.
  • Lead high-level architectural reviews and risk analyses to help customers identify and mitigate potential system vulnerabilities.
  • Advocate for customers within Google by providing feedback to Product and Engineering teams to improve reliability and supportability.
  • Ensure lessons are learned from incidents by supporting and guiding the postmortem review process.
  • Coordinate post-incident remediation actions in the customer's environment by working with frontline Technical Support Engineers.