June 8, 2026

Lead Site Reliability Engineer

Senior • Remote

20,150 - 21,700 PLN

Warsaw, Poland

This is a remote position.

Virtusa helps its Clients by becoming a true extension of their software and data development capabilities. Through readily assembled, comprehensive, and self-governing teams, we let our Clients focus on their business while we make sure that their software products and data tools scale accordingly and with outstanding quality.

We are looking for a team player to fill the Lead Site Reliability Engineer position on a dynamic international project for a customer in the tech sector.

Requirements

4+ years of hands-on Platform/System Engineering experience using Go, Python, Java, Ruby, or equivalent programming languages.
3+ years of experience in an engineering role, working with a diverse and distributed team located across the globe.
Hands-on experience with containerization, orchestration, and CI/CD technologies (Docker, Kubernetes, GitHub Actions, ArgoCD, EKS, AKS, ECS).
Exposure to Infrastructure as Code (IaC) with Multi-Cloud Deployments.
Proven experience building and reliably running modern full-stack cloud applications using public cloud technologies (AWS, Azure, GCP) at scale.
Effective written and verbal communication skills to properly articulate complex technical problems to all levels of the organization and customers.
Confidence in the ability to own and deliver a roadmap tied to business priorities.
A passion for excellence, a natural problem solver, and a critical thinker who enjoys digging deep to understand issues and solve hard problems.
Degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
Experience with modern infrastructure management systems (Chef, Ansible, Terraform); Chef is a must-have.

Nice to have:

Expertise in building Platform-as-a-Service (PaaS) solutions.

Benefits

Professional training programs

Work with a team that’s recognized for its excellence. We’ve been featured in the Deloitte Technology Fast 50 and FT 1000 rankings. We’ve also received the Great Place To Work® certification for five years in a row.

Similar jobs you might like

Technology

Link Group

Senior Site Reliability Engineer

Senior

Hybrid

Warsaw, Poland

170 - 230 PLN

🏢 Summary: The role focuses on ensuring reliability, scalability, and performance of large-scale cloud-based applications by building and maintaining resilient infrastructure. You will manage AWS cloud environments, Kubernetes clusters, and CI/CD pipelines while implementing monitoring, automation, and incident response processes. The position emphasizes Infrastructure-as-Code, observability, and continuous reliability improvements.

🗂️ Requirements:

5+ years experience in SRE, DevOps or similar role
Strong experience with AWS cloud services
Experience with Infrastructure-as-Code tools
Hands-on experience with Kubernetes
Proficiency with Docker
Experience with CI/CD pipelines
Solid knowledge of PostgreSQL or Amazon RDS
Strong SQL knowledge
Knowledge of networking concepts (VPC, DNS, troubleshooting)
Strong Linux/Unix administration skills
Experience with observability tools
Experience with automation in infrastructure
Experience with incident management

📃 Skills: AWS, Terraform, Pulumi, Kubernetes, EKS, Docker, GitHub, PostgreSQL, RDS, SQL, VPC, DNS, Linux, Unix, Prometheus, Grafana, Datadog, Dynatrace, CI/CD

🏢 Description: We are looking for an experienced Site Reliability Engineer to ensure the reliability, scalability, and performance of large-scale cloud-based web applications. You will work closely with software development, cloud operations, and platform teams to build and maintain resilient infrastructure and improve system stability.

Key Responsibilities:

Design and maintain monitoring, alerting, and incident response systems to ensure high availability
Collaborate closely with engineering, product, and architecture teams
Build and manage cloud infrastructure using Infrastructure-as-Code (e.g., Terraform, Pulumi) on AWS
Operate and optimize Kubernetes environments (e.g., EKS)
Develop and maintain containerized applications using Docker
Improve CI/CD pipelines and drive automation across deployment processes
Implement and manage observability tools (logging, metrics, tracing)
Participate in incident management, postmortems, and reliability improvements
Support capacity planning, disaster recovery, and system scaling
Contribute to security, compliance, and operational best practices
Develop automation and AI-driven solutions for monitoring and incident prevention

Requirements:

5+ years of experience in SRE, DevOps, or similar roles
Strong experience with AWS cloud services and Infrastructure-as-Code tools
Hands-on experience with Kubernetes and containerized environments
Proficiency in Docker and CI/CD pipelines (e.g., GitHub Actions)
Solid understanding of databases (e.g., PostgreSQL, Amazon RDS) and SQL
Knowledge of networking concepts (VPC, DNS, troubleshooting tools like dig/traceroute)
Strong Linux/Unix administration skills
Experience with observability tools (e.g., Prometheus, Grafana, Datadog, Dynatrace)
Familiarity with automation and AI-based solutions in infrastructure
Strong problem-solving and incident management skills

Technology

Caspian One

Site Reliability Engineer

Senior

Hybrid

Krakow, Poland

1,400 - 1,800 PLN

🏢 Summary: Hands-on Site Reliability Engineer role focused on ensuring stability, scalability, and observability of a mission-critical distributed risk and analytics platform in hybrid cloud environments. The position centers on production reliability, incident response, automation, and continuous improvement of monitoring and deployment processes. You will collaborate with engineering teams to strengthen system resilience, performance, and operational standards.

🗂️ Requirements:

Strong Java experience in distributed systems
Experience with observability and monitoring tools
Hands-on experience with hybrid cloud environments (preferably GCP)
Experience with CI/CD pipelines and automation tools
Solid knowledge of Linux systems administration
Understanding of RDBMS fundamentals
Experience with job schedulers (e.g., Control-M)
Ability to lead incident response and root-cause analysis

📃 Skills: Java, Grafana, Prometheus, Loki, OpenTelemetry, GCP, Jenkins, Ansible, Linux, SQL, Control-M, CI/CD

🏢 Description: We’re looking for a seasoned Site Reliability Engineer to support a high-performance, mission-critical risk and analytics platform used across global trading and finance environments. You’ll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure.

In this role, you’ll take ownership of production reliability: driving incident response, conducting root-cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You’ll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance.

This is a hands-on technical position suited to someone who thrives in high-throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time.

Core Responsibilities

Maintain and improve the reliability, uptime, and performance of distributed applications.
Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post-incident reviews.
Enhance observability by designing and evolving monitoring, alerting, logging, and tracing frameworks.
Drive continuous improvement across automation, deployment processes, and service stability.
Collaborate with cross-functional teams to influence architecture, design, and operational standards.
Support CI/CD pipelines, environment configuration, and vulnerability remediation.
Contribute to a knowledge-driven culture through documentation, tooling, and best-practice adoption.

Required Skills & Experience

Strong Java background with proven experience supporting or developing distributed systems.
Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry or similar).
Hands-on with hybrid cloud environments, ideally with GCP or another major cloud provider.
CI/CD and automation experience (e.g., Jenkins, Ansible).
Solid understanding of Linux, RDBMS fundamentals, and job schedulers (e.g., Control-M or equivalents).
Strong analytical mindset with a methodical approach to troubleshooting.
Excellent communication skills and comfort working in Agile teams.

Technology

Link Group

Site Reliability Engineer

Mid

Hybrid

Warsaw, Poland

🏢 Summary: Hands-on Site Reliability Engineer role focused on building and scaling reliability practices across cloud and on-prem environments. The position involves improving performance, scalability, and resilience of production systems through automation, observability, and Kubernetes-based infrastructure. You will drive SRE standards and collaborate with engineering teams to enhance system stability and fault tolerance.

🗂️ Requirements:

4+ years experience in SRE, DevOps or similar roles
Strong experience with distributed systems
Strong experience with Kubernetes
Experience with AWS cloud
Hands-on automation experience with Python, Bash or Go
Solid understanding of CI/CD practices
Experience with observability and monitoring tools
Experience managing production systems

📃 Skills: Kubernetes, AWS, Python, Bash, Go, Prometheus, Grafana, CI/CD, SRE, DevOps

🏢 Description: We’re looking for a Site Reliability Engineer (SRE) to help build and scale reliability practices across our engineering organization. This is a hands-on role where you’ll work across cloud and on-prem environments, improving the performance, scalability, and resilience of critical production systems.

🔧 What you’ll be doing:

• Driving SRE best practices, standards, and ways of working
• Building and scaling observability & monitoring solutions (e.g. Prometheus, Grafana)
• Working with Kubernetes-based infrastructure to ensure reliability and efficiency
• Automating deployments, incident response, and recovery processes
• Collaborating closely with engineering teams to improve system stability and fault tolerance
• Contributing to a strong reliability culture (SLOs, post-mortems, continuous improvement)

✅ What we’re looking for:

• 4+ years of experience in SRE / DevOps / similar roles
• Strong experience with distributed systems, Kubernetes, and cloud (AWS preferred)
• Hands-on approach to automation (Python, Bash, or Go)
• Solid understanding of CI/CD and modern software delivery
• Proactive mindset and strong ownership of production systems

Technology

Sigma Software

Principal Site Reliability Engineer

Senior

Remote

Warsaw, Poland

🏢 Summary: Principal Site Reliability Engineer role leading infrastructure strategy for an AI-driven SaaS platform in the finance domain. The position focuses on scaling, securing, and optimizing cloud-based systems while driving automation, reliability, and performance. You will shape CI/CD, observability, and infrastructure practices in a high-growth environment.

🗂️ Requirements:

8+ years in Site Reliability Engineering or DevOps
2+ years in Principal or Lead role
Experience in infrastructure modernization and scaling
Strong proficiency in Python
Expertise in AWS cloud platforms
Experience with AWS ECS and EKS
Experience designing and optimizing CI/CD pipelines
Experience with Terraform for infrastructure-as-code
Strong knowledge of monitoring and observability practices

📃 Skills: Python, AWS, ECS, EKS, Terraform, GitHub, Buildkite, CICD, Monitoring, Observability

🏢 Description: Are you ready to lead infrastructure strategy for a cutting-edge AI-driven SaaS platform? We are looking for a Principal Site Reliability Engineer with a proven track record in scaling, optimizing, and securing cloud-based systems. This senior role offers the opportunity to shape the reliability and performance of a platform used by finance teams worldwide.

In this role, you will be part of a dynamic engineering environment where your expertise will directly influence product stability and growth. You will work with advanced cloud technologies, automation tools, and AI-driven solutions, contributing to projects that push the boundaries of innovation. If you are ready to take on strategic responsibility and make a tangible impact, apply now and join us in building the future of reliable, scalable systems.

Customer

Sigma Software is partnering with a fast-growing AI-driven SaaS platform serving finance and accounting teams in high-growth businesses. The platform automates critical workflows, from billing and collections to revenue recognition and reporting, ensuring compliance and accelerating cash flow. Leveraging advanced AI, it reduces manual work, increases operational efficiency, and supports scalability for customers worldwide.

Project

The project focuses on building and scaling an AI-powered SaaS solution for finance automation. It integrates advanced machine learning models with robust cloud infrastructure to deliver secure, compliant, and high-performance services. The engineering culture emphasizes automation, resilience, and operational excellence.

Requirements

At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
Proven experience in infrastructure modernization and scaling initiatives for high-growth environments
Strong proficiency in Python
Deep expertise in cloud platforms and container orchestration tools such as AWS ECS and EKS
Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
Proficiency in infrastructure-as-code tools such as Terraform
Strong knowledge of monitoring, observability, and performance optimization practices
Upper-Intermediate level of spoken and written English

Would be a plus:

Experience with monorepos (Turborepo, pnpm)
Familiarity with modern TypeScript tools (swc, biome, oxc)
Knowledge of NestJS, NextJS, and testing frameworks (Jest, Vitest)

Personal Profile

Excellent leadership, communication, and decision-making abilities
Ability to work independently and make pragmatic build-vs-buy decisions in fast-paced environments

Responsibilities

Define and lead infrastructure and reliability strategy across the platform
Design scalable, resilient systems in collaboration with engineering teams
Optimize build, testing, and deployment processes for speed and stability
Establish and uphold best practices for CI/CD, monitoring, and observability
Lead incident response and drive continuous improvement post-incident
Automate workflows to reduce operational toil and risk
Mentor engineers and foster a culture of operational excellence
Make strategic build-vs-buy decisions balancing speed, quality, and sustainability

Technology

Sigma Software

Principal Site Reliability Engineer

Senior

Remote

Bucharest, Romania

🏢 Summary: Senior Principal Site Reliability Engineer role leading infrastructure strategy for an AI-driven SaaS platform in the finance domain. Responsible for scaling, securing, and optimizing cloud-based systems while driving reliability, automation, and operational excellence. The position shapes platform performance and resilience in a high-growth, cloud-native environment.

🗂️ Requirements:

8+ years in Site Reliability Engineering or DevOps
2+ years in Principal or Lead role
Experience in infrastructure modernization and scaling
Strong Python proficiency
Expertise with AWS cloud platforms
Experience with container orchestration (ECS, EKS)
Experience designing and optimizing CI/CD pipelines
Hands-on experience with Terraform
Strong knowledge of monitoring and observability practices
Experience leading incident response and reliability improvements

📃 Skills: Python, AWS, ECS, EKS, Kubernetes, Terraform, GitHubActions, Buildkite, CICD, Monitoring, Observability

🏢 Description: Are you ready to lead infrastructure strategy for a cutting-edge AI-driven SaaS platform? We are looking for a Principal Site Reliability Engineer with a proven track record in scaling, optimizing, and securing cloud-based systems. This senior role offers the opportunity to shape the reliability and performance of a platform used by finance teams worldwide.

In this role, you will be part of a dynamic engineering environment where your expertise will directly influence product stability and growth. You will work with advanced cloud technologies, automation tools, and AI-driven solutions, contributing to projects that push the boundaries of innovation. If you are ready to take on strategic responsibility and make a tangible impact, apply now and join us in building the future of reliable, scalable systems.

Customer

Sigma Software is partnering with a fast-growing AI-driven SaaS platform serving finance and accounting teams in high-growth businesses. The platform automates critical workflows, from billing and collections to revenue recognition and reporting, ensuring compliance and accelerating cash flow. Leveraging advanced AI, it reduces manual work, increases operational efficiency, and supports scalability for customers worldwide.

Project

The project focuses on building and scaling an AI-powered SaaS solution for finance automation. It integrates advanced machine learning models with robust cloud infrastructure to deliver secure, compliant, and high-performance services. The engineering culture emphasizes automation, resilience, and operational excellence.

Requirements

At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
Proven experience in infrastructure modernization and scaling initiatives for high-growth environments
Strong proficiency in Python
Deep expertise in cloud platforms and container orchestration tools such as AWS ECS and EKS
Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
Proficiency in infrastructure-as-code tools such as Terraform
Strong knowledge of monitoring, observability, and performance optimization practices
Upper-Intermediate level of spoken and written English

Would be a plus

Experience with monorepos (Turborepo, pnpm)
Familiarity with modern TypeScript tools (swc, biome, oxc)
Knowledge of NestJS, NextJS, and testing frameworks (Jest, Vitest)

Personal Profile

Excellent leadership, communication, and decision-making abilities
Ability to work independently and make pragmatic build-vs-buy decisions in fast-paced environments

Responsibilities

Define and lead infrastructure and reliability strategy across the platform
Design scalable, resilient systems in collaboration with engineering teams
Optimize build, testing, and deployment processes for speed and stability
Establish and uphold best practices for CI/CD, monitoring, and observability
Lead incident response and drive continuous improvement post-incident
Automate workflows to reduce operational toil and risk
Mentor engineers and foster a culture of operational excellence
Make strategic build-vs-buy decisions balancing speed, quality, and sustainability

Technology

VISA

Staff Site Reliability Engineer

Senior

Hybrid

Warsaw, Poland

🏢 Summary: The offer is for a Staff Site Reliability Engineer focused on leading CI/CD and infrastructure automation initiatives to ensure platform resilience and scalability. The role involves optimizing pipelines, implementing Infrastructure as Code, enhancing observability, and mentoring engineers within a DevOps environment. It is a hybrid position requiring strong cloud and containerization expertise.

🗂️ Requirements:

Advanced English (C1 level)
Proficiency in CI/CD tools (Argo, Codefresh)
Expertise in Infrastructure as Code using Terraform
Strong knowledge of Docker and Kubernetes
Experience with service mesh technologies
Proficiency in monitoring and observability tools
Experience with AWS cloud services
Hands-on experience in automating deployments and CI/CD integration
Experience participating in OnCall rotations

📃 Skills: Argo, Codefresh, Terraform, Docker, Kubernetes, Istio, Grafana, Loki, Honeycomb, OpenTelemetry, Prometheus, AWS, Golang, Java, Groovy

🏢 Description:

About Us

Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories, dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At Visa, you'll have the opportunity to create impact at scale: tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters – to you, to your community, and to the world. Progress starts with you.

Job Description

We are seeking a highly skilled and experienced Staff Site Reliability Engineer to join our DevOps squad. This role will focus on leading technical initiatives, optimizing CI/CD pipelines, automating infrastructure provisioning, and ensuring platform resilience. The ideal candidate will have a strong background in CI/CD, Infrastructure as Code (IaC), and cloud technologies, as well as the ability to mentor engineers and contribute to the overall stability and scalability of our platform.

Responsibilities:

Technical Leadership: Lead the implementation and optimization of CI/CD pipelines. Develop and maintain Infrastructure as Code (IaC) scripts to automate infrastructure provisioning and management. Identify and implement automation opportunities to improve efficiency and reduce manual effort. Ensure best practices in CI/CD and IaC to promote consistency, repeatability, and compliance.
Platform Resilience: Maintain CI/CD resilience by avoiding unplanned or uncommunicated changes. Serve as an example of diligence and reliability to the team.
Technical Contributions: Make high-impact technical contributions recognized by the team and organization. Write effective post-mortem documentation for internal and external stakeholders.
Mentorship: Mentor and provide constructive feedback to engineers across the company. Review pull requests and source code, focusing on improving CI/CD and automation practices.
Consultation and Problem-Solving: Serve as a consultant for engineers from different squads. Solve complex and unknown problems under pressure.
Technology Trends and POCs: Stay up-to-date with the latest technology trends in CI/CD and automation. Lead and execute Proof of Concepts (POCs) to introduce new technologies to the team.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.

Qualifications

Language Proficiency: Advanced English (C1 level).
Technical Expertise: Proficiency in CI/CD tools such as Argo and Codefresh. Expertise in Infrastructure as Code (IaC) tools like Terraform. Strong knowledge of Docker and Kubernetes for containerization and orchestration. Experience with service mesh technologies such as Istio. Proficiency in monitoring and observability tools like Grafana, Grafana Loki, Honeycomb, OpenTelemetry, and Prometheus. Proficiency in AWS cloud services.
Experience: Hands-on experience with automating deployment processes and integrating CI/CD pipelines. Experience with support and participation in OnCall rotations. Experience with programming languages such as Golang, Java, or Groovy can be considered a plus.
Soft Skills: Strong problem-solving skills, especially under pressure. Ability to mentor and provide constructive feedback to engineers. Effective communication and collaboration skills.

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Technology

Link Group

Site Reliability Engineer

Senior

Remote

Warsaw, Poland

21,000 - 24,000 PLN

🏢 Summary: Senior Site Reliability Engineer responsible for end-to-end reliability of AI-driven applications and pipelines in production environments. Hands-on role focused on diagnosing, resolving, and automating production issues while improving monitoring and CI/CD processes. Ensures high performance, reliability, and standardized telemetry across AI systems.

🗂️ Requirements: 5+ years experience as SRE, Production Engineer, or Platform Engineer, Strong incident management and root cause analysis experience, Hands-on experience with Azure DevOps, Hands-on experience with Kubernetes, Hands-on experience with Datadog, Hands-on experience with Azure, Hands-on experience with CI/CD pipelines, Experience working in production environments, Ability to build and maintain monitoring and alerting systems

📃 Skills: Azure, Kubernetes, Datadog, AzureDevOps, CICD, Grafana, AI, LLM, Monitoring, Telemetry, RCA

🏢 Description:

About the Role

We are looking for a Senior Site Reliability Engineer who will take end-to-end ownership of reliability for AI-driven applications and pipelines. This is a hands-on engineering role, not a coordination or ticket-driven position. The ideal candidate actively diagnoses, resolves, and automates production issues rather than only designing solutions.

Requirements

5+ years as SRE / Production / Platform Engineer
Strong incident management & RCA experience
Hands-on with: Azure DevOps, Kubernetes, Datadog, Azure, CI/CD
Proactive, ownership mindset, self-driven
Experience in production environments

Nice to have: AI/LLM pipelines, Grafana

Responsibilities

Build and maintain monitoring, alerting, dashboards
Lead incident response & root cause analysis
Ensure reliability and performance of AI pipelines
Standardize telemetry (latency, failures, throughput)
Optimize CI/CD and release quality
Reduce recurring incidents with engineering teams

Technology

Grid Dynamics Poland

Senior Site Reliability Engineer (SRE)

Senior

Hybrid

Warsaw, Poland

100 - 128 PLN

🏢 Summary: Senior Site Reliability Engineer role focused on ensuring reliability, performance, and resilience of enterprise products by bridging infrastructure and software engineering. The position involves hands-on Java/Spring Boot code fixes, Kubernetes-based container operations, incident response, and proactive architecture improvements. The engineer drives automation, observability, and security best practices across the SDLC.

🗂️ Requirements: 5+ years experience in SRE or Platform Engineering, Strong proficiency in Java, Strong proficiency in Spring Boot, Experience with Hibernate, Experience with Jenkins, Ability to read, analyze and fix application code, Hands-on experience with Docker, Hands-on experience with Kubernetes, Deep knowledge of Linux systems, Strong understanding of networking, Experience with distributed systems, Experience with monitoring and observability tools, Bachelor’s degree in Computer Science, Systems Engineering or equivalent experience

📃 Skills: Java, Spring, Hibernate, Jenkins, Docker, Kubernetes, Linux, Networking, Prometheus, Grafana, Splunk

🏢 Description:

We are looking for an experienced Senior Site Reliability Engineer to join our team and oversee the reliability, resilience, and performance of our core enterprise products. In this role, you will bridge the gap between infrastructure operations and software engineering. You won't just react to alerts - you will proactively analyze system architecture, build automation, and dive deep into the application code (Java/Spring Boot) to fix bugs and eliminate issues at their root.

Responsibilities:

Architecture & Reliability: Understand the end-to-end product topology from both infrastructure and application perspectives. Identify bottlenecks, scale limitations, and unstable components, driving long-term resolutions before they impact production.

Incident Response & RCA: Respond to outages, provide L3 on-call technical support (on rotation), and perform blameless Root Cause Analysis (RCA) to implement permanent fixes.

Hands-on Engineering: Address defects, perform code bug fixes directly in production, and recommend architectural improvements during incident analysis.

Security & Vulnerability Management: Oversee vulnerability management for applications and containers, manage patching processes, ensure compliance, and monitor certificate expirations and renewals according to global best practices.

SRE Advocacy & SDLC: Represent the SRE organization in design reviews, capacity planning, and operational readiness exercises. Partner closely with development teams to embed reliability best practices early in the SDLC.

Automation & Mentoring: Build automation tools to reduce manual toil and improve efficiency. Spread SRE culture, create standard documentation, and provide technical mentorship to junior team members.

System Health: Oversee the production environment by tracking availability, applying learnings from observability tools, and becoming a Subject Matter Expert (SME) on core issuing products.

Min requirements:

Experience: 5+ years of experience in Site Reliability Engineering (SRE) or Platform Engineering roles.

Software Engineering: Strong proficiency in Java, Spring Boot, Hibernate, and Jenkins. Ability to read, analyze, and fix application code.

Containerization: Hands-on expertise with Docker and container orchestration using Kubernetes.

Infrastructure: Deep knowledge of Linux systems, networking, and distributed architectures.

Observability: Strong understanding of monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Splunk).

Education: Bachelor’s degree in Computer Science, Systems Engineering, or equivalent practical experience.

Soft Skills: Excellent problem-solving abilities and strong communication skills.

Would be a plus:

Infrastructure as Code & Cloud: Hands-on experience with tools like Terraform or Ansible, alongside familiarity with major public cloud providers (AWS, GCP, or Azure).

Advanced Networking & Service Mesh: Knowledge of service mesh technologies (e.g., Istio, Linkerd) for traffic management, security, and observability in microservices architectures.

Industry Experience: Previous background in the FinTech, payments, or banking sectors, with an understanding of high-security compliance standards (e.g., PCI-DSS).

We offer:

Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office

About us:

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

ITMAGINATION

ITMAGINATION is a technology company operating in the IT services and software development industry, specializing in delivering advanced digital and AI-driven solutions. The company focuses on building scalable, end-to-end technology systems, including AI applications, automation, and cloud-based platforms. Recognized for its rapid growth and industry impact, ITMAGINATION has been featured in the Deloitte Technology Fast 50 and FT 1000 rankings. It has also earned the Great Place To Work® certification for seven consecutive years, reflecting a strong commitment to workplace culture and employee satisfaction. The company emphasizes innovation, technical excellence, and continuous professional development, positioning itself as a forward-thinking and growth-oriented organization.
