June 8, 2026
Lead Site Reliability Engineer
Senior • Remote
20,150 - 21,700 PLN
Warsaw, Poland
This is a remote position.
Virtusa helps its Clients by becoming a true extension of their software and data development capabilities. Through the readily set up, comprehensive, and self-governing teams, we let our Clients focus on their business while we make sure that their software products and data tools scale up accordingly and with outstanding quality.
We are looking for a team player to fill the Lead Site Reliability Engineer position in a dynamic international project for a customer from the tech sector.
Requirements
4+ years of hands-on Platform/System Engineering experience using Go, Python, Java, Ruby, or equivalent programming languages.
3+ years of experience in an engineering role, working with a diverse and distributed team located across the globe.
Hands-on experience with containerization and CI/CD technologies (Docker, Kubernetes, GitHub Actions, ArgoCD, EKS, AKS, ECS).
Exposure to Infrastructure as Code (IaC) with Multi-Cloud Deployments.
Proven experience building and reliably running modern full-stack cloud applications using public cloud technologies (AWS, Azure, GCP) at scale.
Effective written and verbal communication skills to properly articulate complex technical problems to all levels of the organization and customers.
Confidence in the ability to own and deliver a roadmap tied to business priorities.
A passion for excellence, a natural problem solver, and a critical thinker who enjoys digging deep to understand issues and solve hard problems.
Degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
Experience with modern infrastructure management systems (Ansible, Terraform); Chef is a must-have.
Nice to have:
Expertise in building Platform-as-a-Service (PaaS) solutions.
Benefits
Professional training programs
Work with a team that’s recognized for its excellence. We’ve been featured in the Deloitte Technology Fast 50 & FT 1000 rankings. We’ve also received the Great Place To Work® certification for five years in a row.
Similar jobs you might like
Technology
ITMAGINATION
AI Senior DevOps Engineer
Senior
Remote
Warsaw, Poland
19,375 - 23,250 PLN
🏢 Summary: Remote AI Senior DevOps Engineer role focused on building and automating CI/CD and MLOps pipelines to enable seamless ML model deployment from training to production. The position bridges AI development and operations, emphasizing infrastructure as code, monitoring, DevSecOps, and scalable cloud environments. It involves close collaboration with AI teams to create a self-service model deployment platform.
🗂️ Requirements: 6–8 years experience in DevOps or SRE roles, Minimum 2 years experience in MLOps or AI/ML workloads, Expertise in Jenkins, GitHub Actions, or GitLab CI/CD, Hands-on experience with Azure DevOps, Hands-on experience with Terraform, Advanced knowledge of Docker and Kubernetes, Experience setting up Prometheus and Grafana dashboards, Experience with MySQL, PostgreSQL, or MongoDB, Familiarity with Ansible, Chef, or Puppet, Experience with SAST/DAST and vulnerability scanning in CI/CD, Experience with dataset and model versioning tools, Professional English C1 level
📃 Skills: Azure, Terraform, CloudFormation, Jenkins, GitHub, GitLab, Docker, Kubernetes, Prometheus, Grafana, MySQL, PostgreSQL, MongoDB, Ansible, Chef, Puppet, SAST, DAST, DVC, CI/CD, MLOps
🏢 Description: This is a remote position. Virtusa is seeking an AI Senior DevOps Engineer to bridge the gap between AI development and production-grade operations. You will be responsible for building the automated "highways" that allow ML models to flow from training to deployment seamlessly. This role requires a strong DevOps foundation combined with an understanding of the unique challenges of MLOps, such as GPU resource management, model versioning, and performance monitoring.
Key Responsibilities:
CI/CD & MLOps Pipelines: Build and maintain automated pipelines for ML models using Azure DevOps, GitHub Actions, or Jenkins.
Workflow Automation: Automate model validation, packaging, and deployment workflows to ensure rapid iteration cycles.
Infrastructure as Code (IaC): Use Terraform or CloudFormation to provision and manage cloud-native infrastructure, focusing on high-availability and scalability.
Monitoring & Observability: Set up comprehensive monitoring for infrastructure (CPU/GPU/Memory) and model performance (latency and drift) using Prometheus and Grafana.
DevSecOps Implementation: Integrate security into the heart of the pipeline, including secret management, IAM role configuration, and vulnerability scanning.
Collaboration: Work closely with Data Scientists and AI Engineers to enable a self-service platform for model deployment.
Requirements
6–8 years of experience in DevOps/SRE roles, with a minimum of 2 years focused on MLOps or supporting AI/ML workloads.
Deep expertise in Jenkins, GitHub Actions, or GitLab CI/CD.
Hands-on proficiency with Azure DevOps and Terraform (CloudFormation is a strong plus).
Advanced knowledge of Docker and Kubernetes for managing distributed AI applications.
Proven experience setting up Prometheus and Grafana dashboards for technical and model-specific metrics.
Practical experience managing or connecting to MySQL, PostgreSQL, or MongoDB.
Familiarity with Ansible, Chef, or Puppet for automated environment setup.
Hands-on experience with SAST/DAST tools and automated vulnerability scanning within CI/CD pipelines.
Experience with versioning tools for datasets and models (e.g., DVC or similar pipeline versioning logic).
Professional English (C1) for seamless interaction with global delivery teams.
Benefits
Professional training programs
Work with a team that’s recognized for its excellence. We’ve been featured in the Deloitte Technology Fast 50 & FT 1000 rankings. We’ve also received the Great Place To Work® certification for five years in a row.
Technology
ITMAGINATION
Senior AI Full Stack Developer (Python & Angular)
Senior
Remote
Warsaw, Poland
25,000 - 28,675 PLN
🏢 Summary: Remote Senior FullStack Developer role focused on building and scaling end-to-end AI applications, agentic workflows, and modern front-end interfaces. The position involves designing AI-driven solutions, optimizing system architecture, and integrating LLM-based systems into production environments. You will work on data-heavy applications and enterprise automation within a fully remote project in Poland.
🗂️ Requirements: 5+ years of experience with Python for data-heavy or AI-integrated applications, Strong experience with Django, Flask or FastAPI, Strong mastery of Vanilla JavaScript, Hands-on experience with AngularJS and knowledge of migration to modern Angular, Experience building agentic AI systems and autonomous LLM-based workflows, Practical knowledge of LangChain, LlamaIndex and RAG, Experience with AI agent orchestration frameworks, Commercial experience with GCP or AWS, Proven experience with Google AppScript for enterprise automation, Strong understanding of RESTful APIs, Experience with asynchronous programming, Experience with end-to-end agentic workflows, Hands-on experience with GenAI technologies such as Gemini, Copilot or Claude
📃 Skills: Python, Django, Flask, FastAPI, JavaScript, AngularJS, Angular, LangChain, LlamaIndex, RAG, LLM, GCP, AWS, AppScript, REST, GenAI, Gemini, Copilot, Claude
🏢 Description: This is a remote position. As a Senior FullStack Developer at Virtusa, you will build and scale end-to-end AI applications, integrate automated workflows, and design intuitive front-end interfaces. You will diagnose architectural bottlenecks, propose AI-driven optimizations, and communicate directly with stakeholders to turn complex requirements into functional code. This is a fully remote project within Poland.
Requirements
At least 5 years of experience with Python (Django, Flask, or FastAPI), specifically for data-heavy or AI-integrated applications.
Strong mastery of Vanilla JavaScript and hands-on experience with AngularJS (and familiarity with the migration path to modern Angular).
Hands-on experience building agentic AI systems and autonomous workflows based on LLMs.
Strong practical knowledge of: LangChain, LlamaIndex, RAG, AI agent orchestration frameworks.
Commercial cloud experience: GCP (preferred) or AWS.
AppScript Expertise: Proven experience using Google AppScript for enterprise automation or custom add-ons.
API Design: Deep understanding of RESTful APIs and asynchronous programming.
Experience in end-to-end agentic workflows.
Strong experience in using GenAI technologies: Gemini, Copilot, Claude.
Benefits
Professional training programs
Work with a team that is recognized for its excellence. We have been featured in the Deloitte Technology Fast 50 & FT 1000 rankings. We have also received the Great Place To Work® certification for seven years in a row.
Technology
Sigma Software
Principal Site Reliability Engineer
Senior
Remote
Warsaw, Poland
🏢 Summary: Senior Principal Site Reliability Engineer role focused on defining and leading infrastructure and reliability strategy for an AI-driven SaaS platform in the finance domain. The position centers on scaling, securing, and optimizing cloud-based systems while driving automation, observability, and operational excellence. The role combines hands-on technical leadership with strategic decision-making in high-growth environments.
🗂️ Requirements: 8+ years in Site Reliability Engineering or DevOps, 2+ years in Principal or Lead SRE role, Experience in infrastructure modernization and scaling in high-growth environments, Strong proficiency in Python, Deep expertise in AWS ECS and EKS, Experience designing and optimizing CI/CD pipelines, Hands-on experience with Terraform, Strong knowledge of monitoring and observability practices, Upper-Intermediate English level
📃 Skills: Python, AWS, ECS, EKS, Terraform, GitHub Actions, Buildkite, CI/CD, Monitoring, Observability, Docker, Kubernetes
🏢 Description: Are you ready to lead infrastructure strategy for a cutting-edge AI-driven SaaS platform? We are looking for a Principal Site Reliability Engineer with a proven track record in scaling, optimizing, and securing cloud-based systems. This senior role offers the opportunity to shape the reliability and performance of a platform used by finance teams worldwide. In this role, you will be part of a dynamic engineering environment where your expertise will directly influence product stability and growth. You will work with advanced cloud technologies, automation tools, and AI-driven solutions, contributing to projects that push the boundaries of innovation. If you are ready to take on strategic responsibility and make a tangible impact, apply now and join us in building the future of reliable, scalable systems.
Customer
Sigma Software is partnering with a fast-growing AI-driven SaaS platform serving finance and accounting teams in high-growth businesses. The platform automates critical workflows, from billing and collections to revenue recognition and reporting, ensuring compliance and accelerating cash flow. Leveraging advanced AI, it reduces manual work, increases operational efficiency, and supports scalability for customers worldwide.
Project
The project focuses on building and scaling an AI-powered SaaS solution for finance automation. It integrates advanced machine learning models with robust cloud infrastructure to deliver secure, compliant, and high-performance services. The engineering culture emphasizes automation, resilience, and operational excellence.
Requirements
At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
Proven experience in infrastructure modernization and scaling initiatives for high-growth environments
Strong proficiency in Python
Deep expertise in cloud platforms and container orchestration tools such as AWS ECS and EKS
Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
Proficiency in infrastructure-as-code tools such as Terraform
Strong knowledge of monitoring, observability, and performance optimization practices
Upper-Intermediate level of spoken and written English
Would be a plus
Experience with monorepos (Turborepo, pnpm)
Familiarity with modern TypeScript tools (swc, biome, oxc)
Knowledge of NestJS, NextJS, and testing frameworks (Jest, Vitest)
Personal Profile
Excellent leadership, communication, and decision-making abilities
Ability to work independently and make pragmatic build-vs-buy decisions in fast-paced environments
Responsibilities
Define and lead infrastructure and reliability strategy across the platform
Design scalable, resilient systems in collaboration with engineering teams
Optimize build, testing, and deployment processes for speed and stability
Establish and uphold best practices for CI/CD, monitoring, and observability
Lead incident response and drive continuous improvement post-incident
Automate workflows to reduce operational toil and risk
Mentor engineers and foster a culture of operational excellence
Make strategic build-vs-buy decisions balancing speed, quality, and sustainability
Technology
Sigma Software
Principal Site Reliability Engineer
Senior
Remote
Bucharest, Romania
🏢 Summary: Senior Principal Site Reliability Engineer role leading infrastructure strategy for an AI-driven SaaS platform in the finance domain. Responsible for scaling, securing, and optimizing cloud-based systems while driving reliability, automation, and operational excellence. The position shapes platform performance and resilience in a high-growth, cloud-native environment.
🗂️ Requirements: 8+ years in Site Reliability Engineering or DevOps, 2+ years in Principal or Lead role, Experience in infrastructure modernization and scaling, Strong Python proficiency, Expertise with AWS cloud platforms, Experience with container orchestration (ECS, EKS), Experience designing and optimizing CI/CD pipelines, Hands-on experience with Terraform, Strong knowledge of monitoring and observability practices, Experience leading incident response and reliability improvements
📃 Skills: Python, AWS, ECS, EKS, Kubernetes, Terraform, GitHub Actions, Buildkite, CI/CD, Monitoring, Observability
🏢 Description: Are you ready to lead infrastructure strategy for a cutting-edge AI-driven SaaS platform? We are looking for a Principal Site Reliability Engineer with a proven track record in scaling, optimizing, and securing cloud-based systems. This senior role offers the opportunity to shape the reliability and performance of a platform used by finance teams worldwide. In this role, you will be part of a dynamic engineering environment where your expertise will directly influence product stability and growth. You will work with advanced cloud technologies, automation tools, and AI-driven solutions, contributing to projects that push the boundaries of innovation. If you are ready to take on strategic responsibility and make a tangible impact, apply now and join us in building the future of reliable, scalable systems.
Customer
Sigma Software is partnering with a fast-growing AI-driven SaaS platform serving finance and accounting teams in high-growth businesses. The platform automates critical workflows, from billing and collections to revenue recognition and reporting, ensuring compliance and accelerating cash flow. Leveraging advanced AI, it reduces manual work, increases operational efficiency, and supports scalability for customers worldwide.
Project
The project focuses on building and scaling an AI-powered SaaS solution for finance automation. It integrates advanced machine learning models with robust cloud infrastructure to deliver secure, compliant, and high-performance services. The engineering culture emphasizes automation, resilience, and operational excellence.
Requirements
At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
Proven experience in infrastructure modernization and scaling initiatives for high-growth environments
Strong proficiency in Python
Deep expertise in cloud platforms and container orchestration tools such as AWS ECS and EKS
Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
Proficiency in infrastructure-as-code tools such as Terraform
Strong knowledge of monitoring, observability, and performance optimization practices
Upper-Intermediate level of spoken and written English
Would be a plus
Experience with monorepos (Turborepo, pnpm)
Familiarity with modern TypeScript tools (swc, biome, oxc)
Knowledge of NestJS, NextJS, and testing frameworks (Jest, Vitest)
Personal Profile
Excellent leadership, communication, and decision-making abilities
Ability to work independently and make pragmatic build-vs-buy decisions in fast-paced environments
Responsibilities
Define and lead infrastructure and reliability strategy across the platform
Design scalable, resilient systems in collaboration with engineering teams
Optimize build, testing, and deployment processes for speed and stability
Establish and uphold best practices for CI/CD, monitoring, and observability
Lead incident response and drive continuous improvement post-incident
Automate workflows to reduce operational toil and risk
Mentor engineers and foster a culture of operational excellence
Make strategic build-vs-buy decisions balancing speed, quality, and sustainability
Technology
Caspian One
Site Reliability Engineer
Senior
Hybrid
Krakow, Poland
1,400 - 1,800 PLN
🏢 Summary: Hands-on Site Reliability Engineer role focused on ensuring stability, scalability, and observability of a mission-critical distributed risk and analytics platform in hybrid cloud environments. The position centers on production reliability, incident response, automation, and continuous improvement of monitoring and deployment processes. You will collaborate with engineering teams to strengthen system resilience, performance, and operational standards.
🗂️ Requirements: Strong Java experience in distributed systems, Experience with observability and monitoring tools, Hands-on experience with hybrid cloud environments (preferably GCP), Experience with CI/CD pipelines and automation tools, Solid knowledge of Linux systems administration, Understanding of RDBMS fundamentals, Experience with job schedulers (e.g., Control-M), Ability to lead incident response and root-cause analysis
📃 Skills: Java, Grafana, Prometheus, Loki, OpenTelemetry, GCP, Jenkins, Ansible, Linux, SQL, Control-M, CI/CD
🏢 Description: We’re looking for a seasoned Site Reliability Engineer to support a high-performance, mission-critical risk and analytics platform used across global trading and finance environments. You’ll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure. In this role, you’ll take ownership of production reliability, driving incident response, conducting root-cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You’ll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance. This is a hands-on technical position suited to someone who thrives in high-throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time.
Core Responsibilities
Maintain and improve the reliability, uptime, and performance of distributed applications.
Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post-incident reviews.
Enhance observability by designing and evolving monitoring, alerting, logging, and tracing frameworks.
Drive continuous improvement across automation, deployment processes, and service stability.
Collaborate with cross-functional teams to influence architecture, design, and operational standards.
Support CI/CD pipelines, environment configuration, and vulnerability remediation.
Contribute to a knowledge-driven culture through documentation, tooling, and best-practice adoption.
Required Skills & Experience
Strong Java background with proven experience supporting or developing distributed systems.
Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry or similar).
Hands-on with hybrid cloud environments, ideally with GCP or another major cloud provider.
CI/CD and automation experience (e.g., Jenkins, Ansible).
Solid understanding of Linux, RDBMS fundamentals, and job schedulers (e.g., Control-M or equivalents).
Strong analytical mindset with a methodical approach to troubleshooting.
Excellent communication skills and comfort working in Agile teams.
Technology
VISA
Staff Site Reliability Engineer
Senior
Hybrid
Warsaw, Poland
🏢 Summary: Staff Site Reliability Engineer role focused on leading and optimizing CI/CD pipelines, automating infrastructure with Infrastructure as Code, and ensuring platform resilience in a cloud-native environment. The position drives technical excellence in DevOps practices, enhances deployment automation, and maintains high availability and observability standards. It also includes mentoring engineers and contributing to scalable, reliable platform operations.
🗂️ Requirements: Advanced English (C1), Proven experience with CI/CD pipeline implementation and optimization, Hands-on experience with Infrastructure as Code, Experience automating deployment processes, Strong knowledge of containerization and orchestration, Experience with service mesh technologies, Experience with monitoring and observability tools, Proficiency in AWS cloud services, Experience with OnCall rotations
📃 Skills: CI/CD, Argo, Codefresh, Terraform, IaC, Docker, Kubernetes, Istio, Grafana, Loki, Honeycomb, OpenTelemetry, Prometheus, AWS
🏢 Description:
About Us
Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories, dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At Visa, you'll have the opportunity to create impact at scale: tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters – to you, to your community, and to the world. Progress starts with you.
Job Description
We are seeking a highly skilled and experienced Staff Site Reliability Engineer to join our DevOps squad. This role will focus on leading technical initiatives, optimizing CI/CD pipelines, automating infrastructure provisioning, and ensuring platform resilience. The ideal candidate will have a strong background in CI/CD, Infrastructure as Code (IaC), and cloud technologies, as well as the ability to mentor engineers and contribute to the overall stability and scalability of our platform.
Responsibilities:
Technical Leadership: Lead the implementation and optimization of CI/CD pipelines. Develop and maintain Infrastructure as Code (IaC) scripts to automate infrastructure provisioning and management. Identify and implement automation opportunities to improve efficiency and reduce manual effort. Ensure best practices in CI/CD and IaC to promote consistency, repeatability, and compliance.
Platform Resilience: Maintain CI/CD resilience by avoiding unplanned or uncommunicated changes. Serve as an example of diligence and reliability to the team.
Technical Contributions: Make high-impact technical contributions recognized by the team and organization. Write effective post-mortem documentation for internal and external stakeholders.
Mentorship: Mentor and provide constructive feedback to engineers across the company. Review pull requests and source code, focusing on improving CI/CD and automation practices.
Consultation and Problem-Solving: Serve as a consultant for engineers from different squads. Solve complex and unknown problems under pressure.
Technology Trends and POCs: Stay up-to-date with the latest technology trends in CI/CD and automation. Lead and execute Proof of Concepts (POCs) to introduce new technologies to the team.
This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.
Qualifications
Language Proficiency: Advanced English (C1 level).
Technical Expertise: Proficiency in CI/CD tools such as Argo and Codefresh. Expertise in Infrastructure as Code (IaC) tools like Terraform. Strong knowledge of Docker and Kubernetes for containerization and orchestration. Experience with service mesh technologies such as Istio. Proficiency in monitoring and observability tools like Grafana, Grafana Loki, Honeycomb, OpenTelemetry, and Prometheus. Proficiency in AWS cloud services.
Experience: Hands-on experience with automating deployment processes and integrating CI/CD pipelines. Experience with support and participation in OnCall rotations. Experience with programming languages such as Golang, Java, or Groovy can be considered a plus.
Soft Skills: Strong problem-solving skills, especially under pressure. Ability to mentor and provide constructive feedback to engineers. Effective communication and collaboration skills.
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.
Technology
VISA
Staff Site Reliability Engineer
Senior
Hybrid
Warsaw, Poland
🏢 Summary: The offer is for a Staff Site Reliability Engineer focused on leading CI/CD and infrastructure automation initiatives to ensure platform resilience and scalability. The role involves optimizing pipelines, implementing Infrastructure as Code, enhancing observability, and mentoring engineers within a DevOps environment. It is a hybrid position requiring strong cloud and containerization expertise.
🗂️ Requirements: Advanced English (C1 level), Proficiency in CI/CD tools (Argo, Codefresh), Expertise in Infrastructure as Code using Terraform, Strong knowledge of Docker and Kubernetes, Experience with service mesh technologies, Proficiency in monitoring and observability tools, Experience with AWS cloud services, Hands-on experience in automating deployments and CI/CD integration, Experience participating in on-call rotations
📃 Skills: Argo, Codefresh, Terraform, Docker, Kubernetes, Istio, Grafana, Loki, Honeycomb, OpenTelemetry, Prometheus, AWS, Golang, Java, Groovy
🏢 Description:
About Us
Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories, dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At Visa, you'll have the opportunity to create impact at scale: tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters to you, to your community, and to the world. Progress starts with you.
Job Description
We are seeking a highly skilled and experienced Staff Site Reliability Engineer to join our DevOps squad. This role will focus on leading technical initiatives, optimizing CI/CD pipelines, automating infrastructure provisioning, and ensuring platform resilience. The ideal candidate will have a strong background in CI/CD, Infrastructure as Code (IaC), and cloud technologies, as well as the ability to mentor engineers and contribute to the overall stability and scalability of our platform.
Responsibilities:
Technical Leadership: Lead the implementation and optimization of CI/CD pipelines. Develop and maintain Infrastructure as Code (IaC) scripts to automate infrastructure provisioning and management. Identify and implement automation opportunities to improve efficiency and reduce manual effort. Ensure best practices in CI/CD and IaC to promote consistency, repeatability, and compliance.
Platform Resilience: Maintain CI/CD resilience by avoiding unplanned or uncommunicated changes. Serve as an example of diligence and reliability to the team.
Technical Contributions: Make high-impact technical contributions recognized by the team and organization. Write effective post-mortem documentation for internal and external stakeholders.
Mentorship: Mentor and provide constructive feedback to engineers across the company. Review pull requests and source code, focusing on improving CI/CD and automation practices.
Consultation and Problem-Solving: Serve as a consultant for engineers from different squads. Solve complex, unfamiliar problems under pressure.
Technology Trends and POCs: Stay up to date with the latest technology trends in CI/CD and automation. Lead and execute Proofs of Concept (POCs) to introduce new technologies to the team.
This is a hybrid position. The expected number of days in the office will be confirmed by your hiring manager.
Qualifications
Language Proficiency: Advanced English (C1 level).
Technical Expertise: Proficiency in CI/CD tools such as Argo and Codefresh. Expertise in Infrastructure as Code (IaC) tools like Terraform. Strong knowledge of Docker and Kubernetes for containerization and orchestration. Experience with service mesh technologies such as Istio. Proficiency in monitoring and observability tools like Grafana, Grafana Loki, Honeycomb, OpenTelemetry, and Prometheus. Proficiency in AWS cloud services.
Experience: Hands-on experience with automating deployment processes and integrating CI/CD pipelines. Experience supporting and participating in on-call rotations. Experience with programming languages such as Golang, Java, or Groovy is a plus.
Soft Skills: Strong problem-solving skills, especially under pressure. Ability to mentor and provide constructive feedback to engineers. Effective communication and collaboration skills.
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.
Technology
VISA
Sr. Site Reliability Engineer
Senior
Hybrid
Warsaw, Poland
🏢 Summary: Senior Platform Engineer role focused on building and operating a containerized, cloud-native platform supporting critical workloads. The position emphasizes reliability, resilience, and large-scale automation using SRE principles and Infrastructure-as-Code. The engineer owns the full lifecycle of Kubernetes-based infrastructure and drives automation and operational excellence across cloud environments.
🗂️ Requirements: Strong hands-on experience with AWS or Azure, Experience managing Kubernetes in production environments, Experience with Service Mesh technologies, Experience with Infrastructure as Code using Terraform, Knowledge of SRE principles and incident management practices, Experience with cloud-native microservices architecture, Understanding of observability and Golden Signals, Experience with infrastructure automation and orchestration
📃 Skills: AWS, Azure, Kubernetes, Istio, App Mesh, Linkerd, Terraform, GitOps, SRE, Microservices, Observability, Service Mesh
🏢 Description:
About Us
Visa is a global leader in payments technology, enabling transactions between consumers, merchants, financial institutions, and governments in over 200 countries and territories. The company is committed to delivering secure, reliable, and innovative payment solutions worldwide. At Visa, you have the opportunity to create real impact: working on meaningful challenges, developing your skills, and contributing to solutions used globally.
Job Description
The Senior Platform Engineer is a senior individual contributor within the SRE Tribe, responsible for developing and maintaining a containerized platform supporting critical workloads. This role focuses on platform reliability, resilience, and automation, ensuring infrastructure is designed and operated according to SRE and cloud-native best practices. You will act as a technical expert, contributing hands-on while influencing cross-team initiatives, especially in infrastructure automation and orchestration at scale.
Work Model
This is a hybrid position. The expected number of days in the office will be confirmed by your hiring manager.
Key Responsibilities
Platform Ownership & Reliability
Own the full lifecycle of platform components (design, provisioning, upgrades, decommissioning), including: cloud infrastructure; Kubernetes clusters and services; networking, ingress, and service discovery; Service Mesh and data-plane components.
Ensure resilience using SRE principles: fault isolation and graceful degradation; capacity planning and saturation control; reduction of operational toil.
Identify and mitigate reliability risks to improve platform stability.
Infrastructure Automation & Orchestration
Design and implement infrastructure bootstrap processes: automated provisioning of clusters and environments; repeatable platform setup and teardown; dependency-aware orchestration across cloud and Kubernetes layers.
Promote Infrastructure-as-Code and GitOps approaches: reproducible and auditable platform components; automated, testable, and reversible changes; minimal manual intervention.
Identify automation gaps and drive improvements that reduce operational risk.
SRE Practices & Operational Excellence
Apply and promote SRE best practices: clear ownership and runbooks; participation in on-call rotations; incident response and post-incident reviews.
Improve operational efficiency: simplify maintenance and day-2 operations; standardize upgrade and rollback strategies; reduce MTTD and MTTR.
Ensure compliance with security and internal standards.
Qualifications
Technical Skills
Strong hands-on experience with: public cloud platforms (AWS preferred, Azure); Kubernetes at scale (production environments); Service Mesh (e.g., Istio, App Mesh, Linkerd).
Strong understanding of: observability and Golden Signals; incident management and on-call practices; Infrastructure as Code (Terraform); cloud-native microservices architecture.
Additional Skills
Strong communication and collaboration skills.
Ability to work across teams and drive technical initiatives.
Additional Information
Visa is an equal opportunity employer and considers all qualified applicants in accordance with applicable laws and regulations.