April 24, 2026

Senior Engineer - Site Reliability Engineering

Senior • Remote

208,000 - 312,000 PLN/yr

Krakow, Poland

Job Overview

As the Senior Software Engineer – SRE, you will focus on implementing and maintaining reliability solutions across the platform. This role emphasizes hands-on engineering work, automation, and operational excellence. You will work closely with other engineers to ensure systems are highly available, observable, and resilient.
As a member of the engineering team, the Senior Software Engineer will work closely with Infrastructure, Engineering, and Product teams to develop highly resilient, observable, and automated solutions that enhance system availability and efficiency. The ideal candidate will bring deep technical expertise, strong problem-solving skills, and a passion for reliability engineering.

Job Description and Requirements

Job Responsibilities

Implement and advocate for best-in-class reliability, observability, and scalability practices across the platform.

Develop automated solutions for system reliability, capacity planning, and incident response to minimize manual intervention.

Participate in improving Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to enhance system reliability.

Contribute to CI/CD pipeline improvements and DevOps practices.

Support root cause analysis (RCA) investigations, drive corrective actions, and advocate for a blameless postmortem culture.

Participate in on-call rotations to ensure 24/7 availability of critical systems.

Influence and mentor engineering teams on SRE principles, DevOps culture, and best practices.

Stay ahead of industry trends, adopting new tools, frameworks, and methodologies to continually improve system reliability.

Preferred Qualifications

5+ years of experience in software engineering, site reliability engineering, or cloud infrastructure roles.

Experience with DevOps tooling and practices.

Proficiency in building service-oriented architectures and cloud-native distributed systems.

Proficiency in programming languages such as Python, Go, Java, or C#/.NET.

In-depth technical understanding and experience with at least two of the following DevOps platforms: GitHub, Azure DevOps, GitLab, or Jenkins.

Hands-on experience with observability tools (e.g., Prometheus, Grafana, or OpenTelemetry).

Strong background in CI/CD pipelines, automation, and DevOps practices.

Experience working in global, high-availability SaaS environments.

Experience implementing redundancy and disaster recovery scenarios.

Excellent teamwork and cross-group collaboration skills.

Ability to collaborate with both technical and business professionals.

Hands-on experience with Agile development methodologies.

Experience delivering complex technical solutions.

Excellent problem-solving, analytical, and communication skills.

Nice to have: Experience with Chaos Engineering and/or AIOps.

Competencies and Skills

Automation-First Mindset – Commitment to reducing toil through scripting and automation.

Reliability Engineering – Expertise in SLOs, SLIs, error budgets, and high-availability architectures.

Incident Management & Postmortems – Experience in handling production incidents and driving continuous improvement.

Observability & Monitoring – Deep understanding of logging, monitoring, and alerting best practices.

Practical knowledge of data structures and modern data engines.

Collaboration & Communication – Ability to work across teams, influence stakeholders, and advocate for reliability improvements.

Mentorship & Coaching – Passion for mentoring engineers and building an SRE culture within the organization.

Additional Information

This role offers a unique opportunity to shape the future of SRE in a cutting-edge SaaS company, ensuring the reliability and scalability of mission-critical applications for customers worldwide. If you are passionate about solving complex reliability challenges and driving technical excellence, we’d love to hear from you!

Relativity is a diverse workplace with different skills and life experiences, and we love and celebrate those differences. We believe that employees are happiest when they're empowered to be their full, authentic selves, regardless of how they identify.

Benefit Highlights:

Comprehensive health, dental, and vision plans

Parental leave for primary and secondary caregivers

Flexible work arrangements

Two week-long company breaks per year

Additional time off

Long-term incentive program

Training investment program

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other legally protected basis, in accordance with applicable law.

Relativity is committed to competitive, fair, and equitable compensation practices.

This position is eligible for total compensation, which includes a competitive base salary, an annual performance bonus, and long-term incentives.

The expected salary range for this role is 208,000 to 312,000 PLN per year.

The final offered salary will be based on several factors, including but not limited to the candidate's depth of experience, skill set, qualifications, and internal pay equity. Hiring at the top end of the range would not be typical, to allow for future meaningful salary growth in this position.

Required Skills:

Automation, Data Analysis, Database Management, Network Architecture, Performance Optimizations, Problem Solving, Project Management, Software Development, System Designs, Technical Leadership

Similar jobs you might like

Technology

Relativity

Senior .NET Engineer - Automation Services

Senior

Remote

Krakow, Poland

208,000 - 312,000 PLN/yr

🏢 Summary: Senior Software Engineer role focused on building and operating a cloud-native engineering platform using .NET and Azure. The position centers on developing scalable platform services, Infrastructure as Code, and CI/CD automation to enhance developer productivity. You will drive DevOps practices, cloud infrastructure standardization, and platform observability in a fast-paced environment.

🗂️ Requirements: 5+ years of software development experience; Strong proficiency in C# and .NET; Hands-on experience with Microsoft Azure; Experience with Infrastructure as Code; Experience with CI/CD systems and DevOps pipelines; Experience with containers and orchestration tools; Ability to design scalable and secure cloud-native architectures; Agile team experience

📃 Skills: C#, .NET, Azure, Terraform, Bicep, ARM, CI/CD, DevOps, Docker, Kubernetes, AKS, IaC

🏢 Description:

Posting Type

Hybrid / Remote

Job Overview

We are looking for a Senior Software Engineer to join our Engineering Platform team, where you'll help build the cloud-native backbone that powers Relativity’s development ecosystem. You will work on high-impact initiatives that shape how our engineers build, test, deploy, and scale software, driving technical excellence across the company.

As part of a growing and financially stable organization, you’ll have the opportunity to contribute in a dynamic, fast-moving environment that values autonomy, ownership, and practical action over excessive process. Our platform engineers are trusted with solving complex problems at scale, and empowered to make real decisions that improve the day-to-day experience of hundreds of developers.

Job Description and Requirements

Key Responsibilities

Design, build, and operate platform services and developer tooling using .NET and Microsoft Azure that are scalable, secure, and reliable.

Lead the development of Infrastructure as Code (IaC) components that automate and standardize cloud infrastructure.

Drive adoption of DevOps practices and CI/CD automation across engineering teams.

Collaborate closely with internal stakeholders to build reusable capabilities that improve development speed and consistency.

Integrate observability, performance monitoring, and self-service tooling into the platform ecosystem.

Mentor and support engineers through code reviews, technical discussions, and knowledge-sharing.

Proactively identify areas of improvement, experiment with solutions, and deliver results, embracing our bias toward action.

Contribute to initiatives that embed AI/ML into platform services and developer workflows.

Qualifications

5+ years of software development experience with a strong command of C# and the .NET ecosystem.

Deep hands-on experience with Azure services and cloud-native application patterns.

Proficiency with Infrastructure as Code tools such as Terraform, Bicep, or ARM templates.

Experience building or supporting CI/CD systems and modern DevOps pipelines.

Familiarity with containers and orchestration tools (e.g., Docker, Kubernetes, AKS).

Demonstrated success working in Agile product teams and collaborating cross-functionally.

Strong problem-solving, communication, and mentorship skills.

Bonus: Exposure to AI/ML tools or interest in applying AI to platform and engineering problems.

Why Join Us?

Be part of a growing, stable company where your work has long-term impact.

Thrive in a fast-paced, evolving environment with clear opportunities for technical and career growth.

Work in an engineering culture that values initiative, speed, and ownership.

Shape the tools, practices, and environments that empower hundreds of engineers daily.

Contribute to a mission-driven organization committed to innovation and excellence.

Relativity is a diverse workplace with different skills and life experiences, and we love and celebrate those differences. We believe that employees are happiest when they're empowered to be their full, authentic selves, regardless of how they identify.

Benefit Highlights:

Comprehensive health, dental, and vision plans

Parental leave for primary and secondary caregivers

Flexible work arrangements

Long-term incentive program

Training investment program

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other legally protected basis, in accordance with applicable law.

Relativity is committed to competitive, fair, and equitable compensation practices.

This position is eligible for total compensation, which includes a competitive base salary, an annual performance bonus, and long-term incentives.

The expected salary range for this role is 208,000 to 312,000 PLN per year.

The final offered salary will be based on several factors, including but not limited to the candidate's depth of experience, skill set, qualifications, and internal pay equity. Hiring at the top end of the range would not be typical, to allow for future meaningful salary growth in this position.

Required Skills:

Automation, Data Analysis, Database Management, Network Architecture, Performance Optimizations, Problem Solving, Project Management, Software Development, System Designs, Technical Leadership

Technology

Relativity

Senior Java Engineer

Senior

Remote

Krakow, Poland

208,000 - 312,000 PLN/yr

🏢 Summary: Senior Software Engineer role focused on architecting and developing model-based SaaS applications within a Data Breach Response engineering team. The position involves leading technical design, driving best practices, and improving CI/CD and developer productivity. The engineer will work across the stack to deliver scalable, high-quality software solutions aligned with business goals.

🗂️ Requirements: 5+ years of professional software development experience; 2+ years of experience delivering SaaS products; Strong knowledge of algorithms, data structures, and computational complexity; Proficiency in at least one of: Java, Python, C#; In-depth experience with at least two of: GitHub, Azure DevOps, GitLab, Jenkins; Experience with CI/CD tooling and practices; Ability to design technical solutions from specifications; Experience leading technical design and guiding best practices; Experience implementing redundancy and disaster recovery solutions

📃 Skills: Java, Python, C#, SaaS, Algorithms, DataStructures, ComputationalComplexity, GitHub, AzureDevOps, GitLab, Jenkins, CICD, DevOps, Redundancy, DisasterRecovery

🏢 Description:

Job Overview

Are you looking to be in a workplace where colleagues inspire one another, working together to build AI software solutions? We're looking for a Senior Software Engineer to join our Relativity Data Breach Response Engineering team.

Job Description and Requirements

Job Responsibilities

Architect, design, and develop model-based applications

Guide product improvements across schema, framework, and platform

Lead the team to best-practice technical design, by example and through collaboration

Understand department-wide initiatives and guidelines, and work with teams to ensure implementation is aligned with Engineering-wide policies and direction

Maintain a backlog of tools or technology enhancements to improve the maintainability and quality of the solutions

Encourage teams to work collaboratively by advising and enforcing best practices when needed

Stay up-to-date with technologies and best practices related to CI/CD tooling and developer productivity

Qualifications

2+ years of professional experience delivering successful SaaS products

5+ years of professional experience in software development

Strong computer science fundamentals in algorithms, data structures, and computational complexity

A self-starter driven to make an impact

Strong communication skills and experience leading teams

Proficiency in multiple programming languages, e.g., Java, Python, C#

Able to learn at multiple levels of the stack, from fine-granularity technical implementation to high-level control and data flow

Comfortable working from specifications to create a technical approach and scope work

Able to consider the engineering solution in the business context

Obsession with software quality and empathy for customer experience

In-depth technical understanding and experience with at least two of the following DevOps platforms: GitHub, Azure DevOps, GitLab, and Jenkins

Experience implementing redundancy and disaster recovery scenarios

Nice to Have

Experience building AI-powered products that use Natural Language Processing or Machine Learning

Knowledge of Linux

Knowledge of NoSQL database systems such as MongoDB, Redis, Elasticsearch

Fundamental knowledge of Kubernetes and container-based systems

Relativity is a diverse workplace with different skills and life experiences, and we love and celebrate those differences. We believe that employees are happiest when they're empowered to be their full, authentic selves, regardless of how they identify.

Benefit Highlights:

Comprehensive health, dental, and vision plans

Parental leave for primary and secondary caregivers

Flexible work arrangements

Two week-long company breaks per year

Additional time off

Long-term incentive program

Training investment program

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other legally protected basis, in accordance with applicable law.

Relativity is committed to competitive, fair, and equitable compensation practices.

This position is eligible for total compensation, which includes a competitive base salary, an annual performance bonus, and long-term incentives.

The expected salary range for this role is 208,000 to 312,000 PLN per year.

The final offered salary will be based on several factors, including but not limited to the candidate's depth of experience, skill set, qualifications, and internal pay equity. Hiring at the top end of the range would not be typical, to allow for future meaningful salary growth in this position.

Technology

Spire Global

Senior Backend Software Engineer

Senior

Hybrid

Boulder, CO

🏢 Summary: Senior Software Engineer role focused on backend and platform engineering to build reliable, secure systems supporting satellite and ground infrastructure. The position involves developing backend services, improving observability, modernizing container and CI/CD environments, and collaborating with infrastructure and cybersecurity teams. Ideal for engineers experienced in complex, high-reliability production systems and AWS-based cloud environments.

🗂️ Requirements: 5+ years professional software engineering experience; Strong Python skills; Experience with at least one compiled language (Rust, C++, Go, Java, or C); Strong backend engineering experience in production systems; Experience with AWS or GCP environments; Experience with containers and CI/CD pipelines; Proficiency in Linux environments; Strong understanding of systems engineering, performance, and reliability; Experience collaborating with infrastructure and/or cybersecurity teams; Strong communication and documentation skills

📃 Skills: Python, Rust, C++, Go, Java, C, AWS, GCP, Linux, Docker, Kubernetes, Terraform, PostgreSQL, Grafana, Databricks, Elasticsearch, GitHubActions, CI/CD, Observability, Containers

🏢 Description:

About the Role

Spire is looking for a Senior Software Engineer to help design, build, and improve the reliable and secure systems that support satellite and ground infrastructure. This is a hands-on engineering role focused on backend systems, platform engineering, observability, and infrastructure-aware software development. You'll work closely with software, infrastructure, and cybersecurity teams to implement scalable solutions that support operational reliability and secure engineering practices.

We are not looking for a dedicated cybersecurity specialist. Instead, we're seeking an experienced software engineer who has worked alongside security teams and has experience implementing technical requirements related to security, reliability, and compliance within production systems.

This role is best suited to engineers who enjoy solving technically challenging problems and have experience working on complex systems in industries such as aerospace, scientific computing, telecommunications, biotech, robotics, or other high-reliability environments.

What You'll Do:

- Design, develop, and maintain backend services and platform tooling
- Collaborate with cybersecurity and infrastructure teams to implement secure engineering requirements
- Contribute to modernising services, container environments, and CI/CD workflows
- Improve telemetry, monitoring, logging, and observability across production systems
- Support cloud-based infrastructure and deployment workflows in AWS environments
- Participate in code reviews, architecture discussions, and engineering best practices
- Work across infrastructure, backend systems, and operational tooling in a collaborative engineering environment

Who You Are:

Required Qualifications:

- 5+ years of professional software engineering experience
- Strong Python development skills
- Experience with at least one compiled language such as Rust, C++, Go, Java, or C
- Strong backend engineering experience building and maintaining production systems
- Experience working with AWS-based environments (GCP experience also considered)
- Experience with containers, CI/CD pipelines, and modern software delivery practices
- Comfortable working in Linux-based development environments
- Strong understanding of systems engineering, performance, reliability, and operational concerns
- Experience collaborating with infrastructure and/or cybersecurity teams to implement technical requirements
- Strong communication and documentation skills

Preferred Qualifications:

- Experience working in aerospace, scientific computing, biotech, telecommunications, robotics, or other technically complex industries
- Experience in smaller engineering organisations where engineers wear multiple hats across development and operations
- Familiarity with Kubernetes, Terraform, and infrastructure-as-code approaches
- Experience with PostgreSQL or other relational databases
- Experience building observability or telemetry solutions using tools such as Grafana, Databricks, or Elasticsearch
- Familiarity with GitHub Actions or modern CI/CD tooling
- Exposure to secure software development or DevSecOps practices
- Advanced academic research background (PhD/Postdoc) in engineering, physics, computer science, or related scientific disciplines

What We're Looking For:

We value engineers who are:

- Curious and adaptable
- Pragmatic problem-solvers
- Interested in complex systems and infrastructure
- Open to learning new technologies and domains
- Collaborative and low-ego
- Comfortable operating in fast-moving engineering environments

This role is primarily backend and platform focused. We are generally looking for engineers with stronger backend and systems experience rather than heavily frontend-focused backgrounds.

Spire operates a hybrid work model, and this position will require you to work a minimum of three days per week in the office.

Access to US export-controlled software and/or technology may be required for this role. If needed, Spire will arrange the necessary licenses; this is not something candidates need to have before applying.

Salary Range

$130,500 to $171,000 USD

Global Perks

- Name Your Satellite Program (NYSP) Launch Attendance
- Generous Time Off Policy
- Education Assistance Program
- Employee Assistance Program (EAP)
- Employee Stock Purchase Program (ESPP)
- Family Leave
- Fitness Reimbursement
- Employee Referral Program
- Healthy snacks & beverages in every office

Technology

emagine Polska

Senior DevOps / SRE (Platform Reliability Engineer) - French fluent

Senior

Remote

Lisbon, Portugal

🏢 Summary: Senior DevOps / SRE role focused on ensuring reliability, scalability, security, and performance of a cloud-native AWS platform. The position centers on infrastructure automation, CI/CD, Kubernetes operations, observability, and implementing SRE best practices to support highly available production systems. You will lead incident management, optimize cloud costs, and drive continuous improvement of platform resilience.

🗂️ Requirements: 5+ years in DevOps/SRE/Cloud/Platform Engineering; Strong Linux administration and troubleshooting; Production experience with Kubernetes; Experience with CI/CD tools; Expertise in Infrastructure as Code; Hands-on experience with AWS; Strong networking fundamentals; Experience with monitoring and logging tools; Scripting skills (Bash or Python)

📃 Skills: AWS, Kubernetes, Docker, Helm, Terraform, Ansible, CloudFormation, Linux, GitLab, Jenkins, GitHub, Azure, Prometheus, Grafana, ELK, Datadog, Splunk, Bash, Python, TCP/IP, DNS

🏢 Description:

We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to ensure the reliability, scalability, performance, and security of our platform and cloud infrastructure. You will play a key role in building and operating cloud-native systems, improving observability, automating operations, implementing SRE best practices (SLOs/SLIs), and supporting development teams to deliver highly available services.

Key Responsibilities

Design, implement, and maintain highly available and scalable infrastructure on AWS.

Own and improve the reliability of production systems using SRE principles (SLO, SLI, error budgets).

Build and manage CI/CD pipelines to support fast and safe software delivery.

Develop and maintain Infrastructure as Code (IaC) using Terraform, Ansible, CloudFormation, etc.

Manage and optimize container orchestration platforms (Kubernetes, Docker, Helm).

Implement and maintain monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK, Datadog, Splunk).

Lead incident response, perform root cause analysis, and write postmortems to drive continuous improvement.

Improve system performance, capacity planning, scaling strategies, and disaster recovery processes.

Collaborate closely with development teams to improve deployment strategies and system resilience.

Implement security best practices (IAM, secret management, vulnerability scanning, patching).

Define operational standards, runbooks, documentation, and best practices for platform reliability.

Participate in on-call rotation and provide senior-level support for critical production issues.

Key Responsibilities (5 Main Missions)

The DevOps / SRE lead will be responsible for the stability and evolution of the platform. Your role is structured around five main areas:

Mission 1: AWS Infrastructure Management (Build & Run)

Mission 2: CI/CD and Deployment Automation

Mission 3: Monitoring, Observability, and Alerting: Global Monitoring, Log Management, Application Monitoring, Business Analytics

Mission 4: Incident Management, Resilience, and Security

Mission 5: FinOps and AWS Cost Optimization

Key Requirements

5+ years of experience in DevOps / SRE / Cloud Infrastructure / Platform Engineering.

Strong expertise in Linux systems administration and troubleshooting.

Proven experience with Kubernetes in production environments.

Strong experience with CI/CD tools (GitLab CI, Jenkins, GitHub Actions, Azure DevOps).

Solid knowledge of Infrastructure as Code (Terraform highly preferred).

Experience with AWS cloud platforms.

Strong understanding of networking fundamentals (TCP/IP, DNS, load balancing, reverse proxies).

Experience with observability tools: monitoring, metrics, logging, tracing.

Strong scripting skills (Bash, Python, or similar).

French advanced level.

Nice to Have

Experience with additional cloud platforms (Azure, GCP).

Strong understanding of networking fundamentals.

Technology

xAI

Sr. Software Engineer (Data Center Automation)

Senior

On-site

Memphis, TN

🏢 Summary: Senior Software Engineer role focused on improving reliability and automation across multi-data center AI infrastructure. The position combines strong software engineering skills with hands-on data center and systems expertise to build observability, automate remediation, and minimize downtime in mission-critical environments. The engineer will design scalable services, optimize Linux systems, and support distributed infrastructure with near-zero downtime requirements. 🗂️ Requirements: Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering or related field (or equivalent experience), 3+ years experience in SRE, Infrastructure, DevOps, or Systems Engineering in distributed production environments, Strong programming experience in Python, Solid Linux systems administration and performance tuning experience, Experience with Docker and Kubernetes or similar orchestration tools, Experience implementing observability solutions (metrics, logging, tracing, monitoring, alerting), Knowledge of networking fundamentals (TCP/IP, routing, DNS, redundancy), Experience troubleshooting distributed systems including hardware and network issues, Experience with on-call rotations and incident response practices, Ability to collaborate with cross-functional technical teams 📃 Skills: Python, Rust, Linux, Docker, Kubernetes, Prometheus, Grafana, TCP/IP, DNS, C++, Go 🏢 Description: ABOUT THE ROLE: We are seeking a highly skilled Sr. Software Engineer to join our team in managing and enhancing reliability across a multi-data center environment. This role focuses on automating processes, building and implementing robust observability solutions, and ensuring seamless operations for mission-critical AI infrastructure. 
The ideal candidate will combine strong coding abilities with hands-on data center experience to build scalable reliability services, optimize system performance, and minimize downtime—including close partnership with facility operations to address physical infrastructure impacts. In an era where AI workloads demand near-zero downtime, this position plays a pivotal role in bridging software engineering principles with physical data center realities. By prioritizing automation and observability, team members in this role can reduce mean time to recovery (MTTR) by up to 50% through proactive monitoring and automated remediation, based on industry benchmarks from high-scale environments like those at hyperscale cloud providers. The primary objective of this team is to mitigate downtime and minimize impact to end-users from both scheduled and unscheduled maintenance, as well as events affecting onsite data centers. This is achieved through proactive automation, robust observability, and integrated software-physical reliability strategies, ensuring our AI infrastructure remains resilient, scalable, and at the cutting edge of innovation. RESPONSIBILITIES: - Design, develop, and deploy scalable code and services (primarily in Python and Rust, with flexibility for emerging languages) to automate reliability workflows, including monitoring, alerting, incident response, and infrastructure provisioning. - Implement and maintain observability tools and practices, such as metrics collection, logging, tracing, and dashboards, to provide real-time insights into system health across multiple data centers. - Collaborate with cross-functional teams—including software development, network engineering, site operations, and facility operations—to identify reliability bottlenecks and automate solutions for fault tolerance, disaster recovery, capacity planning, and physical/environmental risk mitigation. 
- Troubleshoot and resolve complex issues in data center environments, including hardware failures, environmental anomalies, software bugs, and network-related problems, while adhering to reliability principles like error budgets and SLAs. - Optimize Linux-based systems for performance, security, and reliability, including kernel tuning, container orchestration (e.g., Kubernetes), and scripting for automation. - Understand network topologies and concepts in large-scale, multi-data center environments to troubleshoot connectivity, routing, redundancy, and performance issues. - Participate in on-call rotations, post-incident reviews (blameless postmortems), and continuous improvement initiatives to enhance overall site reliability. - Mentor junior team members and document processes to foster a culture of automation and knowledge sharing. BASIC QUALIFICATIONS: - Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related technical field (or equivalent professional experience). - 3+ years of hands-on experience in site reliability engineering (SRE), infrastructure engineering, DevOps, or systems engineering in large-scale, distributed, or production environments. - Strong programming skills with proven production experience in Python; experience with Rust or strong fundamentals in a systems-level language (e.g., Go, C++). - Solid experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments. - Practical knowledge of containerization and orchestration technologies, such as Docker and Kubernetes (or similar systems). - Experience implementing observability solutions, including metrics, logging, tracing, monitoring tools (e.g., Prometheus, Grafana), alerting, and dashboards. - Familiarity with troubleshooting complex issues in distributed systems, including software bugs, hardware failures, network problems, and environmental factors. 
- Understanding of networking fundamentals (TCP/IP, routing, redundancy, DNS) in large-scale or multi-site environments. - Experience participating in on-call rotations, incident response, post-incident reviews (blameless postmortems), and reliability practices such as error budgets or SLAs. - Ability to collaborate effectively with cross-functional technical teams. PREFERRED SKILLS AND EXPERIENCE: - 5+ years of experience in SRE or infrastructure roles in hyperscale, cloud, or AI/ML training infrastructure environments. - Hands-on experience operating or scaling Kubernetes clusters at large scale. - Proficiency in Rust for systems programming and performance-critical components. - Experience integrating software reliability tools with physical data center infrastructure (power, cooling, environmental monitoring). - Experience building automated remediation, fault tolerance, disaster recovery, capacity planning, or predictive failure detection systems. - Background in optimizing Linux-based systems for AI workloads, GPU clusters, or high-throughput compute environments. - Experience with bare-metal provisioning, data center interconnects, or hybrid/multi-site failover mechanisms. - Mentoring experience and strong documentation skills.

Technology

Link Group

Senior Software Engineer

Senior

Remote

Warsaw, Poland

130 - 160 PLN

🏢 Summary: Senior Software Engineer role focused on designing and delivering scalable solutions within complex distributed systems at significant scale. The position requires strong ownership, architectural decision-making, and collaboration across cross-functional teams. The engineer will drive high-impact technical initiatives in a large, distributed environment. 🗂️ Requirements: Degree in Computer Science or equivalent practical experience, Proven experience delivering complex, large-scale software solutions, Hands-on experience with distributed systems, Ability to make independent architectural decisions, Strong collaboration and communication skills, Ability to operate autonomously with high ownership 📃 Skills: DistributedSystems, Architecture, Scalability, Reliability, Observability, SoftwareEngineering, ComputerScience 🏢 Description: Senior Software Engineer About the Role We are looking for a Senior Software Engineer to join a high-scale product environment operating within complex distributed systems. This role is designed for a highly autonomous individual contributor who can take ownership of complex technical initiatives, influence architectural direction, and deliver scalable, reliable solutions in a large organizational setting. You will collaborate cross-functionally with engineering, product, and design teams while contributing to systems used at significant scale. 
Key Responsibilities Design and deliver high-complexity system components and features Make informed architectural decisions and evaluate technical trade-offs Ensure quality, scalability, reliability, and observability of delivered solutions Identify and manage technical risks, dependencies, and technical debt Develop deep understanding of the product domain and user impact Collaborate effectively across teams in a distributed engineering environment Operate with high autonomy and ownership Contribute to or elevate engineering standards within the team Mentor or support less experienced engineers when needed Requirements Education Degree in Computer Science, Software Engineering, or equivalent practical experience Experience & Competencies Proven experience as a Senior Software Engineer delivering complex, large-scale solutions Hands-on experience working with distributed systems Strong communication and collaboration skills Ability to independently drive architectural decisions Strong sense of ownership over delivered solutions Nice to Have Experience shaping or improving team-level engineering standards Experience collaborating across multiple teams in complex organizational structures

Technology

EPAM Systems

Senior Site Reliability Engineer (SRE)

Senior

Remote

🏢 Summary: The offer is for a Site Reliability Engineer responsible for ensuring high reliability, scalability, and performance of cloud-based systems. The role focuses on implementing SRE practices, automating infrastructure, managing incidents, and enhancing monitoring and CI/CD processes. You will collaborate with cross-functional teams to optimize operations and maintain service excellence. 🗂️ Requirements: Bachelor’s degree in Computer Science, Engineering, or related field, 3+ years of experience in Site Reliability Engineering or similar role, Experience with cloud platforms (AWS, GCP, or Azure), Hands-on experience with SRE practices (SLO, SLI, error budgets, postmortems, toil reduction, capacity planning, incident management), Proficiency in Python or other scripting/programming language, Experience with monitoring tools, Experience with CI/CD tools, Experience with infrastructure as code, Experience with configuration management, Knowledge of Kubernetes and Docker, English proficiency B2 or higher 📃 Skills: AWS, GCP, Azure, Python, Kubernetes, Docker, CI/CD, Terraform, Ansible, Monitoring, SLO, SLI, Git, Bash 🏢 Description: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. In this critical role, you will collaborate closely with software developers and operations teams to ensure high reliability, scalability, and efficiency of our systems, with a strong focus on meeting and exceeding customer expectations. Your expertise will be crucial in deploying, maintaining, and automating our infrastructure and application environments to ensure seamless user experiences. Your proactive involvement will be key to enhancing system reliability, optimizing resource utilization, and ensuring continuous improvement in our operational practices. Your responsibilities will include defining and tracking Service Level Objectives (SLOs), managing error budgets, and reducing toil through automation. 
You will play a pivotal role in driving the success of technology initiatives, maximizing their impact across the organization, and ensuring that solutions consistently meet the high standards our customers expect. Responsibilities Collaborate with development, security, quality, and operation teams to implement SRE practices and ensure system reliability Define and support required level of reliability, availability, and performance for services and applications Design and deliver Cloud-based solutions tailored to client needs Troubleshoot, mitigate, and support fixing of the infrastructure and application issues in a timely manner Implement a monitoring system for the infrastructure and application reliability Communicate technical concepts clearly to both engineering teams and management stakeholders Requirements Bachelor’s degree in Computer Science, Engineering, or a related field 3+ years of hands-on experience in Site Reliability Engineering or related roles Proven experience in any cloud (AWS/GCP/Azure) Experience with implementing SRE practices such as SLO/SLI, Error budgets, Postmortems, Reducing Toil, capacity planning, and Incident Management Python or other scripting/programming language Strong background in monitoring tools Proficiency in CI/CD tools, infrastructure as code, and configuration management Solid knowledge of container orchestration technologies (Kubernetes, Docker) English language proficiency at an Upper-Intermediate level (B2) or higher Nice to have Expertise in deployment and management of LLMs, including technologies like RAG Certification in Kubernetes, AWS/GCP/Azure, or similar technologies Proven experience in DevOps Knowledge of managing and optimizing AI/ML models in production environments, including basic deployment, monitoring, and maintenance We offer/Benefits We gather like-minded people: Engineering community of industry professionals Friendly team and enjoyable working environment Flexible schedule and opportunity to work 
remotely within Poland Chance to work abroad for up to 60 days annually Business-driven relocation opportunities We provide growth opportunities: Outstanding career roadmap Leadership development, career advising, soft skills, and well-being programs Certification (GCP, Azure, AWS) Unlimited access to LinkedIn Learning, Get Abstract, Cloud Guru English classes We cover it all: Stable income (Employment Contract or B2B) Participation in the Employee Stock Purchase Plan Benefits package (health insurance, multisport, shopping vouchers) Strategically located offices featuring entertainment and relaxation zones, table tennis and football, free snacks, fantastic coffee, and more Referral bonuses Corporate, social and well-being events Please note: The set of bonuses might vary based on the role you apply for; specifics will be discussed with our recruiter during the general interview. We will reach out to selected candidates exclusively. EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Technology

Relativity

Mid Software Engineer (.NET)

Mid

Hybrid

Krakow, Poland

160,000 - 240,000 PLN/yr

🏢 Summary: Advanced Software Engineer role focused on designing and building scalable, secure cloud-native software for a high-scale web platform and automation framework. The position involves developing distributed systems, REST APIs, and serverless solutions in a modern cloud environment. You will contribute to architecture, testing strategy, and technical mentorship within an agile team. 🗂️ Requirements: Bachelor’s degree in Computer Science or related field, Minimum 2 years of software development experience, Experience with object-oriented programming using C# or Java, Experience with HTML5, JavaScript, and CSS, Experience building REST APIs, Experience developing cloud-native solutions, Experience working in an agile team, Knowledge of full application stack development, Ability to design scalable and secure systems, Experience with unit and integration testing 📃 Skills: C#, Java, HTML5, JavaScript, CSS, REST, Azure, AzureFunctions, ServiceBus, Serverless, CloudNative, AzureDevOps, DevOps, OOP, Agile, Testing, DomainDrivenDesign, EventDrivenArchitecture 🏢 Description: Job Overview Here at Relativity, we prioritize flexibility and work-life harmony. Our Hybrid work environment provides options tailored to your role and location, aiming to enhance engagement, connectivity, and productivity. Join us to experience a culture of collaboration and innovation, where connecting in-person adds value to our collective growth. Let's work together! As an Advanced Software Engineer at Relativity, you will use your development expertise, working on software projects to build our software platform, Relativity. You will help solve complex problems as we continue to improve and build great technology. This role reports into the Manager of Software Engineering. You will work on projects on a highly scalable and dynamic web system and serverless technologies. This is all using many of the newest, cloud-based technologies. 
You will build highly distributable systems composed of multiple databases, processing, and webservers within the massive data field. Our team owns Automated Workflows, an extensible cloud-based automation framework that connects and automates processes across distributed systems in Relativity. This is a very exciting product that has a tremendous impact for our customers. It allows them to set up automation to reduce or even eliminate the need to perform manual tasks, saving them time and enabling them to focus on other important tasks within their business. With thousands of automated runs every day and the ability to be extended by 3rd Party developers, we are focused on enhancing the capabilities of a very robust and scalable solution. Job Description and Requirements Job Responsibilities Design performant, scalable, and secure software to a high degree of quality – not simply focusing on meeting requirements Work together with a software development team to ship high-quality, performant, secure software that operates on data at a massive scale Focus on quality through comprehensive unit and integration testing and static analysis and rigorous test strategy development Improve the software development process by recommending and instituting changes in policies and procedures Participate in pair programming to improve software quality and completeness and share design and implementation knowledge Mentor less experienced engineers and provide technical guidance to build new leaders from within the team Minimum Qualifications Bachelor’s Degree (or equivalent) in Computer Science or related disciplines At least 2 years of experience in Software Development Experienced in Object Oriented Programming utilizing C#, Java or similar Experienced in HTML5, JavaScript, CSS, and related web technologies Experience building REST APIs and Cloud-Native solutions Experience working on an agile software team Knowledge of software engineering disciplines, including the ability to 
work comfortably in all layers of the Application Stack Preferred Qualifications Understanding of DevOps principles, experience with Azure DevOps is a plus Experience designing and developing highly scalable solutions in Azure or other cloud platforms is a plus Experience in Azure Functions, Azure Service Bus, Serverless Technologies and related cloud technologies Experience in Domain Driven Design Principles and Event Driven Architecture is a plus Excellent knowledge of new technology trends and their application in the marketplace. Benefit Highlights: Comprehensive health, dental, and vision plans Parental leave for primary and secondary caregivers Flexible work arrangements Two week-long company breaks per year Additional time off Long-term incentive program Training investment program Relativity is a diverse workplace with different skills and life experiences, and we love and celebrate those differences. We believe that employees are happiest when they're empowered to be their full, authentic selves, regardless of how they identify. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law. Relativity is committed to competitive, fair, and equitable compensation practices. This position is eligible for total compensation which includes a competitive base salary, an annual performance bonus, and long-term incentives. The expected salary range for this role is between 160,000 and 240,000 PLN. The final offered salary will be based on several factors, including but not limited to the candidate's depth of experience, skill set, qualifications, and internal pay equity. Hiring at the top end of the range would not be typical, to allow for future meaningful salary growth in this position.
Required Skills: Engineering Principle, Hardware Integration, Innovation, Problem Solving, Process Improvements, Quality Assurance (QA), Research and Development, System Designs, Technical Documents, Troubleshooting

Technology

Link Group

Site Reliability Engineer

Mid

Hybrid

Warsaw, Poland

🏢 Summary: Hands-on Site Reliability Engineer role focused on building and scaling reliability practices across cloud and on-prem environments. The position involves improving performance, scalability, and resilience of production systems through automation, observability, and Kubernetes-based infrastructure. You will drive SRE standards and collaborate with engineering teams to enhance system stability and fault tolerance. 🗂️ Requirements: 4+ years experience in SRE, DevOps or similar roles, Strong experience with distributed systems, Strong experience with Kubernetes, Experience with AWS cloud, Hands-on automation experience with Python, Bash or Go, Solid understanding of CI/CD practices, Experience with observability and monitoring tools, Experience managing production systems 📃 Skills: Kubernetes, AWS, Python, Bash, Go, Prometheus, Grafana, CI/CD, SRE, DevOps 🏢 Description: We’re looking for a Site Reliability Engineer (SRE) to help build and scale reliability practices across our engineering organization. This is a hands-on role where you’ll work across cloud and on-prem environments, improving the performance, scalability, and resilience of critical production systems. 🔧 What you’ll be doing: • Driving SRE best practices, standards, and ways of working • Building and scaling observability & monitoring solutions (e.g. 
Prometheus, Grafana) • Working with Kubernetes-based infrastructure to ensure reliability and efficiency • Automating deployments, incident response, and recovery processes • Collaborating closely with engineering teams to improve system stability and fault tolerance • Contributing to a strong reliability culture (SLOs, post-mortems, continuous improvement) ✅ What we’re looking for: • 4+ years of experience in SRE / DevOps / similar roles • Strong experience with distributed systems, Kubernetes, and cloud (AWS preferred) • Hands-on approach to automation (Python, Bash, or Go) • Solid understanding of CI/CD and modern software delivery • Proactive mindset and strong ownership of production systems

Relativity

Relativity is a commercial launch company focused on additive manufacturing innovation. Their long-term vision is to create humanity's industrial base on Mars and expand the possibilities of the human experience. They design, build, and fly rockets to deliver customer payloads to orbit, with a medium-to-heavy lift reusable rocket called Terran R. Relativity's groundbreaking research and development in 3D printing pushes the boundaries of large-scale additive manufacturing. The company values passion, creativity, and collaboration, offering employees the opportunity to shape the future of aerospace technology and additive manufacturing. Relativity's Integrated Performance teams ensure product functionality across all systems and disciplines, while the Thermal team predicts temperatures and designs thermal architectures for reusable launch vehicles. The company is committed to transparency, fairness, and diversity in its compensation practices.
