June 8, 2026

Site Reliability Engineer - Observability

Senior • Remote

Warsaw, Poland

Job description

Company Description

At CluePoints, we’re redefining how clinical trials are run. As the premier provider of Risk-Based Quality Management (RBQM) and Data Quality Oversight software, we harness advanced statistics, artificial intelligence, and machine learning to ensure the quality, accuracy, and integrity of clinical trial data, helping life sciences organizations bring safer, more effective treatments to patients faster.

We’re proud to be an ambitious, fast-growing technology scale-up with a dynamic and diverse international team representing more than 20 nationalities. Collaboration, flexibility, and continuous learning are part of our DNA. 

At CluePoints, you’ll find a culture where you can grow, make an impact, and have fun along the way. Guided by our values of Care, Passion, and Smart Disruption, we’re united by a shared mission: to create smarter ways to run efficient clinical trials and deliver AI-powered insights that improve human outcomes worldwide.

Role: 
The Site Reliability Engineer, Observability & RUM is responsible for improving end-to-end observability across our platforms and customer-facing applications, with a particular focus on frontend and Real User Monitoring (RUM). This role combines core SRE practices with ownership of monitoring, logging, tracing, alerting, and user-experience telemetry in production. 

You will help evolve our observability capabilities across Azure and Kubernetes environments, improve incident detection and diagnosis, and support decisions around managed versus self-managed observability tooling. You will partner closely with Engineering, Support, QA, and Security teams to ensure systems ship with actionable telemetry, dashboards, alerts, and operational runbooks.

Job requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Observability Engineering roles.

  • Strong hands-on experience with observability and monitoring platforms, including several of the following: Elastic, Grafana, Prometheus, OpenTelemetry, Sentry, monitoring agents, and managed APM/observability platforms.

  • Experience implementing and supporting Real User Monitoring (RUM) and frontend/application observability in production environments.

  • Ability to work across frontend, backend, and platform teams to improve telemetry, alerting, and incident diagnosis.

  • Experience evaluating or operating managed observability platforms and understanding the trade-offs versus self-managed stacks.

(Nice to have)

  • Experience supporting ML, AI, or LLM-backed services in production (RAG, LangSmith, Arize Phoenix, LangChain, LangGraph, Azure OpenAI, OpenAI, or Anthropic APIs).

Job responsibilities

  • Own and improve Real User Monitoring (RUM) for customer-facing applications, including browser performance, client-side errors, user journeys, and frontend service dependencies.

  • Partner with frontend, product, and engineering teams to improve visibility into user experience, JavaScript/runtime failures, page performance, and customer-impacting issues.

  • Establish and maintain end-to-end observability across frontend, backend, infrastructure, and Kubernetes environments using metrics, logs, traces, dashboards, and alerting.

  • Evaluate, implement, and operate managed and self-managed observability solutions, helping guide the evolution of the observability stack.

  • Support and improve observability tooling such as Sentry, Elastic, Grafana, Prometheus, OpenTelemetry, monitoring agents, and related APM platforms.

  • Define and maintain SLIs, SLOs, and alerting strategies that improve service reliability, reduce noise, and enable faster detection of production issues.

  • Lead or support incident detection, alert triage, live production troubleshooting, and service restoration across outage, latency, batch, file transfer, and degradation scenarios, in partnership with Support and Production teams.

Job benefits

🇵🇱 What We Offer – Poland

  • Comprehensive Health Insurance (medical, dental, and online consultations, 100% employee coverage)

  • Life Insurance through UNUM

  • Cafeteria Plan with flexible monthly credits for wellness, entertainment, and travel

  • MultiSport Card, co-financed 50/50

  • Employee Capital Plans (PPK) with 4% employer contribution

  • A hub-based hybrid model that blends flexibility with purpose — connecting teams through collaboration, learning, and a vibrant social culture.


Equal Opportunities & Data Privacy Statement
CluePoints is an equal opportunity employer committed to diversity and inclusion in the workplace.
Your personal data will be processed by CluePoints for recruitment purposes in accordance with Regulation (EU) 2016/679 (GDPR).
If you wish for your data to be retained for future opportunities, please include the following statement in your CV:
“I consent to the processing of my personal data by CluePoints for the purposes of future recruitment processes.”

Similar jobs you might like

Technology

CluePoints

Senior DevOps Engineer

Senior

Remote

Warsaw, Poland

🏢 Summary: Full-time DevOps/SRE role focused on automating and optimizing the Software Development Lifecycle, including CI/CD pipelines, build and release processes, and developer infrastructure. The position aims to improve integration speed, reliability, and collaboration by advancing trunk-based development and modern DevOps practices. You will work closely with development and SRE teams to enhance automation, infrastructure, and software delivery performance. 🗂️ Requirements: 6+ years in DevOps or SRE in Software or SaaS environment, Strong Linux systems administration experience, Proficiency in Bash, Python or Perl scripting, Experience with Git and branching strategies, Hands-on experience with CI/CD systems, Experience with trunk-based development practices, Strong knowledge of Docker and Kubernetes, Experience with Terraform and Ansible, Experience automating build and release processes, Understanding of DevOps and SRE principles, Experience managing CI infrastructure and developer tooling 📃 Skills: Linux, Bash, Python, Perl, Git, GitLab, GitHub, Jenkins, ArgoCD, Docker, Kubernetes, Terraform, Ansible, CI/CD, SonarQube, Ontrack 🏢 Description: Job description Company Description At CluePoints, we’re redefining how clinical trials are run. As the premier provider of Risk-Based Quality Management (RBQM) and Data Quality Oversight software, we harness advanced statistics, artificial intelligence, and machine learning to ensure the quality, accuracy, and integrity of clinical trial data, helping life sciences organizations bring safer, more effective treatments to patients faster. We’re proud to be an ambitious, fast-growing technology scale-up with a dynamic and diverse international team representing more than 20 nationalities. Collaboration, flexibility, and continuous learning are part of our DNA. 
At CluePoints, you’ll find a culture where you can grow, make an impact, and have fun along the way.Guided by our values of Care, Passion, and Smart Disruption , we’re united by a shared mission: to create smarter ways to run efficient clinical trials and deliver AI-powered insights that improve human outcomes worldwide. The Role In this role, you’ll work at the heart of our Software Development Lifecycle (SDLC) automation efforts. You’ll be responsible for improving how we integrate, build, and release code — ensuring that developers can deliver value quickly and safely. This role is ideal for someone passionate about DevOps practices, automation, and enabling developer productivity at scale. (We cannot consider B2B Contractors for this Position) - Full time Permanent Employee applications will be considered for this vacancy. Job requirements What You’ll Bring 6+ years of experience in a DevOps or SRE role in a Software or SaaS environment Strong experience with Linux systems administration Proficiency in scripting languages (e.g., Bash, Python, or Perl) Solid experience with Git and branching strategies , with a focus on improving collaboration and enabling high-frequency integration Strong experience with CI/CD systems (e.g., GitLab CI, GitHub Actions, Jenkins, ArgoCD) and building reliable, fast feedback pipelines Good understanding of modern CI practices , including frequent integration, maintaining a stable main branch, and reducing merge complexity Experience supporting or working in environments moving toward trunk-based development (e.g., short-lived branches, incremental changes) Deep understanding of containerization tools (Docker) and orchestration (Kubernetes) Experience with infrastructure-as-code and automation tools (Terraform, Ansible) Strong understanding of DevOps principles: automation, feedback loops, and shift-left testing Experience automating software build and release processes A strong grasp of software integration workflows, dependency 
management, and production readiness Understanding of SRE principles and how reliability and infrastructure practices support software delivery Ability to work cross-functionally and guide teams toward improved engineering practices Job responsibilities What You’ll Be Doing Design, build, and maintain shared CI/CD pipeline templates and automation tools with a focus on fast feedback and reliable integration Develop and support internal tools for build, test, and release automation Collaborate with development teams to improve how code is integrated, tested, and prepared for production Help define and evolve best practices for source control, branching strategies, and code collaboration , supporting a move toward trunk-based development Guide teams in adopting incremental, high-quality changes that reduce risk and improve delivery flow Partner with SRE teams to align deployment strategies, observability, and infrastructure practices with application delivery Manage, optimize, and monitor developer infrastructure (CI runners, SonarQube, Ontrack, artifact repositories, etc.) Drive improvements in release readiness, code quality, and testing practices across teams Identify and remove bottlenecks in the software delivery lifecycle , improving speed without compromising reliability Continuously evaluate new tools and technologies to improve our platform and developer experience Job benefits 🇬🇧 What We Offer – United Kingdom Private Medical Insurance through Vitality Health (full hospital cover, 24/7 GP, and therapy sessions) Group Critical Illness Cover with Aviva Life Insurance (death-in-service lump sum) Pension Scheme with 9% employer contribution via Scottish Widows Opportunities for professional development and sponsored certifications A hub-based hybrid model that blends flexibility with purpose — connecting teams through collaboration, learning, and a vibrant social culture. 
🇬🇧 Equal Opportunities & Data Protection Statement CluePoints is an equal opportunities employer. We celebrate diversity and are committed to creating an inclusive environment for all employees and applicants. We welcome applications from all individuals regardless of age, disability, gender identity or expression, marital or civil partnership status, pregnancy or maternity, race, religion or belief, sex, or sexual orientation. Any personal data you share during your application will be processed in accordance with the UK GDPR and the Data Protection Act 2018 and will be used solely for recruitment purposes. By submitting your application, you consent to the processing of your data for recruitment and employment purposes.

Healthcare

CluePoints

Test Manager

Senior

Remote

Warsaw, Poland

🏢 Summary: Leadership role responsible for defining and driving an automation-first quality strategy across multiple product squads in a clinical trial software environment. Combines hands-on test automation framework design with team leadership, release governance, and AI-driven quality engineering practices. Ensures end-to-end quality outcomes, risk-based testing, and release readiness aligned with product goals. 🗂️ Requirements: Proven experience leading quality engineering across multiple squads, Strong people management and mentoring experience, Hands-on experience with Playwright or similar automation frameworks, Experience in UI, API, integration, and end-to-end testing, Experience integrating automated tests into CI/CD pipelines, Strong knowledge of regression strategy and risk-based testing, Experience leading Release Readiness and Go/No-Go processes, Experience using AI coding assistants in testing workflows, Understanding of validation, traceability, and compliance practices 📃 Skills: Playwright, Automation, Testing, API, UI, E2E, CI/CD, AI, Copilot, Codex, Regression, RBQM, Compliance, Debugging, Frameworks 🏢 Description: About the job Company Description At CluePoints, we’re redefining how clinical trials are run. As the premier provider of Risk-Based Quality Management (RBQM) and Data Quality Oversight software, we harness advanced statistics, artificial intelligence, and machine learning to ensure the quality, accuracy, and integrity of clinical trial data, helping life sciences organizations bring safer, more effective treatments to patients faster. We’re proud to be an ambitious, fast-growing technology scale-up with a dynamic and diverse international team representing more than 20 nationalities. Collaboration, flexibility, and continuous learning are part of our DNA. At CluePoints, you’ll find a culture where you can grow, make an impact, and have fun along the way. 
Guided by our values of Care, Passion, and Smart Disruption , we’re united by a shared mission: to create smarter ways to run efficient clinical trials and deliver AI-powered insights that improve human outcomes worldwide. The Role Reporting directly to the Quality Director, you will own the testing strategy and quality outcomes for 3–5 squads within a product domain. You will define and drive an automation-first testing approach while remaining technically involved in framework design, debugging, and quality oversight. You are the direct manager for the testers in your squad and will collaborate closely with Product and Engineering teams, to ensure risk-based quality practices, test readiness, and delivery support. The role will be a combination of both 'hands on' + Leadership & Strategy. What You’ll Be Doing Quality Strategy & Squad Leadership Own end-to-end quality outcomes across multiple squads. Define and execute a scalable quality strategy aligned with product and release goals. Lead squad-level risk analysis, test planning, and execution oversight. Mentor and grow 5–8 testers (manual & automation) with goal to bring team towards hybrid testers. Automation & Technical Leadership Drive automation-first testing practices across squads. Design, evolve, and review scalable UI/API/E2E automation frameworks (Playwright or equivalent). Optimize regression and end-to-end strategies to improve release confidence. Stay hands-on in automation design, debugging, and framework improvements. AI-First Quality Engineering Champion AI-driven approaches in test design, automation development, and defect analysis. Enable teams to effectively use AI tools (e.g., Copilot, Codex) to accelerate quality engineering. Ensure AI-generated artifacts are reviewed, reliable, and maintainable. Release Readiness & Governance Lead Release Readiness and Go/No-Go meetings with structured quality metrics and risk insights. 
Collaborate with Product Owners, Engineering Managers, Product Managers, and Release Management. Ensure validation documentation, traceability, and compliance requirements are met. Identify risks early and implement mitigation strategies proactively. What You’ll Bring Proven experience leading quality engineering across multiple cross-functional squads. Strong people management and mentoring experience. Hands-on automation experience (Playwright or modern frameworks). Full-stack testing expertise (API, UI, integration, end-to-end). Experience integrating automation into CI/CD pipelines. Strong understanding of regression strategy, risk-based testing, and release governance. Experience driving Release Readiness / Go-NoGo meetings. Demonstrated use of AI coding assistants and AI-first testing approaches. Excellent communication, stakeholder management, and critical thinking skills.

Technology

Caspian One

Site Reliability Engineer

Senior

Hybrid

Krakow, Poland

1,400 - 1,800 PLN

🏢 Summary: Hands-on Site Reliability Engineer role focused on ensuring stability, scalability, and observability of a mission-critical distributed risk and analytics platform in hybrid cloud environments. The position centers on production reliability, incident response, automation, and continuous improvement of monitoring and deployment processes. You will collaborate with engineering teams to strengthen system resilience, performance, and operational standards. 🗂️ Requirements: Strong Java experience in distributed systems, Experience with observability and monitoring tools, Hands-on experience with hybrid cloud environments (preferably GCP), Experience with CI/CD pipelines and automation tools, Solid knowledge of Linux systems administration, Understanding of RDBMS fundamentals, Experience with job schedulers (e.g., Control-M), Ability to lead incident response and root-cause analysis 📃 Skills: Java, Grafana, Prometheus, Loki, OpenTelemetry, GCP, Jenkins, Ansible, Linux, SQL, Control-M, CI/CD 🏢 Description: We’re looking for a seasoned Site Reliability Engineer to support a high‑performance, mission‑critical risk and analytics platform used across global trading and finance environments. You’ll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure. In this role, you’ll take ownership of production reliability driving incident response, conducting root‑cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You’ll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance. This is a hands on technical position suited to someone who thrives in high‑throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time. 
Core Responsibilities Maintain and improve the reliability, uptime, and performance of distributed applications. Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post‑incident reviews. Enhance observability—designing and evolving monitoring, alerting, logging, and tracing frameworks. Drive continuous improvement across automation, deployment processes, and service stability. Collaborate with cross‑functional teams to influence architecture, design, and operational standards. Support CI/CD pipelines, environment configuration, and vulnerability remediation. Contribute to a knowledge‑driven culture through documentation, tooling, and best‑practice adoption. Required Skills & Experience Strong Java background with proven experience supporting or developing distributed systems. Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry or similar). Hands‑on with hybrid cloud environments , ideally with GCP or another major cloud provider. CI/CD and automation experience (e.g., Jenkins, Ansible). Solid understanding of Linux , RDBMS fundamentals , and job schedulers (e.g., Control‑M or equivalents). Strong analytical mindset with a methodical approach to troubleshooting. Excellent communication skills and comfort working in Agile teams.

Technology

Yard Corporate

Site Reliability Engineer (SRE)

Senior

Hybrid

Warsaw, Poland

40,000 - 55,000 PLN

🏢 Summary: Senior Site Reliability Engineer role focused on building and standardizing SRE practices across a hybrid AWS and on-prem infrastructure. The position centers on ensuring scalability, resilience, and high availability of high-frequency, data-intensive platforms through observability, automation, and Kubernetes optimization. You will define SLOs, enhance monitoring architecture, and drive reliability culture across engineering teams. 🗂️ Requirements: 5+ years experience in SRE, DevOps, or Infrastructure Engineering supporting distributed production systems, Bachelor’s degree in Computer Science, Computer Engineering, or related field (or equivalent experience), Deep expertise in Grafana, Prometheus, Loki, and Tempo (OpenTelemetry), Strong production experience with Docker and Kubernetes, Experience managing hybrid infrastructure (AWS and on-premises), Proficiency in at least one language: Python, Go, or Bash, Hands-on experience with CI/CD pipelines and Infrastructure-as-Code, Experience defining and managing SLOs and SLAs, Willingness to participate in on-call rotation 📃 Skills: AWS, Kubernetes, Docker, Prometheus, Grafana, Loki, Tempo, OpenTelemetry, Python, Go, Bash, CI/CD, IaC, Git, Hypervisors 🏢 Description: About the Client Our client is a premier, global investment management firm operating at the intersection of finance and technology. Known for their sophisticated, data-intensive systems, they build and maintain high-performance platforms that process massive volumes of market and operational data. To support their expanding footprint, they are looking for a senior-level Site Reliability Engineer (SRE) who will take ownership of shaping, standardizing, and scaling their SRE frameworks and reliability culture from the ground up. The Role In this role, you will serve as a foundational force for SRE practices, partnering directly with Cloud, Infrastructure, and Software Engineering squads. 
You will work across a hybrid infrastructure (combining advanced AWS cloud environments and physical on-premises servers) to guarantee the scalability, resilience, and maximum uptime of critical, high-frequency transactional platforms. Core Responsibilities SRE Evangelism: Design, implement, and champion core reliability principles, helping technology teams adopt sustainable scaling practices. Observability Architecture: Implement, scale, and maintain end-to-end monitoring, telemetry, and distributed tracing systems utilizing Prometheus, Grafana, Loki, and Tempo (OpenTelemetry framework). Kubernetes Optimization: Establish best-practice configurations for containerized workloads, ensuring applications running on Kubernetes are highly resilient, cost-effective, and performant. Incident Management & Culture: Participate in a balanced, shared on-call rotation (averaging one week per month). Automation & Engineering: Build custom tooling and CI/CD pipelines to automate routine tasks, system health checks, and rapid disaster recovery workflows. SLO/SLA Definition: Partner with product and engineering teams to define, monitor, and enforce Service Level Objectives (SLOs) and Error Budgets. What We Look For Experience: 5+ years of hands-on experience in a dedicated SRE, DevOps, or Infrastructure Engineering role supporting complex, distributed production systems. Education: A Bachelor’s degree in Computer Science, Computer Engineering, or a related technical discipline (or equivalent practical experience). Observability Expertise: Deep, subject-matter knowledge of modern monitoring stacks, specifically Grafana, Prometheus, Loki, and Tempo (OTel). Orchestration & Containers: Strong, production-grade expertise in containerization (Docker) and orchestration (Kubernetes). Hybrid Infrastructure: Experience navigating hybrid models—managing both cloud services (AWS preferred) and physical on-premise hardware resources. 
Scripting/Coding: Proficiency in writing clean, maintainable code in at least one scripting or programming language (e.g., Python, Bash, or Go) to build reliable automation. Methodologies: Solid grounding in CI/CD concepts, infrastructure-as-code (IaC), and agile development processes. Soft Skills: Excellent verbal and written communication skills, with a proven ability to convey complex infrastructure and reliability concepts to both technical and non-technical stakeholders. What We Offer Stable Employment: Full-time employment contract ( Umowa o Pracę - UoP ). Tax Optimization: Eligibility for creative tax-deductible costs ( KUP - Koszty Uzyskania Przychodu). Financial Reward: Highly competitive base salary accompanied by a generous annual performance bonus . Comprehensive Health: Premium private medical care package that fully includes dental coverage (stomatologia) . Wellness & Lifestyle: MultiSport card to keep you active and healthy. Daily Perks: Pre-funded lunch card for your daily meals. Tech Stack at a Glance Cloud & Virtualization: AWS, Kubernetes, Docker, On-Premises Hypervisors Observability: Prometheus, Grafana, Loki, Tempo, OpenTelemetry (OTel) Languages: Python, Go, Bash CI/CD & Automation: Git-based pipelines, Configuration Management, IaC

Technology

EPAM Systems

Senior Site Reliability Engineer (SRE)

Senior

Remote

🏢 Summary: The offer is for a Site Reliability Engineer responsible for ensuring high reliability, scalability, and performance of cloud-based systems. The role focuses on implementing SRE practices, automating infrastructure, managing incidents, and enhancing monitoring and CI/CD processes. You will collaborate with cross-functional teams to optimize operations and maintain service excellence. 🗂️ Requirements: Bachelor’s degree in Computer Science, Engineering, or related field, 3+ years of experience in Site Reliability Engineering or similar role, Experience with cloud platforms (AWS, GCP, or Azure), Hands-on experience with SRE practices (SLO, SLI, error budgets, postmortems, toil reduction, capacity planning, incident management), Proficiency in Python or other scripting/programming language, Experience with monitoring tools, Experience with CI/CD tools, Experience with infrastructure as code, Experience with configuration management, Knowledge of Kubernetes and Docker, English proficiency B2 or higher 📃 Skills: AWS, GCP, Azure, Python, Kubernetes, Docker, CI/CD, Terraform, Ansible, Monitoring, SLO, SLI, Git, Bash 🏢 Description: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. In this critical role, you will collaborate closely with software developers and operations teams to ensure high reliability, scalability, and efficiency of our systems, with a strong focus on meeting and exceeding customer expectations. Your expertise will be crucial in deploying, maintaining, and automating our infrastructure and application environments to ensure seamless user experiences. Your proactive involvement will be key to enhancing system reliability, optimizing resource utilization, and ensuring continuous improvement in our operational practices. Your responsibilities will include defining and tracking Service Level Objectives (SLOs), managing error budgets, and reducing toil through automation. 
You will play a pivotal role in driving the success of technology initiatives, maximizing their impact across the organization, and ensuring that solutions consistently meet the high standards our customers expect. Responsibilities Collaborate with development, security, quality, and operation teams to implement SRE practices and ensure system reliability Define and support required level of reliability, availability, and performance for services and applications Design and deliver Cloud-based solutions tailored to client needs Troubleshoot, mitigate, and support fixing of the infrastructure and application issues in a timely manner Implement a monitoring system for the infrastructure and application reliability Communicate technical concepts clearly to both engineering teams and management stakeholders Requirements Bachelor’s degree in Computer Science, Engineering, or a related field 3+ years of hands-on experience in Site Reliability Engineering or related roles Proven experience in any cloud (AWS/GCP/Azure) Experience with implementing SRE practices such as SLO/SLI, Error budgets, Postmortems, Reducing Toil, capacity planning, and Incident Management Python or other scripting/programming language Strong background in monitoring tools Proficiency in CI/CD tools, infrastructure as code, and configuration management Solid knowledge of container orchestration technologies (Kubernetes, Docker) English language proficiency at an Upper-Intermediate level (B2) or higher Nice to have Expertise in deployment and management of LLMs, including technologies like RAG Certification in Kubernetes, AWS/GCP/Azure, or similar technologies Proven experience in DevOps Knowledge of managing and optimizing AI/ML models in production environments, including basic deployment, monitoring, and maintenance We offer/Benefits We gather like-minded people: Engineering community of industry professionals Friendly team and enjoyable working environment Flexible schedule and opportunity to work 
remotely within Poland Chance to work abroad for up to 60 days annually Business-driven relocation opportunities We provide growth opportunities: Outstanding career roadmap Leadership development, career advising, soft skills, and well-being programs Certification (GCP, Azure, AWS) Unlimited access to LinkedIn Learning, Get Abstract, Cloud Guru English classes We cover it all: Stable income (Employment Contract or B2B) Participation in the Employee Stock Purchase Plan Benefits package (health insurance, multisport, shopping vouchers) Strategically located offices featuring entertainment and relaxation zones, table tennis and football, free snacks, fantastic coffee, and more Referral bonuses Corporate, social and well-being events Please, note: The set of bonuses might vary based on the role you apply for – specifics will be discussed with our recruiter during the general interview. We will reach out to selected candidates exclusively. EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Technology

ITDS

Senior SRE/DevOps Technical Lead – Observability and Automation

Senior

Hybrid

Krakow, Poland

25,200 - 29,820 PLN

🏢 Summary: Senior SRE/DevOps Technical Lead role focused on building and operating advanced SRE and observability platforms in a regulated banking environment. The position drives automation, reliability, and performance of highly available systems while leading CI/CD and monitoring initiatives. It combines hands-on technical expertise with leadership of international engineering teams. 🗂️ Requirements: 8+ years of experience in SRE, DevOps, or similar roles, Strong automation and scripting experience, Proficiency in CI/CD pipelines, Experience with observability and monitoring tools, Experience maintaining highly available, low-latency systems, Experience in regulated industries (banking, fintech, insurance), Ability to work from Krakow office at least 6 days per month, Fluent English, Legal right to work in Europe 📃 Skills: Python, Go, Bash, CI/CD, AppDynamics, Grafana, Splunk, OpenTelemetry, SRE, DevOps 🏢 Description: Empower uptime and reliability — lead the next wave of observability and automation excellence! Krakow-based opportunity with hybrid work model, allowing up to 3 remote days per week. As a Senior SRE/DevOps Technical Lead , you will be working for our client, a global leader in the banking and financial services industry. You will spearhead efforts to build and operate cutting-edge SRE and observability platform solutions, ensuring system reliability, automation, and performance across a highly regulated environment. This role offers a pivotal leadership position that drives innovation and engineering ownership within a diverse international team. Your main responsibilities: Lead and develop the delivery capability for SRE/observability platform solutions, fostering excellence in automation, reliability, and monitoring. Build and maintain highly available, low-latency systems aligned with banking industry standards. Drive automation and scripting initiatives utilizing Python, Go, Bash, and other technologies. 
Manage and optimize CI/CD pipelines and observability/monitoring stacks such as AppDynamics, Grafana, Splunk, and OpenTelemetry. Ensure optimal system performance and reliability across global operations. Collaborate effectively with international teams across different time zones. Provide technical leadership, mentorship, and guidance to team members. You're ideal for this role if you have: 8+ years of experience in SRE, DevOps, or similar leadership roles. Strong automation and scripting skills (Python, Go, Bash, etc.). Proficiency in CI/CD pipelines and observability/monitoring tools. Proven experience maintaining highly available, low-latency systems in regulated industries such as banking, fintech, or insurance. Ability to work in the Krakow office at least 6 days per month. Fluent English communication skills for global team collaboration. It is a strong plus if you have: (optional) Certifications related to DevOps, SRE, or cloud platforms. Language Required for the role: Fluent English Eligibility for the role: Only candidates with an existing legal right to work in Europe will be considered for this role. #MAKEYourCareerBETTER Interested? Apply now and include your CV (preferably in English) along with a statement confirming your consent to the processing and storage of your personal data.

Technology

Link Group

DevOps Engineer (Observability)

Senior

Hybrid

Warsaw, Poland

130 - 145 PLN

🏢 Summary: Design and scale next-generation observability and logging solutions within an international DevOps team, focusing on building high-scale monitoring platforms and cloud-native infrastructure from the ground up. The role combines architecture, infrastructure as code, and reliability engineering for distributed systems. You will drive metrics, logging, tracing, and alerting solutions in a collaborative environment. 🗂️ Requirements: Hands-on experience with Prometheus and Grafana, Experience scaling observability tools such as Thanos or Mimir, Experience managing ELK stack or Loki logging platforms, Strong proficiency in Terraform and Terragrunt, Deep understanding of Kubernetes, Experience with distributed systems observability (metrics, logs, traces), Full professional proficiency in English 📃 Skills: Prometheus, Grafana, Thanos, Mimir, ELK, Loki, Terraform, Terragrunt, Kubernetes, Python, Go, GitHubActions, Puppet 🏢 Description: The Opportunity Join a high-performing, international team of six DevOps experts. This is not a "maintenance-only" role. You will have a seat at the table in designing, building, and scaling our next-generation observability and logging solutions from the ground up. We believe in "Attitude First." If you are an ambitious engineer who thrives on collaboration, knowledge sharing, and solving complex distributed systems challenges, we want to grow with you. Key Responsibilities Architect & Build: Design and implement end-to-end observability solutions, including metrics, logging, tracing, and advanced alerting. Platform Excellence: Operate and optimize high-scale monitoring platforms (Prometheus, Mimir, Grafana) and ELK stack logging infrastructure. Infrastructure as Code: Define and maintain all observability systems using Terraform and Terragrunt. Reliability Engineering: Ensure the scalability and performance of our systems while supporting incident detection and root cause analysis (RCA).
Collaborate: Work across domains with a team that values mentoring, transparency, and collective problem-solving. Your Technical Core Observability Expert: Solid hands-on experience with Prometheus, Grafana, and scaling tools like Thanos or Mimir. Logging Architect: Proven experience managing enterprise-grade logging platforms (ELK stack or Loki). IaC Ninja: Strong proficiency in Terraform/Terragrunt to manage infrastructure. Cloud Native: Deep understanding of Kubernetes and the complexities of metrics/logs/traces in distributed systems. Language: Full proficiency in English for seamless global collaboration. Stand Out From The Crowd (Nice to Have) Coding: Ability to automate and integrate using Python or Go. CI/CD: Exposure to GitHub Actions and automated workflows. Configuration Management: Experience with Puppet. SRE Mindset: Understanding of Service Level Indicators (SLIs), Objectives (SLOs), and Error Budgets.

Technology

Smart Pension

Senior Quality Engineer

Senior

Hybrid

Krakow, Poland

17,000 - 23,000 PLN

🏢 Summary: The offer is for a Quality Engineer embedded within frontend teams to drive a shift from traditional testing to a quality-first engineering culture. The role focuses on modernising test architecture, implementing contract testing, and optimising CI/CD pipelines to enable fast, reliable, and independent frontend delivery. You will coach developers, build robust automation frameworks, and ensure high-confidence deployments in a modern JS ecosystem. 🗂️ Requirements: Deep experience with modern JavaScript frameworks (React, Vue, or Next.js), Strong experience in frontend test automation and architecture, Experience implementing contract testing (Consumer-Driven Contracts), Hands-on experience with CI/CD pipeline optimisation, Experience with service virtualization and mocking strategies, Experience with Playwright and visual regression testing, Ability to design testable component-based architectures, Experience with observability tools and error tracking in QA, Ability to eliminate flaky tests and ensure deterministic pipelines 📃 Skills: JavaScript, React, Vue, Next.js, Playwright, Pact, CI/CD, Testing, Automation, Mocking, Virtualization, Contracts, Synthetics, Observability, VisualRegression 🏢 Description: At Smart, our mission is to transform retirement, savings and financial wellbeing, across all generations, around the world. THE ROLE We aren’t looking for a "tester" to catch bugs at the end of a sprint. We are looking for a Quality Engineer who acts as a quality multiplier for our frontend engineering teams (working across more than one squad). Your goal is to move the needle from "testing as a phase" to "quality as a standard." You will be the bridge between rapid UI iteration and bulletproof reliability. By focusing on quality coaching, architectural testability, and a "contract-first" mindset, you will enable our teams to deploy with high confidence, even when the backend is a moving target or unavailable (as a test reference). 
Key Pillars of the Role: Influence & Cultural Coaching Shift-Left Evangelism: Partner with Product and Tech Leads early in the discovery phase to identify edge cases before a single line of code is written. Quality Mentorship: Coach developers on writing meaningful, low-flake tests. You don’t just find bugs; you teach the team how to architect code that prevents them. The "Quality Mindset": Advocate for a culture where quality is a shared responsibility, not a hand-off. You’ll question the status quo and push for "Quality over Coverage." Infrastructure & Tooling (The "Enabler") Agnostic Delivery: Build and maintain sophisticated mocking and service virtualization strategies (e.g., Pact or similar) to ensure the frontend can be developed, tested, and demoed independently of backend availability. Pipeline Optimisation: Own the teams’ CI/CD feedback loop. You’ll ensure that our pipelines are fast, deterministic, and provide actionable signals, not just a wall of red text. Testing Modernisation: Maintain and extend next-gen tools (Playwright, Visual Regression) that align with a modern component-based architecture. Strategic Delivery Contract Testing: Implement and champion Consumer-Driven Contracts to ensure frontend and backend stay in sync without requiring heavy integrated environments. Observability in QA: Shift your focus from "did it pass?" to "how is it performing?" Use synthetics and error tracking to inform your testing strategy. Risk-Based Strategy: Use data to decide where to automate and where to explore, ensuring that we spend our energy on the features that matter most to users and that our SLAs are achieved.
WHO WE ARE LOOKING FOR The skills, experience, and aptitudes we are looking for are listed below but please don’t be discouraged from applying if you don’t meet every single one of these criteria – having a ‘can do’ attitude is sometimes more important than being able to tick every box: Technical Chops - Deep experience / understanding of the modern JS ecosystem (React, Vue, or Next.js) and how to test it. Architectural Thinking - You understand how component libraries impact quality at scale. Communication - The ability to persuade a skeptical developer why a specific architectural change will improve long-term velocity. Bravery - You aren't afraid to stop a release if the process is broken, but you’d rather fix the process so the release never has to stop. WHY THIS ROLE IS DIFFERENT You won't be siloed in a "QA Department." You will be embedded in the heart of delivery. We give you the agency to blow up legacy testing patterns and replace them with lean, modern, and highly-automated workflows. If you believe that speed is a byproduct of quality, we want to talk to you. WHAT SUCCESS IN 6 MONTHS LOOKS LIKE Frontend teams can run a full suite of meaningful tests without a backend connection, and be confident in its quality. "Flaky tests" have been eradicated through better infrastructure and developer education. The team is deploying to production multiple times a day with a "Green Build" that they actually trust. WHO WE ARE We work in partnerships with governments and financial institutions in the UK and internationally. Our cloud-native digital platform is revolutionising how people around the world think about, and save for, their retirement. At heart, we’re a financial technology business. What we do is all about innovation, and using the power of digital change to put the customer first. 
Our Engineers will tell you that working at Smart gives you the opportunity to play your part in developing world-class technological solutions, working with – and learning from – like-minded people. You’ll also find that, across our business, our colleagues love Smart’s culture, and how what we do means better financial outcomes for savers. That feels worthwhile, and it means that what we do, collectively, goes way beyond the nine to five of a typical working day. Don’t just take our word for it – you can see what our colleagues say about working at Smart on LinkedIn Life and Glassdoor. BENEFITS At Smart, one of the eight principles we work to is “We want happy and good people in our team”. We created a list of benefits that helps us achieve this goal:

  • 26 days of paid holiday per year + Polish bank holidays

  • 2250 PLN annual training budget to spend on your professional development

  • Health insurance (including dental care) via TU Inter

  • MultiSport Plus Gym Card

  • Online English lessons during working hours

  • Sick leave in accordance with Polish labour law

  • Paternity and maternity leave in accordance with Polish labour law

  • Additional employer contribution to the Employee Capital Plans (PPK) of 2.5%

  • Death in service insurance cover

  • Fully-paid five-week sabbatical after five years of employment

  • Wellbeing services in the Krakow office, such as manicures, massages and barbers

At Smart, we are committed to creating an inclusive and equitable workplace where everyone feels valued, respected, and empowered to do their best work. We believe that diverse perspectives help us lead the way in transforming retirement, savings, and financial wellbeing. We welcome differences in background, experience, thinking, and identity, and we recognise that innovation is strongest when it is built on inclusion and fairness. We encourage applications from people of all backgrounds and experiences and do not discriminate on the basis of any protected characteristic.
If you require any reasonable adjustments during the recruitment process or in the workplace, we encourage you to let us know - we are committed to supporting you. We think Smart is an awesome place to work. If it sounds like somewhere you’d like to work, too, and if you’re ready to play your part in our continued success in the future, then naturally we’d love to meet you.

Technology

Link Group

Senior Site Reliability Engineer

Senior

Hybrid

Warsaw, Poland

170 - 230 PLN

🏢 Summary: The role focuses on ensuring reliability, scalability, and performance of large-scale cloud-based applications by building and maintaining resilient infrastructure. You will manage AWS cloud environments, Kubernetes clusters, and CI/CD pipelines while implementing monitoring, automation, and incident response processes. The position emphasizes Infrastructure-as-Code, observability, and continuous reliability improvements. 🗂️ Requirements: 5+ years experience in SRE, DevOps or similar role, Strong experience with AWS cloud services, Experience with Infrastructure-as-Code tools, Hands-on experience with Kubernetes, Proficiency with Docker, Experience with CI/CD pipelines, Solid knowledge of PostgreSQL or Amazon RDS, Strong SQL knowledge, Knowledge of networking concepts (VPC, DNS, troubleshooting), Strong Linux/Unix administration skills, Experience with observability tools, Experience with automation in infrastructure, Experience with incident management 📃 Skills: AWS, Terraform, Pulumi, Kubernetes, EKS, Docker, GitHub, PostgreSQL, RDS, SQL, VPC, DNS, Linux, Unix, Prometheus, Grafana, Datadog, Dynatrace, CI/CD 🏢 Description: We are looking for an experienced Site Reliability Engineer to ensure the reliability, scalability, and performance of large-scale cloud-based web applications. You will work closely with software development, cloud operations, and platform teams to build and maintain resilient infrastructure and improve system stability. 
Key Responsibilities:

  • Design and maintain monitoring, alerting, and incident response systems to ensure high availability

  • Collaborate closely with engineering, product, and architecture teams

  • Build and manage cloud infrastructure using Infrastructure-as-Code (e.g., Terraform, Pulumi) on AWS

  • Operate and optimize Kubernetes environments (e.g., EKS)

  • Develop and maintain containerized applications using Docker

  • Improve CI/CD pipelines and drive automation across deployment processes

  • Implement and manage observability tools (logging, metrics, tracing)

  • Participate in incident management, postmortems, and reliability improvements

  • Support capacity planning, disaster recovery, and system scaling

  • Contribute to security, compliance, and operational best practices

  • Develop automation and AI-driven solutions for monitoring and incident prevention

Requirements:

  • 5+ years of experience in SRE, DevOps, or similar roles

  • Strong experience with AWS cloud services and Infrastructure-as-Code tools

  • Hands-on experience with Kubernetes and containerized environments

  • Proficiency in Docker and CI/CD pipelines (e.g., GitHub Actions)

  • Solid understanding of databases (e.g., PostgreSQL, Amazon RDS) and SQL

  • Knowledge of networking concepts (VPC, DNS, troubleshooting tools like dig/traceroute)

  • Strong Linux/Unix administration skills

  • Experience with observability tools (e.g., Prometheus, Grafana, Datadog, Dynatrace)

  • Familiarity with automation and AI-based solutions in infrastructure

  • Strong problem-solving and incident management skills

Technology

New offer

SpotOn

Supportability (Quality Assurance) Manager

Senior

Hybrid

Krakow, MA, Poland

🏢 Summary: Lead the QA strategy as a Confidence (QA) Manager, driving automation and quality culture across Web and Android POS products used in high-pressure restaurant environments. Own automation frameworks, CI/CD testing pipelines, and device labs while mentoring a QA team and embedding shift-left testing practices. Ensure resilient, high-performance, zero-downtime user experiences across critical ordering and payment flows. 🗂️ Requirements: 3+ years leading QA or software testing teams, Hands-on experience in automated testing for Web or Android, Experience designing and maintaining automation frameworks, Experience managing device testing labs, emulators, or cloud device farms, Experience implementing CI/CD testing pipelines, Ability to define and track quality KPIs, Fluent English (written and spoken) 📃 Skills: Playwright, Cypress, Espresso, Appium, Kotlin, React, TypeScript, Python, Django, Go, PostgreSQL, Docker, GitHub, CI/CD, Android, AI, LLM 🏢 Description: We’re not just building restaurant tech, we’re giving independent restaurants the tools to compete and win. From our award-winning point-of-sale to AI-powered profit tools, everything we do helps operators boost profit, work smarter, and keep their best people. And every solution is backed by real humans who actually give a sh*t about helping restaurants succeed. Named the #1 Restaurant POS by G2 (Fall 2025), based on ratings from real users. Rated the top point-of-sale (POS) for restaurants, bars, retail, and small businesses by Capterra users. Awarded Great Places to Work and Built In’s Best Workplaces for multiple years running. We move fast, care hard, and fight for independent restaurant operators to do what they love, and love doing it. If you’re looking to make an impact with heart and hustle, SpotOn is the place for you. At SpotOn, we don’t believe in "QA" as a safety net that catches bugs at the very end of a sprint.
We believe in Confidence: the absolute certainty that the code we ship today empowers our merchants to run their businesses flawlessly tomorrow. As a Confidence (Quality Assurance) Manager, you will lead the strategic vision and the team responsible for manual and automation testing across our products. You will guide our Automation Testing (AT) strategy, build robust tools, and, most importantly, foster a culture where the mindset of writing automated tests is shared by every single software engineer on the team. You will be a technical leader, an experience advocate, and a coach, ensuring our testing velocity matches our ambitious product roadmap. Why This Role Matters (Core Business Impact) Our software does not run in sterile, quiet offices. It runs in chaotic, high-heat, high-noise environments. Picture a Saturday night dinner rush: kitchen staff are rushing, servers are tapping screens with greasy or wet fingers, the Wi-Fi network is fluctuating, credit card readers are dropping connections, and the printer just ran out of paper mid-receipt. In this environment, staff have zero patience for sluggish UI transitions, and there is no room for error. High staff turnover in the US hospitality industry also means our systems must be incredibly intuitive: any UX friction leads to order mistakes and lost revenue. A system crash, a broken payment flow, or even a 3-second lag during peak hours doesn’t just mean a minor glitch; it means cold food, lost tips for hard-working servers, angry diners, and thousands of dollars in lost revenue for the restaurant owner. Your mission as a Confidence Manager is to address these harsh, real-world conditions: Where Code Meets Physical Chaos: Ensure our software is resilient to the unexpected, such as sudden network dropouts (offline-mode reliability), hardware peripheral disconnects (printers, cash drawers, terminals), and rapid, muscle-memory UI tapping.
UX and Real-User Experience Focus: Champion tests that measure actual user-perceived latency, UI transitions, and end-to-end user journeys rather than just validating API endpoints. Zero Downtime during Peak Hours: Guarantee through regression coverage that critical login, menu update, checkout, ordering, and payment flows, as well as end-of-day reports, are bulletproof across Web and Mobile platforms. Rapid Time-to-Market: Accelerate feature shipping by building automated, self-healing, and highly reliable CI/CD test feedback loops, removing human-bottleneck manual validation. On a daily basis you will: Drive the "Shift-Left" Quality Culture: Work closely with Engineering Managers to embed quality early in the development lifecycle. Foster a shared-ownership model in which writing automated tests is a standard practice for all software engineers. Lead and Coach: Manage, mentor, and grow a team of Confidence (QA) Engineers. Conduct 1:1s, support career progression, and foster a healthy, continuous-learning team dynamic. Own the Automation Strategy: Design and continuously refine our Automation Testing (AT) platforms and frameworks for Web and Android environments. Manage the Device Testing Lab: Oversee the infrastructure for automated testing on physical Android stations, handhelds, and peripherals, combined with emulator tests. Leverage Modern AI Tools: Champion the integration of generative AI and LLM-assisted tools (e.g., Claude, automated test generation, visual regression analysis, AI-driven self-healing test scripts) to dramatically speed up the creation and maintenance of automated tests. Maintain the Testing Platform: Work closely with DevOps and Platform teams to build and maintain ultra-fast, reliable CI/CD pipelines (e.g., via GitHub Actions) that the engineering organization fully trusts.
Define Quality Metrics: Track and expose actionable quality KPIs (such as user-perceived performance, test coverage, execution speed, flaky test rates, and defect leakage) to drive data-informed improvements. What skills are we looking for? Management Experience: 3+ years of experience leading, mentoring, and scaling high-performing QA or software testing teams. Strong Automation Background: A proven hands-on technical background in automated testing for Web (e.g., Playwright, Cypress) or Android (e.g., Espresso, Appium, Kotlin automation). Having experience in both is a massive plus. UX and Experience Advocate: You don't just test functionality; you test usability. You have empathy for the end-user (the server, the chef, the diner) and have experience setting up visual regression, performance budget testing, and perceived latency benchmarks. Lab & Device Infrastructure Knowledge: Experience managing device testing matrices, cloud device farms or emulators, and physical test environments containing connected POS hardware. The "Shared-Quality" Mindset: A strong track record of teaching, influencing, and persuading developers to take ownership of test automation. You know how to make automated testing a low-friction process for all engineers. AI & Velocity Booster: Experience or clear, practical strategies for utilizing AI tools to accelerate automated test authoring, visual UI validation, or test maintenance. Pragmatic Problem Solver: You have the "Figure it Out" gene. You prefer end-user outcomes and business impact over technical purity. Excellent Communicator: Fluent in English (written and verbal), with the ability to articulate quality-related challenges, environmental edge cases, and trade-offs to both technical and non-technical stakeholders. The perfect candidate also has: Familiarity with our current stack: React (TypeScript), Python (Django), Go, Kotlin, PostgreSQL, Docker, GitHub Actions.
Domain experience in Fintech, Merchant Services, Point of Sale, or SaaS hospitality platforms. Here’s a bit about what we have to offer: Competitive salary. Training budget 3500 PLN gross per year. Access to e-learning platforms (O’Reilly). Fully paid private healthcare in LuxMed. Access to the Worksmile platform with a monthly top-up. Subsidized access to breakfast and lunch through the vending machine in the Kraków office, and lunches in the Gdańsk office once a week. Group English classes with a native speaker. New MacBook Pro, 4K monitors or whatever tools you need. Flexible working hours. New, modern, bright and comfortable office space in the city centre. Access to the company’s library. Great working atmosphere. Chill out room with a PlayStation and games. Free snacks and beverages in the kitchen. Company parties and social activities. Employee referral program. Relocation Package within Poland. Parking space in the underground garage. The controller of your personal data included in your job offer and others collected during the recruitment process is SpotOn Poland spółka z ograniczoną odpowiedzialnością with its registered office in Kraków, Aleja 29 listopada 20, 31-401 Kraków, e-mail address: poland-rodo@spoton.com. We will process your personal data for the purpose of current or, if you voluntarily agree, also future recruitment processes. More information about how we process your data, including the basis for processing and your rights in relation to the processing, can be found on our website: https://pl.spoton.com/privacy-and-cookies-policy.
We will never ask candidates to pay fees, purchase equipment, or share sensitive personal or financial information during the hiring process. All legitimate communication from our recruiting team will come from an official company email address (@spoton.com). If something seems suspicious, please contact us at careers@spoton.com. SpotOn is an equal employment opportunity employer. Qualified candidates are considered for employment without regard to race, religion, gender, gender identity, sexual orientation, national origin, age, military or veteran status, disability, or any other characteristic protected by applicable law.