June 8, 2026

DevOps Engineer (Observability)

Senior • Hybrid

130 - 145 PLN

Warsaw, Poland

The Opportunity

Join a high-performing, international team of six DevOps experts. This is not a "maintenance-only" role. You will have a seat at the table in designing, building, and scaling our next-generation observability and logging solutions from the ground up.

We believe in "Attitude First." If you are an ambitious engineer who thrives on collaboration, knowledge sharing, and solving complex distributed systems challenges, we want to grow with you.

Key Responsibilities

Architect & Build: Design and implement end-to-end observability solutions, including metrics, logging, tracing, and advanced alerting.
Platform Excellence: Operate and optimize high-scale monitoring platforms (Prometheus, Mimir, Grafana) and ELK stack logging infrastructure.
Infrastructure as Code: Define and maintain all observability systems using Terraform and Terragrunt.
Reliability Engineering: Ensure the scalability and performance of our systems while supporting incident detection and root cause analysis (RCA).
Collaborate: Work across domains with a team that values mentoring, transparency, and collective problem-solving.

Your Technical Core

Observability Expert: Solid hands-on experience with Prometheus, Grafana, and scaling tools like Thanos or Mimir.
Logging Architect: Proven experience managing enterprise-grade logging platforms (ELK stack or Loki).
IaC Ninja: Strong proficiency in Terraform/Terragrunt to manage infrastructure.
Cloud Native: Deep understanding of Kubernetes and the complexities of metrics/logs/traces in distributed systems.
Language: Full proficiency in English for seamless global collaboration.

Stand Out From The Crowd (Nice to Have)

Coding: Ability to automate and integrate using Python or Go.
CI/CD: Exposure to GitHub Actions and automated workflows.
Configuration Management: Experience with Puppet.
SRE Mindset: Understanding of Service Level Indicators (SLIs), Objectives (SLOs), and Error Budgets.

Similar jobs you might like

Technology

Link Group

Network Observability & Automation Engineer

Senior

Hybrid

Warsaw, Poland

37,000 - 47,000 PLN

🏢 Summary: Engineering role focused on building and scaling observability and automation solutions for a global, high-performance trading network. The position involves designing telemetry, monitoring, and alerting systems while driving automation to improve reliability and operational efficiency. You will troubleshoot complex network environments and develop infrastructure tooling using Python and Infrastructure-as-Code practices.

🗂️ Requirements:
Hands-on experience with network observability and automation
Practical experience with Prometheus, Grafana, Telegraf
Experience with gNMI, YANG, Cisco Telemetry
Experience with Zabbix, LibreNMS, SNMP environments
Strong Python scripting skills
Experience with Ansible, Jinja, Git
Experience with Terraform and Infrastructure-as-Code
Familiarity with NetBox, NETCONF, RESTCONF
Experience with AWS cloud networking
Strong knowledge of BGP, OSPF, PIM, IGMP
Proven troubleshooting skills in large-scale network environments

📃 Skills: Python, Ansible, Jinja, Git, Terraform, NetBox, NETCONF, RESTCONF, Prometheus, Grafana, Telegraf, Zabbix, LibreNMS, SNMP, gNMI, YANG, CiscoTelemetry, AWS, BGP, OSPF, PIM, IGMP, Telemetry, Observability, Automation

🏢 Description: Join a team building and evolving observability and automation capabilities for a global, high-performance multi-vendor trading network. This role is ideal for an engineer passionate about scalable monitoring, telemetry, automation, and operational excellence in complex network environments.

What you’ll do:
Build and enhance network observability platforms and automation frameworks
Design scalable telemetry, monitoring, and alerting solutions across a global infrastructure
Drive automation initiatives to improve network reliability, visibility, and operational efficiency
Develop tooling and infrastructure integrations using Python and Infrastructure-as-Code principles
Collaborate with technology and investment teams to deliver resilient business-critical solutions
Troubleshoot complex network and observability issues in large-scale environments

What we’re looking for:
Strong hands-on experience with network observability and automation
Practical expertise with a modern observability stack: Prometheus, Grafana, Telegraf, telemetry technologies (gNMI, YANG, Cisco Telemetry)
Experience with monitoring platforms such as Zabbix, LibreNMS, and SNMP-based environments
Strong scripting and automation skills using Python, Ansible, Jinja, and Git
Familiarity with automation and infrastructure tooling including Terraform, NetBox, NETCONF, and RESTCONF
Experience supporting cloud networking environments (AWS preferred)
Good understanding of networking fundamentals and protocols such as BGP, OSPF, PIM, and IGMP
Proven troubleshooting skills within complex, large-scale infrastructure environments

Who you are:
Passionate about observability, automation, and scalable infrastructure
Proactive, self-driven, and eager to continuously improve systems and processes
Comfortable operating in fast-paced, mission-critical environments
Strong communicator and collaborative problem solver

Technology

Connectis

DevOps/SRE

Senior

Remote

Warsaw, Poland

143 - 209 PLN

🏢 Summary: DevOps / SRE role focused on scaling and standardizing observability across a large enterprise environment with over 160 applications. The position involves integrating systems with a central observability model, defining SRE standards, and supporting teams in monitoring, logging, and metrics across cloud and enterprise platforms. The role is horizontal and advisory, emphasizing enablement, architecture guidance, and end-to-end visibility.

🗂️ Requirements:
Minimum 5 years of experience as an SRE or DevOps Engineer in enterprise environments
Strong knowledge of Microsoft Azure core services and cloud architecture
Hands-on experience with Prometheus and Grafana
Practical knowledge of distributed tracing, centralized logging, metrics and visualization
Basic experience with Kubernetes deployment and container management
Practical experience with OpenTelemetry instrumentation
Experience defining and monitoring SLOs and SLIs
Fluent English

📃 Skills: Azure, Prometheus, Grafana, Kubernetes, OpenTelemetry, SLO, SLI, Loki, Dynatrace, Datadog, NewRelic, AppDynamics, ServiceNow, Logscale, LQL, GCP, Python, PowerShell, SAP, Oracle, Salesforce

🏢 Description: We are looking for an experienced DevOps / SRE to join the Observability team and play a key role in scaling and standardizing observability solutions in a large organization with a complex technology landscape. The project focuses on unifying monitoring, logging, and metrics for more than 160 applications running across many technology areas (several independent domains, or "towers"), covering both cloud environments and extensive enterprise-class systems. The role is horizontal and enablement-focused: its goal is to support product and maintenance teams in integrating their systems with the central observability model, and to co-create and promote shared SRE / DevOps standards across the whole organization.

💡 YOUR ROLE:
Analyzing local monitoring solutions and mapping them to the common SRE model.
Defining and promoting observability standards (naming, data schemas, conventions).
Taking part in PoCs / pilots (metrics collection, configuration, testing data delivery to Azure).
Supporting teams in interpreting observability data (RCA, SLO/SLA, diagnostics).
Collaborating with product teams, maintenance teams, and vendors.
Building end-to-end visibility for critical business processes.
Integrating domain systems with the central observability model.
Architectural advisory.

🔍 WHAT WE EXPECT FROM YOU:
At least 5 years of experience as an SRE / DevOps Engineer in enterprise environments.
Knowledge of the Microsoft Azure platform, covering core services and cloud architecture fundamentals.
Very good knowledge of monitoring and observability tools, in particular Prometheus and Grafana.
Practical knowledge of observability: distributed tracing, centralized logging, metrics, and visualization.
Basic knowledge of Kubernetes, covering deployment and container management.
Practical knowledge of OpenTelemetry (OTel) for application instrumentation.
Experience defining, implementing, and monitoring SLOs / SLIs.
Fluent English.

Nice to have:
Experience working with large vendor systems (such as SAP, Oracle, Salesforce, or other enterprise-class platforms).
Experience with enterprise observability tools: Dynatrace, Datadog, New Relic, AppDynamics.
Basic knowledge of ServiceNow for incident and change management.
Experience with Logscale: log management, queries, and analysis (LQL).
Practical experience with Loki for centralized logging.
Basic knowledge of GCP, Python, and PowerShell.

✨ WE OFFER:
🤖 A modern recruitment process with an AI Recruiter (AIR): when applying, you can talk to a virtual recruiter 24/7, with no waiting for a phone call, immediate feedback, and the option to repeat the conversation (the latest version counts). The final decision is always made by a Connectis recruiter.
Participation in team-building events and technology meetups for sharing knowledge and experience.
Support from a dedicated Connectis contact person, available to help with project-related matters.
Stable, long-term employment with a well-established company.
100% remote. Fully remote work, with no commuting required.
Opportunity to grow in a modern, dynamic IT environment.
5000 PLN for referring friends to our projects.
A fast, remote recruitment process.

Thank you for all applications. Please note that we will contact selected candidates. 12821/NS

Technology

emagine Polska

Site Reliability Engineer

Senior

Remote

Lisbon, Portugal

🏢 Summary: Hands-on Observability Engineer role focused on building and automating enterprise-grade monitoring and observability solutions across AWS-based cloud and distributed systems. The position centers on developing infrastructure as code, CI/CD pipelines, and monitoring ecosystems to improve reliability, performance, and incident response. Approximately 90% of the role involves coding in Python and Terraform.

🗂️ Requirements:
Strong hands-on experience with AWS
Strong Python development and scripting experience
Strong experience with Terraform
Experience building and maintaining CI/CD pipelines using Jenkins
Experience with Elasticsearch and ELK Stack
Experience with Linux systems
Shell scripting skills
Understanding of monitoring, logging, and alerting concepts
Experience working in Agile or DevOps environments

📃 Skills: AWS, Python, Terraform, Jenkins, Elasticsearch, ELK, Linux, Bash, CI/CD, Kubernetes, Grafana, Prometheus, Datadog, NewRelic, Snowflake, Databricks, dbt, Matillion

🏢 Description:

Role Overview
We are looking for a skilled and proactive Observability Engineer to implement, automate, and support enterprise-grade observability and monitoring solutions across cloud and application platforms. The ideal candidate should have strong AWS infrastructure knowledge, hands-on automation skills, and experience building reliable monitoring and alerting ecosystems for modern distributed applications. The role involves working closely with Platform Engineering, Data Engineering, and Application teams to develop observability solutions and improve operational visibility, reliability, incident detection, and platform performance.

Main Responsibilities
· Design, implement, and maintain observability solutions for cloud-native and distributed systems.
· Build monitoring, logging, alerting, and dashboarding solutions across infrastructure and applications.
· Develop automation scripts and tooling using Python.
· Implement and maintain Infrastructure as Code (IaC) using Terraform.
· Build and support CI/CD pipelines using Jenkins and Git-based workflows.
· Configure and optimize monitoring for AWS services, Kubernetes workloads, APIs, databases, and applications.
· Create actionable alerts and operational dashboards to improve incident response and system reliability.
· Work with engineering teams to onboard applications into observability platforms.
· Support troubleshooting, root cause analysis, and performance optimization initiatives.
· Ensure observability standards, governance, and best practices are followed across projects.

Key Requirements
· Strong hands-on experience with Amazon Web Services (AWS).
· Solid Python development/scripting experience.
· Strong experience with Terraform.
· Experience building and maintaining CI/CD pipelines using Jenkins.
· Elasticsearch / ELK Stack experience, including building queries.
· Experience monitoring data platforms is preferred.
· Experience with Linux systems and shell scripting.
· Understanding of monitoring, logging, and alerting concepts.
· Experience working in Agile/DevOps environments.

Nice to Have Skills
Experience with any of the following is highly desirable:
· Snowflake
· Databricks
· dbt
· Matillion
· Grafana
· New Relic
· Datadog
· Prometheus
· Elasticsearch / ELK Stack

NOTES: We are looking for an engineer who loves to build. This is a highly technical role: 90% of the job is hands-on coding in Python and Terraform.

Technology

Grid Dynamics Poland

Site Reliability Engineer

Senior

Hybrid

Warsaw, Poland

🏢 Summary: Site Reliability Engineer role focused on leading the cloud platform layer of a large-scale enterprise migration to GCP, with full ownership of observability and FinOps capabilities. The position involves architecting cost attribution, distributed tracing, monitoring, and performance engineering solutions in a production-grade Kubernetes environment. You will work on complex distributed systems, extending multi-language codebases and managing infrastructure as code in a regulated enterprise setting.

🗂️ Requirements:
4–6 years software or DevOps engineering experience
2–3 years hands-on cloud infrastructure management in production
Strong GCP expertise including GKE and Cloud Run
Proven experience building observability solutions with OpenTelemetry
Experience with distributed tracing and profiling in distributed systems
Advanced Python scripting for automation and tooling
Strong Terraform proficiency with multi-environment setups
Ability to read and modify Kotlin and Java codebases
Experience implementing monitoring, alerting, and SLOs for containerized/serverless services
Experience with infrastructure cost attribution and cloud billing APIs

📃 Skills: GCP, GKE, CloudRun, Kubernetes, OpenTelemetry, Terraform, Python, Kotlin, Java, FinOps, PubSub, Bigtable, Docker, SLO, Tracing

🏢 Description: We are looking for a Site Reliability Engineer to join a high-stakes global tech ecosystem and drive the delivery of a critical enterprise platform migration to the cloud. Your core mission will be to architect, build, and productionalize the observability and cost intelligence (FinOps) layer for a massive, multi-year financial platform transformation. You will take end-to-end ownership of the cloud platform layer, giving internal stakeholders full visibility into platform behavior, performance, and infrastructure spend. Working alongside a nearshore team of senior engineers, you will solve highly complex architectural challenges in a production-grade, distributed system.

Responsibilities:
End-to-End Infrastructure & FinOps Ownership: Architect and implement a cloud usage and cost attribution dashboard, providing detailed per-pod and per-service cost breakdown using cloud billing APIs and internal FinOps hubs.
Advanced Observability & Tracing: Instrument end-to-end distributed tracing using OpenTelemetry, configuring collectors within Kubernetes environments and exporting traces to cloud monitoring systems utilizing RED metrics.
Performance Engineering & Stress Testing: Write custom tooling from scratch to deliver database performance monitoring, load testing, and trend analysis for critical underlying storage layers.
Monitoring & Alerting Automation: Build and deploy scalable production monitoring, custom alerting policies, and SLO tracking for containerized and serverless services.
Infrastructure as Code: Independently manage, write, and apply infrastructure modifications using Terraform, working within established enterprise repository standards, modules, and environment state management.
Cross-Language Codebase Extension: Read, debug, and extend existing platform code across a diverse stack including Kotlin, Java, and Python to seamlessly integrate technical metrics without disrupting business logic.
Quality & Release Assurance: Implement rigorous unit testing with high code coverage for all newly developed monitoring tools to comply with strict enterprise quality gates and sign-offs.

Min requirements:
Experience: 4 to 6 years of professional software or DevOps engineering experience, with at least 2 to 3 years of hands-on cloud infrastructure management in production.
Advanced Cloud Infrastructure: Deep operational proficiency with Google Cloud Platform (GCP), specifically with managing and configuring workload-level alerting on Google Kubernetes Engine (GKE) and Cloud Run.
Observability & OpenTelemetry: Proven track record of building observability solutions in distributed systems, using OpenTelemetry (both auto and manual instrumentation) alongside distributed tracing and profiling tools.
Strong Automation Scripting: Intermediate-to-advanced fluency in Python for writing custom test tooling, metrics integration scripts, and backend automation from scratch.
Solid Infrastructure as Code: Strong proficiency in Terraform, including experience with multi-environment setups, workspaces, and corporate module standards.
Polyglot & JVM Familiarity: Practical ability to read, understand, and modify existing backend codebases written in Kotlin and Java.
Crucial Non-Technical Skills: Extreme technical autonomy to resolve blockers independently, rapid onboarding skills into large unfamiliar codebases, and fluent written English for async alignment and pull requests.
Process Alignment: Ability to thrive in a highly regulated enterprise environment with strict peer reviews, robust documentation requirements, and formal deployment procedures.

Would be a plus:
Domain Knowledge: Previous experience working within financial services, fintech, investment banking, or other highly regulated industries.
Enterprise Streaming Tools: Working knowledge of cloud messaging systems (such as Cloud Pub/Sub) utilized for inter-service communication.
Advanced Storage Engines: Familiarity with high-throughput distributed database architectures, such as Google Cloud Bigtable.
Systems Languages Awareness: Ability to read or debug foundational code written in low-level systems languages like Rust or C++ during multi-stack production deployments.

We offer:
Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office

About us: Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

Technology

Yard Corporate

Site Reliability Engineer (SRE)

Senior

Hybrid

Warsaw, Poland

40,000 - 55,000 PLN

🏢 Summary: Senior Site Reliability Engineer role focused on building and standardizing SRE practices across a hybrid AWS and on-prem infrastructure. The position centers on ensuring scalability, resilience, and high availability of high-frequency, data-intensive platforms through observability, automation, and Kubernetes optimization. You will define SLOs, enhance monitoring architecture, and drive reliability culture across engineering teams.

🗂️ Requirements:
5+ years experience in SRE, DevOps, or Infrastructure Engineering supporting distributed production systems
Bachelor’s degree in Computer Science, Computer Engineering, or related field (or equivalent experience)
Deep expertise in Grafana, Prometheus, Loki, and Tempo (OpenTelemetry)
Strong production experience with Docker and Kubernetes
Experience managing hybrid infrastructure (AWS and on-premises)
Proficiency in at least one language: Python, Go, or Bash
Hands-on experience with CI/CD pipelines and Infrastructure-as-Code
Experience defining and managing SLOs and SLAs
Willingness to participate in on-call rotation

📃 Skills: AWS, Kubernetes, Docker, Prometheus, Grafana, Loki, Tempo, OpenTelemetry, Python, Go, Bash, CI/CD, IaC, Git, Hypervisors

🏢 Description:

About the Client
Our client is a premier, global investment management firm operating at the intersection of finance and technology. Known for their sophisticated, data-intensive systems, they build and maintain high-performance platforms that process massive volumes of market and operational data. To support their expanding footprint, they are looking for a senior-level Site Reliability Engineer (SRE) who will take ownership of shaping, standardizing, and scaling their SRE frameworks and reliability culture from the ground up.

The Role
In this role, you will serve as a foundational force for SRE practices, partnering directly with Cloud, Infrastructure, and Software Engineering squads. You will work across a hybrid infrastructure (combining advanced AWS cloud environments and physical on-premises servers) to guarantee the scalability, resilience, and maximum uptime of critical, high-frequency transactional platforms.

Core Responsibilities
SRE Evangelism: Design, implement, and champion core reliability principles, helping technology teams adopt sustainable scaling practices.
Observability Architecture: Implement, scale, and maintain end-to-end monitoring, telemetry, and distributed tracing systems utilizing Prometheus, Grafana, Loki, and Tempo (OpenTelemetry framework).
Kubernetes Optimization: Establish best-practice configurations for containerized workloads, ensuring applications running on Kubernetes are highly resilient, cost-effective, and performant.
Incident Management & Culture: Participate in a balanced, shared on-call rotation (averaging one week per month).
Automation & Engineering: Build custom tooling and CI/CD pipelines to automate routine tasks, system health checks, and rapid disaster recovery workflows.
SLO/SLA Definition: Partner with product and engineering teams to define, monitor, and enforce Service Level Objectives (SLOs) and Error Budgets.

What We Look For
Experience: 5+ years of hands-on experience in a dedicated SRE, DevOps, or Infrastructure Engineering role supporting complex, distributed production systems.
Education: A Bachelor’s degree in Computer Science, Computer Engineering, or a related technical discipline (or equivalent practical experience).
Observability Expertise: Deep, subject-matter knowledge of modern monitoring stacks, specifically Grafana, Prometheus, Loki, and Tempo (OTel).
Orchestration & Containers: Strong, production-grade expertise in containerization (Docker) and orchestration (Kubernetes).
Hybrid Infrastructure: Experience navigating hybrid models, managing both cloud services (AWS preferred) and physical on-premise hardware resources.
Scripting/Coding: Proficiency in writing clean, maintainable code in at least one scripting or programming language (e.g., Python, Bash, or Go) to build reliable automation.
Methodologies: Solid grounding in CI/CD concepts, infrastructure-as-code (IaC), and agile development processes.
Soft Skills: Excellent verbal and written communication skills, with a proven ability to convey complex infrastructure and reliability concepts to both technical and non-technical stakeholders.

What We Offer
Stable Employment: Full-time employment contract (Umowa o Pracę - UoP).
Tax Optimization: Eligibility for creative tax-deductible costs (KUP - Koszty Uzyskania Przychodu).
Financial Reward: Highly competitive base salary accompanied by a generous annual performance bonus.
Comprehensive Health: Premium private medical care package that fully includes dental coverage (stomatologia).
Wellness & Lifestyle: MultiSport card to keep you active and healthy.
Daily Perks: Pre-funded lunch card for your daily meals.

Tech Stack at a Glance
Cloud & Virtualization: AWS, Kubernetes, Docker, On-Premises Hypervisors
Observability: Prometheus, Grafana, Loki, Tempo, OpenTelemetry (OTel)
Languages: Python, Go, Bash
CI/CD & Automation: Git-based pipelines, Configuration Management, IaC

Technology

emagine Polska

Senior Infrastructure / DevOps Engineer

Senior

Hybrid

Lisbon, Portugal

🏢 Summary: Senior Infrastructure / DevOps Engineer role focused on operating and evolving on-premises infrastructure while expanding cloud capabilities in hybrid environments. The position combines infrastructure administration, automation, CI/CD, and Kubernetes operations, supporting mission-critical production systems. Includes mandatory 24x7 L2 on-call participation to ensure high availability and reliability.

🗂️ Requirements:
7+ years experience in infrastructure or DevOps roles
Strong hands-on experience with on-premises environments
Solid experience with Proxmox and KVM virtualization
Strong Linux systems administration expertise
Proven production experience with Ansible
Knowledge of IP networking, routing, VLANs and firewalls
Experience with SAN/NAS storage and high-availability architectures
Experience with CI/CD tools
Experience with monitoring and observability stacks
Scripting skills in Bash or Python
Experience with PostgreSQL or Oracle administration
Hands-on experience with Docker
Operational knowledge of Kubernetes
Experience with AWS or Azure
Availability for 24x7 L2 on-call rotation

📃 Skills: Proxmox, KVM, Linux, Ansible, CICD, GitLab, Jenkins, Bash, Python, PostgreSQL, Oracle, Docker, Kubernetes, AWS, Azure, Prometheus, Grafana, SAN, NAS, VLAN, Firewalls, IP, Routing

🏢 Description: We are looking for a Senior Infrastructure / DevOps Engineer with strong hands-on experience in on-premises infrastructure and automation, who is evolving or consolidating their skills in cloud environments. This role is critical to the stability, reliability and continuous improvement of our technology platforms, including mission-critical production systems. This position combines infrastructure administration, DevOps practices, automation and continuous operations, with mandatory participation in a 24x7 on-call rotation (L2 support).

Key Responsibilities

On-Premises Infrastructure (Core Focus)
Operate, maintain and evolve on-premises infrastructure, ensuring stability, performance and high availability.
Administer virtualization environments based on Proxmox (KVM), including clusters, capacity planning and HA.
Manage storage systems (SAN/NAS), including performance, redundancy, backup and recovery.
Perform Linux systems administration, including hardening, patching, security and best practices.
Work with networking concepts such as IP networking, VLANs, basic routing and firewalls.

DevOps & Automation
Automate infrastructure provisioning and configuration using Ansible and Infrastructure as Code principles.
Build, maintain and improve CI/CD pipelines.
Collaborate closely with development teams to improve deployment, reliability and troubleshooting processes.
Reduce manual operational tasks through automation and standardization.

Cloud Operations (Growth & Consolidation Phase)
Operate and support public cloud environments (AWS, Azure), primarily using IaaS services.
Work in hybrid environments (on-prem + cloud), ensuring integration, security and operational consistency.
Apply cloud best practices related to cost management, security and reliability.
Operate and support Kubernetes workloads, with a focus on day-to-day operations, networking, storage and troubleshooting.

Reliability, Operations & On-Call
Ensure high levels of availability, reliability and operational excellence.
Participate actively in a 24x7 on-call rotation (L2 support) for production systems.
Respond to incidents, perform root cause analysis and define corrective and preventive actions.
Implement and maintain solutions (e.g., Prometheus, Grafana) for monitoring, logging and metrics.

Collaboration & Documentation
Work closely with engineering and product teams in a multidisciplinary environment.
Create and maintain clear and practical documentation (runbooks, operational guides, handbooks).
Support onboarding and promote operational and infrastructure best practices.

Key Requirements
7+ years of experience in infrastructure, systems, DevOps or similar roles.
Strong, hands-on experience with on-premises environments.
Solid experience with Proxmox (KVM).
Strong knowledge of Linux systems administration.
Proven experience using Ansible in production environments.
Good understanding of networking fundamentals (IP, routing, firewalls, VLANs).
Strong understanding of storage concepts and high-availability architectures.
Experience with CI/CD tools (GitLab CI, Jenkins or equivalent).
Experience with monitoring and observability stacks.
Scripting skills (Bash, Python or similar).
Experience with database administration (e.g. PostgreSQL, Oracle).
Hands-on experience with container technologies such as Docker.
Functional knowledge of Kubernetes, with an operational focus.
Experience with public cloud platforms (AWS and/or Azure).

Profile & Mindset
Strong operational mindset and sense of ownership.
Comfortable working with critical production systems.
Able to work autonomously and make technical decisions.
Clear interest in growing cloud skills, grounded in strong traditional infrastructure experience.
Practical, resilient and solution-oriented approach.
Strong communication and collaboration skills.
Professional proficiency in English (written and spoken).

Technology

emagine Polska

Observability Specialist

Senior

Hybrid

Warsaw, Poland

🏢 Summary: The offer is for an Observability Specialist responsible for designing, implementing, and maintaining a scalable telemetry and monitoring infrastructure in cloud-native environments. The role focuses on Kubernetes observability, Elastic Stack management, and performance optimization using modern telemetry standards. It involves driving SRE practices and ensuring high system reliability through advanced monitoring and AIOps solutions.

🗂️ Requirements:
Experience monitoring Kubernetes (OpenShift) environments
Hands-on implementation of OpenTelemetry for logs, traces, and metrics
Strong expertise in ELK stack deployment and maintenance
Proficiency in automating Elastic environments using Ansible
Experience with Application Performance Monitoring for code-level analysis
Knowledge of shard optimization, mapping, and Index Lifecycle Management
Experience defining and monitoring SLOs and managing Error Budgets
Integration of observability solutions with major cloud providers

📃 Skills: Kubernetes, OpenShift, OpenTelemetry, Elasticsearch, Logstash, Kibana, Ansible, ElasticAPM, AIOps, SRE, ILM, Sharding, Mapping, Cloud

🏢 Description:

Introduction & Summary
We are seeking an experienced Observability Specialist dedicated to ensuring the reliability and performance of our systems. This role involves collaborating with enterprise architects and IT professionals to design, implement, and oversee a scalable telemetry infrastructure. The ideal candidate will possess deep expertise in ELK or similar technologies and modern telemetry standards.

Main Responsibilities
As our Observability Engineer, your core duties will include:
Architectural Collaboration: Partner with system architects and local engineering teams in Denmark to design resilient monitoring solutions.
Monitor Kubernetes environments with OpenTelemetry (OTel) standards for logs, traces, and metrics.
Manage centralized data collection and automate Elastic deployments using Ansible.
Utilize Elastic APM for identifying code-level bottlenecks and resolving latency issues.
Implement AIOps configurations for proactive anomaly detection and automated root-cause analysis.
Drive Site Reliability Engineering (SRE) methodologies across teams.
Elastic Stack Management: Deploy, scale, and maintain Elasticsearch, Logstash, and Kibana (ELK) environments.

Key Requirements
Cloud-Native Observability: Strong skills in monitoring Kubernetes (OpenShift) environments and integrating with major cloud providers.
APM & Distributed Tracing: Expertise in Application Performance Monitoring (APM) to identify code-level bottlenecks and latency issues.
OpenTelemetry (OTel): Hands-on experience implementing OpenTelemetry (or similar) standards for logs, traces, and metrics to ensure vendor-neutral telemetry.
Infrastructure as Code (IaC): Proficiency in automating Elastic environments with Ansible.
Performance Engineering: Expert-level knowledge of shard optimization, mapping, and Index Lifecycle Management (ILM) to balance high performance with cost control.
SRE Methodology: Experience defining and monitoring Service Level Objectives (SLOs) and managing Error Budgets.
Strong communication skills for collaboration with IT teams.

Nice to Have:
Elastic Stack Mastery: Deep expertise in architecting and managing Elasticsearch, Logstash, and Kibana (ELK) at scale.
Data Ingestion & Fleet: Proven experience deploying Elastic Agent and Fleet for centralized agent management and data collection.
AIOps & Machine Learning: Ability to configure Elastic ML models for proactive anomaly detection and automated root cause analysis.

Other Details
This position is based in Warsaw with a flexible hybrid model, focused on leading-edge observability solutions in a dynamic and collaborative environment.

Technology

emagine Polska

Senior DevOps / SRE (Platform Reliability Engineer) - French fluent

Senior

Remote

Lisbon, Portugal

🏢 Summary: Senior DevOps / SRE role focused on ensuring reliability, scalability, security, and performance of a cloud-native AWS platform. The position centers on infrastructure automation, CI/CD, Kubernetes operations, observability, and implementing SRE best practices to support highly available production systems. You will lead incident management, optimize cloud costs, and drive continuous improvement of platform resilience.

🗂️ Requirements: 5+ years in DevOps/SRE/Cloud/Platform Engineering, Strong Linux administration and troubleshooting, Production experience with Kubernetes, Experience with CI/CD tools, Expertise in Infrastructure as Code, Hands-on experience with AWS, Strong networking fundamentals, Experience with monitoring and logging tools, Scripting skills (Bash or Python)

📃 Skills: AWS, Kubernetes, Docker, Helm, Terraform, Ansible, CloudFormation, Linux, GitLab, Jenkins, GitHub, Azure, Prometheus, Grafana, ELK, Datadog, Splunk, Bash, Python, TCP/IP, DNS

🏢 Description:

We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to ensure the reliability, scalability, performance, and security of our platform and cloud infrastructure. You will play a key role in building and operating cloud-native systems, improving observability, automating operations, implementing SRE best practices (SLOs/SLIs), and supporting development teams to deliver highly available services.

Key Responsibilities

Design, implement, and maintain highly available and scalable infrastructure on AWS.
Own and improve the reliability of production systems using SRE principles (SLO, SLI, error budgets).
Build and manage CI/CD pipelines to support fast and safe software delivery.
Develop and maintain Infrastructure as Code (IaC) using Terraform, Ansible, CloudFormation, etc.
Manage and optimize container orchestration platforms (Kubernetes, Docker, Helm).
Implement and maintain monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK, Datadog, Splunk).
Lead incident response, perform root cause analysis, and write postmortems to drive continuous improvement.
Improve system performance, capacity planning, scaling strategies, and disaster recovery processes.
Collaborate closely with development teams to improve deployment strategies and system resilience.
Implement security best practices (IAM, secret management, vulnerability scanning, patching).
Define operational standards, runbooks, documentation, and best practices for platform reliability.
Participate in on-call rotation and provide senior-level support for critical production issues.

Key Responsibilities (5 Main Missions)

The DevOps / SRE lead will be responsible for the stability and evolution of the platform. Your role is structured around five main areas:

Mission 1: AWS Infrastructure Management (Build & Run)
Mission 2: CI/CD and Deployment Automation
Mission 3: Monitoring, Observability, and Alerting: Global Monitoring, Log Management, Application Monitoring, Business Analytics
Mission 4: Incident Management, Resilience, and Security
Mission 5: FinOps and AWS Cost Optimization

Key Requirements

5+ years of experience in DevOps / SRE / Cloud Infrastructure / Platform Engineering.
Strong expertise in Linux systems administration and troubleshooting.
Proven experience with Kubernetes in production environments.
Strong experience with CI/CD tools (GitLab CI, Jenkins, GitHub Actions, Azure DevOps).
Solid knowledge of Infrastructure as Code (Terraform highly preferred).
Experience with the AWS cloud platform.
Strong understanding of networking fundamentals (TCP/IP, DNS, load balancing, reverse proxies).
Experience with observability tools: monitoring, metrics, logging, tracing.
Strong scripting skills (Bash, Python, or similar).
Advanced level of French.

Nice to Have

Experience with additional cloud platforms (Azure, GCP).
Strong understanding of networking fundamentals.

Technology

Link Group

Senior DevOps Engineer

Senior

Hybrid

Warsaw, Poland

28,000 - 38,000 PLN

🏢 Summary: Senior DevOps Engineer role focused on owning and evolving cloud-native infrastructure and CI/CD platforms that support large-scale data processing systems. The position combines hands-on engineering and strategic impact to ensure scalable, secure, and reliable production environments. You will design, automate, and optimize platform services enabling efficient delivery of data-driven applications.

🗂️ Requirements: 5+ years in DevOps, SRE, or infrastructure engineering, Experience supporting distributed production systems, Hands-on experience with public cloud platforms, Strong knowledge of containerization and orchestration, Experience with infrastructure as code, Strong scripting or programming skills, Experience building and maintaining CI/CD pipelines, Knowledge of observability practices and tools, Strong troubleshooting and incident response skills in Linux environments

📃 Skills: AWS, Docker, Kubernetes, Terraform, Python, Bash, CI/CD, Linux, Monitoring, Logging, Alerting

🏢 Description:

Senior DevOps Engineer

We are looking for an experienced engineer to take ownership of our infrastructure and platform ecosystem, supporting large-scale data processing systems and enabling efficient, reliable software delivery. This role combines hands-on engineering with strategic impact. You will design, build, and evolve the platform that underpins data pipelines and production services, ensuring scalability, security, and operational excellence across environments.

Key Responsibilities

Own and evolve CI/CD and automation platforms to support fast and reliable delivery of data-driven applications
Design and manage cloud-native infrastructure supporting high-volume data ingestion, processing, and serving
Build and maintain infrastructure as code to ensure consistency and scalability across environments
Manage containerized environments and orchestration platforms to deliver resilient and scalable services
Implement observability solutions (monitoring, logging, alerting) to ensure full system visibility and reliability
Automate deployment processes, configuration management, and system recovery workflows
Collaborate with engineering, data, and compliance teams to deliver secure and production-ready solutions
Drive incident management practices and continuous improvement initiatives
Contribute to platform strategy, tooling decisions, and mentoring within the team

Requirements

5+ years of experience in DevOps, SRE, or infrastructure engineering roles
Strong experience supporting production systems in distributed environments
Hands-on experience with public cloud platforms (AWS or similar)
Solid knowledge of containerization and orchestration technologies (Docker, Kubernetes)
Experience with infrastructure as code tools (e.g., Terraform)
Strong scripting/programming skills (Python, Bash, or similar)
Experience building and maintaining CI/CD pipelines and automation tooling
Knowledge of observability practices and tools
Strong troubleshooting and incident response skills in Linux environments
Excellent communication skills and ability to work cross-functionally

Nice to Have

Experience working with large-scale data platforms
Exposure to regulated environments or compliance requirements
Experience contributing to platform or engineering standards

Technology

Link Group

Senior DevOps Engineer

Senior

Hybrid

Warsaw, Poland

28,000 - 38,000 PLN

🏢 Summary: Senior DevOps Engineer role focused on owning and evolving cloud-native infrastructure and CI/CD platforms supporting large-scale data processing systems. The position combines hands-on engineering with strategic platform development to ensure scalable, secure, and reliable production environments. You will design, automate, and maintain infrastructure and observability solutions across distributed systems.

🗂️ Requirements: 5+ years in DevOps, SRE, or infrastructure engineering, Experience supporting production systems in distributed environments, Hands-on experience with public cloud platforms (AWS or similar), Strong knowledge of Docker and Kubernetes, Experience with infrastructure as code tools (Terraform), Strong scripting/programming skills (Python or Bash), Experience building and maintaining CI/CD pipelines, Knowledge of observability, monitoring, and logging tools, Strong troubleshooting and incident response skills in Linux environments

📃 Skills: AWS, Docker, Kubernetes, Terraform, Python, Bash, Linux, CI/CD, Observability, Automation, Infrastructure, Cloud

🏢 Description:

Senior DevOps Engineer

We are looking for an experienced engineer to take ownership of our infrastructure and platform ecosystem, supporting large-scale data processing systems and enabling efficient, reliable software delivery. This role combines hands-on engineering with strategic impact. You will design, build, and evolve the platform that underpins data pipelines and production services, ensuring scalability, security, and operational excellence across environments.

Key Responsibilities

Own and evolve CI/CD and automation platforms to support fast and reliable delivery of data-driven applications
Design and manage cloud-native infrastructure supporting high-volume data ingestion, processing, and serving
Build and maintain infrastructure as code to ensure consistency and scalability across environments
Manage containerized environments and orchestration platforms to deliver resilient and scalable services
Implement observability solutions (monitoring, logging, alerting) to ensure full system visibility and reliability
Automate deployment processes, configuration management, and system recovery workflows
Collaborate with engineering, data, and compliance teams to deliver secure and production-ready solutions
Drive incident management practices and continuous improvement initiatives
Contribute to platform strategy, tooling decisions, and mentoring within the team

Requirements

5+ years of experience in DevOps, SRE, or infrastructure engineering roles
Strong experience supporting production systems in distributed environments
Hands-on experience with public cloud platforms (AWS or similar)
Solid knowledge of containerization and orchestration technologies (Docker, Kubernetes)
Experience with infrastructure as code tools (e.g., Terraform)
Strong scripting/programming skills (Python, Bash, or similar)
Experience building and maintaining CI/CD pipelines and automation tooling
Knowledge of observability practices and tools
Strong troubleshooting and incident response skills in Linux environments
Excellent communication skills and ability to work cross-functionally

Nice to Have

Experience working with large-scale data platforms
Exposure to regulated environments or compliance requirements
Experience contributing to platform or engineering standards

Link Group

Link Group is a company operating in the technology industry, focusing on the development of high-quality web applications. The company emphasizes creating intuitive user interfaces and robust backend services, particularly using technologies like React JS and Kotlin. Link Group values collaboration, as evidenced by its emphasis on teamwork with architects, designers, and cross-functional teams. The company is committed to improving user experience and interface design through iterative development and feedback, highlighting its dedication to innovation and quality in its products.
