June 8, 2026
DevOps Engineer (Observability)
Senior • Hybrid
130 - 145 PLN
Warsaw, Poland
The Opportunity
Join a high-performing, international team of six DevOps experts. This is not a "maintenance-only" role. You will have a seat at the table in designing, building, and scaling our next-generation observability and logging solutions from the ground up.
We believe in "Attitude First." If you are an ambitious engineer who thrives on collaboration, knowledge sharing, and solving complex distributed systems challenges, we want to grow with you.
Key Responsibilities
Architect & Build: Design and implement end-to-end observability solutions, including metrics, logging, tracing, and advanced alerting.
Platform Excellence: Operate and optimize high-scale monitoring platforms (Prometheus, Mimir, Grafana) and ELK stack logging infrastructure.
Infrastructure as Code: Define and maintain all observability systems using Terraform and Terragrunt.
Reliability Engineering: Ensure the scalability and performance of our systems while supporting incident detection and root cause analysis (RCA).
Collaborate: Work across domains with a team that values mentoring, transparency, and collective problem-solving.
Your Technical Core
Observability Expert: Solid hands-on experience with Prometheus, Grafana, and scaling tools like Thanos or Mimir.
Logging Architect: Proven experience managing enterprise-grade logging platforms (ELK stack or Loki).
IaC Ninja: Strong proficiency in Terraform/Terragrunt to manage infrastructure.
Cloud Native: Deep understanding of Kubernetes and the complexities of metrics/logs/traces in distributed systems.
Language: Full proficiency in English for seamless global collaboration.
Stand Out From The Crowd (Nice to Have)
Coding: Ability to automate and integrate using Python or Go.
CI/CD: Exposure to GitHub Actions and automated workflows.
Configuration Management: Experience with Puppet.
SRE Mindset: Understanding of Service Level Indicators (SLIs), Objectives (SLOs), and Error Budgets.
Similar jobs you might like
Technology
emagine Polska
Site Reliability Engineer
Senior
Remote
Lisbon, Portugal
🏢 Summary: Hands-on Observability Engineer role focused on building and automating enterprise-grade monitoring and observability solutions across AWS-based cloud and distributed systems. The position centers on developing infrastructure as code, CI/CD pipelines, and monitoring ecosystems to improve reliability, performance, and incident response. Approximately 90% of the role involves coding in Python and Terraform. 🗂️ Requirements: Strong hands-on experience with AWS, Strong Python development and scripting experience, Strong experience with Terraform, Experience building and maintaining CI/CD pipelines using Jenkins, Experience with Elasticsearch and ELK Stack, Experience with Linux systems, Shell scripting skills, Understanding of monitoring, logging, and alerting concepts, Experience working in Agile or DevOps environments 📃 Skills: AWS, Python, Terraform, Jenkins, Elasticsearch, ELK, Linux, Bash, CI/CD, Kubernetes, Grafana, Prometheus, Datadog, NewRelic, Snowflake, Databricks, dbt, Matillion
🏢 Description:
Role Overview
We are looking for a skilled and proactive Observability Engineer to implement, automate, and support enterprise-grade observability and monitoring solutions across cloud and application platforms. The ideal candidate should have strong AWS infrastructure knowledge, hands-on automation skills, and experience building reliable monitoring and alerting ecosystems for modern distributed applications. The role involves working closely with Platform Engineering, Data Engineering, and Application teams to develop observability solutions and improve operational visibility, reliability, incident detection, and platform performance.
Main Responsibilities
· Design, implement, and maintain observability solutions for cloud-native and distributed systems.
· Build monitoring, logging, alerting, and dashboarding solutions across infrastructure and applications.
· Develop automation scripts and tooling using Python.
· Implement and maintain Infrastructure as Code (IaC) using Terraform.
· Build and support CI/CD pipelines using Jenkins and Git-based workflows.
· Configure and optimize monitoring for AWS services, Kubernetes workloads, APIs, databases, and applications.
· Create actionable alerts and operational dashboards to improve incident response and system reliability.
· Work with engineering teams to onboard applications into observability platforms.
· Support troubleshooting, root cause analysis, and performance optimization initiatives.
· Ensure observability standards, governance, and best practices are followed across projects.
Key Requirements
· Strong hands-on experience with Amazon Web Services (AWS).
· Solid Python development/scripting experience.
· Strong experience with Terraform.
· Experience building and maintaining CI/CD pipelines using Jenkins.
· Elasticsearch / ELK Stack experience, including building queries.
· Experience monitoring data platforms is preferred.
· Experience with Linux systems and shell scripting.
· Understanding of monitoring, logging, and alerting concepts.
· Experience working in Agile/DevOps environments.
Nice to Have Skills
Experience with any of the following is highly desirable:
· Snowflake
· Databricks
· dbt
· Matillion
· Grafana
· New Relic
· Datadog
· Prometheus
· Elasticsearch / ELK Stack experience
NOTES: We are looking for an engineer who loves to build. This is a highly technical role: 90% of the job is hands-on coding in Python and Terraform.
Technology
emagine Polska
Observability Specialist
Senior
Hybrid
Warsaw, Poland
🏢 Summary: The offer is for an Observability Specialist responsible for designing, implementing, and maintaining a scalable telemetry and monitoring infrastructure in cloud-native environments. The role focuses on Kubernetes observability, Elastic Stack management, and performance optimization using modern telemetry standards. It involves driving SRE practices and ensuring high system reliability through advanced monitoring and AIOps solutions. 🗂️ Requirements: Experience monitoring Kubernetes (OpenShift) environments, Hands-on implementation of OpenTelemetry for logs, traces, and metrics, Strong expertise in ELK stack deployment and maintenance, Proficiency in automating Elastic environments using Ansible, Experience with Application Performance Monitoring for code-level analysis, Knowledge of shard optimization, mapping, and Index Lifecycle Management, Experience defining and monitoring SLOs and managing Error Budgets, Integration of observability solutions with major cloud providers 📃 Skills: Kubernetes, OpenShift, OpenTelemetry, Elasticsearch, Logstash, Kibana, Ansible, ElasticAPM, AIOps, SRE, ILM, Sharding, Mapping, Cloud
🏢 Description:
Introduction & Summary
We are seeking an experienced Observability Specialist dedicated to ensuring the reliability and performance of our systems. This role involves collaborating with enterprise architects and IT professionals to design, implement, and oversee a scalable telemetry infrastructure. The ideal candidate will possess deep expertise in ELK or similar technologies and modern telemetry standards.
Main Responsibilities
As our Observability Engineer, your core duties will include:
Architectural Collaboration: Partner with system architects and local engineering teams in Denmark to design resilient monitoring solutions.
Monitor Kubernetes environments with OpenTelemetry (OTel) standards for logs, traces, and metrics.
Manage centralized data collection and automate Elastic deployments using Ansible.
Utilize Elastic APM for identifying code-level bottlenecks and resolving latency issues.
Implement AIOps configurations for proactive anomaly detection and automated root-cause analysis.
Drive Site Reliability Engineering (SRE) methodologies across teams.
Elastic Stack Management: Deploy, scale, and maintain Elasticsearch, Logstash, and Kibana (ELK) environments.
Key Requirements
Cloud-Native Observability: Strong skills in monitoring Kubernetes (OpenShift) environments and integrating with major cloud providers.
APM & Distributed Tracing: Expertise in Application Performance Monitoring (APM) to identify code-level bottlenecks and latency issues.
OpenTelemetry (OTel): Hands-on experience implementing OpenTelemetry (or similar) standards for logs, traces, and metrics to ensure vendor-neutral telemetry.
Infrastructure as Code (IaC): Proficiency in automating Elastic environments with Ansible.
Performance Engineering: Expert-level knowledge of shard optimization, mapping, and Index Lifecycle Management (ILM) to balance high performance with cost control.
SRE Methodology: Experience defining and monitoring Service Level Objectives (SLOs) and managing Error Budgets.
Strong communication skills for collaboration with IT teams.
Nice to Have:
Elastic Stack Mastery: Deep expertise in architecting and managing Elasticsearch, Logstash, and Kibana (ELK) at scale.
Data Ingestion & Fleet: Proven experience deploying Elastic Agent and Fleet for centralized agent management and data collection.
AIOps & Machine Learning: Ability to configure Elastic ML models for proactive anomaly detection and automated root cause analysis.
Other Details
This position is based in Warsaw with a flexible hybrid model and focuses on leading-edge observability solutions in a dynamic and collaborative environment.
Technology
Link Group
Network Observability & Automation Engineer
Senior
Hybrid
Warsaw, Poland
37,000 - 47,000 PLN
🏢 Summary: Engineering role focused on building and scaling observability and automation solutions for a global, high-performance trading network. The position involves designing telemetry, monitoring, and alerting systems while driving automation to improve reliability and operational efficiency. You will troubleshoot complex network environments and develop infrastructure tooling using Python and Infrastructure-as-Code practices. 🗂️ Requirements: Hands-on experience with network observability and automation, Practical experience with Prometheus, Grafana, Telegraf, Experience with gNMI, YANG, CiscoTelemetry, Experience with Zabbix, LibreNMS, SNMP environments, Strong Python scripting skills, Experience with Ansible, Jinja, Git, Experience with Terraform and Infrastructure-as-Code, Familiarity with NetBox, NETCONF, RESTCONF, Experience with AWS cloud networking, Strong knowledge of BGP, OSPF, PIM, IGMP, Proven troubleshooting skills in large-scale network environments 📃 Skills: Python, Ansible, Jinja, Git, Terraform, NetBox, NETCONF, RESTCONF, Prometheus, Grafana, Telegraf, Zabbix, LibreNMS, SNMP, gNMI, YANG, CiscoTelemetry, AWS, BGP, OSPF, PIM, IGMP, Telemetry, Observability, Automation 🏢 Description: Join a team building and evolving observability and automation capabilities for a global, high-performance multi-vendor trading network. This role is ideal for an engineer passionate about scalable monitoring, telemetry, automation, and operational excellence in complex network environments. 
What you’ll do:
Build and enhance network observability platforms and automation frameworks
Design scalable telemetry, monitoring, and alerting solutions across a global infrastructure
Drive automation initiatives to improve network reliability, visibility, and operational efficiency
Develop tooling and infrastructure integrations using Python and Infrastructure-as-Code principles
Collaborate with technology and investment teams to deliver resilient business-critical solutions
Troubleshoot complex network and observability issues in large-scale environments
What we’re looking for:
Strong hands-on experience with network observability and automation
Practical expertise with modern observability stack: Prometheus, Grafana, Telegraf, telemetry technologies (gNMI, YANG, Cisco Telemetry)
Experience with monitoring platforms such as Zabbix, LibreNMS, and SNMP-based environments
Strong scripting and automation skills using Python, Ansible, Jinja, and Git
Familiarity with automation and infrastructure tooling including Terraform, NetBox, NETCONF, and RESTCONF
Experience supporting cloud networking environments (AWS preferred)
Good understanding of networking fundamentals and protocols such as BGP, OSPF, PIM, and IGMP
Proven troubleshooting skills within complex, large-scale infrastructure environments
Who you are:
Passionate about observability, automation, and scalable infrastructure
Proactive, self-driven, and eager to continuously improve systems and processes
Comfortable operating in fast-paced, mission-critical environments
Strong communicator and collaborative problem solver
Technology
Spyrosoft
DevOps Engineer (Senior)
Senior
Remote
Krakow, Poland
110 - 200 PLN
🏢 Summary: The offer is for a Cloud Infrastructure Specialist responsible for ensuring production stability, secure cloud networking, and scalable infrastructure within AWS and Azure environments. The role focuses on hands-on infrastructure as code, observability, and CI/CD optimization while closely collaborating with a development team. It emphasizes autonomy and direct impact on production reliability rather than ticket-based support. 🗂️ Requirements: Proven experience maintaining production-grade environments, Hands-on experience with AWS services (Lambda, API Gateway, DynamoDB, RDS, S3, SNS, SQS, EC2, ECS, WAF, VPC, Route53, ALB/NLB, Cognito, IAM), Hands-on experience with Azure services, Strong practical experience with cloud networking (VPC/VNet, subnetting, routing, peering, NAT, security groups, firewalls), Hands-on experience with Datadog for monitoring, logging, alerting, In-depth commercial experience with AWS and Azure, Proficiency with Terraform or AWS CDK, Experience building and maintaining CI/CD pipelines using GitLab CI/CD, Ability to support automated deployments and infrastructure changes 📃 Skills: AWS, Azure, Terraform, CDK, TypeScript, Datadog, Prometheus, Grafana, Loki, Kubernetes, AKS, Lambda, APIGateway, DynamoDB, S3, SNS, SQS, EC2, ECS, WAF, VPC, Route53, ALB, NLB, Cognito, IAM, RDS, Redshift, BlobStorage, PostgreSQL, GitLabCI, AzureDevOps, RabbitMQ, InfluxDB 🏢 Description: You will join a substantial project as a key infrastructure specialist. You won't be managing a ticket queue; instead, you will partner directly with a mid-sized team of developers (~15 people) to ensure system stability and scalability. We are looking for someone who acts independently and is ready to ensure production reliability and secure cloud network configuration. 
Our Tech Stack
You are not expected to know everything upfront, but this is the environment you will work with:
IaC: Terraform, AWS CDK (TypeScript)
Clouds: AWS & Azure
Observability: Datadog, Prometheus, Grafana, Loki
Core Azure: Kubernetes (AKS), Blob Storage, PostgreSQL
Core AWS: Lambda, API Gateway, DynamoDB, S3, SNS, SQS, EC2, WAF, VPC
CI/CD: GitLab CI, Azure DevOps
Other: RabbitMQ, InfluxDB, Renovate
Requirements:
Production Experience: proven track record of configuring and maintaining production-grade environments.
Hands-on experience with AWS (Lambda, API Gateway, DynamoDB, Redshift, RDS, S3, SNS, SQS, EC2, ECS, WAF, VPC, Route53, ALB/NLB, Cognito, IAM).
Hands-on experience with Azure.
Observability & monitoring: hands-on experience with Datadog for monitoring, logging, alerting and performance analysis in production environments.
Cloud Networking: strong practical experience in configuring Cloud Networks (VPC/VNet, Subnetting, Routing, Peering, NAT Gateways, Security Groups/Firewalls).
Cloud Expertise: in-depth knowledge and commercial experience with Azure and AWS.
Tooling: proficiency with Terraform or AWS CDK.
Practical experience in building and maintaining CI/CD pipelines using GitLab CI/CD, supporting automated deployments and infrastructure changes.
High autonomy and ability to communicate technical concepts to a cross-functional team.
Nice to have:
Experience with TypeScript, especially in AWS CDK or serverless applications.
Main responsibilities:
Production Stability: Maintain high availability and security of production environments.
Cloud Networking: Configure and manage VPCs/VNets, subnets, routing tables, peering, and network isolation.
Infrastructure as Code: Provision and manage resources using Terraform or AWS CDK.
Developer Support: Optimize CI/CD pipelines and assist developers in understanding infrastructure constraints.
Observability: Maintain monitoring stacks to ensure full system visibility.
Technology
Andersen
DevOps Engineer (AWS)
Senior
Remote
Warsaw, MZ, Poland
3,400 - 5,200 EUR
🏢 Summary: The offer is for a DevOps Engineer (AWS) to design, build, and operate a centralized observability and monitoring platform in AWS, migrating capabilities to EKS with a Grafana-based stack. The role focuses on Infrastructure as Code, GitOps practices, CI/CD automation, and ensuring scalable, reliable, and cost-efficient cloud infrastructure. It involves hands-on work with AWS, Kubernetes, Terraform, and production-grade monitoring solutions. 🗂️ Requirements: 4+ years of experience as DevOps Engineer or similar role, Strong hands-on experience with AWS (VPC, networking, load balancers, IAM, security), Solid experience with Kubernetes (EKS) and cluster operations, Strong Terraform expertise (modules, remote state, environment separation), Hands-on experience with Argo CD and Helm, Experience with CI/CD pipelines (GitHub Actions or GitLab CI), Experience with Prometheus and Grafana, Solid Linux knowledge, Hands-on experience with Docker, Understanding of cloud networking and CIDR planning, Experience troubleshooting production incidents and root cause analysis, Upper-Intermediate English or higher 📃 Skills: AWS, EKS, Kubernetes, Terraform, ArgoCD, Helm, GitHubActions, GitLabCI, Prometheus, Grafana, Linux, Docker, VPC, IAM, CIDR 🏢 Description: Andersen is hiring a DevOps Engineer (AWS) in the EU for a project building scalable observability solutions and modern monitoring infrastructure in AWS. The customer is a next-generation technology company delivering innovative digital solutions for the online gaming industry, including gaming platforms, CRM systems, payment solutions, and content aggregation. The company focuses on building modular, high-performance ecosystems for regulated markets, combining advanced technologies with deep domain expertise to support scalable global growth. The project is focused on refactoring and centralizing the observability platform by migrating observability capabilities to AWS EKS with a Grafana-based stack. 
It involves designing and implementing the target architecture using Infrastructure as Code and a GitOps approach to ensure scalable, maintainable, and reliable monitoring solutions.
Responsibilities:
Designing, building, and operating AWS-based cloud infrastructure in hybrid environments.
Deploying, maintaining, and troubleshooting Kubernetes (EKS) clusters and application workloads.
Ensuring infrastructure reliability, security, and cost efficiency.
Managing infrastructure using Terraform with a modular structure and remote state.
Implementing and supporting GitOps workflows using Argo CD and Helm.
Building and maintaining CI/CD pipelines for infrastructure and applications.
Investigating and resolving production incidents across cloud, Kubernetes, and CI/CD layers.
Implementing autoscaling strategies at pod and cluster levels.
Building and supporting monitoring and logging solutions.
Collaborating with platform, SRE, and architecture teams.
Must-haves:
Proven experience working as a DevOps Engineer, DevOps Architect, or similar role for 4+ years.
Strong hands-on experience with AWS, including VPC, networking, load balancers, IAM, and security fundamentals.
Solid experience with Kubernetes (EKS), including cluster operations and production troubleshooting.
Strong Terraform expertise, including modules, remote state, and environment separation.
Hands-on experience with Argo CD and Helm.
Experience with CI/CD pipelines using GitHub Actions or GitLab CI.
Experience with monitoring and observability tools such as Prometheus and Grafana.
Solid Linux fundamentals and hands-on experience with containers (Docker).
Understanding of cloud networking concepts, including CIDR planning.
Experience troubleshooting production incidents and performing root cause analysis.
Cost-awareness and experience optimizing cloud infrastructure usage.
Strong communication skills and ownership mindset.
Level of English: Intermediate+ or above.
Nice-to-haves:
Bachelor's degree in Computer Science, Engineering, or a related field.
Experience with Jenkins.
Experience with autoscaling technologies such as Karpenter or KEDA.
Experience with log aggregation tools such as Loki or ELK stack.
Experience working in enterprise or regulated environments.
Reasons why this job would be interesting to you:
Experience in teamwork with leaders in FinTech, Healthcare, Retail, Telecom, and others. Andersen cooperates with such businesses as Samsung, Siemens, Johnson & Johnson, BNP Paribas, Ryanair, Mercedes, TUI, Verivox, Allianz, T-Systems, etc.
The opportunity to change the project and/or develop expertise in an interesting business domain.
Job conditions: you can work fully remotely, from the office, or in a hybrid arrangement.
Guarantee of professional, financial, and career growth! The company has introduced systems of mentoring and adaptation for each new employee.
The opportunity to earn up to an additional 1,000 USD per month, depending on the level of expertise, which will be included in the annual bonus, by participating in the company's activities.
Access to the corporate training portal, where the entire knowledge base of the company is collected and which is constantly updated.
Bright corporate life (parties / pizza days / PlayStation / fruits / coffee / snacks / movies).
Certification compensation (AWS, PMP, etc.).
Referral program.
Private health insurance and compensation for sports activities.
Your personal data is protected in accordance with GDPR regulations. Learn more: https://andersenlab.com/privacy-policy/pl
Join us! https://people.andersenlab.com/
Technology
N-iX
Middle DevOps Engineer (#5068)
Mid
Remote
Krakow, Poland
5,000 - 5,500 USD
🏢 Summary: DevOps Engineer role focused on building, scaling, and securing cloud infrastructure while enabling efficient CI/CD workflows. The position involves managing Kubernetes-based environments on AWS, optimizing automation, and ensuring high availability and performance of systems. The role also includes infrastructure as code, monitoring, database management, and secure authentication integration. 🗂️ Requirements: BA/BS in technical field or equivalent experience, 5+ years in DevOps, SRE, or Infrastructure Engineering, Strong experience with Kubernetes, Deep knowledge of AWS core services, Experience with containerization technologies, Strong understanding of networking concepts, Proficiency with infrastructure-as-code tools, Experience with monitoring tools, Strong scripting or programming skills, Solid understanding of system security best practices 📃 Skills: Kubernetes, AWS, EC2, S3, IAM, RDS, Docker, Helm, Terraform, Pulumi, CloudFormation, Prometheus, Grafana, CloudWatch, Python, Bash, Go, PostgreSQL, SAML, OAuth2, OIDC, ELK, Loki, FluentBit, GitHubActions, Jenkins, CircleCI, Ansible, EKS, GKE
🏢 Description: We are looking for a skilled and driven DevOps Engineer to join our growing team. In this role, you will take ownership of building, maintaining, and scaling the infrastructure that powers our platform. You will ensure our systems are secure, performant, and highly available, while enabling seamless development and deployment workflows.
Responsibilities:
Design, implement, and manage scalable infrastructure using Kubernetes and AWS.
Optimize CI/CD pipelines to improve build and deployment times and reduce friction.
Monitor and troubleshoot infrastructure performance and availability.
Manage and maintain relational databases, primarily PostgreSQL.
Implement and support secure authentication systems using SSO protocols (e.g., SAML, OIDC, OAuth2).
Enhance infrastructure as code using tools like Terraform and Ansible.
Ensure security best practices are applied across all infrastructure components.
Collaborate cross-functionally with development, QA, and product teams.
Drive automation of operational tasks to increase team efficiency and reduce manual toil.
Required Skills:
BA/BS in a technical or engineering discipline or equivalent experience.
5+ years of experience in a DevOps, SRE, or Infrastructure Engineering role.
Strong experience with Kubernetes (EKS, GKE, or self-managed).
Deep knowledge of AWS core services (EC2, S3, IAM, RDS, etc.).
Knowledge of containerization technologies (e.g., Docker, Kubernetes, Helm).
Solid understanding of networking concepts (VPCs, subnets, routing, firewalls, DNS).
Proficiency with infrastructure-as-code tools (Terraform, Pulumi, or CloudFormation).
Comfortable with monitoring tools (Prometheus, Grafana, CloudWatch, etc.).
Strong scripting or programming ability (Python, Bash, or Go).
Solid understanding of system security and best practices.
Preferred Skills:
Familiarity with SSO protocols such as SAML, OAuth2, and OpenID Connect.
Experience managing and tuning PostgreSQL in production environments.
Exposure to log aggregation tools (ELK, Loki, or Fluent Bit).
Experience with CI/CD tools like GitHub Actions, Jenkins, or CircleCI.
We offer:
Flexible working format - remote, office-based or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team buildings
Other location-specific benefits
Technology
Spyrosoft
Senior Platform Optimization & Observability Engineer
Senior
Remote
Wroclaw, Poland
150 - 200 PLN
🏢 Summary: The offer is for a technical role responsible for owning platform health, optimization, and the observability stack within a complex enterprise environment. The position focuses on improving virtualization and storage performance, enhancing security posture, optimizing disaster recovery, and migrating monitoring and security capabilities to a new ELK-based observability platform. The role combines deep operational work with migration and optimization initiatives across infrastructure and monitoring systems. 🗂️ Requirements: Hands-on experience optimizing virtualization platforms, Strong storage performance and capacity optimization skills, Experience with platform security hardening, Deep operational experience with ELK stack, Experience migrating dashboards, queries, and reports from Log Analytics, Strong understanding of disaster recovery optimization and recovery metrics 📃 Skills: ELK, APM, VMware, Hyper-V, KVM, Proxmox, LogAnalytics, Monitoring, Alerting, Virtualization, Storage, Security, DisasterRecovery, Azure
🏢 Description:
Tech stack:
ELK stack (observability, APM, security)
VMware, Hyper-V, KVM / Proxmox
Log Analytics (migration source)
Monitoring and alerting platforms
Requirements:
Hands-on experience optimizing virtualization platforms
Strong storage performance and capacity optimization skills
Experience with platform security hardening
Deep operational experience with ELK stack
Experience migrating dashboards, queries, and reports from Log Analytics
Strong understanding of DR optimization and recovery metrics
Nice to have:
Compliance scanning tools (CIS)
SOC 1 / SOC 2 / C5 familiarity
Sentinel rule migration experience
Experience using AI tools in day-to-day workflow
Project description:
You will own platform health, optimization, and the observability stack in a complex enterprise environment. The project focuses on improving platform performance, security posture, DR effectiveness, and migrating monitoring and security capabilities to a new observability platform.
Main responsibilities:
Optimize virtualization and storage platforms
Expand observability with APM and security capabilities
Migrate monitoring and security assets from Azure tooling
Optimize logging, alerting, and retention strategies
Review and improve DR and firewall configurations
Collaborate with network and security engineers
Technology
ITDS
Senior SRE/DevOps Technical Lead – Observability and Automation
Senior
Hybrid
Krakow, Poland
25,200 - 29,820 PLN
🏢 Summary: Senior SRE/DevOps Technical Lead role focused on building and operating advanced SRE and observability platforms in a regulated banking environment. The position drives automation, reliability, and performance of highly available systems while leading CI/CD and monitoring initiatives. It combines hands-on technical expertise with leadership of international engineering teams. 🗂️ Requirements: 8+ years of experience in SRE, DevOps, or similar roles, Strong automation and scripting experience, Proficiency in CI/CD pipelines, Experience with observability and monitoring tools, Experience maintaining highly available, low-latency systems, Experience in regulated industries (banking, fintech, insurance), Ability to work from Krakow office at least 6 days per month, Fluent English, Legal right to work in Europe 📃 Skills: Python, Go, Bash, CI/CD, AppDynamics, Grafana, Splunk, OpenTelemetry, SRE, DevOps
🏢 Description: Empower uptime and reliability: lead the next wave of observability and automation excellence!
Krakow-based opportunity with hybrid work model, allowing up to 3 remote days per week.
As a Senior SRE/DevOps Technical Lead, you will be working for our client, a global leader in the banking and financial services industry. You will spearhead efforts to build and operate cutting-edge SRE and observability platform solutions, ensuring system reliability, automation, and performance across a highly regulated environment. This role offers a pivotal leadership position that drives innovation and engineering ownership within a diverse international team.
Your main responsibilities:
Lead and develop the delivery capability for SRE/observability platform solutions, fostering excellence in automation, reliability, and monitoring.
Build and maintain highly available, low-latency systems aligned with banking industry standards.
Drive automation and scripting initiatives utilizing Python, Go, Bash, and other technologies.
Manage and optimize CI/CD pipelines and observability/monitoring stacks such as AppDynamics, Grafana, Splunk, and OpenTelemetry.
Ensure optimal system performance and reliability across global operations.
Collaborate effectively with international teams across different time zones.
Provide technical leadership, mentorship, and guidance to team members.
You're ideal for this role if you have:
8+ years of experience in SRE, DevOps, or similar leadership roles.
Strong automation and scripting skills (Python, Go, Bash, etc.).
Proficiency in CI/CD pipelines and observability/monitoring tools.
Proven experience maintaining highly available, low-latency systems in regulated industries such as banking, fintech, or insurance.
Ability to work in the Krakow office at least 6 days per month.
Fluent English communication skills for global team collaboration.
It is a strong plus if you have:
(optional) Certifications related to DevOps, SRE, or cloud platforms.
Language required for the role: Fluent English
Eligibility for the role: Only candidates with an existing legal right to work in Europe will be considered for this role.
#MAKEYourCareerBETTER
Interested? Apply now and include your CV (preferably in English) along with a statement confirming your consent to the processing and storage of your personal data.
Technology
Link Group
Senior Azure DevOps Engineer
Senior
Remote
Bialystok, Poland
140 - 155 PLN
🏢 Summary: Design, deploy, and maintain high-availability Azure cloud environments with a strong focus on AKS and Infrastructure as Code using Terraform. The role centers on secure, scalable, and well-monitored Azure infrastructure, including networking, identity, databases, and disaster recovery. You will drive automation and operational excellence across the Azure ecosystem. 🗂️ Requirements: Experience managing Azure Kubernetes Service (AKS) clusters, Proficiency in Infrastructure as Code using Terraform, Experience with YAML and Helm for Kubernetes deployments, Administration of Azure networking components (VNETs, NSGs), Management of Azure VMs, Storage Accounts, and ACR, Implementation of Azure AD (Entra ID) and IAM policies, Administration of Azure SQL and SQL Server environments, Configuration of monitoring with Azure Monitor, App Insights, and Log Analytics, Design and implementation of disaster recovery and backup strategies 📃 Skills: Azure, AKS, Kubernetes, Terraform, YAML, Helm, VNET, NSG, ACR, AzureAD, IAM, AzureSQL, SQLServer, AzureMonitor, AppInsights, LogAnalytics, Velero 🏢 Description: Role Overview We are looking for a highly skilled Azure Cloud & Platform Engineer to join our infrastructure team. In this role, you will be responsible for designing, deploying, and maintaining high-availability cloud environments with a heavy focus on container orchestration ( AKS ) and Infrastructure as Code ( Terraform ). You will ensure that our Azure ecosystem is secure, scalable, and monitored to the highest standards. Key Responsibilities Kubernetes Orchestration: Manage and optimize Azure Kubernetes Services (AKS) , including cluster configuration, scaling, and lifecycle management. Infrastructure as Code (IaC): Develop and maintain automated infrastructure deployments using Terraform , YAML , and Helm charts. Cloud Administration: Oversee core Azure resources including Networking (VNETs, NSGs), Storage Accounts, Azure VMs, and Container Registries (ACR). 
Security & Identity: Implement and manage Azure Active Directory (Azure AD/Entra ID) and Identity & Access Management (IAM) policies to ensure a "least privilege" environment.
Database Management: Administer Azure SQL environments, including SQL Server, individual databases, and Elastic Pools.
Observability & Monitoring: Set up and maintain robust monitoring solutions using Azure Monitor, App Insights, and Log Analytics.
Disaster Recovery: Design and implement Disaster Recovery (DR) mechanisms and backup strategies (e.g., using Velero).
Technical Documentation: Create and maintain comprehensive documentation for system configurations, architecture setups, and operational procedures.
Preferred Skills
Experience with Velero for Kubernetes backups.
Knowledge of the ELK Stack (Elasticsearch, Logstash, Kibana).
Experience with Open Source monitoring tools: Prometheus, Grafana, and Loki.
Familiarity with Ansible for configuration management.
Exposure to Apache Kafka messaging systems.
Candidate Profile
The ideal candidate is a proactive engineer who prioritizes automation over manual intervention. You should be comfortable working in a fast-paced environment, taking ownership of cloud resources, and ensuring that all solutions are documented and resilient. Your approach should combine technical depth in Azure with a broader understanding of DevOps best practices.
Technology
Caspian One
Site Reliability Engineer
Senior
Hybrid
Krakow, Poland
1,400 - 1,800 PLN
🏢 Summary: Hands-on Site Reliability Engineer role focused on ensuring stability, scalability, and observability of a mission-critical distributed risk and analytics platform in hybrid cloud environments. The position centers on production reliability, incident response, automation, and continuous improvement of monitoring and deployment processes. You will collaborate with engineering teams to strengthen system resilience, performance, and operational standards.
🗂️ Requirements:
Strong Java experience in distributed systems
Experience with observability and monitoring tools
Hands-on experience with hybrid cloud environments (preferably GCP)
Experience with CI/CD pipelines and automation tools
Solid knowledge of Linux systems administration
Understanding of RDBMS fundamentals
Experience with job schedulers (e.g., Control-M)
Ability to lead incident response and root-cause analysis
📃 Skills: Java, Grafana, Prometheus, Loki, OpenTelemetry, GCP, Jenkins, Ansible, Linux, SQL, Control-M, CI/CD
🏢 Description:
We’re looking for a seasoned Site Reliability Engineer to support a high-performance, mission-critical risk and analytics platform used across global trading and finance environments. You’ll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure.
In this role, you’ll take ownership of production reliability: driving incident response, conducting root-cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You’ll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance.
This is a hands-on technical position suited to someone who thrives in high-throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time.
Core Responsibilities
Maintain and improve the reliability, uptime, and performance of distributed applications.
Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post-incident reviews.
Enhance observability by designing and evolving monitoring, alerting, logging, and tracing frameworks.
Drive continuous improvement across automation, deployment processes, and service stability.
Collaborate with cross-functional teams to influence architecture, design, and operational standards.
Support CI/CD pipelines, environment configuration, and vulnerability remediation.
Contribute to a knowledge-driven culture through documentation, tooling, and best-practice adoption.
Required Skills & Experience
Strong Java background with proven experience supporting or developing distributed systems.
Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry or similar).
Hands-on with hybrid cloud environments, ideally with GCP or another major cloud provider.
CI/CD and automation experience (e.g., Jenkins, Ansible).
Solid understanding of Linux, RDBMS fundamentals, and job schedulers (e.g., Control-M or equivalents).
Strong analytical mindset with a methodical approach to troubleshooting.
Excellent communication skills and comfort working in Agile teams.