June 8, 2026

Senior IT Operations Specialist (Production Support & Release Management)

Senior • Hybrid

115 - 130 PLN

Warsaw, Poland

Responsibilities:

  • Support Release Management processes and software deployments to production environments

  • Monitor, investigate, troubleshoot and resolve issues in production systems

  • Act as L2 support for incident handling, ensuring timely resolution and escalation when needed

  • Perform incident management, root cause analysis and contribute to preventive improvements

  • Support operational stability, system reliability and service availability across distributed platforms

  • Collaborate with development, infrastructure and support teams to improve production readiness and operational excellence

  • Contribute to monitoring, alerting and observability practices to proactively identify risks and performance issues

  • Support secure communication and authentication mechanisms across systems and services

 

Requirements:

  • 5+ years of experience in IT Operations, Production Support, DevOps or Site Reliability-related roles

  • Hands-on experience with Docker and Kubernetes

  • Experience supporting CI/CD processes using Jenkins

  • Knowledge of version control platforms such as GitHub or Bitbucket

  • Experience working with Kafka in distributed environments

  • Good understanding of SQL and troubleshooting data-related issues

  • Knowledge of microservices architecture and distributed systems support

  • Experience with authentication and authorization technologies (LDAP, Kerberos, JWT)

  • Understanding of cryptographic tools and certificate handling (OpenSSL, keytool)

  • Knowledge of secure communication protocols, including HTTPS

  • Hands-on experience with logging and observability tools, including Splunk

  • Experience with monitoring and alerting tools such as Prometheus, InfluxDB and Grafana

  • Strong analytical and troubleshooting skills, particularly in incident and problem management

  • Good interpersonal and communication skills, with ability to work effectively across teams

  • Fluency in English (both written and spoken)

Nice to have:

  • Experience with PostgreSQL administration or support

  • Familiarity with enterprise-grade production environments in regulated or large-scale organizations

  • Exposure to release automation and operational process improvements 

Offer:

  • Private medical care

  • Co-financing of a sports card

  • Ongoing support from a dedicated consultant

  • Employee referral program

Similar jobs you might like

Technology

B2Bnetwork

IT Operations Specialist

Mid

Hybrid

Warsaw, Poland

🏢 Summary: IT Operations Specialist role focused on supporting production systems, ensuring stable and secure software releases, and handling incident management in collaboration with development and release teams. The position involves monitoring, troubleshooting, and maintaining microservices-based environments using modern DevOps and containerization technologies.

🗂️ Requirements: Experience with Docker, Experience with Kubernetes, Experience with CI/CD tools (Jenkins), Experience with GitHub or Bitbucket, Hands-on experience with Kafka, Basic knowledge of SQL, Knowledge of LDAP, Kerberos, JWT, Knowledge of OpenSSL and keytool, Understanding of HTTPS, Experience with Splunk, Experience with Prometheus, InfluxDB, or Grafana, Understanding of microservices architecture

📃 Skills: Docker, Kubernetes, Jenkins, GitHub, Bitbucket, Kafka, SQL, LDAP, Kerberos, JWT, OpenSSL, keytool, HTTPS, Splunk, Prometheus, InfluxDB, Grafana, Microservices, PostgreSQL

🏢 Description: We are looking for an IT Operations Specialist to support the maintenance and delivery of systems in a production environment. The role involves close collaboration with development and release teams to ensure stable and secure system operations.

Responsibilities:
  • Supporting Release Management in delivering software to the production environment
  • Monitoring, investigating, and resolving issues in production systems
  • Handling incidents and performing root cause analysis (RCA)
  • Acting as L2 support for incident management

Requirements (must-have):
  • Experience with Docker and Kubernetes
  • Familiarity with CI/CD tools such as Jenkins
  • Experience with code repositories: GitHub or Bitbucket
  • Hands-on experience with Kafka
  • Basic understanding of SQL
  • Knowledge of authentication and authorization mechanisms (LDAP, Kerberos, JWT)
  • Understanding of cryptography tools (OpenSSL, keytool)
  • Knowledge of secure communication protocols (HTTPS)
  • Experience with logging tools (Splunk)
  • Familiarity with monitoring and alerting tools (Prometheus, InfluxDB, Grafana)
  • Understanding of microservices architecture
  • Good interpersonal and communication skills

Nice to have:
  • Experience with PostgreSQL

What we offer:
  • Opportunity to work with modern technologies
  • Involvement in large-scale, impactful projects
  • Professional growth and development opportunities
  • Collaboration within an experienced and supportive team

Technology

Link Group

Senior Cloud Developer

Senior

Hybrid

Warsaw, Poland

130 - 170 PLN

🏢 Summary: The offer is for a senior Cloud & DevOps professional responsible for designing and deploying scalable cloud-native applications and critical infrastructure. The role focuses on driving CI/CD automation, improving cloud architecture, and ensuring security and quality standards across delivery pipelines. It also involves mentoring teams and leading cloud adoption initiatives in an Agile environment.

🗂️ Requirements: Minimum 5 years experience in cloud-focused or DevOps role, Hands-on experience with Kubernetes, Docker, Terraform, Experience with at least one cloud provider (AWS/Azure/GCP/IBM), Strong knowledge of Jenkins, Ansible, ArgoCD, GitLab, Ability to design and manage CI/CD pipelines, Understanding of RESTful APIs, data quality, security protocols, Experience working with Agile methodologies (SCRUM/Kanban), Familiarity with JIRA and Confluence, Professional English communication skills

📃 Skills: Kubernetes, Docker, Terraform, AWS, Azure, GCP, IBM, Jenkins, Ansible, ArgoCD, GitLab, REST, SQL, PostgreSQL, SQLServer, Python, JavaScript, SpringBoot, Nuxt.js, Vue, JIRA, Confluence

🏢 Description:

Responsibilities:
  • Design, deploy, and assess scalable native cloud applications and critical infrastructure components.
  • Drive the evolution of CI/CD processes and build/deployment pipelines using Jenkins, Docker, Kubernetes, and GitLab.
  • Consult and collaborate with cross-functional IT teams on cloud architecture, optimization, and security standards.
  • Lead the implementation of test strategies and best practices to ensure high-quality delivery within automated pipelines.
  • Support and mentor delivery teams in adopting new cloud initiatives, ensuring alignment with company security policies.

Requirements:
  • Minimum 5 years of experience in a similar cloud-focused role with a strong problem-solving mindset.
  • Expertise in Cloud & DevOps tools: Hands-on experience with Kubernetes, Docker, Terraform, and at least one major provider (AWS/Azure/GCP/IBM).
  • Advanced CI/CD proficiency: Strong knowledge of Jenkins, Ansible, ArgoCD, and GitLab version control.
  • Deep understanding of IT Architecture: Proven ability to handle RESTful APIs, Data Quality, and Security protocols.
  • Strong Communication & Leadership: Ability to engage with stakeholders (IT, Business, Clients) in English (written and oral).
  • Agile Mindset: Familiarity with SCRUM/Kanban and tools like JIRA/Confluence.
  • Technical Versatility (Nice to have): Proficiency in Python, JavaScript and experience with frameworks like SpringBoot or Nuxt.js/VUE.
  • Data Management: Basic knowledge of SQL Server or PostgreSQL.

Technology

Creotech

DevOps

Mid

On-site

Warsaw, Poland

🏢 Summary: The offer is for a DevOps Engineer responsible for designing, automating, and maintaining CI/CD pipelines and infrastructure for flight software and related systems. The role focuses on infrastructure as code, container orchestration, observability, and operational reliability across on-premise and cloud environments. It involves supporting development teams and ensuring secure, reproducible, and resilient delivery processes in mission-critical projects.

🗂️ Requirements: Minimum 3 years in DevOps or related role, Hands-on experience with GitLab CI/CD or Jenkins, Production experience with Docker and Kubernetes, Experience with Terraform and Ansible for infrastructure automation, Knowledge of Prometheus, Grafana, Loki or OpenTelemetry, Ability to troubleshoot distributed systems performance and stability issues, Experience with Linux systems, Experience with artifact repositories such as Nexus, Ability to work with technical documentation in English

📃 Skills: GitLab, Jenkins, Docker, Kubernetes, Terraform, Ansible, Prometheus, Grafana, Loki, OpenTelemetry, Linux, Nexus, Git, CI/CD

🏢 Description:

Tasks:
  • Build and maintain CI/CD pipelines in GitLab CI/CD and Jenkins for flight software, ground systems, and simulation tools
  • Ensure build reproducibility, versioning, and artifact archiving in Git and Nexus
  • Maintain and automate test environments (including hardware-in-the-loop), on-premise, cloud, and hybrid
  • Develop and maintain Infrastructure as Code using Terraform and Ansible
  • Manage the container environment (Docker, image registries) and Kubernetes clusters
  • Configure and enhance monitoring, logging, and telemetry (Prometheus, Grafana, Loki, OpenTelemetry)
  • Collaborate with the security team on hardening, access controls, and auditability
  • Participate in incident handling, root cause analysis, and improving operational readiness
  • Support development teams by providing tools, templates, and delivery process automation

Requirements:
  • At least 3 years of experience in a DevOps or related role
  • Hands-on experience with GitLab CI/CD and/or Jenkins
  • Experience with Docker and Kubernetes in production environments
  • Experience in infrastructure automation using Terraform and Ansible
  • Knowledge of observability and monitoring solutions (Prometheus, Grafana, Loki, OpenTelemetry)
  • Ability to diagnose performance and stability issues in distributed environments
  • Experience working with Linux systems and artifact repositories (e.g., Nexus)
  • Strong teamwork and technical communication skills
  • Good command of English for working with technical documentation

Nice to have:
  • Experience in mission-critical environments (space, aerospace, telco, automotive, defense)
  • Knowledge of hardware-in-the-loop concepts and embedded systems integration
  • Experience with hybrid clusters and multi-environment deployment automation
  • Knowledge of SRE practices, incident response, and postmortem analysis
  • Experience implementing security policies in CI/CD pipelines

We offer:
  • Participation in high-responsibility technology projects
  • Real influence on the architecture of DevOps processes and engineering environments
  • Work with a modern tool stack, offering a high level of autonomy and ownership
  • Opportunities to develop expertise in critical systems and space technologies
  • Stable employment in a modern company with a well-established market position
  • A friendly, collaborative work environment and a well-coordinated team
  • An attractive salary and benefits package (including private medical care and a sports card)

Technology

TQLO SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ

Senior Platform Engineer (Kubernetes)

Senior

Hybrid

Warsaw, Poland

140 - 180 PLN

🏢 Summary: The role focuses on building, automating, and optimizing cloud infrastructure and CI/CD processes in AWS or GCP environments. You will work hands-on with Kubernetes and modern DevOps tools to ensure scalable, secure, and reliable software delivery. The position involves close collaboration with engineering teams to enhance infrastructure and deployment practices.

🗂️ Requirements: Proven experience in Infrastructure Engineering, DevOps, SRE, or Backend Development, Strong expertise in Kubernetes (cluster management and application deployment), Strong knowledge of CI/CD principles, Practical experience with CI/CD tools, Willingness to work with GitHub Actions, Experience with cloud platforms (AWS or GCP)

📃 Skills: Kubernetes, Terraform, Helm, ArgoCD, AWS, GCP, GitHubActions, CICD

🏢 Description:

Responsibilities:
  • Work hands-on with modern tools such as Kubernetes, Terraform, Helm, and ArgoCD.
  • Develop and continuously enhance internal cloud infrastructure across AWS or GCP.
  • Design, maintain, and optimize CI/CD pipelines using GitHub Actions to ensure reliable and efficient software delivery.
  • Collaborate closely with engineering teams to understand their requirements, resolve issues, and drive automation and scalability initiatives.
  • Advocate for and implement best practices in infrastructure, deployment processes, and cloud security across the organization.

Requirements:
  • Proven experience in infrastructure engineering, DevOps, Site Reliability Engineering (SRE), or backend development.
  • In-depth expertise in Kubernetes, including both cluster management and application deployment.
  • Strong understanding of CI/CD principles and practical experience with any CI/CD tools (willingness to work with GitHub Actions is essential).
  • Proactive and ownership-driven mindset with strong problem-solving abilities.
  • Excellent communication skills and a collaborative approach in cross-functional teams.

Nice to Have:
  • Experience managing OpenShift environments.
  • Familiarity with Java and Gradle build systems.
  • Awareness of security best practices in cloud-native and containerized environments.

Working mode:
  • One day per week in the office (Warsaw center)
  • Additional 20 days of holidays per year

Technology

Toro Performance Sp. z o.o.

DevOps

Mid

Remote

🏢 Summary: Fully remote DevOps role focused on streamlining development and operations through automation, CI/CD pipelines, and infrastructure as code. The position involves improving system reliability, managing containerized environments, and optimizing application and infrastructure performance. You will collaborate with development teams to implement DevOps best practices and scalable deployment solutions.

🗂️ Requirements: Experience with continuous integration and continuous delivery, Strong knowledge of containerization technologies, Experience with Infrastructure as Code (IaC), Experience with automation of processes, Experience with GitHub and GitHub Actions, Ability to monitor and troubleshoot infrastructure and applications

📃 Skills: CI/CD, Docker, Kubernetes, IaC, Automation, GitHub, GitHubActions, Grafana, Dynatrace

🏢 Description:

Location: fully remote

Tasks:
  • Collaborate with cross-functional teams to streamline development and operations processes.
  • Implement and manage continuous integration and delivery pipelines for efficient software delivery.
  • Enhance system reliability through automation of repetitive tasks and proactive identification of potential issues.
  • Work closely with development teams to integrate DevOps best practices into the software development lifecycle.
  • Monitor, troubleshoot, and optimize infrastructure and application performance.
  • Implement and manage containerization and orchestration tools for scalable and efficient deployment.
  • Collaborate on the design and implementation of infrastructure as code (IaC) solutions.

Skills:
  • Knowledge of continuous delivery and deployment.
  • Strong background in containerization technologies.
  • Experience with infrastructure as code (IaC) and automation.
  • Ability to collaborate effectively in cross-functional teams.
  • Experience in GitHub and GitHub Actions.
  • Ideally Grafana, Dynatrace.

Technology

DCG

Senior DevOps Engineer (AI & Platform Operations)

Senior

Hybrid

Warsaw, Poland

120 - 125 PLN

🏢 Summary: The role focuses on ensuring stable and reliable AI and platform operations in production environments, with ownership of incident management, monitoring, and deployment oversight. The engineer works closely with development teams to diagnose issues, improve observability, and enhance service continuity through automation and operational excellence. This position emphasizes production support, RCA, and continuous improvement rather than building infrastructure from scratch.

🗂️ Requirements: 5+ years in IT operations or production support roles, Experience owning incidents end-to-end including RCA, Minimum 2 years working within ITIL framework, Experience in Agile delivery environments, Proficiency with log analysis and alerting tools, Experience with observability and monitoring tools, Hands-on experience supporting services on Kubernetes, Experience with CI/CD pipelines and deployment troubleshooting, Experience with relational databases and query analysis, Ability to trace issues across Java-based application stacks

📃 Skills: ITIL, Splunk, Apica, Sysdig, Prometheus, Grafana, Kubernetes, Jenkins, Oracle, DB2, Spring, Hibernate, Kafka, XML, JSON, Java, J2EE, Datastage, Bash, Python, Ansible

🏢 Description: As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we undertake for our partners, we are currently looking for: Senior DevOps Engineer (AI & Platform Operations)

Responsibilities:
  • Incident & Problem Management: Own the RCA process for production incidents: diagnose, resolve, and put preventive measures in place so issues don't recur
  • Production Monitoring & Support: Continuously monitor service health, detect anomalies early, and act before they become incidents
  • Deployment Execution: Trigger and oversee release deployments through existing CI/CD pipelines; troubleshoot failed deployments and coordinate rollbacks when needed
  • Environment Oversight: Keep Pre-Production and Production environments stable and aligned (not building them from scratch, but ensuring they behave as expected day to day)
  • Runbook & Knowledge Management: Document operational procedures, known issues, and resolution steps to build a reliable knowledge base for the team
  • Cross-team Collaboration: Work shoulder-to-shoulder with development and platform teams to triage issues, clarify operational requirements, and close the feedback loop between prod and dev
  • Identify recurring pain points and propose automation or tooling to reduce toil
  • Improve observability coverage (dashboards, alerts, log queries) to catch issues faster
  • Contribute to service continuity initiatives and disaster recovery drills

Requirements:
  • 5+ years in IT operations, application support (2nd/3rd line), or a similar production-facing role
  • Proven track record of owning incidents end-to-end, from alert to RCA to prevention
  • 2+ years working within an ITIL framework (incident, problem, change management)
  • Experience working in Agile delivery environments alongside development teams
  • Excellent English communication skills: able to explain technical issues clearly to both engineers and non-technical stakeholders
  • Proficiency with log analysis and alerting tools: Splunk, Apica, Sysdig
  • Observability tooling: Prometheus, Grafana (reading dashboards, tuning alerts)
  • Comfortable operating services running on Kubernetes (checking pod health, reading logs, triggering restarts; not cluster administration)
  • Familiarity with Jenkins pipelines to execute and troubleshoot deployments
  • Relational databases (Oracle, DB2): querying, interpreting execution plans, identifying data-related incidents
  • Working knowledge of Spring/Hibernate application behavior, Kafka message flows, XML/JSON payloads, enough to trace an issue through the stack

Nice to have:
  • Java/J2EE development background (helps enormously when reading stack traces and working with dev teams)
  • IBM Datastage operational experience
  • Scripting (Bash, Python) for automation of repetitive operational tasks
  • Ansible for applying configuration changes in controlled operational scenarios

Offer:
  • Private medical care
  • Co-financing of a sports card
  • Ongoing support from a dedicated consultant
  • Employee referral program

Technology

Link Group

Senior Site Reliability Engineer

Senior

Hybrid

Warsaw, Poland

170 - 230 PLN

🏢 Summary: The role focuses on ensuring reliability, scalability, and performance of large-scale cloud-based applications by building and maintaining resilient infrastructure. You will manage AWS cloud environments, Kubernetes clusters, and CI/CD pipelines while implementing monitoring, automation, and incident response processes. The position emphasizes Infrastructure-as-Code, observability, and continuous reliability improvements.

🗂️ Requirements: 5+ years experience in SRE, DevOps or similar role, Strong experience with AWS cloud services, Experience with Infrastructure-as-Code tools, Hands-on experience with Kubernetes, Proficiency with Docker, Experience with CI/CD pipelines, Solid knowledge of PostgreSQL or Amazon RDS, Strong SQL knowledge, Knowledge of networking concepts (VPC, DNS, troubleshooting), Strong Linux/Unix administration skills, Experience with observability tools, Experience with automation in infrastructure, Experience with incident management

📃 Skills: AWS, Terraform, Pulumi, Kubernetes, EKS, Docker, GitHub, PostgreSQL, RDS, SQL, VPC, DNS, Linux, Unix, Prometheus, Grafana, Datadog, Dynatrace, CI/CD

🏢 Description: We are looking for an experienced Site Reliability Engineer to ensure the reliability, scalability, and performance of large-scale cloud-based web applications. You will work closely with software development, cloud operations, and platform teams to build and maintain resilient infrastructure and improve system stability.

Key Responsibilities:
  • Design and maintain monitoring, alerting, and incident response systems to ensure high availability
  • Collaborate closely with engineering, product, and architecture teams
  • Build and manage cloud infrastructure using Infrastructure-as-Code (e.g., Terraform, Pulumi) on AWS
  • Operate and optimize Kubernetes environments (e.g., EKS)
  • Develop and maintain containerized applications using Docker
  • Improve CI/CD pipelines and drive automation across deployment processes
  • Implement and manage observability tools (logging, metrics, tracing)
  • Participate in incident management, postmortems, and reliability improvements
  • Support capacity planning, disaster recovery, and system scaling
  • Contribute to security, compliance, and operational best practices
  • Develop automation and AI-driven solutions for monitoring and incident prevention

Requirements:
  • 5+ years of experience in SRE, DevOps, or similar roles
  • Strong experience with AWS cloud services and Infrastructure-as-Code tools
  • Hands-on experience with Kubernetes and containerized environments
  • Proficiency in Docker and CI/CD pipelines (e.g., GitHub Actions)
  • Solid understanding of databases (e.g., PostgreSQL, Amazon RDS) and SQL
  • Knowledge of networking concepts (VPC, DNS, troubleshooting tools like dig/traceroute)
  • Strong Linux/Unix administration skills
  • Experience with observability tools (e.g., Prometheus, Grafana, Datadog, Dynatrace)
  • Familiarity with automation and AI-based solutions in infrastructure
  • Strong problem-solving and incident management skills

Technology

N-iX

Middle DevOps Engineer (#5068)

Mid

Remote

Krakow, Poland

5,000 - 5,500 USD

🏢 Summary: DevOps Engineer role focused on building, scaling, and securing cloud infrastructure while enabling efficient CI/CD workflows. The position involves managing Kubernetes-based environments on AWS, optimizing automation, and ensuring high availability and performance of systems. The role also includes infrastructure as code, monitoring, database management, and secure authentication integration.

🗂️ Requirements: BA/BS in technical field or equivalent experience, 5+ years in DevOps, SRE, or Infrastructure Engineering, Strong experience with Kubernetes, Deep knowledge of AWS core services, Experience with containerization technologies, Strong understanding of networking concepts, Proficiency with infrastructure-as-code tools, Experience with monitoring tools, Strong scripting or programming skills, Solid understanding of system security best practices

📃 Skills: Kubernetes, AWS, EC2, S3, IAM, RDS, Docker, Helm, Terraform, Pulumi, CloudFormation, Prometheus, Grafana, CloudWatch, Python, Bash, Go, PostgreSQL, SAML, OAuth2, OIDC, ELK, Loki, FluentBit, GitHubActions, Jenkins, CircleCI, Ansible, EKS, GKE

🏢 Description: We are looking for a skilled and driven DevOps Engineer to join our growing team. In this role, you will take ownership of building, maintaining, and scaling the infrastructure that powers our platform. You will ensure our systems are secure, performant, and highly available, while enabling seamless development and deployment workflows.

Responsibilities:
  • Design, implement, and manage scalable infrastructure using Kubernetes and AWS.
  • Optimize CI/CD pipelines to improve build and deployment times and reduce friction.
  • Monitor and troubleshoot infrastructure performance and availability.
  • Manage and maintain relational databases, primarily PostgreSQL.
  • Implement and support secure authentication systems using SSO protocols (e.g., SAML, OIDC, OAuth2).
  • Enhance infrastructure as code using tools like Terraform and Ansible.
  • Ensure security best practices are applied across all infrastructure components.
  • Collaborate cross-functionally with development, QA, and product teams.
  • Drive automation of operational tasks to increase team efficiency and reduce manual toil.

Required Skills:
  • BA/BS in a technical or engineering discipline or equivalent experience
  • 5+ years of experience in a DevOps, SRE, or Infrastructure Engineering role.
  • Strong experience with Kubernetes (EKS, GKE, or self-managed).
  • Deep knowledge of AWS core services (EC2, S3, IAM, RDS, etc.).
  • Knowledge of containerization technologies (e.g., Docker, Kubernetes, Helm)
  • Solid understanding of networking concepts (VPCs, subnets, routing, firewalls, DNS).
  • Proficiency with infrastructure-as-code tools (Terraform, Pulumi, or CloudFormation).
  • Comfortable with monitoring tools (Prometheus, Grafana, CloudWatch, etc.).
  • Strong scripting or programming ability (Python, Bash, or Go).
  • Solid understanding of system security and best practices.

Preferred Skills:
  • Familiarity with SSO protocols such as SAML, OAuth2, and OpenID Connect.
  • Experience managing and tuning PostgreSQL in production environments.
  • Exposure to log aggregation tools (ELK, Loki, or Fluent Bit).
  • Experience with CI/CD tools like GitHub Actions, Jenkins, or CircleCI.

We offer*:
  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Technology

Team Connect

Operations Engineer (Kafka Platform Support)

Mid

Remote

Warsaw, Poland

130 - 140 PLN

🏢 Summary: IT Operations Engineer role focused on supporting and maintaining a Kafka-based messaging platform in a production environment, ensuring high availability, stability, and performance. The position emphasizes incident management, troubleshooting, monitoring, and user support rather than platform development. The engineer works closely with internal users and engineering teams to maintain operational excellence and continuously improve support processes.

🗂️ Requirements: Hands-on experience with Apache Kafka or similar event streaming platforms, Understanding of distributed systems concepts (partitioning, replication, scaling), Experience in IT operations, production support, or platform support roles, Strong troubleshooting skills in production environments, Experience with monitoring, logging, and alerting tools, Knowledge of Git and version control practices, Familiarity with GitLab CI/CD pipelines, Experience with incident management processes and ticketing tools, Experience working with runbooks and SOPs, Fluency in English (written and spoken)

📃 Skills: Kafka, DistributedSystems, Monitoring, Logging, Alerting, Git, GitLab, CI/CD, IncidentManagement, AWS, Kubernetes, Grafana, Prometheus

🏢 Description:

About Company: Team Connect is Poland’s leading nearshore and offshore IT provider. Since 2008 we have been successfully creating and developing software for our clients. We specialize in Agile and DevOps-based software development, from the analysis stage through implementation. We develop backend, frontend, and mobile applications.

For one of our clients, we are looking for an Operations Engineer (Kafka Platform Support).

This role is ideal for an IT Operations Engineer focused on maintaining and supporting a Kafka-based platform in production. The position emphasizes operational excellence, incident management, and user support rather than platform development, ensuring reliable and efficient system performance.

Key Responsibilities:

IT Operations & Platform Support:
  • Operate, monitor, and maintain the Kafka-based messaging platform in a production environment
  • Ensure platform availability, stability, and performance in line with operational SLAs
  • Monitor system health using logs, metrics, and alerting tools
  • Perform routine operational checks and maintenance activities

Incident Management & Troubleshooting:
  • Handle incidents and service requests via ticketing systems and internal support channels
  • Troubleshoot issues across Kafka components (brokers, producers, consumers, integrations)
  • Analyze logs, metrics, and system behavior to identify root causes
  • Escalate complex issues to engineering teams where necessary

Runbook Execution & Operational Processes:
  • Execute operational procedures based on runbooks and standard operating procedures (SOPs)
  • Perform configuration changes (topics, access controls, settings) following established processes
  • Maintain and continuously improve operational documentation and runbooks

User Support & Communication:
  • Act as a primary support contact for internal users of the Kafka platform
  • Provide technical support via collaboration tools (e.g., Slack, Teams)
  • Assist users with troubleshooting and best practices
  • Translate user-reported issues into actionable insights for technical teams

Collaboration & Continuous Improvement:
  • Work closely with engineering and platform teams to resolve incidents
  • Identify recurring operational issues and suggest improvements or automation
  • Participate in incident reviews and post-mortems
  • Provide feedback to improve platform usability and support processes

Required Skills & Qualifications:

Technical Skills:
  • Hands-on experience with Apache Kafka or similar event streaming platforms
  • Understanding of distributed systems (partitioning, replication, scaling)
  • Strong troubleshooting skills in production IT environments
  • Experience with monitoring, logging, and alerting tools
  • Knowledge of Git and version control practices
  • Familiarity with GitLab CI/CD and working with existing pipelines

IT Operations Experience:
  • Experience in IT operations, production support, or platform support roles
  • Familiarity with incident management processes and tools
  • Experience working with runbooks, SOPs, and structured support models

Communication & Collaboration:
  • Strong communication skills and ability to explain technical issues clearly
  • Experience working with internal customers and cross-functional teams
  • Customer-focused mindset with a proactive approach to support
  • Fluency in English (written and spoken)

Nice to Have:
  • Experience with AWS or other cloud platforms
  • Familiarity with Kubernetes and containerized environments
  • Experience with monitoring tools such as Grafana and Prometheus

Benefits:
  • Long-term cooperation
  • Multisport, private healthcare, life insurance
  • Training budget
  • English lessons
  • Support from a dedicated partnership consultant

Technology

xBerry Sp. z o.o.

DevOps Engineer

Senior

Remote

Wrocław, Poland

20,000 - 28,000 PLN/mo

🏢 Summary: DevOps Engineer role focused on maintaining and enhancing a complex, on-premise automation platform deployed globally on Linux and Kubernetes. The position involves advanced troubleshooting, incident response, and development of automation, monitoring, and self-healing mechanisms to reduce on-site interventions. It includes international travel and participation in an on-call rotation to ensure high system reliability.

🗂️ Requirements:

  • Strong Linux (Ubuntu) administration and troubleshooting experience

  • Hands-on Kubernetes cluster management and troubleshooting

  • Practical Docker experience

  • Solid networking knowledge and network diagnostics skills

  • Experience with NFS and storage troubleshooting

  • Operational knowledge of GPU and CUDA environments

  • Experience with RabbitMQ

  • Experience with PostgreSQL

  • Ability to handle production incidents and system upgrades

  • Willingness to participate in on-call rotation

  • Readiness for international travel and on-site work

📃 Skills: Linux, Ubuntu, Kubernetes, Docker, Networking, NFS, CUDA, GPU, RabbitMQ, PostgreSQL

🏢 Description:

Position Overview

Important: Travel & On-Call Requirements

This role requires readiness for long-distance international travel to customer sites. The systems are deployed globally and, when issues cannot be resolved remotely, on-site interventions may be necessary, including deployments, upgrades, and complex troubleshooting activities. Additionally, the position includes participation in a rotational on-call / standby schedule, ensuring operational continuity and the ability to respond to critical incidents outside of standard working hours.

We are looking for an experienced DevOps Engineer to join a team responsible for the maintenance and further development of a complex automation system deployed on-premise at customer sites. The system is based on Linux (Ubuntu) and a containerized Kubernetes architecture.
The platform consists of multiple cooperating application and infrastructure components, including backend services, GPU-based computing components (CUDA), a communication layer, storage, and networking components. The environment is characterized by high operational complexity and strong dependencies between system layers (OS, Kubernetes, applications, networking, storage). Systems are deployed across multiple locations worldwide and often operate in environments with limited local IT support, which requires high reliability and well-defined operational procedures.

Responsibilities:

Incident Handling and System Maintenance

  • Diagnosing and resolving issues related to Kubernetes clusters, containers (Docker), the Linux (Ubuntu) operating system, networking, and storage (including NFS)

  • Analyzing logs and service health across application and infrastructure layers

  • Restoring full system functionality in production environments

  • Performing system deployments and upgrades at customer sites

  • Participating in on-site interventions when issues cannot be resolved remotely

Automation, Observability, and System Resilience

  • Designing and developing automated troubleshooting mechanisms

  • Early detection of infrastructure and application-level issues

  • Automated validation of the health of key system components: OS, Kubernetes, containers, storage, networking

  • Building health checks and observability solutions (metrics, alerts, dashboards)

  • Creating and maintaining runbooks, standard recovery procedures, and automated self-healing mechanisms

  • Documenting common incidents, root causes, and resolution methods

Technical Requirements:

  • Strong experience with Linux (Ubuntu) system administration and troubleshooting

  • Hands-on experience with Kubernetes, including cluster troubleshooting and container analysis

  • Practical knowledge of Docker

  • Solid understanding of networking and diagnosing network-related issues

  • Experience with NFS / storage troubleshooting

  • Operational knowledge of GPU / CUDA environments (compatibility, stability)

  • Experience working with RabbitMQ and PostgreSQL

Additional Requirements:

  • Willingness to participate in an on-call / standby rotation

  • Readiness for business travel, including on-site customer visits

  • Ability to work independently in complex, distributed environments

  • Strong analytical and problem-solving skills

We offer:

  • Flexible working hours

  • Remote work options

  • Medical care program

  • MultiSport

  • Integration events

  • A contract of employment or self-employment, depending on your preference