April 10, 2026

QA and Performance Testing Engineering Lead

Senior • Remote

130 - 150 PLN

Warsaw, Poland

QA & Performance Engineering Lead (AI/LLM Focus)

The Role

We are seeking a QA and Performance Engineering Lead to own the testing strategy for enterprise-grade AI and LLM solutions. In this role, you will define the architecture for functional, non-functional, and performance testing, ensuring that complex AI agent workflows and large-scale applications meet the reliability and compliance standards expected in enterprise environments. You will bridge traditional QA practice and the newer requirements of GenAI evaluation.

Core Responsibilities & Technical Expertise

  • Strategic QA Leadership: Leverage 10+ years of experience leading enterprise-wide testing initiatives within Fortune 500 environments to design comprehensive QA architectures.

  • AI/LLM Specialized Evaluation: Implement advanced metrics for model assessment, including BLEU, ROUGE, perplexity, and specialized scoring for hallucination and grounding rates.

  • Performance & Resilience Engineering: Build frameworks for load, stress, and chaos testing to ensure system stability under extreme conditions and peak workloads.

  • Automation & Orchestration: Engineer robust CI/CD test pipelines using Azure DevOps or GitHub Actions, focusing on automated API testing (Pytest/Postman) and integrated test harnesses.

  • Agentic Workflow Validation: Design testing strategies for multi-step AI agents, covering tool chaining, orchestration, and context injection accuracy.

  • Data Governance & Compliance: Apply deep knowledge of data lineage (Purview/Unity Catalog) and maintain strict traceability and auditability standards required in regulated industries.

  • Lifecycle Management: Oversee model release gates, registry promotions, and the management of synthetic datasets and versioning.

Key Deliverables

  • Unified Testing Framework: A standardized taxonomy and coverage model spanning unit, integration, E2E, and AI agent workflows.

  • AI Evaluation Suite: A comprehensive suite for validating model consistency, toxicity, and correctness, supported by Proof-of-Concept (PoC) validations.

  • Automated Performance Harness: Scalable workload models designed for peak-load scenarios and resilience benchmarking.

  • Smart Quality Gates: Automated pass/fail scoring mechanisms embedded directly into release pipelines across all quality dimensions.

  • Advanced Observability: Implementation of "Golden Dashboards" tracking real-time metrics such as latency-per-thought, grounding quality, and functional pass rates.

Professional Profile

  • Expertise in Enterprise QA Architecture (Functional + Non-functional + Performance).

  • Deep understanding of ML/LLM lifecycle and model promotion pipelines.

  • Strong background in Regulated Industries (ensuring compliance and audit readiness).

  • Hands-on experience with Synthetic Data generation and dataset versioning.