June 30, 2026
Site Reliability Engineer - Cybersecurity
Senior • On-site
Palo Alto, CA
ABOUT xAI
xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE
The Cybersecurity / SRE team is focused on ensuring the security and reliability of X Money. This role will primarily focus on the X Money platform but will also cross over with the X Social platform. The ideal candidate will have experience in the banking, money transmission, and P2P payments industries. We emphasize working with large distributed systems and security platforms at scale, with an automation-first mindset.
You'll be responsible for securing and maintaining the reliability of X Money's infrastructure. You'll work closely with cross-functional teams to enhance security measures, improve system resilience, and implement best practices.
RESPONSIBILITIES
- Build and secure mission-critical applications in a hybrid cloud environment.
- Manage identities and roles effectively.
- Monitor and remediate infrastructure to comply with regulations and best practices (e.g., PCI, NIST CSF).
- Maintain a SIEM and all data pipelines needed for reliable alerting.
- Design and implement secure container standards and automation to enable frictionless developer workflows.
- Maintain Kubernetes security aligned with current best practices.
- Build, deploy, and maintain security operations infrastructure using Python, Terraform, and Puppet.
- Secure and enhance CI/CD pipelines.
- Integrate and maintain code scanning platforms.
- Develop dashboards and alerts from security metrics.
- Own security projects: identify issues and implement solutions.
- Apply critical analysis and problem-solving skills.
BASIC QUALIFICATIONS
- Proven experience securing hybrid AWS/on-premises environments, including IAM and overall security posture.
- Strong proficiency in Python, Terraform, and Puppet.
- Certifications such as CISA, CRISC, CGEIT, Security+, CASP+, or similar are preferred.
- Deep expertise in Kubernetes and container security.
- Hands-on expertise building GitHub Actions and workflows.
- Extensive experience with Prometheus, Grafana, CloudWatch, and Karma.
- Well versed in managing and integrating Wazuh.
- Hands-on experience with security scanning tools (Semgrep, Trivy, Falco).
- Proactive mindset with strong ownership and problem-solving skills.
- Excellent critical thinking and analytical abilities.
- Located in the SF Bay Area or willing to relocate.
COMPENSATION AND BENEFITS
$180,000 - $440,000 USD
Base salary is just one part of our total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Similar jobs you might like
Technology
New offer

xAI
Exceptional Software Engineer
Senior
On-site
Palo Alto, CA
🏢 Summary: This role involves working on the most critical product or technical challenges, contributing directly to the development of truth-seeking AI systems. The position is hands-on and impact-driven, with clear project ownership defined before the offer stage. It offers a highly competitive compensation package including equity and comprehensive benefits. 🗂️ Requirements: Strong belief in truth-seeking AI as a critical problem, Exceptional software engineering skills, Ability to thrive in meritocratic environments, High work ethic and strong prioritization skills, Strong communication skills, Ability to contribute hands-on to critical technical challenges 📃 Skills: Software, Engineering, AI 🏢 Description: ABOUT THE ROLE: You will work on the most critical product or technical challenge at any given time. You will get clarity on your first project before an offer. BASIC QUALIFICATIONS: You believe truth-seeking AI is the most important and challenging problem. You take pride in your work and thrive in meritocratic environments. You are an exceptional software engineer. COMPENSATION AND BENEFITS: $180,000 - $440,000 USD Base salary is just one part of the total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks. xAI is an equal opportunity employer. For details on data processing, view the Recruitment Privacy Notice.
Technology

xAI
Exceptional Software Engineer
Senior
On-site
Seattle, WA
🏢 Summary: This role involves working on the most critical product or technical challenges within a cutting-edge AI organization focused on building truth-seeking AI systems. The position requires direct, hands-on contribution to high-impact engineering problems in a fast-paced, meritocratic environment. Candidates will tackle core technical initiatives aligned with advancing AI capabilities. 🗂️ Requirements: Exceptional software engineering skills, Proven ability to solve complex technical problems, Hands-on experience in building and delivering software systems, Ability to work on high-priority product or technical challenges, Strong technical communication skills 📃 Skills: Software, Engineering, AI, Programming, Systems 🏢 Description: About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.ABOUT THE ROLE: You will work on the most critical product or technical challenge at any given time. You will get clarity on your first project before an offer. BASIC QUALIFICATIONS: You believe truth-seeking AI is the most important and challenging problem. You take pride in your work and thrive in meritocratic environments. You are an exceptional software engineer. 
COMPENSATION AND BENEFITS: $180,000 - $440,000 USD Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Technology

xAI
Exceptional Software Engineer
Senior
On-site
Palo Alto, CA
🏢 Summary: Opportunity for an exceptional software engineer to tackle the most critical product or technical challenges within a high-impact AI environment. The role involves hands-on contribution to core systems and direct influence on cutting-edge AI development. 🗂️ Requirements: Exceptional software engineering ability, Proven experience building and delivering complex software systems, Ability to work on high-priority technical challenges, Hands-on development experience in production environments, Strong technical communication skills 📃 Skills: Programming, Algorithms, DataStructures, Systems, AI, Testing, Debugging 🏢 Description: About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.ABOUT THE ROLE: You will work on the most critical product or technical challenge at any given time. You will get clarity on your first project before an offer. BASIC QUALIFICATIONS: You believe truth-seeking AI is the most important and challenging problem. You take pride in your work and thrive in meritocratic environments. You are an exceptional software engineer. 
COMPENSATION AND BENEFITS: $180,000 - $440,000 USD Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Technology
New offer

xAI
Software Engineer - Networking Software and Services
Senior
On-site
Palo Alto, CA
🏢 Summary: Opportunity to build and scale automation-first network software and services supporting large-scale GPU supercomputing fabrics for AI training and inference. The role focuses on developing tools for network management, metrics collection, provisioning, monitoring, and auto-remediation while implementing Infrastructure as Code best practices. You will design highly scalable, reliable systems that orchestrate tens of thousands of network devices in production environments. 🗂️ Requirements: Deep experience working with network engineers and network topologies, Strong knowledge of physical and logical network architectures, Strong knowledge of network protocols, Proven experience designing scalable and reliable software systems, Experience building systems that orchestrate large-scale network devices, Ability to implement Infrastructure as Code best practices, Experience enhancing deployment pipelines, Ability to create and define meaningful metrics for prioritization, Strong communication skills 📃 Skills: Python, Go, TCP/IP, BGP, RDMA, IaC, Networking, Automation 🏢 Description: ABOUT xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. 
About the Role As part of the Network Software and Services for AI (nssAI) team, you will build cutting-edge software, services, and frameworks to empower Network Development Engineers. You will work hands-on across all facets of network management, including metric collection, configuration, zero-touch provisioning, monitoring, and auto-remediation, driving automation-first solutions for production and ancillary networks. The role involves developing extensible tools, streamlining complex processes, and ensuring high reliability to support AI training and inference workloads. Focus - Building software and tools with extensive metrics coverage for large-scale GPU supercomputing network fabrics used for AI training and serving inference queries. - Implementing Infrastructure as Code best practices, enhancing deployment pipelines, and ensuring robust and secure service delivery across production environments. Preferred Skills and Experience - Deep experience collaborating with network engineers using extensive knowledge of physical and logical network topologies and protocols. - Expert knowledge and proven history of designing scalable and reliable software from the ground up. - Experience building and orchestrating tens of thousands of network devices at high speed. - Ability to thrive in ambiguity and create metrics to help prioritize team focus. Tech Stack - Python - Go - TCP/IP - BGP - RDMA Annual Salary Range $150,000 - 250,000k Benefits Base salary is part of the total rewards package, which includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and additional discounts and perks. xAI is an equal opportunity employer. For details on data processing, view the Recruitment Privacy Notice.
Technology

xAI
X Developer Platform – Forward Deployed Engineer, X API
Senior
On-site
Palo Alto, CA
🏢 Summary: Hands-on Forward Deployed Engineer role focused on building production solutions, sample applications, and developer tools on top of the X API platform for Enterprise and Indie developers. The position combines deep technical implementation with developer experience, emphasizing APIs, real-time data, and AI/agent integrations. You will work directly with customers to drive adoption and improve SDKs, documentation, and tooling across the ecosystem. 🗂️ Requirements: 5+ years in customer-facing technical role, Proficiency in at least two programming languages (Python, JavaScript, TypeScript, Java), Experience shipping production-quality software and developer tools, Strong knowledge of API design principles, Experience with REST or GraphQL APIs, Understanding of real-time streaming systems, Experience with authentication mechanisms, Hands-on experience building SDKs or developer tools, Experience creating technical documentation and guides, Experience with AI or LLM integrations 📃 Skills: Python, JavaScript, TypeScript, Java, REST, GraphQL, APIs, Streaming, Authentication, SDKs, CLI, LLM, AI, Rust, Scala, JVM 🏢 Description: ABOUT xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. 
They should be able to concisely and accurately share knowledge with their teammates.ABOUT THE ROLE: X Developer Platform is responsible for the partner ecosystem of B2B and B2C developers that build solutions using X's extensive API set. With more than 500 million active users and billions of posts per week, X offers robust, real-time and historical data, insights and engagement opportunities across a wide range of organizations and industries. Our API gives you the ability to learn from and engage with the conversation on X and we supply developers with the tools to further uncover, build on, and share the value of this conversation with the world. Our mission broadly is to achieve the state of X, "The Everything App", and be the clear digital townsquare of Earth. We make our ever-expanding universe of social media data available via our extensive API suite with consistent and reliable architecture so the world can realize the full potential of this amazing stream of information. Our team helps collect, process, enrich and deliver hundreds of billions of signals a day through the X API platform. Our products are highly available, scalable, optimized, respectful of X's user base, and truly essential for our customers who build their businesses on X data. We are seeking an exceptional Forward Deployed Engineer who will work at the intersection of deep technical implementation and world-class developer experience. In this hands-on role, you will be on the front lines building production solutions for Enterprise customers, creating sample apps for both Enterprise and Indie developers, improving DevEx tools (SDKs, MCPs, CLIs, sandbox environments, and more), and authoring comprehensive how-to-guides — with an emphasis for X API + Agentic/AI use cases and integrations. You are genuinely obsessed with developers, our documentation, and making every X API product super accessible, intuitive, and delightful to build with. 
You will work directly with customers while collaborating closely with internal product and engineering teams to drive adoption, reduce friction, and maximize long-term value across the entire developer ecosystem. X Developer API Products Include: Real time streaming access to X data Historical access to archived X data Insight into X engagement data Enrichments on X objects derived from the latest machine learning technologies Flexible access to aggregate data End to end developer experience RESPONSIBILITIES: Partner directly with Enterprise customers to understand their business needs and rapidly build production-grade solutions, prototypes, integrations, and accelerators using the X API platform. Design, develop, and maintain high-quality sample applications, starter kits, reference implementations, and code examples for both Enterprise teams and Indie developers to accelerate adoption and showcase best practices. Build, enhance, and ship developer experience tools such as SDKs, MCPs, CLIs, Sandbox/Test Environments, and other internal/external tooling that dramatically improves developer productivity and ease of use. Research, write, and continuously improve comprehensive how-to guides, tutorials, cookbook recipes, technical blogs, and educational content — with special emphasis on X API integrations, real-time data, AI Agents, LLMs, and emerging use cases. Obsess over every aspect of developer experience: continuously audit and elevate our documentation, onboarding flows, code samples, and overall accessibility to make X API products the most approachable and powerful in the industry. Gather real-world feedback from customers and the broader developer community, advocate passionately for their needs internally, and collaborate with Product and Engineering teams to influence roadmap and feature prioritization. 
Diagnose complex technical issues, bugs, and edge cases; provide expert-level troubleshooting, workarounds, and long-term solutions while turning learnings into public guides and tooling improvements. Streamline support processes and create scalable materials that close knowledge gaps and accelerate success for both managed partners and self-serve developers. BASIC QUALIFICATIONS: Exceptional coding proficiency in two or more languages (Python, JavaScript/TypeScript, Java, etc.) with a proven track record of shipping production-quality software, sample apps, prototypes, and developer tools. Strong understanding of API design principles, REST/GraphQL, real-time streaming systems, authentication, and modern AI/agent workflows. Hands-on experience building developer-facing assets: sample applications, reference implementations, DevEx tools, and high-quality technical documentation/guides. Deep, genuine passion for developer experience (DevEx) — you instinctively identify friction and love removing it through better docs, tools, and accessible APIs. 5+ years of experience in a customer-facing technical role (partner engineering, solutions architecture, developer relations, forward-deployed engineering, or similar) working directly with enterprise customers. Ability to work comfortably and professionally with diverse stakeholders (software developers, product managers, technical executives, and business leaders) to define and deliver shared objectives. Excellent project management skills with the ability to scope, execute, and drive initiatives autonomously in a fast-paced environment. Outstanding verbal and written communication skills, including the ability to translate complex technical topics into clear, engaging documentation and presentations. Strong attention to detail and a solution-oriented mindset that turns customer problems into scalable improvements. 
PREFERRED SKILLS AND EXPERIENCE: Previous experience building or significantly contributing to developer platforms, tools, SDKs, interactive playgrounds, or educational content. Hands-on knowledge of AI/Agent frameworks, LLM integrations, or building AI-powered applications on top of data APIs. Industry experience in social media, enterprise software, data analytics, real-time/streaming data, or related spaces. Strong familiarity with Rust, Scala (ideally), or JVM-based programming languages. A public portfolio of sample apps, open-source contributions, technical blogs, guides, or DevEx tools that demonstrate your builder mindset and developer obsession. COMPENSATION AND BENEFITS: $180,000 - $440,000 USD Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Technology
New offer

xAI
Sr. Software Engineer (Data Center Automation)
Senior
On-site
Memphis, TN
🏢 Summary: Senior Software Engineer role focused on improving reliability and automation across multi-data center AI infrastructure. The position combines strong software engineering skills with hands-on data center and systems expertise to build observability, automate remediation, and minimize downtime in mission-critical environments. The engineer will design scalable services, optimize Linux systems, and support distributed infrastructure with near-zero downtime requirements. 🗂️ Requirements: Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering or related field (or equivalent experience), 3+ years experience in SRE, Infrastructure, DevOps, or Systems Engineering in distributed production environments, Strong programming experience in Python, Solid Linux systems administration and performance tuning experience, Experience with Docker and Kubernetes or similar orchestration tools, Experience implementing observability solutions (metrics, logging, tracing, monitoring, alerting), Knowledge of networking fundamentals (TCP/IP, routing, DNS, redundancy), Experience troubleshooting distributed systems including hardware and network issues, Experience with on-call rotations and incident response practices, Ability to collaborate with cross-functional technical teams 📃 Skills: Python, Rust, Linux, Docker, Kubernetes, Prometheus, Grafana, TCP/IP, DNS, C++, Go 🏢 Description: ABOUT THE ROLE: We are seeking a highly skilled Sr. Software Engineer to join our team in managing and enhancing reliability across a multi-data center environment. This role focuses on automating processes, building and implementing robust observability solutions, and ensuring seamless operations for mission-critical AI infrastructure. 
The ideal candidate will combine strong coding abilities with hands-on data center experience to build scalable reliability services, optimize system performance, and minimize downtime—including close partnership with facility operations to address physical infrastructure impacts. In an era where AI workloads demand near-zero downtime, this position plays a pivotal role in bridging software engineering principles with physical data center realities. By prioritizing automation and observability, team members in this role can reduce mean time to recovery (MTTR) by up to 50% through proactive monitoring and automated remediation, based on industry benchmarks from high-scale environments like those at hyperscale cloud providers. The primary objective of this team is to mitigate downtime and minimize impact to end-users from both scheduled and unscheduled maintenance, as well as events affecting onsite data centers. This is achieved through proactive automation, robust observability, and integrated software-physical reliability strategies, ensuring our AI infrastructure remains resilient, scalable, and at the cutting edge of innovation. RESPONSIBILITIES: - Design, develop, and deploy scalable code and services (primarily in Python and Rust, with flexibility for emerging languages) to automate reliability workflows, including monitoring, alerting, incident response, and infrastructure provisioning. - Implement and maintain observability tools and practices, such as metrics collection, logging, tracing, and dashboards, to provide real-time insights into system health across multiple data centers. - Collaborate with cross-functional teams—including software development, network engineering, site operations, and facility operations—to identify reliability bottlenecks and automate solutions for fault tolerance, disaster recovery, capacity planning, and physical/environmental risk mitigation. 
- Troubleshoot and resolve complex issues in data center environments, including hardware failures, environmental anomalies, software bugs, and network-related problems, while adhering to reliability principles like error budgets and SLAs. - Optimize Linux-based systems for performance, security, and reliability, including kernel tuning, container orchestration (e.g., Kubernetes), and scripting for automation. - Understand network topologies and concepts in large-scale, multi-data center environments to troubleshoot connectivity, routing, redundancy, and performance issues. - Participate in on-call rotations, post-incident reviews (blameless postmortems), and continuous improvement initiatives to enhance overall site reliability. - Mentor junior team members and document processes to foster a culture of automation and knowledge sharing. BASIC QUALIFICATIONS: - Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related technical field (or equivalent professional experience). - 3+ years of hands-on experience in site reliability engineering (SRE), infrastructure engineering, DevOps, or systems engineering in large-scale, distributed, or production environments. - Strong programming skills with proven production experience in Python; experience with Rust or strong fundamentals in a systems-level language (e.g., Go, C++). - Solid experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments. - Practical knowledge of containerization and orchestration technologies, such as Docker and Kubernetes (or similar systems). - Experience implementing observability solutions, including metrics, logging, tracing, monitoring tools (e.g., Prometheus, Grafana), alerting, and dashboards. - Familiarity with troubleshooting complex issues in distributed systems, including software bugs, hardware failures, network problems, and environmental factors. 
- Understanding of networking fundamentals (TCP/IP, routing, redundancy, DNS) in large-scale or multi-site environments. - Experience participating in on-call rotations, incident response, post-incident reviews (blameless postmortems), and reliability practices such as error budgets or SLAs. - Ability to collaborate effectively with cross-functional technical teams. PREFERRED SKILLS AND EXPERIENCE: - 5+ years of experience in SRE or infrastructure roles in hyperscale, cloud, or AI/ML training infrastructure environments. - Hands-on experience operating or scaling Kubernetes clusters at large scale. - Proficiency in Rust for systems programming and performance-critical components. - Experience integrating software reliability tools with physical data center infrastructure (power, cooling, environmental monitoring). - Experience building automated remediation, fault tolerance, disaster recovery, capacity planning, or predictive failure detection systems. - Background in optimizing Linux-based systems for AI workloads, GPU clusters, or high-throughput compute environments. - Experience with bare-metal provisioning, data center interconnects, or hybrid/multi-site failover mechanisms. - Mentoring experience and strong documentation skills.
Technology
New offer

xAI
Sr. Software Engineer (Data Center Automation)
Senior
On-site
Palo Alto, CA
🏢 Summary: Senior Software Engineer role focused on building automation and observability solutions to enhance reliability across multi-data center AI infrastructure. The position combines strong programming skills with hands-on data center and Linux systems expertise to minimize downtime and optimize performance. It involves developing scalable services, improving monitoring and incident response, and collaborating across infrastructure and facility teams. 🗂️ Requirements: Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering or related field (or equivalent experience), 3+ years experience in SRE, Infrastructure, DevOps, or Systems Engineering in large-scale production environments, Strong production experience in Python, Solid Linux systems administration and kernel-level knowledge, Experience with containerization and orchestration (Docker, Kubernetes or similar), Experience implementing observability solutions (metrics, logging, tracing, monitoring, alerting), Understanding of networking fundamentals (TCP/IP, routing, DNS, redundancy), Experience troubleshooting distributed systems, hardware and network issues, Experience with on-call rotations and incident response practices (SLAs, error budgets), Ability to collaborate with cross-functional technical teams 📃 Skills: Python, Rust, Linux, Kubernetes, Docker, Prometheus, Grafana, TCP/IP, DNS, Scripting, Automation, Observability, Monitoring, Tracing, Networking 🏢 Description: ABOUT xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. 
Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE
We are seeking a highly skilled Sr. Software Engineer to join our team in managing and enhancing reliability across a multi-data center environment. This role focuses on automating processes, building and implementing robust observability solutions, and ensuring seamless operations for mission-critical AI infrastructure. The ideal candidate will combine strong coding abilities with hands-on data center experience to build scalable reliability services, optimize system performance, and minimize downtime, including close partnership with facility operations to address physical infrastructure impacts. In an era where AI workloads demand near-zero downtime, this position plays a pivotal role in bridging software engineering principles with physical data center realities. By prioritizing automation and observability, team members in this role can reduce mean time to recovery (MTTR) by up to 50% through proactive monitoring and automated remediation. The primary objective of this team is to mitigate downtime and minimize impact to end-users from both scheduled and unscheduled maintenance, as well as events affecting onsite data centers. This is achieved through proactive automation, robust observability, and integrated software-physical reliability strategies.
RESPONSIBILITIES
- Design, develop, and deploy scalable code and services (primarily in Python and Rust) to automate reliability workflows, including monitoring, alerting, incident response, and infrastructure provisioning.
- Implement and maintain observability tools and practices, such as metrics collection, logging, tracing, and dashboards, to provide real-time insights into system health across multiple data centers.
- Collaborate with cross-functional teams to identify reliability bottlenecks and automate solutions for fault tolerance, disaster recovery, capacity planning, and physical/environmental risk mitigation.
- Troubleshoot and resolve complex issues in data center environments, including hardware failures, environmental anomalies, software bugs, and network-related problems, while adhering to reliability principles like error budgets and SLAs.
- Optimize Linux-based systems for performance, security, and reliability, including kernel tuning, container orchestration, and scripting for automation.
- Understand network topologies and concepts in large-scale, multi-data center environments to troubleshoot connectivity, routing, redundancy, and performance issues.
- Participate in on-call rotations, post-incident reviews (blameless postmortems), and continuous improvement initiatives.
- Mentor junior team members and document processes to foster a culture of automation and knowledge sharing.
BASIC QUALIFICATIONS
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related technical field (or equivalent professional experience).
- 3+ years of hands-on experience in site reliability engineering (SRE), infrastructure engineering, DevOps, or systems engineering in large-scale, distributed, or production environments.
- Strong programming skills with proven production experience in Python; experience with Rust or another systems-level language (e.g., Go, C++) is essential.
- Solid experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments.
- Practical knowledge of containerization and orchestration technologies, such as Docker and Kubernetes.
- Experience implementing observability solutions, including metrics, logging, tracing, monitoring tools, alerting, and dashboards.
- Familiarity with troubleshooting complex issues in distributed systems, including software bugs, hardware failures, network problems, and environmental factors.
- Understanding of networking fundamentals (TCP/IP, routing, redundancy, DNS) in large-scale or multi-site environments.
- Experience participating in on-call rotations, incident response, post-incident reviews, and reliability practices such as error budgets or SLAs.
- Ability to collaborate effectively with cross-functional teams.
PREFERRED SKILLS AND EXPERIENCE
- 5+ years of experience in SRE or infrastructure roles in hyperscale, cloud, or AI/ML training environments with multi-data center setups.
- Hands-on experience operating or scaling Kubernetes clusters at large scale, including automation for provisioning and high availability.
- Proficiency in Rust for systems programming and performance-critical components.
- Experience integrating software reliability tools with physical data center infrastructure (power, cooling, environmental monitoring).
- Experience building automated remediation, fault tolerance, disaster recovery, capacity planning, or predictive failure detection systems.
- Background in optimizing Linux-based systems for AI workloads, GPU clusters, or high-throughput compute environments.
- Experience with bare-metal provisioning, data center interconnects, or hybrid/multi-site failover mechanisms.
- Mentoring experience and strong documentation skills.
xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Technology

xAI
Member of Technical Staff – Web Engineering
Senior
On-site
Palo Alto, CA
🏢 Summary: Fullstack/Web Engineer role focused on building and optimizing high-performance, real-time user-facing features for a large-scale social platform. The position emphasizes frontend excellence while contributing across the stack, including scalable backend systems and real-time analytics. You will own features end-to-end, driving architecture, performance, and reliability for globally deployed products. 🗂️ Requirements: 2+ years web development experience, Expertise in TypeScript, Expertise in Node.js, Expertise in React or modern web frameworks, Expertise in CSS/SASS, Experience with UI/UX design, Experience optimizing performance and security, Experience building scalable, high-concurrency systems, Backend development experience, Proficiency in Rust, Go, Java, Python, or Scala 📃 Skills: TypeScript, Node.js, React, CSS, SASS, Rust, Go, Java, Python, Scala, HTML, WebSockets, REST, GraphQL, SQL, NoSQL, Testing, CI/CD, Monitoring, Security 🏢 Description: About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
About the Role
We're looking for exceptional Fullstack / Web Engineers who can work across the stack but have a passion for frontend development and a keen eye for design. You'll architect and optimize user-facing features that power real-time conversations for millions worldwide. Dive into cutting-edge technologies and scalable backend systems, collaborating with top-tier talent to push the boundaries of web performance and innovation. You have the ability to thrive in a fast-paced environment, where you proactively tackle high-impact challenges that shape the future of social media. The role is a fit for engineers passionate about crafting seamless, responsive experiences that drive global engagement and redefine digital interaction.
Responsibilities
- Own and drive features from inception and design to implementation and launch, being the web expert on your team.
- Build and maintain high-quality, performant products and features, leveraging the most modern and cutting-edge web standards, technologies, frameworks, and AI tooling.
- Responsible for fullstack features, including user dashboards, personalized experiences, content delivery, interactive tools, assessments, and real-time analytics.
- Lead architecture, scalability, and reliability decisions for high-concurrency, low-latency systems.
- Uphold engineering excellence via testing, monitoring, deployment, and secure data handling.
- Drive technical/product decisions with teams and deploy global features to maximize user value.
Basic Qualifications
- 2+ years of web development experience.
- Expert in TypeScript, Node.js, and modern web frameworks (e.g., React).
- Expert in modern CSS/SASS.
- Experience in high-quality UI and UX design.
- Proven track record of optimizing applications for performance, security, and offline functionality.
Preferred Skills and Experience
- 5+ years of experience in a web frontend role, working on a large-scale consumer app.
- Experience with backend development, with proficiency in one or more of the following: Rust, Go, Java, Python, Scala.
Compensation and Benefits
$180,000 - $440,000 USD
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Technology

xAI
Senior Data Analyst - Fraud & AML
Senior
On-site
Palo Alto, CA
148,000 - 220,000 USD/yr
🏢 Summary: Senior Data Scientist role focused on designing and optimizing AML and fraud detection models to strengthen financial crime compliance and transaction monitoring. The position involves building advanced analytics solutions, coverage assessment frameworks, and automated reporting to support BSA/AML, OFAC, and regulatory requirements. It is a cross-functional, high-impact role combining machine learning, compliance expertise, and scalable data solutions. 🗂️ Requirements: 7+ years data science experience in financial services, 4+ years experience in fraud and financial crime compliance, Master's degree in quantitative field, Experience building transaction monitoring models in regulated environment, Strong knowledge of BSA/AML and SAR processes, Understanding of sanctions screening and model risk management, Proficiency in Python and SQL, Experience supporting regulatory examinations, U.S. work authorization under ITAR requirements 📃 Skills: Python, SQL, MachineLearning, Statistics, AML, BSA, OFAC, SAR, FraudDetection, TransactionMonitoring, Compliance, ModelRisk, DataAnalytics, Dashboards, Automation, RPA 🏢 Description: About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. 
They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE
We are looking for a Senior Data Scientist to join our Compliance Program and play a pivotal role in modernizing and strengthening our financial crime detection capabilities. You will architect, build, and optimize data-driven transaction monitoring models, coverage assessment frameworks, and advanced analytics solutions that directly support BSA/AML regulatory compliance, including but not limited to SAR filing, Customer Identification Program elements, and Enhanced Due Diligence measures. The role will also support OFAC sanctions, fraud, and overall risk prioritization across multiple products and jurisdictions. This is a high-impact, cross-functional role that blends advanced analytics, machine learning, and deep compliance domain expertise. You will work closely with Compliance, Engineering, Model Risk, Product, and external regulators to ensure our controls are robust, defensible, and scalable.
RESPONSIBILITIES
- Design, develop, and enhance AML and fraud models, rules, and heuristics using Python, SQL, and AI-enabled tooling; partner with the Compliance Machine Learning team on model reviews to improve detection rates and reduce false positives.
- Build and maintain interactive performance dashboards and automated reporting solutions that track key risk, productivity, and capacity metrics for senior leadership and regulators.
- Architect and implement enterprise-wide Transaction Monitoring Coverage Assessment frameworks, including standardized methodologies for gap identification, root-cause analysis, remediation planning, and ongoing sustainability monitoring.
- Lead complex data initiatives, including extraction of SAR filing metrics with product-level breakdowns and development of jurisdiction- and typology-specific SAR narrative generator tools.
- Embed data science best practices into product launches and feature rollouts to proactively identify and close monitoring coverage gaps.
- Support regulatory examinations (e.g., NYDFS Part 504) by preparing analytical documentation, third-party validation materials, and executive certification packages.
- Drive continuous improvement of compliance operations through automation, process optimization, and advanced analytics.
BASIC QUALIFICATIONS
- 7+ years of hands-on data science / advanced analytics experience in financial services, with at least 4 years focused on fraud and financial crime compliance.
- Master's degree (or higher) in Applied Mathematics, Statistics, Data Science, Actuarial Science, or a related quantitative field.
- Proven track record of building and optimizing transaction monitoring models, coverage frameworks, or compliance analytics programs in a regulated environment (fintech, bank, or payment company preferred).
- Deep understanding of BSA/AML regulations, suspicious activity reporting, customer due diligence, sanctions screening, and model risk management principles.
- Demonstrated ability to translate complex regulatory requirements into actionable data solutions and present findings to senior leadership and regulators.
- Certified Anti-Money Laundering Specialist (CAMS) or equivalent compliance certification is strongly preferred.
PREFERRED SKILLS AND EXPERIENCE
- Experience leading cross-functional initiatives involving Engineering, Legal, Product Compliance, and external consulting partners.
- Background in building internal case management systems, SAR automation tools, or RPA solutions.
- Familiarity with AML detection platforms.
- Track record of delivering measurable impact (e.g., reduced case volumes, improved detection of high-risk activity, increased operational efficiency).
COMPENSATION AND BENEFITS
$148,000 - $220,000 USD
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
ITAR REQUIREMENTS
To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (aka green card holder), (iii) Refugee under 8 U.S.C. § 1157, or (iv) Asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State. Learn more about the ITAR here.
xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.
Technology
New offer

xAI
Backend Engineer - API
Senior
On-site
Palo Alto, CA
🏢 Summary: Engineering role focused on building and operating a highly scalable, low-latency API infrastructure that serves AI models globally. The position involves owning end-to-end distributed systems for high-throughput inference, including model serving, request routing, and observability. It requires deep expertise in Rust or C++ and strong experience with distributed systems and production-grade infrastructure. 🗂️ Requirements: Expert knowledge of Rust or C++, Experience designing and maintaining horizontally scalable distributed systems, Experience building reliable production infrastructure, Knowledge of service observability and reliability best practices, Experience operating PostgreSQL, Clickhouse, or MongoDB, Strong understanding of high-throughput, low-latency systems 📃 Skills: Rust, C++, Go, PostgreSQL, Clickhouse, MongoDB, gRPC, Docker, Kubernetes, TensorRT, vLLM, SGLang 🏢 Description: ABOUT THE ROLE: As an ideal candidate you have a good understanding of how highly scalable and reliable production infrastructure is built. Most of our backend infrastructure is written in Rust. Familiarity with a compiled language such as C++, Rust, or Go is highly beneficial. 
RESPONSIBILITIES
- Build the xAI API that serves our models to developers worldwide.
- Own the end-to-end system responsible for high-throughput inference, handling billions of tokens per minute with low latency and high availability, including model serving infrastructure, request routing, SDK development, rate limiting, observability, and efficient scaling.
BASIC QUALIFICATIONS
- Expert knowledge of either Rust or C++.
- Experience in designing, implementing, and maintaining reliable and horizontally scalable distributed systems.
- Knowledge of service observability and reliability best practices.
- Experience in operating commonly used databases such as PostgreSQL, Clickhouse, and MongoDB.
PREFERRED SKILLS AND EXPERIENCE
- Experience with LLM inference engines and serving frameworks (e.g., SGLang, TensorRT, vLLM).
- Experience designing or building with agent SDKs and agent orchestration frameworks.
- Experience with Docker, Kubernetes, and containerized applications.
- Expert knowledge of gRPC (unary, response streaming, bi-directional streaming, REST mapping).
COMPENSATION AND BENEFITS
$180,000 - $440,000 USD
Base salary is just one part of the total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer. For details on data processing, view the Recruitment Privacy Notice.