June 30, 2026

Site Reliability Engineer - Cybersecurity

Senior • On-site

Palo Alto, CA

ABOUT xAI

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

ABOUT THE ROLE

The Cybersecurity / SRE team is focused on ensuring the security and reliability of X Money. This role will primarily focus on the X Money platform but will also cross over with the X Social platform. The ideal candidate will have experience in the banking, money transmission, and P2P payments industries. We emphasize working with large distributed systems and security platforms at scale, with an automation-first mindset.

You'll be responsible for securing and maintaining the reliability of X Money's infrastructure. You'll work closely with cross-functional teams to enhance security measures, improve system resilience, and implement best practices.

RESPONSIBILITIES

Build and secure mission-critical applications in a hybrid cloud environment.
Manage identities and roles effectively.
Monitor and remediate infrastructure to comply with regulations and best practices (e.g., PCI, NIST CSF).
Maintain a SIEM and all data pipelines needed for reliable alerting.
Design and implement secure container standards and automation to enable frictionless developer workflows.
Maintain Kubernetes security aligned with current best practices.
Build, deploy, and maintain security operations infrastructure using Python, Terraform, and Puppet.
Secure and enhance CI/CD pipelines.
Integrate and maintain code scanning platforms.
Develop dashboards and alerts from security metrics.
Own security projects: identify issues and implement solutions.
Apply critical analysis and problem-solving skills.

BASIC QUALIFICATIONS

Proven experience securing hybrid AWS/on-premises environments, including IAM and overall security posture.
Strong proficiency in Python, Terraform, and Puppet.
Certifications such as CISA, CRISC, CGEIT, Security+, CASP+, or similar are preferred.
Deep expertise in Kubernetes and container security.
Hands-on expertise building GitHub Actions and workflows.
Extensive experience with Prometheus, Grafana, CloudWatch, and Karma.
Well versed in managing and integrating Wazuh.
Hands-on experience with security scanning tools (Semgrep, Trivy, Falco).
Proactive mindset with strong ownership and problem-solving skills.
Excellent critical thinking and analytical abilities.
Located in the SF Bay Area or willing to relocate.

COMPENSATION AND BENEFITS

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Similar jobs you might like

Technology

xAI

Backend Engineer - API

Senior

On-site

Palo Alto, CA

🏢 Summary: Engineering role focused on building and owning a high-throughput, low-latency API and backend infrastructure for large-scale model inference. The position involves designing and operating reliable, horizontally scalable distributed systems that serve billions of tokens per minute. You will develop and maintain model serving, routing, SDKs, and observability within a production-grade environment. 🗂️ Requirements: Expert knowledge of Rust or C++, Experience building and maintaining horizontally scalable distributed systems, Experience designing reliable high-availability production infrastructure, Knowledge of observability and reliability best practices, Experience operating PostgreSQL, Clickhouse, or MongoDB 📃 Skills: Rust, C++, Go, PostgreSQL, Clickhouse, MongoDB, gRPC, Docker, Kubernetes, TensorRT, vLLM, SGLang, REST, SDK 🏢 Description: About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.ABOUT THE ROLE: As an ideal candidate you have a good understanding of how highly scalable and reliable production infrastructure is built. Most of our backend infrastructure is written in Rust. So familiarity with a compiled language such as C++, Rust, or Go is highly beneficial. 
RESPONSIBILITIES: Build the xAI API that serves our models to developers worldwide Own the end-to-end system responsible for high-throughput inference, handling billions of tokens per minute with low latency and high availability, including model serving infrastructure, request routing, SDK development, rate limiting, observability, and efficient scaling BASIC QUALIFICATIONS: Expert knowledge of either Rust or C++ Experience in designing, implementing, and maintaining reliable and horizontally scalable distributed systems Knowledge of service observability and reliability best practices Experience in operating commonly used databases such as PostgreSQL, Clickhouse, and MongoDB PREFERRED SKILLS AND EXPERIENCE: Experience with LLM inference engines and serving frameworks (e.g., SGLang, TensorRT, vLLM) Experience designing or building with agent SDKs and agent orchestration frameworks Experience with Docker, Kubernetes, and containerized applications Expert knowledge of gRPC (unary, response streaming, bi-directional streaming, REST mapping) COMPENSATION AND BENEFITS $180,000 - $440,000 USD Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Technology

xAI

Software Engineer - Platform Security

Mid

On-site

Palo Alto, CA

99,996 - 258,000 USD/yr

🏢 Summary: Software Engineer role focused on platform security, building AI-driven security tools and securing Kubernetes-based infrastructure and applications. The position involves designing scalable backend systems, identifying vulnerabilities, and driving secure engineering practices in a fast-paced environment. 🗂️ Requirements: 3+ years of experience in fast-paced technology environments, Expertise in Python, Rust, or Go, Experience building scalable tools or systems from scratch, Proficiency in scalable backend architecture design, Familiarity with security testing frameworks, Experience with Docker and Kubernetes, Knowledge of SBOM management and dependency scanning, Strong problem-solving and clean coding skills 📃 Skills: Python, Rust, Go, Kubernetes, Docker, Grok, BurpSuite, OWASP, SAST, DAST, SBOM 🏢 Description: SpaceXAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. ABOUT THE ROLE: We are seeking a talented and driven Software Engineer to join Platform Security team, where you will build cutting-edge security solutions to protect our Kubernetes-based infrastructure and advance secure AI-driven systems. In this role, you will design and implement AI-powered security tools, proactively address vulnerabilities, and champion secure engineering practices across the organization. 
Ideal candidates are passionate about impactful innovation, excel at writing clean, efficient code, and thrive in fast-paced environments to support SpaceXAI's mission of creating a trusted and secure global digital platform. RESPONSIBILITIES: - Design and build AI-driven security tooling and agents using Grok to identify, analyze, and mitigate vulnerabilities in the platform infrastructure and customer facing application(s) - Proactively identify security problems to solve and own the design and implementation end-to-end - Collaborate and be a security champion while driving technical decisions across the organization BASIC QUALIFICATIONS: - 3+ years of experience in fast-paced, high-impact environments, ideally at startups or tech-driven companies. - Expertise in Python, Rust, or Go, with strong problem-solving skills and a focus on clean, efficient code. - Certifications like CISA, CRISC, CGEIT, Security+, CASP+, or similar preferred. - Proven experience building tools or systems from scratch, with a focus on scalable solutions. - Proficiency in designing scalable backend architectures to support secure systems. - Familiarity with security testing frameworks (e.g., Burp Suite, OWASP ZAP, SAST/DAST). - Experience with Docker and Kubernetes for deploying and securing containerized applications. - Knowledge of software supply chain tools, including SBOM management and dependency scanning. PREFERRED SKILLS AND EXPERIENCE: - Experience developing AI-driven security tools or integrating AI into security workflows. - Familiarity with Kubernetes-based environments and securing cloud-native infrastructure. - Proven ability to drive technical decisions and influence security practices across teams. - A passion for challenging the status quo and building transformative security solutions. - Strong collaboration skills, with experience working in dynamic, cross-functional teams. - A sense of humor and adaptability to thrive in a fast-paced, mission-driven environment. 
COMPENSATION AND BENEFITS: - $100,000 - $258,000 USD - Base salary is just one part of the total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. SpaceXAI is an equal opportunity employer. For details on data processing, view the Recruitment Privacy Notice.

Technology

SpaceXAI

Member Of Technical Staff - Cloud Infrastructure

Senior

On-site

Palo Alto, CA

180,000 - 440,004 USD/yr

🏢 Summary: Senior Infrastructure Engineer role focused on building and operating secure, scalable AI infrastructure for US government projects across bare metal and classified cloud environments. The position involves Kubernetes-based infrastructure management, GPU cluster operations, observability, automation, and compliance-driven reliability engineering. This is an in-person role in Palo Alto or Washington, DC with up to 50% travel. 🗂️ Requirements: Active Top Secret (TS) security clearance, 5+ years of infrastructure or site reliability engineering experience, Experience building and maintaining scalable systems, Proficiency with Pulumi, Terraform, or Ansible, Deep knowledge of Kubernetes, CNI, CRI, and CSI, Experience with incident management and SLAs/SLOs, Strong communication and documentation skills, Ability to work in secure or government environments, Willingness to travel up to 50%, On-site availability in Palo Alto, CA or Washington, DC 📃 Skills: Kubernetes, CNI, CRI, CSI, Pulumi, Terraform, Ansible, GPU, Kyverno, ArgoCD, Go, IaC, Observability, SLA, SLO 🏢 Description: SpaceXAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. 
ABOUT THE ROLE: We are seeking a highly skilled Senior Infrastructure Engineer to join our US Government Team, focused on designing, building, and operating secure, scalable infrastructure for critical government projects. In this role, you will develop and manage training and inference clusters, as well as highly reliable applications, across bare metal, classified cloud, and hybrid cloud architectures. You will leverage your expertise in Kubernetes and GPU hardware to deliver robust, secure systems that support large-scale AI workloads while meeting stringent federal compliance requirements. This role demands a passion for automation, observability, and ensuring system integrity in a fast-paced, high-security environment. RESPONSIBILITIES: - Develop and optimize software to provision and manage xAI's infrastructure across on-premise, virtual machine, and classified cloud environments, enabling efficient scaling for US government initiatives. - Enhance the reliability, performance, and cost-effectiveness of infrastructure to support large-scale AI and application workloads in secure, classified settings. - Collaborate with xAI engineers to understand workload requirements and design tailored solutions that meet government-specific needs and compliance standards. - Implement robust observability, monitoring, and security practices to ensure the integrity, availability, and confidentiality of critical systems, adhering to federal protocols. - Manage storage infrastructure using Infrastructure-as-Code (IaC) tools such as Pulumi, Terraform, or Ansible, with a focus on secure data handling. - Drive system reliability through incident management, postmortems, and the definition of clear SLAs and SLOs, while maintaining security and compliance. - This is an in-person role based in Palo Alto, CA or Washington, DC, with up to 50% travel required. BASIC QUALIFICATIONS: - Active Top Secret (TS) security clearance. 
- 5+ years of experience as an Infrastructure Engineer, Site Reliability Engineer, or similar role, with a focus on building and maintaining reliable, scalable systems, preferably in secure or government environments. - Proficiency in managing storage infrastructure with IaC tools such as Pulumi, Terraform, or Ansible. - Deep understanding of the Kubernetes stack, including CNI, CRI, CSI, and related components. - Demonstrated ability to improve system reliability through incident management, postmortems, and defining SLAs/SLOs. - Excellent communication and documentation skills, with the ability to handle sensitive information concisely and accurately. PREFERRED SKILLS AND EXPERIENCE: - Deep familiarity with installing and using GPU hardware, including setting up drivers, debugging issues, and ensuring reliability. - Experience with high-traffic web or mobile application workloads, including optimizing Kubernetes for large-scale deployments in classified or federal settings. - Familiarity with chaos engineering, capacity planning, or similar practices for ensuring system resilience in government projects. - Proficiency with tools such as Kyverno, ArgoCD, or Go programming for infrastructure automation. - Strong sense of ownership, curiosity, and enthusiasm for tackling complex technical challenges in secure environments. - Passion for problem-solving and a proactive drive to deliver impactful results while adhering to security protocols. - Certifications in security-related fields (e.g., CISSP) or experience in secure federal environments. COMPENSATION AND BENEFITS: - $180,000 - $440,000 USD - Base salary is just one part of the total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. SpaceXAI is an equal opportunity employer. 
For details on data processing, view the Recruitment Privacy Notice.

Technology

xAI

Software Engineer - Kernels/CUDA (C++)

Senior

On-site

Seattle, WA

🏢 Summary: High-impact compute infrastructure role focused on building and optimizing massive GPU supercomputers for AI training and inference. The position involves low-level systems programming, GPU kernel optimization, Linux kernel internals, orchestration, and distributed infrastructure to improve scalability, reliability, and performance of AI workloads. 🗂️ Requirements: Deep systems programming experience in C/C++ or Rust, Experience with large-scale GPU clusters or distributed compute infrastructure, Hands-on GPU kernel optimization using CUTLASS, custom kernels, or Nsight, Knowledge of Linux kernel internals, scheduling, virtualization, or orchestration, Experience building high-performance AI training or inference infrastructure, Ability to optimize memory-bound and compute-bound workloads, Experience operating exabyte-scale storage systems 📃 Skills: CUDA, CUTLASS, Tensor, Nsight, Linux, KVM, Firecracker, Kubernetes, C++, Rust 🏢 Description: SpaceXAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.ABOUT THE ROLE: We are building one of the world's largest AI supercomputers from the ground up. As part of the Compute Infrastructure team, you will own both the raw GPU supercomputer and the platform layer that runs on top of it. 
You will work across the full stack — from low-level GPU kernel optimizations and Linux kernel internals to massive-scale orchestration and virtualization — to make training and inference at SpaceXAI as fast, reliable, and scalable as possible. This is a broad, high-impact role that combines hardcore supercompute and compute infrastructure work. Your contributions will directly accelerate Grok's training speed and overall AI progress. RESPONSIBILITIES: Design, build, and optimize massive GPU clusters for extreme-scale training and inference workloads Develop and tune low-level CUDA kernels (GeMM, Attention, etc.), using CUTLASS, Tensor Cores, and Nsight for maximum performance Profile, debug, and eliminate bottlenecks across GPU memory hierarchy, networking fabric, filesystems, and multi-GPU operation Collaborate closely with AI research teams to deliver production-grade performance and scalability PREFERRED SKILLS AND EXPERIENCE: Deep low-level systems programming (C/C++/PTX/SASS) Strong experience with large-scale GPU clusters or distributed compute infrastructure at production scale Hands-on work with GPU kernel optimization (CUTLASS, custom kernels, Nsight profiling) Track record of building or running high-performance infrastructure for AI workloads (training or inference platforms) Ability to reason from first principles and optimize for both memory-bound and compute-bound scenarios COMPENSATION AND BENEFITS: $180,000 - $440,000 USD Base salary is just one part of our total rewards package at SpaceXAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.SpaceXAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Technology

xAI

Software Engineer - Networking Software and Services

Senior

On-site

Palo Alto, CA

🏢 Summary: Opportunity to build and scale automation-first network software and services supporting large-scale GPU supercomputing fabrics for AI training and inference. The role focuses on developing tools for network management, metrics collection, provisioning, monitoring, and auto-remediation while implementing Infrastructure as Code best practices. You will design highly scalable, reliable systems that orchestrate tens of thousands of network devices in production environments. 🗂️ Requirements: Deep experience working with network engineers and network topologies, Strong knowledge of physical and logical network architectures, Strong knowledge of network protocols, Proven experience designing scalable and reliable software systems, Experience building systems that orchestrate large-scale network devices, Ability to implement Infrastructure as Code best practices, Experience enhancing deployment pipelines, Ability to create and define meaningful metrics for prioritization, Strong communication skills 📃 Skills: Python, Go, TCP/IP, BGP, RDMA, IaC, Networking, Automation 🏢 Description: SpaceXAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. 
They should be able to concisely and accurately share knowledge with their teammates.ABOUT THE ROLE: As part of the Network Software and Services for AI (nssAI) team at SpaceXAI, you'll build cutting-edge software, services, and frameworks to empower our Network Development Engineers. Working hands-on, you'll tackle all facets of network management—metric collection, configuration, zero-touch provisioning, monitoring, and auto-remediation—driving automation-first solutions for SpaceXAI's production and ancillary networks. Expect to develop extensible tools, streamline complex processes, and ensure rock-solid reliability to support SpaceXAI's mission of accelerating human scientific discovery through AI. RESPONSIBILITIES: Building software and tools with extensive metrics coverage for some of the world's largest GPU supercomputing network fabrics used for AI training and serving customer inference queries. Implement IaC best practices, enhancing deployment pipelines, and ensuring robust, secure service delivery across our production environments. BASIC QUALIFICATIONS: Deep experience collaborating with network engineers daily using extensive knowledge of network topologies, physical and logical, and network protocols. Expert knowledge and proven history with designing scalable and reliable software from the ground up that can build and orchestrate tens of thousands of network devices at lightning speeds. Ability to thrive in ambiguity, creating metrics that will help prioritize the focus of the team and your own. COMPENSATION AND BENEFITS: $150,000 - 250,000k Base Base salary is just one part of our total rewards package at SpaceXAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.SpaceXAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Technology

xAI

Sr. Software Engineer (Data Center Automation)

Senior

On-site

Palo Alto, CA

🏢 Summary: Senior Software Engineer role focused on building automation and observability solutions to enhance reliability across multi-data center AI infrastructure. The position combines strong programming skills with hands-on data center and Linux systems expertise to minimize downtime and optimize performance. It involves developing scalable services, improving monitoring and incident response, and collaborating across infrastructure and facility teams. 🗂️ Requirements: Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering or related field (or equivalent experience), 3+ years experience in SRE, Infrastructure, DevOps, or Systems Engineering in large-scale production environments, Strong production experience in Python, Solid Linux systems administration and kernel-level knowledge, Experience with containerization and orchestration (Docker, Kubernetes or similar), Experience implementing observability solutions (metrics, logging, tracing, monitoring, alerting), Understanding of networking fundamentals (TCP/IP, routing, DNS, redundancy), Experience troubleshooting distributed systems, hardware and network issues, Experience with on-call rotations and incident response practices (SLAs, error budgets), Ability to collaborate with cross-functional technical teams 📃 Skills: Python, Rust, Linux, Kubernetes, Docker, Prometheus, Grafana, TCP/IP, DNS, Scripting, Automation, Observability, Monitoring, Tracing, Networking 🏢 Description: ABOUT xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. 
Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. ABOUT THE ROLE: We are seeking a highly skilled Sr. Software Engineer to join our team in managing and enhancing reliability across a multi-data center environment. This role focuses on automating processes, building and implementing robust observability solutions, and ensuring seamless operations for mission-critical AI infrastructure. The ideal candidate will combine strong coding abilities with hands-on data center experience to build scalable reliability services, optimize system performance, and minimize downtime—including close partnership with facility operations to address physical infrastructure impacts. In an era where AI workloads demand near-zero downtime, this position plays a pivotal role in bridging software engineering principles with physical data center realities. By prioritizing automation and observability, team members in this role can reduce mean time to recovery (MTTR) by up to 50% through proactive monitoring and automated remediation. The primary objective of this team is to mitigate downtime and minimize impact to end-users from both scheduled and unscheduled maintenance, as well as events affecting onsite data centers. This is achieved through proactive automation, robust observability, and integrated software-physical reliability strategies. RESPONSIBILITIES: - Design, develop, and deploy scalable code and services (primarily in Python and Rust) to automate reliability workflows, including monitoring, alerting, incident response, and infrastructure provisioning. 
- Implement and maintain observability tools and practices, such as metrics collection, logging, tracing, and dashboards, to provide real-time insights into system health across multiple data centers. - Collaborate with cross-functional teams to identify reliability bottlenecks and automate solutions for fault tolerance, disaster recovery, capacity planning, and physical/environmental risk mitigation. - Troubleshoot and resolve complex issues in data center environments, including hardware failures, environmental anomalies, software bugs, and network-related problems, while adhering to reliability principles like error budgets and SLAs. - Optimize Linux-based systems for performance, security, and reliability, including kernel tuning, container orchestration, and scripting for automation. - Understand network topologies and concepts in large-scale, multi-data center environments to troubleshoot connectivity, routing, redundancy, and performance issues. - Participate in on-call rotations, post-incident reviews (blameless postmortems), and continuous improvement initiatives. - Mentor junior team members and document processes to foster a culture of automation and knowledge sharing. BASIC QUALIFICATIONS: - Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related technical field (or equivalent professional experience). - 3+ years of hands-on experience in site reliability engineering (SRE), infrastructure engineering, DevOps, or systems engineering in large-scale, distributed, or production environments. - Strong programming skills with proven production experience in Python; experience with Rust or another systems-level language (e.g., Go, C++) is essential. - Solid experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments. - Practical knowledge of containerization and orchestration technologies, such as Docker and Kubernetes. 
- Experience implementing observability solutions, including metrics, logging, tracing, monitoring tools, alerting, and dashboards. - Familiarity with troubleshooting complex issues in distributed systems, including software bugs, hardware failures, network problems, and environmental factors. - Understanding of networking fundamentals (TCP/IP, routing, redundancy, DNS) in large-scale or multi-site environments. - Experience participating in on-call rotations, incident response, post-incident reviews, and reliability practices such as error budgets or SLAs. - Ability to collaborate effectively with cross-functional teams. PREFERRED SKILLS AND EXPERIENCE: - 5+ years of experience in SRE or infrastructure roles in hyperscale, cloud, or AI/ML training environments with multi-data center setups. - Hands-on experience operating or scaling Kubernetes clusters at large scale, including automation for provisioning and high availability. - Proficiency in Rust for systems programming and performance-critical components. - Experience integrating software reliability tools with physical data center infrastructure (power, cooling, environmental monitoring). - Experience building automated remediation, fault tolerance, disaster recovery, capacity planning, or predictive failure detection systems. - Background in optimizing Linux-based systems for AI workloads, GPU clusters, or high-throughput compute environments. - Experience with bare-metal provisioning, data center interconnects, or hybrid/multi-site failover mechanisms. - Mentoring experience and strong documentation skills. xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Technology

xAI

Exceptional Software Engineer

Senior

On-site

Seattle, WA

🏢 Summary: This role involves working on the most critical product or technical challenges within a cutting-edge AI organization focused on building truth-seeking AI systems. The position requires direct, hands-on contribution to high-impact engineering problems in a fast-paced, meritocratic environment. Candidates will tackle core technical initiatives aligned with advancing AI capabilities.
🗂️ Requirements: Exceptional software engineering skills, Proven ability to solve complex technical problems, Hands-on experience in building and delivering software systems, Ability to work on high-priority product or technical challenges, Strong technical communication skills
📃 Skills: Software, Engineering, AI, Programming, Systems
🏢 Description: About xAI
xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE: You will work on the most critical product or technical challenge at any given time. You will get clarity on your first project before an offer.
BASIC QUALIFICATIONS: You believe truth-seeking AI is the most important and challenging problem. You take pride in your work and thrive in meritocratic environments. You are an exceptional software engineer.
COMPENSATION AND BENEFITS: $180,000 - $440,000 USD Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Technology

xAI

Exceptional Software Engineer

Senior

On-site

Palo Alto, CA

🏢 Summary: Opportunity for an exceptional software engineer to tackle the most critical product or technical challenges within a high-impact AI environment. The role involves hands-on contribution to core systems and direct influence on cutting-edge AI development.
🗂️ Requirements: Exceptional software engineering ability, Proven experience building and delivering complex software systems, Ability to work on high-priority technical challenges, Hands-on development experience in production environments, Strong technical communication skills
📃 Skills: Programming, Algorithms, DataStructures, Systems, AI, Testing, Debugging
🏢 Description: About xAI
xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE: You will work on the most critical product or technical challenge at any given time. You will get clarity on your first project before an offer.
BASIC QUALIFICATIONS: You believe truth-seeking AI is the most important and challenging problem. You take pride in your work and thrive in meritocratic environments. You are an exceptional software engineer.
COMPENSATION AND BENEFITS: $180,000 - $440,000 USD Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Technology

xAI

Software Engineer - X Core Product

Mid

On-site

Palo Alto, CA

🏢 Summary: Software Engineer role focused on building and scaling a high-volume platform serving 600M+ users, with ownership across backend services, infrastructure, AI integrations, and fullstack features. The position involves designing low-latency distributed systems, collaborating across mobile and web teams, and driving architecture, scalability, and reliability decisions.
🗂️ Requirements: 2+ years of experience with large-scale consumer applications, Proficiency in distributed systems, Experience with high-scale, low-latency environments, Knowledge of Rust, Knowledge of Go, Knowledge of Python, Knowledge of Java, Experience with high-volume streaming systems, Ability to build backend services and APIs, Experience with microservices architecture
📃 Skills: Rust, Go, Python, Java, APIs, Microservices, DistributedSystems, Streaming, iOS, Android, Web, Backend, Fullstack, Analytics, AI
🏢 Description: SpaceXAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE: As a Software Engineer for X Platform, you'll join the thirty-person team responsible for building and scaling X. You will be tasked with independently owning significant parts of the system end-to-end: from intuitive user interfaces to robust backend services, data infrastructure, and deep AI integrations. This is a unique opportunity to make an impact while creating products that are loved by 600M+ users globally.
RESPONSIBILITIES:
- Develop backend services, APIs, and data models to support high-volume, multi-user environments.
- Work with iOS, Android & Web client engineers to ship products.
- Design robust infrastructure and microservices for payments, transactions, growth, monetization, and engagement across platforms.
- Build and maintain fullstack features, including user dashboards, personalized experiences, content delivery, interactive tools, assessments, and real-time analytics.
- Lead architecture, scalability, and reliability decisions for high-concurrency, low-latency systems.
- Uphold engineering excellence via testing, monitoring, deployment, and secure data handling.
BASIC QUALIFICATIONS:
- Proficiency in distributed systems for high-scale, low-latency environments; languages like Rust, Go, Python & Java; and high-volume streaming systems.
- 2+ years of experience working on large-scale consumer applications.
PREFERRED SKILLS AND EXPERIENCE:
- 5+ years of experience working on large-scale consumer applications or early-mid stage startup experience as a founding engineer, emphasizing rapid prototyping, user-centric design, and AI solutions.
COMPENSATION AND BENEFITS:
- $180,000 - $440,000 USD
- Base salary is just one part of the total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
SpaceXAI is an equal opportunity employer. For details on data processing, view the Recruitment Privacy Notice.

Technology

SpaceXAI

Software Engineer - X Core Product

Mid

On-site

Palo Alto, CA

129,996 - 234,996 USD/yr

🏢 Summary: Software Engineer role focused on building and scaling a high-volume consumer platform with end-to-end ownership across frontend, backend, infrastructure, and AI integrations. The position involves developing distributed systems, microservices, analytics, and real-time features for products used by hundreds of millions of users. Candidates should have experience with large-scale applications, low-latency systems, and modern backend technologies.
🗂️ Requirements: 2+ years experience with large-scale consumer applications, Proficiency in distributed systems, Experience with high-scale low-latency environments, Experience with backend services and APIs, Experience with high-volume streaming systems, Strong communication skills, Ability to work in high-concurrency environments
📃 Skills: Rust, Go, Python, Java, APIs, Microservices, iOS, Android, Web, AI, Streaming, Analytics
🏢 Description: SpaceXAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE: As a Software Engineer for X Product/Platform, you'll join the thirty-person team responsible for building and scaling X. You will be tasked with independently owning significant parts of the system end-to-end: from intuitive user interfaces to robust backend services, data infrastructure, and deep AI integrations. This is a unique opportunity to make an impact while creating products that are loved by 600M+ users globally.
RESPONSIBILITIES:
- Develop backend services, APIs, and data models to support high-volume, multi-user environments.
- Work with iOS, Android & Web client engineers to ship products.
- Design robust infrastructure and microservices for payments, transactions, growth, monetization, and engagement across platforms.
- Build and maintain fullstack features, including user dashboards, personalized experiences, content delivery, interactive tools, assessments, and real-time analytics.
- Lead architecture, scalability, and reliability decisions for high-concurrency, low-latency systems.
- Uphold engineering excellence via testing, monitoring, deployment, and secure data handling.
BASIC QUALIFICATIONS:
- Proficiency in distributed systems for high-scale, low-latency environments; languages like Rust, Go, Python & Java; and high-volume streaming systems.
- 2+ years of experience working on large-scale consumer applications.
PREFERRED SKILLS AND EXPERIENCE:
- 5+ years of experience working on large-scale consumer applications or early-mid stage startup experience as a founding engineer, emphasizing rapid prototyping, user-centric design, and AI solutions.
COMPENSATION AND BENEFITS:
- $180,000 - $440,000 USD
- Base salary is just one part of the total rewards package, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
SpaceXAI is an equal opportunity employer. For details on data processing, view the Recruitment Privacy Notice.

xAI

xAI is an artificial intelligence company focused on developing advanced AI systems designed to accurately understand the universe and support humanity’s pursuit of knowledge. Operating within the AI and technology industry, the company emphasizes engineering excellence, innovation, and scientific curiosity. xAI maintains a small, highly motivated team and a flat organizational structure, encouraging hands-on contribution, initiative, and direct impact from all members. The organization values strong communication, rigorous work ethic, and a commitment to continuous learning, fostering a culture where leadership is earned through demonstrated excellence and proactive problem-solving.
