September 17, 2025
Senior • Hybrid • On-site • Remote
$184,000 - $287,500/yr
Santa Clara, CA
Join our NVIDIA Solutions Architecture team to support the Enterprise Products team as a Senior Network Engineer, where your passion and expertise in networking, compute hardware, storage, and cloud-native software will be pivotal. We are on the lookout for a multifaceted professional with a profound understanding of large network design, distributed systems, and datacenter architecture. As a key member of our team, you will engage in a collaborative, multi-disciplinary approach to craft scalable datacenter implementations for enterprise-grade AI systems. A key aspect of your role will also be automating repetitive networking activities.
Your role involves translating high-level goals into detailed specifications for platforms and datacenter architectures, leading to the development of robust implementations. Collaborating with other specialists in compute hardware, software, and storage domains, you will be developing, validating, and profiling reference cluster designs specifically tailored for enterprise datacenter environments. At the core of our mission is the building and validation of on-prem cloud-ready solutions that seamlessly interoperate across various cloud service providers (CSPs), enabling the realization of hybrid enterprise AI solutions. If you are ready to embark on a journey that combines innovation, collaboration, and groundbreaking technology, then this opportunity is tailored for you.
What you'll be doing:
Own the deployment of scalable datacenter networking for enterprise AI/ML systems.
Deploy and validate cluster designs, optimizing them for enterprise facilities.
Collaborate closely with other experts in network, compute, software, and storage to drive innovation.
Lead multi-disciplinary projects, addressing high-level goals and complex challenges.
Engineer on-premises cloud-native solutions that flawlessly integrate with diverse cloud providers.
Assume a pivotal role for the compute and hardware architecture domain, driving expertise and excellence.
Showcase a multidisciplinary understanding of Ethernet, InfiniBand, data center LAN (local area networking), WAN (wide area networking), and SD (software-defined) networks.
Conduct TCO analysis, optimizing datacenter efficiency for cost-effectiveness.
Find opportunities for operational improvements and collaborate with teams to build solutions that improve excellence and sustainability in network operations.
What we need to see:
Bachelor's degree or equivalent experience, with 8-10+ years in hardware or infrastructure architecture.
Proven expertise in designing and deploying on-prem cloud-native platforms, with deep understanding of scaling and resilience at chassis, rack, cluster, and data center levels.
In-depth knowledge of networking protocols and technologies including Ethernet, TCP/IP, VLAN, VXLAN, BGP, EVPN, MPLS, QoS, and InfiniBand. Skilled in evaluating, designing, and optimizing complex network architectures for performance, security, and resilience.
Extensive experience with optical networking and cabling, fiber types, and transceiver modules (SFP/SFP+, QSFP, OSFP), including their signal modulation, FEC, and compatibility with multiple switch platforms and software configurations.
Strong grasp of cloud-native systems with emphasis on high availability, scalability, and security in compute environments. Demonstrated system-level thinking to enhance reference designs.
Hands-on experience with infrastructure as code and monitoring tools: Base Command Manager (BCM), Ansible, Terraform, Grafana, Prometheus.
Proficient with Linux (including Cumulus Linux) and scripting languages such as Python and Bash.
Familiarity with NVIDIA networking products including Mellanox switches, Cumulus Linux, BlueField DPUs, and InfiniBand technologies.
Demonstrated leadership in cluster design, especially in networking, security, and remote access management. Experienced in working independently and with distributed teams across time zones. Collaborates closely with SMEs to ensure swift production issue resolution and maintain customer satisfaction.
Strong written and verbal skills for effectively communicating complex technical concepts to diverse audiences. Capable of creating clear documentation including Methods of Procedure (MoPs) and deployment guides.
Ways to stand out from the crowd:
Broad experience across Networking, Compute, Storage, and Platform Sizing, with a focus on Infrastructure Cost Optimization and TCO analysis for datacenter environments.
Strong understanding of network topologies, load balancing, and congestion control algorithms; experienced in both practical and standards-based approaches, including engagement with open-source communities.
Proficient in Python with a personal GitHub showcasing relevant projects. Skilled in Kubernetes, Docker, and performance monitoring tools such as Grafana, Prometheus, and Datadog.
Hands-on experience with networking simulators including NVIDIA Air, GNS3, and EVE-NG, for digital twin and virtual network testing.
Strong collaboration and communication skills, with an accountable work ethic and a passion for achieving goals.
You will also be eligible for equity and benefits.