August 30, 2025

Senior Applied Research Scientist, Multimodal Retrieval

Senior • Hybrid • On-site • Remote

$224,000 - $356,500/yr

Santa Clara, CA

NVIDIA’s Retriever team is seeking a Senior Applied Research Scientist with experience researching, developing, and deploying deep learning models at scale across a range of modalities. You’ll join a team of Applied Research Scientists, Machine Learning and MLOps Engineers working on the next generation of retrieval pipelines for RAG, with a focus on the ingestion of modalities beyond text.

At NVIDIA we’re building the framework upon which production RAG systems are based. We have contributed to top research models in the text embedding space, topping the MTEB and Vidore V1/V2 leaderboards, and have developed commercially viable versions of these models for use in production systems by our customers. Come be a part of our world-class team building the future of Retrieval.

What you’ll be doing:

Working with our team of researchers to develop efficient and performant models and pipelines that extract text content from images, video, audio and other modalities.
Building vision pipelines for document ingestion, including page layout analysis, object detection, and OCR.
Exploring and crafting datasets, metrics, experiments, and validation scripts to develop standard methodologies for research. These methodologies will offer customers clear guidance on which models and pipelines to apply in specific contexts.
Helping ML Engineers scale pipelines to production capability through the development of NVIDIA Inference Microservices (NIMs) and blueprints which demonstrate how to deploy NIMs in a pipeline effectively.
Writing papers, blog posts, documentation and training materials that help customers understand and take advantage of our research.
Keeping up to date with the latest developments in Retrieval across academia and industry.

What we need to see:

Candidates with a Master's, Ph.D. or equivalent experience in retrieval or multimodal research are preferred, along with a track record of publication in leading conferences like CVPR, ICCV, ECCV, KDD, etc.
Hands-on experience developing computer vision models and pipelines, with preference for document-focused tasks such as layout analysis, table or figure detection, and OCR. Competitive results in computer vision competitions on Kaggle or similar platforms are a plus.
An understanding of the state of the art in retrieval research, with a focus on multimodal content retrieval.
10+ years of experience developing multimodal systems across a range of models and platforms. Information retrieval experience is a big plus.
Knowledge of best practices in batching, streaming, and scaling of ingestion pipelines to support real-world applications.
Excellent Python programming skills and a strong understanding of the Python deep learning ecosystem (PyTorch, TensorFlow, MXNet, etc.).
An ability to share and communicate your ideas clearly through blog posts, papers, kernels, GitHub, etc.
Strong communication and interpersonal skills are essential, as well as the capability to collaborate within a dynamic, distributed team. A history of mentoring junior engineers and interns is a plus.

Location is flexible: the team works remotely, with a focus on NA/EU time zones.

GPU computing is the most productive and pervasive platform for deep learning and AI. It begins with the most advanced GPUs and the systems and software we build on top of them. We integrate and optimize every deep learning framework. We work with most major technology providers and support a broad range of Fortune 500 companies in their machine and deep learning needs. With deep learning, we can teach AI to do almost anything. New internet services, like Google Assistant, have learned speech from sound and provide a more natural way to access information. Self-driving cars use deep learning to recognize the space the car inhabits, the lanes in which it drives, and the objects to avoid. In healthcare, neural networks trained with millions of medical images can find clues in MRIs that until now could only be found through invasive biopsies. In recommendation systems, we learn how to understand users' desires and serve them what they truly are looking for. These are just a few examples. AI will spur a wave of social progress unmatched since the Industrial Revolution.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until September 2, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Nvidia

Founded in 1993 by Jen-Hsun Huang, Chris Malachowsky, and Curtis Priem, NVIDIA Corporation has carved out a leading position in the technology industry. Based in Santa Clara, California, NVIDIA is renowned for its GeForce series of GPUs, which cater to both gaming and professional applications. The company's innovative graphics processing units are integral to various sectors, from gaming to machine learning and data centers. As a frontrunner in the semiconductor industry, NVIDIA continues to leverage emerging technologies like AI and machine learning to stay ahead of the curve.