Performance Machine Learning Engineer

Replicate

San Francisco Bay Area
Post Date: October 6, 2024

Job Overview

You’re an engineer who lives and breathes high-performance machine learning. You have a deep understanding of how to make AI models run faster and more efficiently, and you’re excited about pushing the boundaries of what’s possible with current hardware.
At Replicate, we’re building the fastest way to deploy machine learning models. Your role will be crucial in optimizing the performance of the diverse range of models we host, ensuring they run as efficiently as possible on our infrastructure.
We’re looking for the right person, not just someone who checks boxes, so you don’t need to satisfy all of these things. But, you might have some of these qualities:

Strong applied engineering skills. You’ve deployed machine learning models in scaled-up production environments and know the challenges that come with it.
Deep expertise in CUDA programming and GPU acceleration techniques. You can write custom kernels in your sleep.
Proficiency in C++ and Python. You’re comfortable diving deep into low-level optimizations and high-level model architectures alike.
Extensive experience with deep learning frameworks like Torch or JAX. You know their strengths, weaknesses, and how to squeeze every ounce of performance out of them.
A solid grasp of machine learning algorithms, especially with a focus on diffusion models, large language models, or other generative AI techniques.
Familiarity with model quantization techniques, distillation, model pruning, etc. You understand the tradeoffs and know when to apply which technique.
You stay up-to-date with the latest developments in ML performance optimization. When a new technique drops, you’re already thinking about how to implement it.

You might be particularly good for this job if:

You’ve written custom CUDA kernels to significantly improve model latency and can share war stories about the process.
You can discuss the tradeoffs between fp8 and int8 quantization in depth, and have applied either (or both) to whatever hot new model dropped last week.
You get excited about diving into academic papers on ML optimization techniques and turning them into practical, production-ready code.


