Senior / Principal Infrastructure Engineer - ML Platform

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.

At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and we’re looking for amazing talent to help us get there.

A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

ML Platform @ Roblox today supports hundreds of ML use cases and billions of inferences per day across Discovery, Safety, Engine, and much more. As an Infrastructure Engineer on ML Platform, you will be responsible for bootstrapping, scaling, and maintaining infrastructure for the entire ML platform. We are looking for accomplished engineers to help build the next generation of ML Ecosystem Tooling.

You Will:

  • Bootstrap and maintain Kubernetes and Cloud infrastructure for ML Platform components: Serving Layer, Metadata Store, Model Registry, and Pipeline Orchestrator.
  • Set technical strategy and oversee development of high-scale, reliable infrastructure systems.
  • Propose and implement new platform tooling to improve time to production for MLEs and Data Scientists across the full ML lifecycle.
  • Work on infrastructure projects such as GPU fleet management, hybrid-cloud orchestration, and writing custom Kubernetes controllers and resources.
  • Stay abreast of industry trends in machine learning and infrastructure to ensure the adoption of leading-edge technologies and practices.
  • Partner across organizations to build tooling, interfaces, and visualizations that make ML@Roblox a delight to use.

You Have:

  • 4+ years of professional experience and a tool chest of system design experience upon which to draw to build scalable, reliable platforms for all of Roblox.
  • Experience running and managing Kubernetes at scale (e.g., 100s to 1000s of nodes, serving 100k+ QPS), ideally including writing your own Kubernetes controllers.
  • Bachelor's degree in Computer Science, Computer Engineering, Data Science, or a similar technical field.

You Are:

  • Proficient in DevOps tooling such as Docker, Kubernetes, and CI/CD systems, and in bootstrapping cloud infrastructure (AWS, GCP, etc.).
  • Experienced with the end-to-end ML model lifecycle, including model serving, training, model CI/CD, and GPU resource management, with a track record of building ML platform features that are delightful to use.
  • An automation advocate: you're passionate about infrastructure-as-code and automating painful manual processes.
  • A reliability nut: you love digging into tricky postmortems and identifying weaknesses in complicated systems.
  • Passionate about supporting internal partners (data scientists and ML Engineers) to meet and understand their needs.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits.

Annual Salary Range
$233,840–$283,780 USD

Roles that are based in our San Mateo, CA Headquarters are in-office Tuesday, Wednesday, and Thursday, with optional in-office on Monday and Friday (unless otherwise noted).

You’ll Love: 

  • Industry-leading compensation package
  • Excellent medical, dental, and vision coverage
  • A rewarding 401k program
  • Flexible vacation policy (varies by exemption status)
  • Roflex - Flexible and supportive work policy 
  • Roblox Admin badge for your avatar
  • At Roblox HQ: 
    • Free catered lunches five times a week and several fully stocked kitchens with unlimited snacks
    • Onsite fitness center and fitness program credit
    • Annual Caltrain Go Pass

Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. Roblox also provides reasonable accommodations for all candidates during the interview process.