Remote Job Description

The Compute and Storage Infrastructure (CSI) team provides foundational infrastructure services to other engineering teams at Netflix. We create solutions with high leverage that multiply the productivity of other teams. Our platforms act as an enabler for media processing teams. 

One of our main products is Stratum, a large-scale, next-generation serverless function platform designed to handle media-specific computational tasks. Stratum is the foundation of the Cosmos platform for media processing. Our team’s products are critical to Netflix: every video in the Netflix catalog has been processed by Stratum or one of its predecessors. Stratum is the primary compute platform for most engineers at Netflix in the media processing space. Stratum uses a complementary product that we also develop, MezzFS, a FUSE-based solution for efficiently accessing large files in S3. We also develop Nirvana, an observability solution for media processing workloads that run on top of Stratum. Because our products run at such high scale, even minor improvements to their efficiency or developer experience have an enormous impact.

CSI is part of a media-focused engineering group that provides highly available infrastructure for content production and processing across all Netflix productions and licensed content. Key services we build include massive-scale media processing platforms, workflows (Conductor), media asset management, collaboration, reporting, data movement, and data processing. All of this is custom-built on top of Amazon Web Services (AWS) infrastructure.

About the role

As an engineer on CSI, you will help us build and grow innovative solutions in the media compute space. You’ll work on resource scheduling in a distributed polyglot compute platform running at massive scale. You’ll gain exposure to building observable, efficient, highly available, and fault-tolerant systems. In this role you will have the opportunity to drive direction, own development end-to-end, manage stakeholder relationships, provide actionable feedback and insights to colleagues, and create technical solutions at scale.

About You

You are self-motivated and can work independently, while also being able to partner closely with other engineers on a project. You are passionate about building quality products and want to own development and operations end-to-end, leading with the right architecture and following sound engineering principles to deliver a maintainable, performant, and predictable experience. You are a problem solver and like to challenge yourself, but you are not afraid to reach out when you need help, and you enjoy helping other engineers.

Strongly Preferred Skills:

  • Experience operating production systems to a high degree of operational excellence, e.g. as an SRE or developer with a strong ops focus
  • A demonstrated passion for developer experience and developer productivity
  • A background in distributed systems

Nice to haves:

  • Experience evangelizing new platforms and driving adoption of your team’s tooling. It’s a bonus if you’re a developer or power user of batch compute, PaaS or FaaS solutions or other dev tooling where developer experience is a priority.
  • Python experience
  • Some level of full-stack experience and willingness to do full-stack work, even if not an expert in UI
  • Experience in roles where high customer empathy was required