Remote Job Description

The Netflix Open Connect Content Delivery Network is our in-house, custom-built network and server infrastructure responsible for streaming all of your favorite movies and series. We strive to deliver a great Netflix viewing experience in over 190 countries so our customers can watch whatever, whenever, interruption-free. We are seeking seasoned Reliability Engineers with extensive experience in *nix, networking, data analysis and large-scale service operations to design, scale, operate, automate, and analyze our globally distributed CDN. Come join us and play a meaningful role in our journey to entertain the world!


Spotlight on CDN Reliability Engineering Teams in Open Connect:
The Open Connect CDN Reliability Engineering Team is responsible for the end-to-end operations, availability, reliability, scalability, and quality of experience delivered by Netflix’s Open Connect Services. We work with Netflix engineering teams and external partners to find ways to improve the design and operation of the OC services, making them more scalable, reliable, efficient and secure.

Responsibilities:

  • Drive continual improvement in resilience, quality of experience, security, monitoring, instrumentation and automation, with the primary goal of maintaining highly scalable and reliable CDN services worldwide
  • Aggregate, analyze, and correlate large amounts of server and application performance data. Use the innovative Netflix Big Data platform as a highly flexible, specialized and efficient toolset for service delivery optimization and system reliability improvements
  • Provide technical design, deployment and engineering assistance to ISP partners to integrate our Open Connect Appliances
  • Handle Tier 3 escalation for service delivery production issues
  • Have lots of discussions about all the great content and your favorite movies and series

Qualifications:

  • Service reliability and operational experience running large-scale, high-performance systems and Internet services
  • Knowledge of and proven experience with CDNs and HTTP cache/proxy technologies
  • Expert-level knowledge of Unix or Linux system administration at scale. We happen to use FreeBSD
  • Strong scripting and automation skills (Python, Perl, Go)
  • Knowledge of networking concepts and application protocols, especially TCP/IP, BGP, HTTP/S and DNS
  • Some experience with container and container orchestration technologies (Docker, Kubernetes)
  • Ability to work in a highly collaborative environment and to communicate effectively with internal and external partners
  • Preferred - BS in Computer Science, Electrical Engineering or Computer Engineering (or equivalent professional experience)

Highlighted Roles:

  • CDN Site Reliability Engineer - Quality of Experience (QoE), Resilience and Security
  • CDN Site Reliability Engineer - Platform and Automation


CDN Site Reliability Engineer - Quality of Experience (QoE), Resilience and Security

  • Focus on instrumenting, monitoring, reporting on, and enhancing our product experience
  • Analyze dependencies, validate service behavior during failure, and improve the resilience of our distributed services and applications
  • Contribute to ongoing security initiatives related to server and infrastructure security, monitoring, and DDoS mitigation

CDN Site Reliability Engineer - Platform and Automation

  • Influence innovation through rapid automation, all while maintaining high reliability
  • Support infrastructure at scale through operational initiatives, maintenance, upgrades, and/or repairs
  • Analyze operational insights, identify issues and opportunities, and work cross-functionally to address them