As a Software Engineer on the Query teams, you will help shape the technical vision for how we query data. You will operate the query engine inside the events platform and build distributed, high-throughput, low-latency systems with a strong focus on availability, resilience, and durability. Working closely with our product managers, you will build new features and scale our systems to keep up with the demands of our growing business.
Datadog's Distributed Systems teams build and run the intake, storage, and query engines behind every dashboard and alert for Datadog's customers. The organization is split into three major groups: the Metrics group, which processes time-series data; the Events group, which processes event data; and the Resources group, which processes high-cardinality metadata.
Datadog is growing rapidly, and our Distributed Systems teams are at the core of that growth: today we ingest over 10 trillion points a day and make each of them available for query within seconds. These teams build a platform that runs in the cloud, is always on, and delivers low latency and high throughput.
The Query teams are responsible for the query API that sits on top of the Metrics, Events, and Resources platforms. It allows our frontend and public API to run complex queries on our data. At its core is an engine that parses, plans, and distributes these queries, then aggregates the partial results into final ones. The team is responsible for evolving these production systems to support more powerful queries that run across distributed storage layers.
- Code new and existing services (in Go, Rust, or Java) to scale out our events platform pipelines
- Contribute to the design of the query API architecture and surrounding systems
- Debug and solve challenging cross-system issues in production
- Help improve our engineering tooling and practices
- You have been building applications for 4+ years and know the systems you’ve worked on from top to bottom
- You have backend programming experience
- You have architected, built, and operated distributed systems to solve problems at high scale
- You have a BS/MS/PhD in a scientific field or equivalent experience
- You want to work in a fast-paced, high-growth startup environment that respects its engineers and customers
- You've worked at high scale with systems like Akka, Redis or Kafka
- You’ve written your own data pipelines before
- You have a strong background in statistics
- You have significant experience with Go, Rust, or a JVM-based language
This is a remote position
Datadog is the monitoring and security platform for cloud applications. Our SaaS product is used by organizations of all sizes across a wide range of industries to enable digital transformation, cloud migration, and monitoring of our customers' entire technology stack, allowing for seamless collaboration and problem-solving among Dev, Ops, and Security teams globally. Given the resilience of cloud technologies and the importance placed today on digital operations and agility, Datadog continues to innovate and is well positioned for the long term.