Job Overview
We are looking for a Staff Engineer to help us scale Datadog’s Alerting Platform, which is responsible for the core systems that define and schedule monitors, create alerts, and ensure the accuracy and timeliness of the end-to-end alerting process across the platform.
This is a unique opportunity to contribute to one of the most critical platforms at Datadog. Customers can configure monitors and generate alerts for virtually every product in our unified platform. It’s imperative that we maintain our customers’ trust by delivering these notifications reliably. In practice, this means the alerting platform has to be the most reliable platform.
As we grow, we have to design systems that can degrade gracefully, without breaking, while still ensuring the best customer experience. This Staff Engineer will focus on two critical components: the alerting scheduler, responsible for scheduling the timely evaluation of millions of monitors each minute, and the state processor, which makes the critical decision about when a transition in monitor state has occurred. These distributed systems are tied together, one (the state processor) being the consumer of the other (the scheduler). The reliability and fault tolerance of these systems together, and across the entire alerting platform, is critical to Datadog’s customer trust and business success. Upcoming initiatives to achieve an order of magnitude increase in reliability will require deep changes to these complex systems.
At Datadog, we place value in our office culture – the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.
What You’ll Do:
Design and drive high priority, high visibility projects that increase the platform’s resilience and scalability across multiple teams
Lead and guide others through architectural decisions for new and existing distributed, high-throughput, real-time systems
Identify potential system risks and trends in reliability, and design solutions to address them
Provide input on prioritization of engineering-led initiatives in short- and long-term planning and roadmaps
Collaborate closely with partner platforms that integrate and depend on the alerting platform to provide critical capabilities to their customers
Who You Are:
You have led cross-team initiatives in a platform or infrastructure-focused environment for 2+ years
You have led impactful technical initiatives in an environment where performance, reliability, and accuracy are first-order concerns
You have a reliability-oriented mindset and care deeply about designing and building resilient architectures
You have significant backend programming experience and have architected, built, and operated distributed systems to solve problems at high scale
Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That’s okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.
Benefits and Growth:
New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
Continuous professional development, product training, and career pathing
Intradepartmental mentor and buddy program for in-house networking
An inclusive company culture, ability to join our Community Guilds (Datadog employee resource groups)
Access to Inclusion Talks, our internal panel discussions
Free, global mental health benefits for employees and dependents age 6+
Competitive global benefits
Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.
About Datadog:
Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers’ entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional development, diversity of thought, innovation, and work excellence to empower continuous growth. Join the pack and become part of a collaborative, pragmatic, and thoughtful people-first community where we solve tough problems, take smart risks, and celebrate one another. Learn more about #DatadogLife on Instagram, LinkedIn, and Datadog Learning Center.
Equal Opportunity at Datadog:
Datadog is an Affirmative Action and Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. Here are our Candidate Legal Notices for your reference.
Your Privacy:
Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.