Checkly is looking for an experienced Site Reliability Engineer. This is a great opportunity to join an early-stage company, influence the product roadmap and help us do what we love most: building the best monitoring platform for developers.

Make our reliability product more reliable

Checkly is — in essence — a reliability company. People trust our software to alert them when their software goes “poof”. We use AWS Lambda/SQS/SNS/S3, Heroku, Postgres, Redis and soon ClickHouse to make this happen, from 20+ locations around the world.
Build and shape our SRE practices
You will play a key role in defining how to “do reliability”. Together with your coworkers in the product engineering teams, you will be responsible for:
  • Improving the observability of our backend platform: identifying bottlenecks, tracking them and fixing them.
  • Optimizing our performance and reducing error rates, from wild queries to slow queues to Heisenbugs.
  • Streamlining our on-call process and improving our runbooks.
  • Working with the product folks to bake reliability into everything we do: defining SLOs and SLAs and enforcing them.
About you
  • You have deep experience operating and troubleshooting mission-critical SaaS environments as an SRE.
  • You have hands-on experience with AWS, SQL and OLAP databases, and Node.js.
  • You enjoy working in a growing company with experienced founders.
  • You communicate clearly in English with coworkers and customers.
  • You pick up new stuff quickly and enjoy the process of learning new things.
  • You love making software!
Bonus points
  • Experience with building SaaS tools for developers.
  • Obsessed with browser automation in the cloud.
What we offer
  • Competitive salary
  • Working hours are flexible and we support families: you can pick up your kids without worrying about work.
  • An open, healthy workplace where we can all enjoy our work and grow
  • Work with the latest technologies
  • Modern laptop and equipment provided