
Senior Site Reliability Engineer
Airalo
Posted 3/11/2025

Job Summary
Airalo is seeking a full-time Site Reliability Engineer to join its remote-first team. As a Site Reliability Engineer, you will be responsible for developing and maintaining reliable, scalable, and efficient systems. You will define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs), conduct blameless post-incident reviews, drive the automation of operational tasks, and mitigate operational risks. The ideal candidate has 5+ years of experience as a Site Reliability Engineer or in a similar role, with strong knowledge of AWS services, container orchestration, Kubernetes, observability principles, and incident management. Airalo values diversity, equity, and inclusion, and offers benefits such as health insurance, a work-from-anywhere stipend, annual wellness and learning credits, and an all-expenses-paid company retreat. If you are passionate about building and maintaining highly reliable systems, we would love to hear from you!
Job Description
Responsibilities include, but are not limited to:
- Develop and maintain reliable, scalable, and efficient systems.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and improve system reliability.
- Conduct blameless post-incident reviews to identify root causes and implement preventive measures.
- Drive automation of operational tasks and incident response.
- Develop and maintain runbooks and playbooks for common operational tasks and incident response.
- Mitigate operational risks.
- Work with software engineers to design systems for reliability, scalability, and maintainability.
- Continuously evaluate and optimize system performance, capacity, and cost.
- Participate in on-call rotation and be available to troubleshoot and resolve critical issues.
Must-haves:
- Bachelor’s degree in Computer Engineering or a similar discipline.
- 5+ years of experience as a Site Reliability Engineer or in a similar role.
- 3+ years of experience with AWS services, including strong knowledge of container orchestration.
- 2+ years of Kubernetes experience.
- Deep understanding of observability principles and tools (logging, monitoring, tracing).
- Experience with incident management and postmortem analysis.
- Experience with, and interest in, the infrastructure-as-code approach (Terraform).
- Experience with chaos engineering and other techniques for testing system resilience.
- Experience with CI/CD tools such as GitHub Actions.
- Proficiency in at least one programming language (Python, Go, Java, etc.) for automation and tooling.
- Comfortable with messaging systems (SNS, SQS, etc.).
- Ability to work independently and collaboratively in a fast-paced environment.
- Team player and open to new ideas.
- Good communication skills and fluency in English.
Good to have:
- Prior experience with Scrum and other agile methods.
- Certification in relevant areas such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or similar.
- Experience with AI-driven SRE tools for anomaly detection and system improvements.
- Contributions to open-source SRE projects or communities.
- Prior work experience in telecommunications.
- Knowledge of eSIM and GSMA-related technologies and services.