Senior Software Engineer - Infrastructure
Coursera
Job Summary:
We are seeking a Senior Software Engineer to design, implement, and maintain our infrastructure on AWS. The ideal candidate will have 5+ years of experience in SRE, Infrastructure, or DevOps roles with a focus on AWS. They should be proficient in at least one programming language, Docker, and infrastructure automation tools like Terraform. Strong communication and collaboration skills are required, along with excellent problem-solving and analytical skills. The successful candidate will work independently and as part of a team to ensure the reliability, performance, and scalability of our Coursera Labs applications and services.
Job Overview:
As a Senior Software Engineer on our team, you will play a critical role in designing, implementing, and maintaining our highly available, scalable, and fault-tolerant infrastructure on AWS. You will be part of the Hands-on-Learning software engineering team, based in North America, and will help ensure the reliability, performance, and scalability of our Coursera Labs applications and services. This position requires a strong sense of ownership, technical expertise, communication skills, and the ability to work both independently and in collaboration with engineers in a different time zone.
Responsibilities:
Architect solutions to scale up and maintain a system already running thousands of on-demand student Docker containers concurrently from over 1 TB of course Lab images.
Manage services, networks, storage, deployment, security, and monitoring in AWS.
Keep disaster recovery components ready for use and participate in disaster simulations.
Tune Linux instances to maximize performance and stability while minimizing hosting costs.
Design processes to automate software updates.
Serve on call to analyze failures, create technically detailed JIRA tickets, and restore production systems.
Assist with maintaining environments for software development and QA.
Work with other engineers on the team to improve software performance, stability, and diagnostics collection.
Automate deployment, testing, and configuration management using tools like Jenkins.
Monitor for trends in usage that will require hosting/instance/pricing adjustments.
Stay up to date with emerging technologies and industry trends to drive continuous improvement of our infrastructure and processes.
Basic Qualifications:
5+ years of experience working in SRE, Infrastructure, or DevOps roles, with a focus on AWS.
Deep understanding of AWS services such as EC2, CloudFormation, and CodeDeploy.
Proficiency in at least one programming language (Python, Go, Java, etc.).
Deep knowledge of Docker.
Strong experience with infrastructure automation tools such as Terraform.
Experience with technical diagnostics at the application, Linux system, and cloud levels.
Excellent communication and collaboration skills.
Strong problem-solving and analytical skills, with the ability to work independently and as part of a team.