Site Reliability Engineer, Lead

Posted a month ago

Sheffield, South Yorkshire
Expires In 2 months

About the Company:
Our client is a remote-first company with team members across the globe, offering a SaaS-based Learning Management System that powers the world's leading education programs. They help large brands and fast-moving companies increase revenue, improve customer retention, and decrease support costs through external education. The platform includes all the tools an organization needs to create, manage, track, and improve highly personalized learning experiences for customers, partners, and employees.
Successful Candidate:
The ideal candidate for this role should have:
SaaS experience
Experience in, and the ability to thrive in, a small to medium-sized, high-growth environment
Interest in upskilling and learning new technologies
Curiosity, creativity, and innovation
Flexibility in working hours and ability to collaborate across different time zones
Role Overview:
The Lead Site Reliability Engineer plays a crucial role in guiding the Platform Team to achieve exceptional standards of reliability, performance, and stability across all applications. This role involves defining and implementing industry-leading practices to shape the strategic direction of platform operations and establish benchmarks for engineering excellence.
Responsibilities:
Lead the SRE Team, setting clear goals and priorities aligned with business objectives.
Collaborate with the department Director to develop and execute strategies that enhance technological capabilities.
Ensure platforms and systems operate smoothly, remaining highly available, scalable, and fault-tolerant.
Implement best practices for continuous monitoring, preventive maintenance, and rapid response.
Assess system performance, identify bottlenecks, and make data-driven infrastructure recommendations.
Educate engineering teams on best practices for coding and application performance.
Develop and refine incident management protocols and lead efforts to resolve high-impact issues.
Work closely with other teams to ensure platform initiatives align with company goals.
Participate in a 24x7 on-call rotation to respond to alerts and monitor virtual infrastructure.
Requirements:
8+ years of software engineering experience
5+ years working with Ruby on Rails
Proven experience leading SRE teams
3+ years in infrastructure and operations
Expertise with SQL databases like PostgreSQL
Experience with cloud computing on AWS and/or Google Cloud
Ability to analyze unfamiliar code bases, document solutions, and train operational teams
Comfort working in a team-oriented and collaborative environment
Clear communication, proactive help-seeking, and task ownership
Desired Experience:
Developing solutions using server automation tools like Ansible
Writing and maintaining CI/CD pipelines and services
Education:
Bachelor’s degree in Computer Science or related technical field