Principal Site Reliability Engineer

Posted 2 months ago

  • London, Greater London
  • Any
  • External
  • Expires in a month
BPP Education is entering a new phase of growth and evolution, attracting thousands more students each year and expanding into new verticals and new markets globally. The BPP Product & Technology (P&T) organisation is evolving rapidly and driving the transformation of its platforms, digital products and experiences to help BPP Education scale and meet the growth of the business in the coming years.
We’re looking for a talented Principal Site Reliability Engineer (SRE) to help us build best-in-class products and deliver scalable, secure and performant experiences that delight and engage learners during their time studying with BPP and beyond, throughout their working lives.
As the Principal Site Reliability Engineer, you will report to the Engineering Manager, bringing your technical expertise to our growing product engineering teams and leveraging modern software development practices to deliver business value at pace.
You will be accountable for designing, implementing and maintaining systems that ensure the reliability, scalability and availability of our software products.
This role is key as we transform BPP Education to become more customer-centred and more design- and data-informed, building products that meet and exceed our users’ needs across our education ecosystem.
Key Responsibilities:
  • Take accountability for executing the technical vision and ensure it is aligned with business goals.
  • Coach and mentor SREs across the business in designing and implementing systems that ensure the reliability and availability of software products.
  • Create and execute a strategy for monitoring, alerting and automation tools that improve system reliability, scalability and stability.
  • Lead incident management response in production systems.
  • Analyse system metrics and logs to identify opportunities for improvement and prevent future or recurring incidents.
  • Collaborate with your peers in architecture, product, design, data and security to identify and mitigate risks to system reliability.
  • Contribute to and evolve the internal software engineering practices and standards as the team scales.
  • Stay up to date with industry best practices, new technologies and emerging trends.
Essential Skills:
  • Proven experience in a similar software engineering or SRE role working in an agile environment.
  • Deep knowledge of cloud networking, security and native functionality in AWS.
  • Expertise in Infrastructure as Code (IaC) using frameworks such as Terraform.
  • Expertise in monitoring and automation tools such as New Relic, Datadog and GitHub Actions.
  • Strong background in software development, architecture or operations.
  • Proficient knowledge of modern full-stack technologies such as TypeScript, React, Node.js and Next.js.
  • Expert knowledge of relational and non-relational database technologies such as RDS, DynamoDB and Redis.
  • Experience coaching and mentoring a diverse group of engineers.
  • Excellent verbal and written communication skills.
Core Skills: AWS, Terraform, Datadog, TypeScript, React
Other Skills: Cloud Security, GitHub, DynamoDB, Redis
Seniority: Lead