Principal Site Reliability Engineer (SRE)

Posted 25 days ago

London, Greater London
Any
External
Expires In 2 months

Location: London, Warwick Court
Time type: Full time
Posted: 20 days ago
Job requisition ID: 70771
There is a place for you at T. Rowe Price to grow, contribute, learn, and make a difference.
We are a premier asset manager focused on delivering global investment management excellence and retirement services that investors can rely on today and in the future.
The work we do matters. We invite you to explore the opportunity to join us and grow your career with us.
Job Title: Principal Site Reliability Engineer (SRE)
Department: CDO Technology Group
Summary:
We are seeking a highly motivated and experienced Principal Site Reliability Engineer (SRE) to join the CDO Technology leadership team to stand up and lead the SRE function within CDO Technology. In this role, you will be responsible for ensuring the availability, latency, performance, efficiency, and stability of our critical infrastructure, which supports a range of data platforms, applications, and services. You will collaborate closely with development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards.
Responsibilities:
Availability:
Proactively monitor our systems and identify potential issues that could impact their availability.
Implement and maintain automated alerting mechanisms to notify the appropriate parties of potential outages or performance degradation.
Collaborate with development teams to design and implement solutions that enhance system resilience and reduce downtime.
Latency:
Analyze performance metrics to identify and resolve latency bottlenecks in our infrastructure.
Implement performance optimization techniques and tools to improve the overall responsiveness of our systems.
Work with development teams to ensure that new features and code changes do not introduce performance regressions.
Performance:
Develop and maintain metrics dashboards to track key performance indicators (KPIs) for our critical systems.
Identify performance trends and anomalies that may indicate potential issues or areas for improvement.
Recommend and implement performance optimization strategies to enhance the overall efficiency of our systems.
Efficiency:
Optimize resource utilization and minimize unnecessary expenditure on IT infrastructure.
Identify and implement cost-effective solutions to improve the efficiency of our IT operations.
Release Management:
Design and implement automated deployment and rollback procedures to mitigate risks associated with software updates.
Monitor the performance of new releases and promptly address any issues that arise.
Lead the team that executes release management.
Monitoring:
Design, implement, and maintain a comprehensive monitoring infrastructure to track the health and performance of our systems.
Analyze monitoring data to identify potential issues and proactively troubleshoot problems before they impact users.
Develop and implement alerts and notifications for critical events to ensure timely intervention.
Emergency Response:
Build and lead the team that responds promptly to incidents and works collaboratively to resolve them in a timely manner.
Analyze the root causes of incidents and implement preventive measures to minimize their recurrence.
Document incident responses and communicate lessons learned to enhance our incident handling processes.
Collaborate with your peers on the leadership team to define a multi-year technical roadmap. Stay up to date with industry developments and enterprise infrastructure, and anticipate significant risks.
Work with development teams to review architecture designs to ensure high availability and a proper disaster recovery strategy.
Collaborate with the reliability and infrastructure engineering team at T. Rowe Price to build synergy in tooling for the implementation of observability, tracing, and alerting.
Qualifications:
Bachelor's degree in Computer Science, Information Technology, or a related field preferred.
10+ years of experience as a Site Reliability Engineer or equivalent in a similar role.
Proven experience in monitoring, analyzing, and optimizing the performance of large-scale distributed systems.
Expertise in Linux systems administration, including managing servers, operating systems, and network configurations.
Strong scripting and automation skills, preferably with experience in Bash, Python, or similar languages.
Familiarity with AWS.
Experience with DevOps tools and practices, such as GitLab CI/CD and Docker.
Excellent troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues.
Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders.
A passion for maintaining high availability, performance, and reliability of critical systems in a fast-paced financial environment.
Benefits:
Competitive salary and comprehensive benefits package.
Opportunity to work with cutting-edge technologies and contribute to the development of innovative solutions.
Collaborative and supportive work environment with a focus on continuous learning and professional development.
T. Rowe Price operates a hybrid working model with a minimum of two days per week in the London office expected.
Commitment to Diversity, Equity, and Inclusion:
We strive for equity, equality, and opportunity for all associates. When we embrace the power of diversity and create an environment where people can bring their authentic and best selves to work, our firm is stronger, and we create greater value for our clients.
Our commitment and inclusive programming aim to lift the experience for each associate and build allies for our global associate community.
We know that a sense of belonging is key not only to your success at the firm, but also to your ability to bring your best each day.
T. Rowe Price is an equal opportunity employer and values diversity of thought, gender, and race. We believe our continued success depends upon the equal treatment of all associates and applicants for employment without discrimination on the basis of race, religion, creed, colour, national origin, sex, gender, age, mental or physical disability, marital status, sexual orientation, gender identity or expression, citizenship status, military or veteran status, pregnancy, or any other classification protected by country, federal, state, or local law.