Senior System Engineer

Posted 2 days ago

  • London, Greater London
  • Any
  • External
Job Title: Linux, HPC, and Kubernetes Systems Engineer

Interested in this role? You can find all the relevant information in the description below.

Location: Remote, with onsite attendance in Wallingford as needed

Job Type: Contract, 3 months, inside IR35

Job Summary:
We are looking for a highly skilled Linux, HPC, and Kubernetes Systems Engineer to join our growing team. This position is responsible for maintaining and troubleshooting High-Performance Computing (HPC) environments, with a focus on Lenovo and Ubiquiti platforms, while also managing Kubernetes clusters. The ideal candidate will have strong experience in Linux administration, HPC systems, and Kubernetes, along with a proven ability to solve complex technical issues and optimize infrastructure performance.

Key Responsibilities:
  • Manage and maintain HPC environments with a primary focus on Lenovo and Ubiquiti platforms.
  • Install, configure, and troubleshoot Kubernetes clusters in a production environment.
  • Monitor and optimize Linux-based systems, ensuring reliability and performance for HPC and containerized applications.
  • Troubleshoot complex issues in HPC clusters and Kubernetes infrastructure, including hardware, software, networking, and performance-related problems.
  • Manage resource allocation, workload scheduling, and performance tuning for HPC environments.
  • Implement and manage container orchestration using Kubernetes, ensuring scalability and high availability.
  • Automate system processes and improve operational efficiency using scripting (Bash, Python, etc.).
  • Perform system upgrades, apply patches, and monitor security vulnerabilities in Linux, HPC, and Kubernetes environments.
  • Collaborate with cross-functional teams to design, deploy, and optimize infrastructure solutions for both HPC and Kubernetes-based workloads.
  • Provide documentation, training, and technical support to end users and internal stakeholders.
  • Ensure that backup and recovery strategies are effectively implemented for both HPC and Kubernetes environments.
  • Monitor system health and performance using appropriate tools (e.g., Prometheus, Grafana) and take proactive measures to address potential issues.

Qualifications:
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.
  • Proven experience in Linux system administration (Red Hat, CentOS, or Ubuntu).
  • Strong experience managing HPC systems, particularly with Lenovo and Ubiquiti platforms.
  • Extensive hands-on experience with Kubernetes cluster deployment, maintenance, and troubleshooting.
  • Deep understanding of containerization technologies such as Docker and Kubernetes.
  • Strong troubleshooting skills across Linux, HPC environments, and Kubernetes infrastructure.
  • Proficiency in scripting languages (Bash, Python) for automation and process improvement.
  • Knowledge of cluster management and workload scheduling software (e.g., SLURM, PBS) for HPC environments.
  • Familiarity with networking protocols, server hardware, storage solutions, and system monitoring tools.
  • Ability to work independently in a fast-paced environment, managing multiple tasks and priorities.

Preferred Skills:
  • Experience with cloud-based Kubernetes deployments (AWS, Azure, GCP).
  • Familiarity with container networking, service discovery, and load balancing (e.g., Istio, Envoy).
  • Knowledge of DevOps tools and methodologies (e.g., Ansible, Terraform).
  • Experience with virtualization and container security practices.
  • Experience working in research, academic, or enterprise-level environments.

Benefits:
  • Competitive salary and benefits package.
  • Health, dental, and vision insurance.
  • Paid time off, holidays, and professional development opportunities.
  • Opportunity to work in a cutting-edge technological environment.