Site Reliability Engineer

  • Job type:

    Permanent

  • Job ref:

    J765

  • Published:

    about 1 month ago

  • Expiry date:

    2021-02-27

We are looking for an experienced Site Reliability Engineer to join an existing technical team at an exciting, growing technology company. As the company enters its next growth phase, they are looking for an individual who can build upon the groundbreaking work already undertaken by the existing team. The role is permanent and work-from-home, so it is open to anyone along the M62 corridor.

This really is an exciting opportunity to be part of the design and infrastructure of technology that will truly add value to people's lives. If you are looking to work for an organisation that is making a difference, then this could be the ideal role.

As a Site Reliability Engineer, you'll be working in a fast-paced environment alongside internal development and technical teams to further develop and maintain the existing services and applications that together make up the service offering.

The core web technology stack includes PHP, Laravel/Lumen and VueJS, with native apps for iOS and Android, running on containers inside an EKS cluster and spread across develop, staging, pre-prod and prod environments. The infrastructure is built around Kubernetes, currently hosted within AWS and described in Terraform, so knowledge and experience of these are essential.

Given the above, you will need strong and proven knowledge of containerisation and virtualisation technologies, as well as associated programming languages to help build tools and systems. They also have a number of legacy, or non-containerised, systems and servers, which also need maintaining and bringing in line with the other systems that fall under the responsibility of this role.

Responsibilities

  • Leading engineering and development teams in building highly fault-tolerant, scalable applications.

  • Developing tools to ensure the services can scale and are highly available.

  • Automating ops tasks wherever possible, by adopting open source tools or developing bespoke tools as required.

  • Providing day-to-day development support and monitoring of production server and network environments by developing and deploying logging and monitoring tools.

  • Developing applications to increase code quality throughout various codebases.

  • Supporting disaster recovery, backup, redundancy and capacity planning activities.

  • Liaising with other teams and stakeholders to plan and implement improvements to the overall service.

  • Researching and suggesting new protocols and approaches to further enhance overall service performance.

  • Staying up to date with the latest technology trends.