We are looking for an experienced Site Reliability Engineer to join an existing technical team at an exciting, growing technology company. As the company enters its next growth phase, they are looking for an individual who can build upon the groundbreaking work already undertaken by the existing team. The role is permanent and work-from-home, so it is open to anyone on the M62 corridor. This really is an exciting opportunity to be part of the design and infrastructure of technology that will truly add value to people's lives. If you are looking to work for an organization that is making a difference, then this could be the ideal role.
As a Site Reliability Engineer, you’ll work in a fast-paced environment alongside internal development and technical teams to further develop and maintain the existing services and applications that make up the service offering.
The core web technology stack includes PHP, Laravel/Lumen and VueJS, with native apps for iOS and Android, running in containers inside an EKS cluster and spread across develop, staging, pre-prod and prod environments. The infrastructure is built around Kubernetes, currently hosted in AWS and described in Terraform, so knowledge and experience of these are essential.
Given the above, you will need strong, proven knowledge of containerisation and virtualisation technologies, as well as associated programming languages to help build tools and systems. They also have a number of legacy, non-containerised systems and servers, which also need maintaining and bringing in line with the other systems that fall under the responsibility of this role.
Leading engineering & development teams in building highly fault-tolerant, scalable applications.
Developing tools to ensure their services can scale and are highly available. They always try to manage their ops tasks through automation, by adopting open source tools or developing bespoke tools as required.
Providing day-to-day development support and monitoring production server and network environments by developing and deploying logging and monitoring tools.
Developing applications to increase code quality throughout various codebases.
Supporting disaster recovery, backup, redundancy and capacity planning activities.
Liaising with other teams and stakeholders to plan and implement improvements to the overall service.
Researching and suggesting new protocols and approaches to further enhance overall service performance.
Staying up to date with the latest technology trends.
about 1 month ago