Platform Site Reliability Engineer
Equifax is where you can power your possible. If you want to achieve your true potential, chart new paths, develop new skills, collaborate with bright minds, and make a meaningful impact, we want to hear from you.
Platform Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems while adhering to Equifax engineering principles. Our SREs are responsible for overall system operation, and we use a breadth of tools and approaches to solve a broad set of problems. Our practices include limiting time spent on operational work, holding blameless postmortems, and proactively identifying and preventing potential outages.
What you’ll do
Manage and support system(s) uptime across cloud-native (AWS) architectures.
Build infrastructure as code (IaC) patterns, using Terraform and/or scripting with cloud CLIs, that meet security and engineering standards and that application teams will adopt and consume.
Build new CI/CD pipeline patterns and mature existing ones for building, testing and deploying both applications and cloud resources, using GitLab, Jenkins, and/or other cloud-native toolchains.
Solve problems and triage issues across complex distributed architecture service maps.
Document and maintain comprehensive, detailed runbooks for managing services, detecting and remediating issues, and restoring service.
Lead blameless availability postmortems and own the calls to action that prevent recurrences.
Serve on call for high-severity application incidents and improve runbooks to reduce MTTR.
Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.
Communicate effectively with technical peers and team members in both written and verbal formats.
What experience you need
BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required
3+ years of experience configuring and administering Kubernetes in a public cloud
3+ years of experience in the automation and orchestration of containers (Docker, Kubernetes, etc.)
5+ years of experience in scripting languages such as Python and Bash
5+ years of experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met
5+ years of cross-functional experience with systems, storage, networking, security and databases
5+ years of experience working with continuous integration and continuous delivery tooling and practices
What could set you apart
You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
You take a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
You have experience managing infrastructure as code via tools such as Terraform or CloudFormation.
You are passionate about automation, with a desire to eliminate toil whenever possible.
You’ve built software or maintained systems in a highly secure, regulated or compliant industry
You thrive in a DevOps culture, with experience and a passion for working as part of a team.
We offer comprehensive compensation and healthcare packages, 401k matching, paid time off, and organizational growth potential through our online learning platform with guided career tracks.
Are you ready to power your possible? Apply today, and get started on a path toward an exciting new career at Equifax, where you can make a difference!
Equifax is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Function: Tech Engineering and Service Ops