Lead Site Reliability Engineer
Location: Maryland Heights
Posted on: September 25, 2022
This position is responsible for leading the design, development, and
implementation of cloud-based technologies. In this role,
you will use your development and operations knowledge to identify
and prioritize issues, find solutions to common problems, and mentor
and support junior staff in maintaining our Cloud infrastructure
enterprise-wide. This includes working with our entire engineering
organization and Enterprise Architecture.
MAJOR DUTIES AND RESPONSIBILITIES
Actively and consistently supports all efforts to simplify and
enhance the customer experience.
- Take ownership and accountability of the Product/site
- Assist in analyzing code, components, and infrastructure for
reliability issues and system-level problems.
- Work with architects, tech leads, test leads and stakeholders
to identify points of failure.
- Define and lead Blue-Green deployment approach to enable
- Lead and improve the tooling and automation of our
infrastructure to minimize manual work, increase performance, and
decrease the frequency and severity of incidents.
- Lead hands-on technical implementations for our Cloud service
- Define alert requirements, including the exceptions and messages
to be monitored that will trigger alerts and recovery.
- Establish best practices for system logging, monitoring, health
checks, and recovery.
- Define approach for scale up and scale down and ensure
Infrastructure provisioning scripts and automation meet required
- Work with QA leads, tech leads, and architects to ensure test
automation and security testing are integrated with our Cloud solutions
- Lead or assist with Root Cause Analyses (RCAs).
- Provide critical input into the selection, configuration, and
implementation of new and existing technology solutions.
- Demonstrate high ownership and ability to drive issues to
- Be highly organized, with the ability to juggle many tasks
without losing sight of the highest-priority items.
- Perform other duties as requested.
Required Skills/Abilities and Knowledge
Ability to read, write, speak and understand English
- Advanced experience with the VMware suite of products
- Advanced experience with managing both physical and Virtual
- Advanced experience with multiple operating systems (e.g.
Windows and Linux)
- Hands-on experience with one or more cloud computing services
(e.g. AWS, Microsoft Azure, Google Cloud Platform, IBM)
- Advanced experience implementing a variety of cloud deployment
models (e.g. Private, Public, Multi-Cloud)
- Proficient in scripting in one or more languages (e.g. Python,
Shell, PowerShell, Ansible or Perl)
- Advanced experience with CI/CD tools (Puppet, Ansible,
- Advanced experience managing monitoring and alerting tools
- Prior experience working in an Agile environment
- Familiar with containerized workloads (e.g. Kubernetes,
- Advanced experience with firewalls, routing and load
- Skilled in troubleshooting methodologies
- Must have excellent written and oral communication skills, including
the ability to write technical and process documents.
- Requires attention to detail and excellent organizational
- Ability to contribute independently as well as be a team
- Advanced experience managing small projects
- Self-starter, ability to manage tasks with little
Bachelor's degree in Computer Science or related field, or
Required Related Work Experience and Number of Years
Network experience - 5+ yrs
System Administration experience - 5+ yrs
Troubleshooting - 5+ yrs
Container Services - 2+ yrs
Scripting - 3+ yrs
Preferred Related Work Experience and Number of Years
VMware System Administration experience - 8+ yrs
TKGI Enterprise Pivotal Container Services - 2+ yrs
VMware NSX-T - 2+ yrs
vROps, Log Insight, vRNI, vRLI - 3+ yrs
Cisco networking - 3+ yrs
Firewall configuration management - 3+ yrs
Load Balancer configuration management - 3+ yrs
CI/CD experience in a customer-facing production environment - 1+ yrs
Experience as a Site-Reliability/DevOps System Engineer - 3+ yrs
AWS or other cloud computing platforms - 1+ yrs
On Call support, on a rotation basis
ISY314 311719-1 311719BR
Keywords: SPECTRUM, Springfield, Lead Site Reliability Engineer, Engineering, Maryland Heights, Illinois