SpringfieldILRecruiter Since 2001
the smart solution for Springfield jobs

Lead Site Reliability Engineer

Location: Maryland Heights
Posted on: September 25, 2022

Job Description:

This position is responsible for leading the design, development and implementation of cloud-based technologies. In this role, you will use your development and operations knowledge to identify and prioritize issues, find solutions to common problems, and mentor and support junior staff to help support our Cloud infrastructure enterprise-wide. This includes working with our entire engineering organization and Enterprise Architecture.

Actively and consistently supports all efforts to simplify and enhance the customer experience.

  • Take ownership of, and accountability for, product and site reliability.
  • Assist in analyzing code, components, and infrastructure for reliability issues and system-level problems.
  • Work with architects, tech leads, test leads and stakeholders to identify points of failure.
  • Define and lead Blue-Green deployment approach to enable zero-downtime deployment.
  • Lead and improve the tooling and automation of our infrastructure to minimize manual work, increase performance, and decrease the frequency and severity of incidents.
  • Lead technical hands-on implementations for our Cloud service offerings.
  • Define alert requirements, including the exceptions and messages to be monitored that will trigger alerts and recovery.
  • Establish best practices for system logging, monitoring, health checks, and recovery.
  • Define the approach for scaling up and down, and ensure infrastructure provisioning scripts and automation meet implementation requirements.
  • Work with the QA lead, tech leads, and architects to ensure test automation and security testing are integrated with our Cloud solutions and pipeline.
  • Lead or assist with Root Cause Analyses (RCAs).
  • Provide critical input into the selection, configuration, and implementation of new and existing technology solutions.
  • Demonstrate high ownership and ability to drive issues to resolution.
  • Stay highly organized and juggle many tasks without losing sight of the highest-priority items.
  • Perform other duties as requested.

Required Skills/Abilities and Knowledge

    • Ability to read, write, speak and understand English
    • Advanced experience with the VMware suite of products
    • Advanced experience managing both physical and virtual infrastructure
    • Advanced experience with multiple operating systems (e.g. Windows and Linux)
    • Hands-on experience with one or more cloud computing services (e.g. AWS, Microsoft Azure, Google Cloud Platform, IBM, etc.)
    • Advanced experience implementing a variety of cloud service models (e.g. Private, Public, Multi-Cloud)
    • Proficient in scripting in one or more languages (e.g. Python, Shell, PowerShell, Ansible or Perl)
    • Advanced experience with CI/CD tools (Puppet, Ansible, Jenkins)
    • Advanced experience managing monitoring and alerting tools
    • Prior experience working in an Agile environment
    • Familiar with containerized workloads (e.g. Kubernetes, OpenShift, TKGI)
    • Advanced experience with firewalls, routing and load balancing
    • Skilled in troubleshooting methodologies
    • Excellent written and oral communication skills, including writing technical and process documents
    • Requires attention to detail and excellent organizational skills
    • Ability to contribute independently as well as be a team player
    • Advanced experience managing small projects
    • Self-starter, ability to manage tasks with little supervision

Required Education

    • Bachelor's degree in Computer Science or related field, or equivalent experience

Required Related Work Experience and Number of Years

    • Network experience - 5+ yrs
    • System Administration experience - 5+ yrs
    • Troubleshooting - 5+ yrs
    • Container Services - 2+ yrs
    • Scripting - 3+ yrs

Preferred Related Work Experience and Number of Years

    • VMware System Administration experience - 8+ yrs
    • TKGI Enterprise Pivotal Container Services - 2+ yrs
    • VMware NSX-T - 2+ yrs
    • vROps, Log Insight, vRNI, vRIL - 3+ yrs
    • Cisco networking - 3+ yrs
    • Firewall configuration management - 3+ yrs
    • Load Balancer configuration management - 3+ yrs
    • CI/CD experience in a customer-facing production environment - 1+ yrs
    • Experience as a Site-Reliability/DevOps System Engineer - 3+ yrs
    • AWS or other cloud computing platforms - 1+ yrs

Office Environment
On-call support, on a rotation basis
ISY314 311719-1 311719BR

Keywords: SPECTRUM, Springfield, Lead Site Reliability Engineer, Engineering, Maryland Heights, Illinois
