Staff Site Reliability Engineer Job – Bannockburn, IL

Why You'll Love This Job

The Staff Site Reliability Engineer pioneers solutions that ensure the reliability, performance, and integrity of our digital health solutions. As a key member of our engineering leadership team, you will play a pivotal role in shaping our technology strategy, architecture, standards, and priorities while providing technical leadership across multiple engineering teams.

Responsibilities

Cross-functional Leadership: Own the end-to-end operational integrity of the platform, understanding and contributing to the bigger picture of the organization. Collaborate with cross-functional development and platform teams, providing expert-level guidance to deploy and maintain critical applications. Provide governance of our platform-as-a-service (PaaS) environment.
Observability and Reliability: Focus primarily on maintaining the reliability and scalability of production systems, employing techniques to manage service quality. Iteratively architect and implement cutting-edge solutions for application resiliency and fault tolerance.
Automation and Continuous Improvement: Provide forward-thinking, innovative technology solutions with a strong emphasis on automation, eliminating manual operations, and enhancing operational excellence.
Cloud Infrastructure Ownership: Architect, build, and manage highly available, scalable, and fault-tolerant cloud infrastructure on Microsoft Azure. Establish and enforce reliability standards (SLIs, SLOs, error budgets) for Azure-based platforms and shared services.
Mentorship & Growth: Serve as a mentor and thought leader, coaching engineers across teams while fostering a culture of technical excellence, innovation, and continuous improvement.
Engineering Practices: Integrate SRE principles directly into the software development lifecycle, guiding teams toward a culture of reliability. Advocate for the adoption of automated testing and observability practices to ensure high-quality, efficient delivery.

Skills & Qualifications

EDUCATION

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

REQUIRED SKILLS

10+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles, with at least 2 years in a Principal or Staff Engineer position.
Deep hands-on experience with .NET and core Azure Cloud services (e.g., App Services, Azure Functions, AKS).
Strong hands-on experience building reliable, reusable infrastructure using Infrastructure as Code (Terraform, Bicep).
Strong understanding of cloud cost management and optimization strategies.
Solid understanding of SRE best practices, design patterns, and system integration.
Strong troubleshooting skills across complex cloud infrastructure and production environments.
Excellent communication and leadership skills, especially when dealing with complexity or ambiguity within platform and cross-functional environments.
Proficiency with at least one programming or scripting language (e.g., Python, Go, PowerShell, Bash).
Ability to influence and work in a collaborative cross-team environment.
Proactive and ownership-oriented mindset.

PREFERRED SKILLS

Experience in healthcare technology, including clinical provider environments and patient engagement platforms.
Experience with observability tools and performance tuning in production environments.
Experience with backup, disaster recovery, and business continuity in cloud environments.

WORKING CONDITIONS

Location – Hybrid preferred in the greater Chicago area, with monthly travel to the Bannockburn, IL office.
Work environment – Fast-paced, collaborative, and dynamic, with a focus on teamwork and meeting tight deadlines.
Hours – 8 a.m. to 5 p.m. Central Time; after-hours work as needed and emergency on-call for security incidents.
