Site Reliability Engineering Practitioner® (SREP) Certification Training

Course Outline

This Site Reliability Engineering Practitioner® (SREP) Certification course introduces ways to scale services economically and reliably across an organization. The training explores strategies for improving agility, cross-functional collaboration, and visibility into service health, with the goal of building resilience through design, automation, and closed-loop remediation.

Site Reliability Engineering Practitioner® (SREP) Certification Training Benefits

  • In this SRE Practitioner Training course, you will:

    • Learn how to successfully implement a flourishing SRE culture in your organization.
    • Learn how to manage the organizational impact of introducing SRE.
    • Learn how to build security and resilience by design in a distributed, zero-trust environment.
    • Prepare for the DevOps Institute SRE Practitioner certification exam.
    • Participate in unique exercises designed to apply concepts.
    • Get sample documents, templates, tools, and techniques.
    • Access additional value-added resources and communities.
    • Continue learning and face new challenges with after-course one-on-one instructor coaching.
  • SRE Practitioner Training Prerequisites 

    • It is highly recommended that learners attend Learning Tree course 3694, Site Reliability Engineering Foundation® (SREF) Certification Training, before attending the SRE Practitioner course.
    • Familiarity with common SRE terminology, concepts, and principles, along with related work experience, is recommended.
  • SRE Practitioner Certification Information

    • To earn the SRE Practitioner certificate, you must pass the 90-minute examination, which consists of 40 multiple-choice questions, with a score of at least 65% (26 correct answers).
    • The certification is governed and maintained by DevOps Institute.

SRE Practitioner Certification Training Outline

Module 1: SRE Anti-Patterns

  • Rebranding Ops or DevOps or Dev as SRE
  • Users notice an issue before you do
  • Measuring only up to your own edge
  • False positives are worse than no alerts
  • Configuration management trap for snowflakes
  • The Dogpile: Mob incident response
  • Point fixing
  • Production Readiness Gatekeeper
  • Fail-safe, really?

Module 2: SLO is a Proxy for Customer Happiness

  • Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
  • Define system boundaries in a distributed ecosystem to select the correct SLIs
  • Use error budgets to help your team have better discussions and make better data-driven decisions
  • Overall reliability is only as good as the weakest link in your service graph
  • Error thresholds when third-party services are used
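The error-budget and weakest-link ideas in this module can be illustrated with a short calculation. The sketch below is not course material; it assumes a hypothetical 99.9% availability SLO over a 30-day window and hypothetical availability figures for three serially dependent services (every dependency must be up for a request to succeed, and failures are assumed independent).

```python
# Minimal sketch: error budget for an availability SLO, and composite
# availability of serially dependent services. All figures are hypothetical.
from math import prod


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime (in minutes) for an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def serial_availability(availabilities: list[float]) -> float:
    """Composite availability when every dependency must be up.

    Assumes failures are independent, so availabilities multiply.
    """
    return prod(availabilities)


if __name__ == "__main__":
    # Hypothetical 99.9% SLO over a 30-day window.
    print(f"Error budget: {error_budget_minutes(0.999):.1f} minutes")

    # Hypothetical service graph: frontend -> API -> third-party provider.
    deps = [0.9995, 0.999, 0.995]
    composite = serial_availability(deps)
    print(f"Composite availability: {composite:.4%}")
    print(f"Weakest link: {min(deps):.2%}")
```

Because a product of values below 1 can never exceed its smallest factor, the composite availability is capped by the weakest dependency. With these hypothetical numbers it also falls below the 99.9% target, which is why error thresholds for third-party services need explicit attention.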

Module 3: Building Secure and Reliable Systems

  • SREs and their role in building secure and reliable systems
  • Design for Changing Architecture
  • Fault-tolerant Design
  • Design for Security
  • Design for Resiliency
  • Design for Scalability
  • Design for Performance
  • Design for Reliability
  • Ensuring Data Security and Privacy

Module 4: Full-Stack Observability

  • Modern Apps are Complex & Unpredictable
  • Slow is the new down
  • Pillars of Observability
  • Implementing Synthetic and End-user monitoring
  • Observability-driven development
  • Distributed Tracing
  • What happens to monitoring?
  • Instrumenting using Libraries and Agents

Module 5: Platform Engineering and AIOps

  • Taking a platform-centric view to solve organizational scalability challenges such as fragmentation, inconsistency, and unpredictability
  • How do you use AIOps to improve resiliency?
  • How can DataOps help you in the journey?
  • A simple recipe to implement AIOps
  • Indicative measurement of AIOps

Module 6: SRE & Incident Response Management

  • Key SRE responsibilities in incident response
  • DevOps, SRE, and ITIL
  • OODA and SRE Incident Response
  • Closed-loop remediation and its advantages
  • Swarming – Food for Thought
  • AI/ML for better incident management

Module 7: Chaos Engineering

  • Navigating Complexity
  • Chaos Engineering Defined
  • Quick Facts about Chaos Engineering
  • Chaos Monkey Origin Story
  • Who is adopting Chaos Engineering?
  • Myths of Chaos
  • Chaos Engineering Experiments
  • GameDay Exercises
  • Security Chaos Engineering
  • Chaos Engineering Resources

Module 8: SRE is the Purest form of DevOps

  • Key Principles of SRE
  • SREs help increase reliability across the product spectrum
  • Metrics for Success
  • Selection of Target areas
  • SRE Execution Model
  • Cultural and Behavioral Skills are key
  • SRE Case study