Site Reliability Engineering Practitioner® (SREP) Certification Training
Course Outline
This Site Reliability Engineering Practitioner® (SREP) Certification course introduces ways to scale services economically and reliably in an organization. It explores strategies to improve agility, cross-functional collaboration, and transparency into the health of services, with the goal of building resiliency by design, automation, and closed-loop remediation.
Site Reliability Engineering Practitioner® (SREP) Certification Training Benefits
In this SRE Practitioner Training course, you will learn how to:
- Successfully implement a flourishing SRE culture in your organization.
- Manage the organizational impact of introducing SRE.
- Build security and resilience by design in a distributed, zero-trust environment.
- Prepare for the DevOps Institute SRE Practitioner certification exam.
- Participate in unique exercises designed to apply concepts.
- Get sample documents, templates, tools, and techniques.
- Access additional value-added resources and communities.
- Continue learning and face new challenges with after-course one-on-one instructor coaching.
SRE Practitioner Training Prerequisites
- It is highly recommended that learners attend Learning Tree course 3694, Site Reliability Engineering Foundation® (SREF) Certification Training, before attending the SRE Practitioner course.
- Knowledge of common SRE terminology, concepts, and principles, along with related work experience, is recommended.
SRE Practitioner Certification Information
- To earn the SRE Practitioner certificate, you must pass the 90-minute examination, which consists of 40 multiple-choice questions, with a score of at least 65%.
- The certification is governed and maintained by DevOps Institute.
SRE Practitioner Certification Training Outline
Module 1: SRE Anti-Patterns
- Rebranding Ops or DevOps or Dev as SRE
- Users notice an issue before you do
- Measuring until my Edge
- False positives are worse than no alerts
- Configuration management trap for snowflakes
- The Dogpile: Mob incident response
- Point fixing
- Production Readiness Gatekeeper
- Fail-Safe really?
Module 2: SLO is a Proxy for Customer Happiness
- Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
- Define system boundaries in a distributed ecosystem to arrive at correct SLIs
- Use error budgets to help your team have better discussions and make better data-driven decisions
- Overall reliability is only as good as the weakest link in your service graph
- Error thresholds when 3rd party services are used
Module 3: Building Secure and Reliable Systems
- SREs and their role in building secure and reliable systems
- Design for Changing Architecture
- Fault-tolerant Design
- Design for Security
- Design for Resiliency
- Design for Scalability
- Design for Performance
- Design for Reliability
- Ensuring Data Security and Privacy
Module 4: Full-Stack Observability
- Modern Apps are Complex & Unpredictable
- Slow is the new down
- Pillars of Observability
- Implementing Synthetic and End-user monitoring
- Observability driven development
- Distributed Tracing
- What happens to monitoring?
- Instrumenting using Libraries and Agents
Module 5: Platform Engineering and AIOps
- Taking a platform-centric view solves organizational scalability challenges such as fragmentation, inconsistency, and unpredictability
- How do you use AIOps to improve resiliency?
- How can DataOps help you in the journey?
- A simple recipe to implement AIOps
- Indicative measurement of AIOps
Module 6: SRE & Incident Response Management
- SRE Key Responsibilities towards incident response
- DevOps & SRE and ITIL
- OODA and SRE Incident Response
- Closed Loop Remediation and the Advantages
- Swarming – Food for Thought
- AI/ML for better incident management
Module 7: Chaos Engineering
- Navigating Complexity
- Chaos Engineering Defined
- Quick Facts about Chaos Engineering
- Chaos Monkey Origin Story
- Who is adopting Chaos Engineering?
- Myths of Chaos
- Chaos Engineering Experiments
- GameDay Exercises
- Security Chaos Engineering
- Chaos Engineering Resources
Module 8: SRE is the Purest form of DevOps
- Key Principles of SRE
- SREs help increase reliability across the product spectrum
- Metrics for Success
- Selection of Target areas
- SRE Execution Model
- Cultural and Behavioral Skills are key
- SRE Case study