PagerDuty Blog

Democratize Your Team’s Automation Capabilities With PagerDuty® Automation Actions

Let’s face it. Incidents can be expensive—really expensive. But the high cost of incidents within a production environment isn’t always due to a compromised service or negative customer experience. According to PagerDuty response data, over 50% of an incident’s lifespan is spent with first responders in the investigation and mobilization phases (what we call ‘triage’)— in other words, determining what might have gone wrong and calling the right person to fix it.

With the above statistic in mind, it’s clear that the shadow expense of your incident lifecycle is that of your people’s time—the engineer who discovered the incident, the on-call engineer who responded to the issue and determined root cause, and every other subject matter expert that gets looped into the incident lifecycle. And when you sprinkle in manual processes to the entire response timeline, things can get pricey. Very pricey.

The fact of the matter is, your developer organization’s time is just as valuable and important as the business’s bottom line. And as service and application development continues to grow in complexity, “time saved” becomes an even more important metric to track, quantify, and continuously improve. Finding a way to automate different aspects of the incident response process can help save your team’s time and bolster efficiency across the board. How can you do this, you ask? Enter PagerDuty® Automation Actions (formerly PagerDuty Rundeck Actions).

PagerDuty® Automation Actions

PagerDuty® Automation Actions add-on connects your first-line responders to corrective automation directly within PagerDuty. Instead of pushing escalations to specialists when an incident kicks off, responders can triage and resolve incidents themselves using safely delegated automation. As a result, teams reduce MTTR, lower interruptions to specialists, and quickly diagnose and remediate incidents.

PagerDuty® Automation Actions connects automated diagnostics and remediation to the incident response workflow. Automated Diagnostics are a set of actions for production services that your responders can automatically invoke when an incident occurs. Rather than having to escalate to expert specialists who manually run common tests, responders can safely and securely invoke this automation themselves from within PagerDuty and see responses delivered in real time back to your incident timeline.

Run designated actions such as service restarts, diagnostics, and more

With these diagnostic tests, responders can more efficiently escalate the incident to the right specialist for resolution, rather than involving a large group or escalating up the typical responder ladder. The specialists will be able to see the results of those common diagnostics and can get started right away. 

Additionally, teams can also invoke these actions and collaborate on the incident directly from their Slack instance. This eliminates the need to access a service through a terminal and context switch between windows, creating a faster and more efficient way to resolve incidents—while also reducing escalations to specialists. As you mature your use of automated diagnostics, you can start using it for things like automated remediation and triggering using Event Intelligence.

PagerDuty® Automation Actions helps solve four main problem areas within an organization’s response process:

  • Siloed expertise. First-line responders don’t know the genetic makeup of every single application or service within an organization’s environment.
  • Consistent interruptions to specialists. Responders escalate to the engineer they think is the specialist of that application or service, taking time away from innovation and slowing time-to-resolution.
  • Repetitive and manual diagnostic steps. The first steps when an incident kicks off are often the same. These same, manual steps have to be actioned on before you can begin resolving the incident. 
  • Complex and sprawling production environments. Knowing which systems to access and what actions to take can take time. Additionally, not every responder has the authority to access specific production systems, often making the escalation process difficult and time-consuming.

PagerDuty® Automation Actions solves the above issues by:

  • Delegating automation across teams. Deploy automated procedures to first-line responders that are typically invoked by specialists.
  • Resolving incidents faster with fewer escalations. By creating automation for common requests and operations, teams can spend less time figuring out who to escalate to and more time on a fix.
  • Triggering human-assisted/self-healing automation. Invoke diagnostic actions before responders are even paged using PagerDuty’s Event Orchestration.
  • Safely invoking automation with security in mind. Responders only see actions they have the authorization to invoke for impacted systems in an incident, and all actions are logged to maintain a strong security posture.

 

To summarize the above with some quick bullets, PagerDuty® Automation Actions helps teams:

  • Decrease response times up by to 30 minutes and MTTR by up to 25%
  • Reduce the volume of incidents that get escalated up the ladder
  • Distribute subject matter expertise across response teams
  • Trigger human-assisting and self-healing automation before responders even get paged
  • Invoke secure automation behind firewalls and VPCs
  • Deploy automated actions in place of manual procedures
  • Enrich incident documentation for smoother postmortems and reduced operator work

To learn more about the PagerDuty automation portfolio, visit our automation hub. If you want to learn more about PagerDuty Automation Actions and how it can help your team save time and money, contact your account manager or learn more today.