If you weren’t able to attend our recent webinar, “5 Tips for World Class Incident Management and Escalation,” we’re happy to provide a quick recap along with the Q&A session from the webinar.
In our webinar, we discussed how to automate your alerting and incident management process so you never miss another critical incident. We also laid out our top 5 tips for how to help your Help Desk or IT Operations teams thrive with best practices.
Also, if you’d like to catch our webinar in full, feel free to access it on-demand here.
Let’s get started!
Our 5 Tips for World Class Incident Management and Escalation
Tip #1: Invest in monitoring and alerting tools.
Most companies already have various software systems in place. Review your tools periodically to confirm they are still effective. Monitoring and alerting tools should be a cornerstone of any IT department, and the key is to make sure they cover your key business functions in a way that is both effective and feasible.
Tip #2: Use a tool to connect your people, processes, and technology.
Here at Coyote Creek, we are an Atlassian Platinum Solution Partner, so we use Opsgenie to do this. The “old school” method of forwarding all of your alerts to an on-shift engineer is not only outdated but also frustrating for engineers. We’ve eliminated that burden for the engineer by using Opsgenie as our incident management platform.
Tip #3: Automate your incident management workflow.
Opsgenie sends a text, sends an email, and calls our escalation managers based on our rules. We also configured Opsgenie to pull all of the information the technician normally provides and collate it into an informative alert for our engineers and escalation managers.
So when an incident happens, the escalation manager gets a message with the ticket name, a summary description, a detailed description, the affected client, and a direct link to the incident in our ticketing system.
Opsgenie also opens our conference bridge, posts a summary message to our company’s chat solution (we use Slack, but a large number of integrations are available), and invites the right personnel to the conference bridge.
All of this works together to make our notification process simple, fast, and effective.
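To make the collated message concrete, here is a minimal Python sketch of gathering the fields our escalation managers receive into a single alert payload. The Ticket class, its field names, and the example values are hypothetical. The payload keys (message, description, details) are loosely modeled on common alert fields and should be treated as an assumption rather than our actual configuration.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    name: str
    summary: str
    description: str
    client: str
    url: str


def build_alert(ticket: Ticket) -> dict:
    """Collate the ticket fields into one alert payload."""
    return {
        # Short headline: ticket name plus summary description.
        "message": f"{ticket.name}: {ticket.summary}",
        # Detailed description for whoever picks up the incident.
        "description": ticket.description,
        # Extra context: affected client and a direct link to the ticket.
        "details": {"client": ticket.client, "ticket_url": ticket.url},
    }


# Example values are made up for illustration.
ticket = Ticket(
    name="INC-1042",
    summary="Database offline",
    description="Primary database stopped accepting connections.",
    client="Example Client",
    url="https://tickets.example.com/INC-1042",
)
alert = build_alert(ticket)
```

Building the payload in one place means every notification channel (text, email, call, chat) can draw on the same set of fields.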
Tip #4: Create rules to determine who should be alerted, when, and how.
From the engineering side, nothing feels worse than calling the wrong person at 3 am on a Sunday, but sadly, that can happen in the chaos of an active incident. We allow Opsgenie to manage our notification policies based on the rules we’ve created. It makes the calls as needed.
We feed Opsgenie everything through integrations with our monitoring tools and let it process the incoming alerts.
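The idea of rules that decide who is alerted, when, and how can be sketched in a few lines of Python. This is an illustration only: the Rule class, the team names, the time windows, and the fallback are all hypothetical and are not our actual Opsgenie policies.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Rule:
    """Hypothetical notification rule: who to alert, when, and how."""
    priority: str
    start: time
    end: time
    notify: str
    channels: list


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), including windows past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight


def route(priority: str, now: time, rules: list) -> tuple:
    """Return (recipient, channels) from the first matching rule."""
    for rule in rules:
        if rule.priority == priority and in_window(now, rule.start, rule.end):
            return rule.notify, rule.channels
    # Assumed fallback when no rule matches.
    return "service-desk", ["email"]


RULES = [
    Rule("critical", time(18, 0), time(8, 0), "on-call-engineer", ["call", "sms"]),
    Rule("critical", time(8, 0), time(18, 0), "escalation-manager", ["sms", "email"]),
]
```

With rules written down like this, a critical alert at 3 am goes to whoever the after-hours rule names, and nobody has to decide whom to call in the middle of an incident.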
Tip #5: De-duplicate alerts.
With our Opsgenie configuration, we automatically deduplicate alerts based on a few criteria. This lets an engineer work on the incident without being repeatedly notified about it, so they can focus on the task at hand: resolving the incident.
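As a rough illustration of deduplication, the Python sketch below folds repeated alerts into one open alert and only signals a notification for the first occurrence. The criteria used here (host and check name) and the AlertStore class are hypothetical stand-ins, not the criteria we actually configured.

```python
def dedup_key(alert: dict) -> tuple:
    """Assumed criteria: alerts from the same host and check are duplicates."""
    return (alert["host"], alert["check"])


class AlertStore:
    """Tracks open alerts and counts repeats instead of re-notifying."""

    def __init__(self):
        self.open_alerts = {}

    def ingest(self, alert: dict) -> bool:
        """Return True if this alert should notify, False if it is a repeat."""
        key = dedup_key(alert)
        if key in self.open_alerts:
            # Repeat of an open alert: bump the counter, stay quiet.
            self.open_alerts[key]["count"] += 1
            return False
        self.open_alerts[key] = {"alert": alert, "count": 1}
        return True


store = AlertStore()
first = store.ingest({"host": "db01", "check": "cpu"})
repeat = store.ingest({"host": "db01", "check": "cpu"})
other = store.ingest({"host": "db01", "check": "disk"})
```

Keeping a count on the open alert preserves the information that the problem is recurring without interrupting the engineer again.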
Q: What are some of the things you really like about Opsgenie?
One thing I really like about Opsgenie is the automated escalation. As a Systems Engineer who works the night shift a lot, when a critical incident occurs in the middle of the night, I want to focus all my attention on resolving the technical issue and bringing the system back up. If I have to handle the logistics of notifying and escalating, that takes my focus away and uses up critical time that could be spent resolving the issue at hand.
Q: What has changed for your team since implementing Opsgenie?
Interestingly, one thing that has changed is how we talk about alerts. Because Opsgenie automates the process of alerting and escalating, we spend less time communicating about alerts and more time resolving them. The time spent on “did you see that CPU alert for server X?” or “did you see the DB is offline?” is reduced because we know the alerts have gotten to the right engineers and managers.
Q: Our company does not have robust monitoring. Can Opsgenie work for us?
You don’t need a robust monitoring system to be able to use Opsgenie. A simple script with email notification running on a scheduled task can work well with Opsgenie. Opsgenie creates alerts with just these tools in place and routes them to the right personnel.
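A simple script of this kind might look like the Python sketch below, which checks disk usage and emails an alert when a threshold is crossed. Everything specific here is an assumption for illustration: the addresses, the SMTP host, the 90% threshold, and the choice of disk usage as the check. The alert address would need to be one that your own Opsgenie setup is configured to accept.

```python
import shutil
import smtplib
from email.message import EmailMessage
from typing import Optional

# Placeholder values; replace with addresses and a mail host you actually use.
ALERT_ADDRESS = "alerts@example.com"
SENDER_ADDRESS = "monitor@example.com"
SMTP_HOST = "localhost"


def build_disk_alert(path: str, used_fraction: float,
                     threshold: float = 0.90) -> Optional[EmailMessage]:
    """Return an alert email if usage is at or above the threshold, else None."""
    if used_fraction < threshold:
        return None
    msg = EmailMessage()
    msg["From"] = SENDER_ADDRESS
    msg["To"] = ALERT_ADDRESS
    msg["Subject"] = f"Disk usage high on {path}: {used_fraction:.0%}"
    msg.set_content(
        f"Disk usage on {path} is {used_fraction:.0%}, "
        f"which is at or above the {threshold:.0%} threshold."
    )
    return msg


def main() -> None:
    """Run one check; intended to be called from a scheduled task."""
    usage = shutil.disk_usage("/")
    msg = build_disk_alert("/", usage.used / usage.total)
    if msg is not None:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)
```

A scheduler such as cron or Windows Task Scheduler would call main() at whatever interval suits the check.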
Q: Can I export alerts?
Yes, alerts can be exported for review. Also, alert and incident response reports can be generated weekly/monthly to help companies review ongoing issues and dedicate resources to those issues. For example, if an Exchange server is constantly reporting high memory utilization, the report will identify this, and the appropriate actions can be taken to rectify this issue.
Q: Does Opsgenie work with other Atlassian products? Which ones?
Yes. Opsgenie has deep integrations with Bitbucket, Jira, Jira Service (Desk) Management, etc.
If you have a question that did not get answered here or would like our assistance in licensing and utilizing Opsgenie, please reach out today.
At Coyote Creek, as an Atlassian Platinum Solution Partner, we know it’s the human connection that makes the difference in all that we do.