Examples of Using Autotask Problem & Incident Management

A Narcoleptic Server

Let’s say that your remote monitoring and management (RMM) system throws an alert for an offline server every afternoon. The alert is for the same server every day, and you simply haven’t had time to figure out what is causing the server to be overloaded and appear offline so consistently. Now, let’s also say that each time the server goes offline, you get a new ticket in Autotask for the server being offline. After a few weeks, you’ll have a bunch of tickets for the same issue. Chances are, two engineers on your team have been working on the issue from two different tickets, each unaware of the other. How can you more effectively track the root issue (the fact that the server keeps going offline) as well as all the instances of that issue (the ticket from today, and from yesterday, and the day before) without clogging up your ticket system? How do your engineers know which ticket to work on to fix the root issue?

There are two things you need to track in this situation. First, if the server is offline right now, you should use the ticket from the RMM tool to track that fact. This ticket would be considered an Incident ticket, and it, along with the rest of the RMM-created tickets, should be associated with a single Problem ticket. The Problem ticket would be used to track the fact that the server goes offline every day, whereas the Incident tickets would be used to track each individual time the server went offline. Then, when the “server offline” ticket is created each day, your engineer should make sure the server is back online and then close the ticket that was created that day. The Problem ticket (which tracks the fact that there’s an underlying and unresolved issue) will remain open until the root issue is fixed.

If you rely on ticket Priorities to prioritize what your engineers work on (and you should), then this approach will soon prove quite helpful. Since a server that is offline right now is a high-priority issue (I’d consider it critical), you’ll want the Incident tickets (the tickets that are created each day) to be prioritized as very high. Once the server is back online, though, that critical ticket can be closed (since there’s no longer a critical issue pending). The Problem ticket (that will remain open for a longer time while you fix the root issue) can be prioritized as something a bit lower than critical (since it’s less important to fix an unstable server than it would be to fix a different server that was offline right now).
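If it helps to see the relationship laid out, here is a minimal sketch in Python. To be clear, this is not Autotask's API or data model; the class names, the priority scale, and the ticket titles are all made up for illustration. It only shows the idea: one longer-lived Problem ticket at a lower priority, with a critical Incident ticket opened and closed under it each day.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical priority scale (not Autotask's); lower number = more urgent.
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3


@dataclass
class Ticket:
    title: str
    priority: Priority
    is_open: bool = True


@dataclass
class Problem(Ticket):
    # Incident tickets associated with this Problem ticket.
    incidents: list = field(default_factory=list)

    def add_incident(self, incident: Ticket) -> None:
        self.incidents.append(incident)


# One Problem ticket tracks the root issue at a lower-than-critical priority.
problem = Problem("Server goes offline every afternoon", Priority.HIGH)

# Each daily RMM alert becomes a critical Incident linked to that Problem.
for day in ("Monday", "Tuesday", "Wednesday"):
    incident = Ticket(f"Server offline ({day})", Priority.CRITICAL)
    problem.add_incident(incident)
    # The engineer confirms the server is back online, then closes the Incident.
    incident.is_open = False

open_incidents = [t for t in problem.incidents if t.is_open]
print(len(problem.incidents), "incidents,", len(open_incidents), "still open")
print("Problem still open:", problem.is_open)
```

After three days, every Incident is closed, yet the Problem ticket is still open and still shows the full history of occurrences in one place.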

An Email Brownout

Or, maybe one of your client-facing services (email hosting, web hosting, spam filtering, VoIP, etc.) experiences an outage for an afternoon. While that service is offline, you’re likely to have a number of customers contact your help desk to report the outage (or something that’s directly related to the outage). Once you have a bunch of tickets open for different customers, but all for the same root issue, how do you ensure that all your customers are updated in a timely manner when there’s an update to the situation?

You need to keep track of all the customers that have reported the issue so you can proactively communicate with them when the service is restored. But you also need to identify a single ticket you’re going to use to track all the work the engineers are doing to restore the service. In this example, you would want to create a new Problem ticket that the engineers will use to track the work performed to bring the service back online. This ticket may stay open for a few days after the issue has been resolved, and it would be used to track any needed follow-up or longer-term fixes to prevent this issue from returning.

When the engineers bring the service back online, they can easily add a note to the single Problem ticket and have that note added to each Incident ticket automatically. In this way, the engineering team doesn’t need to find all the tickets that are related to this issue to update them; they only need to update the single Problem ticket.
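The note behavior described above can be pictured with a small Python sketch. Again, this only models the idea and is not how Autotask implements it; the class names, the `add_note` method, and the customer names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    customer: str
    notes: list = field(default_factory=list)


@dataclass
class Problem:
    title: str
    incidents: list = field(default_factory=list)
    notes: list = field(default_factory=list)

    def add_note(self, text: str) -> None:
        # Record the note on the Problem ticket itself...
        self.notes.append(text)
        # ...and copy it to every associated Incident ticket.
        for incident in self.incidents:
            incident.notes.append(text)


# One Problem ticket for the outage, one Incident per customer report.
outage = Problem("Hosted email outage")
outage.incidents.extend([Incident("Customer A"), Incident("Customer B")])

# The engineers write a single update on the Problem ticket.
outage.add_note("Service restored; monitoring for recurrence.")

for incident in outage.incidents:
    print(incident.customer, "->", incident.notes)
```

The engineers write one update, and every customer's ticket carries it.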

Both of these situations are excellent candidates for using Problem Management in Autotask.

In both examples, the engineers should create a single Problem ticket to identify the root issue (even if the cause isn’t yet identified), and then use the Incident tickets to track each manifestation of that root issue.

Would you like to learn more?

Check out this video that shows how this all fits together.