When Kaseya announced its Automation Exchange back in August, I was honored to be asked to present a short webinar on quality monitor sets. You can read more about this in my blog on the www.MSPBuilder.com web site. In a nutshell, using the default / sample monitors, or not taking the time to learn how to properly create your own, can seriously impact your operation. It doesn't matter whether you use Kaseya or something else - every vendor provides example monitor sets that showcase what can be done rather than how it should be done. These are not intended for production use in most cases, as there's rarely a "one size fits all" solution. Every MSP is different and has different customer requirements, so monitor optimization is fairly personal.

This was made painfully clear last week when we deployed our RMM Suite to an MSP client. Our Service Desk application is designed to parse our monitors and take action where appropriate, but to pass any "foreign" monitor set directly to the PSA. As part of initial testing, our monitors were applied only to the client's internal organization; their customers still had Kaseya's sample monitor sets applied. When we turned on the Service Desk, we were inundated with alerts from the non-optimized monitors (mostly Kaseya sample monitor sets). Within 90 minutes we had processed just over 2,000 alerts from these sample sets!

To put this in perspective, none of the customer's 30 internal machines generated a single alert with the optimized monitors applied, while the roughly 450 monitored customer agents generated 2,083 alerts. In our MSP practice (with all agents having optimized monitors), 2,530 monitored agents generated only 31 alerts for the entire day. Granted, this is a slow week for tickets - we average around 45 alert tickets on a typical busy day for that number of agents. This is the value of optimizing your monitor sets.
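To make that comparison concrete, here's the arithmetic as a quick Python sketch. The counts come straight from the numbers above; note the time windows differ (90 minutes for the sample monitors vs. a full day for the optimized fleet), so the real per-agent gap is even wider than this shows.

```python
# Per-agent alert volume, using the counts from this post.
# (Illustrative only; the measurement windows differ, so the true
# gap between sample and optimized monitors is even larger.)
sample_alerts, sample_agents = 2083, 450          # sample monitor sets, ~90 minutes
optimized_alerts, optimized_agents = 31, 2530     # optimized sets, a full day

sample_rate = sample_alerts / sample_agents           # ~4.6 alerts per agent
optimized_rate = optimized_alerts / optimized_agents  # ~0.012 alerts per agent

print(f"per-agent ratio: {sample_rate / optimized_rate:.0f}x")  # roughly 378x
```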

The key difference is the time we spent on reviewing and optimizing the alerts and the monitor sets that define them. Things we considered include:

  • Can we do something for this alert? If not - why alert on it? Maybe it should be logged for trend analysis instead.
  • Is the threshold appropriate? Have we considered the platform capability, system optimization (or lack thereof), or other environmental limitations when defining the alert?
  • Is the rearm period reasonable? One of the biggest reasons for alert floods is a rearm period too short for a remediation action to be performed.
  • If this is a performance alert, is someone available to look at it RIGHT NOW? If not, log it instead of alerting - you usually can't identify the cause of a performance hit unless you can observe it while it's happening.
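The threshold, rearm, and "log vs. alert" decisions above can be sketched as a small piece of logic. This is a minimal, hypothetical Python sketch of the idea - real RMM products (Kaseya included) configure all of this through their own monitor-set UI, not code, and every name here is made up for illustration.

```python
import time

# Hypothetical sketch of the threshold + rearm + log-vs-alert logic
# discussed in the bullets above. Not any vendor's actual API.
class Monitor:
    def __init__(self, threshold, rearm_seconds, actionable=True):
        self.threshold = threshold          # tuned to the platform, not a generic default
        self.rearm_seconds = rearm_seconds  # long enough for remediation to finish
        self.actionable = actionable        # if False, log for trend analysis instead
        self._last_alert = None

    def evaluate(self, value, now=None):
        """Return 'alert', 'log', or None for one sample."""
        now = time.time() if now is None else now
        if value < self.threshold:
            return None                     # below threshold: nothing to do
        if not self.actionable:
            return "log"                    # can't act on it, so just record the trend
        if self._last_alert is not None and now - self._last_alert < self.rearm_seconds:
            return None                     # still inside the rearm window: suppress
        self._last_alert = now
        return "alert"

# A CPU monitor that won't re-alert for 30 minutes, giving an
# auto-remediation procedure time to run before the next ticket.
cpu = Monitor(threshold=95, rearm_seconds=1800)
```

The rearm window is the piece most sample monitor sets get wrong: if it's shorter than the time a remediation takes, every breach becomes a flood of duplicate tickets.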

The point here is that the time taken to review and optimize your monitor sets is an investment, not an expense. Internally, we've seen a 63% reduction in Service Desk tickets after moving to the optimized monitor sets. After 18 months of using the optimized monitors (plus the auto-remediation capability of Service Desk), we've been able to move staff from help desk to project work, keeping just 2-3 engineers on the help desk each day. Our help desk team provides live call answering for customer calls in addition to handling the alert tickets. Project work is typically more profitable, so the time spent optimizing the monitors has had a direct and measurable impact on the bottom line - worth every minute!