Kaseya Community

Monitoring set, generate ticket after second failed check of monitored service

  • So I have created a monitor set for a particular service. Right now, it checks whether the service is running and, if it is not, creates a ticket in KSD. All of this works fine. But what I want to do now is check the service; if the service is not running, do nothing, wait a few minutes, and do another check, and if the service is still not running on the endpoint, then create a ticket in KSD.

    Is that possible?

  • Sure - but it can be a bit challenging without some extra tools.

    If you use the Intake procedure, you can identify a service failure and the service name. Perform a delay and check the service again. If the service is running, cancel the ticket; otherwise, allow the ticket to be created and update the parameters.
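
    For illustration, here's a minimal standalone sketch of that recheck logic in Python, along the lines of what an agent procedure could run on the endpoint (the service name, the delay, and the use of Windows' sc tool are assumptions - adjust to your environment):

        import subprocess
        import sys
        import time

        def service_running(name):
            # "sc query" reports the service state on Windows; look for RUNNING.
            result = subprocess.run(["sc", "query", name],
                                    capture_output=True, text=True)
            return "RUNNING" in result.stdout

        SERVICE = "DHCPServer"   # hypothetical service name - use your own
        DELAY_SECONDS = 300      # wait 5 minutes before the second check

        if service_running(SERVICE):
            sys.exit(0)          # running on the first check - nothing to do
        time.sleep(DELAY_SECONDS)
        # Exit code tells the calling procedure whether to let the ticket stand.
        sys.exit(0 if service_running(SERVICE) else 1)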

    One of the challenges in doing this is getting the information you need. When the alert message says "The DHCP Server service is not running", how do you get the name of the service? It's everything after "The " and before " service", but Kaseya doesn't provide the tools to do that kind of string manipulation. You're either writing and calling external scripts or creating convoluted procedural code to accomplish this.
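
    For what it's worth, the extraction itself is a one-liner in a general-purpose language, which is exactly the kind of tool Kaseya doesn't give you. A quick Python illustration, using the alert text from the example above:

        import re

        alert = "The DHCP Server service is not running"
        # Capture everything after "The " and before " service".
        match = re.match(r"The (.+) service is not running", alert)
        service_name = match.group(1) if match else None
        print(service_name)  # -> "DHCP Server"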

    Our service desk gets a service failure alert and we extract the information. We then use the API to call a specific external remediation command (script) that is run via an agent procedure. It performs a couple of different service remediation tasks (restart; stop/wait/start; finally, kill the service exe and then start it) and returns the success/fail status. We then either close the ticket as "remediated" or send it as open to be assigned at our help desk.
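
    Roughly, that escalation sequence looks like this (a simplified Python sketch, not our actual script; the service name, exe name, and wait times are placeholders):

        import subprocess
        import sys
        import time

        def sc(*args):
            # Thin wrapper around the Windows "sc" service control tool.
            return subprocess.run(["sc", *args], capture_output=True, text=True)

        def running(name):
            return "RUNNING" in sc("query", name).stdout

        def remediate(name, exe):
            # 1. Plain restart attempt.
            sc("start", name)
            time.sleep(30)
            if running(name):
                return True
            # 2. Stop, wait, start.
            sc("stop", name)
            time.sleep(30)
            sc("start", name)
            time.sleep(30)
            if running(name):
                return True
            # 3. Kill the service exe, then start.
            subprocess.run(["taskkill", "/F", "/IM", exe],
                           capture_output=True, text=True)
            time.sleep(10)
            sc("start", name)
            time.sleep(30)
            return running(name)

        # Hypothetical names - substitute the real service and its executable.
        sys.exit(0 if remediate("SomeService", "someservice.exe") else 1)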

    We use a Multi-Tool that we developed to perform the string manipulation, data extraction and result analysis. It should be available through the exchange shortly for $299. It provides over 60 functions to make service desk procedures easier to develop - the tools and functions you'd find in a full programming language.

    Some other things that we do in a similar vein are:

    - Get a KNM Gateway fail alert, wait 10 minutes for the second one, and alarm only if it arrives

    - Get the first patch-failure ticket for a machine, collect all additional patch-fail tickets for that machine that arrive within an hour, extract the failed KB # from each and cancel the individual alerts, then summarize all the failed patches for the machine in a single ticket (see the sketch after this list)

    - Get a service fail alert, trigger a remediation procedure, then after 10 minutes, report the result

    - Track every alert with an automatic remediation process. If the same alert occurs and is successfully remediated, say, 4 times in an hour, it's a "repeating" alert and gets flagged, even though each remediation succeeded.
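
    The patch-failure summarization above is essentially windowed aggregation keyed by machine. A rough Python sketch of the idea (the alert format, the KB pattern, and the one-hour window are assumptions):

        import re
        import time
        from collections import defaultdict

        WINDOW_SECONDS = 3600  # collect follow-up failures for one hour

        # machine -> first-seen timestamp and the KB numbers collected so far
        pending = defaultdict(lambda: {"first_seen": None, "kbs": []})

        def on_patch_fail_alert(machine, alert_text):
            # Pull the KB number out of the alert text, then cancel the alert;
            # only the summary below becomes a ticket.
            kb = re.search(r"KB\d+", alert_text)
            entry = pending[machine]
            if entry["first_seen"] is None:
                entry["first_seen"] = time.time()
            if kb:
                entry["kbs"].append(kb.group(0))

        def flush_expired():
            # Called periodically: emit one summary ticket per machine
            # whose collection window has closed.
            now = time.time()
            for machine, entry in list(pending.items()):
                if now - entry["first_seen"] >= WINDOW_SECONDS:
                    kbs = ", ".join(sorted(set(entry["kbs"])))
                    print(f"Summary ticket for {machine}: failed patches {kbs}")
                    del pending[machine]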

    So - definitely possible!

    Glenn

  • What is the Intake procedure? Where can I find this?
  • In Service Desk - it's technically called the "Ticket Request Mapping" procedure.

    Create a simple "dummy" procedure there to make sure it works, then you can go to Common Config / Incoming Email and select that procedure as the Request Mapping procedure. Then go build your logic in the mapping procedure and extend out into the stages.

    When an alert arrives, it's processed by the DeDup procedure, then by the Request Mapping procedure. The commands available here are a bit limited, but it's where we do all of the processing to determine whether the event should be processed or cancelled. This was one of the core reasons for developing the Multi-Tool.

    If we decide to process it, we move it to a set of stage procedures that route the event through processing and remediation stages and finally to a completion stage that determines whether, and what kind of, ticket to send to our PSA. We use an external PSA but rely heavily on Service Desk to perform initial evaluation, triage, auto-remediation, and classification before a ticket is ever touched by an engineer.

    With the combination of SD-based remediation and carefully crafted monitors (including some "smart monitors"), we've seen certain types of alerts reduced by almost 80%, and an overall reduction in alerts of 60%. Most of this comes from the SD process and smart alerting that reduces nuisance tickets - like getting an "AV Defs Outdated" alert an hour before it auto-updates.

    Glenn


  • What is your current setup in the monitor set?

    You may be able to just use the basic monitor set configuration.

    If Restart Attempts is set to 1 or above, it will attempt to restart the service after it has stopped, once a specific amount of time has passed (Restart Interval). If it is unable to restart the service thereafter, it will create a ticket - is that what you are already using?

    Have you tried this, and does it achieve what you are looking for, or are you looking for a more advanced workflow?

  • I think you can do this with the suggestion above. With the monitor set, Kaseya will not create a ticket (alert) until it falls through these settings and is still not able to restart the service.

    So in your case, you would just set the Restart Interval for the period of time you were willing to wait, and keep the Restart Attempts minimal (not sure if it would be 1 or 2 here), and you should get the desired result without the complexity of Service Desk.

  • We don't want the service to be restarted; we only want to check that the service is not running, then pause, then check it again, and if it is still down, send the alert/create the ticket.

  • Buster, I think the best way to do that would be to create a monitor set.

    Restart Attempts set to 0

    And have the alert action run an agent procedure that checks the service a second time; this procedure then triggers your alarm or desired action