I've got a service desk procedure that kicks off an external remediation process. The SD procedure writes to a SQL table, which triggers the external process. The table is updated with the results of the process. Depending on the priority of the event, I want to wait 10 or 20 minutes to determine if the process completed successfully.
Currently, I use a Goal procedure to handle this, but it isn't working as we need. If the alert arrives at 2am, for example, the remediation process launches within a minute, but the goal process doesn't start counting until 8am, when the client's coverage window starts. So, at 8:10am it finally determines that the remediation was unsuccessful. I want to know by 2:10am so that I can perform a secondary process, such as advanced remediation tasks reserved for certain conditions, or simply escalate the ticket so it's in the help desk before the staff arrive.
Currently, the goal (and escalation) procedure times for any value of 1 day or less are based on "client coverage window", which explains the behavior. Seems strange that a simple goal timer isn't available.
Is anyone aware of an alternative for this? (there's no SuspendProcedure function.)
Would there be any harm in calling a "Sleep 600" command that you could see (possibly multiple sleep commands running on the Kaseya server)?
I would think you would be better off calling executeProcedure() to reschedule the executing procedure for 10 minutes from now instead of sleeping for 10 minutes. You might have to change the logic of your procedure slightly to get this to work. Possibly add an if statement to see if this is the first call or if this is the call that was scheduled to run in 10 min, but it is hard to say without seeing the procedure.
Ah, if only it were that easy, I would not have even dealt with Goal and Escalation procedures, much less asked this question! Thanks for the ideas, though. :D
There are only two options available to me when I am in the SD Procedure editor:
scheduleAgentProcedure - which allows me to pick a procedure from the Agent Procedures library, set a delay, and select an agent where the procedure should run. This does not help me continue processing in Service Desk after waiting for a remediation period to complete. (now, if there were an option to wait for completion of the agent procedure, this might work!)
executeSubProcedure, which allows me to select a procedure from within the Service Desk sub-procedure library, but - that executes immediately.
I'm considering trying an executeShellCommand (which runs a command on the Kaseya server) that simply runs "Sleep.exe 600". I only have a small number of situations where I need to know ASAP after a timeout period whether the remediation was successful, and those don't happen that often. Most are not critical and can wait (like patch failure summarization). I'm really wondering if there's an issue with a long-running (10-20 minute) script even though it takes virtually no CPU cycles. I'm also wondering why Kaseya never saw the value of a goal / escalation procedure not tied to customer coverage times.
One method we deploy is to schedule the Agent Procedure from SD, and then move the ticket into a "Waiting" stage.
The Agent Procedure then runs, does whatever it is designed to do, and returns whatever results it produces to the procedure.
The same Agent Procedure can then create a new SD ticket with the results included as the Summary/Description etc.
Then, using some clever dedup functions, you can get the new ticket's results appended as a note to the ticket in the Waiting stage. Depending on the detail in the duplicate ticket (i.e. the ticket that represents the results of the agent procedure), a Change procedure can then force the ticket to move to another stage, which could in turn trigger another Goal, Escalation, or Stage Entry procedure, etc.
I think the problem I'm having getting my brain in gear with this is that the event I'm processing isn't associated with an agent, so I don't have a place to run an Agent Procedure.
Here's an example that most of us have experienced at some point - a UPS sends an alert that it's on battery power. Service desk receives the email, identifies the client and creates a ticket. I want that ticket to go "on hold" for 5 minutes, waiting for the next email to say that power has returned. The first message with "Power Lost" creates a record in a temporary SQL table with the device ID and a flag value indicating the FAIL state.
When the second message arrives with the "Power Restored" state, it checks the database for a record and if found, clears the flag. If the record is not found, the event is dropped because the alert has already moved on to the help desk. No ticket is ever associated with the "power restored" event - it simply clears the FAIL state if it exists.
So, when the timer associated with the first event fires, the database is checked. If the FAIL state is clear, either no ticket or a closed ticket is generated for the fail event because the Power Restored event cleared the flag. However, if the second event didn't yet arrive, the FAIL state flag will still be set - the procedure deletes the record and proceeds with processing the ticket over to the PSA as an open power failed ticket.
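To make the paired-event logic above easier to follow, here is a rough Python sketch of the three steps (lost, restored, timer). It is only an illustration under assumptions: the table name, column names, and function names are made up, SQLite stands in for the real SQL table, and the sketch deletes the record in both timer outcomes, which is my simplification.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create the temporary table (hypothetical schema) in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pending_events ("
        "device_id TEXT PRIMARY KEY, fail INTEGER NOT NULL)"
    )
    return db


def on_power_lost(db: sqlite3.Connection, device_id: str) -> None:
    """First message: record the device with the FAIL flag set."""
    db.execute("INSERT OR REPLACE INTO pending_events VALUES (?, 1)", (device_id,))


def on_power_restored(db: sqlite3.Connection, device_id: str) -> bool:
    """Second message: clear the flag if a record exists.

    Returns False when no record was found, meaning the event is dropped
    because the alert has already moved on to the help desk.
    """
    cur = db.execute(
        "UPDATE pending_events SET fail = 0 WHERE device_id = ?", (device_id,)
    )
    return cur.rowcount == 1


def on_timer_fired(db: sqlite3.Connection, device_id: str) -> str:
    """Timer for the first event: check the flag and decide what to do."""
    row = db.execute(
        "SELECT fail FROM pending_events WHERE device_id = ?", (device_id,)
    ).fetchone()
    # Assumption: the record is removed in both outcomes to keep the table small.
    db.execute("DELETE FROM pending_events WHERE device_id = ?", (device_id,))
    if row is not None and row[0] == 1:
        return "escalate"  # flag still set: pass the open ticket to the PSA
    return "close"  # flag cleared by the restore event: no ticket or a closed one
```

The point of the sketch is that the "Power Restored" message never needs its own ticket; it only touches the table, and the timer on the first event makes the decision.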
I have several such paired events that arrive via email from non-agent devices. Some - like the power failure - need shorter hold times than something like a router saying it switched to a backup link. Of course, we could hold the ticket longer to get the "restore" event matched up, but just how long is long enough without being too long? Our management has dictated wait times of 5 to 15 minutes for most events depending on the event type and criticality, and 1 hour for network link failover.
Our service desk is designed to handle these events, and for customers with extended but not 24-7 coverage, the on-call team gets an early-morning call as a heads up to these potential issues that occurred overnight. (6am weekdays and 8am weekends/holidays). Customers on 24-7 coverage would trigger the call immediately. The only challenge I'm having with the current Goal-based logic is that when an extended coverage customer (8am-10pm, for example) has an event occur at 2am, the goal timer doesn't start until 8am, where we'd like it to run at 2:00, fire at 2:10, and invoke the 6am call to the on-call team to make sure the problem still doesn't exist.
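To put numbers on the difference, here is a small Python sketch comparing a plain wall-clock timer with one that only counts minutes inside the coverage window. I don't know how Kaseya calculates this internally; the sketch just models the behavior described above. The function names and the date are made up, and it assumes the window opens and closes on the same day, every day of the week.

```python
from datetime import datetime, time, timedelta


def wall_clock_deadline(start: datetime, minutes: int) -> datetime:
    """The timer we want: it starts counting the moment the event arrives."""
    return start + timedelta(minutes=minutes)


def coverage_deadline(
    start: datetime, minutes: int, opens: time, closes: time
) -> datetime:
    """A timer that only counts minutes while the coverage window is open."""
    remaining = timedelta(minutes=minutes)
    current = start
    while True:
        window_open = datetime.combine(current.date(), opens)
        window_close = datetime.combine(current.date(), closes)
        if current < window_open:
            # Before coverage starts: nothing counts until the window opens.
            current = window_open
        if current >= window_close:
            # After coverage ends: jump to the next day's opening time.
            current = datetime.combine(current.date() + timedelta(days=1), opens)
            continue
        available = window_close - current
        if remaining <= available:
            return current + remaining
        # Use up today's window and carry the rest over to tomorrow.
        remaining -= available
        current = window_close


alert = datetime(2024, 1, 15, 2, 0)  # hypothetical 2:00am alert
print(wall_clock_deadline(alert, 10))                       # 2024-01-15 02:10:00
print(coverage_deadline(alert, 10, time(8, 0), time(22, 0)))  # 2024-01-15 08:10:00
```

With an 8am-10pm window, the 2:00am event fires at 8:10am under the coverage-based timer, versus 2:10am for the plain timer we are after.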
I've added the delay function to our Multi-Tool* and will be testing it tomorrow to see how well it works. I'm really curious to see if there will be any issue with this apparently "long running" command. If it works without issue, it will solve this problem without "flaming hoops of procedure code" and let me abandon the Goal/Escalation procedures for these. For things like the patch fail summarization, the Goal procedure will work just fine, even if the ticket listing the failed patches doesn't come in until Monday morning.
*[Our Multi-Tool provides 60 functions to support SD programming, including decimal precision math and comparisons, string manipulation, time & date data and calculations, Boolean logic, and even a pair of network functions. If Delay works, it will be 61 functions :) ]
Thanks for the creative juice, Paul, and for listening.
Wow ... OK .. And what I think is missing from SD is the ability to just run a SD Procedure at some set interval completely independent of any ticket creation
A) You could run a generic Agent procedure on the VSA that is triggered via the 1st event and scheduled to run at some later interval
B) How about having a procedure that is triggered off each new ticket and does a generic lookup of your SQL table? The new ticket is just used as a trigger, and assuming you get new tickets regularly, it is a crude way of initiating a procedure to run at regular intervals.
What's this "Multi-Tool" you're referring to?
Well, you could "schedule" an SD procedure by using Cron or Task Scheduler to send an email to SD with a specific subject, no? :)
Our "Multi-Tool" is a utility that extends our Service Desk programming capabilities with:
13 Math functions with double-precision accuracy: +, -, *, / plus Modulo division, INT, ABS, ROUND, Dec/Hex conversions, Increment/Decrement, and random (0-99).
6 Comparisons with double-precision accuracy and auto-adaptation to string-compare.
4 Boolean logic functions - And, Or, Not, and "Return T or F", which evaluates a text (like "Yes" or "True") into a T/F value.
15 string manipulation functions including Left, Mid, Right, 2 InStr methods, Len, Reverse, Split and Split Field Count to manipulate delimited strings, ASC and CHR functions, Text Replace, case conversion, and software version string (a.b.c.d format) comparisons.
2 network functions - nslookup by name or IP, returning ip, name, or both, and InSubnet to return T/F if a given IP is in a defined network and mask.
19 time functions - from providing a timestamp, time, date, day name, day number (0 or 1 based), Julian day, month, year, Is-Weekend (T/F), return Date or Time part from TimeStamp, Convert between cTime and Timestamp (essential for time calculations), TimeDiff (seconds, minutes, hours, days, or years between 2 timestamp values), Next Time Occurrence (what is the date for the next "second Thursday"), InTimeRange (current or specific time is between two timestamp values - 2 functions, 1 handles "today/tomorrow", the other can span seconds to years).
When I started to automate our Service Desk, I was frustrated when Kaseya told me that 10 was less than 4 when it did string-based comparisons - well sure, "1" is less than "4" if comparing a character at a time makes sense! The Multi-Tool solved that and a host of other issues.
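For anyone who hasn't run into the string-comparison problem, here is a short Python illustration. The helper name is made up and this is not how the Multi-Tool is implemented; it only sketches the general idea of comparing numerically when both values parse as numbers and falling back to a string compare otherwise.

```python
def compare_less_than(a: str, b: str) -> bool:
    """Compare as numbers when both values parse, otherwise as plain strings."""
    try:
        return float(a) < float(b)
    except ValueError:
        # At least one value is not numeric: fall back to a string compare.
        return a < b


# A character-by-character compare looks at "1" versus "4" and stops there.
print("10" < "4")                            # True
print(compare_less_than("10", "4"))          # False
print(compare_less_than("apple", "banana"))  # True
```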
Our SD knows when the help desk is active (M-F 8a-5p) and can decide to send a voice alert or not based on the current time and whether the help desk is operating - just two calls to IsWeekend and InTimeRange give me that answer. (we replaced Kits with this at much lower cost and tighter control.) I can allow performance alerts during specific weekday hours and ignore them at other times by using the InTimeRange function. Increment or Decrement allows me to easily track how many times something has happened or when a count hits zero.
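The "is the help desk active" check described above boils down to something like the following Python sketch. The function names are only stand-ins for the IsWeekend and InTimeRange calls, not the Multi-Tool's actual interface, and holidays are ignored.

```python
from datetime import datetime, time


def is_weekend(moment: datetime) -> bool:
    """Saturday and Sunday are weekday() 5 and 6."""
    return moment.weekday() >= 5


def in_time_range(moment: datetime, start: time, end: time) -> bool:
    """True when the time of day falls in [start, end) on the same day."""
    return start <= moment.time() < end


def help_desk_active(moment: datetime) -> bool:
    """Help desk hours from the post: Monday to Friday, 8am to 5pm."""
    return not is_weekend(moment) and in_time_range(moment, time(8, 0), time(17, 0))


print(help_desk_active(datetime(2024, 1, 15, 9, 0)))  # Monday 9am -> True
print(help_desk_active(datetime(2024, 1, 15, 2, 0)))  # Monday 2am -> False
print(help_desk_active(datetime(2024, 1, 13, 9, 0)))  # Saturday 9am -> False
```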
I can send you some info when I get into the office tomorrow if you'd like. We're about a week away from deploying our revised MSPBuilder site making this and a few other MSP/Administrative tools available for purchase. We'll have some free goodies as well, including a GUI tool that can manage Windows scheduled tasks, including the ability to push a task to hundreds of systems at once. MultiTool is inexpensive - just $299US - and has a full User Guide documenting installation, all functions, and plenty of examples.
We solve for a very similar situation by using two different escalation stages:
- Stage 1: executes the remediation and escalates to Stage 2
- Stage 2: escalation threshold set for X minutes (we adjust based on the nature of the alert as part of the Entry procedure). Escalates to Stage 1
- The remediation will throw an "all clear" alarm if it detects the problem has been resolved. Deduplication will use this alarm to close the ticket.
We added counters into the ticket variables so the loop exits after a predefined number of cycles (so, for example, we can attempt to resolve an issue a few times before escalating to a human).
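The shape of that loop, very roughly, in Python. This is only a sketch of the control flow, not Service Desk procedure code: the function and parameter names are made up, the remediation and the wait are passed in as callables, and the "all clear" alarm is modeled as the remediation returning True.

```python
from typing import Callable


def run_remediation_loop(
    remediate: Callable[[], bool],
    max_cycles: int,
    wait: Callable[[], None] = lambda: None,
) -> str:
    """Cycle between the two stages until resolved or the counter runs out."""
    cycles = 0
    while cycles < max_cycles:
        cycles += 1           # counter kept in the ticket variables
        resolved = remediate()  # Stage 1: run the remediation
        wait()                # Stage 2: sit out the escalation threshold
        if resolved:
            return "closed"   # "all clear" arrived, dedup closes the ticket
    return "escalated"        # counter exhausted: hand off to a human


# Example: the third attempt succeeds, so the ticket closes without a human.
results = iter([False, False, True])
print(run_remediation_loop(lambda: next(results), max_cycles=3))  # closed
```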
Happy to elaborate more if helpful.
This was escalated to support, who suggested changes to the coverage type. That won't work as it will result in generating calls to the on-call team outside of the customer's actual coverage window. Kaseya's PM was on-site yesterday and we reviewed the Goal and Escalation procedure logic and discussed the potential for enhancements there.
Currently, I'm testing a change to the overall logic. I'm using the dedup procedure to add records to the SQL table to count the events, and re-sending a properly formatted email using the detected client's email as a "from" address. The DeDup procedure discards the second event that it submitted.
The Intake procedure now simply cancels any alert from the original event, and will only receive one re-submitted ticket - any others are removed by the DeDup process. The Multi-Tool delay function simply sleeps the re-submitted ticket for 15 minutes. After the delay, it checks the SQL table to determine if 1 or 2 events were recorded. If 1, the ticket is cancelled; if 2, the alert is processed. This no longer depends on Goal or Escalation procedures that in turn depend upon customer coverage times.
Thanks for all the feedback as the combination of ideas was very helpful in coming up with this method.