We're seeing a problem where one of our client's agents are no longer registering reboots. All six servers are scripted to reboot each week on Wednesday at 7pm. We've set the agent status alert to 1 minute, and all of our servers check in every 30 seconds.
When I check the "Last Reboot Time" column on the agent page, Kaseya generally reports a week or two prior. I confirm reboots are working by logging into the servers and running "net stats srv".
These servers are virtual and relatively fast, but they certainly take over one minute to reboot. Has anyone experienced this and found a solution?
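For anyone who wants to script the "net stats srv" check rather than eyeball it, here is a minimal sketch. It assumes you have already captured the command's text output and that it contains the English-locale "Statistics since" line; the function name and the sample text are made up for illustration. Note that this line reflects when the Server service started, which normally lines up with the last boot but is not guaranteed to.

```python
import re


def statistics_since(net_stats_output):
    """Return the text that follows 'Statistics since', or None if absent.

    The timestamp is returned as a plain string because its format depends
    on the server's regional settings.
    """
    match = re.search(
        r"^\s*Statistics since\s+(.+?)\s*$", net_stats_output, re.MULTILINE
    )
    return match.group(1) if match else None


# Hypothetical captured output, trimmed to the relevant lines.
sample = r"""Server Statistics for \\SERVER01

Statistics since 1/6/2016 7:01:40 PM

Sessions accepted                  1
"""

print(statistics_since(sample))  # 1/6/2016 7:01:40 PM
```

Comparing that string against the "Last Reboot Time" column is then a manual step, or a date parse if you know the server's locale.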
Seems like there are two issues here; please see below:
The first issue sounds as though the last reboot time is not showing properly in the VSA UI, or is not being updated properly.
The second issue is actually not an issue, but how the agent status alert works and is designed. The following is a writeup on how these alerts work:
Offline Alert (Behind the scenes)
On the Agent monitoring UI: Agent has not checked in for _______ Min, Rearm alert after ___________
(1) The "has not checked in" entry sets when an alert is raised. The very first agent offline alert is produced once the agent has been offline for that length of time.
(2) The rearm setting determines how long the alert stays disabled before it is automatically re-enabled. This means that right after the first agent offline alert, you won't receive another alert for the same outage until the rearm length plus your "has not checked in" period has passed.
Because of the distributed nature of agents and the KServer, any short network delay or network noise can cause an agent not to check in properly, and the KServer then has no choice but to treat that as an offline condition. This can produce alerts you don't want. To prevent these false alarms, we have implemented a mechanism that waits for 2 agent check-ins before signaling an agent offline alert. So even if you put 0 in both settings above, you won't get an offline alert any sooner than 2 x the agent check-in interval.
After alerts are created, a background system process sends out the offline emails. It usually runs every 2 minutes or so, so in the worst case you should get offline alerts processed every 2 minutes under normal system load.
So, 1 minute in the first option will not actually trigger an alert after 1 minute, but only after 2 check-in periods have failed and the system has processed the notification.
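To put rough numbers on the description above, here is a small sketch of the timing. It is only a model of the writeup, not Kaseya's actual logic: it assumes the alert is raised once both the configured threshold and two check-in intervals have passed since the last successful check-in, and that the email follows anywhere from immediately to one full run of the roughly 2-minute background process later. The function names are invented for illustration.

```python
def offline_alert_window(threshold_s, checkin_interval_s, mailer_period_s=120):
    """Return (earliest, latest) seconds after the last successful check-in
    at which an offline email could be sent, under the assumptions above."""
    # Assumption: both the configured threshold and two check-in intervals
    # must elapse before the alert is raised.
    raised_after = max(threshold_s, 2 * checkin_interval_s)
    # The mail process runs about every mailer_period_s seconds, so the
    # email may go out right away or up to one full period later.
    return raised_after, raised_after + mailer_period_s


def reboot_raises_alert(downtime_s, threshold_s, checkin_interval_s):
    """True if the agent stays silent long enough for the alert to be raised."""
    raised_after, _ = offline_alert_window(threshold_s, checkin_interval_s)
    return downtime_s >= raised_after


# 1-minute threshold with 30-second check-ins, as in the original post.
print(offline_alert_window(60, 30))      # (60, 180)
# Zeros in the settings still leave the two-check-in floor.
print(offline_alert_window(0, 30))       # (60, 180)
# A 45-second reboot would slip under the floor in this model.
print(reboot_raises_alert(45, 60, 30))   # False
```

Real timings can run longer than this model suggests, so timing a stopped agent service on a test device remains the better check.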
Most likely these systems are rebooting before this condition occurs. A good way to test this is stopping the agent services on a test device, and verifying the time it takes before that alarm/ticket/script/email is created.
Hope this helps explain things a bit.
Did some testing by stopping the service and timing it. Started at 5 minutes and went down from there. I do not get alerts at 3 minutes. This is with the 1 minute trigger.
I'm guessing our servers are just rebooting too fast to register an offline event. I've noticed that a couple of other client servers have not been reporting offline during the scheduled reboot. What these servers have in common is that they are virtual servers with pretty good specs.
I suppose there's no way to actually set it to zero?
The last reboot time in the VSA GUI is accurate and works reliably - although sometimes it takes a few moments after the reboot to refresh (usually less than a minute).
You can determine the true last reboot time from the event logs (system event 6006, from memory) - that's the only reliable way as far as we know.
The monitoring not picking up reboots on fast virtuals is pretty normal - we don't consider that a failing either (in fact we'd prefer not to be alerted of agent offline for a scheduled server reboot). If you want a positive reboot alert email, don't rely on the generic offline alert; set up an event log monitor as above instead.
Further to the above, my NOC console displays the last rebooted stat from the VSA UI - we use this as a positive confirmation of server reboots/uptimes.
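As a rough illustration of the event log approach, here is a sketch of the check such a monitor performs. It assumes the System log entries have already been exported as (event ID, timestamp) pairs; how you collect them is up to you. Event 6006 is the one mentioned above; 6005 and 6008 are the startup and unexpected-shutdown events commonly listed alongside it, so confirm which IDs your servers actually log. All names and sample timestamps are hypothetical.

```python
from datetime import datetime

# System-log event IDs commonly associated with shutdown and startup.
CLEAN_SHUTDOWN = 6006       # Event Log service stopped (clean shutdown)
STARTUP = 6005              # Event Log service started
UNEXPECTED_SHUTDOWN = 6008  # previous shutdown was unexpected


def last_event_time(events, event_id):
    """Return the newest timestamp for event_id, or None if it never occurs."""
    times = [ts for eid, ts in events if eid == event_id]
    return max(times) if times else None


def rebooted_since(events, since, event_id=CLEAN_SHUTDOWN):
    """True if event_id was logged at or after the given point in time."""
    latest = last_event_time(events, event_id)
    return latest is not None and latest >= since


# Hypothetical export covering a scheduled 7pm reboot.
events = [
    (CLEAN_SHUTDOWN, datetime(2016, 1, 6, 19, 0, 12)),
    (STARTUP, datetime(2016, 1, 6, 19, 1, 40)),
]
print(rebooted_since(events, since=datetime(2016, 1, 6, 18, 55)))  # True
```

A hard power-off would not be expected to log a clean-shutdown event, so checking the startup event as well (event_id=STARTUP) covers more cases.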
Craig, thanks for the tip. I'm going to test using the event log for acknowledging reboots. We simply can't have a reboot taking place without our knowing.
What do you have your agent status threshold set to for general offline alerts?
I completely accept the fact that some servers are just too fast to signal an offline status. That part is fine. However, I'm still very concerned that Kaseya doesn't accurately register the "Last Reboot Time" in our agent menu. We often refer to that for various tickets.
Sorry to butt in, but I just wanted to make sure I'm reading everything correctly.
You stated that the VSA GUI is accurate and works reliably. However, this conflicts with what groffnetwork said in the first post:
"When I check the 'Last Reboot Time' column on the agent page, Kaseya generally reports a week or two prior. I confirm reboots are working by logging into the servers and running 'net stats srv'."
You both are talking about the same place, right?
The duration of time it takes to reboot a machine should not be a qualifier when updating this flag.
Hi Zestysoft - yes I can confirm that for us, the last reboot time shown in the VSA is completely accurate - I've never experienced any issues with that data being incorrect.
OP is trying to equate "server offline" detection in the VSA with "server has rebooted" - and this isn't the case, as described above (mainly virtuals rebooting faster than the Kaseya offline detection timeout).
I explained how to monitor the event logs for a true and reliable server reboot indication. I believe this is also how Kaseya determines the reboot time.
I also note helpdesk.kaseya.com/.../35994418-Collecting-last-reboot-times-for-the-agent - perhaps OP isn't monitoring the logs, so Kaseya isn't collecting the correct event IDs to know when a reboot has occurred?
Read through my first post again. We are rebooting.
The servers go offline for a few minutes. No offline alert, no acknowledgement of "Last Known Reboot".
Yours may be accurate, but ours is not.
Thanks though, we're just going to open a ticket instead.
Interesting info, but how about this? I noticed the same issue with several machines at a site not showing a current last reboot time. I was on-site changing out a couple of pieces of equipment and physically powered off all the computers at the site. After restarting the server, I turned all the workstations back on, so they were offline a good 15 minutes. When I logged into the console, none of the computers had an updated last reboot time! So out of curiosity I initiated a reboot from the console (these machines will reboot in about 1-2 minutes tops) and bingo, the last reboot time updated immediately. Next I did a restart on the computer itself and presto, it also showed an updated reboot time immediately! So unless I missed something, it appears that even being physically powered off for an extended period does not trigger an updated last reboot time. Thoughts?