Kaseya Community

Server Down Notifications are PAINFUL

  • Alerts:
    All servers have an Agent Status alert with a 5-minute offline threshold and a 30-minute rearm.

    When 2 servers go offline within the same group, a separate alert is sent stating the group appears offline: (x) machines are offline.

    The individual agent status alerts STILL OCCUR. This must be by design, but it's horrible. Is there no way to automatically suspend all alarms for a group while 2+ servers are down? For my org, server down is top priority, so if even 1 server were down we'd be on it immediately anyway. The 30-minute rearm is just a "hey, I'm still down", as a precaution.

    Example: we had a site go offline. The site has 30 servers. 30 alerts were sent, then one more alert for group offline. 30 minutes later, 31 more alerts. Especially with a 5-minute "window" before alerting, I'd think Kaseya should know "Hey, I have an entire group showing offline, send ONE alert that says Group Offline".

    Legacy Forum Name: Server Down Notifications are PAINFUL,
    Legacy Posted By Username: boostmr2
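
The behaviour asked for in the post above (one alert per group once 2 or more of its servers are offline, individual alerts otherwise) can be sketched roughly as follows. This is only an illustration of the desired logic, not Kaseya's implementation: the function name `build_alerts`, the threshold constant, and the wording of the alert strings are all made up for the example.

```python
from collections import defaultdict

# Assumption: 2 or more offline agents in one group count as a group outage.
GROUP_THRESHOLD = 2


def build_alerts(offline_agents, threshold=GROUP_THRESHOLD):
    """Collapse per-agent offline events into one alert per affected group.

    offline_agents: iterable of (group_name, machine_name) tuples.
    Returns a list of alert strings (hypothetical wording).
    """
    by_group = defaultdict(list)
    for group, machine in offline_agents:
        by_group[group].append(machine)

    alerts = []
    for group, machines in sorted(by_group.items()):
        if len(machines) >= threshold:
            # Group outage: one alert, individual alerts suppressed.
            alerts.append(
                f"Group {group} appears offline: {len(machines)} machines are offline"
            )
        else:
            # Below the threshold: keep the individual agent alerts.
            alerts.extend(f"Agent {m}.{group} is offline" for m in machines)
    return alerts
```

With 30 offline servers in one group and a single offline server in another, this yields two alerts instead of 31.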
  • We do not see this behaviour, but I don't recall doing anything specific to set it up this way. Maybe your alerting is glitchy... Maybe ours is! I would open a ticket with Kaseya to ask. I can definitely tell you that in our scenario, when 5 servers go offline, we only get 1 alert stating that 5 servers in the group went offline. We do not get 1 per server in this case.

    Legacy Forum Name: Monitor Sets,
    Legacy Posted By Username: arobar
  • Want to trade servers? LOL

    Yes, I opened a ticket with Kaseya this morning and am waiting on feedback. Lately their support has been fairly responsive (tier 1 anyway). When a ticket requires escalation I have waited almost a week for an answer before, but this is nothing new... the system has always done this.

    I looked everywhere for a way to configure the "group alarms", or maybe a setting under Agent Status, and I can't find anything. Maybe it's tied to the emails? I have the Agent Status alerts send an email to a public folder. The Connectwise email parser picks it up and makes a ticket. Also, an email is sent to a distro that does text messaging. Maybe it's because it's an "email" and not the "Alarm"? The only workaround I can come up with is to execute a script when the alert trips: have it ping the public gateway IP, and then send an email from the script if the ping fails, saying (issue for group...). It will take a long time to set up, be a management nightmare, and also won't be able to tell if it's a single server or the site that is down (although just a notification that it's either will satisfy me at this point).

    Waiting to hear from Kaseya.

    Legacy Forum Name: Monitor Sets,
    Legacy Posted By Username: boostmr2
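
A minimal sketch of the ping-the-gateway workaround described in the post above. It is not a Kaseya script: the names `ping` and `classify_outage`, the message wording, and the example address are hypothetical, the `ping` flags assume a Linux-style `ping` (Windows uses different flags), and the email step is left out because the thread gives no mail settings. A gateway that answers only suggests the site link is up; a gateway that drops ICMP would look "down" to this check.

```python
import subprocess


def ping(host, timeout_s=2):
    """Return True if a single ICMP echo to host succeeds.

    Assumes a Linux-style ping binary (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify_outage(group, gateway_ip, probe=ping):
    """Build a notification subject for an agent-offline alert in `group`.

    `probe` is injectable so the logic can be exercised without a network.
    """
    if probe(gateway_ip):
        return f"[{group}] gateway {gateway_ip} reachable: likely a single server down"
    return f"[{group}] gateway {gateway_ip} unreachable: site or its link may be down"
```

The script would have to run from somewhere outside the affected site (for example, `classify_outage("Customer", "203.0.113.1")` from a machine that is still online), since agents behind the failed gateway cannot report anything.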
  • I'm still digging. This particular site has 29 servers (sorry about saying 30).

    Let's call the site "Customer". Customer has a few subgroups.
    Group List:

    Customer - 29 servers
    Customer.location1 - 1 server
    Customer.location2 - 1 server
    Customer.location3 - 1 server

    The naming policy allows ALL servers to show under "Customer". Then, the machines that are at a location other than the main one are filed under a sublocation, of which there are 3.

    For some reason, my alert email states "21 machines offline". I have confirmed that every machine went offline at the same time, even the ones at the remote locations (they are VPN'd and share the same gateway, which went down; the naming policy assigns the subgroup dynamically by private IP range).

    Regardless of Kaseya only seeing 21 machines as offline instead of all 29, I checked and every 30 minutes it sent notifications for each server that went offline, even the ones listed in the email of 21 machines. Very odd. I'll post any info from Kaseya. I'm still looking into a misconfiguration in the naming policy or subgroups, something to explain why I still received individual notifications despite having the entire group down. This also happens for small 2-server sites; I've seen it, but the impact is not nearly as bad.

    Legacy Forum Name: Monitor Sets,
    Legacy Posted By Username: boostmr2
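
For readers unfamiliar with assigning subgroups by private IP range, as the naming policy in the post above does, the idea can be sketched like this. The IP ranges are invented for the example (the thread does not give the real ones), `assign_group` is a hypothetical helper, and none of this reflects how Kaseya evaluates naming policies internally.

```python
import ipaddress

# Hypothetical ranges: the thread does not state the actual subnets.
SUBGROUP_RANGES = [
    (ipaddress.ip_network("10.1.0.0/16"), "Customer.location1"),
    (ipaddress.ip_network("10.2.0.0/16"), "Customer.location2"),
    (ipaddress.ip_network("10.3.0.0/16"), "Customer.location3"),
]
DEFAULT_GROUP = "Customer"  # main location


def assign_group(private_ip):
    """Return the group name for a machine based on its private IP address."""
    addr = ipaddress.ip_address(private_ip)
    for network, group in SUBGROUP_RANGES:
        if addr in network:
            return group
    # Anything outside the sublocation ranges stays in the main group.
    return DEFAULT_GROUP
```

Running each server's address through a check like this is one way to confirm which group a machine should land in when hunting for a naming-policy misconfiguration.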
  • I've seen this behaviour too; I'm guessing it has to do with the check-in times. Since the servers don't all check in at the same time (staggered by ~30 seconds), the alarms get triggered at different intervals. Should the check-in times be close enough (who knows how that works), you'll get that single email stating that x number of machines are offline at customer.location(x).

    Why it won't "sync up" after that is beyond me... Normally by the time the second or third email is sent we're working on the issue or have suspended the alarms for whatever reason.

    Legacy Forum Name: Monitor Sets,
    Legacy Posted By Username: thirteentwenty
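
The guess in the post above, that alarms only collapse into one email when they trigger close enough together, can be illustrated with a simple time-window batcher. This is purely a sketch of that guess, not confirmed Kaseya behaviour; `coalesce` and the window values are hypothetical.

```python
def coalesce(events, window_s=300):
    """Batch offline events that fall within window_s of the batch's first event.

    events: list of (timestamp_seconds, machine_name) tuples.
    Returns a list of batches, each a list of machine names.
    """
    batches = []
    batch_start = None
    for ts, machine in sorted(events):
        # Open a new batch when the event falls outside the current window.
        if batch_start is None or ts - batch_start > window_s:
            batches.append([])
            batch_start = ts
        batches[-1].append(machine)
    return batches
```

With five servers whose alarms trip 30 seconds apart, a 300-second window produces a single batch (one email), while a 45-second window splits the same events into three batches.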
  • Yeah. One of the main reasons I give a full 5 minutes before alerting on an offline server is to give that "buffer" needed if an entire site is down. And mind you, the "21 servers offline" message occurred at the same time as the "agent is offline" alerts did. The only thing missing in this scenario is that the individual agent alerts should not be sent, "suppressed" if you will. I have a ticket open with Kaseya, and they say Agent Status is supposed to suppress the individual alerts. I'm waiting for a response.

    Legacy Forum Name: Monitor Sets,
    Legacy Posted By Username: boostmr2
  • Update: Today I received notification from the Kaseya dev team that this is a "bug". A hotfix has already been developed, so they are getting it approved and rolling it out. Looks like we'll have a fix soon.

    Legacy Forum Name: Monitor Sets,
    Legacy Posted By Username: boostmr2