What is the best way to apply Disk Space Monitoring for Servers in Policy Management? I applied a Catch-All Monitor Set with monitoring of C-Z drives at the Global level, but I'm getting Alerts on Servers with System Restore Drives, etc. How can I avoid these False Alarms for drives <10GB?
Do I need to apply individual Disk Space Policies and Monitor Sets, or is there a way to use a Catch-All/Global Monitor Set?
I can tell you what we did, but it isn't a quick turnaround. Kaseya doesn't do a good job of disk space monitoring. We built individual monitor sets for each drive letter at a few different percentages, then built automation to apply the applicable sets to non-removable volumes. We also built a small web interface to set a percentage per volume so we can customize the monitoring percentage or disable a volume altogether; it defaults to 10 or 20%, but this lets us change it as required.
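For anyone wanting to sketch that per-volume override idea, here's a minimal illustration in Python. The agent/drive keys, the dict-based store, and the 10% default are my assumptions, not the poster's actual implementation (theirs sits behind a web interface and a database):

```python
# Hypothetical sketch of a per-volume threshold lookup with a global
# default and explicit overrides, as described above.
DEFAULT_THRESHOLD_PCT = 10  # assumed default; the poster uses 10 or 20%

# Overrides set per volume; None means "don't monitor this volume at all".
overrides = {
    "SERVER01:D": 20,    # important data drive, alert earlier
    "SERVER02:E": None,  # backup drive, disabled
}

def threshold_for(agent: str, drive: str):
    """Return the alert threshold (% free) for a volume, or None if disabled."""
    return overrides.get(f"{agent}:{drive}", DEFAULT_THRESHOLD_PCT)
```

The point is simply that the default covers almost everything, and exceptions stay in one small table instead of dozens of one-off monitor sets.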
We developed "Smart Monitors" for this. It's an application that VSA deploys and executes daily. The Smart Monitor for Disk runs hourly for 24 hours. On each run, it gets a list of all volumes (drive letters and mounted volumes) and drops those below specific sizes or with certain labels (like "recovery"). It then calculates a custom threshold for each volume: small volumes might be 12.6%, while huge volumes might be 0.35%. These calculations can be overridden where necessary (of 3,100 agents, we have 4 volumes with overrides).
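A size-scaled threshold like that could look something like the sketch below. The inverse-square-root curve and the clamp values are my guess at a formula that produces percentages of the magnitudes mentioned above (higher for small volumes, well under 1% for huge ones), not the actual Smart Monitor math:

```python
import math

def size_scaled_threshold(size_gb: float) -> float:
    """%-free alert threshold that shrinks as the volume grows.
    Curve and 0.25%-25% clamp are illustrative assumptions."""
    pct = 100.0 / math.sqrt(size_gb)
    return round(max(0.25, min(25.0, pct)), 2)

def should_skip(size_gb: float, label: str) -> bool:
    """Drop tiny volumes and recovery/restore partitions before monitoring,
    as the Smart Monitor does. The 10 GB floor matches the OP's question."""
    return size_gb < 10 or "recovery" in label.lower()
```

The key design point is that a fixed percentage makes no sense across volume sizes: 10% free on an 80 TB array is 8 TB of headroom, while 10% on a 60 GB system drive is barely anything.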
The monitor also does not trigger immediately - we call it "Transient Suppression". If the threshold is exceeded by less than 75%, the monitor waits up to 48 hours to see if the condition resolves, triggering the alert if it doesn't. Exceeding the threshold by 75% or more generates an alert immediately.
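That "Transient Suppression" decision could be sketched like this. The state handling (tracking when the threshold was first crossed) and the relative-overshoot interpretation of "exceeded by 75%" are my assumptions about how the described behavior might be wired up:

```python
from datetime import datetime, timedelta

SUPPRESS_WINDOW = timedelta(hours=48)  # per the description above

def evaluate(free_pct, threshold_pct, first_breach, now):
    """Return 'alert', 'suppress', or 'clear' for one hourly check.
    first_breach is when the threshold was first crossed (None if it wasn't)."""
    if free_pct >= threshold_pct:
        return "clear"                      # back above threshold; reset timer
    overshoot = (threshold_pct - free_pct) / threshold_pct
    if overshoot >= 0.75:
        return "alert"                      # badly over: alert immediately
    if first_breach and now - first_breach >= SUPPRESS_WINDOW:
        return "alert"                      # mild breach persisted 48 hours
    return "suppress"                       # transient; wait and see
```

This is what kills the noise from things like a nightly backup briefly filling a drive and then releasing the space.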
Finally, the smart monitor will invoke our Daily Maintenance tool for Disk Cleanup, and tell it to "run aggressively" - instead of removing files 7+ days old, it removes files 1+ day old. It cleans all known temp locations, plus folders that you can specify.
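The aggressive-cleanup behavior (tighten the age cutoff from 7 days to 1 day across known temp locations) might look roughly like this sketch; the path-walking approach and error handling are mine, not the poster's Daily Maintenance tool:

```python
import os
import time

def clean_temp(paths, max_age_days=7, aggressive=False):
    """Delete files older than the cutoff from the given temp locations.
    Normal mode removes files 7+ days old; aggressive mode, 1+ day old."""
    cutoff = time.time() - (1 if aggressive else max_age_days) * 86400
    removed = []
    for root in paths:
        for dirpath, _, files in os.walk(root):
            for name in files:
                fp = os.path.join(dirpath, name)
                try:
                    if os.path.getmtime(fp) < cutoff:
                        os.remove(fp)
                        removed.append(fp)
                except OSError:
                    pass  # file in use or already gone; skip it
    return removed
```

In practice you'd feed this the usual Windows temp locations plus any customer-specific folders, which matches the "plus folders that you can specify" part.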
When the monitor determines that it's running for the first time each day, it calculates each volume's utilization, then adds it to a 30-day record. It projects the current utilization rate out 30 days and triggers a warning event if the monitor projects the threshold will be exceeded within the next 30 days.
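The 30-day projection step could be done with a simple linear trend over the daily samples. The least-squares fit below is my choice for illustration; the actual tool may project differently:

```python
def days_until_full(history, threshold_pct):
    """history: daily used-% samples, oldest first (up to 30 of them).
    Returns projected days until used-% reaches threshold_pct,
    or None if usage is flat or shrinking."""
    n = len(history)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    # Least-squares slope: % used gained per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) \
            / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None
    return (threshold_pct - history[-1]) / slope

def project_warning(history, threshold_pct):
    """True if the threshold is projected to be crossed within 30 days."""
    d = days_until_full(history, threshold_pct)
    return d is not None and d <= 30
```

The value of this is getting a "this drive will be full in three weeks" warning while there's still time to act, rather than an alarm after the fact.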
Regardless of the monitoring platform, disk capacity is difficult to monitor effectively with standard, percentage-based tools; most of these monitors are bound to specific drive letters and don't understand mounted volumes. This approach has eliminated false alerts from disk space monitoring in our environment.
@Glenn, so you use your application/program instead of a Monitor Set to monitor disk space?
Yes, we have no agent-based disk space monitors. All disk space monitoring is done via the Smart Monitor application.
I have tried a bunch of different Kaseya monitors to find a good way to monitor disk space. I even tried their trend monitor, but it failed and stopped monitoring on about 90% of the endpoints we tested it on. I ended up doing something like Glenn (but not nearly as involved) to weed out drives that fill up all the time (backup drives) and to alert at higher percentages for important drives (SQL, Exchange). I'm still testing it, but I did it via PowerShell and custom fields.
To be clear, our solution is more than a script. For disk capacity:
We have other Smart Monitors in the solution stack, including
I've got a few more Smart Monitors simmering on the back burner. Watch for release next year.
OK, so I created a couple of filtering scripts to execute when the Monitor Set fires off a Low Disk Space alert, but in Policy Management > actual policy > Monitor Sets, is there a wildcard for "Machine ID" for the Agent Procedure to run on? I specify my Agent Procedure/script, but all my devices show up under the Machine ID drop-down. I want the script to run on the machine that generated the alarm. Do I leave this field blank, or use an *, etc., as a wildcard?
By leaving it blank, it will run on the machine that generated the alert.
Thanks @Cesar. I thought that was the case, and when the re-arm time elapsed, the script ran and I saw the expected results.
We too built a custom solution. As far as I can tell, low disk alerts only trigger when an audit is run; it's not real-time or even regularly checked. This was a killer for us: if you only audit every few days (as Kaseya suggests as a best practice), you'll rarely pick up a problem in time.
The issues with % free vs. capacity, removable drives, system restore partitions, etc. are well documented. Basically, disk monitoring in Kaseya is poorly thought out, and we simply don't trust it.