We have moved from SaaS to On-Prem and would like to make more use of KNM to monitor hardware. We mainly have Dell physical servers running Windows and Dell physical servers running VMware, plus some HP servers running Windows.
Does anyone have good guides or advice on best practices?
Before doing network scans on all of your equipment, go through and set up your own monitor templates. Discover one system of each infrastructure type (DCs, Hyper-V/VMware hosts, SQL servers...) and build your template around what you want to monitor. Set up a separate AD account for use with KNM monitoring.
We use KNM extensively for monitoring server availability and performance.
Operational Availability - these tell us if a server is operational, or if network services are operating properly (not just "service up", but "valid response to query"). Our generic "Server Operational" alerts after 15 minutes - short enough to be effective without alerting on a basic reboot. Other operational availability alerts trigger after 20-30 minutes of sustained failure to minimize false alerts.
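The "alert only after sustained failure" idea can be sketched outside of KNM as follows. This is only an illustration of the logic, not how KNM implements it; the class name, the `hold` parameter, and the 5-minute polling interval in the usage note are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


class SustainedFailureAlert:
    """Fire an alert only after a check has failed continuously for `hold`."""

    def __init__(self, hold: timedelta) -> None:
        self.hold = hold                              # e.g. 15 minutes
        self.failing_since: Optional[datetime] = None  # start of current outage
        self.alerted = False                          # one alert per outage

    def record(self, ok: bool, now: datetime) -> bool:
        """Feed one poll result; return True when an alert should fire."""
        if ok:
            # Any success resets the timer, so a short outage such as a
            # reboot never reaches the hold time.
            self.failing_since = None
            self.alerted = False
            return False
        if self.failing_since is None:
            self.failing_since = now
        if not self.alerted and now - self.failing_since >= self.hold:
            self.alerted = True
            return True
        return False
```

With a 15-minute hold and a poll every 5 minutes, two failed polls followed by a success (a reboot) stay silent, while an outage that is still failing 15 minutes after its first failed poll fires exactly once.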
Performance Monitors - we create multiple tiers of performance monitors in templates, so that we can apply monitors based on the capability of the platform and the optimization performed, if any. Performance alerts trigger only after 30 minutes of sustained alarm condition. We also drive these alerts through Service Desks to allow "monitor only (never alert)", "alert only when help-desk is staffed", "alert between specific hours", or "alert always". We never enable performance alerts without some level of system optimization. We also restrict performance monitors to workday operating hours - we expect systems to be driven hard after hours for backups and other time-critical procedures.
The time restrictions for certain alert types are built into our Service Desk component - see the Automation Exchange for more information. We also have a "Multi-Tool" with several time calculation functions if you want to "roll your own" service desk logic.
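For anyone rolling their own, the four alert modes boil down to a small time-window check. The sketch below is not the Service Desk component or the Multi-Tool; the policy names, the Monday-Friday 08:00-18:00 staffing schedule, and the function names are hypothetical and only show the shape of the logic.

```python
from datetime import datetime, time
from enum import Enum
from typing import Optional, Tuple


class Policy(Enum):
    MONITOR_ONLY = "monitor only"      # never alert
    STAFFED_HOURS = "help desk staffed"
    SPECIFIC_HOURS = "specific hours"
    ALWAYS = "always"


# Hypothetical schedule: help desk staffed Monday-Friday, 08:00-18:00.
STAFFED_DAYS = range(0, 5)             # Monday=0 .. Friday=4
STAFFED_START, STAFFED_END = time(8, 0), time(18, 0)


def in_window(t: time, start: time, end: time) -> bool:
    """True if t is in [start, end); also handles windows crossing midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_alert(policy: Policy, now: datetime,
                 window: Optional[Tuple[time, time]] = None) -> bool:
    """Decide whether an alarm raised at `now` should become an alert."""
    if policy is Policy.MONITOR_ONLY:
        return False
    if policy is Policy.ALWAYS:
        return True
    if policy is Policy.STAFFED_HOURS:
        return (now.weekday() in STAFFED_DAYS
                and in_window(now.time(), STAFFED_START, STAFFED_END))
    # Policy.SPECIFIC_HOURS needs an explicit (start, end) window.
    if window is None:
        raise ValueError("SPECIFIC_HOURS requires a (start, end) window")
    return in_window(now.time(), *window)
```

For example, `should_alert(Policy.SPECIFIC_HOURS, now, (time(22, 0), time(6, 0)))` covers an overnight window, which is why `in_window` treats a start later than the end as crossing midnight.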
We also use a dedicated account for KNM, although we use a local administrator account instead of a domain account. Don't use a Domain Admin account for this!
KNM is tied to Discovery, so your Discovery module needs to be in good shape before you deploy KNM. When we deploy our RMM Suite, we usually delete ALL of the discovered devices and networks, recreate them properly, and name the networks so they can be identified if the gateway itself fails. Recognize also that since you can monitor assets without agents installed, you won't always have a way to associate devices with your PSA. Since we only monitor servers and network assets with KNM, we delete all unwanted (workstation) assets from discovery once we've completed the discovery and installed agents. This keeps the display much more manageable as well.
OK thanks for the info, I'll have a look through each of our customers and see what similarities they have.
Thanks Glenn, I've had a look through a little bit and we have done the discoveries and named the networks accordingly. It doesn't seem that straightforward to monitor a physical disk in an ESX host, for instance. On a physical box with Windows installed the Dell event logs work well, but that obviously isn't an option with ESX if a disk fails. Do you use KNM to monitor ESX at all?
You can use iLO or DRAC monitors. Both of them have something like an "overall system health" value that would trigger on critical alerts.
.1.3.6.1.4.1.674.10892.2.2.1.0 for DRAC
.1.3.6.1.4.1.232.6.1.3.0 for iLO
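Overall-health values like these come back as small integers, so the monitor threshold is really a comparison against an enumeration. Below is a rough sketch of how the values could be collapsed into a severity. The enumerations are assumptions based on the usual Dell and HP status conventions and should be checked against the vendor MIB files; the function and dictionary names are made up for the example.

```python
# Assumed status enumerations; verify against the vendor MIB files
# before building monitor thresholds on them.
DRAC_STATUS = {
    1: "other", 2: "unknown", 3: "ok",
    4: "non-critical", 5: "critical", 6: "non-recoverable",
}
ILO_STATUS = {1: "other", 2: "ok", 3: "degraded", 4: "failed"}

# Collapse each vendor's states into a common severity.
_SEVERITY = {
    "drac": {3: "ok", 4: "warning", 5: "critical", 6: "critical"},
    "ilo": {2: "ok", 3: "warning", 4: "critical"},
}


def severity(vendor: str, value: int) -> str:
    """Map a raw status integer to ok / warning / critical / unknown."""
    table = _SEVERITY.get(vendor.lower())
    if table is None:
        raise ValueError(f"unsupported vendor: {vendor!r}")
    # Anything not explicitly mapped (e.g. "other") is treated as unknown.
    return table.get(value, "unknown")
```

Under these assumptions, a failed disk reported through DRAC would surface as `severity("drac", 5)` returning "critical", while `severity("ilo", 3)` returns "warning" for a degraded system.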