We recently deployed KNM to monitor our ESX hosts, most of which are Dell PowerEdge servers. We're using the built-in wbem_esxi_health_.lua script, which is supposed to monitor RAID status, among other things.
As a test, we pulled a power supply from our test server and, as expected, received an error. When we plugged the PSU back in, the error cleared. Then we pulled a drive from the server. Both the server and ESX flagged the failure, but KNM did not. We contacted Kaseya support and were told they had confirmed this as a bug, although they acted as though they were only just discovering it.
KNM has been out for a while. I can't possibly be the first guy who's had a hard drive fail. Have others experienced this? Can you provide a workaround?
For Dell and IBM servers, use SNMP against the DRAC and the IMM respectively. There are MIBs/OIDs for 'OverallSystemHealth', and they work a treat.
Can you give some more details on this?
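A minimal Python sketch of the Dell side of this, under several assumptions: the Net-SNMP snmpget tool is installed, SNMP v2c is enabled on the iDRAC with a read-only community, and the objects to poll are the iDRAC MIB's globalSystemStatus (1.3.6.1.4.1.674.10892.5.2.1.0) and globalStorageStatus (1.3.6.1.4.1.674.10892.5.2.3.0). Check those OIDs against the MIB for your iDRAC generation; the IBM IMM uses a different MIB that is not covered here. The values follow Dell's DellStatus enumeration, where 3 means ok.

```python
import subprocess
from typing import Tuple

# Assumed OIDs from Dell's iDRAC MIB; verify them against the MIB
# for your iDRAC generation before relying on them.
DELL_OIDS = {
    "globalSystemStatus": "1.3.6.1.4.1.674.10892.5.2.1.0",
    "globalStorageStatus": "1.3.6.1.4.1.674.10892.5.2.3.0",
}

# DellStatus enumeration used by both objects.
DELL_STATUS = {
    1: "other",
    2: "unknown",
    3: "ok",
    4: "nonCritical",
    5: "critical",
    6: "nonRecoverable",
}


def classify(raw_value: str) -> Tuple[str, bool]:
    """Turn a raw snmpget value into (status name, needs_alert).

    Anything other than ok (3) is reported as needing an alert.
    """
    try:
        code = int(raw_value.strip())
    except ValueError:
        # Treat an unparseable answer as a problem, never as healthy.
        return ("unparseable", True)
    return (DELL_STATUS.get(code, "unknown"), code != 3)


def poll(host: str, community: str, oid: str, timeout: int = 10) -> str:
    """Query one OID with Net-SNMP's snmpget and return the bare value."""
    # -Oqev: quick output, value only, enums printed as numbers
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqev", host, oid],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

For example, classify(poll("idrac01.example.local", "public", DELL_OIDS["globalStorageStatus"])) returns a pair such as ("ok", False); the hostname and community string are placeholders. Polling the storage status as well as the overall status is what would catch a pulled RAID member, if your iDRAC firmware reports storage at all.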
This is a problem common to all third-party monitoring tools, because ESXi natively supports monitoring only a VERY small number of RAID cards.
Most of the time you need to load a manufacturer-specific CIM driver to retrieve RAID status. Hint: if you can't see the RAID status in the vSphere console under Health Status, KNM won't see anything to monitor either.
Most decent RAID vendors make a CIM provider for their cards and your version of ESXi; here's the link for Adaptec's: www.adaptec.com/.../cim_vmware_v7_31_18856_zip.php
Once your CIM driver is up and running and you can see the RAID status in the vSphere Health Status section, THEN you can build an SNMP lookup for it within Kaseya, or write a Lua script to monitor it.
Some good information there, Craig. I'll keep it in mind for the ESXi monitoring I'll be doing soon.
I install ESXi hosts using the server manufacturer's custom ISO, so I get the correct CIM providers and drivers.
Most of the custom ISOs can be found on my.vmware.com, but some, e.g. Dell's, are not there; Dell's ISOs have to be downloaded from their support page.
You can also add the manufacturers' VMware software depots to VUM (vSphere Update Manager), so you can install the latest versions through it.
Can you give me an example of what you've used for Dell? I worked with a Kaseya tech trying to set up this very thing, and we couldn't get it working.
I use the Dell ISO. Even with the vanilla ISO, I can see RAID status in the vSphere console. As a test, an nAble tech had me browse the MOB (not sure what that is exactly, but it's a web interface). Nothing was reporting there with either ISO.
We tried SNMP monitoring on the iDRAC, but were unsuccessful.
We've started using the iDRAC remote syslog feature to post status warnings/errors to KNM. Works like a champ. You can use error FAN1000 in iDRAC to test.
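KNM does the receiving itself, but to see what those messages contain, here is a minimal Python sketch of a syslog receiver. Decoding the leading <PRI> field follows the syslog RFCs (PRI = facility * 8 + severity). The assumption that Dell message IDs look like FAN1000 (three or four capitals plus four digits) and appear in the message text is only an assumption, so check it against what your iDRAC actually sends.

```python
import re
import socket

# Syslog severities 0-7 in RFC order.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>")
# Assumption: Dell message IDs are 3-4 capital letters followed by 4 digits.
MSG_ID_RE = re.compile(r"\b([A-Z]{3,4}\d{4})\b")


def parse_syslog(line: str) -> dict:
    """Extract facility, severity and an optional Dell message ID from one line."""
    m = PRI_RE.match(line)
    if m is None:
        return {"facility": None, "severity": None, "msg_id": None, "text": line}
    pri = int(m.group(1))
    id_match = MSG_ID_RE.search(line, m.end())
    return {
        "facility": pri >> 3,            # PRI = facility * 8 + severity
        "severity": SEVERITIES[pri & 7],
        "msg_id": id_match.group(1) if id_match else None,
        "text": line[m.end():].strip(),
    }


def is_alert(entry: dict) -> bool:
    """Alert on warning or worse (severities 0-4)."""
    sev = entry["severity"]
    return sev is not None and SEVERITIES.index(sev) <= 4


def listen(port: int = 514, bind: str = "0.0.0.0") -> None:
    """Print alert-worthy messages from a UDP syslog sender such as an iDRAC."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind, port))
        while True:
            data, (addr, _) = sock.recvfrom(8192)
            entry = parse_syslog(data.decode("utf-8", errors="replace"))
            if is_alert(entry):
                print(f"{addr}: {entry['severity']} {entry['msg_id']} {entry['text']}")
```

To try it, run listen(port=5514) on a test machine (ports below 1024 usually need root), point the iDRAC remote syslog setting at that machine (using 514 if your iDRAC doesn't let you change the port), and trigger the test event.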
The iDRAC solution is a good option, although iDRAC only started monitoring storage in later versions (I think after version 6).
The alternative is to install Dell's VIB on the host (the link below is for vSphere 6).
Once that's done, you can monitor storage (and pretty much any other sensor) using either a Lua script or the CIM Indicator (class OMC_DiscreteSensor).
This works regardless of the iDRAC version you're using.
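A minimal Python sketch of the CIM route, using the third-party pywbem library rather than KNM's Lua API. It assumes CIM is enabled on the host, the provider exposes OMC_DiscreteSensor in the root/cimv2 namespace over HTTPS on port 5989, and the sensors populate the standard CIM HealthState property (5 = OK). Vendor providers may use other namespaces or leave HealthState unset, so treat these as assumptions to verify.

```python
from typing import Any, Iterable, List, Mapping, Tuple

# HealthState values defined on CIM_ManagedSystemElement in the DMTF CIM schema.
HEALTH_STATE = {
    0: "Unknown",
    5: "OK",
    10: "Degraded/Warning",
    15: "Minor failure",
    20: "Major failure",
    25: "Critical failure",
    30: "Non-recoverable error",
}


def unhealthy(sensors: Iterable[Mapping[str, Any]]) -> List[Tuple[str, str]]:
    """Return (sensor name, health text) for every sensor not reporting OK (5)."""
    bad = []
    for sensor in sensors:
        name = sensor.get("ElementName") or "<unnamed>"
        raw = sensor.get("HealthState")
        if raw is None:
            bad.append((name, "not reported"))
            continue
        state = int(raw)  # pywbem returns CIM integer types; normalise to int
        if state != 5:
            bad.append((name, HEALTH_STATE.get(state, f"unrecognised ({state})")))
    return bad


def fetch_discrete_sensors(host: str, user: str, password: str,
                           namespace: str = "root/cimv2") -> List[dict]:
    """Enumerate OMC_DiscreteSensor over WBEM (requires `pip install pywbem`)."""
    import pywbem  # imported here so unhealthy() works without pywbem installed

    conn = pywbem.WBEMConnection(
        f"https://{host}:5989",
        (user, password),
        default_namespace=namespace,
        no_verification=True,  # ESXi uses a self-signed cert; verify it in production
    )
    instances = conn.EnumerateInstances("OMC_DiscreteSensor")
    # Keep only the properties the health check needs, as plain dicts.
    return [
        {"ElementName": inst.get("ElementName"), "HealthState": inst.get("HealthState")}
        for inst in instances
    ]
```

For example, unhealthy(fetch_discrete_sensors("esxi01.example.local", "root", "secret")) would list any sensor not reporting OK; the host and credentials are placeholders. If your provider leaves healthy sensors at Unknown (0), you may want to exclude that value before alerting.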
All of our servers have the Dell VIBs installed. The problem we're running into is that no one (not even Kaseya) seems able to show us a Lua script, CIM setup, or SNMP setup that works.
We've been working with Michael Duncan at Kaseya and have made some progress, but it's slow going.
Hello Jason,
This script (modified from an original Kaseya script) should give you a Lua monitor for the physical disk drives of a host running VMware.
Make sure CIM is enabled on the host (Configuration tab > Advanced Settings > UserVars.CIMEnabled = 1), and modify the main function with your user name and password.
Please forgive the quality of the script; you can use it as a base and make it better (at least it's a start).
There are a few things I still have to work on:
Returning values that can be "charted", and understanding why the GetAccountUser / Password functions don't work in my environment (they should get the values from the KNM Authentication Settings). For now, the credentials are hardcoded.
Other than that, this gives me an alert every time a disk fails. I tested it by pulling a disk from an array, and I got the alert as expected.
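The two open items can be illustrated without KNM specifics. This rough Python sketch (not the Lua script from this post) reads the credentials from environment variables instead of hardcoding them, and reduces per-disk results to a failed-disk count, which is the kind of number a monitor can chart. ESXI_CIM_USER and ESXI_CIM_PASSWORD are hypothetical variable names; in KNM the equivalent source would be the Authentication Settings mentioned above.

```python
import os
from typing import Dict, Mapping, Optional, Tuple


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Read CIM credentials from the environment instead of hardcoding them.

    ESXI_CIM_USER / ESXI_CIM_PASSWORD are hypothetical names for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["ESXI_CIM_USER"], env["ESXI_CIM_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"credential variable {missing} is not set") from None


def disk_metric(disk_health: Dict[str, bool]) -> Tuple[int, str]:
    """Summarise per-disk health as (failed_count, message).

    The integer is what you would chart; the message is what you would alert on.
    """
    failed = sorted(name for name, ok in disk_health.items() if not ok)
    if not failed:
        return 0, f"all {len(disk_health)} disks OK"
    return len(failed), "failed: " + ", ".join(failed)
```

Charting the count rather than a pass/fail flag also shows when a second disk drops out of an array that is already degraded.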
Hope it helps.