Hi guys and girls, another post similar to my last,
After i wrote that query to pull in events i wondered exactly what i could do with it - turns out it is quite extensible and quite powerful.
I thought to create a "Event ID percentage score" similar to the patch score given in the default Kaseya reports.
If you run the following Query in Microsoft Excel, and create a pivot table for it.... you get an accurate percentage of the EventID's that should consume the most of your time
select b.displayname, a.eventid, a.source, b.groupName, count(a.agentguid) cntfrom ntEventLog20120717 a, machnametab bwhere a.agentguid = b.agentguidand a.eventid not like '25'and a.eventid not like '26'and a.eventid not like '30'and a.eventid not like '31'and a.eventid not like '32'and a.eventid not like '34'and a.eventid not like '35'and a.eventid not like '36'and a.eventid not like '37'and a.eventid not like '45'and a.eventid not like '47'and a.eventid not like '50'and a.eventid not like '53'and a.eventid not like '54'and a.eventid not like '55'and a.eventid not like '59'and a.eventid not like '60'and a.eventid not like '61'and a.eventid not like '62'and a.eventid not like '63'and a.eventid not like '70'and a.eventid not like '77'and a.eventid not like '100'and a.eventid not like '101'and a.eventid not like '102'and a.eventid not like '103'and a.eventid not like '107'and a.eventid not like '108'and a.eventid not like '109'and a.eventid not like '114'and a.eventid not like '118'and a.eventid not like '120'and a.eventid not like '124'and a.eventid not like '126'and a.eventid not like '128'and a.eventid not like '130'and a.eventid not like '134'and a.eventid not like '135'and a.eventid not like '141'and a.eventid not like '142'and a.eventid not like '145'and a.eventid not like '148'and a.eventid not like '152''group by b.groupName, b.displayname, a.eventid, a.sourceorder by cnt desc
From there, you take the sum of the top 5 occuring event ID's filtered by the site you want to check, and divide it by the grand total. This site for example, i'd say was "90.00%" healthy.
Of course, health is a made up figure, but i can justify it easy enough to my customer.
Out of 20 servers this customer had 5 events occuring on 2 servers that made up 90% of their errors. Thats justification enough to me.
Of course, with that query, there are 2 things. 1. Change the date each day2. The "a.EventID not like '##' - the actual query i use has 1000 or so event ID's to ignore - remember that the table being queried is events collected not alarms, so the ignore sets don't apply.
Next up, Performance Monitoring, this one i'm pretty proud of. I've yet to come across a good way of reporting performance.
SELECT b.displayName, a.monitorCounterId, a.counterValue, b.groupName, COUNT(a.agentGuid) AS cnt, monitorCounter.nameFROM monitorCounterLog20120717 AS a INNER JOIN machNameTab AS b ON a.agentGuid = b.agentGuid INNER JOIN monitorCounter ON a.monitorCounterId = monitorCounter.monitorCounterIdWHERE monitorCounter.name like 'Available MBytes < 256' and counterValue < 256 and counterValue not like '-999' and counterValue not like '-998' or monitorCounter.Name like 'Page Reads/sec' and counterValue > 90 or monitorCounter.name like 'Page Writes/sec' and counterValue > 90 or monitorCounter.name like 'Pages/sec' and counterValue > 150 or monitorCounter.name like '% Committed Bytes In Use' and counterValue > 85GROUP BY b.groupName, b.displayName, a.monitorCounterId, a.counterValue, monitorCounter.nameORDER BY cnt DESC
A few things to note about this
1. The counter value is the greater or less than value it takes for our performance monitoring to exceed a threshold2. In the Datadump / work sheet in excel, you need to create a "Percentage" column. This is, the count of the times the thresholds were exceeded divided by the number of times you collected that data.2A. We collect each performance metric 96 times a day, or even 15 minutes. If the threshold was exceeded 9.6 times, then i can safely say the server was busy 10% of the day
3. When you pivot from that table on the fields in the above table - you will see which servers were really busy for that period of time.
I know it's a bit confusing to wrap your head around - it took me ages to figure all this out, but it was a worth while process.
Any queries or additions? Let me know if i have a flaw in my logic or i missed anything :)
well that didn't work - for those interested - right click and save the pictures as you need to view them