Happy New Year!
I was faced with a need to identify any machines that had non-responsive C:\ drive counters, i.e. those machines which had a counter status of "Not Responding" or "No Current Data." Identifying these machines in SQL is easy enough (I believe there are queries already on the forums to get a list of machines with non-responsive counters); however, I needed a way to flag these machines in Kaseya so that I could throw the kitchen sink at them script-wise (ensure that Performance Logs & Alerts is Automatic and Started and that LogicalDisk performance counters aren't explicitly disabled in the Registry).
So, I made the query you see below. A few must read caveats:
1) You need to be on-prem with direct database access in order to run this query
2) You'll need to create a Machine Custom Field (I named mine Unique_Snowflake* but you can do whatever you'd like--just update the variable in the SQL code) that will contain either True or False--True if the machine has a non-responsive C:\ drive counter and False if it's fine:
3) (optional) You'll need to pre-populate the Machine Custom Field you just created with False
4) You'll need to create a view for machines where the Unique_Snowflake* (or whatever) MCF is True -and- False to help identify machines that are borked and OK, respectively:
5) Your Monitor Set that monitors the C:\ drive must have the C:\ drive explicitly mentioned as the monitored object, and you must know the Counter Name of that object:
6) This script uses today's monitorCounterLogYYYYMMDD table, where YYYYMMDD are calculated when the script is run (hence the on-the-fly query creation). Make sure your Disk monitor set has collected data (for example, if you're polling for disk free space on an hourly basis, the monitorCounterLogYYYYMMDD table might not have data until at least 1 AM for all Agents).
7) USE THIS SCRIPT AT YOUR OWN RISK. This script directly modifies your Kaseya database by updating the auditRsltManualFieldValues for the MCF that you specify. Kaseya's database schema may change. This has been tested on R8 8.0.0.12, but it's possible that these tables' names will change and that this query might break. This might not be the best or most efficient SQL query, so please go easy on me
<Beginning of SQL>
-- Declare the user-defined variables
DECLARE @fieldName NVARCHAR(100)
DECLARE @monitorSetName NVARCHAR(100)
DECLARE @driveLetter NVARCHAR(1)
-- Set the user-defined variables
SET @fieldName = 'Unique_Snowflake*' -- The name of the MCF that acts as a "flag" set to True on problem machines
SET @monitorSetName = '_AM - DISK (Disk Free Space - Servers)*' -- The name of the Monitor Set that monitors the primary drive
SET @driveLetter = 'C' -- The letter of the primary hard drive that the Monitor Set references via Counter Name
-- Validate that the MCF exists
DECLARE @fieldNameFK NVARCHAR(100)
SET @fieldNameFK = ISNULL((SELECT id FROM auditRsltManualFields WHERE fieldName = @fieldName),'Does Not Exist')
IF @fieldName != 'Does Not Exist'
BEGIN
-- Nobody is a unique snowflake yet... set any existing defined values for the "flag" MCF to False
UPDATE auditRsltManualFieldValues
SET fieldValue = 'False'
WHERE fieldNameFK = @fieldNameFK
-- Define the tableName based on today's date
DECLARE @monitorCounterLogTableName VARCHAR(100)
DECLARE @currentYear NVARCHAR(4)
SET @currentYear = DATENAME(YEAR, GETDATE()) -- Returns 4 digits
DECLARE @currentMonth NVARCHAR(2)
SET @currentMonth = CAST(RIGHT('0' + RTRIM(DATEPART(MONTH, GETDATE())),2) AS NVARCHAR(2)) -- Returns 2 digits
DECLARE @currentDay NVARCHAR(2)
SET @currentDay = CAST(RIGHT('0' + RTRIM(DATENAME(DAY, GETDATE())),2) AS NVARCHAR(2)) -- Returns 2 digits
SET @monitorCounterLogTableName =
(SELECT 'monitorCounterLog'
+ @currentYear
+ @currentMonth
+ @currentDay)
SELECT @monitorCounterLogTableName AS 'Using table:'
-- From here on out, we have to use EXEC because the table names are themselves variables
DECLARE @sqlCommand NVARCHAR(MAX)
SET @sqlCommand = 'IF OBJECT_ID(''' + @monitorCounterLogTableName + ''') IS NOT NULL
DECLARE @uniqueSnowflakeAgentGUID NVARCHAR(100)
DECLARE uniqueSnowflakeCursor CURSOR LOCAL FOR
SELECT DISTINCT m.agentGuid FROM ' + @monitorCounterLogTableName + ' AS l
JOIN machNameTab AS m ON l.agentGuid = m.agentGuid
JOIN monitorCounter AS c ON l.monitorCounterId = c.monitorCounterId
JOIN monitorSet AS s ON c.monitorSetId = s.monitorSetId
WHERE c.name = ''' + @driveLetter + '''
AND l.counterValue < 0
AND s.name = ''' + @monitorSetName + '''
OPEN uniqueSnowflakeCursor
FETCH NEXT FROM uniqueSnowflakeCursor INTO @uniqueSnowflakeAgentGUID
WHILE @@FETCH_STATUS = 0
IF EXISTS (SELECT agentGuid FROM auditRsltManualFieldValues
WHERE agentGuid = @uniqueSnowflakeAgentGUID
AND fieldNameFK = ' + @fieldNameFK + ')
SELECT @uniqueSnowflakeAgentGUID AS ''MCF being updated for agentGuid:''
SET fieldValue = ''True''
AND fieldNameFK = ' + @fieldNameFK + '
END
ELSE
SELECT @uniqueSnowflakeAgentGUID AS ''MCF being added for agentGuid:''
INSERT INTO auditRsltManualFieldValues VALUES(@uniqueSnowflakeAgentGUID, ' + @fieldNameFK + ', ''True'')
CLOSE uniqueSnowflakeCursor
DEALLOCATE uniqueSnowflakeCursor
SELECT ''Unable to proceed... the ' + @monitorCounterLogTableName + ' table does not exist'' AS ''Status:''
END'
SELECT @sqlCommand AS 'Executing the following SQL query:'
EXEC (@sqlCommand)
<End of SQL>
I hope at least someone else out there has needed a SQL query like this
I've been looking into this lately and have made some headway. I've been stalled out as we are migrating to a whole new environment for R8 and have to build a lot from scratch.
I look forward to testing this in our new R8 environment.
As an additional, I've spent a lot of time with Kaseya working on fixing this issue when we find this. We had over 30 that were not reporting when we did a manual audit. The general KB response they give helped us get down to 20. After stepping through the issues and deconstructing how each part worked, in about a week we got down to one server that was not reporting and that was a WIN2K that was missing the counter completely.
I'll have to review our doco if there is anyone who is stuck getting a server to respond. Reloading the client's OS was NOT a valid option for us.
Great work Brian!
Matt, could you post the link to that KB? Thanks!
Awesome work-around! Care to share the kitchen sink script? :) I know we can set folder permissions, diskperf -YV, logman query, perhaps even logman start based on the ctr #s, but it seems the ultimate solution is to remove the monitor set and re-apply it. Not sure that can, or should, be automated. :/
As always, Brian, great stuff! Thanks! (Too bad I found this post right after I finished manually sweeping through 675 servers to find disk monitoring failures). I have a few things to add that might help:
1. Here's an article that helped fixed disk monitoring issues on some servers for us:
social.technet.microsoft.com/.../19374.windows-performance-monitor-unable-to-add-these-counters.aspx
"lodctr /e:PerfDisk" is the shell command for enabling PerfMon counters for disk monitoring -- not "lodctr /e:Performance Counters" as the article suggests. Running only the command "LODCTR /R" fixed disk monitoring on a Server 2008 R2 server. I put these commands in ExecuteShellCommand steps, run as SYSTEM, in 64-bit command shell on 64-bit OSes, and did not need to mess with changing/specifying the directory.
2. Something I learned with Kaseya Support's assistance is that some monitor sets just weren't working on some servers, even though they work fine on others, and even after clearing/reassigning them. (I think there's a corrupt file or registry entry somewhere that Kaseya can't overwrite, but we didn't spend the time to figure this out). We ended up creating a new monitor set with the exact same settings as the one that we couldn't get working, assigned it to the server and then it works properly.
3. If you contact Kaseya Support and ask for them for the "Deploy lua scripts file" agent procedure (and the .LUA/.DLL files that it go along with it), they should provide it to you. This procedure will reprocess all monitor sets that are assigned to an agent, which is essentially the same as clearing/reassigning them.
Does anyone have the information about the SQL query used to just display agents that have issues with counters? I'd like to just view this information in SQL management studio to get an idea of what I'm dealing with before making any actual changes to my database.
Hey Eric Nemchik
I cannot comment directly about Brian's query which checks for non-responding C Drives.
However, there was a previous thread which discussed this a bit about various queries to find this data:
http://community.kaseya.com/xsp/f/29/p/17912/89549.aspx#89549
Many users each have a unique query tailored for their environment, here is one I frequently use which may help you get what you want:
With that said, for the topic of this thread, I am aware of a recently identified issue with monitor sets still applying to an endpoint after removing them from the UI, that have been applied via Policy Management.
This may interfere with the results you are seeing when you run this query or others.
Hello this query will identify machines that are online and have counter assigned to them but are not responding (they have some sort of issue)
helpdesk.kaseya.com/.../105643816-How-can-I-identify-unresponsive-Perfmon-counters-in-Classic-Monitoring-