I wanted to review the monitor log to check CPU history for the last 24 hours, but the monitor had stopped responding at 3 AM and I had no idea. Is there any way we can get alerted when a monitor set stops responding?
Ahh, the old "who is monitoring the monitor" problem. Hi richies, I have been chasing this down for months now, and our good friends at Kaseya have finally helped me out. I will come back to you later, but I have a SQL query you can run that helps the cause in the meantime.
From within Kaseya itself there is no way to know that a monitor set has stopped responding.
How does this work?
I would be appreciative of this SQL query also. This is one of those things that has always bothered me about the monitoring set implementations.
mmartin...thanks in advance. Need this Query / details ASAP.
THIS IS PROVIDED AS IS - NO SUPPORT, IT HAS BEEN RUN ON OUR SERVER AND HAS WORKED FINE.
Note: a counter whose value is 0 will show up on this list; you can exclude these if you are good with SQL. We had a few false positives, but we also had some servers with genuine problems, so it was worth it.
Here is a SQL tool that searches your system for all POTENTIAL candidates that have non-responding counters, along with their operating system.
This query excludes OFFLINE and suspended agents (you can comment or uncomment the query criteria to suit your needs).
If you find an agent that contains non-responding counters, please use the following steps to correct it:
(1) If it is a 2000 or 2003 machine, make sure the LogicalDisk counters are enabled on it. Running "diskperf -YV" at a command prompt on the machine enables them.
(2) Check the perfmon counters created by Kaseya. Because there were earlier deployment errors and many counters are assigned to each machine, stop and delete all counters created by Kaseya on the AGENT MACHINE.
(3) Run "Update List By Scan" ONCE if the timestamp of the last run is more than 1 month old (please don't schedule it), then wait patiently: some 2008 machines hold a large amount of data and the scan takes a long time to finish.
When done, check whether the counters are returned properly with this query:
Select Counterobject where agentguid = [$Agentguid] -> the actual agentguid for the troubled agent
This should return all the counter objects for that agent.
If the result is empty, the scan failed and you need to look at the scan engine.
(4) Un-assign and re-assign the monitor sets that contain the troubled counters, then wait for the results to come back.
SELECT distinct MNT.displayName, UII.OsType OS, MNT.agentGuidStr FROM
(
SELECT MS.Name MonitorsetName, mds.agentguid, MDS.Monitorsetid,monitorcounterid, mc.name, ISNULL(mc.description, '') as description, collectionthreshold,
collectionoperatorid, thresholdamount, counterobject, ISNULL(CI.counterinstance, '') AS CounterInstance,
thresholdwarning, thresholdoperatorid, ISNULL(countersampleinterval, 60) as sampleinterval,
ISNULL(cop.name, '') as coloper, ISNULL(aop.name, '') as alarmoper, ISNULL(ctr.counter, '') as counter,
ISNULL(ctr.description, '') as ctrdescription, ISNULL(mc.allConfigId, 0) as allConfigId
FROM MonitorDeploymentDetail MDD WITH(NOLOCK)
JOIN monitorDeploymentSummary MDS WITH (NOLOCK) ON MDD.Agentguid = MDS.Agentguid AND MDD.MonitorDeploymentID = MDS.MonitorDeploymentID and MDD.MonitorsetID = MDS.MonitorsetID and MDS.Latest = 1 -- and MDS.MonitorsetID = monSetId and MDS.agentguid = acctGuid
JOIN monitorcounter mc WITH(NOLOCK) ON MDD.MonitorsetID = MC.MonitorsetID AND MDD.monitorCSPId = mc.monitorCounterId and MDD.MonitorType = 0
AND MDD.MonitorSetID = mc.monitorSetId
JOIN Monitorset MS ON mc.monitorSetId = MS.monitorSetId
JOIN monitorsetmachinexref mx ON mc.monitorsetid = mx.monitorsetid and mx.monitorSetId = MDS.MonitorSetID
JOIN monitormachineparam mp ON mx.monitormachineparamid = mp.monitormachineparamid AND mp.agentGuid = MDS.AgentGuid
LEFT OUTER JOIN counterobjectList co ON mc.counterobjectid = co.counterobjectid
LEFT OUTER JOIN counterinstanceList ci ON mc.counterinstanceid = ci.counterinstanceid
LEFT OUTER JOIN monitoroperator cop ON mc.collectionoperatorid = cop.monitoroperatorid
LEFT OUTER JOIN monitoroperator aop ON mc.thresholdoperatorid = aop.monitoroperatorid
LEFT OUTER JOIN counterList ctr ON mc.counterid = ctr.counterid
) AS AA
-- comment to include suspended agent
join Users U on AA.agentguid = U.agentguid and (U.SuspendAgent is null or U.suspendAgent = 0)
join UserIPInfo UII on U.agentguid = UII.agentguid
JOIN monitorCounterLogSummary MCLS on AA.agentguid = MCLS.Agentguid and AA.monitorCounterId = MCLS.monitorCounterId
and ( MCLS.counterValue IN (-998, -999) or MCLS.eventDateTime < GETUTCDATE() -1)
-- comment out the next line to include offline agents (as written, only currently online agents are checked)
JOIN agentState ON AA.AgentGuid = agentState.AgentGuid and agentState.online = 1
JOIN machNameTab MNT ON AA.AgentGuid = MNT.agentGuid
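
To see which individual counters caused an agent to be listed, a simpler variant can return the raw rows. This is only a sketch and has not been run against a Kaseya server. It reuses table and column names that appear in the query above (monitorCounterLogSummary, machNameTab, counterValue, eventDateTime), and it assumes that -998 and -999 are the "no response" marker values and that eventDateTime is stored in UTC, as the original filter implies. The commented-out line is one possible way to drop the zero-value false positives mentioned earlier.

```sql
-- Sketch only: one row per counter that looks non-responding, per agent.
-- Names are taken from the query above; verify them against your own schema.
SELECT MNT.displayName,
       MCLS.monitorCounterId,
       MCLS.counterValue,
       MCLS.eventDateTime
FROM monitorCounterLogSummary MCLS WITH (NOLOCK)
JOIN machNameTab MNT ON MCLS.agentGuid = MNT.agentGuid
WHERE ( MCLS.counterValue IN (-998, -999)                          -- assumed "no response" markers
        OR MCLS.eventDateTime < DATEADD(hour, -24, GETUTCDATE()) ) -- no sample in the last 24 hours
  -- AND MCLS.counterValue <> 0   -- assumption: uncomment to skip counters whose last value is 0
ORDER BY MNT.displayName, MCLS.eventDateTime;
```

Unlike the full query, this variant does not filter out suspended or offline agents or restrict itself to currently deployed monitor sets, so expect it to return more rows.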