Kaseya Community

ALERT when monitor not responding

This question is not answered

I wanted to review the monitor log to check CPU history for the last 24 hours, but the monitor had stopped responding at 3 AM. I had no idea it had stopped. Is there any way to get alerted when a monitor set stops responding?

All Replies
  • Ahh, the old "who is monitoring the monitor" problem. Hi richies, I have been chasing this down for months, and our good friends at Kaseya have finally helped me out. I will come back to you later, but I have a SQL query you can run that helps the cause anyway.

    From within Kaseya directly, there is no way to know that a monitor set has stopped responding.

  • How does this work?

  • I would be appreciative of this SQL query also. This is one of those things that has always bothered me about the monitoring set implementations.

  • mmartin, thanks in advance. I need this query and the details ASAP.

  • THIS IS PROVIDED AS IS, WITH NO SUPPORT. IT HAS BEEN RUN ON OUR SERVER AND HAS WORKED FINE.

    Note: a counter that has a value of 0 will show up on this list. You can exclude these if you are good with SQL. We had a few false positives, but we also had some servers with genuine problems, so it was worth it.

    Here is a SQL query that searches your system for all POTENTIAL candidates that have non-responding counters, along with their operating system.

    This query excludes OFFLINE and suspended agents. (You can comment or uncomment the query criteria to suit your needs.)

    If you find an agent that contains non-responding counters, please use the following steps to correct it:

    (1) If it is a Windows 2000 or 2003 machine, please make sure the LogicalDisk counters are enabled on it. Running "diskperf -YV" at a command prompt on these machines enables them.

    (2) Check the perfmon counters created by Kaseya. Since there were earlier deployment errors and you have a lot of counters assigned to each machine, go ahead and stop and delete all counters created by Kaseya on the AGENT MACHINE.

    (3) Run "Update List By Scan" ONCE if the timestamp from the last run is more than 1 month old. (Please don't schedule it.) Then wait patiently: on some 2008 machines there is a large amount of data and the scan takes a long time to finish.

    When done, check whether the counters are returned properly by this query:

    Select Counterobject where agentguid = [$Agentguid]  -> the actual agentguid for the troubled agent

    This should return all the counter objects for that agent.

    If it is empty, the scan failed and you need to look at the scan engine.

    (4) Un-assign and re-assign the monitor sets that contain the troubled counters and wait for the results to come back.

    ____________________________________________________________________________________________________________

    SELECT distinct MNT.displayName, UII.OsType OS, MNT.agentGuidStr FROM

    (

    SELECT MS.Name MonitorsetName, mds.agentguid, MDS.Monitorsetid,monitorcounterid, mc.name, ISNULL(mc.description, '') as description, collectionthreshold,

                    collectionoperatorid, thresholdamount,  counterobject, ISNULL(CI.counterinstance, '') AS CounterInstance,

                    thresholdwarning, thresholdoperatorid, ISNULL(countersampleinterval, 60) as sampleinterval,

                    ISNULL(cop.name, '') as coloper, ISNULL(aop.name, '') as alarmoper, ISNULL(ctr.counter, '') as counter,

                    ISNULL(ctr.description, '') as ctrdescription, ISNULL(mc.allConfigId, 0) as allConfigId

        FROM MonitorDeploymentDetail MDD WITH(NOLOCK)

        JOIN monitorDeploymentSummary MDS WITH (NOLOCK) ON MDD.Agentguid = MDS.Agentguid AND MDD.MonitorDeploymentID = MDS.MonitorDeploymentID  and MDD.MonitorsetID = MDS.MonitorsetID and MDS.Latest = 1 -- and MDS.MonitorsetID =  monSetId   and MDS.agentguid =  acctGuid

        JOIN monitorcounter mc WITH(NOLOCK) ON MDD.MonitorsetID = MC.MonitorsetID AND MDD.monitorCSPId = mc.monitorCounterId and MDD.MonitorType =   0

        AND MDD.MonitorSetID = mc.monitorSetId

       JOIN Monitorset MS ON mc.monitorSetId = MS.monitorSetId

        JOIN monitorsetmachinexref mx ON mc.monitorsetid = mx.monitorsetid and mx.monitorSetId = MDS.MonitorSetID

                    JOIN monitormachineparam mp ON mx.monitormachineparamid = mp.monitormachineparamid  AND mp.agentGuid = MDS.AgentGuid

                    LEFT OUTER JOIN counterobjectList co ON mc.counterobjectid = co.counterobjectid

                    LEFT OUTER JOIN counterinstanceList ci ON mc.counterinstanceid = ci.counterinstanceid

                    LEFT OUTER JOIN monitoroperator cop ON mc.collectionoperatorid = cop.monitoroperatorid

                    LEFT OUTER JOIN monitoroperator aop ON mc.thresholdoperatorid = aop.monitoroperatorid

                    LEFT OUTER JOIN counterList ctr ON mc.counterid = ctr.counterid

         ) AS AA

         -- Remove the SuspendAgent condition on the next join to include suspended agents

         join Users U on AA.agentguid = U.agentguid and (U.SuspendAgent is null  or U.suspendAgent = 0)

         join UserIPInfo UII on U.agentguid = UII.agentguid

         JOIN monitorCounterLogSummary MCLS on AA.agentguid = MCLS.Agentguid and AA.monitorCounterId = MCLS.monitorCounterId

         and ( MCLS.counterValue IN  (-998, -999) or  MCLS.eventDateTime < GETUTCDATE() -1)

    -- The next join limits results to currently online agents; comment it out to include offline agents

        JOIN agentState  ON AA.AgentGuid = agentState.AgentGuid and agentState.online = 1

         JOIN machNameTab MNT ON AA.AgentGuid = MNT.agentGuid
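
To turn this into the alert the original question asked for, the query result could be checked on a schedule by an external script. The Python below is only a rough sketch of the filter the query applies (a counter value of -998 or -999, or a last sample more than one day old). The row shape, the function names, and the machine names in the example are hypothetical, and fetching the rows from the Kaseya database is left out entirely.

```python
from datetime import datetime, timedelta, timezone

# Sentinel values that the query above treats as "counter not responding".
NOT_RESPONDING_VALUES = {-998, -999}
# The query flags samples older than one day (GETUTCDATE() - 1).
MAX_AGE = timedelta(days=1)


def is_stale(counter_value, event_time, now=None):
    """Return True if a counter summary row matches the query's filter."""
    now = now or datetime.now(timezone.utc)
    return counter_value in NOT_RESPONDING_VALUES or event_time < now - MAX_AGE


def stale_machines(rows, now=None):
    """Return sorted machine names with at least one stale counter.

    rows is an iterable of (display_name, counter_value, event_time)
    tuples; this shape is an assumption made for the example.
    """
    return sorted({name for name, value, ts in rows if is_stale(value, ts, now)})


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    rows = [
        ("srv-a", 12.5, now - timedelta(minutes=5)),  # fresh, responding
        ("srv-b", -999, now - timedelta(minutes=5)),  # sentinel value
        ("srv-c", 40.0, now - timedelta(days=2)),     # last sample too old
    ]
    for machine in stale_machines(rows, now):
        print("ALERT: monitor not responding on", machine)
```

Whatever runs this check would still need its own way to send the alert (email, ticket, and so on), since Kaseya itself does not raise one.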