Kaseya Community

Potentially serious Kaseya NOC team Monitor Set issues (Handy SQL Query inside)

  • Hi Community, (This is a long post but it may be worth your while reading it)

    This post only applies if you are, or have been, leveraging the NOC team for monitoring.

    The Kaseya NOC team applies an Event Log set to your servers. If you ask nicely, they will usually create that event set as a template you can apply to other servers you don't enrol with them.

    We have been using this event set on all our non NOC team servers for a year or so now. 

    The Story

    Recently, I had an issue with my Kaseya Server: I was collecting too many event logs daily, and the retention was too long.

    The Kaseya Support team provided me with a SQL query to check the HIGHEST EVENT LOGS occurring daily.

    This query doesn't necessarily show alarms; it counts every event captured for the day.

    (If you'd like to test this yourself, check the date suffix on the ntEventLog table name in the query to make sure it is correct for your data. See the query below.)

    select b.displayname, a.eventid, a.source, b.groupName, count(a.agentguid) cnt
    from ntEventLog20120702 a, machnametab b
    where a.agentguid = b.agentguid
    group by b.groupName, b.displayname, a.eventid, a.source
    order by cnt desc

    When you pull the results of this query into Excel and pivot them, you get some useful information. Here is the information for one of my customers.

    The Problem

    When I ran this query and saw the results I was instantly concerned. I have NEVER seen this alarm come up on our dashboard, let alone received an email for it.

    So I checked my monitoring to see if I was indeed capturing this particular event as an alarm. The steps to check this are:

    Machine Group: Sample Machine Group
    Machine: Any machine
    VSA Module: Monitor
    Subsection: Agent Monitoring
    Subsection: Event Log Alerts

    Select Log Type: System
    Search for Event Set: high.server.error.events (This is one of the events the Kaseya NOC team creates)
    Edit that event set - Search for event: 2262. You will see that it exists, so I should be receiving alarms for it.

    It turns out that the eventID does exist, and I have it set up to create an alarm and send an email whenever the eventID occurs.

    The Support Case

    The problem is that the alarm never went off, ever... so I emailed support. To their credit, they were right, the response was

    The Outstanding Issue

    What do I do to remediate this? There are potentially 3000+ Event logs incorrectly configured.

    If other people are using a Kaseya NOC team provided Event Log Set, people worldwide are potentially missing quite important alerts.

    I took the event sets we were given by the Kaseya NOC team as gospel, if EventID 2262 was set up incorrectly,

    I am hoping support doesn't ask me to modify the template to include the correct Event Log Type for each Event Log; with 3000+ events, that will take a long time.

    Any thoughts?

    Anyway, does anyone here understand what I just wrote about? Has anyone experienced it? Was there a fix? It is still an open ticket for us; I thought I'd share it with the community to see what you all think.

    Regards, Mark.

  • To be honest, this just proves that the NOC are fallible. Or perhaps event ID 2262 as an error is different to event ID 2262 as a warning? I also use a lot of the NOC sets and have also assumed that they're completely reliable.

    I guess the "fix" would be to add 2262, from whichever source it's occurring, to high.server.warning.events, if you have access to the event sets?

    In my organisation, all the engineers are advised to let us know if there are any critical events missing from our monitor sets when they find them. We don't get much feedback from them, but the procedure is in place internally at my organisation.

    Phil.

  • Mark,

    Good morning. We reviewed the complete post; thanks for the detailed information and data. As you rightly captured, Event IDs 2262 and 2263, which relate to IIS application pool recycling, are currently included in the high server error set, and they will be moved to a warning set in the Standard Library.

    The library is reviewed periodically and updating it is an ongoing process; we do our best to ensure that all the proactive warning and error conditions are captured. To reassure you, we will take up a small internal project to review the monitoring library and its content.

    As you are aware, we are currently managing thousands of servers with hundreds of partners, and our best asset is the high volume of data and feedback. In the event of a critical situation or an event not captured by the current library, we do get an update from our partners, and we ensure that the data sent is reviewed and a valid event is added to the library. Such instances are very rare.

    One other point to add: the monitoring library has a filtered set of events and will not capture every error or warning on a machine. As the event log table is just a raw collection of all the data, there can be cases where some events come up among the top ones but are not in the library, so no alarms are created for them.

    Summarizing

    1) We will take ownership of ensuring that the monitoring library and content are all validated and that the enrolled machines are updated with them.

    2) Currently there is no major concern, as all critical errors and issues are captured as expected.

    3) If we see any change after the mini project, we will ensure that all enrolled machines on your VSA are updated with the content.

    Kaseya IT Services Team

  • Hi all,

    @Phil, of course, I agree 100 percent, any organisation is fallible; even Microsoft Support staff have proven time and time again that they mightn't be experts in their own product. Not a criticism, but with your engineers, is there any way other than logging onto servers that they see the event IDs that aren't in the NOC sets? That query might help out. I mean no offense, this whole thing was a good exercise I guess. My only outstanding concern is that there are more IDs in the event set other than 2262 that are set up incorrectly. Our new business process is that I'll run that query daily and cross reference the top occurring events against what the NOC sets contain; where they don't line up, I'll create a new monitor set and provide it back to the community so everyone is up to date.

    @Kaseya IT Services team, please don't take anything I said as an outright criticism. The task your team carries out is indeed an immense one; I couldn't begin to fathom the man hours your team puts into making it accurate. The service is exemplary, and if only it were a little more affordable I would have kept every single server I monitored enrolled.

    My concern is still outstanding, but we have a process in place to mitigate it. One customer of mine registers around 10 EventID alarms a day. When I ran the query mentioned in the post above, I noticed 50 or so additional Event IDs worth taking action on. So as to help the community, I will keep running this query daily and build my own event set, cross referenced against the NOC team's, and hopefully share it out with the community in good time.

    @Those reading this who aren't either Phil or the Kaseya IT Services team: the NOC team does provide an incredible service and there is nothing they haven't been able to help me with in the past. I implore you not to take this as a direct criticism of them; it was more of a "be careful with your monitoring" bulletin.

    Lesson learnt by me:

    ---------------------------------

    Don't rest on your laurels. As good as the NOC team's event sets are, it is hard to keep up. If you use their Event Sets, supplement them with your own, perhaps using the above query, and cross reference your results against theirs.
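
    The daily cross reference described above can be sketched roughly as follows. This assumes the query results have been exported to CSV with the column names from the query, and that the event IDs covered by your monitor sets have been collected by hand into a Python set. The function name, the sample rows and the source names are all made up for illustration.

```python
import csv
from io import StringIO


def unmonitored_events(rows, monitored_ids, min_count=1):
    """Return (eventid, source, total) for events in no monitor set, busiest first."""
    totals = {}
    for row in rows:
        # Add up the per-machine counts so each event/source pair gets one total.
        key = (int(row["eventid"]), row["source"])
        totals[key] = totals.get(key, 0) + int(row["cnt"])
    missing = [
        (eventid, source, total)
        for (eventid, source), total in totals.items()
        if eventid not in monitored_ids and total >= min_count
    ]
    return sorted(missing, key=lambda item: item[2], reverse=True)


# Hypothetical export of the query results (machine and source names are invented).
SAMPLE_CSV = """displayname,eventid,source,groupName,cnt
srv1,2262,SourceA,custA,40
srv2,2262,SourceA,custA,10
srv1,1001,SourceB,custA,5
"""

rows = list(csv.DictReader(StringIO(SAMPLE_CSV)))
# Assume event 1001 is already covered by a monitor set; 2262 is not.
print(unmonitored_events(rows, monitored_ids={1001}))
```

    Note that this only compares event IDs. It would not catch the 2262 situation, where the ID is in a set but under the wrong event type, so the type would need to be added to both sides of the comparison for that.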

  • It's a bit of a grey area using others' Monitor sets.  Sure, the NOC team should have a pretty reliable set of these, but one man's throwaway is another's critical error.  I think the only way round this is to create your own or talk to the NOC guys and ask them to review what they are using.  I can see they've done that already, which is good news for all.

    Thanks for that bit of SQL, Mark. It's opened my eyes to some errors that are flooding my server.

  • Mark + Kaseya - don't get me wrong either.. These things occur all the time, but without the NOC scripts I would be lost. It is a long, hard and laborious job compiling event log sets and the service provided to create those sets in the first place is invaluable.

    Thanks for the SQL scripting, Mark. I may well look at implementing that too, as we're getting a lot of Kaseya resource errors at present and there must be something annoying at fault. (Hopefully not me! haha.)

    Regards.

    P.

  • @Alistair, I understand. The main reason the NOC sets were used is that they seem to be the most comprehensive Event Sets I've seen so far, coupled with the NOC team having far more access to great server engineers than I could currently dream of (not denigrating my guys, they are great - you know what I mean).  I put a lot of faith in the NOC team to throw away a lot of the noise, and they do. I will definitely be taking a lot of care and monitoring it more actively with that SQL query.

    @Phil, it definitely would be long, hard and laborious without the NOC team doing what they do. I don't think I'd be where I am now with my customers if it weren't for the NOC team.

    That SQL query opened my eyes to the events coming through to our system. I can't take full credit: one of the support guys gave it to me, but it was light on, so I edited it and added a few more columns. It has potential to grow further. I just tested this query with a union - here is something you guys might find more useful (be careful)

    select b.displayname, a.eventid, a.source, b.groupName, count(a.agentguid) cnt
    from ntEventLog20120703 a, machnametab b
    where a.agentguid = b.agentguid
    group by b.groupName, b.displayname, a.eventid, a.source

    union

    select b.displayname, a.eventid, a.source, b.groupName, count(a.agentguid) cnt
    from ntEventLog20120702 a, machnametab b
    where a.agentguid = b.agentguid
    group by b.groupName, b.displayname, a.eventid, a.source

    union

    select b.displayname, a.eventid, a.source, b.groupName, count(a.agentguid) cnt
    from ntEventLog20120701 a, machnametab b
    where a.agentguid = b.agentguid
    group by b.groupName, b.displayname, a.eventid, a.source

    order by cnt desc

    I say be careful with this for a few reasons:

    1. DO NOT be tempted to change your event log collection to 30 days, you will kill your server (personal experience)
    2. Your result set will be quite large when you do a union across these tables over a number of days; if you are querying across a WAN interface, it will take a while.

    If you do run this query with a union, keep each group by as it is, and put the order by at the very end, otherwise it will fail.
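
    One gotcha with the union above: it returns a separate row per day for each event rather than one combined total, and a plain UNION also drops rows that happen to be identical. Here is a minimal sketch of one way round that, in Python, which builds the query text for a run of days so the dates don't need editing by hand. The function names and the UNION ALL / SUM wrapper are illustrative assumptions and not something Kaseya supplied; only the table and column names come from the queries above.

```python
from datetime import date, timedelta


def daily_tables(last_day, days):
    """Per-day table names, newest first, e.g. ntEventLog20120703."""
    return [
        f"ntEventLog{last_day - timedelta(days=offset):%Y%m%d}"
        for offset in range(days)
    ]


def build_top_events_query(tables):
    """Build one query that totals the per-day counts across all given tables."""
    # Table names are generated from dates above, never taken from user input.
    per_day = "\n    UNION ALL\n".join(
        "    SELECT b.displayname, a.eventid, a.source, b.groupName,\n"
        "           COUNT(a.agentguid) AS cnt\n"
        f"    FROM {table} a JOIN machnametab b ON a.agentguid = b.agentguid\n"
        "    GROUP BY b.groupName, b.displayname, a.eventid, a.source"
        for table in tables
    )
    # The outer query sums the daily counts; the single ORDER BY goes at the very end.
    return (
        "SELECT displayname, eventid, source, groupName, SUM(cnt) AS total\n"
        "FROM (\n" + per_day + "\n) AS daily\n"
        "GROUP BY groupName, displayname, eventid, source\n"
        "ORDER BY total DESC"
    )


if __name__ == "__main__":
    # Print the query for the three days ending 3 July 2012, ready to paste into SQL.
    print(build_top_events_query(daily_tables(date(2012, 7, 3), 3)))
```

    The same warnings apply: every extra day adds another table to scan, and the script assumes a table exists for each day in the range.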

    Cheerio all, Mark.  

  • Hi again all,

    I have rewritten the query with a bit more smarts in it. The query with the union would require you to manually change the date.

    It turns out Kaseya provides us with a view that does this automatically.

    PRIOR WARNING - THIS QUERY CAN BE SERVER INTENSE - READ ON - ONLY RUN IF YOU KNOW YOUR DATABASE WELL


    SELECT     groupName, machName, ApplicationName, eventId, username, EventMessage, count(computerName) cnt
    FROM         dbo.vnteventlog
    WHERE eventType not like '16'
    and eventType not like '8'
    and eventType not like '4'
    group by groupName, machName, ApplicationName, eventId, username, EventMessage
    order by cnt desc

    This query is useful as it does the join for you, and it also ignores informational events and success and failure security audits (I find them largely useless for reporting purposes).

    One word of caution is that this joins ALL of your ntEventLogs that are in your database. If you collect event logs for 30 days, this query will take an ungodly amount of time.

    We only collect for 7 days and it takes about 2 minutes to execute.

    The beauty of the query though is that it shows the top occurring event logs for the last 7 days. If these events aren't appearing on your dashboard, you can cross reference them against your monitor sets.

    See this screenshot

    On top of this, Kaseya now does something in 6.2 I didn't know about, which is quite handy. If you cross reference the above against the Live Connect Event Viewer, you can quickly create an IGNORE event set of your very own.

    Live Connect to machine > Event Viewer > Click the button in the screen shot below

    I don't think we will ever miss an event ever again with this query and such a quick, nifty way to deploy event sets!

    Cheerio all, Mark.

  • For what it's worth, here is my solution...

    The inherent problem I found is that you don't know for certain what all the Source Filters are (both currently and what they might be in the future).  The solution is to start by creating an event set to capture *ALL* event log IDs and work your way back.

    Using this I was able to start customising another monitor set by adding ignore filters for events that I know that I don't want to see.

    By using this method I am guaranteed to see *everything* except what I already know to be not worth my attention.
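
    As a rough illustration of the logic (not Kaseya's actual matching rules, which may differ): everything raises an alert unless an ignore filter matches it. The filter fields, the wildcard handling and the sample values below are assumptions for the sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore list, grown over time as known noise is identified.
IGNORE_FILTERS = [
    {"source": "ExampleNoisySource", "eventid": "*"},  # ignore everything from one source
    {"source": "*", "eventid": "1001"},                # ignore one ID from any source
]


def should_alert(event, ignore_filters=IGNORE_FILTERS):
    """Alert on every event unless an ignore filter matches both source and ID."""
    for rule in ignore_filters:
        source_matches = fnmatchcase(event["source"], rule["source"])
        id_matches = fnmatchcase(str(event["eventid"]), rule["eventid"])
        if source_matches and id_matches:
            return False
    return True


if __name__ == "__main__":
    # An event nobody has classified yet still gets through.
    print(should_alert({"source": "SourceA", "eventid": 2262}))
```

    The point of the design is that a new or unexpected source is seen by default, rather than silently dropped because no include filter names it.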

    Cheers

    Josh

  • Thank you Joshua, it was very useful!

    Regards.

    Lorenzo.