I think it is important to note, for anyone reading this thread before deploying the update, that once the update is applied, the 'somewhat usable' Desktop Access function (which uses TightVNC) will no longer work until the 188.8.131.52 - 184.108.40.206 agent is installed on the endpoint. Remote Control, which is the RealVNC client, still works either from the QL or from the Remote Control tab.
Let me clarify some inaccuracies with the data here. Our agent updater is very robust and truly can handle updating 100's or even 1000's of agents. The limit isn't in the agent updater; it's in the practicality of bandwidth requirements on both the kserver side and the client side.
Right now in 6.2, the agent update function will automatically push the agent update to the machines you specify in 1-minute intervals, to all selected agents, whether they are online or offline.
Think of the agent updater as functioning like an Agent Procedure in terms of packets and file delivery. We still break the files into 64k chunks for delivery, and so on. So if the update is 1.5MB and you select 2500 machines, you can see where that consumes a lot of bandwidth on the kserver side. The client side depends entirely on how many machines on one subnet/internet connection are trying to update. This is the MAIN REASON FOR GROUPING UPDATES. To bear this out further, let's say you schedule an agent update that includes 100 machines on one network that are offline at the time you click update. If the next morning all those machines come back online at once and each start requesting, say, a 1.5MB agent update, you'd better hope they have some good bandwidth on their side to support it, and on the kserver side too.
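To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It only uses the figures from the post (64k chunks, a 1.5MB update, 2500 or 100 machines). The function names and the 10 Mbps link speed are my own assumptions for illustration, and it ignores protocol overhead, retries, and whatever else the kserver is doing at the time.

```python
import math

CHUNK_BYTES = 64 * 1024            # 64k chunk size mentioned in the post
UPDATE_BYTES = int(1.5 * 1024**2)  # the 1.5MB example update


def update_cost(update_bytes, machines):
    """Return (chunks per agent, total bytes the kserver must send)."""
    chunks_per_agent = math.ceil(update_bytes / CHUNK_BYTES)
    total_bytes = update_bytes * machines
    return chunks_per_agent, total_bytes


def transfer_seconds(total_bytes, link_mbps):
    """Idealised transfer time over a link of link_mbps megabits per second.

    Assumption: the whole link is available and there is no overhead.
    """
    return total_bytes * 8 / (link_mbps * 1_000_000)


if __name__ == "__main__":
    chunks, total = update_cost(UPDATE_BYTES, 2500)
    print(f"{chunks} chunks per agent, {total / 1024**2:.0f} MB from the kserver")

    # 100 machines behind one site link coming online together
    # (10 Mbps is a hypothetical figure, not from the thread).
    _, site_total = update_cost(UPDATE_BYTES, 100)
    print(f"{transfer_seconds(site_total, 10):.0f} s on a 10 Mbps site link")
```

Plug in your own update size and link speed. The point is only that the total grows linearly with the number of machines selected, which is the argument for grouping.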
So now, take this into 6.3 - I'm going to give you some inside info on what's coming.
We have built the ability in 6.3 for you to schedule the agent updates with your own timing, your own specified intervals, basically the same scheduler you see in Agent Procedures today. Sounds good eh?
Hope this helps.
I assume that agent updates are treated differently than any other kind of update and do not cache on the Patch Management location? I know that KES and BU/DR both use the PM file location so that 100's of install/update requests do not need to bang on the VSA server.
Brendan Cosgrove: Our agent updater is very robust and truly can handle updating 100's or even 1000's of agents, the limit isn't in the agent updater, it's in the practicality of bandwidth requirements both on the kserver side and on the client side. Right now in 6.2 the agent update function will automatically push the agent update to the machines you specify in 1-minute intervals to all selected agents whether they are online or offline. [...] We have built the ability in 6.3 for you to schedule the agent updates with your own timing, your own specified intervals, basically the same scheduler you see in Agent Procedures today. Sounds good eh?
Issue 1: The Scheduler function can't handle scheduling an Agent Update on more than 100-200 machines at a shot.
Solution 1: Specialists have indicated that this is a "known issue" and that it will hopefully be addressed during the "focus on the core" drive currently underway. This is now a "Feature Request" in Kaseya's system (CS091766).
Issue 2: There doesn't seem to be a built-in, automated way in Kaseya to make these updates happen. They're scheduled as one-offs, so each time a framework upgrade comes along (6.1 to 6.2) or this KLC patch makes it into the wild (VSAPatch and VSAPatch01), we must, at present, schedule these updates by hand.
Solution 2: I think you're driving at "this will get better" in 6.3, and that we can schedule this on a recurring basis? I hope?
Issue 3: There is no way to "filter" based on the KLC version in order to determine which agents need to be updated to the latest version inclusive of the KLC patch.
Solution 3: This is now a "Feature Request" in Kaseya's system (CS091766).
Anyway, just thought I'd re-frame this and clear up any misconceptions about what we're all on about.
Brendan, I have to disagree. The scheduler has had issues for a long time, in many different modules. It's been a while since I tried it, but previously, when trying to change the patch install schedules for 1000 agents at a time, the scheduler would fall on its face every time. I worked with support and they cleaned up our database a little, which made a slight improvement, but I still pretty much couldn't make a schedule change for more than 200-300 agents without it consistently timing out and only a portion of the agents getting the new schedule.
As others have said, pushing this update out was nothing short of a nightmare, and I did end up having to do it group by group, which certainly wasn't a great use of my time. I'm not sure what bandwidth requirements have to do with anything when all I'm trying to do is schedule the update. If I did a group of 20, they would be scheduled, all spread out, as expected. If I tried to schedule a group of 500, it would time out after a little bit, and only a portion of those would ever actually be scheduled. This isn't a bandwidth limitation, it's a scheduler problem.
Yeah, what Brian said.
I'm not sure if this is a good contribution or a bad contribution to this discussion, but here's a couple of comments/thoughts:
1) Alistair's workaround has allowed me to successfully update almost all 6500 nodes in my VSA by doing it by viewing agents with Update Required and Online in the last 1 minute at various times over a week.
2) Instead of "by group", perhaps use an alphabetical approach with <All Groups>. In other words, do all of the machines that start with a. Then b. There's only 26 steps involved in that...
3) Has anyone tried suspending (not alarms, actual suspensions) all of the VSA agents during a (for instance) 10 minute planned maintenance period to let the SQL backend do NOTHING except schedule? This might rule out a SQL performance problem that is masking itself as a scheduler functionality problem.
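For what it's worth, the alphabetical idea in 2) can be sketched like this. It is only an illustration of the batching logic, not anything the VSA exposes. The function names, the sample machine names, and the 200-machine cap (borrowed from the 100-200 limit people report in this thread) are assumptions.

```python
from collections import defaultdict


def batches_by_first_letter(machine_names):
    """Group machine names by first letter, case-insensitively.

    Names that do not start with a letter go into a '#' bucket, so in
    practice there can be slightly more than 26 steps.
    """
    groups = defaultdict(list)
    for name in machine_names:
        key = name[:1].lower()
        if not key.isalpha():
            key = "#"
        groups[key].append(name)
    return dict(sorted(groups.items()))


def split_batch(names, max_size=200):
    """Split one letter's batch into pieces of at most max_size machines.

    max_size=200 is a guess based on the limits reported in this thread.
    """
    return [names[i:i + max_size] for i in range(0, len(names), max_size)]


if __name__ == "__main__":
    # Hypothetical machine names, for illustration only.
    machines = ["alpha-pc", "Apple01", "build7", "9lab"]
    for letter, names in batches_by_first_letter(machines).items():
        for piece in split_batch(names):
            print(letter, piece)
```

If one letter holds far more machines than the scheduler copes with, the second helper breaks that letter into smaller pieces to schedule one at a time.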
There does seem to be a problem scheduling ANYTHING on more than X number of machines at once; eventually the New Fancy Scheduler AJAX-y Window basically times out if there are too many endpoints to work on.
With one exception.
What's interesting is that the Agent Procedures scheduler is smart enough to bail out to a "background scheduler" if it detects more than Y number of endpoints selected when you click the Schedule button. Can we get the same smarts plugged into all the other places where New AJAX-y Scheduler is used? I think that'd make everybody much, much happier.
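To make the request concrete, here is a rough sketch of the "bail out to a background scheduler above Y endpoints" behaviour. I don't know how Kaseya actually implements it. The class, the threshold of 200, and the batch size are all hypothetical and only show the shape of the idea: small selections are written immediately, large ones are queued and worked off in small batches so the UI request never has to wait on all of them.

```python
from collections import deque

BACKGROUND_THRESHOLD = 200  # hypothetical value for "Y"


class Scheduler:
    """Toy scheduler that defers large selections to a background queue."""

    def __init__(self, threshold=BACKGROUND_THRESHOLD):
        self.threshold = threshold
        self.scheduled = {}     # endpoint -> run time
        self.pending = deque()  # (endpoint, run time) pairs awaiting work

    def schedule(self, endpoints, run_at):
        """Schedule inline if the selection is small, otherwise enqueue it."""
        if len(endpoints) <= self.threshold:
            for endpoint in endpoints:
                self.scheduled[endpoint] = run_at
            return "inline"
        self.pending.extend((endpoint, run_at) for endpoint in endpoints)
        return "background"

    def drain(self, batch_size=50):
        """Process up to batch_size queued entries; return how many were done."""
        done = 0
        while self.pending and done < batch_size:
            endpoint, run_at = self.pending.popleft()
            self.scheduled[endpoint] = run_at
            done += 1
        return done
```

In a real system `drain` would be called repeatedly by a background worker until the queue is empty, which is presumably what the Agent Procedures page does behind the scenes.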
I'm tracking with you guys, but my point was more that all the other factors involved in pushing out agent updates are more likely to be the cause of problems than the agent update function itself. Clearly there is a feasibility limit based on CPU/RAM, bandwidth, online/offline status, etc. This is why you'll hear support say to do it in smaller chunks. I'm curious to know how things would behave if I had a kserver on a LAN and all my agents were on that LAN. I wonder where the bottlenecks would show up? Who knows?
I think the new scheduler for agent updates in 6.3 will also help automate this "grouping" process however you choose to do it.
I'm still not sure why you keep pointing to bandwidth as a reason why when I try to schedule 500 agents to get an update, I'm lucky if 100 of them actually have their Last Update column updated with a scheduled date. We're not saying they get scheduled and the update itself fails, we're saying the SCHEDULING of the update NEVER HAPPENS.
Also, you keep referring to this new scheduler in 6.3 for agent updates, yet as Brian pointed out, in 6.2 on the Update Agent page, the scheduler looks exactly like it does in all other areas of the VSA. I'm familiar with the old scheduler where it forced it to a 1 minute increment, but that is not the case in 6.2. In fact, I just tried it again right now, and for the 14 servers on my own network, I chose a 25 minute distribution window, and as expected they are randomly spread out for install anywhere from 8:49am to 9:10am. You seem to be working off outdated info, or else we're completely missing something here.
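For anyone unfamiliar with the distribution window, the behaviour described above (14 servers spread at random across a 25 minute window) looks roughly like the sketch below. This is an assumption about the general technique, not the VSA's actual code. The function name, server names, and start time are made up for illustration.

```python
import random
from datetime import datetime, timedelta


def spread_over_window(machines, start, window_minutes, rng=None):
    """Assign each machine a random start time inside the window.

    Returns {machine: datetime}, with every time between start and
    start + window_minutes.
    """
    rng = rng or random.Random()
    window_seconds = window_minutes * 60
    return {
        machine: start + timedelta(seconds=rng.uniform(0, window_seconds))
        for machine in machines
    }


if __name__ == "__main__":
    servers = [f"server{i:02d}" for i in range(1, 15)]  # 14 hypothetical servers
    start = datetime(2012, 1, 1, 8, 45)                 # arbitrary example start
    plan = spread_over_window(servers, start, window_minutes=25)
    for name, when in sorted(plan.items(), key=lambda item: item[1]):
        print(name, when.strftime("%H:%M"))
```

Spreading start times like this smooths the load on the server, but it does nothing for the separate problem in this thread, where the act of saving the schedule for hundreds of agents times out.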
Brendan: It's frustrating to have people talk about A and have Kaseya miss the point and respond about B. This happens quite a lot. It's pretty clear (at least to me) that bandwidth is not the issue people have been talking about in this thread in recent days, and that has been demonstrated by Brian in his DETAILED explanation. I often get the feeling that Kaseya is hearing me, but not really listening. I think some of the frustration here is that people are complaining about the broken scheduler, but you are FOCUSING on other limitations as if what we are saying isn't sinking in. That leaves people feeling they are not being listened to, and unsure whether the actual issue is going to be fixed in the next release. I think you and a lot of the Kaseya people are doing a great job, I wouldn't want to do it, and I take my hat off to you for the effort you put in, but the overall feeling I have now is that communication, the way Kaseya does it now, isn't working.
Not meaning to miss the point, networkn. Sorry about that. I'm going to look more into this issue. The schedulers in 6.2 for agent update are still the "old" style. If you compare that scheduler to the one in Agent Procedures, you'll see some differences. The UIs look similar right now, but the backends are different. In 6.3, Update Agent will use the same backend and frontend that Agent Procedures uses today in 6.2.
I'm not in the product personally, as much as you guys are day to day, but here's how they're different today. I'm waiting to see the 6.3 version of Update Agent, but they haven't updated my test server yet.
Here's 6.2 Update Agent Scheduler:
Here's the 6.2 Agent Procedures Scheduler
Forwarded Brian Dagan's post to some engineers to look at as well.
Excellent, Brendan, that's exactly the right response. Thank you for hearing AND listening! I can see that Kaseya wants to get this right, and that is half the battle won right there. Hopefully once a clearer understanding exists, someone will be able to confirm that the next version will include this SPECIFIC fix, rather than just general scheduler fixes which may or may not resolve the issue mentioned here.
Kaseya: Any update on the status of correcting the issue with the scheduler crashing the VSA when scheduling more than a few procedures at a time? I see the latest version of the agent is now at 220.127.116.11 - 18.104.22.168. I would like to update all endpoints, but I can't take the time to go through and schedule group by group.
I had the same problem with the scheduler when working with Patch Management. I had a specific group of machines that I wanted to run a one-time patch cycle on. Apparently it was too many machines for the scheduler to handle, because the VSA locked up, eventually returned a SQL error, and only some of the machines were scheduled. To make matters worse, scheduling a one-time patch cycle cancels the previous schedule, so I had to go back through and reschedule these machines in small groups. I can live with the one-time update canceling the schedule if I can simply click all and restore the schedule when the one-time update is finished, but the problem with the scheduler really makes this much more time consuming than it should be.
Thanks in advance.