So I have an idea for a feature request in BU*DR - something that would give us more sensible completion estimates! I would like to start a discussion here, get the ideas worked out, and have a bunch of us put in a simultaneous feature request through Support to try and show Kaseya that it is worth it to us to get this:
Inaccurate estimated finish times and the dreaded "95% hang"
Backup timing works this way: the VSA looks at the logs for a given machine's previous backup to see how long it took. Let's say, for example, we have a server with 100GB in use, and the previous full backup ran for 2 hours. A new full is triggered at 1pm. The VSA therefore guesses that it will finish around 3pm and gives that as the estimated finish time. Then, starting around 3pm, it runs the "Check backup status" command every 15 minutes until the backup actually finishes - and the whole time, the GUI shows "95% completed". We'll call this "predictive estimating" - it's sometimes accurate, but often frustratingly inaccurate.
Two things to keep in mind for any fix that watches the backup files: sometimes .TIB files just show zero bytes until all write activity has finished, and polling directory listings and filtering the results burns bandwidth and CPU cycles.
Possible solution #1 - calculating estimate based on TIB file size
Watch the actual files in the Image Location, and make better guesses, refining them as you go. The VSA already "knows how" to get a directory listing of the backup location - so every 5 minutes, it could pull the directory listing from the target directory and compare the total size written so far to the total amount of data to be backed up. From this, it knows how fast it is writing to the target. It also knows how much uncompressed data needs to be backed up, so it can make a rough guess as to how big the compressed result will be. Comparing these, it should be able to give a floating estimate of the "real" finish time, and also a *much* better progress graph - instead of sitting at 95% forever. In our example above, the VSA would estimate that the 100GB backup will probably be 60GB on disk (compressed) - so when it sees that 30GB have been written it gives an estimate of 50%, regardless of what time is on the clock.
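To make the arithmetic concrete, here is a rough Python sketch of the idea. It is only an illustration, not how the VSA or the agent actually works: the function names are made up, it assumes the Image Location is reachable as an ordinary path, and the expected compressed size (60GB in our example) is a guess that somebody has to supply.

```python
import os
from datetime import datetime, timedelta


def tib_bytes_in(image_location):
    """Sum the sizes of the .TIB files directly inside image_location.

    Hypothetical helper: assumes the backup target is reachable as a path.
    """
    total = 0
    with os.scandir(image_location) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith(".tib"):
                total += entry.stat().st_size
    return total


def estimate_from_size(bytes_written, expected_bytes, started, now):
    """Return (fraction_done, projected_finish) from the bytes on disk so far.

    expected_bytes is the guessed compressed size, not the uncompressed data.
    projected_finish is None until there is something to base a rate on.
    """
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    fraction = min(bytes_written / expected_bytes, 1.0)
    elapsed = (now - started).total_seconds()
    if bytes_written <= 0 or elapsed <= 0:
        return fraction, None
    remaining = max(expected_bytes - bytes_written, 0)
    # Time left = remaining bytes divided by the average write rate so far.
    seconds_left = remaining * elapsed / bytes_written
    return fraction, now + timedelta(seconds=seconds_left)
```

With the numbers from the example (60GB expected, 30GB written one hour after a 1pm start), this gives 50% and a projected finish of 3pm. If the write rate changes, the projection moves with it on the next poll.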
Possible solution #2 - calculating estimate based on rate
There are performance counters readily accessible in the Windows API. The agent could easily poll the "IO Read Bytes", "IO Write Bytes", and "IO Other Bytes" counters for the TrueImageCmd.exe process on a regular basis and *know* how much has been processed. So in our example above, when the TrueImageCmd.exe process has read 50GB, we know that 50% of our 100GB backup is complete. We can also do the math and project a finish time.
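The "do the math" part could look something like the sketch below. It assumes the agent already has a series of (seconds since start, cumulative bytes read) samples for the backup process. How those samples get collected is left out: on a test box something like psutil's Process.io_counters() could supply them, but that is my assumption and not anything Kaseya ships. The function name is made up too.

```python
def project_from_read_bytes(samples, total_bytes):
    """Return (fraction_done, seconds_left) from cumulative read-byte samples.

    samples: list of (seconds_since_start, cumulative_bytes_read), oldest first.
    total_bytes: uncompressed data the backup has to read (100GB in the example).
    seconds_left is None when the samples do not show any progress yet.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not samples:
        return 0.0, None
    t_first, b_first = samples[0]
    t_last, b_last = samples[-1]
    fraction = min(b_last / total_bytes, 1.0)
    if t_last <= t_first or b_last <= b_first:
        return fraction, None
    remaining = max(total_bytes - b_last, 0)
    # Average read rate over the sampled window, applied to what is left.
    seconds_left = remaining * (t_last - t_first) / (b_last - b_first)
    return fraction, seconds_left
```

For the example, samples of (0, 0) and (3600, 50GB) against a 100GB total come out as 50% done with 3600 seconds to go. One caveat: the process may read more than just the data being backed up, so the fraction is capped at 100% and should be treated as approximate.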
Possible solution #3 - calculating estimate based on free space
If we knew that this backup was the only thing being written to a given target, we could just watch the free space and know from that how much has been used by the backup. This would be similar to solution #1, differing in the following ways: 1. It would work even if the .TIB file stays at zero bytes until completion. 2. If other stuff is being written to the same volume it would be horribly inaccurate :)
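A sketch of the free-space version, under the same big assumption as above: nothing else writes to the target while the backup runs. shutil.disk_usage is standard Python. Everything else here (the function names, and the idea of recording free space when the backup starts) is just for illustration.

```python
import shutil


def free_bytes(target_path):
    """Free space in bytes on the volume that holds target_path."""
    return shutil.disk_usage(target_path).free


def progress_from_free_space(free_at_start, free_now, expected_bytes):
    """Return (bytes_written, fraction_done) inferred from the drop in free space.

    Only meaningful if the backup is the only writer on the volume.
    expected_bytes is the guessed compressed size of the backup.
    """
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    # If free space went up (something was deleted), report zero, not negative.
    bytes_written = max(free_at_start - free_now, 0)
    return bytes_written, min(bytes_written / expected_bytes, 1.0)
```

The agent would record free_bytes() once when the backup starts, then call it again on each poll and feed both numbers in. From there the finish-time math is the same as in solution #1.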
Any of these solutions should also let us see whether the backup is steady, speeding up, or slowing down. This could be indicated visually on the Backup Status page by color coding or up/down arrows. For instance, if the backup is trending towards slower and slower performance, that might tell us we need to investigate other issues, or at least that the estimate should be taken with a grain of salt... but a much smaller grain than we're currently required to swallow with the existing estimating method!
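One simple way to get the steady / speeding up / slowing down indicator is to compare the average rate over the older half of the recent samples with the newer half. This is only a sketch of one possible approach: the 10% tolerance and the function names are arbitrary choices of mine, and the samples can come from any of the three solutions.

```python
def interval_rates(samples):
    """Bytes per second between each pair of consecutive samples.

    samples: list of (seconds_since_start, cumulative_bytes), oldest first.
    """
    return [
        (b2 - b1) / (t2 - t1)
        for (t1, b1), (t2, b2) in zip(samples, samples[1:])
        if t2 > t1
    ]


def classify_trend(samples, tolerance=0.10):
    """Return "steady", "speeding up", "slowing down", or "unknown"."""
    rates = interval_rates(samples)
    if len(rates) < 2:
        return "unknown"  # need at least two intervals to compare
    mid = len(rates) // 2
    older = sum(rates[:mid]) / mid
    newer = sum(rates[mid:]) / (len(rates) - mid)
    if older <= 0:
        return "speeding up" if newer > 0 else "steady"
    change = (newer - older) / older
    if change > tolerance:
        return "speeding up"
    if change < -tolerance:
        return "slowing down"
    return "steady"
```

The result maps straight onto an up arrow, a down arrow, or no arrow on the status page. A bigger tolerance would keep the indicator from flapping on backups with uneven throughput.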
I would love to hear anyone else chime in with refinements, other ideas, etc. Then when we've discussed it we can all make a feature request and see if Kaseya will give us some better, more useful and accurate estimates.