We're looking into automating our backup remediation and hoping that someone may have some insight for the following scenario:
1. Backup starts at 9PM on 12/1/2015
2. Backup continues to run for 48 hours. It may or may not be writing files, but it should have completed within this time frame, so chances are it's stuck
3. Backup is automatically cancelled after XX hours / minutes and tries to run the next scheduled backup
4. Alert (to email) sent that 'Backup was running for XX hours / minutes - This backup has been cancelled - please look into this'
Looking to get help with step 3, as we've found that sometimes it just takes a cancel for the backup to refresh itself, and cancelling is our first line of defense before restarting services, restarting the server, etc. I've toyed with creating a monitor for the Acronis process, as we do for system uptime in seconds, but I'm not sure if this would work for this type of process.
If anyone has any ideas, please let me know. Thanks!
I've had similar issues with BUDR. In my experience, Acronis would encounter an 'unknown network error' but would not alert us to it, and BUDR would continue to skip backup tasks automatically until someone checked the backup log and killed the task on the server.
As far as I am aware, BUDR has no alerting functionality (Acronis does; however, I think it incorrectly alerts you that no backups have been run from the GUI).
You could probably achieve what you want through an agent procedure. What you would need to do is create a procedure that does something like this:
If isServiceRunning("Acrocmd.exe") then executeShellCommand("taskkill /im acrocmd.exe /f") then sendEmail("youremailhere")
Then schedule this to run 1 hour after you expect your backup to complete. It's not pretty, but it should do the job with some minor tweaking.
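For anyone who would rather run this as a script than as an agent procedure, here is a rough Python sketch of the same kill-then-email idea. It assumes Python is available on the backup server and that an SMTP relay is reachable; the function names, the `smtp_host` parameter and the email addresses are hypothetical placeholders, not anything BUDR or Acronis provides. Only the `taskkill /im acrocmd.exe /f` command comes from the procedure above.

```python
import smtplib
import subprocess
from email.message import EmailMessage

PROCESS_NAME = "acrocmd.exe"  # process name taken from the procedure above


def build_kill_command(image_name=PROCESS_NAME):
    """Return the taskkill command line as an argument list."""
    return ["taskkill", "/im", image_name, "/f"]


def build_alert(hours, sender, recipient):
    """Build the alert email; the wording follows step 4 of the original post."""
    msg = EmailMessage()
    msg["Subject"] = f"Backup cancelled after {hours} hours"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Backup was running for {hours} hours. "
        "This backup has been cancelled - please look into this."
    )
    return msg


def cancel_and_notify(hours, sender, recipient, smtp_host):
    """Force-kill the process and send the alert only if something was killed.

    smtp_host is an assumed, unauthenticated relay; adjust for your mail setup.
    """
    result = subprocess.run(build_kill_command(), capture_output=True, text=True)
    # taskkill exits with 0 when it terminated a matching process.
    if result.returncode == 0:
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(build_alert(hours, sender, recipient))
    return result.returncode
```

Calling `cancel_and_notify(48, "backup@example.com", "you@example.com", "mail.example.com")` from a scheduled task would be the equivalent of the one-line procedure. Like the procedure, it kills the process no matter how long it has been running.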
Would the Backup Alert in Monitor -> Alerts do that?
Quintin - That's a decent way of doing it, but it would somehow need to be scheduled to run only, say, 48 hours after the backup starts. Even then, it could catch the backup that's running two days from now, because it checks "is service running" rather than "is service running and was started 2 days ago".
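The "was started 2 days ago" part is the missing piece, and it can be sketched as a small age check. This is only an illustration: `is_overdue`, `runtime_hours` and `MAX_RUNTIME` are made-up names, the 48-hour limit is an assumed value for the "XX hours" in the original post, and the start time would have to come from wherever you can read it (for example the process start time reported by the OS).

```python
from datetime import datetime, timedelta

MAX_RUNTIME = timedelta(hours=48)  # assumed limit; the thread leaves it as "XX"


def is_overdue(started_at, now, max_runtime=MAX_RUNTIME):
    """Return True when the process has been running for at least max_runtime."""
    return now - started_at >= max_runtime


def runtime_hours(started_at, now):
    """Return the elapsed runtime in hours, for use in the alert text."""
    return (now - started_at).total_seconds() / 3600


# Scenario from the original post: backup starts at 9PM on 12/1/2015.
started = datetime(2015, 12, 1, 21, 0)
print(is_overdue(started, datetime(2015, 12, 2, 21, 0)))  # 24 hours in
print(is_overdue(started, datetime(2015, 12, 3, 21, 0)))  # 48 hours in
```

With a check like this in front of the kill step, the job could run on any schedule (hourly, say) and would only cancel a backup that has actually been running too long, rather than whichever one happens to be running when the schedule fires.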
Mozikhan - Those alerts are only for backup failed or completed, not for a backup currently running over X time
I did set up a Monitor List with a custom Counter / Instance of the service_process.exe and will check tomorrow morning (backups run overnight) to see if I get any emails. I set it to alarm if the process runs over 5 minutes, so I should get a decent number of alarms.
Will check back in in the morning. Thanks!
Hmmm, not sure why it's not going through, but I didn't receive any alerts based on the options I set. Counters show as 'not responding' for the new monitor set but appear to be working for all other counters. I'll keep banging away at this and update if I find something that works.
Any other suggestions are welcome!