I have a number of critical applications that run on server machines, applications that should always be running on every app server in our sites no matter what. As these are critical applications, I've enabled Process Status monitoring for them, as essentially they should never be down at any site, 24x7.

That said, there are of course cases where we do restart them: patches, troubleshooting, etc. Sometimes these restarts are even initiated by the customers themselves, who don't communicate this to us. Sometimes even our own people forget to suspend alarms at the site before they do disruptive troubleshooting work. Every time this happens, an alarm is generated. If someone restarts every application in the "stack," in older releases that can be up to 5 different applications, so we'd get 5 alarms.

So therein lies the problem. Realistically, I only want to know if an application has been offline for, say, 2-3 minutes, to weed out these normal events. While I understand I can put in another set of monitors that say "Alarm on Transition: Up," I'd rather just avoid getting the alarms at all. I'd estimate that currently over half of the process status alarms are "false alarms" where the shutdown was initiated by someone with the intent to immediately restart the application.

Has anyone worked out a way to do this?
Instead of creating an alarm, you could consider running a script from the process monitor.
This script (an agent procedure) can check the process state and, if the application is down, wait 3 minutes and check again. If the process is still down at that point, the script creates an alarm or ticket.
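The wait-and-recheck logic described above can be sketched roughly as follows. This is only an illustration in Python, not an actual agent procedure: `still_down_after_grace`, `is_running`, and the timing parameters are hypothetical names, and the check that tells you whether the process is running is something you would supply yourself.

```python
import time
from typing import Callable


def still_down_after_grace(
    is_running: Callable[[], bool],
    grace_seconds: float = 180.0,
    poll_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True only if the process stays down for the whole grace period.

    is_running is a caller-supplied check (hypothetical here); sleep and
    clock are injectable so the logic can be tested without real waiting.
    """
    deadline = clock() + grace_seconds
    while True:
        if is_running():
            # The process came back within the grace period: no alarm needed.
            return False
        remaining = deadline - clock()
        if remaining <= 0:
            # Still down after the full grace period: the caller should alarm.
            return True
        # Wait for the next poll, but never past the deadline.
        sleep(min(poll_seconds, remaining))
```

The monitor would trigger this when it sees the process go down, and an alarm or ticket would be raised only if the function returns True. Polling every few seconds rather than sleeping for the full 3 minutes lets the script exit as soon as the application is back.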
We developed what we refer to as Smart Monitors. They automatically set thresholds, suppress transient events (like your brief outage), and self-remediate. We created a custom Smart Monitor for a client that tracks an application, restarts it if it is down for more than 4 minutes, and generates an alert if it can't be restarted.
The custom monitor we developed actually tracks multiple processes, understands their dependencies, and can restart them in specific sequences. This is all managed via a configuration file. It can also restart the system if the applications can't be restarted successfully.
You can get more info about this in the Automation Exchange or on our web site (mspbuilder.com).
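For readers who want to experiment with the dependency-aware restart sequence mentioned above, here is a minimal sketch of one way to derive a restart order. It is not how the Smart Monitor is implemented; the `restart_order` function and the process names are hypothetical, and the dependency mapping is assumed to have been loaded from a configuration file of your own design. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter


def restart_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Return process names ordered so each comes after what it depends on.

    dependencies maps a process name to the names it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical stack: "web" needs "app", and "app" needs "db".
stack = {"web": ["app"], "app": ["db"], "db": []}
order = restart_order(stack)
print(order)
```

Restarting in this order brings each application up only after the ones it relies on, and a cycle in the configuration is reported as an error instead of being silently ignored.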