We're working on tools that would let us autoscale our instances up and down in our private cloud using our custom control software. To support this, we want to use metric data from Traverse to send notifications to our control software when a preselected metric's value breaches one threshold (need to scale up) or falls below another threshold (need to scale down).
As it is, we can only work with the warning and critical thresholds (Traverse used to have what's called a shadow threshold, but apparently that's no longer supported). So far we haven't figured out a clean way to use them without triggering unwanted alerts in the Event Manager that our NOC tracks.
Here's our best idea so far, using the 'Idle CPU Time' basic SNMP test for this example.
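- Set Critical threshold to 0-20 (discrete threshold type); low idle time means high load, so this is the scale-up range
- Set Warning threshold to 80-100 (discrete threshold type); high idle time means low load, so this is the scale-down range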
- Create Action Profile "Scaleup"
- Action #1: triggered by the Critical threshold only, with the custom script 'scaleup.pl' assigned to it.
- Action #2: triggered by the Warning threshold only, with the custom script 'scaledown.pl' assigned to it.
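To make the intended mapping concrete, here is a minimal Python sketch of the decision the two thresholds encode. It is only an illustration of the logic, not Traverse configuration: the function name `scaling_action` and the constants are made up for this example, and the ranges are assumed to be inclusive at both ends.

```python
from typing import Optional

# Ranges taken from the steps above; inclusive bounds are an assumption.
SCALE_UP_RANGE = (0, 20)      # Critical: very little idle CPU, so scale up
SCALE_DOWN_RANGE = (80, 100)  # Warning: mostly idle CPU, so scale down


def scaling_action(idle_cpu_percent: float) -> Optional[str]:
    """Return the action the thresholds would trigger, or None for the OK range."""
    low, high = SCALE_UP_RANGE
    if low <= idle_cpu_percent <= high:
        return "scaleup"
    low, high = SCALE_DOWN_RANGE
    if low <= idle_cpu_percent <= high:
        return "scaledown"
    # Anything between the two ranges triggers no action.
    return None
```

With these ranges, a reading of 10 maps to 'scaleup', 90 maps to 'scaledown', and 50 triggers nothing.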
This would work correctly for the most part, except that the test would stay in the 'warning' state whenever its value is between 80 and 100. In reality that is an OK state; we only used the warning threshold because we needed something to kick off the action. We thought that 'scaledown.pl' could execute a 'test.suppress' API call to suppress this test in the Event Manager, which in theory would take care of the problem.
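Roughly, the scale-down handler would do two things in order: notify the control software, then suppress the test. The Python sketch below shows just that flow. Both callables are hypothetical placeholders passed in by the caller: `notify_controller` stands in for whatever our control software exposes, and `suppress_test` stands in for a wrapper around the 'test.suppress' call, whose real parameters are not shown here.

```python
from typing import Callable


def handle_scale_down(
    test_id: str,
    notify_controller: Callable[[str, str], None],
    suppress_test: Callable[[str], None],
) -> None:
    """Tell the control software to scale down, then hide the test from the NOC view."""
    # Notify first, so a failure while suppressing never blocks the scaling request.
    notify_controller(test_id, "scaledown")
    # Hypothetical wrapper around 'test.suppress'; the real call is not modeled here.
    suppress_test(test_id)


# Usage with stand-in callables that only record what was called.
calls = []
handle_scale_down(
    "idle-cpu-test",
    notify_controller=lambda test_id, action: calls.append(("notify", test_id, action)),
    suppress_test=lambda test_id: calls.append(("suppress", test_id)),
)
```

Passing the two calls in keeps the ordering easy to check without a live Traverse instance.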
I'm curious if anyone has a better solution to this problem. (Maybe we can have the shadow thresholds back, Rajib? :) )