Today I want to explain a small feature of seafelt Performance Manager that has a big impact at scale: random offset polling.
Many monitoring products that use polling to check on remote devices do so using Unix cron, or a similar scheduling mechanism. These scheduling mechanisms kick off a job periodically (such as every 5 minutes) to interrogate the devices in their database. More sophisticated schemes will poll certain things every 5 minutes, and others every 15 minutes, every hour, etc. Things that are unlikely to change often may only get polled every 6 hours. This reduces the total amount of polling going on, reducing network traffic, load on the polled devices, and so on.
The problem with this approach is that all the devices in the database are polled at once when the job kicks off. In the most naive implementations, everything in the database is polled every 5 minutes. This can place significant load on the network and on the polling server. I know of at least one organization that had to buy new hardware for its polling server because each 5-minute polling run was taking longer than 5 minutes to complete; the polling kept falling further and further behind.
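To see why the polling falls further and further behind, here is a minimal model. It assumes each polling sweep always takes the same time and that a sweep cannot start until the previous one has finished. The function name and parameters are hypothetical, for illustration only.

```python
def lateness_per_cycle(sweep_minutes: float, interval_minutes: float,
                       cycles: int) -> list[float]:
    """For each sweep, return how many minutes past the *next* scheduled
    start it finished (hypothetical model: sweeps never overlap)."""
    lateness = []
    finish = 0.0
    for n in range(cycles):
        scheduled_start = n * interval_minutes
        # A sweep can't begin until the previous one is done.
        start = max(scheduled_start, finish)
        finish = start + sweep_minutes
        lateness.append(max(0.0, finish - (n + 1) * interval_minutes))
    return lateness

print(lateness_per_cycle(6, 5, 5))  # [1, 2, 3, 4, 5]: one more minute late every cycle
print(lateness_per_cycle(4, 5, 5))  # all zero: the sweep fits inside its interval
```

Once a sweep takes longer than its interval, the shortfall accumulates every cycle and never recovers, which is why faster hardware looked like the only way out.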
Even in the more sophisticated implementations, which don’t poll everything each time the scheduler kicks off a polling job, there are still spikes whenever the time intervals line up. If you poll some things every 5 minutes, others every 30 minutes, and others every hour or every 4 hours, there will be certain times when all of these polls happen at once. I’ve observed this effect placing significant load on storage appliances from a certain vendor when using that vendor’s own storage monitoring application.
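When every schedule starts from the same point, as cron-style schedules typically do, all the intervals coincide at multiples of their least common multiple. The sketch below counts how many polls start in each minute for the intervals mentioned above; the number of items polled at each interval is a made-up figure for illustration, and `math.lcm` with several arguments needs Python 3.9 or later.

```python
from math import lcm

def polls_per_minute(schedule: dict[int, int], horizon: int) -> list[int]:
    """Count polls starting in each minute when every schedule begins at minute 0.

    `schedule` maps a polling interval (minutes) to the number of items
    polled at that interval."""
    counts = [0] * horizon
    for interval, n_items in schedule.items():
        for minute in range(0, horizon, interval):
            counts[minute] += n_items
    return counts

schedule = {5: 100, 30: 50, 60: 20, 240: 10}  # illustrative, not measured
counts = polls_per_minute(schedule, 480)

print(lcm(*schedule))  # 240: every interval lines up every 4 hours
print(counts[0], counts[5], counts[30], counts[60], counts[240])  # 180 100 150 170 180
```

Most minutes see no new polls at all, while minute 240 sees every item at once: the spike repeats every 4 hours regardless of how lightly loaded the rest of the schedule is.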
The Solution: Random Offset Polling
seafelt Performance Manager is different. It uses a scheduling algorithm known as random offset polling. When the poller starts, the scheduler determines all the polling that needs to happen and schedules the first poll of each Element at a random point within an initial time window. This window is configurable and defaults to 5 minutes. After the first poll, each subsequent poll follows the polling interval configured for that Element, whether that is 5 minutes, 8 hours, or anything else.
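A minimal sketch of the idea, assuming a single-threaded scheduler driven by a priority queue. This is not seafelt’s actual code: the `Element` class, the `poll_timeline` function and the example intervals and names are hypothetical, and the sketch only computes when polls would happen rather than performing them.

```python
import heapq
import random
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Element:
    name: str
    interval: float  # seconds between polls of this Element

def poll_timeline(elements: list[Element], horizon: float,
                  window: float = 300.0,
                  rng: Optional[random.Random] = None) -> list[tuple[float, str]]:
    """Return (time, name) for every poll due before `horizon` seconds.

    Each Element's first poll lands at a random offset in [0, window);
    every later poll follows at that Element's own interval."""
    if rng is None:
        rng = random.Random()
    # The index i breaks ties so heapq never has to compare Element objects.
    heap = [(rng.random() * window, i, e) for i, e in enumerate(elements)]
    heapq.heapify(heap)
    timeline = []
    while heap and heap[0][0] < horizon:
        t, i, e = heapq.heappop(heap)
        timeline.append((t, e.name))
        # Reschedule relative to this Element's own last poll,
        # not to a clock tick shared with every other Element.
        heapq.heappush(heap, (t + e.interval, i, e))
    return timeline

elements = [Element("router-1", 300), Element("switch-7", 8 * 3600)]
for t, name in poll_timeline(elements, horizon=3600, rng=random.Random(42)):
    print(f"{t:8.1f}s  {name}")
```

Because each Element keeps its own random phase, Elements with the same interval stay spread out across the window instead of converging on the same instant.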
Why is this good? This method smooths out the load on the network, the polling server and the monitored devices. You don’t need to buy as powerful a polling server to absorb the polling spikes that other products create. You also avoid large polling-induced spikes in network load, and you don’t place undue stress on your monitored devices just to monitor them. Your network and devices can spend more of their time doing the job you bought them for, instead of spending it being monitored.
You save money.
You also save time diagnosing problems, because when you see a spike in utilisation on a device, you know it’s not likely to be caused by your monitoring software.
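The load-smoothing effect described above is easy to demonstrate. This sketch compares cron-style polling, where every Element starts at the same instant, with random offsets spread across the default 5-minute window, and reports the busiest one-second bucket. The fleet size is an arbitrary illustrative figure, and the sketch assumes every Element shares a 5-minute interval, so the first window’s pattern repeats each cycle.

```python
import random
from collections import Counter

def peak_polls_per_second(start_times: list[float]) -> int:
    """Largest number of polls that start within the same one-second bucket."""
    return max(Counter(int(t) for t in start_times).values())

N_ELEMENTS = 1000  # hypothetical fleet size
WINDOW = 300.0     # 5-minute window in seconds

aligned = [0.0] * N_ELEMENTS  # cron-style: every poll fires at once
rng = random.Random(1)
spread = [rng.random() * WINDOW for _ in range(N_ELEMENTS)]  # random offsets

print(peak_polls_per_second(aligned))  # 1000
print(peak_polls_per_second(spread))   # a small number, spread across the window
```

The total amount of polling is identical in both cases; only its distribution in time changes, which is what keeps the peak on the network, server and devices low.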
We’re Years Ahead
This is a core feature of seafelt Performance Manager, and has been for years. We find it surprising that other monitoring products don’t do this, since it isn’t particularly difficult to implement. If you’re contemplating buying new hardware because your monitoring solution is causing you problems, why not check out seafelt Performance Manager instead?