In today’s post, we discuss some of the concepts behind seafelt Performance Manager, and some of the terms and definitions we use. We explain how SPM is organised, and why various components exist.
Devices and Elements
At the core of SPM are Devices and Elements. Devices are things like switches, routers, computers, disk storage arrays; things that we can talk to over a network, essentially. Elements are things that are logically contained within a Device: CPUs, switch ports, volumes, applications, and an abstract representation of the device itself.
An example will help illustrate these concepts:
This computer that I’m typing on (called fnord) is a Device. When SPM discovers fnord, it uses fnord’s IP address and a few other attributes to talk to it via several protocols, such as SNMP. The result of this discovery process is a Device and several Elements, representing components of fnord that we want to monitor. Fnord has a single CPU element, a couple of interfaces, a few storage elements (representing disk volumes, like /var, /home, etc.) and some other Elements.
Each Element represents something we collect statistics about. The kind of statistics we collect depends on what sort of Element it is. Elements fall into categories known as Element Types, or elemtypes for short.
An Element Type is a class of elements; things that share a common set of characteristics. ‘CPU’ is an element type. So is ‘interface’. All Elements of a given Element Type can have the same set of Element Variables (or elemvars) monitored. For a CPU, you might want to monitor its %busy, %user, %IOwait, and %idle, while for an interface you would monitor things like bytesIn, or collisions. %user doesn’t make any sense for an interface, just as collisions wouldn’t make any sense for a CPU.
So we group related Element Variables into Element Types, and each instance of an Element Type that we detect on a Device is discovered as an Element.
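To make the relationships concrete, here is a minimal sketch of how Devices, Elements and Element Types might be modelled. The class names, the element names (cpu0, eth0, eth1) and the address are illustrative assumptions, not SPM's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ElementType:
    """A class of Elements that share the same set of elemvars."""
    name: str
    elemvars: tuple


@dataclass
class Element:
    """One discovered instance of an Element Type on a Device."""
    name: str
    elemtype: ElementType


@dataclass
class Device:
    """Something we can talk to over the network."""
    name: str
    address: str
    elements: list = field(default_factory=list)

    def elements_of_type(self, type_name):
        """Return the Elements on this Device with the given Element Type."""
        return [e for e in self.elements if e.elemtype.name == type_name]


CPU = ElementType("CPU", ("%busy", "%user", "%IOwait", "%idle"))
INTERFACE = ElementType("interface", ("bytesIn", "collisions"))
STORAGE = ElementType("storage", ())  # storage elemvars are not listed in this post

# Hypothetical discovery result for fnord (192.0.2.10 is a documentation address).
fnord = Device("fnord", "192.0.2.10")
fnord.elements += [
    Element("cpu0", CPU),
    Element("eth0", INTERFACE),
    Element("eth1", INTERFACE),
    Element("/var", STORAGE),
    Element("/home", STORAGE),
]

for element in fnord.elements:
    print(f"{element.name}: {element.elemtype.name} {element.elemtype.elemvars}")
```

The point of the sketch is that the elemvars hang off the Element Type, not the individual Element, so every interface on every Device gets the same set of statistics.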
Element Variables, or Elemvars
Elemvars are a concept unique to SPM (as far as we’re aware). An Elemvar is an abstract kind of variable that is used for monitoring an Element. The value of an Elemvar is derived by performing a calculation on Variables that are monitored by Pollers. The calculation is defined by an Expression. This lets us define a single Elemvar that represents a similar value across different Element Types, even though the way the value is calculated may differ for each Element Type.
Again, an example will help: There’s an Elemvar known as availability. This is an abstract idea of whether an Element was available or not, and for how long. If we poll an element every 5 minutes, it might have availability of 300 seconds, meaning it was available for the entire time between polls. A value of 150 seconds might mean that the element was only available for half the time between polls.
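The arithmetic is simple enough to sketch. The helper below is our own illustration (the function name and the idea of summing over several polls are assumptions, not part of SPM): it turns a series of per-poll availability values, in seconds, into a percentage.

```python
POLL_INTERVAL = 300  # seconds between polls (every 5 minutes)


def availability_percent(samples, interval=POLL_INTERVAL):
    """Return the percentage of time available across a list of polls.

    Each sample is the availability, in seconds, recorded for one poll interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return 100.0 * sum(samples) / (interval * len(samples))


print(availability_percent([300]))               # available for the whole interval
print(availability_percent([150]))               # available for half the interval
print(availability_percent([300, 300, 150, 0]))  # four consecutive polls
```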
On fnord, if the computer was running, availability for its CPU might be calculated based on a poll of the SNMP variable ssCpuRawIdle (from the UCD-SNMP-MIB). Our Expression might be:
availability = delta_time
which is the amount of time between two polls. In this case, if we successfully poll ssCpuRawIdle, we assume the CPU was up the whole time between polls.
For an interface, we might define a completely different Expression for availability, like so:
availability = (ifOperStatus.enum() == 'up') * delta_time
This means that we would poll the ifOperStatus SNMP variable, and if the interface was marked as ‘up’, we would say it was available for delta_time. If the interface was marked as down, availability would be 0 seconds.
In this way, we’re able to represent the same concept across multiple Element Types by using different methods of calculating the same thing. We can do the same thing with different models of a similar piece of equipment, such as Cisco switches that support different MIBs.
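Here is a rough sketch of that idea in Python: one Elemvar, with a different formula per Element Type. The EnumValue wrapper, the AVAILABILITY table and the availability() helper are hypothetical names invented for this illustration; only the two formulas come from the examples above, and the ifOperStatus labels are the ones defined in IF-MIB.

```python
# ifOperStatus labels as defined in IF-MIB.
IF_OPER_STATUS = {1: "up", 2: "down", 3: "testing", 4: "unknown",
                  5: "dormant", 6: "notPresent", 7: "lowerLayerDown"}


class EnumValue:
    """Hypothetical wrapper that gives a polled integer an enum() label lookup."""

    def __init__(self, raw, labels):
        self.raw = raw
        self.labels = labels

    def enum(self):
        return self.labels.get(self.raw, "unknown")


# One Elemvar, one formula per Element Type (right-hand sides from the post).
AVAILABILITY = {
    "CPU": "delta_time",
    "interface": "(ifOperStatus.enum() == 'up') * delta_time",
}


def availability(elemtype, polled, delta_time):
    """Evaluate the availability formula for the given Element Type."""
    namespace = dict(polled, delta_time=delta_time)
    # Builtins are hidden from the formula, but this is not a security sandbox.
    return eval(AVAILABILITY[elemtype], {"__builtins__": {}}, namespace)


cpu = availability("CPU", {"ssCpuRawIdle": 123456}, 300)
up = availability("interface", {"ifOperStatus": EnumValue(1, IF_OPER_STATUS)}, 300)
down = availability("interface", {"ifOperStatus": EnumValue(2, IF_OPER_STATUS)}, 300)
print(cpu, up, down)
```

The interface formula works because Python treats True as 1 and False as 0 in arithmetic, so the comparison either keeps delta_time or zeroes it out.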
So what are these Variables we mentioned? A Variable is something that we poll for on a specific Element on a Device. A Variable is protocol specific; for example, SNMP has a range of variables defined in MIBs. A MIB defines the set of variables that a particular piece of equipment supports when you poll it. Equipment vendors typically define a MIB for their products when they want to allow them to be managed via SNMP.
We’ve abstracted this concept slightly so that we can use the same Variable concept no matter what protocol we use to monitor a Device. Variables of a given type are supported by Pollers; a Poller knows how to poll a Device to collect the value for a Variable.
For example, ifOperStatus is a Variable of type SNMP that you can collect for interfaces. A poller that knows how to speak SNMP will be able to fetch the value for ifOperStatus for a given interface. Other Variables can be defined for TCP pollers, Subprocess pollers, or other, custom pollers. Variables are linked to Elemvars (the statistics we actually track) via Expressions.
Expressions are the magic glue that holds everything together. Expressions are Python statements that are eval()uated in a given namespace context that SPM defines. An Elemvar is calculated using an Expression, usually making use of Variables. Let’s use the Expression example above:
availability = (ifOperStatus.enum() == 'up') * delta_time
To calculate availability, we need to fetch the Variable ifOperStatus. This is an SNMP variable, so SPM knows to use an SNMP capable poller to collect the appropriate value. Once the value is collected, it’s placed in a namespace that is used to eval() the Expression formula. The result of the eval() is the value for availability.
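The steps just described can be sketched in a few lines. This is an illustration under stated assumptions, not SPM's actual code: we assume the right-hand side of the Expression is what gets passed to eval(), that delta_time is supplied by the framework and not polled, and we use Python's ast module to work out which Variables need fetching. The Status class and fake_snmp_fetch() are stand-ins for a real SNMP poller.

```python
import ast

FRAMEWORK_NAMES = {"delta_time"}  # assumed to be supplied by SPM, not polled


def split_expression(expression):
    """Split 'elemvar = formula' at the first '=' into its two halves."""
    target, formula = expression.split("=", 1)
    return target.strip(), formula.strip()


def required_variables(formula):
    """Return the names the formula reads that a poller must fetch."""
    tree = ast.parse(formula, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - FRAMEWORK_NAMES


class Status:
    """Hypothetical polled value exposing the enum() call used in the post."""

    def __init__(self, label):
        self.label = label

    def enum(self):
        return self.label


def fake_snmp_fetch(name):
    """Stand-in for an SNMP capable poller; returns canned values."""
    return {"ifOperStatus": Status("up")}[name]


def calculate(expression, delta_time, fetch=fake_snmp_fetch):
    """Fetch the Variables a formula needs, then eval() it in that namespace."""
    elemvar, formula = split_expression(expression)
    namespace = {name: fetch(name) for name in required_variables(formula)}
    namespace["delta_time"] = delta_time
    return elemvar, eval(formula, {"__builtins__": {}}, namespace)


print(calculate("availability = (ifOperStatus.enum() == 'up') * delta_time", 300))
```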
This simple concept provides much of SPM’s power and flexibility, at the cost of some complexity.
Finally we have Pollers. Pollers are plugin modules that implement a specific API that supports discovery and polling of Devices and Elements for a given variable type. Each variable type usually corresponds to a particular protocol, such as SNMP or TCP, though it doesn’t have to. A Poller knows how to collect values for Variables of a given type from Devices.
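As a rough picture of what such a plugin interface could look like, here is a sketch with one method for discovery and one for polling, plus a registry keyed by variable type. All of the names here (Poller, StaticPoller, REGISTRY, register, poll) are hypothetical and chosen for illustration; the real SPM plugin API may look quite different.

```python
from abc import ABC, abstractmethod


class Poller(ABC):
    """Hypothetical plugin interface: one Poller per variable type."""

    variable_type = None  # e.g. "snmp" or "tcp"

    @abstractmethod
    def discover(self, address):
        """Return the names of the Elements found on the Device."""

    @abstractmethod
    def poll(self, address, element, variable):
        """Return the current value of one Variable for one Element."""


class StaticPoller(Poller):
    """Toy poller that serves canned data instead of talking to a network."""

    variable_type = "static"

    def __init__(self, data):
        self.data = data  # {element: {variable: value}}

    def discover(self, address):
        return sorted(self.data)

    def poll(self, address, element, variable):
        return self.data[element][variable]


REGISTRY = {}


def register(poller):
    """Make a poller available under its variable type."""
    REGISTRY[poller.variable_type] = poller


def poll(variable_type, address, element, variable):
    """Dispatch to whichever poller handles this variable type."""
    return REGISTRY[variable_type].poll(address, element, variable)


register(StaticPoller({"eth0": {"ifOperStatus": 1}, "cpu0": {"ssCpuRawIdle": 98765}}))
print(REGISTRY["static"].discover("192.0.2.10"))
print(poll("static", "192.0.2.10", "eth0", "ifOperStatus"))
```

Keying the registry on variable type, not on protocol, matches the point above that a variable type usually corresponds to a protocol but doesn't have to.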
There are several Poller plugins provided with SPM, including ones for SNMP, TCP, Subprocesses and Nagios plugins. Here’s a quick overview:
- The SNMP Poller supports polling of SNMP v1 and v2c variables from any MIB. It is used for the majority of the built-in Element Types bundled with SPM.
- The TCP Poller supports connecting to an arbitrary TCP port on a Device, sending some data to it, and then checking the response. It provides a generic mechanism for monitoring any TCP based service.
- The Subprocess Poller supports running any arbitrary command line and checking the result (exit code and output). Parameterisation of the command string allows you to define any polling command you like.
- The Nagios Plugin Poller is a custom Subprocess Poller that integrates SPM with Nagios plugins. You can use any Nagios plugin with SPM out of the box, both for availability monitoring, and statistics gathering.
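To give a flavour of the Subprocess and Nagios Plugin Pollers, here is a small sketch that fills in a parameterised command, runs it, and interprets the result the way a Nagios plugin reports it: an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN, and optional performance data after a | character. The function names and the fake plugin are assumptions for illustration, and the performance data parsing is deliberately simplified (no quoted labels, no non-numeric values).

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# A stand-in "plugin": a tiny Python script that prints Nagios-style output.
FAKE_PLUGIN = ("import sys; "
               "print('OK - ' + sys.argv[1] + ' load fine | load1=0.42;5;10 load5=0.30')")


def run_check(command_template, timeout=10, **params):
    """Fill in the command template, run it, and return (exit code, output)."""
    argv = [part.format(**params) for part in command_template]
    done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return done.returncode, done.stdout.strip()


def parse_plugin_output(output):
    """Split 'text | label=value;warn;crit ...' into text and a stats dict."""
    text, _, perfdata = output.partition("|")
    stats = {}
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        # Keep only the value, dropping thresholds and a trailing unit.
        value = rest.split(";")[0].rstrip("%BKMGTsmuc")
        stats[label] = float(value)
    return text.strip(), stats


code, output = run_check([sys.executable, "-c", FAKE_PLUGIN, "{host}"], host="fnord")
text, stats = parse_plugin_output(output)
print(NAGIOS_STATES.get(code, "UNKNOWN"), "|", text, "|", stats)
```

The exit code covers availability monitoring, while the parsed performance data is what you would feed into statistics gathering.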
That’s a quick overview of the major components of seafelt Performance Manager. I hope this has given you a taste of what seafelt can do, and how it works. There’s a lot more to discuss, and I look forward to writing more about the various components in the days and weeks to come.
But why wait? If you want to know more about anything in this post, ask a question in the comments! I’m only too happy to fill you in on more of the gory details.