Discussion:
[Ipmitool-devel] SDR Failure Assertion for Nagios Check_IPMI
Werner Fischer
2010-04-26 20:29:06 UTC
Hi Brian,

thank you for pointing out this problem with Power Redundancy sensors of
this kind when ipmitool output is used by a Nagios plugin.

We developed a new IPMI plugin for Nagios, also using ipmitool. You can
download the plugin here:
http://www.thomas-krenn.com/de/wikiDE/images/8/82/Check_ipmi_sensor.tar.gz

More in-depth documentation on this plugin is currently only available
in German:
http://www.thomas-krenn.com/de/wiki/IPMI_Sensor_Monitoring_Plugin

Although our plugin currently has the same issue, I'll try to address
it in a future version. Patches are of course also welcome ;-)

I plan to set up a mailing list for the IPMI Nagios plugin. Once it is
set up, I'll post its address here.

Best regards,
Werner

additional links:
http://exchange.nagios.org/directory/Uncategorized/IPMI-Sensor-Monitoring-Plugin/details
http://www.monitoringexchange.org/inventory/Check-Plugins/Hardware/Server-%2528Manufacturer%2529/IPMI-Sensor-Monitoring-Plugin
Werner Fischer
2010-06-17 12:48:05 UTC
Hi Brian,

the mailing lists for the IPMI plugin for Nagios are now online:
http://lists.thomas-krenn.com/cgi-bin/mailman/listinfo/ipmi-plugin-announce
http://lists.thomas-krenn.com/cgi-bin/mailman/listinfo/ipmi-plugin-user

Although I haven't been able to address your issue yet, I added it to
the planned features for future versions:
http://www.thomas-krenn.com/de/wiki/IPMI_Sensor_Monitoring_Plugin#Versionen

best regards,
Werner
--
: Werner Fischer
: Technology Specialist
: Thomas-Krenn.AG | Speed is (y)our success
: http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki
Werner Fischer
2010-06-21 13:08:52 UTC
Hi ipmitool developers,

I have been thinking about the problem with monitoring discrete IPMI
sensors that Brian reported back in April:
http://www.mail-archive.com/ipmitool-***@lists.sourceforge.net/msg01472.html

I did some in-depth testing and looked at how the current VMware ESXi 4.0
reports the different states of discrete IPMI sensors.

I tested two example scenarios with an Intel SR2500 server:

Test case 1:
* Power Supply 2 removed
* Chassis cover removed
* VMware reports: [screenshot not preserved]

Test case 2:
* Power Supply 2 present, but power cable removed
* VMware reports: [screenshot not preserved]

(Example ipmitool output for both cases follows below.)

The current IPMI specification lists the possible sensor-specific offsets
for each sensor type in Table 42-3, Sensor Type Codes.
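To illustrate how those offsets show up on the wire: "ipmitool raw 0x04 0x2d <sensor#>" sends Get Sensor Reading (NetFn 0x04, command 0x2D), and for a discrete sensor the third and fourth response bytes are a bitmask of asserted offsets. The sketch below decodes them. The offset names are paraphrased from Table 42-3 for only two sensor types, and the helper names are hypothetical; the sensor type itself has to come from the SDR, since the raw command only takes the sensor number.

```python
# Hypothetical sketch: decode the state bytes printed by
# "ipmitool raw 0x04 0x2d <sensor#>" (Get Sensor Reading) into
# sensor-specific offset names. Names paraphrased from IPMI Table 42-3;
# only two sensor types are included for illustration.

SENSOR_SPECIFIC_OFFSETS = {
    0x05: {  # Physical Security
        0x00: "General Chassis Intrusion",
        0x01: "Drive Bay intrusion",
        0x02: "I/O Card area intrusion",
        0x03: "Processor area intrusion",
        0x04: "LAN Leash Lost",
        0x05: "Unauthorized dock",
        0x06: "FAN area intrusion",
    },
    0x08: {  # Power Supply
        0x00: "Presence detected",
        0x01: "Power Supply Failure detected",
        0x02: "Predictive Failure",
        0x03: "Power Supply input lost (AC/DC)",
        0x04: "Power Supply input lost or out-of-range",
        0x05: "Power Supply input out-of-range, but present",
        0x06: "Configuration error",
    },
}


def asserted_offsets(raw_hex: str) -> list[int]:
    """Return asserted offsets from a response such as "00 c0 09 00".

    Byte 0 is the reading (unused for discrete sensors), byte 1 holds
    flags, byte 2 carries offsets 0-7 and the optional byte 3 offsets 8-14.
    """
    data = [int(b, 16) for b in raw_hex.split()]
    states = data[2] | ((data[3] << 8) if len(data) > 3 else 0)
    return [bit for bit in range(15) if states & (1 << bit)]


def describe(sensor_type: int, raw_hex: str) -> list[str]:
    """Map each asserted offset to its name, falling back to the number."""
    names = SENSOR_SPECIFIC_OFFSETS.get(sensor_type, {})
    return [names.get(o, f"offset {o:02X}h") for o in asserted_offsets(raw_hex)]


if __name__ == "__main__":
    # PS2 Status in test case 2: bits 0 and 3 are set
    print(describe(0x08, "00 c0 09 00"))
```

Applied to the three raw responses in this mail, 0x01 decodes to presence only, 0x00 to nothing asserted (PS2 removed), and 0x09 to presence plus input lost, which matches what "sdr get" reports.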

It seems to me that VMware uses a mapping that defines which offsets
(assertions/deassertions) cause a warning or an alarm. For example, the
"General Chassis Intrusion" offset of a Physical Security sensor
(sensor type code 05h) leads to status "Warning".

So my request:
* introduce a new ipmitool option (something like "ipmitool
get-server-status") that uses such a mapping as well. We could
define which offsets/assertions should cause a warning. This would
give end users an easy way to quickly find out whether or not
everything is OK with their hardware...
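A minimal sketch of what such a mapping could look like, assuming the asserted offsets per sensor are already known (e.g. decoded from Get Sensor Reading). The severity table is purely hypothetical: it is not VMware's actual mapping and "get-server-status" is not an existing ipmitool command. The overall status is simply the worst severity over all sensors.

```python
# Hypothetical sketch of the proposed "get-server-status" idea.
# The SEVERITY table is an illustrative assumption, not VMware's mapping
# and not an existing ipmitool feature.

OK, WARNING, CRITICAL = 0, 1, 2  # same numbering as Nagios exit codes

# Hypothetical severities keyed by (sensor type code, offset)
SEVERITY = {
    (0x05, 0x00): WARNING,   # Physical Security: General Chassis Intrusion
    (0x05, 0x04): WARNING,   # Physical Security: LAN Leash Lost
    (0x08, 0x01): CRITICAL,  # Power Supply: Failure detected
    (0x08, 0x02): WARNING,   # Power Supply: Predictive Failure
    (0x08, 0x03): CRITICAL,  # Power Supply: input lost (AC/DC)
    (0x08, 0x06): CRITICAL,  # Power Supply: Configuration error
}


def sensor_status(sensor_type, offsets):
    """Worst severity among asserted offsets; unmapped offsets count as OK."""
    return max((SEVERITY.get((sensor_type, o), OK) for o in offsets), default=OK)


def server_status(sensors):
    """sensors: iterable of (sensor_type, asserted_offsets) pairs."""
    return max((sensor_status(t, offs) for t, offs in sensors), default=OK)


if __name__ == "__main__":
    # Test case 2: PS1 asserts offset 0, PS2 asserts offsets 0 and 3
    print(server_status([(0x08, [0]), (0x08, [0, 3])]))
```

Keeping the table as data rather than code would let users adjust which offsets count as a warning for their own hardware, which is the configurability the request asks for.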

Currently, "ipmitool sdr elist all", for example, reports "ok" even for
sensor states like "General Chassis Intrusion" (see below).
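Until something like that exists, a plugin could work around it by looking at the state text instead of the status column. The sketch below parses "sdr elist all" lines (name | sensor# | status | entity | reading or state) and flags rows marked "ok" whose text names a problem. The phrase list is a hypothetical example chosen for the two test cases in this mail, and matching on text is fragile compared with decoding the offsets themselves.

```python
# Hypothetical workaround sketch: flag "sdr elist all" rows whose status
# column says "ok" although the state text names a problem.
# SUSPICIOUS_STATES is an assumption tailored to this thread's examples.
from typing import NamedTuple


class SdrRow(NamedTuple):
    name: str
    sensor_id: str
    status: str
    entity: str
    text: str


SUSPICIOUS_STATES = ("chassis intrusion", "ac lost", "failure detected")


def parse_elist(output: str) -> list[SdrRow]:
    """Split each "a | b | c | d | e" line into an SdrRow; skip other lines."""
    rows = []
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 5:
            rows.append(SdrRow(*fields))
    return rows


def hidden_problems(rows: list[SdrRow]) -> list[SdrRow]:
    """Rows reported as "ok" whose state text contains a suspicious phrase."""
    return [r for r in rows
            if r.status == "ok"
            and any(s in r.text.lower() for s in SUSPICIOUS_STATES)]


if __name__ == "__main__":
    sample = """\
PS1 Status | 70h | ok | 10.1 | Presence detected
PS2 Status | 71h | ok | 10.2 | Presence detected, Power Supply AC lost
Physical Scrty | 05h | ok | 23.1 | General Chassis intrusion"""
    for row in hidden_problems(parse_elist(sample)):
        print(f"{row.name}: {row.text}")
```

Note that this approach cannot catch test case 1's removed PS2, whose status line has an empty state text.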

What do you think?
Any other ideas how we could accomplish that?
Does anybody know whether any of the other tools, like freeipmi or
ipmiutil, has similar functionality?

best regards,
Werner

PS: Here is the ipmitool output for both test cases:

Test case 1:
***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr elist all | grep -i "PS"
Password:
PS1 AC Current | 78h | ok | 10.1 | 0.93 Amps
PS2 AC Current | 79h | ns | 10.2 | No Reading
PS1 +12V Current | 7Ah | ok | 10.1 | 16 Amps
PS2 +12V Current | 7Bh | ns | 10.2 | No Reading
PS1 +12V Power | 7Ch | ok | 10.1 | 192 Watts
PS2 +12V Power | 7Dh | ns | 10.2 | No Reading
PS1 Status | 70h | ok | 10.1 | Presence detected
PS2 Status | 71h | ok | 10.2 |
***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr elist all | grep -i "Physical Scrty"
Password:
Physical Scrty | 05h | ok | 23.1 | General Chassis intrusion
***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw 0x04 0x2d 0x70
Password:
Data length = 1
00 c0 01 00
***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw 0x04 0x2d 0x71
Password:
Data length = 1
00 c0 00 00
***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U admin -P relation sdr get "Physical Scrty"
Sensor ID : Physical Scrty (0x5)
Entity ID : 23.1 (System Chassis)
Sensor Type (Discrete): Physical Security
States Asserted : Physical Security
[General Chassis intrusion]
Assertion Events : Physical Security
[General Chassis intrusion]
Assertions Enabled : Physical Security
[General Chassis intrusion]
[System unplugged from LAN]
Deassertions Enabled : Physical Security
[General Chassis intrusion]
[System unplugged from LAN]

Test case 2:
***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr get "PS2 Status"
Password:
Sensor ID : PS2 Status (0x71)
Entity ID : 10.2 (Power Supply)
Sensor Type (Discrete): Power Supply
States Asserted : Power Supply
[Presence detected]
[Power Supply AC lost]
Assertion Events : Power Supply
[Presence detected]
[Power Supply AC lost]
Assertions Enabled : Power Supply
[Presence detected]
[Failure detected]
[Predictive failure]
[Power Supply AC lost]
[Config Error: Vendor Mismatch]
[Config Error: Revision Mismatch]
[Config Error: Processor Missing]
[Config Error]
Deassertions Enabled : Power Supply
[Presence detected]
[Failure detected]
[Predictive failure]
[Power Supply AC lost]
[Config Error: Vendor Mismatch]
[Config Error: Revision Mismatch]
[Config Error: Processor Missing]
[Config Error]

***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr elist all | grep -i "PS"
Password:
PS1 AC Current | 78h | ok | 10.1 | 0.93 Amps
PS2 AC Current | 79h | ok | 10.2 | 0.12 Amps
PS1 +12V Current | 7Ah | ok | 10.1 | 16 Amps
PS2 +12V Current | 7Bh | ok | 10.2 | 0 Amps
PS1 +12V Power | 7Ch | ok | 10.1 | 192 Watts
PS2 +12V Power | 7Dh | ok | 10.2 | 0 Watts
PS1 Status | 70h | ok | 10.1 | Presence detected
PS2 Status | 71h | ok | 10.2 | Presence detected, Power Supply AC lost
***@wfischer-t410-ubuntu:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw 0x04 0x2d 0x71
Password:
Data length = 1
00 c0 09 00
***@wfischer-t410-ubuntu:~$
Al Chu
2010-06-21 16:32:45 UTC
Hi Werner,
Post by Werner Fischer
Does anybody know whether one of the other tools like freeipmi or
ipmiutil has some functionality like this?
In FreeIPMI, there is a tool called ipmimonitoring that I believe does
what you're asking for (output condensed for readability below).

18 | Fan1 | Nominal | 14500.00 | RPM | 'OK'
19 | Fan2 | Nominal | 14300.00 | RPM | 'OK'
20 | Fan3/CPU2 | Nominal | 14300.00 | RPM | 'OK'
21 | Fan4/CPU1 | Nominal | 13900.00 | RPM | 'OK'
22 | Fan5 | Nominal | 14000.00 | RPM | 'OK'
23 | Fan6 | Nominal | 14000.00 | RPM | 'OK'
24 | Fan7/CPU3 | Critical | 0.00 | RPM | 'At or Below (<=) Lower Non-Recoverable Threshold'
25 | Fan8/CPU4 | Critical | 0.00 | RPM | 'At or Below (<=) Lower Non-Recoverable Threshold'
26 | Fan9 | Critical | 0.00 | RPM | 'At or Below (<=) Lower Non-Recoverable Threshold'
27 | Power Supply 1 | Nominal | N/A | N/A | 'Presence detected'
28 | Power Supply 2 | N/A | N/A | N/A | N/A

So for this example, fans with normal RPM are "Nominal", out of range is
"Critical", and the power supply that doesn't exist is "N/A". There is
also a "Warning" output when the situation is appropriate.
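For a Nagios check, these states would still need to be folded into one plugin result; Nagios plugins report OK, WARNING, CRITICAL and UNKNOWN as exit codes 0 to 3. The sketch below assumes the condensed column layout shown above (record id | name | state | ...), which may differ from ipmimonitoring's real output. Treating "N/A" as OK (e.g. an absent power supply) and ranking CRITICAL > WARNING > UNKNOWN > OK are design choices made here, not FreeIPMI behaviour.

```python
# Hypothetical sketch: reduce ipmimonitoring-style rows to one Nagios exit
# code. Column layout is assumed from the condensed sample in this mail.

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes

# Assumed ranking for picking the worst result: CRITICAL > WARNING > UNKNOWN > OK
RANK = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}

STATE_TO_CODE = {"Nominal": OK, "Warning": WARNING, "Critical": CRITICAL}


def nagios_exit_code(output: str, na_code: int = OK) -> int:
    """Worst code over all rows; "N/A" maps to na_code, unknown states to UNKNOWN."""
    worst = OK
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue  # not a sensor row
        state = fields[2]
        code = na_code if state == "N/A" else STATE_TO_CODE.get(state, UNKNOWN)
        worst = max(worst, code, key=RANK.__getitem__)
    return worst


if __name__ == "__main__":
    sample = """\
18 | Fan1 | Nominal | 14500.00 | RPM | 'OK'
24 | Fan7/CPU3 | Critical | 0.00 | RPM | 'At or Below (<=) Lower Non-Recoverable Threshold'
28 | Power Supply 2 | N/A | N/A | N/A | N/A"""
    print(nagios_exit_code(sample))
```

Passing na_code=UNKNOWN instead would make a missing sensor visible in Nagios rather than silently counting it as healthy.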

I could say more about it, but this mailing list is probably not the
best place for that. Feel free to ping me on the FreeIPMI mailing list.

Al
--
Albert Chu
***@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory