Brian A. Seklecki
2010-04-08 03:35:42 UTC
All:
This is more of a general IPMI question. Sorry, there isn't
a -users@ list.
There's an old Nagios monitoring script that would look
through 'ipmitool sdr list', and search the status column
for values != "ok".
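For reference, the naive check comes down to something like the sketch below. It is a minimal Python reconstruction, not the original script, which isn't shown here. It assumes pipe-delimited 'sdr list' or 'sdr elist' output passed in as text. In both formats the status is the third field.

```python
from typing import List


def naive_check(sdr_output: str) -> List[str]:
    """Return the names of sensors whose status column is not 'ok'.

    Assumes pipe-delimited ipmitool output: 'sdr list' prints
    name | reading | status and 'sdr elist' prints
    name | id | status | entity | state, so status is field 2 in both.
    """
    flagged = []
    for line in sdr_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue  # blank or wrapped continuation line
        if fields[2].lower() != "ok":
            flagged.append(fields[0])
    return flagged


# The elist line from the unplugged-supply case passes the check.
print(naive_check("PS Redundancy | 74h | ok | 7.1 | Redundancy Lost"))  # → []
```

Given the 'Redundancy Lost' line from the elist output, this check flags nothing. That is the problem described here.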
It turns out that this simple logic may be insufficient for
checking states such as power supply redundancy ('Fully Redundant').
For example:
Consider sensor 7.1 on a PowerEdge 2950r1:
Sensor ID              : PS Redundancy (0x74)
 Entity ID             : 7.1 (System Board)
 Sensor Type (Discrete): Power Supply
 States Asserted       : Redundancy State
                         [Fully Redundant]
 Assertion Events      : Redundancy State
                         [Fully Redundant]
 Assertions Enabled    : Redundancy State
                         [Fully Redundant]
                         [Redundancy Lost]
---------------------------------------------------
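The 'States Asserted' field is a bitmask over the sensor's discrete states. The rough Python sketch below decodes that bitmask directly and does not rely on ipmitool's status column. It makes three assumptions. First, the input is the response bytes of a Get Sensor Reading command, e.g. 'ipmitool raw 0x04 0x2d 0x74' for sensor 0x74. Second, the IPMI v2.0 offset tables apply for redundancy (generic reading type 0Bh) and Power Supply (sensor type 08h). Third, the BMC reports these offsets. The byte strings are made-up examples, not captures from this 2950.

```python
from typing import Dict, List

# Generic offsets for event/reading type 0Bh (redundancy), IPMI v2.0.
REDUNDANCY_OFFSETS: Dict[int, str] = {
    0: "Fully Redundant",
    1: "Redundancy Lost",
    2: "Redundancy Degraded",
}

# Sensor-specific offsets for sensor type 08h (Power Supply), IPMI v2.0.
POWER_SUPPLY_OFFSETS: Dict[int, str] = {
    0: "Presence detected",
    1: "Failure detected",
    2: "Predictive failure",
    3: "Input lost (AC/DC)",
}


def asserted_states(raw_hex: str, offsets: Dict[int, str]) -> List[str]:
    """List the named states whose bits are set in a Get Sensor Reading reply.

    Assumes raw_hex is the response data as 'ipmitool raw' prints it
    (completion code already stripped), indexed from 0: byte 0 = reading,
    byte 1 = flags, byte 2 = state bits 0-7, optional byte 3 = bits 8-14.
    Offsets missing from the table are ignored.
    """
    data = [int(tok, 16) for tok in raw_hex.split()]
    bits = data[2] if len(data) > 2 else 0
    if len(data) > 3:
        bits |= (data[3] & 0x7F) << 8  # top bit of byte 3 is reserved
    return [name for off, name in sorted(offsets.items()) if bits & (1 << off)]


# Made-up replies, not captured from real hardware:
print(asserted_states("00 c0 02 80", REDUNDANCY_OFFSETS))  # → ['Redundancy Lost']
print(asserted_states("00 c0 0b 80", POWER_SUPPLY_OFFSETS))
# → ['Presence detected', 'Failure detected', 'Input lost (AC/DC)']
```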
When the primary power supply is missing or unplugged,
'sdr elist' still reports the sensor's status as 'ok':
% sudo ipmitool -U foo -H system-lom sdr elist all
PS Redundancy | 74h | ok | 7.1 | Redundancy Lost
Note how the sensor status reads 'ok' in almost all
conditions (except possibly when both power supplies
are 'not present' or 'failed', which would be hard
to test! >:} )
I'm a bit confused about the data structures, but I understand
that assertion and de-assertion thresholds can be programmed
using OpenIPMI. (A co-worker had to do this for a broken Dell
DRAC card in an r710 or 2950r3 that was reading the upper
warning threshold wrong.)
So is there a way to program 7.1 or 10.1/10.2 to set the status
to NOT OK on: 1) predictive failure, 2) power loss, or 3) absence?
As an alternative, I could have the script do additional
string matching for keywords on specific sensor categories,
for example sdr type "Power Supply":
----------------------------
$ ipmitool -P XX -U netadmin -H system-lom sdr entity 10
Presence | 54h | ok | 10.1 | Absent
Presence | 55h | ok | 10.2 | Present
Status | 64h | ok | 10.1 | Failure detected, Power Supply AC lost
Status | 65h | ok | 10.2 | Presence detected
With the power cable pulled:
% ipmitool -P XX -U netadmin -H system-lom sdr entity 10
Presence | 54h | ok | 10.1 | Present
Presence | 55h | ok | 10.2 | Present
Status | 64h | ok | 10.1 | Presence detected, Failure detected, Power Supply AC lost
Status | 65h | ok | 10.2 | Presence detected
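A rough Python sketch of the keyword approach follows. It assumes pipe-delimited 'sdr elist' / 'sdr entity' output with one record per line and the state text in the fifth field. Matching keywords are mapped to Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The keywords come from the output above. The keyword list and the severities are a hypothetical policy, not anything ipmitool defines.

```python
from typing import List, Tuple

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical policy: lowercase substrings of the state text (fifth field
# of 'sdr elist' / 'sdr entity' output) and the severity they map to.
KEYWORDS = {
    "failure detected": CRITICAL,
    "ac lost": CRITICAL,
    "redundancy lost": CRITICAL,
    "absent": CRITICAL,
    "predictive failure": WARNING,
    "redundancy degraded": WARNING,
}


def check_states(elist_output: str) -> Tuple[int, List[str]]:
    """Return (worst exit code, problem descriptions) for elist-style output.

    Assumes one record per line; wrapped continuation lines are skipped.
    """
    worst = OK
    problems = []
    for line in elist_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue
        name, _sensor_id, _status, entity, state = fields[:5]
        hits = [sev for kw, sev in KEYWORDS.items() if kw in state.lower()]
        if hits:
            worst = max(worst, max(hits))
            problems.append(f"{name} {entity}: {state}")
    return worst, problems


print(check_states("PS Redundancy | 74h | ok | 7.1 | Redundancy Lost"))
# → (2, ['PS Redundancy 7.1: Redundancy Lost'])
```

A Nagios wrapper would print the problem list and exit with the returned code (e.g. sys.exit(code)). The status column is ignored entirely, because it reads 'ok' in every case shown here.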
Thanks, ~BAS