Discussion:
[Ipmitool-devel] BMC hangs
Tomasz Nowak
2008-04-24 12:27:16 UTC
Hello,

I have one more problem with IPMI on my Intel
S5000PAL board: sometimes the BMC seems to die.

Symptoms: no IPMI command can be executed.

I have a "Local Control Panel" - that's a gadget
that lets you communicate with the BMC via a small LCD
screen and 4 panel buttons. And it says everywhere
(system monitor, SEL, etc.):

Message transmission incomplete

Nothing helps in this state except resetting the BMC (unplugging
the whole system from AC power for 1+ minute). Then
everything comes back to normal - I can issue IPMI
commands, I can view the SEL, I can get to the console
via SOL (a feature I particularly love), etc.

These BMC hangs are critical for me, as I cannot
do anything with the machine remotely.
Unfortunately Google returns nothing interesting
for the search key "Message transmission incomplete".

Any ideas?


PS. Yes, I have the latest Intel BMC firmware v62 installed.
--
Tomek
Hall, Eric R
2008-04-25 01:38:02 UTC
Tomek -

The LCD screen interfaces to the BMC via I2C (IPMB connector). If for
some [very bad] reason the BMC fails, the LAN and I2C interfaces will not
work. The BMC should reset itself to a working state in the event of an
error, unless something is holding it while it is failing.

How is your network configured on the BMC's NIC? Is there a lot of
traffic hitting the BMC? How long after power-on does it take before the
BMC stops responding? What are the last few entries in the SEL?
How many systems do you have, and do they all have this problem?

- Eric
_______________________________________________
Ipmitool-devel mailing list
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
Tomasz Nowak
2008-04-25 10:19:59 UTC
Post by Hall, Eric R
Tomek -
The LCD screen interfaces to the BMC via I2C (IPMB connector). If for
some [very bad] reason the BMC fails, the LAN and I2C interface will not
work. The BMC should reset to a working state in the event of an error
unless when it's failing it's being held by something.
How is your network configured on the BMC's NIC? Is there a lot of
traffic hitting the BMC? How long does it take before the BMC fails to
respond since power on? What are the last few items in the SEL log?
How many systems do you have and do they all have this problem?
That's my first S5000PAL (test) system.

I have LAN channel 1 configured with macaddr, static ipaddr,
netmask and defgw (on the same network as the operating
system's NIC). The LAN ipaddr is right next to the OS's IP.
I've set up user 2 with administrator privilege and MD5 auth
for that channel and disabled user 1 entirely.

That simple setup seems to work perfectly for remote IPMI
commands.
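
For readers who want to reproduce a setup like this, here is a minimal sketch of the ipmitool commands it might correspond to. The function name and the addresses in the usage comment are placeholders, not values from this thread, and the function only prints the commands so they can be reviewed before anything is run. User IDs 2 and 1 are taken from the description above.

```shell
# Print (do not run) ipmitool commands for a static LAN setup on one channel.
# $1 = LAN channel, $2 = BMC IP address, $3 = netmask, $4 = default gateway.
# Privilege level 4 is Administrator.
lan_commands() {
    channel="$1"; ip="$2"; mask="$3"; gw="$4"
    cat <<EOF
ipmitool lan set $channel ipsrc static
ipmitool lan set $channel ipaddr $ip
ipmitool lan set $channel netmask $mask
ipmitool lan set $channel defgw ipaddr $gw
ipmitool lan set $channel auth ADMIN MD5
ipmitool user priv 2 4 $channel
ipmitool user enable 2
ipmitool user disable 1
EOF
}

# Review the output first, then pipe it to sh on the server itself:
#   lan_commands 1 192.0.2.11 255.255.255.0 192.0.2.1 | sh
```

Printing instead of executing is a deliberate choice here: a wrong address on the BMC channel can cut off remote access.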

Then I configured Serial Over LAN:
- set privilege-level admin
- set non-volatile-bit-rate 115.2
- set volatile-bit-rate 115.2
- set force-encryption true
- set retry-interval 2
- set enabled true
- payload enable 1 2
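
As a rough illustration, the settings listed above map onto ipmitool invocations along these lines. This is only a sketch: the function name is made up for the example, the channel and user ID are passed in as arguments (1 and 2 in this message), and the function prints the commands rather than running them.

```shell
# Print (do not run) the ipmitool SOL commands for a given bit rate.
# $1 = bit rate in kbps as ipmitool spells it (e.g. 19.2 or 115.2)
# $2 = LAN channel, $3 = user ID allowed to use the SOL payload
sol_commands() {
    rate="$1"; channel="$2"; user="$3"
    cat <<EOF
ipmitool sol set privilege-level admin $channel
ipmitool sol set non-volatile-bit-rate $rate $channel
ipmitool sol set volatile-bit-rate $rate $channel
ipmitool sol set force-encryption true $channel
ipmitool sol set retry-interval 2 $channel
ipmitool sol set enabled true $channel
ipmitool sol payload enable $channel $user
EOF
}

# The configuration from this message would be:
#   sol_commands 115.2 1 2
```

Keeping the bit rate as a parameter makes it easy to try a lower value without retyping the whole list.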

And that also seems to work well most of the time,
except for one problem: if more data goes through SOL,
the output somehow gets broken, i.e.:
# pstree |tail -n5 # <-- layout is ok
# pstree # <--- layout is broken and hardly readable
So it seems SOL has problems with a fast data stream or
big chunks of data.

Anyway, the BMC has already hung on me 3 times.
Every time it was while I was booting the system
and watching the bootloader/kernel output via SOL.

I have no idea what the last SEL entries are because..
the BMC hangs :> and there's no way to check
(by any method). I suppose the last SEL entries
could be something like:
Power Unit #0x01 | Power off/down | Asserted
Power Unit #0x01 | Power off/down | Deasserted
System Event #0x01 | OEM System boot event | Asserted
System ACPI Power State #0x82 | S0/G0: working | Asserted
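
One way to know the last SEL entries even after the BMC stops answering might be to save them regularly while it still responds. A minimal sketch, assuming ipmitool can reach the BMC; the function name, the IPMITOOL override variable and the log path are illustrative, not something used in this thread.

```shell
# Print the last N SEL entries (default 5).
# IPMITOOL may name a wrapper script or function, for example one that
# adds the options needed to reach the BMC over the LAN.
last_sel_entries() {
    "${IPMITOOL:-ipmitool}" sel list | tail -n "${1:-5}"
}

# Example: append a timestamped snapshot to a log file (e.g. from cron).
#   { date; last_sel_entries 5; } >> /var/log/sel-tail.log
```

With snapshots like this on disk, the entries logged shortly before a hang can still be read after the BMC has stopped responding.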


PS1. Is there any way to fix the SOL output mangling
described above?

PS2. Is there any way to reset the BMC without unplugging the
system from AC power for 1+ minute? Any "secret"/magic
local panel button press combination..?
--
Tomek
Hall, Eric R
2008-04-25 17:57:07 UTC
Post by Tomasz Nowak
- set privilege-level admin
- set non-volatile-bit-rate 115.2
- set volatile-bit-rate 115.2
- set force-encryption true
- set retry-interval 2
- set enabled true
- payload enable 1 2
115.2 might be a bit too fast.

What are the BIOS settings? Ideally the BIOS settings should look like:
Console Redirection [Serial Port B]
Flow Control [RTS/CTS]
Baud Rate [19.2k]
Terminal Type [PC-ANSI]
Legacy OS Redirection [Enabled]
And that also seems to work well most of the time,
except for one problem: if more data goes through SOL,
the output somehow gets broken, i.e.:
# pstree |tail -n5 # <-- layout is ok
# pstree # <--- layout is broken and hardly readable
Does the BMC hang when outputting lots of data like this? Or is it some
time after?
Anyway, the BMC has already hung on me 3 times.
Not good. I'm sorry to hear that!
Every time it was while I was booting the system
and watching the bootloader/kernel output via SOL.
Slow the redirection rate down to 19.2 and see if it hangs then. Please
let me know if that works.
I have no idea what the last SEL entries are because..
the BMC hangs :> and there's no way to check
(by any method).
Well, if you have a USB keyfob you could boot to DOS and use the SEL
Viewer utility to save it. (sorry for the huge ugly link)
http://downloadcenter.intel.com/download.aspx?url=/14724/eng/DOS%20SEL%20Viewer_v154.exe&agr=N&ProductID=2451&DwnldId=14724&strOSs=All&OSFullName=All+Operating+Systems&lang=eng
or
http://tinyurl.com/5xrl6l

Usually when there is a major problem like this, something more useful
is dumped to the SEL.
PS1. Is there any way to fix the SOL output mangling
described above?
Yes, try slowing down the rate and make sure the BIOS settings for
console redirection are the same as above.
PS2. Is there any way to reset the BMC without unplugging the
system from AC power for 1+ minute? Any "secret"/magic
local panel button press combination..?
No. It's integrated into the baseboard and it gets its power from the
standby voltage. The only way to power-cycle/reset the BMC is to
completely remove AC power from the system.

- Eric
Tomasz Nowak
2008-04-25 19:27:54 UTC
Permalink
Post by Hall, Eric R
Post by Tomasz Nowak
- set non-volatile-bit-rate 115.2
- set volatile-bit-rate 115.2
[...]
115.2 might be a bit too fast.
Console Redirection [Serial Port B]
Flow Control [RTS/CTS]
Baud Rate [19.2k]
Terminal Type [PC-ANSI]
Legacy OS Redirection [Enabled]
I enabled the default settings, considering them best,
so (as far as I remember):
Console Redirection [Serial Port B]
Flow Control [None]
Baud Rate [115.2k]
Terminal Type [VT-100+]
Legacy OS Redirection ? - don't remember

Following these BIOS default settings I've set up the same
speed in the system bootloader (grub):

kernel /boot/xen.gz-2.6.18-53.1.14.el5 com2=115200,8n1 console=com2,vga
module /boot/vmlinuz-2.6.18-53.1.14.el5xen xencons=ttyS1 console=tty console=ttyS1,115200n8

and that's why I've also set 115.2k for SOL.
Post by Hall, Eric R
Post by Tomasz Nowak
# pstree |tail -n5 # <-- layout is ok
# pstree # <--- layout is broken and hardly readable
Does the BMC hang when outputting lots of data like this? Or is it
some time after?
I don't think the broken output has anything to do with the BMC hanging.
As far as I remember, all 3 hangs happened around the moment
when "bios console redirection" is passing the ball to
"system console redirection".
Post by Hall, Eric R
Slow the redirection rate down to 19.2 and see if it hangs then.
Please
let me know if that works.
Ok, I'll try (on Monday - I'd rather be close to the machine
while doing this) to slow down the BIOS and SOL speed. I suppose
I should slow down the bootloader/kernel settings as well, shouldn't I?

What about "Legacy OS Redirection"? Is that relevant? (RHEL)

Is the PC-ANSI terminal required, or somehow better than VT-100(+)?
--
Tomek
Cress, Andrew R
2008-04-26 04:49:20 UTC
Tomasz,

Try upgrading to BMC63. It has the fix to a problem where
establishing/killing SOL sessions repeatedly was hanging the BMC.
(Tracker 32507)
That fix is in the same ballpark.

Eric,
Is BMC63 on intel.com?

Andy

Tomasz Nowak
2008-04-28 16:40:54 UTC
Ok, so I've changed bios settings to:
Console Redirection [Serial Port B]
Flow Control [RTS/CTS]
Baud Rate [19.2k]
Terminal Type [VT-100]
Legacy OS Redirection [Enabled]

I've also changed:
- grub.conf settings to 19200
- sol bit-rate settings to 19.2
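
A small sketch of what those two changes could look like on the command line. It assumes the serial speeds appear in grub.conf exactly as `com2=115200,8n1` and `console=ttyS1,115200n8`, as in the boot entry quoted earlier in this thread; the function name and the file paths are illustrative.

```shell
# Rewrite the serial speed in the Xen/Linux boot lines from 115200 to 19200.
# Reads grub.conf text on stdin and writes the changed text to stdout.
slow_serial_console() {
    sed -e 's/com2=115200,8n1/com2=19200,8n1/' \
        -e 's/console=ttyS1,115200n8/console=ttyS1,19200n8/'
}

# Example (check the result by eye before replacing the real file):
#   slow_serial_console < /boot/grub/grub.conf > /tmp/grub.conf.new
# Matching SOL bit rates:
#   ipmitool sol set non-volatile-bit-rate 19.2
#   ipmitool sol set volatile-bit-rate 19.2
```

The point of changing everything together is that the BIOS, the bootloader/kernel and SOL all end up at the same speed.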

And from now on it seems to work better. No broken/mangled
chars on the console. Eric, thanks for that tip!

I cannot reproduce the BMC hangs now, but I suppose
repeatedly killing and re-establishing SOL sessions might be my case.

Please note I had no idea there was a BMC 0.63 firmware.
That's because the latest BMC firmware offered for either
OFU or the IDA CD is 0.62:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=N&ProductID=2451&DwnldID=15426&strOSs=All&OSFullName=All
Operating Systems&lang=eng

BMC 0.63 shows up only in the DOS update software package
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=N&ProductID=2451&DwnldID=15840&strOSs=All&OSFullName=All
Operating Systems&lang=eng

I'll upgrade to 0.63 today.
Thanks again to Eric and Andrew!
--
Tomek
Tomasz Nowak
2008-04-28 19:33:20 UTC
Post by Tomasz Nowak
Please note I had no idea there was a BMC 0.63 firmware.
That's because the latest BMC firmware offered for either
OFU or the IDA CD is 0.62:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=N&ProductID=2451&DwnldID=15426&strOSs=All&OSFullName=All
Operating Systems&lang=eng
BMC 0.63 shows up only in the DOS update software package
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=N&ProductID=2451&DwnldID=15840&strOSs=All&OSFullName=All
Operating Systems&lang=eng
I'll upgrade to 0.63 today.
I've tried hard, but it's impossible.
I have wasted 4 hours on this :(

1. DOS-way - no-way.
I burned the RAM-DOS ISO from Development Toolkit 2.0.
It boots ok, but the DOS shell lets me see neither
the USB key nor even a second CD with the firmware files.
Both available disks (C:, D:) are RAM-DOS controlled.

2. So then I tried Intel Development Assistant 1.3
(isolinux etc). Very nice tool, but:

a) Intel on-line updates show the January'08 versions
as latest (I have already installed these this way)

b) the software packages (.zip) for DOS updates that I downloaded
from the latest (April) update are not seen by IDA (even though
release.txt says they should be)

3. So I created my own zip with R0088.Cap, ALBMC63C.hex
etc and edited flashupdt.cfg manually to point to the new files.
My zip was discovered and I managed to upgrade the
BIOS to v88, but.. the BMC firmware still does not seem to be seen
by IDA. At least no new version is shown in the update table.
I didn't want to risk bricking my server, so I tried the final way:

4. - OFU

# unzip OFU961.zip
# chmod +x Install_OFU_Linux Utilities/OFU/setup_linux Utilities/OFU/linux/installme

# ./Install_OFU_Linux
./Install_OFU_Linux: line 1: cd: Utilities/ofu: No such file or directory
./Install_OFU_Linux: line 2: ./setup_linux: No such file or directory

# Utilities/OFU/setup_linux
Segmentation fault

# Utilities/OFU/setup_linux
Segmentation fault [*]

# Utilities/OFU/linux/installme 1
installme: Unable to install the One-Boot Flash Update Utility.
: The w3c-libwww package was not detected on the system. In order to install and
: use the One-Boot Flash Update Utility, the w3c-libwww package must be installed.
: This package can be obtained from the operating system's installation CD.
Exiting installation...

# rpm -ihv Utilities/OFU/linux/w3c-libwww-5.4.0-1.i586.rpm
Preparing... ########################################### [100%]
1:w3c-libwww ########################################### [100%]

# cd Utilities/OFU/linux/
# ./installme 1
installme: This script installs the RPM packages needed for One-Boot Flash Update Utility.
: The flashupdt package contains the One-Boot Flash Update Utility and libraries.
: All these components are required in order to use the One-Boot Flash Update Utility.

installme: *1* Installing flashupdt RPM package...
error: Failed dependencies:
libwwwzip.so.0 is needed by flashupdt-1.9.61-1.i386
installme: *ERROR* Installation of the flashupdt RPM package failed.
Exiting installation...


* kernel: setup_linux[3403]: segfault at 0000000000000047 rip 00000000f7e222b0 rsp 00000000ffbc2da0 error 4

:/
--
Tomek
Hall, Eric R
2008-04-30 21:51:47 UTC
Post by Tomasz Nowak
Post by Tomasz Nowak
BMC 0.63 shows up only in DOS update software package
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=N&ProductID=2451&DwnldID=15840&strOSs=All&OSFullName=All
Operating Systems&lang=eng
I'll upgrade to 0.63 today.
I've tried hard, but it's impossible.
I have wasted 4 hours on this :(
Wow! That really stinks. I'm sorry to hear that.
Post by Tomasz Nowak
I burned the RAM-DOS ISO from Development Toolkit 2.0.
It boots ok, but the DOS shell lets me see neither
the USB key nor even a second CD with the firmware files.
Both available disks (C:, D:) are RAM-DOS controlled.
Have you tried it with a USB keyfob?
Post by Tomasz Nowak
2. So then I tried Intel Development Assistant 1.3
a) Intel on-line updates show the January'08 versions
as latest (I have already installed these this way)
b) the software packages (.zip) for DOS updates that I downloaded
from the latest (April) update are not seen by IDA (even though
release.txt says they should be)
3. So I created my own zip with R0088.Cap, ALBMC63C.hex
etc and edited flashupdt.cfg manually to point to the new files.
My zip was discovered and I managed to upgrade the
BIOS to v88, but.. the BMC firmware still does not seem to be seen
by IDA. At least no new version is shown in the update table.
Luckily, the BIOS is recoverable on the AL. I've never seen a BMC fail
from firmware updates. The update must succeed in order for the new FW
image to be used.
Post by Tomasz Nowak
4. - OFU
Uh oh. I don't think I like to hear what's coming next.
Post by Tomasz Nowak
# Utilities/OFU/setup_linux
Segmentation fault
You might need to be in the Utilities/OFU directory. The last time I
worked on OFU was in 2003, and back then it did not handle being run
from outside its own directory.
Post by Tomasz Nowak
installme: *1* Installing flashupdt RPM package...
libwwwzip.so.0 is needed by flashupdt-1.9.61-1.i386
installme: *ERROR* Installation of the flashupdt RPM package failed.
Exiting installation...
* kernel: setup_linux[3403]: segfault at 0000000000000047
rip 00000000f7e222b0 rsp 00000000ffbc2da0 error 4
Remind me again... Are you using Red Hat or ...?

Were you able to install libwwwzip and try again?

My only other suggestion would be to use DOS (sorry Andrew) on a usb fob
with the FW update tool and FW image.

Hall, Eric R
2008-04-30 21:31:25 UTC
Post by Hall, Eric R
Post by Tomasz Nowak
Post by Hall, Eric R
Slow the redirection rate down to 19.2 and see if it hangs then.
Please
let me know if that works.
What about "Legacy OS Redirection"? Is that relevant? (RHEL)
Depends... I've got a system where when it's enabled the system (not
BMC) hangs when GRUB loads. When I turn Legacy Redirection off and let
GRUB have its own serial interface I don't have any problems. When I
PXE boot the same system into DOS, avoiding GRUB, with Legacy OS
Redirection enabled I have no problems.
Post by Tomasz Nowak
Ok, so I've changed bios settings to:
Console Redirection [Serial Port B]
Flow Control [RTS/CTS]
Baud Rate [19.2k]
Terminal Type [VT-100]
Legacy OS Redirection [Enabled]
I've also changed:
- grub.conf settings to 19200
- sol bit-rate settings to 19.2
And from now on it seems to work better. No broken/mangled
chars on console. Eric thanks for that tip!
Although I'm sure someone will harangue me for this, 115.2 is not the
optimal setting. I don't know why it's the default!? 19.2 - albeit
painfully slow (at least for me) - works much better normally and under
duress.