IPMI
IPMI is an acronym for Intelligent Platform Management Interface. It is a technology built into many server-grade mainboards, in the form of a dedicated controller called the Baseboard Management Controller, or BMC. IPMI, via the BMC, allows "out of band" access to a server. This means that, via an IPMI interface, a user can remotely connect to a server to control its power and read its sensor data, regardless of the server's power state.
The BMC is isolated from the host machine, and so it can be used to power on a machine that is off, force a crashed machine to reboot, etc.
Company-Specific Implementations of IPMI
Many companies build on the basic IPMI standard by adding advanced features like remote console access over the network, or the ability to monitor devices plugged into the server, like the RAID controller and its hard drives. Each vendor generally has its own name for its implementation of IPMI. In most cases, though, they all support the generic IPMI interface, plus additional features beyond the IPMI specification.
Fencing
In an Anvil! cluster, IPMI is used to force a node that has entered into an unknown state into a known state. Specifically, when a node stops responding, it will be forced to power off via IPMI. This is a mechanism called fencing, and it is core to the Anvil! cluster's reliability. IPMI is not the only way to fence a node, but when it is available, it is always the primary way to fence.
Configuring IPMI
Note: In M3 Anvil! clusters, you do not need to configure IPMI manually. The Anvil! will detect and configure most versions of IPMI automatically. If you find that this is not the case, please contact us so that we can add support for your hardware.
If you want or need to configure IPMI manually, this section shows how to do so from the Linux command line.
Finding the BMC
We need to assign an IP address to each IPMI BMC and then configure the user name and password to use later when connecting.
We will also use the sensor values reported by the IPMI BMC in our monitoring and alert system. If, for example, a temperature climbs too high or too fast, the alert system will be able to see this and send an alert.
Note: This section walks through configuring IPMI on an-a01dr01, a Fujitsu Primergy server. The specifics for other brands will vary, but the steps should apply similarly.
Note: This tutorial requires that the freeipmi and ipmitool packages are installed. If any of the Anvil! RPMs are installed, these packages will already be installed as well.
Check to see whether your machine has an IPMI BMC:
an-a01dr01 | dmidecode --type 38
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
Handle 0x0020, DMI type 38, 18 bytes
IPMI Device Information
Interface Type: KCS (Keyboard Control Style)
Specification Version: 2.0
I2C Slave Address: 0x10
NV Storage Device: Not Present
Base Address: 0x0000000000000CA2 (I/O)
Register Spacing: Successive Byte Boundaries
The IPMI Device Information line shows us that there is indeed an IPMI BMC on this machine. Knowing this, we should be able to see that there is a /dev/ipmi0 device available.
an-a01dr01 | ls -lah /dev/ipmi0
crw-------. 1 root root 242, 0 Aug 11 15:44 /dev/ipmi0
Looking good!
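The two checks above (the SMBIOS record and the device node) can be wrapped into one pre-flight test if you want to run them from a script. This is a minimal sketch, assuming a POSIX shell such as bash; the function name `has_ipmi_bmc` is hypothetical and not part of ipmitool or the Anvil! tools.

```shell
# has_ipmi_bmc (hypothetical helper): succeed only if the dmidecode output on
# stdin contains an IPMI device record AND the given device node exists.
# Returns 1 if no IPMI record is found, 2 if the device node is missing.
has_ipmi_bmc() {
    local dev="${1:-/dev/ipmi0}"
    # dmidecode prints SMBIOS type 38 records under this heading
    if ! grep -q 'IPMI Device Information'; then
        echo "No IPMI device found in the SMBIOS data" >&2
        return 1
    fi
    # /dev/ipmi0 is a character device (the "c" in "crw-------" above)
    if [ ! -c "$dev" ]; then
        echo "IPMI device listed, but $dev does not exist" >&2
        return 2
    fi
    return 0
}
```

Usage, as root on the machine being checked: `dmidecode --type 38 | has_ipmi_bmc /dev/ipmi0`. A return code of 2 suggests the hardware is there but the device node is not.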
Reading IPMI Data
The presence of /dev/ipmi0 tells us that we should be able to talk to the BMC. If /dev/ipmi0 does not exist on your machine, find out what make and model of IPMI BMC is used in your server and look for known issues with that chip.
The first thing we'll check is that we can query IPMI's chassis data:
an-a01dr01 | ipmitool chassis status
System Power : on
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-off
Last Power Event : ac-failed
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Front Panel Control : none
Excellent! If you get something like this, you're past 90% of the potential problems.
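If you later script checks like this one (for example in a monitoring job of your own), you may want a single value rather than the whole table. The following is a minimal sketch, assuming the `Name : value` layout shown above; the function name `field_value` is hypothetical, not part of ipmitool or the Anvil!.

```shell
# field_value (hypothetical helper): print the value of the first
# "Name : value" line whose name matches $1, reading ipmitool output on stdin.
# The same layout is used by `ipmitool chassis status`, `mc info` and
# `fru print`; only the first match is printed.
field_value() {
    awk -v want="$1" '
        index($0, ":") > 0 {
            key = substr($0, 1, index($0, ":") - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)       # trim padding around the name
            if (key == want) {
                val = substr($0, index($0, ":") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim padding around the value
                print val
                exit
            }
        }'
}
```

For the output above, `ipmitool chassis status | field_value "System Power"` prints `on`, and `ipmitool fru print | field_value "Product Serial"` prints the first product serial number listed.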
We can get more information about the BMC itself by using the mc (management controller) subcommand.
an-a01dr01 | ipmitool mc info
Device ID : 4
Device Revision : 2
Firmware Revision : 1.00
IPMI Version : 2.0
Manufacturer ID : 10368
Manufacturer Name : Fujitsu Siemens
Product ID : 1041 (0x0411)
Product Name : Unknown (0x411)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
IPMB Event Generator
Chassis Device
Aux Firmware Rev Info :
0x09
0x44
0x00
0x46
We can also look at the FRU, or "field-replaceable unit", data; these are components that can be swapped out as needed. Not every server reports FRU details, and each server will report different data here.
an-a01dr01 | ipmitool fru print
FRU Device Description : Builtin FRU Device (ID 0)
Board Mfg Date : Sun Aug 16 23:24:00 2015
Board Mfg : FUJITSU
Board Product : D3229
Board Serial : 47980304
Board Part Number : S26361-D3229-A15
Board Extra : WGS06 GS02
Board Extra : 02
Product Manufacturer : FUJITSU
Product Name : PRIMERGY RX1330 M1
Product Part Number : ABN:K1537-V401-908
Product Version : GS01
Product Serial : YLWT010802
Product Asset Tag : 15
Product Extra : d10a9e
Product Extra : 0411
Product Extra : CS0f
FRU Device Description : Chassis (ID 2)
Chassis Type : Rack Mount Chassis
Chassis Extra : RX1330M1R7
Product Manufacturer : FUJITSU
Product Name : PRIMERGY RX1330 M1
Product Part Number : ABN:K1537-V401-908
Product Version : GS01
Product Serial : YLWT010802
Product Asset Tag : 15
Product Extra : d10a9e
Product Extra : 0411
Product Extra : CS0f
FRU Device Description : MainBoard (ID 3)
Board Mfg Date : Sun Aug 16 23:24:00 2015
Board Mfg : FUJITSU
Board Product : D3229
Board Serial : 47980304
Board Part Number : S26361-D3229-A15
Board Extra : WGS06 GS02
Board Extra : 02
FRU Device Description : PSU STD (ID 11)
Device not present (Command response could not be provided)
FRU Device Description : PSU1 (ID 12)
Board Mfg Date : Sat May 30 19:29:00 2015
Board Mfg : DELTA
Board Product : DPS-450SB A
Board Serial : DCND1522062427
Board Part Number : A3C40161429
Board Extra : S9F
Board Extra : 08
FRU Device Description : PSU2 (ID 13)
Board Mfg Date : Sat May 30 18:36:00 2015
Board Mfg : DELTA
Board Product : DPS-450SB A
Board Serial : DCND1522062468
Board Part Number : A3C40161429
Board Extra : S9F
Board Extra : 08
We can check all of the sensor values using ipmitool as well. This is actually what the cluster monitor we'll install later does.
an-a01dr01 | ipmitool sdr list
Ambient | 26 degrees C | ok
Systemboard | 39 degrees C | ok
CPU | 47 degrees C | ok
MEM A | 34 degrees C | ok
MEM B | 33 degrees C | ok
PSU Inlet | disabled | ns
PSU1 Inlet | 38 degrees C | ok
PSU2 Inlet | 36 degrees C | ok
PSU | disabled | ns
PSU1 | 63 degrees C | ok
PSU2 | 63 degrees C | ok
BBU | 33 degrees C | ok
RAID Controller | 73 degrees C | ok
BATT 3.0V | 2.44 Volts | ok
STBY 3.3V | 3.37 Volts | ok
iRMC 1.8V STBY | 1.77 Volts | ok
iRMC 1.5V STBY | 1.49 Volts | ok
iRMC 1.0V STBY | 0.98 Volts | ok
MAIN 12V | 12.78 Volts | ok
MAIN 5V | 5.24 Volts | ok
MAIN 3.3V | 3.33 Volts | ok
MEM 1.35V | 1.35 Volts | ok
PCH 1.05V | 1.04 Volts | ok
MEM VTT 0.68V | 0.66 Volts | ok
FAN1 SYS | 7020 RPM | ok
FAN2 SYS | 4080 RPM | ok
FAN3 SYS | 4080 RPM | ok
FAN4 SYS | 4440 RPM | ok
FAN5 SYS | 3840 RPM | ok
FAN PSU | disabled | ns
FAN PSU1 | 2960 RPM | ok
FAN PSU2 | 2560 RPM | ok
PSU Power | disabled | ns
PSU1 Power | 54 Watts | ok
PSU2 Power | 54 Watts | ok
Total Power | 108 Watts | ok
Total Power Out | 84 Watts | ok
I2C1 error ratio | 0 percent | ok
I2C2 error ratio | 0 percent | ok
I2C3 error ratio | 0 percent | ok
I2C4 error ratio | 0 percent | ok
I2C5 error ratio | 0 percent | ok
I2C6 error ratio | 0 percent | ok
I2C7 error ratio | 0 percent | ok
I2C8 error ratio | 0 percent | ok
SEL Level | 0 percent | ok
Ambient | 0x00 | ok
Ambient | 0x00 | ok
CPU | 0x00 | ok
Power Limit | 0x00 | ok
Power Unit | 0x00 | ok
PSU Config | 0x00 | ok
PSU | Not Readable | ns
PSU | Not Readable | ns
PSU1 | 0x00 | ok
PSU2 | 0x00 | ok
Power Level | 0x00 | ok
P-STATE Throttle | 0x00 | ok
System State | 0x00 | ok
FAN1 SYS | 0x00 | ok
FAN2 SYS | 0x00 | ok
FAN3 SYS | 0x00 | ok
FAN4 SYS | 0x00 | ok
FAN5 SYS | 0x00 | ok
FAN PSU | Not Readable | ns
FAN PSU1 | 0x00 | ok
FAN PSU2 | 0x00 | ok
Watchdog | 0x00 | ok
Housing open | 0x00 | ok
CPU detection | 0x00 | ok
ME | 0x00 | ok
iRMC request | 0x00 | ok
I2C1 | 0x00 | ok
I2C2 | 0x00 | ok
I2C3 | 0x00 | ok
I2C4 | 0x00 | ok
I2C5 | 0x00 | ok
I2C6 | 0x00 | ok
I2C7 | 0x00 | ok
I2C8 | 0x00 | ok
Config backup | 0x00 | ok
FAN5 SYS | 0x00 | ok
Power Unit | 0x00 | ok
PSU | 0x00 | ok
PSU1 | 0x00 | ok
PSU2 | 0x00 | ok
Total Power Out | 0x00 | ok
Power Level | 0x00 | ok
System Mgmt SW | Not Readable | ns
Local Monitor | 0x00 | ok
Pwr Btn override | 0x00 | ok
NMI | 0x00 | ok
System BIOS | Not Readable | ns
iRMC | 0x00 | ok
You can narrow that call down to show only temperatures, power consumption and so on, but that's beyond the scope of this tutorial. The ipmitool man page is a great place to see everything else you can do.
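The alerting idea mentioned earlier (spotting a temperature that climbs too high) can be sketched by filtering this output. This is not the Anvil! monitor's implementation; it is a minimal illustration that assumes the three-column `name | reading | status` layout above. It prints any sensor whose status is something other than `ok` or `ns` (for example `cr`, critical), plus any temperature at or above `MAX_TEMP_C`, which defaults to 70 here purely as an example; use the limits documented for your hardware. `sdr_alerts` and `MAX_TEMP_C` are hypothetical names.

```shell
# sdr_alerts (hypothetical helper): read `ipmitool sdr list` output on stdin
# and print sensors that need attention:
#   - any status other than "ok" or "ns" (ns = no reading, as for the
#     disabled sensors above)
#   - any temperature at or above MAX_TEMP_C (example default: 70)
sdr_alerts() {
    awk -F'|' -v max="${MAX_TEMP_C:-70}" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 3 {
            name = trim($1); reading = trim($2); status = trim($3)
            if (status != "ok" && status != "ns") {
                print name ": status " status
            } else if (reading ~ / degrees C$/ && reading + 0 >= max + 0) {
                # awk converts "73 degrees C" to the number 73
                print name ": " reading
            }
        }'
}
```

Run against the output above (`ipmitool sdr list | sdr_alerts`), this would print only `RAID Controller: 73 degrees C`, since every status there is `ok` or `ns`.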
Configuring IPMI LAN Access
So far, we've been talking directly to the BMC via the operating system on the host. To be useful in an Anvil! cluster (or in many other use cases), we need to be able to talk to the BMC over the network.
Before we can configure it though, we need to find our "LAN channel". Different manufacturers will use different channels, so we need to be able to find the one we're using.
To find it, simply call ipmitool lan print X. Increment X, starting at 1, until you get a response.
So first, let's query LAN channel 1.
an-a01dr01 | ipmitool lan print 1
Get Channel Info command failed: Destination unavailable
Invalid channel: 1
No luck; let's try channel 2.
an-a01dr01 | ipmitool lan print 2
Set in Progress : Set Complete
Auth Type Support : NONE MD2 MD5 PASSWORD OEM
Auth Type Enable : Callback : MD5 PASSWORD OEM
: User : MD5 PASSWORD OEM
: Operator : MD5 PASSWORD OEM
: Admin : MD5 PASSWORD OEM
: OEM : MD5 PASSWORD OEM
IP Address Source : BIOS Assigned Address
IP Address : 10.201.23.3
Subnet Mask : 255.255.0.0
MAC Address : 90:1b:0e:50:59:8a
SNMP Community String : public
IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
BMC ARP Control : ARP Responses Enabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl : 0.0 seconds
Default Gateway IP : 10.20.255.254
Default Gateway MAC : 00:00:00:00:00:00
Backup Gateway IP : 0.0.0.0
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 0,1,2,3,6,7,8,11,12,15,16,17
Cipher Suite Priv Max : XaaaaaaaaXXaXXX
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
Bad Password Threshold : 0
Invalid password disable: no
Attempt Count Reset Int.: 0
User Lockout Interval : 0
Found it! So we know that this server uses LAN channel 2. We'll need to use this for the next steps.
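The channel search described above can be scripted. This sketch tries each channel from 1 to 11 (the range the IPMI specification leaves for implementation-specific channels) and prints the first one whose `lan print` output includes a `MAC Address` line. Treating that line as the sign of a valid LAN channel, rather than relying on ipmitool's exit code, is an assumption of this sketch; `find_lan_channel` is a hypothetical name.

```shell
# find_lan_channel (hypothetical helper): print the first LAN channel between
# 1 and MAX (default 11) for which `ipmitool lan print N` shows a MAC address.
find_lan_channel() {
    local max="${1:-11}" ch=1
    while [ "$ch" -le "$max" ]; do
        # Assumption: a valid LAN channel prints its settings, incl. "MAC Address"
        if ipmitool lan print "$ch" 2>/dev/null | grep -q '^MAC Address'; then
            echo "$ch"
            return 0
        fi
        ch=$((ch + 1))
    done
    echo "No LAN channel found between 1 and $max" >&2
    return 1
}
```

On the server used here, `find_lan_channel` would be expected to print `2`.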
Setting IPMI Network Info
Now that we can read our IPMI data, it's time to set some values.
Note: If you're not familiar with how IPs are assigned in an Anvil! cluster, check out "Anvil! Networking".
We want to set an-a01dr01's IPMI interface to use the IP address 10.201.11.3/16. We also need to set up a user on the IPMI BMC so that we can log in from other Anvil! cluster machines.
First up, let's set the IP address. Remember to use the LAN channel you found on your server. We don't actually have a gateway on the 10.201.0.0/16 Back-Channel Network, but some devices insist on a default gateway being set. For this reason, we'll always set 10.201.255.254 as the gateway server.
This requires four calls:
- Tell the interface to use a static IP address.
- Set the IP address.
- Set the subnet mask.
- (Optional) Set the default gateway.
an-a01dr01 | ipmitool lan set 2 ipsrc static
ipmitool lan set 2 ipaddr 10.201.11.3
Setting LAN IP Address to 10.201.11.3
ipmitool lan set 2 netmask 255.255.0.0
Setting LAN Subnet Mask to 255.255.0.0
ipmitool lan set 2 defgw ipaddr 10.201.255.254
Setting LAN Default Gateway IP to 10.201.255.254
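If you configure several machines, the four calls above can be wrapped in one function so the channel and addresses are passed in once. A minimal sketch, assuming you already know the LAN channel; `set_bmc_static_ip` is a hypothetical helper name, and it stops at the first ipmitool call that fails.

```shell
# set_bmc_static_ip (hypothetical helper): apply the four `ipmitool lan set`
# calls in order.
#   $1 = LAN channel, $2 = IP address, $3 = subnet mask, $4 = gateway (optional)
set_bmc_static_ip() {
    if [ "$#" -lt 3 ]; then
        echo "usage: set_bmc_static_ip CHANNEL IP NETMASK [GATEWAY]" >&2
        return 2
    fi
    ipmitool lan set "$1" ipsrc static || return 1
    ipmitool lan set "$1" ipaddr "$2"  || return 1
    ipmitool lan set "$1" netmask "$3" || return 1
    # The gateway is optional; skip the call when none is given
    if [ -n "${4:-}" ]; then
        ipmitool lan set "$1" defgw ipaddr "$4" || return 1
    fi
}
```

`set_bmc_static_ip 2 10.201.11.3 255.255.0.0 10.201.255.254` issues the same four commands shown above for an-a01dr01.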
Now we'll again print the LAN channel information and we should see that the IP address has been set.
an-a01dr01 | ipmitool lan print 2
Set in Progress : Set In Progress
Auth Type Support : NONE MD2 MD5 PASSWORD OEM
Auth Type Enable : Callback : MD5 PASSWORD OEM
: User : MD5 PASSWORD OEM
: Operator : MD5 PASSWORD OEM
: Admin : MD5 PASSWORD OEM
: OEM : MD5 PASSWORD OEM
IP Address Source : Static Address
IP Address : 10.201.11.3
Subnet Mask : 255.255.0.0
MAC Address : 90:1b:0e:50:59:8a
SNMP Community String : public
IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
BMC ARP Control : ARP Responses Enabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl : 0.0 seconds
Default Gateway IP : 10.201.255.254
Default Gateway MAC : 00:00:00:00:00:00
Backup Gateway IP : 0.0.0.0
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 0,1,2,3,6,7,8,11,12,15,16,17
Cipher Suite Priv Max : XaaaaaaaaXXaXXX
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
Bad Password Threshold : 0
Invalid password disable: no
Attempt Count Reset Int.: 0
User Lockout Interval : 0
Note that "Set in Progress: Set In Progress" is shown. You will want to wait until this displays "Set in Progress: Set Complete".
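Rather than re-running `ipmitool lan print` by hand, you can poll until the BMC reports `Set Complete`. This is a sketch under two assumptions: the `Set in Progress` line has the layout shown above, and a 60 second default timeout is long enough (an arbitrary choice; adjust it for your hardware). `wait_set_complete` is a hypothetical name.

```shell
# wait_set_complete (hypothetical helper): poll `ipmitool lan print CHANNEL`
# once per second until "Set in Progress" reads "Set Complete", or give up
# after TIMEOUT seconds (example default: 60).
wait_set_complete() {
    local channel="$1" timeout="${2:-60}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if ipmitool lan print "$channel" 2>/dev/null \
            | grep -q '^Set in Progress *: *Set Complete'; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "Channel $channel still not 'Set Complete' after ${timeout}s" >&2
    return 1
}
```

For this server, `wait_set_complete 2` would wait on LAN channel 2.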
Let's test that we can ping the IP. We'll do this from an-a01n01.
an-a01n01 | ping -c 1 10.201.11.3
PING 10.201.11.3 (10.201.11.3) 56(84) bytes of data.
64 bytes from 10.201.11.3: icmp_seq=1 ttl=64 time=0.593 ms
--- 10.201.11.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.593/0.593/0.593/0.000 ms
Excellent!
If you don't get a response, see the section below.
Cold Resetting the BMC
Some IPMI BMCs won't update their IPs or respond to pings until they've been "cold" reset. To do this, run:
an-a01dr01 | ipmitool mc reset cold
Sent cold reset command to MC
This will trigger a cold reboot of the BMC. During the reboot, no calls to the IPMI BMC will work. Likewise, nothing will be able to ping the BMC. In most cases, you will hear the fans on the machine spool up close to the end of the reboot process.
Warning: Be patient! It's normal for the BMC cold reset to take a few minutes to complete.
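Since a cold reset can take a few minutes, a retry loop run from a peer saves guessing when the BMC is back. This sketch sends one ping per attempt (using the Linux iputils `ping` options `-c` for the count and `-W` for the reply timeout in seconds) and gives up after 60 attempts spaced 5 seconds apart by default; those numbers and the name `wait_for_bmc_ping` are assumptions, not values from the Anvil! tools.

```shell
# wait_for_bmc_ping (hypothetical helper): ping HOST once per attempt until it
# answers, or give up after TRIES attempts spaced DELAY seconds apart
# (example defaults: 60 tries x 5 s, about 5 minutes).
wait_for_bmc_ping() {
    local host="$1" tries="${2:-60}" delay="${3:-5}" n=1
    while [ "$n" -le "$tries" ]; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host answered after $n attempt(s)"
            return 0
        fi
        sleep "$delay"
        n=$((n + 1))
    done
    echo "$host did not answer after $tries attempts" >&2
    return 1
}
```

For example, from an-a01n01: `wait_for_bmc_ping 10.201.11.3`.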
Find the IPMI User ID
Next up is to find the IPMI administrative user name and user ID. We'll record the name for later use in the cluster setup. We'll use the ID to update the user's password.
To see the list of users, run the following.
an-a01dr01 | ipmitool user list 2
ID Name Callin Link Auth IPMI Msg Channel Priv Limit
1 true true true Unknown (0x00)
2 admin true true true OEM
Note: If you see an error like "Get User Access command failed (channel 2, user 3): Unknown (0x32)", it is safe to ignore.
Normally you should see OEM or ADMINISTRATOR under the Channel Priv Limit column. Above we see that the user named admin with ID 2 is OEM, so that is the user we will use.
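Picking the user out of the list can also be automated. This sketch assumes the column layout shown above, where the privilege limit is the last column; rows with an empty user name (like ID 1 above) have fewer columns and are skipped. `find_admin_users` is a hypothetical name.

```shell
# find_admin_users (hypothetical helper): read `ipmitool user list CHANNEL`
# output on stdin and print "ID NAME" for every named user whose
# Channel Priv Limit (the last column) is ADMINISTRATOR or OEM.
find_admin_users() {
    # NR > 1 skips the header row; NF >= 6 skips rows with an empty name
    awk 'NR > 1 && NF >= 6 && ($NF == "ADMINISTRATOR" || $NF == "OEM") { print $1, $2 }'
}
```

`ipmitool user list 2 | find_admin_users` would print `2 admin` for the list above.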
Note: The 2 in the next command corresponds to the user ID, not the LAN channel!
To set the password to secret, run the following command and then enter the word secret twice.
an-a01dr01 | ipmitool user set password 2
Password for user 2:
Password for user 2:
Done!
Testing the IPMI Connection From the Peer
At this point, we've set each node's IPMI BMC network address and admin user's password. Now it's time to make sure it works.
In the example above, we walked through setting up an-a01dr01's IPMI BMC. So here, we will log into an-a01n01 and try to connect to an-a01dr01.ipmi to make sure everything works.
- From an-a01n01
an-a01n01 | ipmitool -I lanplus -U admin -P secret -H an-a01dr01.ipmi chassis power status
Chassis Power is on
Excellent! Now let's test in the other direction, from an-a01dr01 connecting to an-a01n01.ipmi.
an-a01dr01 | ipmitool -I lanplus -U admin -P secret -H an-a01n01.ipmi chassis power status
Chassis Power is on
Woohoo!
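The ipmitool man page recommends against giving the password on the command line with `-P`: it ends up in your shell history, and it is only obscured in the process list where that is supported. ipmitool's `-E` option instead reads the password from the `IPMI_PASSWORD` environment variable. The sketch below uses that to check several BMCs in one call; `check_bmcs` and the `IPMI_USER` variable are hypothetical names used only for this example, with `admin` as the default user to match this tutorial.

```shell
# check_bmcs (hypothetical helper): query chassis power status over the
# network (lanplus) for each BMC host given as an argument. The password is
# read by ipmitool from $IPMI_PASSWORD (-E); the user comes from the
# hypothetical IPMI_USER variable, defaulting to "admin".
# Returns 1 if any host could not be queried.
check_bmcs() {
    local host status rc=0
    for host in "$@"; do
        if status=$(ipmitool -I lanplus -U "${IPMI_USER:-admin}" -E -H "$host" chassis power status 2>&1); then
            echo "$host: $status"
        else
            echo "$host: FAILED ($status)" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

In bash, for example: `read -rs IPMI_PASSWORD && export IPMI_PASSWORD`, then `check_bmcs an-a01dr01.ipmi`.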
Any questions, feedback, advice, complaints or meanderings are welcome.
Us: Alteeve's Niche! | Support: Mailing List | IRC: #clusterlabs on Libera Chat
© Alteeve's Niche! Inc. 1997-2023 | Anvil! "Intelligent Availability™" Platform
legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.