Notes from day-to-day work on system operations, virtualization and cloud computing, databases, network security, and related topics.

ESXi 5.0 host experiences a purple diagnostic screen with the errors "Failed to ack TLB invalidate" or "no heartbeat" on HP servers with PCC support (2000091)

Symptoms

  • ESXi 5.0 host fails with a purple diagnostic screen
  • The purple diagnostic screen or core dump contains messages similar to:

    • PCPU 39 locked up. Failed to ack TLB invalidate (total of 1 locked up, PCPU(s): 39).
      0x41228efc7b88:[0x41800646cd62]Panic@vmkernel#nover+0xa9 stack: 0x41228efe5000
      0x41228efc7cb8:[0x4180064989af]TLBDoInvalidate@vmkernel#nover+0x45a stack: 0x41228efc7ce8


    • @BlueScreen: PCPU 0: no heartbeat, IPIs received (0/1).
      ...
      0x4122c27c7a68:[0x41800966cd62]Panic@vmkernel#nover+0xa9 stack: 0x4122c27c7a98
      0x4122c27c7ad8:[0x4180098d80ec]Heartbeat_DetectCPULockups@vmkernel#nover+0x2d3 stack: 0x0
      ...
      NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x7eb2e(0x418009600000):0x4122c2307688:0x4010](Src 0x1, CPU140)
      Heartbeat: 618: PCPU 140 didn't have a heartbeat for 8 seconds. *may* be locked up
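
The message formats above can be checked for mechanically. The following is a minimal sketch, not a VMware tool, that scans purple-screen or log text for the three message shapes quoted in this section. The function name `find_lockup_signatures` and the signature labels are hypothetical, and the regular expressions are derived only from the sample lines shown here, so other builds may word these messages differently. A match shows only that the symptom text is present, not that PCC is the cause.

```python
import re

# Patterns derived from the sample messages quoted in the Symptoms section.
# The labels on the left are illustrative names, not VMware terminology.
PATTERNS = {
    "tlb_invalidate": re.compile(
        r"PCPU (\d+) locked up\. Failed to ack TLB invalidate"),
    "no_heartbeat": re.compile(r"PCPU (\d+): no heartbeat"),
    "heartbeat_warning": re.compile(
        r"PCPU (\d+) didn't have a heartbeat for (\d+) seconds"),
}


def find_lockup_signatures(text):
    """Return a list of (signature_label, pcpu_number) pairs found in text."""
    hits = []
    for line in text.splitlines():
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((label, int(match.group(1))))
    return hits


if __name__ == "__main__":
    sample = (
        "PCPU 39 locked up. Failed to ack TLB invalidate "
        "(total of 1 locked up, PCPU(s): 39).\n"
        "@BlueScreen: PCPU 0: no heartbeat, IPIs received (0/1).\n"
        "Heartbeat: 618: PCPU 140 didn't have a heartbeat for 8 seconds. "
        "*may* be locked up\n"
    )
    for label, pcpu in find_lockup_signatures(sample):
        print(label, pcpu)
```

Run against the sample lines from this article, the sketch reports PCPU 39 for the TLB invalidate message, PCPU 0 for the "no heartbeat" message, and PCPU 140 for the heartbeat warning.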

Cause

Some HP servers experience a situation where the PCC (Processor Clocking Control or Collaborative Power Control) communication between the VMware ESXi kernel (VMkernel) and the server BIOS does not function correctly.

As a result, one or more PCPUs may remain in SMM (System Management Mode) for many seconds. When the VMkernel notices a PCPU is not available for an extended period of time, a purple diagnostic screen occurs.

Resolution

This issue is resolved in ESXi 5.0 Update 2, where PCC is disabled by default. For more information, see VMware ESXi 5.0, Patch ESXi500-Update02: VMware ESXi 5.0 Complete Update 2 (2033751) and the ESXi 5.0 Update 2 Release Notes.


To work around this issue in versions prior to ESXi 5.0 U2, disable PCC manually.

To disable PCC:
  1. Connect to the ESXi host using the vSphere Client.
  2. Click the Configuration tab.
  3. In the Software menu, click Advanced Settings.
  4. Select vmkernel.
  5. Deselect the vmkernel.boot.usePCC option.
  6. Restart the host for the change to take effect.
For more information, see Configuring advanced options for ESXi/ESX (1038578).
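
When auditing several hosts, the rule stated in this section (the manual workaround applies to ESXi 5.0 releases before Update 2) can be written as a small check. This is a minimal sketch under stated assumptions: the function name `needs_pcc_workaround` and its inputs are hypothetical, the version and update level are supplied by the caller rather than queried from a host, and releases other than ESXi 5.0 are treated as out of scope because this article does not cover them.

```python
def needs_pcc_workaround(version, update):
    """Return True if the article's manual PCC workaround applies.

    version: (major, minor) tuple, for example (5, 0).
    update:  integer update level, for example 1 for "Update 1".

    Assumption: only ESXi 5.0 is considered, since this article
    describes the issue for ESXi 5.0 and its fix in Update 2.
    """
    return tuple(version) == (5, 0) and update < 2


if __name__ == "__main__":
    # Hypothetical inventory of (host label, version, update level).
    hosts = [
        ("host-a", (5, 0), 1),
        ("host-b", (5, 0), 2),
    ]
    for name, version, update in hosts:
        print(name, needs_pcc_workaround(version, update))
```

The check covers only the software version. Whether the server is an HP model with PCC support still has to be confirmed separately.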

Additional Information

For more information, see the HP Customer Advisory article c03543898.

Note: This is a specific case of a "Failed to ack TLB invalidate" purple diagnostic screen. For more general cases, if examining the logs and searching the Knowledge Base does not reveal additional error messages that explain the outage, or if the error is not documented in the Knowledge Base, collect diagnostic information from the VMware ESXi host and submit a Support Request.

For more information, see ESXi hosts that use HP CRU driver fail with a purple diagnostic screen when ECC events occur (2001207).


