general:ipkm_it_post_mortem

2025-02-06 mail server crash

The mail server crashed. The cause is not fully clear; it was working until around 15:05. The mail host still responded to pings, but SSH and all mail/LDAP-related services were down.

First suspicion: the VM had been moved to pve-gustav (which ran a different qemu-kvm version, see below) the evening before. According to the logfiles, this was not the case.

Graylog search for “update VM 105”: the hotplug setting had not been changed either.

Somebody rebooted the VM around 15:08:

Feb  6 15:08:10 mail kernel: [    0.000000] Linux version 5.10.0-33-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.226-1 (2024-10-03)
...
Feb  6 15:08:10 mail kernel: [   26.811753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Feb  6 15:08:10 mail kernel: [   26.811783] Call Trace:
Feb  6 15:08:10 mail kernel: [   26.813554]  dump_stack+0x6b/0x83
Feb  6 15:08:10 mail kernel: [   26.814022]  dump_header+0x4a/0x1f4
Feb  6 15:08:10 mail kernel: [   26.814448]  oom_kill_process.cold+0xb/0x10
Feb  6 15:08:10 mail kernel: [   26.814827]  out_of_memory+0x1bd/0x4e0
Feb  6 15:08:10 mail kernel: [   26.815223]  __alloc_pages_slowpath.constprop.0+0xc02/0xcc0
Feb  6 15:08:10 mail kernel: [   26.815580]  __alloc_pages_nodemask+0x2de/0x310
Feb  6 15:08:10 mail kernel: [   26.815942]  pagecache_get_page+0x175/0x390
Feb  6 15:08:10 mail kernel: [   26.816294]  filemap_fault+0x6a2/0x900
Feb  6 15:08:10 mail kernel: [   26.816655]  ? xas_load+0x5/0x80
Feb  6 15:08:10 mail kernel: [   26.817069]  ext4_filemap_fault+0x2d/0x50 [ext4]
Feb  6 15:08:10 mail kernel: [   26.817430]  __do_fault+0x37/0x170
Feb  6 15:08:10 mail kernel: [   26.817754]  handle_mm_fault+0x124d/0x1c00
Feb  6 15:08:10 mail kernel: [   26.818145]  do_user_addr_fault+0x1b8/0x400
Feb  6 15:08:10 mail kernel: [   26.818484]  exc_page_fault+0x78/0x160
Feb  6 15:08:10 mail kernel: [   26.818785]  ? asm_exc_page_fault+0x8/0x30
Feb  6 15:08:10 mail kernel: [   26.819121]  asm_exc_page_fault+0x1e/0x30
Feb  6 15:08:10 mail kernel: [   26.819439] RIP: 0033:0x7f8e1fe9d386
Feb  6 15:08:10 mail kernel: [   26.819740] Code: Unable to access opcode bytes at RIP 0x7f8e1fe9d35c.
Feb  6 15:08:10 mail kernel: [   26.820066] RSP: 002b:00007fff7fde5570 EFLAGS: 00010202
Feb  6 15:08:10 mail kernel: [   26.820387] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
Feb  6 15:08:10 mail kernel: [   26.820707] RDX: 00007f8e1feb09a0 RSI: 0000000000000000 RDI: 00005576b32a0270
Feb  6 15:08:10 mail kernel: [   26.821115] RBP: 00005576b32a0270 R08: 00005576b32a0270 R09: 00007f8e1fe73be0
Feb  6 15:08:10 mail kernel: [   26.821465] R10: 00005576b32a0170 R11: 0000000000000070 R12: 00007fff7fde55bc
Feb  6 15:08:10 mail kernel: [   26.821764] R13: 0000000000000004 R14: 0000000000000000 R15: 00007fff7fde5890
Feb  6 15:08:10 mail kernel: [   26.822111] Mem-Info:
Feb  6 15:08:10 mail kernel: [   26.822431] active_anon:62 inactive_anon:4025 isolated_anon:0
Feb  6 15:08:10 mail kernel: [   26.822431]  active_file:132 inactive_file:35 isolated_file:0
Feb  6 15:08:10 mail kernel: [   26.822431]  unevictable:0 dirty:0 writeback:0
Feb  6 15:08:10 mail kernel: [   26.822431]  slab_reclaimable:3337 slab_unreclaimable:7300
Feb  6 15:08:10 mail kernel: [   26.822431]  mapped:138 shmem:739 pagetables:315 bounce:0
Feb  6 15:08:10 mail kernel: [   26.822431]  free:11707 free_pcp:0 free_cma:0
Feb  6 15:08:10 mail kernel: [   26.824222] Node 0 active_anon:248kB inactive_anon:16100kB active_file:284kB inactive_file:88kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:552kB dirty:0kB writeback:0kB shmem:2956kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:2832kB all_unreclaimable? no
Feb  6 15:08:10 mail kernel: [   26.824849] Node 0 DMA free:4128kB min:788kB low:984kB high:1180kB reserved_highatomic:0KB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:5076kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb  6 15:08:10 mail kernel: [   26.825570] lowmem_reserve[]: 0 844 844 844 844
Feb  6 15:08:10 mail kernel: [   26.825933] Node 0 DMA32 free:42896kB min:42940kB low:53672kB high:64404kB reserved_highatomic:0KB active_anon:248kB inactive_anon:16096kB active_file:48kB inactive_file:292kB unevictable:0kB writepending:0kB present:1032040kB managed:652036kB mlocked:0kB pagetables:1260kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb  6 15:08:10 mail kernel: [   26.827042] lowmem_reserve[]: 0 0 0 0 0
Feb  6 15:08:10 mail kernel: [   26.827439] Node 0 DMA: 26*4kB (M) 13*8kB (UM) 13*16kB (M) 10*32kB (UM) 1*64kB (M) 4*128kB (UM) 1*256kB (U) 1*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB = 4128kB
Feb  6 15:08:10 mail kernel: [   26.828236] Node 0 DMA32: 476*4kB (UME) 201*8kB (ME) 136*16kB (ME) 87*32kB (UME) 86*64kB (UME) 48*128kB (UME) 19*256kB (UME) 3*512kB (UM) 1*1024kB (U) 8*2048kB (UME) 0*4096kB = 43928kB
Feb  6 15:08:10 mail kernel: [   26.829155] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb  6 15:08:10 mail kernel: [   26.829594] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb  6 15:08:10 mail kernel: [   26.830021] 913 total pagecache pages
Feb  6 15:08:10 mail kernel: [   26.830427] 0 pages in swap cache
Feb  6 15:08:10 mail kernel: [   26.830779] Swap cache stats: add 0, delete 0, find 0/0
Feb  6 15:08:10 mail kernel: [   26.831302] Free swap  = 0kB
Feb  6 15:08:10 mail kernel: [   26.831726] Total swap = 0kB
Feb  6 15:08:10 mail kernel: [   26.832224] 262008 pages RAM
Feb  6 15:08:10 mail kernel: [   26.832793] 0 pages HighMem/MovableOnly
Feb  6 15:08:10 mail kernel: [   26.833272] 97730 pages reserved
Feb  6 15:08:10 mail kernel: [   26.833657] 0 pages hwpoisoned
Feb  6 15:08:10 mail kernel: [   26.834067] Unreclaimable slab info:
Feb  6 15:08:10 mail kernel: [   26.834458] Name                      Used          Total
Feb  6 15:08:10 mail kernel: [   26.834929] ext4_system_zone           3KB          3KB
Feb  6 15:08:10 mail kernel: [   26.835368] scsi_sense_cache         400KB        400KB
Feb  6 15:08:10 mail kernel: [   26.835764] RAWv6                     30KB         30KB
Feb  6 15:08:10 mail kernel: [   26.836221] UDPv6                     94KB         94KB
Feb  6 15:08:10 mail kernel: [   26.836631] mqueue_inode_cache         31KB         31KB
Feb  6 15:08:10 mail kernel: [   26.837034] UNIX                     382KB        382KB
Feb  6 15:08:10 mail kernel: [   26.837435] RAW                       32KB         32KB
Feb  6 15:08:10 mail kernel: [   26.837808] hugetlbfs_inode_cache         30KB         30KB
Feb  6 15:08:10 mail kernel: [   26.838234] eventpoll_pwq             47KB         47KB
Feb  6 15:08:10 mail kernel: [   26.838639] request_queue            411KB        506KB
Feb  6 15:08:10 mail kernel: [   26.839075] biovec-max               480KB        480KB
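The bracketed number in each kernel line is the uptime in seconds, so the trace above places the OOM event roughly 27 seconds after boot. A minimal sketch of how such lines could be scanned for the first OOM frame follows; the function name and the choice of symbols to look for are illustrative assumptions, not part of any existing tooling.

```python
import re

# Matches syslog-style kernel lines such as
# "Feb  6 15:08:10 mail kernel: [   26.814827]  out_of_memory+0x1bd/0x4e0"
KERNEL_LINE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s+(.*)$")

# Call-trace symbols taken from the trace above that indicate the OOM killer ran.
OOM_SYMBOLS = ("out_of_memory", "oom_kill_process")


def first_oom_uptime(lines):
    """Return the kernel uptime (seconds) of the first OOM frame, or None."""
    for line in lines:
        match = KERNEL_LINE.search(line)
        if match and any(sym in match.group(2) for sym in OOM_SYMBOLS):
            return float(match.group(1))
    return None


# Three lines copied from the log above.
sample = [
    "Feb  6 15:08:10 mail kernel: [   26.813554]  dump_stack+0x6b/0x83",
    "Feb  6 15:08:10 mail kernel: [   26.814448]  oom_kill_process.cold+0xb/0x10",
    "Feb  6 15:08:10 mail kernel: [   26.814827]  out_of_memory+0x1bd/0x4e0",
]
print(first_oom_uptime(sample))  # 26.814448
```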

Attaching the console in the Proxmox web interface to check for terminal output revealed memory problems (OOM, etc.). A reboot did not help; the VM crashed again shortly afterwards.
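The Mem-Info figures in the trace can be converted into sizes to see how little memory the guest had at that point. The sketch below assumes 4 KiB pages (the x86-64 default) and uses only numbers from the log. The result, about 1 GiB seen by the kernel with roughly 642 MiB usable, would be consistent with the guest running on a small base allocation before any hot-plugged memory came online. That interpretation is an assumption, since the configured VM memory size is not recorded here.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, the x86-64 default


def pages_to_mib(pages, page_kib=PAGE_KIB):
    """Convert a page count into MiB."""
    return pages * page_kib / 1024


total_pages = 262008         # "262008 pages RAM"
reserved_pages = 97730       # "97730 pages reserved"
managed_kib = 5076 + 652036  # "managed:" values of the DMA and DMA32 zones

usable_pages = total_pages - reserved_pages

print(f"RAM seen by kernel: {pages_to_mib(total_pages):.0f} MiB")     # 1023 MiB
print(f"reserved:           {pages_to_mib(reserved_pages):.0f} MiB")  # 382 MiB
print(f"usable:             {pages_to_mib(usable_pages):.0f} MiB")    # 642 MiB

# Cross-check: total minus reserved pages equals the zones' managed memory.
assert usable_pages * PAGE_KIB == managed_kib
```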

Fix: Disable the option for memory hotplug!

hotplug: disk,network,usb,cpu # previously also included memory
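The change to the hotplug line can be expressed as a small transformation. This is a sketch only: the helper name is hypothetical, the "before" value is an assumed earlier setting (the original order of the features is not recorded), and the function expects a bare config line without a trailing comment.

```python
def without_memory_hotplug(config_line):
    """Return a 'hotplug:' line with 'memory' removed; other lines unchanged."""
    key, sep, value = config_line.partition(":")
    if not sep or key.strip() != "hotplug":
        return config_line
    features = [f.strip() for f in value.split(",") if f.strip()]
    kept = [f for f in features if f != "memory"]
    return "hotplug: " + ",".join(kept)


# Hypothetical earlier value; only the resulting line is documented above.
before = "hotplug: disk,network,usb,memory,cpu"
print(without_memory_hotplug(before))  # hotplug: disk,network,usb,cpu
```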

pve-qemu-kvm versions on the cluster nodes (dpkg listing per host):

pve-donna.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-emil.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-franz.cluster: ii  pve-qemu-kvm   9.0.2-4      amd64        Full virtualization on x86 hardware
pve-gustav.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-hans.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
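To spot version drift between nodes quickly, a listing like the one above can be parsed and compared against the most common version. The following is a rough sketch; the function names are made up for illustration, and the input is simply the listing as recorded on this page.

```python
import re
from collections import Counter

# Per-host package listing copied from above.
LISTING = """\
pve-donna.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-emil.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-franz.cluster: ii  pve-qemu-kvm   9.0.2-4      amd64        Full virtualization on x86 hardware
pve-gustav.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-hans.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
"""

# "<host>: ii  <package>  <version>  ..."
LINE = re.compile(r"^(\S+):\s+ii\s+(\S+)\s+(\S+)")


def versions_by_host(text):
    """Map each host to the package version found on its line."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            host, _package, version = match.groups()
            result[host] = version
    return result


def outliers(versions):
    """Return the hosts whose version differs from the most common one."""
    most_common, _count = Counter(versions.values()).most_common(1)[0]
    return {host: ver for host, ver in versions.items() if ver != most_common}


print(outliers(versions_by_host(LISTING)))
# {'pve-franz.cluster': '9.0.2-4'}
```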

Symptoms are similar to: https://forum.proxmox.com/threads/memory-hotplug-prevents-vm-boot.122599/

general/ipkm_it_post_mortem.txt · Last modified: 2025-02-07 11:09 by Markus Rosenstihl