2025-02-06 mail server crash
The mail server crashed. The cause is not entirely clear; it was working until around 15:05. The mail host still responded to pings, but SSH and all mail/LDAP services were unreachable.
The VM was moved to pve-gustav (which ran a different qemu-kvm version, see below) the evening before. Not according to the log files.
Graylog search: “update VM 105”. The hotplug setting has not been changed either.
Somebody rebooted the VM around 15:08:
Feb 6 15:08:10 mail kernel: [ 0.000000] Linux version 5.10.0-33-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.226-1 (2024-10-03)
...
Feb 6 15:08:10 mail kernel: [ 26.811753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Feb 6 15:08:10 mail kernel: [ 26.811783] Call Trace:
Feb 6 15:08:10 mail kernel: [ 26.813554] dump_stack+0x6b/0x83
Feb 6 15:08:10 mail kernel: [ 26.814022] dump_header+0x4a/0x1f4
Feb 6 15:08:10 mail kernel: [ 26.814448] oom_kill_process.cold+0xb/0x10
Feb 6 15:08:10 mail kernel: [ 26.814827] out_of_memory+0x1bd/0x4e0
Feb 6 15:08:10 mail kernel: [ 26.815223] __alloc_pages_slowpath.constprop.0+0xc02/0xcc0
Feb 6 15:08:10 mail kernel: [ 26.815580] __alloc_pages_nodemask+0x2de/0x310
Feb 6 15:08:10 mail kernel: [ 26.815942] pagecache_get_page+0x175/0x390
Feb 6 15:08:10 mail kernel: [ 26.816294] filemap_fault+0x6a2/0x900
Feb 6 15:08:10 mail kernel: [ 26.816655] ? xas_load+0x5/0x80
Feb 6 15:08:10 mail kernel: [ 26.817069] ext4_filemap_fault+0x2d/0x50 [ext4]
Feb 6 15:08:10 mail kernel: [ 26.817430] __do_fault+0x37/0x170
Feb 6 15:08:10 mail kernel: [ 26.817754] handle_mm_fault+0x124d/0x1c00
Feb 6 15:08:10 mail kernel: [ 26.818145] do_user_addr_fault+0x1b8/0x400
Feb 6 15:08:10 mail kernel: [ 26.818484] exc_page_fault+0x78/0x160
Feb 6 15:08:10 mail kernel: [ 26.818785] ? asm_exc_page_fault+0x8/0x30
Feb 6 15:08:10 mail kernel: [ 26.819121] asm_exc_page_fault+0x1e/0x30
Feb 6 15:08:10 mail kernel: [ 26.819439] RIP: 0033:0x7f8e1fe9d386
Feb 6 15:08:10 mail kernel: [ 26.819740] Code: Unable to access opcode bytes at RIP 0x7f8e1fe9d35c.
Feb 6 15:08:10 mail kernel: [ 26.820066] RSP: 002b:00007fff7fde5570 EFLAGS: 00010202
Feb 6 15:08:10 mail kernel: [ 26.820387] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
Feb 6 15:08:10 mail kernel: [ 26.820707] RDX: 00007f8e1feb09a0 RSI: 0000000000000000 RDI: 00005576b32a0270
Feb 6 15:08:10 mail kernel: [ 26.821115] RBP: 00005576b32a0270 R08: 00005576b32a0270 R09: 00007f8e1fe73be0
Feb 6 15:08:10 mail kernel: [ 26.821465] R10: 00005576b32a0170 R11: 0000000000000070 R12: 00007fff7fde55bc
Feb 6 15:08:10 mail kernel: [ 26.821764] R13: 0000000000000004 R14: 0000000000000000 R15: 00007fff7fde5890
Feb 6 15:08:10 mail kernel: [ 26.822111] Mem-Info:
Feb 6 15:08:10 mail kernel: [ 26.822431] active_anon:62 inactive_anon:4025 isolated_anon:0
Feb 6 15:08:10 mail kernel: [ 26.822431] active_file:132 inactive_file:35 isolated_file:0
Feb 6 15:08:10 mail kernel: [ 26.822431] unevictable:0 dirty:0 writeback:0
Feb 6 15:08:10 mail kernel: [ 26.822431] slab_reclaimable:3337 slab_unreclaimable:7300
Feb 6 15:08:10 mail kernel: [ 26.822431] mapped:138 shmem:739 pagetables:315 bounce:0
Feb 6 15:08:10 mail kernel: [ 26.822431] free:11707 free_pcp:0 free_cma:0
Feb 6 15:08:10 mail kernel: [ 26.824222] Node 0 active_anon:248kB inactive_anon:16100kB active_file:284kB inactive_file:88kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:552kB dirty:0kB writeback:0kB shmem:2956kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:2832kB all_unreclaimable? no
Feb 6 15:08:10 mail kernel: [ 26.824849] Node 0 DMA free:4128kB min:788kB low:984kB high:1180kB reserved_highatomic:0KB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:5076kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb 6 15:08:10 mail kernel: [ 26.825570] lowmem_reserve[]: 0 844 844 844 844
Feb 6 15:08:10 mail kernel: [ 26.825933] Node 0 DMA32 free:42896kB min:42940kB low:53672kB high:64404kB reserved_highatomic:0KB active_anon:248kB inactive_anon:16096kB active_file:48kB inactive_file:292kB unevictable:0kB writepending:0kB present:1032040kB managed:652036kB mlocked:0kB pagetables:1260kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb 6 15:08:10 mail kernel: [ 26.827042] lowmem_reserve[]: 0 0 0 0 0
Feb 6 15:08:10 mail kernel: [ 26.827439] Node 0 DMA: 26*4kB (M) 13*8kB (UM) 13*16kB (M) 10*32kB (UM) 1*64kB (M) 4*128kB (UM) 1*256kB (U) 1*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB = 4128kB
Feb 6 15:08:10 mail kernel: [ 26.828236] Node 0 DMA32: 476*4kB (UME) 201*8kB (ME) 136*16kB (ME) 87*32kB (UME) 86*64kB (UME) 48*128kB (UME) 19*256kB (UME) 3*512kB (UM) 1*1024kB (U) 8*2048kB (UME) 0*4096kB = 43928kB
Feb 6 15:08:10 mail kernel: [ 26.829155] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb 6 15:08:10 mail kernel: [ 26.829594] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 6 15:08:10 mail kernel: [ 26.830021] 913 total pagecache pages
Feb 6 15:08:10 mail kernel: [ 26.830427] 0 pages in swap cache
Feb 6 15:08:10 mail kernel: [ 26.830779] Swap cache stats: add 0, delete 0, find 0/0
Feb 6 15:08:10 mail kernel: [ 26.831302] Free swap = 0kB
Feb 6 15:08:10 mail kernel: [ 26.831726] Total swap = 0kB
Feb 6 15:08:10 mail kernel: [ 26.832224] 262008 pages RAM
Feb 6 15:08:10 mail kernel: [ 26.832793] 0 pages HighMem/MovableOnly
Feb 6 15:08:10 mail kernel: [ 26.833272] 97730 pages reserved
Feb 6 15:08:10 mail kernel: [ 26.833657] 0 pages hwpoisoned
Feb 6 15:08:10 mail kernel: [ 26.834067] Unreclaimable slab info:
Feb 6 15:08:10 mail kernel: [ 26.834458] Name Used Total
Feb 6 15:08:10 mail kernel: [ 26.834929] ext4_system_zone 3KB 3KB
Feb 6 15:08:10 mail kernel: [ 26.835368] scsi_sense_cache 400KB 400KB
Feb 6 15:08:10 mail kernel: [ 26.835764] RAWv6 30KB 30KB
Feb 6 15:08:10 mail kernel: [ 26.836221] UDPv6 94KB 94KB
Feb 6 15:08:10 mail kernel: [ 26.836631] mqueue_inode_cache 31KB 31KB
Feb 6 15:08:10 mail kernel: [ 26.837034] UNIX 382KB 382KB
Feb 6 15:08:10 mail kernel: [ 26.837435] RAW 32KB 32KB
Feb 6 15:08:10 mail kernel: [ 26.837808] hugetlbfs_inode_cache 30KB 30KB
Feb 6 15:08:10 mail kernel: [ 26.838234] eventpoll_pwq 47KB 47KB
Feb 6 15:08:10 mail kernel: [ 26.838639] request_queue 411KB 506KB
Feb 6 15:08:10 mail kernel: [ 26.839075] biovec-max 480KB 480KB
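The Mem-Info part of the OOM report above already carries a useful hint: the guest kernel only saw 262008 pages of RAM, and the DMA32 zone (free:42896kB) was below its min watermark (min:42940kB). The following minimal sketch pulls those numbers out of the report. The function name `summarize_oom` is made up for this note, and 4 KiB pages are assumed (the x86-64 default). As far as we know, with memory hotplug enabled Proxmox starts the VM with a small fixed base memory and adds the rest as DIMMs that the guest has to bring online, so a guest stuck at about 1 GiB would be consistent with that onlining never happening. This is an interpretation; the log itself does not say so, and the VM's configured memory size is not recorded here.

```python
import re

# Shortened excerpt of the OOM report above: only the lines this sketch needs,
# with the long DMA32 line cut after the "high" watermark.
OOM_EXCERPT = """\
Feb 6 15:08:10 mail kernel: [ 26.825933] Node 0 DMA32 free:42896kB min:42940kB low:53672kB high:64404kB
Feb 6 15:08:10 mail kernel: [ 26.832224] 262008 pages RAM
Feb 6 15:08:10 mail kernel: [ 26.833272] 97730 pages reserved
"""

PAGE_SIZE_KIB = 4  # assumption: 4 KiB base pages (x86-64 default)


def summarize_oom(report: str) -> dict:
    """Extract the guest RAM size and the DMA32 watermark state from an OOM report."""
    pages_ram = int(re.search(r"(\d+) pages RAM", report).group(1))
    pages_reserved = int(re.search(r"(\d+) pages reserved", report).group(1))
    zone = re.search(r"DMA32 free:(\d+)kB min:(\d+)kB", report)
    free_kib, min_kib = int(zone.group(1)), int(zone.group(2))
    return {
        "ram_mib": pages_ram * PAGE_SIZE_KIB / 1024,
        "reserved_mib": pages_reserved * PAGE_SIZE_KIB / 1024,
        # Ordinary (non-atomic) allocations may not push a zone below its
        # min watermark; being under it means the zone is effectively exhausted.
        "dma32_below_min": free_kib < min_kib,
    }


if __name__ == "__main__":
    info = summarize_oom(OOM_EXCERPT)
    print(f"RAM seen by guest: {info['ram_mib']:.1f} MiB")
    print(f"reserved:          {info['reserved_mib']:.1f} MiB")
    print(f"DMA32 below min watermark: {info['dma32_below_min']}")
```

Run against the excerpt, it reports about 1023.5 MiB of RAM (of which about 381.8 MiB is reserved) and the DMA32 zone below its min watermark, with no swap configured to fall back on.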
Opening the VM console in the Proxmox web interface showed memory problems on the terminal (OOM kills, etc.). A reboot did not help; the VM crashed again shortly afterwards.
Fix: disable memory hotplug for the VM!
hotplug: disk,network,usb,cpu # previously also included memory
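The fix amounts to dropping `memory` from the comma-separated `hotplug` list in the VM config. The following is a small sketch of that edit, assuming an explicit feature list as shown above. The helper names are hypothetical, and VM ID 105 is only inferred from the Graylog search. It builds the matching `qm set <vmid> --hotplug <list>` command rather than running it; the previous order of the features is not recorded, so the "old" value below is made up.

```python
import shlex

VMID = 105  # assumption: mail VM ID, inferred from the Graylog search "update VM 105"


def without_hotplug_feature(hotplug: str, feature: str = "memory") -> str:
    """Return the comma-separated hotplug value with one feature removed.

    Assumes an explicit feature list such as "disk,network,usb,memory,cpu".
    """
    kept = [f.strip() for f in hotplug.split(",") if f.strip() and f.strip() != feature]
    return ",".join(kept)


def qm_set_hotplug(vmid: int, hotplug: str) -> str:
    """Build (but do not execute) the qm command that writes the new hotplug value."""
    return shlex.join(["qm", "set", str(vmid), "--hotplug", hotplug])


if __name__ == "__main__":
    old = "disk,network,usb,memory,cpu"  # hypothetical previous value
    new = without_hotplug_feature(old)
    print(new)                        # disk,network,usb,cpu
    print(qm_set_hotplug(VMID, new))  # qm set 105 --hotplug disk,network,usb,cpu
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; on a cluster node the printed line can be reviewed and run by hand.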
Installed pve-qemu-kvm versions per cluster node:
pve-donna.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
pve-emil.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
pve-franz.cluster: ii pve-qemu-kvm 9.0.2-4 amd64 Full virtualization on x86 hardware
pve-gustav.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
pve-hans.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
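With more nodes, spotting the odd version by eye gets error-prone. The following sketch groups the per-node `dpkg -l` lines by version and reports nodes that differ from the most common one. It assumes the `node: <dpkg -l line>` format above, presumably produced by a parallel shell that prefixes each line with the host name; how the list was actually collected is not recorded. The function names are made up for this note.

```python
from collections import Counter
from typing import Dict

# Per-node dpkg -l lines, copied from the list above.
DPKG_LINES = """\
pve-donna.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
pve-emil.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
pve-franz.cluster: ii pve-qemu-kvm 9.0.2-4 amd64 Full virtualization on x86 hardware
pve-gustav.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
pve-hans.cluster: ii pve-qemu-kvm 8.1.5-5 amd64 Full virtualization on x86 hardware
"""


def qemu_versions(lines: str) -> Dict[str, str]:
    """Map node name -> installed pve-qemu-kvm version."""
    versions: Dict[str, str] = {}
    for line in lines.splitlines():
        if not line.strip():
            continue
        node, _, rest = line.partition(":")
        fields = rest.split()  # status, package, version, arch, description...
        if len(fields) >= 3 and fields[1] == "pve-qemu-kvm":
            versions[node.strip()] = fields[2]
    return versions


def outliers(versions: Dict[str, str]) -> Dict[str, str]:
    """Return the nodes whose version differs from the most common one."""
    majority, _ = Counter(versions.values()).most_common(1)[0]
    return {node: v for node, v in versions.items() if v != majority}


if __name__ == "__main__":
    for node, version in outliers(qemu_versions(DPKG_LINES)).items():
        print(f"{node} runs {version}")
```

For the list above it flags only pve-franz.cluster (9.0.2-4); the other four nodes, including pve-gustav, run 8.1.5-5.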
Symptoms are similar to: https://forum.proxmox.com/threads/memory-hotplug-prevents-vm-boot.122599/