general:ipkm_it_post_mortem


2025-02-06 mail server crash

The mail server crashed. The cause is not fully clear; it was working until around 15:05. The mail host still responded to pings, but SSH and all mail- and LDAP-related services were not working.

The VM had been moved to pve-gustav (which ran a different qemu-kvm version, see below) the evening before.

Somebody rebooted the VM around 15:08:

Feb  6 15:08:10 mail kernel: [    0.000000] Linux version 5.10.0-33-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.226-1 (2024-10-03)
...
Feb  6 15:08:10 mail kernel: [   26.811753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Feb  6 15:08:10 mail kernel: [   26.811783] Call Trace:
Feb  6 15:08:10 mail kernel: [   26.813554]  dump_stack+0x6b/0x83
Feb  6 15:08:10 mail kernel: [   26.814022]  dump_header+0x4a/0x1f4
Feb  6 15:08:10 mail kernel: [   26.814448]  oom_kill_process.cold+0xb/0x10
Feb  6 15:08:10 mail kernel: [   26.814827]  out_of_memory+0x1bd/0x4e0
Feb  6 15:08:10 mail kernel: [   26.815223]  __alloc_pages_slowpath.constprop.0+0xc02/0xcc0
Feb  6 15:08:10 mail kernel: [   26.815580]  __alloc_pages_nodemask+0x2de/0x310
Feb  6 15:08:10 mail kernel: [   26.815942]  pagecache_get_page+0x175/0x390
Feb  6 15:08:10 mail kernel: [   26.816294]  filemap_fault+0x6a2/0x900
Feb  6 15:08:10 mail kernel: [   26.816655]  ? xas_load+0x5/0x80
Feb  6 15:08:10 mail kernel: [   26.817069]  ext4_filemap_fault+0x2d/0x50 [ext4]
Feb  6 15:08:10 mail kernel: [   26.817430]  __do_fault+0x37/0x170
Feb  6 15:08:10 mail kernel: [   26.817754]  handle_mm_fault+0x124d/0x1c00
Feb  6 15:08:10 mail kernel: [   26.818145]  do_user_addr_fault+0x1b8/0x400
Feb  6 15:08:10 mail kernel: [   26.818484]  exc_page_fault+0x78/0x160
Feb  6 15:08:10 mail kernel: [   26.818785]  ? asm_exc_page_fault+0x8/0x30
Feb  6 15:08:10 mail kernel: [   26.819121]  asm_exc_page_fault+0x1e/0x30
Feb  6 15:08:10 mail kernel: [   26.819439] RIP: 0033:0x7f8e1fe9d386
Feb  6 15:08:10 mail kernel: [   26.819740] Code: Unable to access opcode bytes at RIP 0x7f8e1fe9d35c.
Feb  6 15:08:10 mail kernel: [   26.820066] RSP: 002b:00007fff7fde5570 EFLAGS: 00010202
Feb  6 15:08:10 mail kernel: [   26.820387] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
Feb  6 15:08:10 mail kernel: [   26.820707] RDX: 00007f8e1feb09a0 RSI: 0000000000000000 RDI: 00005576b32a0270
Feb  6 15:08:10 mail kernel: [   26.821115] RBP: 00005576b32a0270 R08: 00005576b32a0270 R09: 00007f8e1fe73be0
Feb  6 15:08:10 mail kernel: [   26.821465] R10: 00005576b32a0170 R11: 0000000000000070 R12: 00007fff7fde55bc
Feb  6 15:08:10 mail kernel: [   26.821764] R13: 0000000000000004 R14: 0000000000000000 R15: 00007fff7fde5890
Feb  6 15:08:10 mail kernel: [   26.822111] Mem-Info:
Feb  6 15:08:10 mail kernel: [   26.822431] active_anon:62 inactive_anon:4025 isolated_anon:0
Feb  6 15:08:10 mail kernel: [   26.822431]  active_file:132 inactive_file:35 isolated_file:0
Feb  6 15:08:10 mail kernel: [   26.822431]  unevictable:0 dirty:0 writeback:0
Feb  6 15:08:10 mail kernel: [   26.822431]  slab_reclaimable:3337 slab_unreclaimable:7300
Feb  6 15:08:10 mail kernel: [   26.822431]  mapped:138 shmem:739 pagetables:315 bounce:0
Feb  6 15:08:10 mail kernel: [   26.822431]  free:11707 free_pcp:0 free_cma:0
Feb  6 15:08:10 mail kernel: [   26.824222] Node 0 active_anon:248kB inactive_anon:16100kB active_file:284kB inactive_file:88kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:552kB dirty:0kB writeback:0kB shmem:2956kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:2832kB all_unreclaimable? no
Feb  6 15:08:10 mail kernel: [   26.824849] Node 0 DMA free:4128kB min:788kB low:984kB high:1180kB reserved_highatomic:0KB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:5076kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb  6 15:08:10 mail kernel: [   26.825570] lowmem_reserve[]: 0 844 844 844 844
Feb  6 15:08:10 mail kernel: [   26.825933] Node 0 DMA32 free:42896kB min:42940kB low:53672kB high:64404kB reserved_highatomic:0KB active_anon:248kB inactive_anon:16096kB active_file:48kB inactive_file:292kB unevictable:0kB writepending:0kB present:1032040kB managed:652036kB mlocked:0kB pagetables:1260kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb  6 15:08:10 mail kernel: [   26.827042] lowmem_reserve[]: 0 0 0 0 0
Feb  6 15:08:10 mail kernel: [   26.827439] Node 0 DMA: 26*4kB (M) 13*8kB (UM) 13*16kB (M) 10*32kB (UM) 1*64kB (M) 4*128kB (UM) 1*256kB (U) 1*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB = 4128kB
Feb  6 15:08:10 mail kernel: [   26.828236] Node 0 DMA32: 476*4kB (UME) 201*8kB (ME) 136*16kB (ME) 87*32kB (UME) 86*64kB (UME) 48*128kB (UME) 19*256kB (UME) 3*512kB (UM) 1*1024kB (U) 8*2048kB (UME) 0*4096kB = 43928kB
Feb  6 15:08:10 mail kernel: [   26.829155] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb  6 15:08:10 mail kernel: [   26.829594] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb  6 15:08:10 mail kernel: [   26.830021] 913 total pagecache pages
Feb  6 15:08:10 mail kernel: [   26.830427] 0 pages in swap cache
Feb  6 15:08:10 mail kernel: [   26.830779] Swap cache stats: add 0, delete 0, find 0/0
Feb  6 15:08:10 mail kernel: [   26.831302] Free swap  = 0kB
Feb  6 15:08:10 mail kernel: [   26.831726] Total swap = 0kB
Feb  6 15:08:10 mail kernel: [   26.832224] 262008 pages RAM
Feb  6 15:08:10 mail kernel: [   26.832793] 0 pages HighMem/MovableOnly
Feb  6 15:08:10 mail kernel: [   26.833272] 97730 pages reserved
Feb  6 15:08:10 mail kernel: [   26.833657] 0 pages hwpoisoned
Feb  6 15:08:10 mail kernel: [   26.834067] Unreclaimable slab info:
Feb  6 15:08:10 mail kernel: [   26.834458] Name                      Used          Total
Feb  6 15:08:10 mail kernel: [   26.834929] ext4_system_zone           3KB          3KB
Feb  6 15:08:10 mail kernel: [   26.835368] scsi_sense_cache         400KB        400KB
Feb  6 15:08:10 mail kernel: [   26.835764] RAWv6                     30KB         30KB
Feb  6 15:08:10 mail kernel: [   26.836221] UDPv6                     94KB         94KB
Feb  6 15:08:10 mail kernel: [   26.836631] mqueue_inode_cache         31KB         31KB
Feb  6 15:08:10 mail kernel: [   26.837034] UNIX                     382KB        382KB
Feb  6 15:08:10 mail kernel: [   26.837435] RAW                       32KB         32KB
Feb  6 15:08:10 mail kernel: [   26.837808] hugetlbfs_inode_cache         30KB         30KB
Feb  6 15:08:10 mail kernel: [   26.838234] eventpoll_pwq             47KB         47KB
Feb  6 15:08:10 mail kernel: [   26.838639] request_queue            411KB        506KB
Feb  6 15:08:10 mail kernel: [   26.839075] biovec-max               480KB        480KB
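
The Mem-Info figures in the trace above can be turned into sizes with a little arithmetic. The following is a minimal sketch, assuming 4 KiB pages (the usual x86-64 page size); the zone lines are abridged to the fields used here, and the function name `summarize` is made up for this example.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the usual x86-64 page size

# Values copied from the kernel log above (syslog prefix removed,
# zone lines abridged to the fields used here).
LOG = """\
Node 0 DMA free:4128kB min:788kB low:984kB high:1180kB present:15992kB managed:5076kB
Node 0 DMA32 free:42896kB min:42940kB low:53672kB high:64404kB present:1032040kB managed:652036kB
262008 pages RAM
97730 pages reserved
"""


def summarize(log):
    """Convert page counts to KiB and list zones below their min watermark."""
    total_pages = int(re.search(r"(\d+) pages RAM", log).group(1))
    reserved_pages = int(re.search(r"(\d+) pages reserved", log).group(1))

    managed_kib = 0
    below_min = []
    zone_re = r"Node \d+ (\w+) free:(\d+)kB min:(\d+)kB.*?managed:(\d+)kB"
    for m in re.finditer(zone_re, log):
        name = m.group(1)
        free, min_mark, managed = (int(m.group(i)) for i in (2, 3, 4))
        managed_kib += managed
        if free < min_mark:
            below_min.append(name)

    return {
        "total_kib": total_pages * PAGE_KIB,
        "reserved_kib": reserved_pages * PAGE_KIB,
        "usable_kib": (total_pages - reserved_pages) * PAGE_KIB,
        "managed_kib": managed_kib,
        "below_min": below_min,
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(key, value)
```

With the numbers from the log, the guest saw 1048032 KiB of RAM (just under 1 GiB) at this point in the boot, of which 657112 KiB was usable after reservations. That matches the sum of the `managed` values of the DMA and DMA32 zones, and the DMA32 zone was below its `min` watermark when the OOM killer ran. Whether this is less than the memory configured for the VM is not stated on this page.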

Attaching the console in the Proxmox web interface to check for terminal output revealed memory problems, consistent with the out-of-memory trace in the boot log above.

Fix: Disable the option for memory hotplug!

hotplug: disk,network,usb,cpu # previously also included memory
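
The change to the `hotplug:` line can be illustrated with a small text transformation. This is only a sketch: the function name `drop_hotplug_feature`, the sample `memory: 4096` line and the position of `memory` in the old list are assumptions for illustration, not taken from the actual VM config. On a real system the option would normally be changed through Proxmox itself (web interface or `qm set`) rather than by rewriting text.

```python
def drop_hotplug_feature(config_text, feature="memory"):
    """Return config_text with `feature` removed from the hotplug: line.

    All other lines are passed through unchanged.
    """
    out = []
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "hotplug":
            kept = [
                item.strip()
                for item in value.split(",")
                if item.strip() and item.strip() != feature
            ]
            line = "hotplug: " + ",".join(kept)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical "before" state; the real previous ordering is not known.
    before = "memory: 4096\nhotplug: disk,network,usb,memory,cpu"
    print(drop_hotplug_feature(before))
```

Only the `hotplug:` key is touched, so a separate `memory:` size setting (hypothetical here) stays as it is.
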

Installed pve-qemu-kvm versions on the cluster nodes:

pve-donna.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-emil.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-franz.cluster: ii  pve-qemu-kvm   9.0.2-4      amd64        Full virtualization on x86 hardware 
pve-gustav.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
pve-hans.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64        Full virtualization on x86 hardware
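
A listing like the one above can be checked for version mismatches automatically. The sketch below assumes the `node: ii  package  version ...` layout shown here; the function names are made up for this example.

```python
import re
from collections import defaultdict

# Listing copied from above (description column shortened).
LISTING = """\
pve-donna.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64
pve-emil.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64
pve-franz.cluster: ii  pve-qemu-kvm   9.0.2-4      amd64
pve-gustav.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64
pve-hans.cluster: ii  pve-qemu-kvm   8.1.5-5      amd64
"""


def versions_by_node(listing, package="pve-qemu-kvm"):
    """Map node name -> installed version of `package`."""
    pattern = re.compile(r"(\S+):\s+ii\s+" + re.escape(package) + r"\s+(\S+)")
    result = {}
    for line in listing.splitlines():
        m = pattern.match(line)
        if m:
            result[m.group(1)] = m.group(2)
    return result


def find_outliers(versions):
    """Return (most common version, {node: version} for all other nodes)."""
    groups = defaultdict(list)
    for node, version in versions.items():
        groups[version].append(node)
    majority = max(groups, key=lambda v: len(groups[v]))
    others = {n: v for n, v in versions.items() if v != majority}
    return majority, others


if __name__ == "__main__":
    majority, others = find_outliers(versions_by_node(LISTING))
    print("most common version:", majority)
    print("nodes that differ:", others)
```

For this listing the most common version is 8.1.5-5 and the only node that differs is pve-franz.cluster with 9.0.2-4; pve-gustav shows the same version as the majority.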
general/ipkm_it_post_mortem.1738919033.txt.gz · Last modified: 2025-02-07 10:03 by Markus Rosenstihl