Absent Member.

OES 11 SP 1 server constantly crashes

I have an OES 11 SP 1 server, fully patched as of yesterday, that crashes constantly after about an hour of service. It is running Kanaka (I have worked with Condrey support, and their stance is that this is a Linux/Novell issue). When it dies, various services begin to invoke the oom-killer until everything on the server is down. The configuration is as follows:

OES 11 SP 1
Server is a VM (VMware ESXi 5.0)
6GB of RAM
4 CPUs
Swap partition is /dev/sda2 and is 2GB

Output from free -k shows swap never being used, yet the oom-killer invocations appear to happen because no swap is available.
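One way to cross-check what free -k reports is to read the swap counters directly from /proc/meminfo while the server is still up. The following is only a minimal sketch, assuming the usual Linux /proc/meminfo layout; the function name swap_usage_kb is made up for illustration.

```python
def swap_usage_kb(meminfo_text):
    """Return (total_kb, free_kb, used_kb) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields["SwapTotal"]
    free = fields["SwapFree"]
    return total, free, total - free
```

On the server itself, it could be fed with open("/proc/meminfo").read() every minute or so (for example from cron) to see whether swap usage climbs in the run-up to a crash.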

I have included a small piece of the log file from the moment it crashes. The log file contains nothing else for the 15 minutes prior to the crash event.

Sep 11 08:40:41 dcam03n kernel: [ 3183.753474] vmtoolsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Sep 11 08:40:41 dcam03n kernel: [ 3183.753478] vmtoolsd cpuset=/ mems_allowed=0
Sep 11 08:40:41 dcam03n kernel: [ 3183.753482] Pid: 2891, comm: vmtoolsd Not tainted 2.6.32.59-0.7-default #1
Sep 11 08:40:41 dcam03n kernel: [ 3183.753483] Call Trace:
Sep 11 08:40:41 dcam03n kernel: [ 3183.753498] [<ffffffff810061dc>] dump_trace+0x6c/0x2d0
Sep 11 08:40:41 dcam03n kernel: [ 3183.753505] [<ffffffff8139bab6>] dump_stack+0x69/0x73
Sep 11 08:40:41 dcam03n kernel: [ 3183.753511] [<ffffffff810b90ec>] oom_kill_process+0xcc/0x2f0
Sep 11 08:40:41 dcam03n kernel: [ 3183.753515] [<ffffffff810b9770>] __out_of_memory+0x50/0xa0
Sep 11 08:40:41 dcam03n kernel: [ 3183.753519] [<ffffffff810b9958>] out_of_memory+0x198/0x210
Sep 11 08:40:41 dcam03n kernel: [ 3183.753522] [<ffffffff810bcf16>] __alloc_pages_slowpath+0x4b6/0x5f0
Sep 11 08:40:41 dcam03n kernel: [ 3183.753526] [<ffffffff810bd18a>] __alloc_pages_nodemask+0x13a/0x140
Sep 11 08:40:41 dcam03n kernel: [ 3183.753531] [<ffffffff810c065e>] __do_page_cache_readahead+0xce/0x220
Sep 11 08:40:42 dcam03n kernel: [ 3183.753536] [<ffffffff810c07cc>] ra_submit+0x1c/0x30
Sep 11 08:40:42 dcam03n kernel: [ 3183.753539] [<ffffffff810b73a3>] filemap_fault+0x3c3/0x3d0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753544] [<ffffffff810d00a7>] __do_fault+0x57/0x520
Sep 11 08:40:42 dcam03n kernel: [ 3183.753547] [<ffffffff810d4a29>] handle_mm_fault+0x199/0x430
Sep 11 08:40:42 dcam03n kernel: [ 3183.753553] [<ffffffff813a121f>] do_page_fault+0x1bf/0x3e0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753558] [<ffffffff8139eb3f>] page_fault+0x1f/0x30
Sep 11 08:40:42 dcam03n kernel: [ 3183.753577] [<00007fa819a441e6>] 0x7fa819a441e6
Sep 11 08:40:42 dcam03n kernel: [ 3183.753578] Mem-Info:
Sep 11 08:40:42 dcam03n kernel: [ 3183.753579] Node 0 DMA per-cpu:
Sep 11 08:40:42 dcam03n kernel: [ 3183.753582] CPU 0: hi: 0, btch: 1 usd: 0
Sep 11 08:40:42 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:42 dcam03n kernel: [ 3183.753583] CPU 1: hi: 0, btch: 1 usd: 0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753585] CPU 2: hi: 0, btch: 1 usd: 0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753586] CPU 3: hi: 0, btch: 1 usd: 0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753587] Node 0 DMA32 per-cpu:
Sep 11 08:40:42 dcam03n kernel: [ 3183.753589] CPU 0: hi: 186, btch: 31 usd: 0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753590] CPU 1: hi: 186, btch: 31 usd: 0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753592] CPU 2: hi: 186, btch: 31 usd: 0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753593] CPU 3: hi: 186, btch: 31 usd: 0
Sep 11 08:40:42 dcam03n kernel: [ 3183.753594] Node 0 Normal per-cpu:
Sep 11 08:40:42 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:43 dcam03n kernel: [ 3183.753596] CPU 0: hi: 186, btch: 31 usd: 30
Sep 11 08:40:43 dcam03n kernel: [ 3183.753597] CPU 1: hi: 186, btch: 31 usd: 0
Sep 11 08:40:43 dcam03n kernel: [ 3183.753599] CPU 2: hi: 186, btch: 31 usd: 30
Sep 11 08:40:43 dcam03n kernel: [ 3183.753600] CPU 3: hi: 186, btch: 31 usd: 0
Sep 11 08:40:43 dcam03n kernel: [ 3183.753604] active_anon:978359 inactive_anon:244663 isolated_anon:64
Sep 11 08:40:43 dcam03n kernel: [ 3183.753605] active_file:52 inactive_file:0 isolated_file:64
Sep 11 08:40:43 dcam03n kernel: [ 3183.753606] unevictable:0 dirty:0 writeback:66 unstable:0
Sep 11 08:40:43 dcam03n kernel: [ 3183.753606] free:9249 slab_reclaimable:3944 slab_unreclaimable:12688
Sep 11 08:40:43 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:43 dcam03n kernel: [ 3183.753607] mapped:995 shmem:873 pagetables:8521 bounce:0
Sep 11 08:40:44 dcam03n kernel: [ 3183.753609] Node 0 DMA free:15452kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15100kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 11 08:40:44 dcam03n kernel: [ 3183.753616] lowmem_reserve[]: 0 3000 6030 6030
Sep 11 08:40:44 dcam03n kernel: [ 3183.753619] Node 0 DMA32 free:16960kB min:4936kB low:6168kB high:7404kB active_anon:2349284kB inactive_anon:587532kB active_file:212kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:40kB mapped:0kB shmem:0kB slab_reclaimable:24kB slab_unreclaimable:404kB kernel_stack:56kB pagetables:7688kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep 11 08:40:44 dcam03n kernel: [ 3183.753627] lowmem_reserve[]: 0 0 3030 3030
Sep 11 08:40:44 dcam03n kernel: [ 3183.753629] Node 0 Normal free:4584kB min:4984kB low:6228kB high:7476kB active_anon:1564152kB inactive_anon:391120kB active_file:0kB inactive_file:20kB unevictable:0kB isolated(anon):256kB isolated(file):256kB present:3102720kB mlocked:0kB dirty:0kB writeback:224kB mapped:4060kB shmem:3492kB slab_reclaimable:15752kB slab_unreclaimable:50348kB kernel_stack:8160kB pagetables:26396kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep 11 08:40:44 dcam03n kernel: [ 3183.753637] lowmem_reserve[]: 0 0 0 0
Sep 11 08:40:44 dcam03n kernel: [ 3183.753639] Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15452kB
Sep 11 08:40:44 dcam03n kernel: [ 3183.753645] Node 0 DMA32: 12*4kB 11*8kB 17*16kB 6*32kB 0*64kB 3*128kB 1*256kB 3*512kB 4*1024kB 3*2048kB 1*4096kB = 17112kB
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:44 dcam03n kernel: [ 3183.753652] Node 0 Normal: 124*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 4592kB
Sep 11 08:40:44 dcam03n kernel: [ 3183.753658] 4603 total pagecache pages
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:44 dcam03n kernel: [ 3183.753659] 3619 pages in swap cache
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:44 dcam03n kernel: [ 3183.753660] Swap cache stats: add 557234, delete 553615, find 6674/9771
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:44 dcam03n kernel: [ 3183.753662] Free swap = 0kB
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly

Any help would be greatly appreciated. One more note: the same issue occurred on a server running OES 2 SP 3, prior to upgrading.
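For anyone reading the dump above: the global Mem-Info counters (active_anon, inactive_anon, free, and so on) are page counts, not kB. A rough sketch of the conversion follows, assuming 4 kB pages (the x86_64 default, which matches the 4kB base size in the buddy lists); the function name anon_memory_kb is hypothetical.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, the x86_64 default


def anon_memory_kb(log_text, page_kb=PAGE_KB):
    """Sum the global active_anon and inactive_anon page counts, in kB."""
    total_pages = 0
    for name in ("active_anon", "inactive_anon"):
        # The trailing \b skips the per-zone values, which carry a "kB" suffix.
        match = re.search(r"\b" + name + r":(\d+)\b", log_text)
        if match is None:
            raise ValueError("counter not found: " + name)
        total_pages += int(match.group(1))
    return total_pages * page_kb


sample = "active_anon:978359 inactive_anon:244663 isolated_anon:64"
print(anon_memory_kb(sample), "kB of anonymous memory")
```

If that reading is right, the values from this log come to roughly 4.7 GiB of anonymous (process) memory on a 6GB VM at the moment of the kill, which would suggest memory really was nearly exhausted at that point.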
Absent Member.

Another note: the server never really reaches 100% memory utilization; it never goes above 30%.
Knowledge Partner

slrico;2218687 wrote:
Another note: the server never really reaches 100% memory utilization; it never goes above 30%.


Have you tried setting the vm.lower_zone_protection kernel parameter?
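Which of these tunables a given kernel exposes can vary, so it may be worth checking what is actually present under /proc/sys/vm before editing /etc/sysctl.conf. A small sketch, not a definitive procedure: the helper name available_vm_tunables is made up, and lowmem_reserve_ratio is listed only as an assumption about the related tunable on newer kernels.

```python
import os


def available_vm_tunables(names, sysctl_dir="/proc/sys/vm"):
    """Map each tunable name to its current value, or None if it is not exposed."""
    result = {}
    for name in names:
        path = os.path.join(sysctl_dir, name)
        try:
            with open(path) as f:
                result[name] = f.read().strip()
        except OSError:
            result[name] = None  # missing or unreadable on this kernel
    return result
```

Calling available_vm_tunables(["lower_zone_protection", "lowmem_reserve_ratio"]) on the server would show which of the two exists and what it is currently set to.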

VMware KB: RHEL4 virtual machines running Oracle/Java randomly kill processes by OOM killer

Also check that no (too) low memory limits are set in the VM's settings, and check what the resource allocation is reporting in VMware, as your stats within the VM seem OK.

Running atop on the OES server might reveal more of what's going on at the memory level of the VM.
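If atop is not installed, a rough equivalent is to rank processes by resident memory from /proc/<pid>/status. The sketch below assumes the standard Name: and VmRSS: fields; the function names are made up for illustration.

```python
import glob


def parse_status(status_text):
    """Return (name, rss_kb) from /proc/<pid>/status text.

    rss_kb is None when there is no VmRSS line (e.g. kernel threads).
    """
    name, rss_kb = None, None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def top_rss(statuses, n=5):
    """Rank parsed (name, rss_kb) pairs by resident set size, largest first."""
    with_rss = [s for s in statuses if s[1] is not None]
    return sorted(with_rss, key=lambda s: s[1], reverse=True)[:n]


def read_all_statuses():
    """Parse the status file of every process currently visible in /proc."""
    statuses = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                statuses.append(parse_status(f.read()))
        except OSError:
            pass  # the process exited between listing and reading
    return statuses
```

Logging top_rss(read_all_statuses()) every few minutes during the hour before a crash should show which process, if any, is growing.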

-Willem
Absent Member.

Willem,

Yes, I had attempted setting lower_zone_protection, to no avail. I just uncommented that section again to see whether it makes a difference, because I think my earlier attempt was before I patched the server. My concern is this:

Sep 11 08:40:44 dcam03n kernel: [ 3183.753662] Free swap = 0kB
Sep 11 08:40:44 dcam03n [XTCOM]: novell-xsrvd: Server re-started after it terminated unexpectedly
Sep 11 08:40:44 dcam03n kernel: [ 3183.753662] Total swap = 2104312kB

It says free swap is 0kB and total swap is 2GB, yet I never see the swap getting used, even after the server dies. It always shows 0 used. I just can't figure this out.
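To make the comparison concrete, the swap figures can be pulled out of the oom-killer dump and set against what free reports. A minimal sketch, assuming the "Free swap = NkB" / "Total swap = NkB" format shown in the lines above; the function name swap_from_oom_log is hypothetical.

```python
import re


def swap_from_oom_log(log_text):
    """Extract swap totals (in kB) from an oom-killer dump, or None if absent."""
    free = re.search(r"Free swap\s*=\s*(\d+)kB", log_text)
    total = re.search(r"Total swap\s*=\s*(\d+)kB", log_text)
    if free is None or total is None:
        return None
    free_kb = int(free.group(1))
    total_kb = int(total.group(1))
    return {"total_kb": total_kb, "free_kb": free_kb, "used_kb": total_kb - free_kb}


log = (
    "kernel: [ 3183.753662] Free swap = 0kB\n"
    "kernel: [ 3183.753662] Total swap = 2104312kB\n"
)
print(swap_from_oom_log(log))
```

Taken at face value, the dump says all 2104312 kB of swap were in use at the instant of the kill. One possible reading, offered only as a guess, is that the swap is released again once the killed processes are gone, so a free -k run afterwards shows 0 used.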