/dev/mapper/vg0-root filled-up 100%

Hey all,

Our OES 2018 SP2 server's /dev/mapper/vg0-root partition suddenly filled up to 100% yesterday.

Yesterday I was able to fix it by running find / -size +500M -ls, which showed me a 4 GB ndsd.log file.

After removing that file, everything seemed to be fine again, until this morning.

The root partition is again 100% used, but I can't find any big files in /var or /tmp.

Because there is no free space left, OES is unable to start some services, which is very bad.

Any ideas how to solve this?

Cheers,

Tom

  •  du -amx /|sort -n|tail

    or increase the disk

    Everyone is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid. [A. Einstein]
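    The du one-liner mixes files and directories in its output. A files-only variant using find is sketched below; the scratch-directory demo, the 150M file and the 100M threshold are arbitrary examples, not from this thread, and -xdev keeps the search on one filesystem, like du's -x.

```shell
# Demo in a scratch directory, safe to run anywhere.
tmp=$(mktemp -d)
truncate -s 150M "$tmp/big.bin"   # sparse file with a large apparent size

# Files-only hunt: -type f skips directories, -xdev stays on this
# filesystem, -size +100M matches files bigger than 100 MiB.
find "$tmp" -xdev -type f -size +100M -exec ls -lh {} \;

rm -rf "$tmp"
```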

  • du -amx /|sort -n|tail doesn't show me any big files.

    I guess increasing the disk would not help, because it would just fill up again.

    I have 2 similar OES 2018 SP2 servers running; the other one has the same disk sizes and uses 6.3G of 20G for the root partition.

  • Yep, that means the top 10 entries by size are directories.

    Remove "tail" from the command:

    du -amx /|sort -n

    and you should notice some files in that list.

  • Unfortunately, that doesn't help me out.

  • A more systematic way is to see which folders have what in them:

    du -hx --max-depth=1 

    will give you the folder totals for the directory you are in, excluding additional file systems (so BTRFS defeats that strength); just remove the x to include all file system mounts.

    I start at the root and work my way through each suspect folder that appears larger than it should be. Compare with the other servers, as OES boxes should be fairly uniform in that respect.
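    To make that server-to-server (or day-to-day) comparison repeatable, the totals can be saved and diffed; a sketch, with arbitrary file names under /tmp:

```shell
# Megabyte totals (-m) are machine-comparable, unlike -h; sort by path
# so two runs line up. Save one run as the baseline...
du -mx --max-depth=1 / 2>/dev/null | sort -k2 > /tmp/du-baseline.txt

# ...then capture again later (or on the comparison server) and diff:
du -mx --max-depth=1 / 2>/dev/null | sort -k2 > /tmp/du-now.txt
diff /tmp/du-baseline.txt /tmp/du-now.txt   # only changed folders appear
```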

    --

    Andy of KonecnyConsulting.ca in Toronto

  • I have now started a backup KVM of the server to compare the files and folders.

    See the results:

    Backup from July 2021:

    / # du -hx --max-depth=1
    4.6M ./bin
    12M ./sbin
    20M ./tmp
    6.4G ./var
    2.8G ./opt
    19M ./lib64
    4.0K ./mnt
    152K ./home
    4.0K ./selinux
    16K ./lost+found
    765M ./lib
    125M ./root
    18M ./etc
    32K ./media
    2.3G ./usr
    52M ./srv
    13G .

    / # df -h
    Filesystem Size Used Avail Use% Mounted on
    devtmpfs 8.3G 0 8.3G 0% /dev
    tmpfs 8.3G 56K 8.3G 1% /dev/shm
    tmpfs 8.3G 9.5M 8.3G 1% /run
    tmpfs 8.3G 0 8.3G 0% /sys/fs/cgroup
    /dev/mapper/****--****--*--vg0-root 20G 13G 6.0G 68% /
    /dev/sda1 250M 92M 146M 39% /boot
    tmpfs 1.7G 0 1.7G 0% /run/user/480
    admin 4.0M 0 4.0M 0% /_admin
    tmpfs 1.7G 0 1.7G 0% /run/user/0

    and now the actual running KVM (between July and now I didn't do any updates or anything):

    :/ # du -hx --max-depth=1
    4.6M ./bin
    12M ./sbin
    12M ./tmp
    4.2G ./var
    3.3G ./opt
    19M ./lib64
    4.0K ./mnt
    152K ./home
    4.0K ./selinux
    16K ./lost+found
    817M ./lib
    964K ./root
    18M ./etc
    8.0K ./media
    2.4G ./usr
    52M ./srv
    11G .

    :/ # df -h
    Filesystem Size Used Avail Use% Mounted on
    devtmpfs 7.9G 0 7.9G 0% /dev
    tmpfs 7.9G 52K 7.9G 1% /dev/shm
    tmpfs 7.9G 34M 7.8G 1% /run
    tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
    /dev/mapper/****--****--*--vg0-root 27G 20G 6.0G 77% /
    /dev/vda1 250M 93M 145M 40% /boot
    tmpfs 1.6G 0 1.6G 0% /run/user/480
    admin 4.0M 0 4.0M 0% /_admin
    tmpfs 1.6G 0 1.6G 0% /run/user/0

    So the big question is: why does the vg0-root mapper show 20G used when du only shows 11G?

  • If you delete e.g. a logfile and do NOT bounce the corresponding daemon, the disk space won't get released, despite the fact that the file itself no longer shows up.
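    A minimal, Linux-specific demonstration of that effect, using a throwaway temp file rather than anything on this server:

```shell
tmpf=$(mktemp)
exec 3>"$tmpf"        # some process keeps a file descriptor open on the file
rm "$tmpf"            # unlink it: ls no longer shows the file...
ls -l /proc/$$/fd/3   # ...but /proc still shows it, marked "(deleted)",
                      # and df keeps counting its blocks as used
exec 3>&-             # only closing the fd (or restarting the daemon)
                      # actually releases the space
```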

  • OK, thanks for the information. What is your recommendation?
    I rebooted the server twice this week (not because of this problem);
    shouldn't a reboot also solve it?
    How can I figure out which daemon is still causing this?

  • A reboot should clean things up, of course.

    You can see such files with, e.g.:

    lsof +L1

    or

    lsof |grep deleted

    Sometimes the "culprit" is something you don't think of in the first place, such as syslog.
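    A hedged refinement for picking the daemon out of that output: sort the deleted entries by size. The field numbers below assume the usual lsof column layout (SIZE/OFF in column 7), which can vary between lsof versions.

```shell
# Size, command, PID and path of every deleted-but-open file, largest first.
lsof -nP +L1 2>/dev/null |
  awk 'NR > 1 { print $7, $1, $2, $NF }' |
  sort -rn | head
# Once the process is identified, restart its service to release the space,
# e.g. systemctl restart <service> (service name depends on the culprit).
```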

  • Verified Answer

    OK, I finally solved it.

    I've got a couple of NSS volumes which are mounted under /media.

    One of the volumes receives daily backups, but it seems that 9 days ago this volume somehow lost its mountpoint. So the system started writing files locally into the media folder instead of to the NSS volume. This was really hard to figure out, because apparently the NSS volume was automatically remounted again after one day, and so the locally saved files became "invisible".

    Anyway, thanks for your input and support. Tomorrow I'll try my luck with the iManager problem.
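    For future reference, files hidden underneath an active mountpoint can be inspected without unmounting anything, via a bind mount of / onto a scratch directory (requires root; /mnt/rootonly is an arbitrary name):

```shell
mkdir -p /mnt/rootonly
mount --bind / /mnt/rootonly    # a second view of /, with nothing mounted on top
du -sh /mnt/rootonly/media/*    # what actually lives on the root fs under /media
umount /mnt/rootonly
rmdir /mnt/rootonly
```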