aha/fs
Kentaro Makita da3bbdd463 fix soft lock up at NFS mount via per-SB LRU-list of unused dentries
[Summary]

 Split LRU-list of unused dentries to one per superblock to avoid soft
 lock up during NFS mounts and remounting of any filesystem.

 Previously I posted here:
 http://lkml.org/lkml/2008/3/5/590

[Descriptions]

- background

  dentry_unused is a list of dentries which are not referenced.
  dentry_unused grows up when references on directories or files are
  released.  This list can be very long if there is huge free memory.

- the problem

  When shrink_dcache_sb() is called, it scans all dentry_unused linearly
  under spin_lock(), and if dentry->d_sb is differnt from given
  superblock, scan next dentry.  This scan costs very much if there are
  many entries, and very ineffective if there are many superblocks.

  IOW, When we need to shrink unused dentries on one dentry, but scans
  unused dentries on all superblocks in the system.  For example, we scan
  500 dentries to unmount a filesystem, but scans 1,000,000 or more unused
  dentries on other superblocks.

  In our case , At mounting NFS*, shrink_dcache_sb() is called to shrink
  unused dentries on NFS, but scans 100,000,000 unused dentries on
  superblocks in the system such as local ext3 filesystems.  I hear NFS
  mounting took 1 min on some system in use.

* : NFS uses virtual filesystem in rpc layer, so NFS is affected by
  this problem.

  100,000,000 is possible number on large systems.

  Per-superblock LRU of unused dentried can reduce the cost in
  reasonable manner.

- How to fix

  I found this problem is solved by David Chinner's "Per-superblock
  unused dentry LRU lists V3"(1), so I rebase it and add some fix to
  reclaim with fairness, which is in Andrew Morton's comments(2).

  1) http://lkml.org/lkml/2006/5/25/318
  2) http://lkml.org/lkml/2006/5/25/320

  Split LRU-list of unused dentries to each superblocks.  Then, NFS
  mounting will check dentries under a superblock instead of all.  But
  this spliting will break LRU of dentry-unused.  So, I've attempted to
  make reclaim unused dentrins with fairness by calculate number of
  dentries to scan on this sb based on following way

  number of dentries to scan on this sb =
  count * (number of dentries on this sb / number of dentries in the machine)

- ToDo
 - I have to measuring performance number and do stress tests.

 - When unmount occurs during prune_dcache(), scanning on same
  superblock, It is unable to reach next superblock because it is gone
  away.  We restart scannig superblock from first one, it causes
  unfairness of reclaim unused dentries on first superblock.  But I think
  this happens very rarely.

- Test Results

  Result on 6GB boxes with excessive unused dentries.

Without patch:

$ cat /proc/sys/fs/dentry-state
10181835        10180203        45      0       0       0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real    0m1.830s
user    0m0.001s
sys     0m1.653s

 With this patch:
$ cat /proc/sys/fs/dentry-state
10236610        10234751        45      0       0       0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real    0m0.106s
user    0m0.002s
sys     0m0.032s

[akpm@linux-foundation.org: fix comments]
Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: David Chinner <dgc@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
..
9p 9p: fix O_APPEND in legacy mode 2008-07-03 09:59:03 -05:00
adfs fs: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:54 -07:00
affs [PATCH] fix reservation discarding in affs 2008-05-06 13:45:33 -04:00
afs Fix various old email addresses for dwmw2 2008-06-06 11:29:10 -07:00
autofs mount options: fix autofs 2008-02-08 09:22:40 -08:00
autofs4 autofs: path_{get,put}() cleanups 2008-05-01 08:04:01 -07:00
befs byteorder: don't directly include linux/byteorder/generic.h 2008-05-16 12:01:45 -07:00
bfs fs: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:54 -07:00
cifs Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
coda device create: coda: convert device_create to device_create_drvdata 2008-07-21 21:54:41 -07:00
configfs configfs: Allow ->make_item() and ->make_group() to return detailed errors. 2008-07-17 15:21:29 -07:00
cramfs fs: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:16:44 -04:00
debugfs debugfs: Implement debugfs_remove_recursive() 2008-07-21 21:54:59 -07:00
devpts devpts: factor out PTY index allocation 2008-04-30 08:29:48 -07:00
dlm configfs: Allow ->make_item() and ->make_group() to return detailed errors. 2008-07-17 15:21:29 -07:00
ecryptfs Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
efs efs: update error msg to not refer to deleted read_inode() 2008-04-02 15:28:19 -07:00
exportfs fs: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:54 -07:00
ext2 ext2: retry block allocation if new blocks are allocated from system zone 2008-04-28 08:58:43 -07:00
ext3 ext3: add missing unlock to error path in ext3_quota_write() 2008-07-04 10:40:05 -07:00
ext4 ext4: do not set extents feature from the kernel 2008-07-11 19:27:31 -04:00
fat Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
freevxfs fs/freevxfs/: proper externs 2008-04-29 08:06:00 -07:00
fuse fuse: fix thinko in max I/O size calucation 2008-06-17 18:08:10 -07:00
gfs2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw 2008-07-15 10:38:46 -07:00
hfs hfs: fix warning with 64k PAGE_SIZE 2008-04-30 08:29:52 -07:00
hfsplus Fix hfsplus oops on image without extents 2008-05-13 08:02:24 -07:00
hostfs uml: fix hostfs tv_usec calculations 2008-02-05 09:44:30 -08:00
hpfs mount options: fix hpfs 2008-02-08 09:22:40 -08:00
hppfs fix hppfs Makefile breakage 2008-05-21 16:55:58 -07:00
hugetlbfs mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
isofs isofs: fix access to unallocated memory when reading corrupted filesystem 2008-04-30 08:29:33 -07:00
jbd jbd: need to hold j_state_lock to updates to transaction t_state to T_COMMIT 2008-05-14 19:11:14 -07:00
jbd2 ext4: Add ordered mode support for delalloc 2008-07-11 19:27:31 -04:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2008-05-01 11:15:28 -07:00
jfs jfs: remove DIRENTSIZ 2008-06-10 15:12:58 -05:00
lockd Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux 2008-07-20 21:21:46 -07:00
minix iget: stop the MINIX filesystem from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
msdos Replace BKL with superblock lock in fat/msdos/vfat 2008-06-20 14:05:54 -06:00
ncpfs Remove BKL from remote_llseek v2 2008-07-02 15:06:27 -06:00
nfs Merge branch 'bkl-removal' into next 2008-07-15 18:34:58 -04:00
nfs_common
nfsd Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux 2008-07-20 21:21:46 -07:00
nls
ntfs ntfs: le*_add_cpu conversion 2008-05-24 09:56:08 -07:00
ocfs2 configfs: Allow ->make_item() and ->make_group() to return detailed errors. 2008-07-17 15:21:29 -07:00
openpromfs iget: stop OPENPROMFS from using iget() and read_inode() 2008-02-07 08:42:29 -08:00
partitions driver core: remove KOBJ_NAME_LEN define 2008-07-21 21:54:52 -07:00
proc mm/vmstat.c: proper externs 2008-07-24 10:47:14 -07:00
qnx4 iget: stop QNX4 from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
ramfs ramfs: enable splice write 2008-07-04 09:52:14 +02:00
reiserfs reiserfs: discard prealloc in reiserfs_delete_inode 2008-07-08 12:39:31 -07:00
romfs ROMFS: Fix up an error in iget removal 2008-03-19 18:53:36 -07:00
smbfs Remove BKL from remote_llseek v2 2008-07-02 15:06:27 -06:00
sysfs driver core: Suppress sysfs warnings for device_rename(). 2008-07-21 21:55:01 -07:00
sysv sysv: [bl]e*_add_cpu conversion 2008-04-30 08:29:52 -07:00
ubifs UBIFS: include to compilation 2008-07-15 17:35:24 +03:00
udf udf: Fix regression in UDF anchor block detection 2008-06-24 11:38:03 +02:00
ufs ufs: remove unneeded ufs_put_inode prototype 2008-05-13 08:02:23 -07:00
vfat Replace BKL with superblock lock in fat/msdos/vfat 2008-06-20 14:05:54 -06:00
xfs Fix reference counting race on log buffers 2008-07-11 11:37:18 -07:00
aio.c uml: activate_mm: remove the dead PF_BORROWED_MM check 2008-06-06 11:36:22 -07:00
anon_inodes.c [PATCH] sanitize anon_inode_getfd() 2008-05-01 13:08:50 -04:00
attr.c
bad_inode.c iget: introduce a function to register iget failure 2008-02-07 08:42:26 -08:00
binfmt_aout.c fs/binfmt_aout.c: use printk_ratelimit() 2008-04-29 08:06:04 -07:00
binfmt_elf.c execve filename: document and export via auxiliary vector 2008-07-22 09:59:40 -07:00
binfmt_elf_fdpic.c nommu: fix ksize() abuse 2008-06-06 11:29:13 -07:00
binfmt_em86.c binfmt_misc.c: avoid potential kernel stack overflow 2008-04-29 08:06:04 -07:00
binfmt_flat.c nommu: fix ksize() abuse 2008-06-06 11:29:13 -07:00
binfmt_misc.c binfmt_misc.c: avoid potential kernel stack overflow 2008-04-29 08:06:04 -07:00
binfmt_script.c binfmt_misc.c: avoid potential kernel stack overflow 2008-04-29 08:06:04 -07:00
binfmt_som.c [PATCH] sanitize handling of shared descriptor tables in failing execve() 2008-04-25 09:23:53 -04:00
bio-integrity.c block: integrity checkpatch cleanups 2008-07-03 13:21:13 +02:00
bio.c Add bvec_merge_data to handle stacked devices and ->merge_bvec() 2008-07-03 13:21:15 +02:00
block_dev.c [PATCH] fix cgroup-inflicted breakage in block_dev.c 2008-06-23 08:30:55 -04:00
buffer.c Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
char_dev.c Remove the lock_kernel() call from chrdev_open() 2008-06-20 14:05:53 -06:00
compat.c [PATCH] get rid of leak in compat_execve() 2008-05-16 17:23:05 -04:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2008-07-19 00:30:39 -07:00
dcache.c fix soft lock up at NFS mount via per-SB LRU-list of unused dentries 2008-07-24 10:47:15 -07:00
dcookies.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
direct-io.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
dnotify.c [PATCH] split linux/file.h 2008-05-01 13:08:16 -04:00
dquot.c quota: don't call sync_fs() from vfs_quota_off() when there's no quota turn off 2008-05-13 08:02:23 -07:00
drop_caches.c vfs: skip inodes without pages to free in drop_pagecache_sb() 2008-04-29 08:06:05 -07:00
eventfd.c [PATCH] sanitize anon_inode_getfd() 2008-05-01 13:08:50 -04:00
eventpoll.c [PATCH] sanitize anon_inode_getfd() 2008-05-01 13:08:50 -04:00
exec.c mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
fcntl.c Call fasync() functions without the BKL 2008-07-02 15:06:28 -06:00
fifo.c
file.c [PATCH] avoid multiplication overflows and signedness issues for max_fds 2008-05-16 17:22:52 -04:00
file_table.c [PATCH] split linux/file.h 2008-05-01 13:08:16 -04:00
filesystems.c
fs-writeback.c VFS: export sync_sb_inodes 2008-07-14 19:10:52 +03:00
generic_acl.c
inode.c VFS: fix unused variable warning 2008-05-06 13:13:37 -07:00
inotify.c
inotify_user.c Remove duplicated unlikely() in IS_ERR() 2008-04-29 08:06:25 -07:00
internal.h [PATCH] move a bunch of declarations to fs/internal.h 2008-04-21 23:11:01 -04:00
ioctl.c make vfs_ioctl() static 2008-04-29 08:06:00 -07:00
ioprio.c
Kconfig Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 2008-07-17 10:55:51 -07:00
Kconfig.binfmt frv: don't offer BINFMT_FLAT 2008-06-06 11:29:08 -07:00
libfs.c add kernel-doc for simple_read_from_buffer and memory_read_from_buffer 2008-07-04 10:40:07 -07:00
locks.c [patch 4/4] flock: remove unused fields from file_lock_operations 2008-06-23 11:52:30 -04:00
Makefile Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6 2008-07-16 15:02:57 -07:00
mbcache.c vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs 2008-04-15 19:35:41 -07:00
mpage.c vfs: add hooks for ext4's delayed allocation support 2008-07-11 19:27:31 -04:00
namei.c [patch 3/4] vfs: fix ERR_PTR abuse in generic_readlink 2008-06-23 11:52:30 -04:00
namespace.c LSM/SELinux: show LSM mount options in /proc/mounts 2008-07-14 15:02:05 +10:00
nfsctl.c Introduce path_put() 2008-02-14 21:13:33 -08:00
no-block.c
open.c security: filesystem capabilities: fix fragile setuid fixup code 2008-07-04 10:40:08 -07:00
pipe.c [patch 1/4] vfs: path_{get,put}() cleanups 2008-06-23 11:52:29 -04:00
pnode.c [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
pnode.h [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
posix_acl.c
quota.c quota: quota core changes for quotaon on remount 2008-04-28 08:58:33 -07:00
quota_v1.c quota: do not allow setting of quota limits to too high values 2008-04-28 08:58:32 -07:00
quota_v2.c quota: le*_add_cpu conversion 2008-04-30 08:29:51 -07:00
read_write.c Remove BKL from remote_llseek v2 2008-07-02 15:06:27 -06:00
read_write.h
readdir.c
select.c Fix performance regression on lmbench select benchmark 2008-06-22 12:23:15 -07:00
seq_file.c [patch 2/7] vfs: mountinfo: add seq_file_root() 2008-04-23 00:04:38 -04:00
signalfd.c [PATCH] sanitize anon_inode_getfd() 2008-05-01 13:08:50 -04:00
splice.c splice: fix generic_file_splice_read() race with page invalidation 2008-07-04 09:52:14 +02:00
stack.c
stat.c Introduce path_put() 2008-02-14 21:13:33 -08:00
super.c fix soft lock up at NFS mount via per-SB LRU-list of unused dentries 2008-07-24 10:47:15 -07:00
sync.c vfs: fix unconditional write_super() call in file_fsync() 2008-04-29 08:06:06 -07:00
timerfd.c [PATCH] sanitize anon_inode_getfd() 2008-05-01 13:08:50 -04:00
utimes.c [patch for 2.6.26 4/4] vfs: utimensat(): fix write access check for futimens() 2008-06-23 08:43:52 -04:00
xattr.c xattr: add missing consts to function arguments 2008-04-29 08:06:06 -07:00
xattr_acl.c