aha/mm
KOSAKI Motohiro 8891d6da17 mm: remove lru_add_drain_all() from the munlock path
lockdep warns about following message at boot time on one of my test
machine.  Then, schedule_on_each_cpu() sholdn't be called when the task
have mmap_sem.

Actually, lru_add_drain_all() exist to prevent the unevictalble pages
stay on reclaimable lru list.  but currenct unevictable code can rescue
unevictable pages although it stay on reclaimable list.

So removing is better.

In addition, this patch add lru_add_drain_all() to sys_mlock() and
sys_mlockall().  it isn't must.  but it reduce the failure of moving to
unevictable list.  its failure can rescue in vmscan later.  but reducing
is better.

Note, if above rescuing happend, the Mlocked and the Unevictable field
mismatching happend in /proc/meminfo.  but it doesn't cause any real
trouble.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.28-rc2-mm1 #2
-------------------------------------------------------
lvm/1103 is trying to acquire lock:
 (&cpu_hotplug.lock){--..}, at: [<c0130789>] get_online_cpus+0x29/0x50

but task is already holding lock:
 (&mm->mmap_sem){----}, at: [<c01878ae>] sys_mlockall+0x4e/0xb0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){----}:
       [<c0153da2>] check_noncircular+0x82/0x110
       [<c0185e6a>] might_fault+0x4a/0xa0
       [<c0156161>] validate_chain+0xb11/0x1070
       [<c0185e6a>] might_fault+0x4a/0xa0
       [<c0156923>] __lock_acquire+0x263/0xa10
       [<c015714c>] lock_acquire+0x7c/0xb0			(*) grab mmap_sem
       [<c0185e6a>] might_fault+0x4a/0xa0
       [<c0185e9b>] might_fault+0x7b/0xa0
       [<c0185e6a>] might_fault+0x4a/0xa0
       [<c0294dd0>] copy_to_user+0x30/0x60
       [<c01ae3ec>] filldir+0x7c/0xd0
       [<c01e3a6a>] sysfs_readdir+0x11a/0x1f0			(*) grab sysfs_mutex
       [<c01ae370>] filldir+0x0/0xd0
       [<c01ae370>] filldir+0x0/0xd0
       [<c01ae4c6>] vfs_readdir+0x86/0xa0			(*) grab i_mutex
       [<c01ae75b>] sys_getdents+0x6b/0xc0
       [<c010355a>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

-> #2 (sysfs_mutex){--..}:
       [<c0153da2>] check_noncircular+0x82/0x110
       [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
       [<c0156161>] validate_chain+0xb11/0x1070
       [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
       [<c0156923>] __lock_acquire+0x263/0xa10
       [<c015714c>] lock_acquire+0x7c/0xb0			(*) grab sysfs_mutex
       [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
       [<c04f8b55>] mutex_lock_nested+0xa5/0x2f0
       [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
       [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
       [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
       [<c01e422f>] create_dir+0x3f/0x90
       [<c01e42a9>] sysfs_create_dir+0x29/0x50
       [<c04faaf5>] _spin_unlock+0x25/0x40
       [<c028f21d>] kobject_add_internal+0xcd/0x1a0
       [<c028f37a>] kobject_set_name_vargs+0x3a/0x50
       [<c028f41d>] kobject_init_and_add+0x2d/0x40
       [<c019d4d2>] sysfs_slab_add+0xd2/0x180
       [<c019d580>] sysfs_add_func+0x0/0x70
       [<c019d5dc>] sysfs_add_func+0x5c/0x70			(*) grab slub_lock
       [<c01400f2>] run_workqueue+0x172/0x200
       [<c014008f>] run_workqueue+0x10f/0x200
       [<c0140bd0>] worker_thread+0x0/0xf0
       [<c0140c6c>] worker_thread+0x9c/0xf0
       [<c0143c80>] autoremove_wake_function+0x0/0x50
       [<c0140bd0>] worker_thread+0x0/0xf0
       [<c0143972>] kthread+0x42/0x70
       [<c0143930>] kthread+0x0/0x70
       [<c01042db>] kernel_thread_helper+0x7/0x1c
       [<ffffffff>] 0xffffffff

-> #1 (slub_lock){----}:
       [<c0153d2d>] check_noncircular+0xd/0x110
       [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
       [<c0156161>] validate_chain+0xb11/0x1070
       [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
       [<c015433d>] mark_lock+0x35d/0xd00
       [<c0156923>] __lock_acquire+0x263/0xa10
       [<c015714c>] lock_acquire+0x7c/0xb0
       [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
       [<c04f93a3>] down_read+0x43/0x80
       [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0		(*) grab slub_lock
       [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
       [<c04fd9ac>] notifier_call_chain+0x3c/0x70
       [<c04f5454>] _cpu_up+0x84/0x110
       [<c04f552b>] cpu_up+0x4b/0x70				(*) grab cpu_hotplug.lock
       [<c06d1530>] kernel_init+0x0/0x170
       [<c06d15e5>] kernel_init+0xb5/0x170
       [<c06d1530>] kernel_init+0x0/0x170
       [<c01042db>] kernel_thread_helper+0x7/0x1c
       [<ffffffff>] 0xffffffff

-> #0 (&cpu_hotplug.lock){--..}:
       [<c0155bff>] validate_chain+0x5af/0x1070
       [<c040f7e0>] dev_status+0x0/0x50
       [<c0156923>] __lock_acquire+0x263/0xa10
       [<c015714c>] lock_acquire+0x7c/0xb0
       [<c0130789>] get_online_cpus+0x29/0x50
       [<c04f8b55>] mutex_lock_nested+0xa5/0x2f0
       [<c0130789>] get_online_cpus+0x29/0x50
       [<c0130789>] get_online_cpus+0x29/0x50
       [<c017bc30>] lru_add_drain_per_cpu+0x0/0x10
       [<c0130789>] get_online_cpus+0x29/0x50			(*) grab cpu_hotplug.lock
       [<c0140cf2>] schedule_on_each_cpu+0x32/0xe0
       [<c0187095>] __mlock_vma_pages_range+0x85/0x2c0
       [<c0156945>] __lock_acquire+0x285/0xa10
       [<c0188f09>] vma_merge+0xa9/0x1d0
       [<c0187450>] mlock_fixup+0x180/0x200
       [<c0187548>] do_mlockall+0x78/0x90			(*) grab mmap_sem
       [<c01878e1>] sys_mlockall+0x81/0xb0
       [<c010355a>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by lvm/1103:
 #0:  (&mm->mmap_sem){----}, at: [<c01878ae>] sys_mlockall+0x4e/0xb0

stack backtrace:
Pid: 1103, comm: lvm Not tainted 2.6.28-rc2-mm1 #2
Call Trace:
 [<c01555fc>] print_circular_bug_tail+0x7c/0xd0
 [<c0155bff>] validate_chain+0x5af/0x1070
 [<c040f7e0>] dev_status+0x0/0x50
 [<c0156923>] __lock_acquire+0x263/0xa10
 [<c015714c>] lock_acquire+0x7c/0xb0
 [<c0130789>] get_online_cpus+0x29/0x50
 [<c04f8b55>] mutex_lock_nested+0xa5/0x2f0
 [<c0130789>] get_online_cpus+0x29/0x50
 [<c0130789>] get_online_cpus+0x29/0x50
 [<c017bc30>] lru_add_drain_per_cpu+0x0/0x10
 [<c0130789>] get_online_cpus+0x29/0x50
 [<c0140cf2>] schedule_on_each_cpu+0x32/0xe0
 [<c0187095>] __mlock_vma_pages_range+0x85/0x2c0
 [<c0156945>] __lock_acquire+0x285/0xa10
 [<c0188f09>] vma_merge+0xa9/0x1d0
 [<c0187450>] mlock_fixup+0x180/0x200
 [<c0187548>] do_mlockall+0x78/0x90
 [<c01878e1>] sys_mlockall+0x81/0xb0
 [<c010355a>] syscall_call+0x7/0xb

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-12 17:17:16 -08:00
..
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c misc: replace __FUNCTION__ with __func__ 2008-10-16 11:21:30 -07:00
bounce.c highmem: use bio_has_data() in the bounce path 2008-10-09 08:56:01 +02:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
filemap.c fs: remove prepare_write/commit_write 2008-10-30 11:38:45 -07:00
filemap_xip.c mm: xip/ext2 fix block allocation race 2008-08-20 15:40:32 -07:00
fremap.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
highmem.c x86, pat: avoid highmem cache attribute aliasing 2008-08-15 17:22:57 +02:00
hugetlb.c hugetlb: make unmap_ref_private multi-size-aware 2008-11-12 17:17:16 -08:00
internal.h hugetlb: pull gigantic page initialisation out of the default path 2008-11-06 15:41:18 -08:00
Kconfig Unevictable LRU Infrastructure 2008-10-20 08:50:26 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c madvise: update function comment of madvise_dontneed 2008-07-30 09:41:45 -07:00
Makefile memcg: allocate all page_cgroup at boot 2008-10-20 08:52:39 -07:00
memcontrol.c memcg: fix page_cgroup allocation 2008-10-23 08:55:02 -07:00
memory.c mm: remove duplicated #include's 2008-10-20 16:17:42 -07:00
memory_hotplug.c memory hotplug: release memory regions in PAGES_PER_SECTION chunks 2008-10-20 08:52:32 -07:00
mempolicy.c mm: move migrate_prep out from under mmap_sem 2008-11-06 15:41:18 -08:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm: move migrate_prep out from under mmap_sem 2008-11-06 15:41:18 -08:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c mm: remove lru_add_drain_all() from the munlock path 2008-11-12 17:17:16 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c parisc: fix find_extend_vma() breakage 2008-11-12 10:37:48 -08:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: mark the correct zone as full when scanning zonelists 2008-09-13 14:41:52 -07:00
mprotect.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mremap.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c nfsd: fix vm overcommit crash 2008-10-30 11:38:47 -07:00
oom_kill.c mm/oom_kill.c: fix badness() kerneldoc 2008-11-06 15:41:19 -08:00
page-writeback.c vmscan: split LRU lists into anon & file sets 2008-10-20 08:50:25 -07:00
page_alloc.c cpusets: update mems allowed in page allocator 2008-11-12 17:17:16 -08:00
page_cgroup.c memcg: fix page_cgroup allocation 2008-10-23 08:55:02 -07:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c mm: size of quicklists shouldn't be proportional to the number of CPUs 2008-09-02 19:21:38 -07:00
readahead.c vmscan: split LRU lists into anon & file sets 2008-10-20 08:50:25 -07:00
rmap.c make mm/rmap.c:anon_vma_cachep static 2008-10-20 08:52:40 -07:00
shmem.c nfsd: fix vm overcommit crash 2008-10-30 11:38:47 -07:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
slab.c proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c 2008-10-23 15:20:06 +04:00
slob.c SLOB: fix bogus ksize calculation fix 2008-10-09 12:18:27 -07:00
slub.c proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c 2008-10-23 15:20:06 +04:00
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c mm/sparse.c: removed duplicated include 2008-08-12 16:07:30 -07:00
swap.c swap: cull unevictable pages in fault path 2008-10-20 08:52:31 -07:00
swap_state.c mm: pagecache insertion fewer atomics 2008-10-20 08:52:31 -07:00
swapfile.c mm: page lock use lock bitops 2008-10-20 08:52:32 -07:00
thrash.c
tiny-shmem.c Export tiny shmem_file_setup for DRM-GEM 2008-10-20 16:17:42 -07:00
truncate.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
util.c mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
vmalloc.c vmap: cope with vm_unmap_aliases before vmalloc_init() 2008-11-07 10:05:59 +01:00
vmscan.c mm: unlockless reclaim 2008-10-20 08:52:32 -07:00
vmstat.c proc: move /proc/zoneinfo boilerplate to mm/vmstat.c 2008-10-23 17:35:04 +04:00