aha/drivers/md
Daniel Kobras c06aad854f [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
On an nForce4-equipped machine with two SATA disks in a raid1 setup using
dmraid, we experienced frequent deadlocks of the system under high i/o load.
'cat /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
after a few GB, 'cp' would be left in 'D' state along with kjournald and
kmirrord.  The functions in which cp and kjournald were blocked varied, but
kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
that this problem had been around even before then.

So much for the facts, here's my interpretation: mempool_alloc() first tries
to atomically allocate the requested memory, or falls back to hand out
preallocated chunks from the mempool.  If both fail, it puts the calling
process (kmirrord in this case) on a private waitqueue until somebody refills
the pool.  But the only 'somebody' is kmirrord itself, so we have a
deadlock.
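
The three-step behaviour described above can be modelled in userspace.  This
is only a sketch under stated assumptions, not the kernel's mempool code:
'toy_pool', 'toy_pool_alloc()' and the 'allocator_fails' switch are
hypothetical names invented for illustration, and the real waitqueue is
replaced by a return code so the would-block case can be observed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy userspace model (hypothetical names); NOT the kernel's mempool. */
struct toy_pool {
    void  *reserve[4];       /* preallocated elements */
    size_t nr_free;          /* reserve elements still available */
    size_t elem_size;        /* size of one element */
    bool   allocator_fails;  /* simulate memory pressure */
};

enum toy_result { TOY_FRESH, TOY_RESERVE, TOY_WOULD_WAIT, TOY_FAILED };

/* Stand-in for the non-blocking allocation attempt. */
static void *toy_try_alloc(struct toy_pool *p)
{
    return p->allocator_fails ? NULL : malloc(p->elem_size);
}

enum toy_result toy_pool_alloc(struct toy_pool *p, bool may_wait, void **out)
{
    /* Step 1: try a fresh allocation without blocking. */
    *out = toy_try_alloc(p);
    if (*out)
        return TOY_FRESH;

    /* Step 2: hand out a preallocated element if one is left. */
    if (p->nr_free > 0) {
        *out = p->reserve[--p->nr_free];
        return TOY_RESERVE;
    }

    /*
     * Step 3: both failed.  A caller that may wait would now sleep until
     * somebody refills the pool.  If the caller is the only one who ever
     * refills it, that sleep never ends: this is the deadlock.
     */
    return may_wait ? TOY_WOULD_WAIT : TOY_FAILED;
}
```

In this model, TOY_WOULD_WAIT marks the point at which kmirrord would have
been put on the waitqueue.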

I worked around this problem by falling back to a (blocking) kmalloc in the
case where kmirrord would previously have ended up on the waitqueue.  This
defeats part of
the benefits of using the mempool, but at least keeps the system running.  And
it could be done with a two-line change.  Note that mempool_alloc() clears the
GFP_NOIO flag internally, and only uses it to decide whether to wait or return
an error if immediate allocation fails, so the attached patch doesn't change
behaviour in the non-deadlocking case.  Patch is against current git
(2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
2.6.15, where this patch makes the difference between random lockup and a
stable system.
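
The shape of the workaround can be sketched in userspace as follows.  This is
an illustration under stated assumptions, not the patch itself:
'alloc_with_fallback()', 'alloc_fn' and 'pool_exhausted()' are hypothetical
names, and plain malloc() stands in for the blocking kmalloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator signature used only for this sketch. */
typedef void *(*alloc_fn)(size_t size);

/* Stand-in for a pool that can satisfy neither a fresh allocation
 * nor a request from its reserve. */
void *pool_exhausted(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Ask the pool without waiting first.  Only when that returns NULL
 * (the case that previously ended on the waitqueue) use the blocking
 * allocator instead.  *used_fallback records which path was taken.
 */
void *alloc_with_fallback(alloc_fn pool_alloc_nowait, alloc_fn blocking_alloc,
                          size_t size, int *used_fallback)
{
    void *obj = pool_alloc_nowait(size);

    *used_fallback = 0;
    if (!obj) {
        obj = blocking_alloc(size);
        *used_fallback = 1;
    }
    return obj;
}
```

For example, alloc_with_fallback(pool_exhausted, malloc, 32, &fb) takes the
fallback path and sets fb to 1, while a pool that still has memory never
reaches the blocking allocator, which mirrors the statement that behaviour
is unchanged in the non-deadlocking case.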

Signed-off-by: Daniel Kobras <kobras@linux.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27 11:01:28 -07:00
..
raid6test [PATCH] RAID6 Altivec fix 2005-09-17 11:49:58 -07:00
.gitignore gitignore: misc files 2006-01-01 22:21:50 +01:00
bitmap.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
dm-bio-list.h [PATCH] device-mapper snapshot: bio_list fix 2005-11-22 09:14:31 -08:00
dm-bio-record.h
dm-crypt.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-emc.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-exception-store.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-hw-handler.c BUG_ON() Conversion in md/dm-hw-handler.c 2006-03-24 18:36:27 +01:00
dm-hw-handler.h
dm-io.c [PATCH] mempool: use common mempool kmalloc allocator 2006-03-26 08:56:59 -08:00
dm-io.h [PATCH] device-mapper: remove unused definition 2006-01-06 08:34:00 -08:00
dm-ioctl.c [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed 2006-06-26 12:25:08 -07:00
dm-linear.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-log.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-log.h
dm-mpath.c [PATCH] dm: BUG/OOPS fix 2006-08-14 12:54:29 -07:00
dm-mpath.h
dm-path-selector.c BUG_ON() Conversion in md/dm-path-selector.c 2006-03-26 18:21:58 +02:00
dm-path-selector.h
dm-raid1.c [PATCH] dm: Fix deadlock under high i/o load in raid1 setup. 2006-08-27 11:01:28 -07:00
dm-round-robin.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-snap.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
dm-snap.h [PATCH] device-mapper snapshot: load metadata on creation 2006-02-01 08:53:10 -08:00
dm-stripe.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-table.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-target.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm-zero.c [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
dm.c [PATCH] devfs: Last little devfs cleanups throughout the kernel tree. 2006-06-26 12:25:09 -07:00
dm.h [PATCH] dm: improve error message consistency 2006-06-26 09:58:36 -07:00
faulty.c [PATCH] md: allow array level to be set textually via sysfs 2006-01-06 08:34:09 -08:00
Kconfig [PATCH] md: Fix Kconfig error 2006-06-26 09:58:39 -07:00
kcopyd.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
kcopyd.h
linear.c [PATCH] md: Fix a bug that recently crept into md/linear 2006-08-06 08:57:46 -07:00
Makefile [PATCH] md: merge raid5 and raid6 code 2006-06-26 09:58:37 -07:00
md.c [PATCH] md: fix oops in error-handling 2006-07-10 13:24:17 -07:00
mktables.c
multipath.c [PATCH] mempool: use common mempool kzalloc allocator 2006-03-26 08:56:59 -08:00
raid0.c [PATCH] md: fix possible oops when starting a raid0 array 2006-05-23 10:35:31 -07:00
raid1.c [PATCH] md: include sector number in messages about corrected read errors 2006-07-10 13:24:17 -07:00
raid5.c [PATCH] md: include sector number in messages about corrected read errors 2006-07-10 13:24:17 -07:00
raid6.h [PATCH] RAID6 Altivec fix 2005-09-17 11:49:58 -07:00
raid6algos.c [PATCH] drivers/md/raid6algos.c: fix a NULL dereference 2006-06-23 07:43:08 -07:00
raid6altivec.uc [PATCH] RAID6 Altivec fix 2005-09-17 11:49:58 -07:00
raid6int.uc
raid6mmx.c
raid6recov.c
raid6sse1.c
raid6sse2.c
raid6x86.h
raid10.c [PATCH] md: include sector number in messages about corrected read errors 2006-07-10 13:24:17 -07:00
unroll.pl
xor.c