aha/fs
NeilBrown d3374825ce md: make devices disappear when they are no longer needed.
Currently md devices, once created, never disappear until the module
is unloaded.  This is essentially because the gendisk holds a
reference to the mddev, and the mddev holds a reference to the
gendisk, this a circular reference.

If we drop the reference from mddev to gendisk, then we need to ensure
that the mddev is destroyed when the gendisk is destroyed.  However it
is not possible to hook into the gendisk destruction process to enable
this.

So we drop the reference from the gendisk to the mddev and destroy the
gendisk when the mddev gets destroyed.  However this has a
complication.
Between the call
   __blkdev_get->get_gendisk->kobj_lookup->md_probe
and the call
   __blkdev_get->md_open

there is no obvious way to hold a reference on the mddev any more, so
unless something is done, it will disappear and gendisk will be
destroyed prematurely.

Also, once we decide to destroy the mddev, there will be an unlockable
moment before the gendisk is unlinked (blk_unregister_region) during
which a new reference to the gendisk can be created.  We need to
ensure that this reference can not be used.  i.e. the ->open must
fail.

So:
 1/  in md_probe we set a flag in the mddev (hold_active) which
     indicates that the array should be treated as active, even
     though there are no references, and no appearance of activity.
     This is cleared by md_release when the device is closed if it
     is no longer needed.
     This ensures that the gendisk will survive between md_probe and
     md_open.

 2/  In md_open we check if the mddev we expect to open matches
     the gendisk that we did open.
     If there is a mismatch we return -ERESTARTSYS and modify
     __blkdev_get to retry from the top in that case.
     In the -ERESTARTSYS sys case we make sure to wait until
     the old gendisk (that we succeeded in opening) is really gone so
     we loop at most once.

Some udev configurations will always open an md device when it first
appears.   If we allow an md device that was just created by an open
to disappear on an immediate close, then this can race with such udev
configurations and result in an infinite loop the device being opened
and closed, then re-open due to the 'ADD' even from the first open,
and then close and so on.
So we make sure an md device, once created by an open, remains active
at least until some md 'ioctl' has been made on it.  This means that
all normal usage of md devices will allow them to disappear promptly
when not needed, but the worst that an incorrect usage will do it
cause an inactive md device to be left in existence (it can easily be
removed).

As an array can be stopped by writing to a sysfs attribute
  echo clear > /sys/block/mdXXX/md/array_state
we need to use scheduled work for deleting the gendisk and other
kobjects.  This allows us to wait for any pending gendisk deletion to
complete by simply calling flush_scheduled_work().



Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:10 +11:00
..
9p Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
adfs vfs: Use const for kernel parser table 2008-10-13 10:10:37 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
afs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
autofs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
autofs4 autofs4: fix string validation check order 2009-01-06 15:59:23 -08:00
befs befs: ensure fast symlinks are NUL-terminated 2008-12-31 18:07:40 -05:00
bfs bfs: check that filesystem fits on the blockdevice 2009-01-06 15:59:31 -08:00
cifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
coda add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
configfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
cramfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
debugfs debugfs: add helpers for exporting a size_t simple value 2009-01-07 10:00:16 -08:00
devpts zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm 2009-01-05 19:02:09 -08:00
ecryptfs fs/ecryptfs/inode.c: cleanup kerneldoc 2009-01-06 15:59:22 -08:00
efs [PATCH] switch all filesystems over to d_obtain_alias 2008-10-23 05:13:01 -04:00
exportfs Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
ext2 nfsd race fixes: ext2 2008-12-31 18:07:43 -05:00
ext3 ext3: Add default allocation routines for quota structures 2009-01-05 08:40:25 -08:00
ext4 percpu_counter: FBC_BATCH should be a variable 2009-01-06 15:59:13 -08:00
fat Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 2008-12-30 20:33:34 -08:00
freevxfs freevxfs: ensure fast symlinks are NUL-terminated 2008-12-31 18:07:40 -05:00
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2009-01-06 17:01:20 -08:00
gfs2 GFS2: Fix typo in gfs_page_mkwrite() 2009-01-07 08:58:28 +00:00
hfs CRED: Wrap task credential accesses in the HFS filesystem 2008-11-14 10:38:54 +11:00
hfsplus CRED: Wrap task credential accesses in the HFSplus filesystem 2008-11-14 10:38:54 +11:00
hostfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
hpfs CRED: Wrap task credential accesses in the HPFS filesystem 2008-11-14 10:38:55 +11:00
hppfs CRED: Use creds in file structs 2008-11-14 10:39:25 +11:00
hugetlbfs hugetlb: unsigned ret cannot be negative 2009-01-06 15:59:08 -08:00
isofs isofs check for NULL ->i_op in root directory is dead code 2009-01-05 11:53:38 -05:00
jbd jbd: don't give up looking for space so easily in __log_wait_for_space 2008-11-06 22:37:59 -05:00
jbd2 jbd2: Add buffer triggers 2009-01-05 08:40:30 -08:00
jffs2 fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
jfs fix the treatment of jfs special inodes 2009-01-05 11:54:29 -05:00
lockd NLM: Clean up flow of control in make_socks() function 2009-01-07 15:40:44 -05:00
minix minix: fix add link's wrong position calculation 2009-01-06 15:59:27 -08:00
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-01-07 11:31:52 -08:00
nfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
nfs_common SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only 2008-12-23 15:21:32 -05:00
nfsd nfsd: last_byte_offset 2009-01-07 17:38:31 -05:00
nls remove CONFIG_KMOD from fs 2008-10-17 02:38:36 +11:00
notify inotify: fix type errors in interfaces 2009-01-05 11:54:29 -05:00
ntfs ntfs: don't NULL i_op 2009-01-05 11:54:27 -05:00
ocfs2 trivial: fix then -> than typos in comments and documentation 2009-01-06 11:28:06 +01:00
omfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
openpromfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
partitions block: struct device - replace bus_id with dev_name(), dev_set_name() 2009-01-06 10:44:43 -08:00
proc Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc 2009-01-07 12:01:06 -08:00
qnx4
ramfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
reiserfs Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 2009-01-05 18:32:43 -08:00
romfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
smbfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
sysfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
sysv sysv: ensure fast symlinks are NUL-terminated 2008-12-31 18:07:39 -05:00
ubifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-01-07 11:31:52 -08:00
udf Merge branch 'master' into next 2008-12-04 17:16:36 +11:00
ufs CRED: Wrap task credential accesses in the UFS filesystem 2008-11-14 10:39:04 +11:00
xfs trivial: fix then -> than typos in comments and documentation 2009-01-06 11:28:06 +01:00
aio.c aio: make the lookup_ioctx() lockless 2008-12-29 08:29:50 +01:00
anon_inodes.c anon_inodes: use fops->owner for module refcount 2008-12-31 16:55:44 +02:00
attr.c CRED: Wrap task credential accesses in the filesystem subsystem 2008-11-14 10:39:05 +11:00
bad_inode.c kill ->dir_notify() 2008-12-31 18:07:43 -05:00
binfmt_aout.c sanitize ifdefs in binfmt_aout 2009-01-03 11:45:54 -08:00
binfmt_elf.c Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 2008-12-28 12:33:21 -08:00
binfmt_elf_fdpic.c CRED: Make execve() take advantage of copy-on-write credentials 2008-11-14 10:39:24 +11:00
binfmt_em86.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_flat.c CRED: Make execve() take advantage of copy-on-write credentials 2008-11-14 10:39:24 +11:00
binfmt_misc.c fs/binfmt_misc.c: add terminating newline to /proc/sys/fs/binfmt_misc/status 2009-01-06 15:59:19 -08:00
binfmt_script.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_som.c CRED: Make execve() take advantage of copy-on-write credentials 2008-11-14 10:39:24 +11:00
bio-integrity.c bio: allow individual slabs in the bio_set 2008-12-29 08:29:23 +01:00
bio.c bio: get rid of bio_vec clearing 2008-12-29 08:29:53 +01:00
block_dev.c md: make devices disappear when they are no longer needed. 2009-01-09 08:31:10 +11:00
buffer.c block_write_begin(): remove useless goto 2009-01-06 15:59:08 -08:00
char_dev.c fs: fix name overwrite in __register_chrdev_region() 2009-01-06 15:59:13 -08:00
compat.c add missing accounting calls to compat_sys_{readv,writev} 2009-01-06 15:59:13 -08:00
compat_binfmt_elf.c
compat_ioctl.c
dcache.c filp_cachep can be static in fs/file_table.c 2008-12-31 18:07:42 -05:00
dcookies.c shrink struct dentry 2008-12-31 18:07:38 -05:00
direct-io.c fs: truncate blocks outside i_size after O_DIRECT write error 2009-01-06 15:59:06 -08:00
dquot.c quota: Export dquot_alloc() and dquot_destroy() functions 2009-01-05 08:40:25 -08:00
drop_caches.c
eventfd.c
eventpoll.c epoll: introduce resource usage limits 2008-12-01 19:55:24 -08:00
exec.c fs/exec.c: make do_coredump() void 2009-01-06 15:59:29 -08:00
fcntl.c Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
fifo.c [PATCH] introduce fmode_t, do annotations 2008-10-21 07:47:06 -04:00
file.c
file_table.c filp_cachep can be static in fs/file_table.c 2008-12-31 18:07:42 -05:00
filesystems.c vfs: remove duplicate code in get_fs_type() 2009-01-05 11:54:29 -05:00
fs-writeback.c fs: sys_sync fix 2009-01-06 15:59:09 -08:00
generic_acl.c
inode.c async: make the final inode deletion an asynchronous event 2009-01-07 08:47:24 -08:00
internal.h CRED: Make execve() take advantage of copy-on-write credentials 2008-11-14 10:39:24 +11:00
ioctl.c GFS2: Support for FIEMAP ioctl 2009-01-05 07:38:46 +00:00
ioprio.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
Kconfig fs: use menuconfig to control the Misc. filesystems menu 2009-01-06 15:59:12 -08:00
Kconfig.binfmt add CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS 2008-10-20 08:52:39 -07:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
locks.c CRED: Wrap task credential accesses in the filesystem subsystem 2008-11-14 10:39:05 +11:00
Makefile quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
mbcache.c
mpage.c do_mpage_readpage(): remove useless clear_buffer_mapped() call 2009-01-06 15:59:01 -08:00
namei.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
namespace.c fs/namespace.c: drop code after return 2008-12-31 18:07:38 -05:00
nfsctl.c pass a struct path * to may_open 2008-12-31 18:07:41 -05:00
no-block.c
open.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
pipe.c sanitize audit_fd_pair() 2009-01-04 15:14:41 -05:00
pnode.c
pnode.h
posix_acl.c CRED: Wrap task credential accesses in the filesystem subsystem 2008-11-14 10:39:05 +11:00
quota.c quota: Introduce DQUOT_QUOTA_SYS_FILE flag 2009-01-05 08:36:57 -08:00
quota_tree.c quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_tree.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_v1.c quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quota_v2.c quota: Convert union in mem_dqinfo to a pointer 2009-01-05 08:40:21 -08:00
quotaio_v1.h quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quotaio_v2.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
read_write.c vfs: lseek(fd, 0, SEEK_CUR) race condition 2009-01-05 11:53:07 -05:00
read_write.h
readdir.c [PATCH] prepare vfs_readdir() callers to returning filldir result 2008-10-23 05:13:10 -04:00
select.c poll: allow f_op->poll to sleep 2009-01-06 15:59:12 -08:00
seq_file.c Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-03 12:04:39 -08:00
signalfd.c
splice.c fs: remove prepare_write/commit_write 2008-10-30 11:38:45 -07:00
stack.c
stat.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
super.c async: make the final inode deletion an asynchronous event 2009-01-07 08:47:24 -08:00
sync.c mm: do_sync_mapping_range integrity fix 2009-01-06 15:59:00 -08:00
timerfd.c hrtimer: convert timerfd to the new hrtimer apis 2008-09-05 21:35:09 -07:00
utimes.c
xattr.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
xattr_acl.c