aha/fs
Al Viro 8f7b0ba1c8 Fix inotify watch removal/umount races
Inotify watch removals suck violently.

To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex.  That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount.  We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.

Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done.  Cleanup is just deactivate_super().

However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords.  That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.

We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable.  OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e.  that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().

So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong.  We had
to drop ih->mutex before we could grab ->s_umount.  So the watch
could've been gone already.

That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer.  If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address.  That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that.  Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.

So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check.  If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.

And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15 12:26:44 -08:00
..
9p 9p: fix format warning 2008-10-22 18:48:45 -05:00
adfs vfs: Use const for kernel parser table 2008-10-13 10:10:37 -07:00
affs vfs: Use const for kernel parser table 2008-10-13 10:10:37 -07:00
afs [PATCH] fix ->llseek for more directories 2008-10-23 05:13:21 -04:00
autofs vfs: Use const for kernel parser table 2008-10-13 10:10:37 -07:00
autofs4 autofs4: collect version check return 2008-11-06 15:41:17 -08:00
befs befs: annotate fs32 on tests for superblock endianness 2008-10-16 11:21:46 -07:00
bfs [PATCH] fix ->llseek for more directories 2008-10-23 05:13:21 -04:00
cifs cifs: fix renaming one hardlink on top of another 2008-11-03 18:31:05 +00:00
coda Switch to a valid email address... 2008-10-27 08:40:17 -07:00
configfs [PATCH] assorted path_lookup() -> kern_path() conversions 2008-10-23 05:12:52 -04:00
cramfs cramfs: fix named-pipe handling 2008-08-20 15:40:32 -07:00
debugfs integrity: special fs magic 2008-10-13 09:47:43 +11:00
devpts vfs: Use const for kernel parser table 2008-10-13 10:10:37 -07:00
dlm dlm: fix shutdown cleanup 2008-11-13 13:22:34 -06:00
ecryptfs ecryptfs: fix memory corruption when storing crypto info in xattrs 2008-10-30 11:38:46 -07:00
efs [PATCH] switch all filesystems over to d_obtain_alias 2008-10-23 05:13:01 -04:00
exportfs [PATCH] prepare vfs_readdir() callers to returning filldir result 2008-10-23 05:13:10 -04:00
ext2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev 2008-10-23 10:23:07 -07:00
ext3 ext3: Clean up outdated and incorrect comment for ext3_write_super() 2008-11-12 17:17:17 -08:00
ext4 ext4: add checksum calculation when clearing UNINIT flag in ext4_new_inode 2008-11-07 09:21:01 -05:00
fat fat: i_blocks warning fix 2008-11-06 15:41:22 -08:00
freevxfs
fuse saner FASYNC handling on file close 2008-11-01 09:49:46 -07:00
gfs2 [PATCH] switch all filesystems over to d_obtain_alias 2008-10-23 05:13:01 -04:00
hfs [PATCH] move executable checking into ->permission() 2008-10-23 05:13:25 -04:00
hfsplus [PATCH] move executable checking into ->permission() 2008-10-23 05:13:25 -04:00
hostfs [PATCH] introduce fmode_t, do annotations 2008-10-21 07:47:06 -04:00
hpfs [PATCH] hpfs: cleanup ->setattr 2008-10-23 05:12:58 -04:00
hppfs
hugetlbfs vfs: Use const for kernel parser table 2008-10-13 10:10:37 -07:00
isofs [PATCH] switch all filesystems over to d_obtain_alias 2008-10-23 05:13:01 -04:00
jbd jbd: don't give up looking for space so easily in __log_wait_for_space 2008-11-06 22:37:59 -05:00
jbd2 jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space 2008-11-06 22:38:07 -05:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2008-11-06 15:43:13 -08:00
jfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev 2008-10-23 10:23:07 -07:00
lockd NLM: Set address family before calling nlm_host_rebooted() 2008-10-30 17:19:30 -04:00
minix
ncpfs
nfs NFS: Convert nfs_attr_generation_counter into an atomic_long 2008-10-28 15:21:40 -04:00
nfs_common
nfsd Fix nfsd truncation of readdir results 2008-11-09 15:15:50 -05:00
nls remove CONFIG_KMOD from fs 2008-10-17 02:38:36 +11:00
ntfs [PATCH] switch all filesystems over to d_obtain_alias 2008-10-23 05:13:01 -04:00
ocfs2 ocfs2: Check search result in ocfs2_xattr_block_get() 2008-11-10 09:51:47 -08:00
omfs [PATCH] fix ->llseek for more directories 2008-10-23 05:13:21 -04:00
openpromfs [PATCH] fix ->llseek for more directories 2008-10-23 05:13:21 -04:00
partitions [PATCH] sanitize blkdev_get() and friends 2008-10-21 07:49:06 -04:00
proc Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc 2008-11-03 09:59:01 -08:00
qnx4
ramfs Ramfs and Ram Disk pages are unevictable 2008-10-20 08:50:26 -07:00
reiserfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev 2008-10-23 10:23:07 -07:00
romfs
smbfs
sysfs [PATCH] fix ->llseek for more directories 2008-10-23 05:13:21 -04:00
sysv
ubifs Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 2008-10-20 09:19:03 -07:00
udf [PATCH] get rid of on-stack dentry in udf 2008-10-23 05:13:15 -04:00
ufs [PATCH] fix ->llseek for more directories 2008-10-23 05:13:21 -04:00
xfs [XFS] XFS: Check for valid transaction headers in recovery 2008-11-10 18:01:50 +11:00
aio.c
anon_inodes.c
attr.c [patch] vfs: make security_inode_setattr() calling consistent 2008-10-23 05:13:27 -04:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c Merge branch 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-10-20 13:19:56 -07:00
binfmt_elf_fdpic.c binfmt_elf_fdpic: Update for cputime changes. 2008-10-20 20:17:18 -07:00
binfmt_em86.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_flat.c uclinux: fix gzip header parsing in binfmt_flat.c 2008-10-16 11:21:29 -07:00
binfmt_misc.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_script.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_som.c binfmt_som.c: add MODULE_LICENSE 2008-10-16 11:21:38 -07:00
bio-integrity.c block: Introduce integrity data ownership flag 2008-10-09 08:56:21 +02:00
bio.c block: mark bio_split_pool static 2008-10-09 08:57:05 +02:00
block_dev.c block: fix __blkdev_get() for removable devices 2008-11-06 08:41:56 +01:00
buffer.c fs: buffer lock use lock bitops 2008-10-20 08:52:32 -07:00
char_dev.c [PATCH] tidy up chrdev_open 2008-10-23 05:12:59 -04:00
compat.c select: deal with math overflow from borderline valid userland data 2008-10-26 11:22:08 -07:00
compat_binfmt_elf.c
compat_ioctl.c
dcache.c [PATCH] fs: add a sanity check in d_free 2008-10-23 05:17:12 -04:00
dcookies.c
direct-io.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
dnotify.c
dquot.c [PATCH] switch quota_on-related stuff to kern_path() 2008-10-23 05:12:44 -04:00
drop_caches.c
eventfd.c
eventpoll.c epoll: avoid double-inserts in case of EFAULT 2008-10-26 12:09:49 -07:00
exec.c coredump: format_corename: don't append .%pid if multi-threaded 2008-10-20 08:52:39 -07:00
fcntl.c
fifo.c [PATCH] introduce fmode_t, do annotations 2008-10-21 07:47:06 -04:00
file.c
file_table.c saner FASYNC handling on file close 2008-11-01 09:49:46 -07:00
filesystems.c proc: move /proc/filesystems to fs/filesystems.c 2008-10-23 14:27:09 +04:00
fs-writeback.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
generic_acl.c
inode.c fs/inode.c: properly init address_space->writeback_index 2008-08-15 08:35:44 -07:00
inotify.c Fix inotify watch removal/umount races 2008-11-15 12:26:44 -08:00
inotify_user.c saner FASYNC handling on file close 2008-11-01 09:49:46 -07:00
internal.h
ioctl.c provide generic_block_fiemap() only with BLOCK=y 2008-10-12 11:44:37 -07:00
ioprio.c fix setpriority(PRIO_PGRP) thread iterator breakage 2008-08-20 15:40:32 -07:00
Kconfig [patch 1/3] FS_MBCACHE: don't needlessly make it built-in 2008-10-23 05:13:26 -04:00
Kconfig.binfmt add CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS 2008-10-20 08:52:39 -07:00
libfs.c fs: remove prepare_write/commit_write 2008-10-30 11:38:45 -07:00
locks.c Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc 2008-10-23 12:04:37 -07:00
Makefile fat: move fs/vfat/* and fs/msdos/* to fs/fat 2008-11-06 15:41:20 -08:00
mbcache.c
mpage.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
namei.c [PATCH] move executable checking into ->permission() 2008-10-23 05:13:25 -04:00
namespace.c vfs: fix shrink_submounts 2008-11-12 17:17:17 -08:00
nfsctl.c
no-block.c
open.c [PATCH] introduce fmode_t, do annotations 2008-10-21 07:47:06 -04:00
pipe.c saner FASYNC handling on file close 2008-11-01 09:49:46 -07:00
pnode.c
pnode.h
posix_acl.c
quota.c
quota_v1.c
quota_v2.c
read_write.c [PATCH] generic_file_llseek tidyups 2008-10-23 05:12:59 -04:00
read_write.h
readdir.c [PATCH] prepare vfs_readdir() callers to returning filldir result 2008-10-23 05:13:10 -04:00
select.c select: deal with math overflow from borderline valid userland data 2008-10-26 11:22:08 -07:00
seq_file.c seq_file: add seq_cpumask_list(), seq_nodemask_list() 2008-10-20 08:52:39 -07:00
signalfd.c
splice.c fs: remove prepare_write/commit_write 2008-10-30 11:38:45 -07:00
stack.c
stat.c
super.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev 2008-10-23 10:23:07 -07:00
sync.c
timerfd.c hrtimer: convert timerfd to the new hrtimer apis 2008-09-05 21:35:09 -07:00
utimes.c
xattr.c
xattr_acl.c