Commit graph

16364 commits

Author SHA1 Message Date
Al Viro
6de88d7292 kill __link_path_walk()/link_path_walk() distinction
put retry logics into path_walk() and do_filp_open()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:43 -05:00
Al Viro
258fa99905 lift path_put(path) to callers of __do_follow_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:43 -05:00
Al Viro
d231412db6 switch create_read_pipe() to alloc_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:43 -05:00
Al Viro
2c48b9c455 switch alloc_file() to passing struct path
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Al Viro
a95161aaa8 switch nilfs2 to deactivate_locked_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Al Viro
3d1e463158 get rid of init_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Al Viro
732741274d unexport get_empty_filp()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:41 -05:00
Al Viro
825f9692fb switched inotify_init1() to alloc_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:40 -05:00
Akinobu Mita
868d64812a qnx4: use hweight8
Use hweight8 instead of counting for each bit

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Anders Larsen <al@alarsen.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:18 -08:00
Anders Larsen
ca120b20c3 qnx4fs: remove remains of the (defunct) write support
commit 945ffe54bb ("qnx4: remove write support") removed the (defunct)
write support but missed a chunk of related, dead code.

Signed-off-by: Anders Larsen <al@alarsen.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:17 -08:00
Christoph Hellwig
5fe878ae7f direct-io: cleanup blockdev_direct_IO locking
Currently the locking in blockdev_direct_IO is a mess, we have three
different locking types and very confusing checks for some of them.  The
most complicated one is DIO_OWN_LOCKING for reads, which happens to not
actually be used.

This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read
case is unused anyway, and the write side is almost identical to
DIO_NO_LOCKING.  The difference is that DIO_NO_LOCKING always sets the
create argument for the get_blocks callback to zero, but we can easily
move that to the actual get_blocks callbacks.  There are four users of the
DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
fine with the new version, ocfs2 only errors out if create were ever set,
and we can remove this dead code now, the block device code only ever uses
create for an error message if we are fully beyond the device which can
never happen, and last but not least XFS will need the new behavour for
writes.

Now we can replace the lock_type variable with a flags one, where no flag
means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
flag.  Separate out the check for not allowing to fill holes into a
separate flag, although for now both flags always get set at the same
time.

Also revamp the documentation of the locking scheme to actually make
sense.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Elder <aelder@sgi.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Jeff Moyer
23aee091d8 dio: don't zero out the pages array inside struct dio
Intel reported a performance regression caused by the following commit:

commit 848c4dd515
Author: Zach Brown <zach.brown@oracle.com>
Date:   Mon Aug 20 17:12:01 2007 -0700

    dio: zero struct dio with kzalloc instead of manually

    This patch uses kzalloc to zero all of struct dio rather than
    manually trying to track which fields we rely on being zero.  It
    passed aio+dio stress testing and some bug regression testing on
    ext3.

    This patch was introduced by Linus in the conversation that lead up
    to Badari's minimal fix to manually zero .map_bh.b_state in commit:

      6a648fa721

    It makes the code a bit smaller.  Maybe a couple fewer cachelines to
    load, if we're lucky:

       text    data     bss     dec     hex filename
    3285925  568506 1304616 5159047  4eb887 vmlinux
    3285797  568506 1304616 5158919  4eb807 vmlinux.patched

    I was unable to measure a stable difference in the number of cpu
    cycles spent in blockdev_direct_IO() when pushing aio+dio 256K reads
    at ~340MB/s.

    So the resulting intent of the patch isn't a performance gain but to
    avoid exposing ourselves to the risk of finding another field like
    .map_bh.b_state where we rely on zeroing but don't enforce it in the
    code.

Zach surmised that zeroing out the page array was what caused most of
the problem, and suggested the approach taken in the attached patch for
resolving the issue.  Intel re-tested with this patch and saw a 0.6%
performance gain (the original regression was 0.5%).

[akpm@linux-foundation.org: add comment]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Shaohua Li
fac046ad0b aio: remove unused field
Don't know the reason, but it appears ki_wait field of iocb never gets used.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
David Howells
ea58ceb543 FS-Cache: Avoid maybe-used-uninitialised warning on variable
Andrew Morton's compiler sees the following warning in FS-Cache:

fs/fscache/object-list.c: In function 'fscache_objlist_lookup':
fs/fscache/object-list.c:94: warning: 'obj' may be used uninitialized in this function

which my compiler doesn't.  This is a false positive as obj can only be
used in the comparison against minobj if minobj has been set to something
other than NULL, but for that to happen, obj has to be first set to
something.

Deal with this by preclearing obj too.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Christoph Hellwig
698ba7b5a3 elf: kill USE_ELF_CORE_DUMP
Currently all architectures but microblaze unconditionally define
USE_ELF_CORE_DUMP.  The microblaze omission seems like an error to me, so
let's kill this ifdef and make sure we are the same everywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Simek <michal.simek@petalogix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Zhaolei
1d81a181e0 fatfs: use common time_to_tm in fat_time_unix2fat()
It is not necessary to write custom code for convert calendar time to
broken-down time.  time_to_tm() is more generic to do that.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Akinobu Mita
f4c54fcf3a hpfs: use bitmap_weight()
Use bitmap_weight instead of doing hweight32 for each 32bit in bitmap.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Akinobu Mita
c2923c3a3e hpfs: use hweight32
Use hweight32 instead of counting for each bit

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Alexey Dobriyan
e3c96f53ac reiserfs: don't compile procfs.o at all if no support
* small define cleanup in header
* fix #ifdeffery in procfs.c via Kconfig

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Alexey Dobriyan
904e812931 reiserfs: remove /proc/fs/reiserfs/version
/proc/fs/reiserfs/version is on the way of removing ->read_proc interface.
 It's empty however, so simply remove it instead of doing dummy
conversion.  It's hard to see what information userspace can extract from
empty file.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Alexey Dobriyan
f3e2a520f5 ufs: NFS support
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Alexey Dobriyan
080497079c ufs: pass qstr instead of dentry where necessary for NFS
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Jan Kara
48bde86df0 ext2: report metadata errors during fsync
When an IO error happens while writing metadata buffers, we should better
report it and call ext2_error since the filesystem is probably no longer
consistent.  Sometimes such IO errors happen while flushing thread does
background writeback, the buffer gets later evicted from memory, and thus
the only trace of the error remains as AS_EIO bit set in blockdevice's
mapping.  So we check this bit in ext2_fsync and report the error although
we cannot be really sure which buffer we failed to write.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Theodore Ts'o
7bf0dc9b0c ext2: avoid WARN() messages when failing to write to the superblock
This fixes a common warning reported by kerneloops.org

[Kernel summit hacking hour]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:05 -08:00
Ian Kent
213614d583 autofs4: always use lookup for lookup
We need to be able to cope with the directory mutex being held during
->d_revalidate() in some cases, but not all cases, and not necessarily by
us.  Because we need to release the mutex when we call back to the daemon
to do perform a mount we must be sure that it is us who holds the mutex so
we must redirect mount requests to ->lookup() if the mutex is held.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:58 -08:00
Ian Kent
cb4b492ac7 autofs4: rename dentry to expiring in autofs4_lookup_expiring()
In autofs4_lookup_expiring() a declaration within the list traversal loop
uses a declaration that has the same name as the function parameter.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:58 -08:00
Ian Kent
e4d5ade7b5 autofs4: rename dentry to active in autofs4_lookup_active()
In autofs4_lookup_active() a declaration within the list traversal loop
uses a declaration that has the same name as the function parameter.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:58 -08:00
Ian Kent
c42c7f7e69 autofs4: eliminate d_unhashed in path walk checks
We unhash the dentry (in a subsequent patch) in ->d_revalidate() in order
to send mount requests to ->lookup().  But then we can not rely on
d_unhased() to give reliable results because it may be called at any time
by any code path.  The d_unhashed() function is used by __simple_empty()
in the path walking callbacks but autofs mount point dentrys should have
no directories at all so a list_empty() on d_subdirs should be (and is)
sufficient.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:58 -08:00
Ian Kent
6510c9d859 autofs4: cleanup active and expire lookup
The lookup functions for active and expiring dentrys use parameters that
can be easily obtained on entry so we change the call to to take just the
dentry.  This makes the subsequent change, to send all lookups to
->lookup(), a bit cleaner.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:58 -08:00
Ian Kent
90387c9c1d autofs4: renamer unhashed to active in autofs4_lookup()
Rename the variable unhashed to active in autofs4_lookup() to better
reflect its usage.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:58 -08:00
Ian Kent
aa952eb26d autofs4: use autofs_info for pending flag
Eliminate the use of the d_lock spin lock by using the autofs super block
info spin lock.  This reduces the number of spin locks we use by one and
makes the code for the following patch (to redirect ->d_revalidate() to
->lookup()) a little simpler.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:58 -08:00
Ian Kent
36b6413ef3 autofs4: use helper function for need mount check
Define simple helper function for checking if we need to trigger a mount.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:57 -08:00
Ian Kent
c4cd70b3e3 autofs4: use helper functions for expiring list
Define some simple helper functions for adding and deleting entries on the
expiring dentry list.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:57 -08:00
Ian Kent
4f8427d190 autofs4: use helper functions for active list handling
Define some simple helper functions for adding and deleting entries on the
active (and unhashed) dentry list.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Yehuda Saheh <yehuda@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:57 -08:00
Alexey Dobriyan
135d5655dc proc: rename de_get() to pde_get() and inline it
* de_get() is trivial -- make inline, save a few bits of code, drop
  "refcount is 0" check -- it should be done in some generic refcount
  code, don't recall it's was helpful

* rename GET and PUT functions to pde_get(), pde_put() for cool prefix!

* remove obvious and incorrent comments

* in remove_proc_entry() use pde_put(), when I fixed PDE refcounting to
  be normal one, remove_proc_entry() was supposed to do "-1" and code now
  reflects that.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:19:57 -08:00
Wu Fengguang
1a9b5b7fe0 mm: export stable page flags
Rename get_uflags() to stable_page_flags() and make it a global function
for use in the hwpoison page flags filter, which need to compare user
page flags with the value provided by user space.

Also move KPF_* to kernel-page-flags.h for use by user space tools.

Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
CC: Nick Piggin <npiggin@suse.de>
CC: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16 12:19:59 +01:00
David Woodhouse
2e16cfca6e jffs2: Fix long-standing bug with symlink garbage collection.
Ever since jffs2_garbage_collect_metadata() was first half-written in
February 2001, it's been broken on architectures where 'char' is signed.
When garbage collecting a symlink with target length above 127, the payload
length would end up negative, causing interesting and bad things to happen.

Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-12-16 03:27:20 +00:00
Yan, Zheng
920bbbfb05 Btrfs: Rewrite btrfs_drop_extents
Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can
avoid calling lock_extent within transaction.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-15 21:24:52 -05:00
Yan, Zheng
ad48fd7546 Btrfs: Add btrfs_duplicate_item
btrfs_duplicate_item duplicates item with new key, guaranteeing
the source item and the new items are in the same tree leaf and
contiguous. It allows us to split file extent in place, without
using lock_extent to prevent bookend extent race.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-15 21:24:25 -05:00
Yan, Zheng
8cef4e160d Btrfs: Avoid superfluous tree-log writeout
We allow two log transactions at a time, but use same flag
to mark dirty tree-log btree blocks. So we may flush dirty
blocks belonging to newer log transaction when committing a
log transaction. This patch fixes the issue by using two
flags to mark dirty tree-log btree blocks.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-15 21:24:25 -05:00
Trond Myklebust
380454126f NFSv4: Fix a regression in the NFSv4 state manager
Commit 5601a00d67 (nfs: run state manager
in privileged mode) introduces a regression in the NFSv4 code when
compiled with CONFIG_NFS_V4_1. The calls to nfs4_end_drain_session()
from the main loop in nfs4_state_manager() Oops due to the lack of an
NFSv4.1 session when running NFSv4.0.

The fix is to move those two calls back into nfs41_init_clientid() and
nfs4_reset_session().

The calls to nfs4_end_drain_session() that remain inside
nfs4_state_manager() are safe, since the NFSv4.0 code will never set the
NFS4CLNT_SESSION_DRAINING bit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 17:36:57 -05:00
J. Bruce Fields
7663dacd92 nfsd: remove pointless paths in file headers
The new .h files have paths at the top that are now out of date.  While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 15:01:47 -05:00
J. Bruce Fields
1557aca790 nfsd: move most of nfsfh.h to fs/nfsd
Most of this can be trivially moved to a private header as well.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 15:01:46 -05:00
J. Bruce Fields
774b147828 nfsd: make V4ROOT exports read-only
I can't see any use for writeable V4ROOT exports.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 15:01:44 -05:00
Trond Myklebust
72211dbe72 NFSv4: Release the sequence id before restarting a CLOSE rpc call
If the CLOSE or OPEN_DOWNGRADE call triggers a state recovery, and has
to be resent, then we must release the seqid. Otherwise the open
recovery will wait for the close to finish, which causes a deadlock.

This is mainly a NFSv4.1 problem, although it can theoretically happen
with NFSv4.0 too, in a OPEN_DOWNGRADE situation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 14:47:36 -05:00
Steve Dickson
03a816b46d nfsd: restrict filehandles accepted in V4ROOT case
On V4ROOT exports, only accept filehandles that are the *root* of some
export.  This allows mountd to allow or deny access to individual
directories and symlinks on the pseudofilesystem.

Note that the checks in readdir and lookup are not enough, since a
malicious host with access to the network could guess filehandles that
they weren't able to obtain through lookup or readdir.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields
f2ca7153ca nfsd: allow exports of symlinks
We want to allow exports of symlinks, to allow mountd to communicate to
the kernel which symlinks lead to exports, and hence which symlinks need
to be visible on the pseudofilesystem.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields
3227fa41ab nfsd: filter readdir results in V4ROOT case
As with lookup, we treat every boject as a mountpoint and pretend it
doesn't exist if it isn't exported.

The preexisting code here is confusing, but I haven't yet figured out
how to make it clearer.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields
82ead7fe41 nfsd: filter lookup results in V4ROOT case
We treat every object as a mountpoint and pretend it doesn't exist if
it isn't exported.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:23 -05:00
J. Bruce Fields
3b6cee7bc4 nfsd4: don't continue "under" mounts in V4ROOT case
If /A/mount/point/ has filesystem "B" mounted on top of it, and if "A"
is exported, but not "B", then the nfs server has always returned to the
client a filehandle for the mountpoint, instead of for the root of "B",
allowing the client to see the subtree of "A" that would otherwise be
hidden by B.

Disable this behavior in the case of V4ROOT exports; we implement the
path restrictions of V4ROOT exports by treating *every* directory as if
it were a mountpoint, and allowing traversal *only* if the new directory
is exported.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:23 -05:00
Steve Dickson
eb4c86c6a5 nfsd: introduce export flag for v4 pseudoroot
NFSv4 differs from v2 and v3 in that it presents a single unified
filesystem tree, whereas v2 and v3 exported multiple filesystem (whose
roots could be found using a separate mount protocol).

Our original NFSv4 server implementation asked the administrator to
designate a single filesystem as the NFSv4 root, then to mount
filesystems they wished to export underneath.  (Often using bind mounts
of already-existing filesystems.)

This was conceptually simple, and allowed easy implementation, but
created a serious obstacle to upgrading between v2/v3: since the paths
to v4 filesystems were different, administrators would have to adjust
all the paths in client-side mount commands when switching to v4.

Various workarounds are possible.  For example, the administrator could
export "/" and designate it as the v4 root.  However, the security risks
of that approach are obvious, and in any case we shouldn't be requiring
the administrator to take extra steps to fix this problem; instead, the
server should present consistent paths across different versions by
default.

These patches take a modified version of that approach: we provide a new
export option which exports only a subset of a filesystem.  With this
flag, it becomes safe for mountd to export "/" by default, with no need
for additional configuration.

We begin just by defining the new flag.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:00:40 -05:00
Andy Adamson
68bf05efb7 nfs41: fix session fore channel negotiation
If the rsize or wsize is not set on the mount command, negotiate the highest
supported rsize and wsize in session creation.

Fixes a bug where the client negotiated nfs41_maxwrite_overhead as
ca_maxrequestsize and nfs41_maxread_overhead as ca_maxresponsesize resulting
in NFS4ERR_REQ_TOO_BIG errors on writes.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:42 -05:00
Andy Adamson
a5523b84c4 nfs41: do not zero seqid portion of stateid on close
Remove code left over from a previous minorversion draft.
which specified zeroing seqid portions of stateid's.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:36 -05:00
Alexandros Batsakis
5601a00d67 nfs: run state manager in privileged mode
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:23 -05:00
Alexandros Batsakis
b257957e50 nfs: make recovery state manager operations privileged
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:07 -05:00
Alexandros Batsakis
689cf5c15b nfs: enforce FIFO ordering of operations trying to acquire slot
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:55:18 -05:00
Alexandros Batsakis
40ead580ae nfs: remove rpc_task argument from nfs4_find_slot
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:51:22 -05:00
Alexandros Batsakis
afe6c27ccb nfs: change nfs4_do_setlk params to identify recovery type
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:50:32 -05:00
Alexandros Batsakis
0f7e720694 nfs: do not do a LOOKUP after open
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:50:21 -05:00
Alexandros Batsakis
3bfb0fc591 nfs: minor cleanup of session draining
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:50:01 -05:00
Linus Torvalds
d180ec5d34 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: event tracing support
  xfs: change the xfs_iext_insert / xfs_iext_remove
  xfs: cleanup bmap extent state macros
2009-12-15 09:12:43 -08:00
Joe Perches
7f2f4e72d3 fs/ubifs: use %pUB to print UUIDs
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:33 -08:00
Joe Perches
f0b34ae634 fs/gfs2/sys.c: use %pUB to print UUIDs
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:33 -08:00
Joe Perches
03daa57cdb fs/xfs/xfs_log_recover.c: use %pU to print UUIDs
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:33 -08:00
André Goddard Rosa
e7d2860b69 tree-wide: convert open calls to remove spaces to skip_spaces() lib function
Makes use of skip_spaces() defined in lib/string.c for removing leading
spaces from strings all over the tree.

It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
   text    data     bss     dec     hex filename
  64688     584     592   65864   10148 (TOTALS-BEFORE)
  64641     584     592   65817   10119 (TOTALS-AFTER)

Also, while at it, if we see (*str && isspace(*str)), we can be sure to
remove the first condition (*str) as the second one (isspace(*str)) also
evaluates to 0 whenever *str == 0, making it redundant. In other words,
"a char equals zero is never a space".

Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
and found occurrences of this pattern on 3 more files:
    drivers/leds/led-class.c
    drivers/leds/ledtrig-timer.c
    drivers/video/output.c

@@
expression str;
@@

( // ignore skip_spaces cases
while (*str &&  isspace(*str)) { \(str++;\|++str;\) }
|
- *str &&
isspace(*str)
)

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Julia Lawall <julia@diku.dk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:32 -08:00
Hiroshi Shimamoto
e4c570c4cb task_struct: make journal_info conditional
journal_info in task_struct is used in journaling file system only.  So
introduce CONFIG_FS_JOURNAL_INFO and make it conditional.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:27 -08:00
john stultz
4614a696bd procfs: allow threads to rename siblings via /proc/pid/tasks/tid/comm
Setting a thread's comm to be something unique is a very useful ability
and is helpful for debugging complicated threaded applications.  However
currently the only way to set a thread name is for the thread to name
itself via the PR_SET_NAME prctl.

However, there may be situations where it would be advantageous for a
thread dispatcher to be naming the threads its managing, rather then
having the threads self-describe themselves.  This sort of behavior is
available on other systems via the pthread_setname_np() interface.

This patch exports a task's comm via proc/pid/comm and
proc/pid/task/tid/comm interfaces, and allows thread siblings to write to
these values.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Mike Fulton <fultonm@ca.ibm.com>
Cc: Sean Foley <Sean_Foley@ca.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:24 -08:00
Steven J. Magnani
7e1e0ef22c procfs: use proper units for noMMU statm
On no-MMU systems, sizes reported in /proc/n/statm have units of bytes.
Per Documentation/filesystems/proc.txt, these values should be in pages.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:24 -08:00
Jie Zhang
ea63763959 nommu: fix malloc performance by adding uninitialized flag
The NOMMU code currently clears all anonymous mmapped memory.  While this
is what we want in the default case, all memory allocation from userspace
under NOMMU has to go through this interface, including malloc() which is
allowed to return uninitialized memory.  This can easily be a significant
performance penalty.  So for constrained embedded systems were security is
irrelevant, allow people to avoid clearing memory unnecessarily.

This also alters the ELF-FDPIC binfmt such that it obtains uninitialised
memory for the brk and stack region.

Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:24 -08:00
Naoya Horiguchi
5dc37642cb mm hugetlb: add hugepage support to pagemap
This patch enables extraction of the pfn of a hugepage from
/proc/pid/pagemap in an architecture independent manner.

Details
-------
My test program (leak_pagemap) works as follows:
 - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
 - read()/write() something on it,
 - call page-types with option -p,
 - munmap() and unlink() the file on hugetlbfs

Without my patches
------------------
$ ./leak_pagemap
             flags page-count       MB  symbolic-flags                     long-symbolic-flags
0x0000000000000000          1        0  __________________________________
0x0000000000000804          1        0  __R________M______________________ referenced,mmap
0x000000000000086c         81        0  __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
0x0000000000005808          5        0  ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
0x0000000000005868         12        0  ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
0x000000000000586c          1        0  __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
             total        101        0

The output of page-types don't show any hugepage.

With my patches
---------------
$ ./leak_pagemap
             flags page-count       MB  symbolic-flags                     long-symbolic-flags
0x0000000000000000          1        0  __________________________________
0x0000000000030000      51100      199  ________________TG________________ compound_tail,huge
0x0000000000028018        100        0  ___UD__________H_G________________ uptodate,dirty,compound_head,huge
0x0000000000000804          1        0  __R________M______________________ referenced,mmap
0x000000000000080c          1        0  __RU_______M______________________ referenced,uptodate,mmap
0x000000000000086c         80        0  __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
0x0000000000005808          4        0  ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
0x0000000000005868         12        0  ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
0x000000000000586c          1        0  __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
             total      51300      200

The output of page-types shows 51200 pages contributing to hugepages,
containing 100 head pages and 51100 tail pages as expected.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:24 -08:00
Amerigo Wang
ec81aecb29 hfs: fix a potential buffer overflow
A specially-crafted Hierarchical File System (HFS) filesystem could cause
a buffer overflow to occur in a process's kernel stack during a memcpy()
call within the hfs_bnode_read() function (at fs/hfs/bnode.c:24).  The
attacker can provide the source buffer and length, and the destination
buffer is a local variable of a fixed length.  This local variable (passed
as "&entry" from fs/hfs/dir.c:112 and allocated on line 60) is stored in
the stack frame of hfs_bnode_read()'s caller, which is hfs_readdir().
Because the hfs_readdir() function executes upon any attempt to read a
directory on the filesystem, it gets called whenever a user attempts to
inspect any filesystem contents.

[amwang@redhat.com: modify this patch and fix coding style problems]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:10 -08:00
Christoph Hellwig
0b1b213fcf xfs: event tracing support
Convert the old xfs tracing support that could only be used with the
out of tree kdb and xfsidbg patches to use the generic event tracer.

To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
all xfs trace channels by:

   echo 1 > /sys/kernel/debug/tracing/events/xfs/enable

or alternatively enable single events by just doing the same in one
event subdirectory, e.g.

   echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable

or set more complex filters, etc. In Documentation/trace/events.txt
all this is desctribed in more detail.  To reads the events do a

   cat /sys/kernel/debug/tracing/trace

Compared to the last posting this patch converts the tracing mostly to
the one tracepoint per callsite model that other users of the new
tracing facility also employ.  This allows a very fine-grained control
of the tracing, a cleaner output of the traces and also enables the
perf tool to use each tracepoint as a virtual performance counter,
     allowing us to e.g. count how often certain workloads git various
     spots in XFS.  Take a look at

    http://lwn.net/Articles/346470/

for some examples.

Also the btree tracing isn't included at all yet, as it will require
additional core tracing features not in mainline yet, I plan to
deliver it later.

And the really nice thing about this patch is that it actually removes
many lines of code while adding this nice functionality:

 fs/xfs/Makefile                |    8
 fs/xfs/linux-2.6/xfs_acl.c     |    1
 fs/xfs/linux-2.6/xfs_aops.c    |   52 -
 fs/xfs/linux-2.6/xfs_aops.h    |    2
 fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
 fs/xfs/linux-2.6/xfs_buf.h     |   33
 fs/xfs/linux-2.6/xfs_fs_subr.c |    3
 fs/xfs/linux-2.6/xfs_ioctl.c   |    1
 fs/xfs/linux-2.6/xfs_ioctl32.c |    1
 fs/xfs/linux-2.6/xfs_iops.c    |    1
 fs/xfs/linux-2.6/xfs_linux.h   |    1
 fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
 fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
 fs/xfs/linux-2.6/xfs_super.c   |  104 ---
 fs/xfs/linux-2.6/xfs_super.h   |    7
 fs/xfs/linux-2.6/xfs_sync.c    |    1
 fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
 fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/linux-2.6/xfs_vnode.h   |    4
 fs/xfs/quota/xfs_dquot.c       |  110 ---
 fs/xfs/quota/xfs_dquot.h       |   21
 fs/xfs/quota/xfs_qm.c          |   40 -
 fs/xfs/quota/xfs_qm_syscalls.c |    4
 fs/xfs/support/ktrace.c        |  323 ---------
 fs/xfs/support/ktrace.h        |   85 --
 fs/xfs/xfs.h                   |   16
 fs/xfs/xfs_ag.h                |   14
 fs/xfs/xfs_alloc.c             |  230 +-----
 fs/xfs/xfs_alloc.h             |   27
 fs/xfs/xfs_alloc_btree.c       |    1
 fs/xfs/xfs_attr.c              |  107 ---
 fs/xfs/xfs_attr.h              |   10
 fs/xfs/xfs_attr_leaf.c         |   14
 fs/xfs/xfs_attr_sf.h           |   40 -
 fs/xfs/xfs_bmap.c              |  507 +++------------
 fs/xfs/xfs_bmap.h              |   49 -
 fs/xfs/xfs_bmap_btree.c        |    6
 fs/xfs/xfs_btree.c             |    5
 fs/xfs/xfs_btree_trace.h       |   17
 fs/xfs/xfs_buf_item.c          |   87 --
 fs/xfs/xfs_buf_item.h          |   20
 fs/xfs/xfs_da_btree.c          |    3
 fs/xfs/xfs_da_btree.h          |    7
 fs/xfs/xfs_dfrag.c             |    2
 fs/xfs/xfs_dir2.c              |    8
 fs/xfs/xfs_dir2_block.c        |   20
 fs/xfs/xfs_dir2_leaf.c         |   21
 fs/xfs/xfs_dir2_node.c         |   27
 fs/xfs/xfs_dir2_sf.c           |   26
 fs/xfs/xfs_dir2_trace.c        |  216 ------
 fs/xfs/xfs_dir2_trace.h        |   72 --
 fs/xfs/xfs_filestream.c        |    8
 fs/xfs/xfs_fsops.c             |    2
 fs/xfs/xfs_iget.c              |  111 ---
 fs/xfs/xfs_inode.c             |   67 --
 fs/xfs/xfs_inode.h             |   76 --
 fs/xfs/xfs_inode_item.c        |    5
 fs/xfs/xfs_iomap.c             |   85 --
 fs/xfs/xfs_iomap.h             |    8
 fs/xfs/xfs_log.c               |  181 +----
 fs/xfs/xfs_log_priv.h          |   20
 fs/xfs/xfs_log_recover.c       |    1
 fs/xfs/xfs_mount.c             |    2
 fs/xfs/xfs_quota.h             |    8
 fs/xfs/xfs_rename.c            |    1
 fs/xfs/xfs_rtalloc.c           |    1
 fs/xfs/xfs_rw.c                |    3
 fs/xfs/xfs_trans.h             |   47 +
 fs/xfs/xfs_trans_buf.c         |   62 -
 fs/xfs/xfs_vnodeops.c          |    8
 70 files changed, 2151 insertions(+), 2592 deletions(-)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-14 23:08:16 -06:00
Christoph Hellwig
6ef3554422 xfs: change the xfs_iext_insert / xfs_iext_remove
Change the xfs_iext_insert / xfs_iext_remove prototypes to pass more
information which will allow pushing the trace points from the callers
into those functions.  This includes folding the whichfork information
into the state variable to minimize the addition stack footprint.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-14 23:08:15 -06:00
Christoph Hellwig
7574aa92f9 xfs: cleanup bmap extent state macros
Cleanup the extent state macros in the bmap code to use one common set of
flags that we can pass to the tracing code later and remove a lot of the
macro obsfucation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-14 23:08:15 -06:00
J. Bruce Fields
12045a6ee9 nfsd: let "insecure" flag vary by pseudoflavor
This was an oversight; it should be among the export flags that can be
allowed to vary by pseudoflavor.  This allows an administrator to (for
example) allow auth_sys mounts only from low ports, but allow auth_krb5
mounts to use any port.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 19:08:58 -05:00
J. Bruce Fields
e8e8753f7a nfsd: new interface to advertise export features
Soon we will add the new V4ROOT flag, and allow the INSECURE flag to
vary by pseudoflavor.  It would be useful for nfs-utils (for example,
for improved exportfs error reporting) to be able to know when this
happens.  Use this new interface for that purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:51:29 -05:00
Boaz Harrosh
9a74af2133 nfsd: Move private headers to source directory
Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:12 -05:00
Boaz Harrosh
68590c382b vfs: nfsctl.c un-used nfsd #includes
Only linux/nfsd/syscall.h is actually used. Remove the
other nfsd #includes, so they can be moved to source
directory.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:11 -05:00
Boaz Harrosh
0296f55f98 lockd: Remove un-used nfsd headers #includes
In what history where these ever needed? Well not
any more.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:11 -05:00
Boaz Harrosh
370d560017 compat.c: Remove dependence on nfsd private headers
Two nfsd related headers where included but never actually
used. The linux/nfsd/nfsd.h file will eventually be moved
to fs/nfsd directory as it is only needed by nfsd itself.

There are 3 more compat.c files in the Kernel at other ARCHs
that wrongly #include nfsd headers. Once these are fixed the
headers can be moved.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:10 -05:00
Boaz Harrosh
341eb18446 nfsd: Source files #include cleanups
Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.

This patch should improve the compilation speed of the nfsd module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:09 -05:00
J. Bruce Fields
57ecb34feb nfsd4: fix share mode permissions
NFSv4 opens may function as locks denying other NFSv4 users the rights
to open a file.

We're requiring a user to have write permissions before they can deny
write.  We're *not* requiring a user to have write permissions to deny
read, which is if anything a more drastic denial.

What was intended was to require write permissions for DENY_READ.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:06:54 -05:00
Linus Torvalds
3ea6b3d0e6 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: Avoid IO in udf_clear_inode
  udf: Try harder when looking for VAT inode
  udf: Fix compilation with UDFFS_DEBUG enabled
2009-12-14 12:50:25 -08:00
Jan Kara
2c948b3f86 udf: Avoid IO in udf_clear_inode
It is not very good to do IO in udf_clear_inode. First, VFS does not really
expect inode to become dirty there and thus we have to write it ourselves,
second, memory reclaim gets blocked waiting for IO when it does not really
expect it, third, the IO pattern (e.g. on umount) resulting from writes in
udf_clear_inode is bad and it slows down writing a lot.

The reason why UDF needed to do IO in udf_clear_inode is that UDF standard
mandates extent length to exactly match inode size. But when we allocate
extents to a file or directory, we don't really know what exactly the final
file size will be and thus temporarily set it to block boundary and later
truncate it to exact length in udf_clear_inode. Now, this is changed to
truncate to final file size in udf_release_file for regular files. For
directories and symlinks, we do the truncation at the moment when learn
what the final file size will be.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-14 21:40:04 +01:00
Jan Kara
e971b0b9e0 udf: Try harder when looking for VAT inode
Some disks do not contain VAT inode in the last recorded block as required
by the standard but a few blocks earlier (or the number of recorded blocks
is wrong). So look for the VAT inode a bit before the end of the media.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-14 21:40:04 +01:00
Jan Kara
1fefd086df udf: Fix compilation with UDFFS_DEBUG enabled
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-14 21:40:04 +01:00
Linus Torvalds
37222e1c9e Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (27 commits)
  md: add 'recovery_start' per-device sysfs attribute
  md: rcu_read_lock() walk of mddev->disks in md_do_sync()
  md: integrate spares into array at earliest opportunity.
  md: move compat_ioctl handling into md.c
  md: revise Kconfig help for MD_MULTIPATH
  md: add MODULE_DESCRIPTION for all md related modules.
  raid: improve MD/raid10 handling of correctable read errors.
  md/raid10: print more useful messages on device failure.
  md/bitmap: update dirty flag when bitmap bits are explicitly set.
  md: Support write-intent bitmaps with externally managed metadata.
  md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb
  md: support updating bitmap parameters via sysfs.
  md: factor out parsing of fixed-point numbers
  md: support bitmap offset appropriate for external-metadata arrays.
  md: remove needless setting of thread->timeout in raid10_quiesce
  md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.
  md: move offset, daemon_sleep and chunksize out of bitmap structure
  md: collect bitmap-specific fields into one structure.
  md/raid1: add takeover support for raid5->raid1
  md: add honouring of suspend_{lo,hi} to raid1.
  ...
2009-12-14 10:03:36 -08:00
Linus Torvalds
3f86ce72cf Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (75 commits)
  NFS: Fix nfs_migrate_page()
  rpc: remove unneeded function parameter in gss_add_msg()
  nfs41: Invoke RECLAIM_COMPLETE on all new client ids
  SUNRPC: IS_ERR/PTR_ERR confusion
  NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare
  nfs41: Handle NFSv4.1 session errors in the delegation recall code
  nfs41: Retry delegation return if it failed with session error
  nfs41: Handle session errors during delegation return
  nfs41: Mark stateids in need of reclaim if state manager gets stale clientid
  NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured
  nfs41: Don't clear DRAINING flag on NFS4ERR_STALE_CLIENTID
  nfs41: nfs41_setup_state_renewal
  NFSv41: More cleanups
  NFSv41: Fix up some bugs in the NFS4CLNT_SESSION_DRAINING code
  NFSv41: Clean up slot table management
  NFSv41: Fix nfs4_proc_create_session
  nfs41: Invoke RECLAIM_COMPLETE
  nfs41: RECLAIM_COMPLETE functionality
  nfs41: RECLAIM_COMPLETE XDR functionality
  Cleanup some NFSv4 XDR decode comments
  ...
2009-12-14 10:00:24 -08:00
Linus Torvalds
d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
Arnd Bergmann
aa98aa3198 md: move compat_ioctl handling into md.c
The RAID ioctls are only implemented in md.c, so the
handling for them should also be moved there from
fs/compat_ioctl.c.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Andre Noll <maan@systemlinux.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
Trond Myklebust
52c9948b1f Merge branch 'nfs-for-2.6.33' 2009-12-13 13:56:27 -05:00
Linus Torvalds
09cea96caa Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits)
  powerpc: Fix usage of 64-bit instruction in 32-bit altivec code
  MAINTAINERS: Add PowerPC patterns
  powerpc/pseries: Track previous CPPR values to correctly EOI interrupts
  powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP
  powerpc: Make "intspec" pointers in irq_host->xlate() const
  powerpc/8xx: DTLB Miss cleanup
  powerpc/8xx: Remove DIRTY pte handling in DTLB Error.
  powerpc/8xx: Start using dcbX instructions in various copy routines
  powerpc/8xx: Restore _PAGE_WRITETHRU
  powerpc/8xx: Add missing Guarded setting in DTLB Error.
  powerpc/8xx: Fixup DAR from buggy dcbX instructions.
  powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions.
  powerpc/8xx: Update TLB asm so it behaves as linux mm expects.
  powerpc/8xx: Invalidate non present TLBs
  powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
  pseries/pseries: Add code to online/offline CPUs of a DLPAR node
  powerpc: stop_this_cpu: remove the cpu from the online map.
  powerpc/pseries: Add kernel based CPU DLPAR handling
  sysfs/cpu: Add probe/release files
  powerpc/pseries: Kernel DLPAR Infrastructure
  ...
2009-12-12 14:27:24 -08:00
Linus Torvalds
92340ee319 Merge branch 'compat-ioctl-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
* 'compat-ioctl-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  usbdevfs: move compat_ioctl handling to devio.c
  lp: move compat_ioctl handling into lp.c
  compat_ioctl: pass compat pointer directly to handlers
  compat_ioctl: simplify lookup table
  compat_ioctl: simplify calling of handlers
  compat_ioctl: inline all conversion handlers
  compat_ioctl: Remove BKL
  compat_ioctl: remove all VT ioctl handling
2009-12-11 20:57:46 -08:00
Linus Torvalds
0f4974c439 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (58 commits)
  tty: split the lock up a bit further
  tty: Move the leader test in disassociate
  tty: Push the bkl down a bit in the hangup code
  tty: Push the lock down further into the ldisc code
  tty: push the BKL down into the handlers a bit
  tty: moxa: split open lock
  tty: moxa: Kill the use of lock_kernel
  tty: moxa: Fix modem op locking
  tty: moxa: Kill off the throttle method
  tty: moxa: Locking clean up
  tty: moxa: rework the locking a bit
  tty: moxa: Use more tty_port ops
  tty: isicom: fix deadlock on shutdown
  tty: mxser: Use the new locking rules to fix setserial properly
  tty: mxser: use the tty_port_open method
  tty: isicom: sort out the board init logic
  tty: isicom: switch to the new tty_port_open helper
  tty: tty_port: Add a kref object to the tty port
  tty: istallion: tty port open/close methods
  tty: stallion: Convert to the tty_port_open/close methods
  ...
2009-12-11 15:34:40 -08:00
Linus Torvalds
3126c136bc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
  ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
  ext3: Fix data / filesystem corruption when write fails to copy data
  ext4: Support for 64-bit quota format
  ext3: Support for vfsv1 quota format
  quota: Implement quota format with 64-bit space and inode limits
  quota: Move definition of QFMT_OCFS2 to linux/quota.h
  ext2: fix comment in ext2_find_entry about return values
  ext3: Unify log messages in ext3
  ext2: clear uptodate flag on super block I/O error
  ext2: Unify log messages in ext2
  ext3: make "norecovery" an alias for "noload"
  ext3: Don't update the superblock in ext3_statfs()
  ext3: journal all modifications in ext3_xattr_set_handle
  ext2: Explicitly assign values to on-disk enum of filetypes
  quota: Fix WARN_ON in lookup_one_len
  const: struct quota_format_ops
  ubifs: remove manual O_SYNC handling
  afs: remove manual O_SYNC handling
  kill wait_on_page_writeback_range
  vfs: Implement proper O_SYNC semantics
  ...
2009-12-11 15:31:13 -08:00
Linus Torvalds
f4d544ee57 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: Fix error return for fallocate() on XFS
  xfs: cleanup dmapi macros in the umount path
  xfs: remove incorrect sparse annotation for xfs_iget_cache_miss
  xfs: kill the STATIC_INLINE macro
  xfs: uninline xfs_get_extsz_hint
  xfs: rename xfs_attr_fetch to xfs_attr_get_int
  xfs: simplify xfs_buf_get / xfs_buf_read interfaces
  xfs: remove IO_ISAIO
  xfs: Wrapped journal record corruption on read at recovery
  xfs: cleanup data end I/O handlers
  xfs: use WRITE_SYNC_PLUG for synchronous writeout
  xfs: reset the i_iolock lock class in the reclaim path
  xfs: I/O completion handlers must use NOFS allocations
  xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks
  xfs: simplify inode teardown
2009-12-11 15:30:29 -08:00
Linus Torvalds
f58df54a54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (27 commits)
  Driver core: fix race in dev_driver_string
  Driver Core: Early platform driver buffer
  sysfs: sysfs_setattr remove unnecessary permission check.
  sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir
  sysfs: Propagate renames to the vfs on demand
  sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish
  sysfs: In sysfs_chmod_file lazily propagate the mode change.
  sysfs: Implement sysfs_getattr & sysfs_permission
  sysfs: Nicely indent sysfs_symlink_inode_operations
  sysfs: Update s_iattr on link and unlink.
  sysfs: Fix locking and factor out sysfs_sd_setattr
  sysfs: Simplify iattr time assignments
  sysfs: Simplify sysfs_chmod_file semantics
  sysfs: Use dentry_ops instead of directly playing with the dcache
  sysfs: Rename sysfs_d_iput to sysfs_dentry_iput
  sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex
  debugfs: fix create mutex racy fops and private data
  Driver core: Don't remove kobjects in device_shutdown.
  firmware_class: make request_firmware_nowait more useful
  Driver-Core: devtmpfs - set root directory mode to 0755
  ...
2009-12-11 15:24:56 -08:00
Sukadev Bhattiprolu
edfacdd6f8 devpts_get_tty() should validate inode
devpts_get_tty() assumes that the inode passed in is associated with a valid
pty.  But if the only reference to the pty is via a bind-mount, the inode
passed to devpts_get_tty() while valid, would refer to a pty that no longer
exists.

With a lot of debug effort, Grzegorz Nosek developed a small program (see
below) to reproduce a crash on recent kernels. This crash is a regression
introduced by the commit:

	commit 527b3e4773
	Author: Sukadev Bhattiprolu <sukadev@us.ibm.com>
	Date:   Mon Oct 13 10:43:08 2008 +0100

To fix, ensure that the dentry associated with the inode has not yet been
deleted/unhashed by devpts_pty_kill().

See also:
https://lists.linux-foundation.org/pipermail/containers/2009-July/019273.html 

tty-bug.c:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/signal.h>
#include <unistd.h>
#include <stdio.h>

#include <linux/fs.h>

void dummy(int sig)
{
}

static int child(void *unused)
{
	int fd;

	signal(SIGINT, dummy); signal(SIGHUP, dummy);
	pause(); /* cheesy synchronisation to wait for /dev/pts/0 to appear */

	mount("/dev/pts/0", "/dev/console", NULL, MS_BIND, NULL);
	sleep(2);

	fd = open("/dev/console", O_RDWR);
	dup(0); dup(0);
	write(1, "Hello world!\n", sizeof("Hello world!\n")-1);
	return 0;
}

int main(void)
{
	pid_t pid;
	char *stack;

	stack = malloc(16384);
	pid = clone(child, stack+16384, CLONE_NEWNS|SIGCHLD, NULL);

	open("/dev/ptmx", O_RDWR|O_NOCTTY|O_NONBLOCK);

	unlockpt(fd); grantpt(fd);

	sleep(2);
	kill(pid, SIGHUP);
	sleep(1);
	return 0; /* exit before child opens /dev/console */
}

Reported-by: Grzegorz Nosek <root@localdomain.pl>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 15:18:05 -08:00
Jason Gunthorpe
44a743f687 xfs: Fix error return for fallocate() on XFS
Noticed that through glibc fallocate would return 28 rather than -1
and errno = 28 for ENOSPC. The xfs routines uses XFS_ERROR format
positive return error codes while the syscalls use negative return
codes.  Fixup the two cases in xfs_vn_fallocate syscall to convert to
negative.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:23 -06:00
Christoph Hellwig
30ac0683dd xfs: cleanup dmapi macros in the umount path
Stop the flag saving as we never mangle those in the unmount path, and
hide all the weird arguents to the dmapi code inside the
XFS_SEND_PREUNMOUNT / XFS_SEND_UNMOUNT macros.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:23 -06:00
Christoph Hellwig
0c3dc2b02a xfs: remove incorrect sparse annotation for xfs_iget_cache_miss
xfs_iget_cache_miss does not get called with the pag_ici_lock held, so
the __releases annotation is incorrect.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:23 -06:00
Christoph Hellwig
b8f82a4a6f xfs: kill the STATIC_INLINE macro
Remove our own STATIC_INLINE macro.  For small function inside
implementation files just use STATIC and let gcc inline it, and for
those in headers do the normal static inline - they are all small
enough to be inlined for debug builds, too.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:22 -06:00
Christoph Hellwig
5683f53e36 xfs: uninline xfs_get_extsz_hint
This function is too large to efficiently be inlined.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:22 -06:00
Christoph Hellwig
e82fa0c7ca xfs: rename xfs_attr_fetch to xfs_attr_get_int
Using a totally different name for the low-level get operation does
not fit the _int convention used in the rest of the attr code, so
rename it.

While we're at it also fix the prototype to use the normal convention
and mark it static as it's never used outside of xfs_attr.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:22 -06:00
Christoph Hellwig
6ad112bfb5 xfs: simplify xfs_buf_get / xfs_buf_read interfaces
Currently the low-level buffer cache interfaces are highly confusing
as we have a _flags variant of each that does actually respect the
flags, and one without _flags which has a flags argument that gets
ignored and overriden with a default set.  Given that very few places
use the default arguments get rid of the duplication and convert all
callers to pass the flags explicitly.  Also remove the now confusing
_flags postfix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:21 -06:00
Christoph Hellwig
c355c656fe xfs: remove IO_ISAIO
We set the IO_ISAIO flag for all read/write I/O since early Linux
2.6.x.  Remove it as it has lost it's purpose long ago.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:21 -06:00
Andy Poling
fc5bc4c85c xfs: Wrapped journal record corruption on read at recovery
Summary of problem:

If a journal record wraps at the physical end of the journal, it has to be
read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
read at the physical beginning.  If xlog_bread() has to re-align the first
read, the second read request does not take that re-alignment into account.
If the first read was re-aligned, the second read over-writes the end of the
data from the first read, effectively corrupting it.  This can happen either
when reading the record header or reading the record data.

The first sanity check in xlog_recover_process_data() is to check for a valid
clientid, so that is the error reported.

Summary of fix:

If there was a first read at the physical end, XFS_BUF_PTR() returns where the
data was requested to begin.  Conversely, because it is the result of
xlog_align(), offset indicates where the requested data for the first read
actually begins - whether or not xlog_bread() has re-aligned it.

Using offset as the base for the calculation of where to place the second read
data ensures that it will be correctly placed immediately following the data
from the first read instead of sometimes over-writing the end of it.

The attached patch has resolved the reported problem of occasional inability
to recover the journal (reporting "bad clientid").

Signed-off-by: Andy Poling <andy@realbig.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:21 -06:00
Christoph Hellwig
5ec4fabb02 xfs: cleanup data end I/O handlers
Currently we have different end I/O handlers for read vs the different
types of write I/O.  But they are all very similar so we could just
use one with a few conditionals and reduce code size a lot.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:20 -06:00
Christoph Hellwig
06342cf8ad xfs: use WRITE_SYNC_PLUG for synchronous writeout
The VM and I/O schedulers now expect us to use WRITE_SYNC_PLUG for
synchronous writeout.  Right now I can't see any changes in performance
numbers with this, but we're getting some beating for not using it,
and the knowledge definitely could help the block code to make better
decisions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:20 -06:00
Christoph Hellwig
033da48fda xfs: reset the i_iolock lock class in the reclaim path
The iolock is used for protecting reads, writes and block truncates
against each other.  We have two classes of callers, the first one is
induced by a file operation and requires a reference to the inode be
held and not dropped after the operation is done:

 - xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read,
   xfs_splice_write and xfs_setattr are all implementations of VFS
   methods that require a live inode
 - xfs_getbmap and xfs_swap_extents are ioctl subcommand for which the
   same is true
 - xfs_truncate_file is only called on quota inodes just returned from
   xfs_iget
 - xfs_sync_inode_data does the lock just after an igrab()
 - xfs_filestream_associate and xfs_filestream_new_ag take the iolock
   on the parent inode of an inode which by VFS rules must be referenced

And we have various calls to truncate blocks past EOF or the whole
file when dropping the last reference to an inode.  Unfortunately
lockdep complains when we do memory allocations that can recurse into
the filesystem in the first class because the second class happens to
take the same lock.  To avoid this re-init the iolock in the beginning
of xfs_fs_clear_inode to get a new lock class.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:20 -06:00
Christoph Hellwig
80641dc66a xfs: I/O completion handlers must use NOFS allocations
When completing I/O requests we must not allow the memory allocator to
recurse into the filesystem, as we might deadlock on waiting for the
I/O completion otherwise.  The only thing currently allocating normal
GFP_KERNEL memory is the allocation of the transaction structure for
the unwritten extent conversion.  Add a memflags argument to
_xfs_trans_alloc to allow controlling the allocator behaviour.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Thomas Neumann <tneumann@users.sourceforge.net>
Tested-by: Thomas Neumann <tneumann@users.sourceforge.net>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:20 -06:00
Christoph Hellwig
c56c9631cb xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks
When xfs_free_eofblocks is called from ->release the VM might already
hold the mmap_sem, but in the write path we take the iolock before
taking the mmap_sem in the generic write code.

Switch xfs_free_eofblocks to only trylock the iolock if called from
->release and skip trimming the prellocated blocks in that case.
We'll still free them later on the final iput.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:19 -06:00
Christoph Hellwig
848ce8f731 xfs: simplify inode teardown
Currently the reclaim code for the case where we don't reclaim the
final reclaim is overly complicated.  We know that the inode is clean
but instead of just directly reclaiming the clean inode we go through
the whole process of marking the inode reclaimable just to directly
reclaim it from the calling context.  Besides being overly complicated
this introduces a race where iget could recycle an inode between
marked reclaimable and actually being reclaimed leading to panics.

This patch gets rid of the existing reclaim path, and replaces it with
a simple call to xfs_ireclaim if the inode was clean.  While we're at
it we also use the slightly more lax xfs_inode_clean check we'd use
later to determine if we need to flush the inode here.

Finally get rid of xfs_reclaim function and place the remaining small
bits of reclaim code directly into xfs_fs_destroy_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Reported-by: Tommy van Leeuwen <tommy@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11 15:11:19 -06:00
Linus Torvalds
4e2ccdb040 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (49 commits)
  nilfs2: separate wait function from nilfs_segctor_write
  nilfs2: add iterator for segment buffers
  nilfs2: hide nilfs_write_info struct in segment buffer code
  nilfs2: relocate io status variables to segment buffer
  nilfs2: do not return io error for bio allocation failure
  nilfs2: use list_splice_tail or list_splice_tail_init
  nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirty
  nilfs2: delete mark_inode_dirty in nilfs_delete_entry
  nilfs2: delete mark_inode_dirty in nilfs_commit_chunk
  nilfs2: change return type of nilfs_commit_chunk
  nilfs2: split nilfs_unlink as nilfs_do_unlink and nilfs_unlink
  nilfs2: delete redundant mark_inode_dirty
  nilfs2: expand inode_inc_link_count and inode_dec_link_count
  nilfs2: delete mark_inode_dirty from nilfs_set_link
  nilfs2: delete mark_inode_dirty in nilfs_new_inode
  nilfs2: add norecovery mount option
  nilfs2: add helper to get if volume is in a valid state
  nilfs2: move recovery completion into load_nilfs function
  nilfs2: apply readahead for recovery on mount
  nilfs2: clean up get/put function of a segment usage
  ...
2009-12-11 11:49:18 -08:00
Eric W. Biederman
e16acb503b sysfs: sysfs_setattr remove unnecessary permission check.
inode_change_ok already clears the SGID bit when necessary
so there is no reason for sysfs_setattr to carry code to do
the same, and it is good to kill the extra copy because when
I moved the code last in certain corner cases the code will
look at the wrong gid.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:54 -08:00
Eric W. Biederman
ca1bab3819 sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir
These two functions do 90% of the same work and it doesn't significantly
obfuscate the function to allow both the parent dir and the name to change
at the same time.  So merge them together to simplify maintenance, and
increase testing.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:54 -08:00
Eric W. Biederman
832b6af198 sysfs: Propagate renames to the vfs on demand
By teaching sysfs_revalidate to hide a dentry for
a sysfs_dirent if the sysfs_dirent has been renamed,
and by teaching sysfs_lookup to return the original
dentry if the sysfs dirent has been renamed.  I can
show the results of renames correctly without having to
update the dcache during the directory rename.

This massively simplifies the rename logic allowing a lot
of weird sysfs special cases to be removed along with
a lot of now unnecesary helper code.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:54 -08:00
Eric W. Biederman
a16bbc3430 sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish
With lazy inode updates and dentry operations bringing everything
into sync on demand there is no longer any need to immediately
update the vfs or grab i_mutex to protect those updates as we
make changes to sysfs.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:54 -08:00
Eric W. Biederman
06fc0d66f7 sysfs: In sysfs_chmod_file lazily propagate the mode change.
Now that sysfs_getattr and sysfs_permission refresh the vfs
inode there is no need to immediatly push the mode change
into the vfs cache.  Reducing the amount of work needed and
simplifying the locking.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:54 -08:00
Eric W. Biederman
e61ab4ae48 sysfs: Implement sysfs_getattr & sysfs_permission
With the implementation of sysfs_getattr and sysfs_permission
sysfs becomes able to lazily propogate inode attribute changes
from the sysfs_dirents to the vfs inodes.   This paves the way
for deleting significant chunks of now unnecessary code.

While doing this we did not reference sysfs_setattr from
sysfs_symlink_inode_operations so I added along with
sysfs_getattr and sysfs_permission.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:54 -08:00
Eric W. Biederman
c099aacd48 sysfs: Nicely indent sysfs_symlink_inode_operations
Lining up the functions in sysfs_symlink_inode_operations
follows the pattern in the rest of sysfs and makes things
slightly more readable.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Eric W. Biederman
6b0bfe9383 sysfs: Update s_iattr on link and unlink.
Currently sysfs updates the timestamps on the vfs directory
inode when we create or remove a directory entry but doesn't
update the cached copy on the sysfs_dirent, fix that oversight.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Eric W. Biederman
35df63c46c sysfs: Fix locking and factor out sysfs_sd_setattr
Cleanly separate the work that is specific to setting the
attributes of a sysfs_dirent from what is needed to update
the attributes of a vfs inode.

Additionally grab the sysfs_mutex to keep any nasties from
surprising us when updating the sysfs_dirent.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Eric W. Biederman
4be3df28be sysfs: Simplify iattr time assignments
The granularity of sysfs time when we keep it is 1 ns.  Which
when passed to timestamp_trunc results in a nop.  So remove
the unnecessary function call making sysfs_setattr slightly
easier to read.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Eric W. Biederman
4c6974f51a sysfs: Simplify sysfs_chmod_file semantics
Currently every caller of sysfs_chmod_file happens at either
file creation time to set a non-default mode or in response
to a specific user requested space change in policy.  Making
timestamps of when the chmod happens and notification of
a file changing mode uninteresting.

Remove the unnecessary time stamp and filesystem change
notification, and removes the last of the explicit inotify
and donitfy support from sysfs.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Eric W. Biederman
e8f077c883 sysfs: Use dentry_ops instead of directly playing with the dcache
Calling d_drop unconditionally when a sysfs_dirent is deleted has
the potential to leak mounts, so instead implement dentry delete
and revalidate operations that cause sysfs dentries to be removed
at the appropriate time.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Eric W. Biederman
28a027cfc0 sysfs: Rename sysfs_d_iput to sysfs_dentry_iput
Using dentry instead of d in the function name is what
several other filesystems are doing and it seems to be
a more readable convention.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Eric W. Biederman
f44d3e7857 sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex
The sysfs_mutex is required to ensure updates are and will remain
atomic with respect to other inode iattr updates, that do not happen
through the filesystem.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Mathieu Desnoyers
d3a3b0adad debugfs: fix create mutex racy fops and private data
Setting fops and private data outside of the mutex at debugfs file
creation introduces a race where the files can be opened with the wrong
file operations and private data.  It is easy to trigger with a process
waiting on file creation notification.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:53 -08:00
Stefan Richter
f38506c49d sysfs: mark a locally-only used function static
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11 11:24:51 -08:00
Arnd Bergmann
637e8a60a7 usbdevfs: move compat_ioctl handling to devio.c
Half the compat_ioctl handling is in devio.c, the other
half is in fs/compat_ioctl.c. This moves everything into
one place for consistency.

As a positive side-effect, push down the BKL into the
ioctl methods.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: David Vrabel <david.vrabel@csr.com>
Cc: linux-usb@vger.kernel.org
2009-12-10 22:55:37 +01:00
Arnd Bergmann
3695669cd4 lp: move compat_ioctl handling into lp.c
Handling for LPSETTIMEOUT can easily be done in lp_ioctl, which
is the only user. As a positive side-effect, push the BKL
into the ioctl methods.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-10 22:55:36 +01:00
Arnd Bergmann
43c6e7b97f compat_ioctl: pass compat pointer directly to handlers
Instead of having each handler call compat_ptr, we can now
convert the pointer once and pass that to each handler.
This saves a little bit of both source and object code size.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10 22:52:12 +01:00
Arnd Bergmann
661f627da9 compat_ioctl: simplify lookup table
The compat_ioctl table now only contains entries for
COMPATIBLE_IOCTL, so we only need to know if a number
is listed in it or now.

As an optimization, we hash the table entries with a
reversible transformation to get a more uniform distribution
over it, sort the table at startup and then guess the
position in the table when an ioctl number gets called
to do a linear search from there.

With the current set of ioctl numbers and the chosen
transformation function, we need an average of four
steps to find if a number is in the set, all of the
accesses within one or two cache lines.

This at least as good as the previous hash table
approach but saves 8.5 kb of kernel memory.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10 22:52:11 +01:00
Arnd Bergmann
789f0f8911 compat_ioctl: simplify calling of handlers
The compat_ioctl array now contains only entries for ioctl numbers
that do not require a separate handler. By special-casing the
ULONG_IOCTL case in the do_ioctl_trans function, we can kill the
final use of a function pointer in the array.

   text    data     bss     dec     hex filename
   7539   13352    2080   22971    59bb before/fs/compat_ioctl.o
   7910    8552    2080   18542    486e after/fs/compat_ioctl.o

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10 22:52:10 +01:00
Arnd Bergmann
5a07ea0b97 compat_ioctl: inline all conversion handlers
This makes all ioctl conversion handlers called from
a single switch statement, leaving only COMPATIBLE_IOCTL
and ULONG_IOCTL statements in the table. This is somewhat
more space efficient and also lets us simplify the
handling of the lookup table significantly.

before:
   text    data     bss     dec     hex filename
   7619   14024    2080   23723    5cab obj/fs/compat_ioctl.o
after:
   7567   13352    2080   22999    59d7 obj/fs/compat_ioctl.o

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10 22:52:09 +01:00
Arnd Bergmann
348c4b9078 compat_ioctl: Remove BKL
We have always called ioctl conversion handlers under the big kernel lock,
although that is generally not necessary.  In particular it is not needed
for conversion of data structures and for calling sys_ioctl or
do_vfs_ioctl, which will get the BKL again if needed.

Handlers doing more than those two have been moved out, so we can kill off
the BKL from compat_sys_ioctl.  This may significantly improve latencies
with 32 bit applications, and it avoids a common scenario where a thread
acquires the BKL twice.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-12-10 22:52:08 +01:00
Arnd Bergmann
fb07a5f857 compat_ioctl: remove all VT ioctl handling
The VT driver now handles all of these ioctls directly, so we can remove
the handlers from common code.

These are the only handlers that require the BKL because they directly
perform the ioctl action rather than just converting the data structures.
Once they are gone, we can remove the BKL from the remaining ioctl
conversion handlers.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-12-10 22:52:08 +01:00
Linus Torvalds
02412f49f6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: always use GFP_NOFS
2009-12-10 09:33:59 -08:00
Linus Torvalds
4515c3069d Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)
  ext4: Do not override ext2 or ext3 if built they are built as modules
  jbd2: Export jbd2_log_start_commit to fix ext4 build
  ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
  ext4: Wait for proper transaction commit on fsync
  ext4: fix incorrect block reservation on quota transfer.
  ext4: quota macros cleanup
  ext4: ext4_get_reserved_space() must return bytes instead of blocks
  ext4: remove blocks from inode prealloc list on failure
  ext4: wait for log to commit when umounting
  ext4: Avoid data / filesystem corruption when write fails to copy data
  ext4: Use ext4 file system driver for ext2/ext3 file system mounts
  ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()
  jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer()
  ext4: remove unused parameter wbc from __ext4_journalled_writepage()
  ext4: remove encountered_congestion trace
  ext4: move_extent_per_page() cleanup
  ext4: initialize moved_len before calling ext4_move_extents()
  ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
  ext4: use ext4_data_block_valid() in ext4_free_blocks()
  ...
2009-12-10 09:33:29 -08:00
Linus Torvalds
a5eba3f66f Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Multi-device mirror support
  exofs: Move all operations to an io_engine
  exofs: move osd.c to ios.c
  exofs: statfs blocks is sectors not FS blocks
  exofs: Prints on mount and unmout
  exofs: refactor exofs_i_info initialization into common helper
  exofs: dbg-print less
  exofs: More sane debug print
  trivial: some small fixes in exofs documentation
2009-12-10 09:32:24 -08:00
Linus Torvalds
fc1495bf99 Merge git://git.infradead.org/ubifs-2.6
* git://git.infradead.org/ubifs-2.6:
  UBIFS: fix return code in check_leaf
  UBI: flush wl before clearing update marker
  MAINTAINERS: change e-mail of Artem Bityutskiy
  UBIFS: remove manual O_SYNC handling
  UBIFS: support mounting of UBI volume character devices
  UBI: Add ubi_open_volume_path
2009-12-10 09:31:45 -08:00
Trond Myklebust
190f38e5ce NFS: Fix nfs_migrate_page()
The call to migrate_page() will cause the page->private field to be
cleared.
Also fix up the locking around the page->private transfer, so that we ensure
that calls to nfs_page_find_request() don't end up racing.

Finally, fix up a double free bug: nfs_unlock_request() already calls
nfs_release_request() for us...

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Tested-by: Andi Kleen <andi@firstfloor.org>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-10 09:05:55 -05:00
Roel Kluin
8e0eb4011b ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
Return the PTR_ERR of the correct pointer.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:55 +01:00
Jan Kara
68eb3db083 ext3: Fix data / filesystem corruption when write fails to copy data
When ext3_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks already
instantiated beyond i_size. Although these blocks were never inside i_size, we
have to truncate pagecache of these blocks so that corresponding buffers get
unmapped. Otherwise subsequent __block_prepare_write (called because we are
retrying the write) will find the buffers mapped, not call ->get_block, and
thus the page will be backed by already freed blocks leading to filesystem and
data corruption.

Reported-by: James Y Knight <foom@fuhm.net>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:55 +01:00
Jan Kara
5a20bdfcdc ext4: Support for 64-bit quota format
Add support for new 64-bit quota format. It is enough to add proper
mount options handling. The rest is done by the generic code.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:54 +01:00
Jan Kara
1aeec43432 ext3: Support for vfsv1 quota format
We just have to add proper mount options handling. The rest is handled by
the generic quota code.

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:54 +01:00
Jan Kara
498c60153e quota: Implement quota format with 64-bit space and inode limits
So far the maximum quota space limit was 4TB. Apparently this isn't enough
for Lustre guys anymore. So implement new quota format which raises block
limits to 2^64 bytes. Also store number of inodes and inode limits in
64-bit variables as 2^32 files isn't that insanely high anymore.

The first version of the patch has been developed by Andrew Perepechko
<Andrew.Perepechko@Sun.COM>.

CC: Andrew.Perepechko@Sun.COM
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:54 +01:00
Jan Kara
3067393005 quota: Move definition of QFMT_OCFS2 to linux/quota.h
Move definition of this constant to linux/quota.h so that it
cannot clash with other format IDs.

CC: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:53 +01:00
Jérémy Cochoy
92e128884b ext2: fix comment in ext2_find_entry about return values
Signed-off-by: Jérémy Cochoy <jeremy.cochoy@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:53 +01:00