Commit graph

9950 commits

Author SHA1 Message Date
Arjan van de Ven
76369470b7 hrtimer: convert timerfd to the new hrtimer apis
In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts timerfd to these accessors.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-09-05 21:35:09 -07:00
Arjan van de Ven
8ff3e8e85f select: switch select() and poll() over to hrtimers
With lots of help, input and cleanups from Thomas Gleixner

This patch switches select() and poll() over to hrtimers.

The core of the patch is replacing the "s64 timeout" with a
"struct timespec end_time" in all the plumbing.

But most of the diffstat comes from using the just introduced helpers:
	poll_select_set_timeout
	poll_select_copy_remaining
	timespec_add_safe
which make manipulating the timespec easier and less error-prone.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-09-05 21:35:03 -07:00
Thomas Gleixner
b773ad40ac select: add poll_select_set_timeout() and poll_select_copy_remaining() helpers
This patch adds 2 helpers that will be used for the hrtimer based select/poll:

poll_select_set_timeout() is a helper that takes a timeout (as a second, nanosecond
pair) and turns that into a "struct timespec" that represents the absolute end time.
This is a common operation in the many select() and poll() variants and needs various,
common, sanity checks.

poll_select_copy_remaining() is a helper that takes care of copying the remaining
time to userspace, as select(), pselect() and ppoll() do. This function comes in
both a natural and a compat implementation (due to datastructure differences).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-09-05 21:34:59 -07:00
Balbir Singh
49048622ea sched: fix process time monotonicity
Spencer reported a problem where utime and stime were going negative despite
the fixes in commit b27f03d4bd. The suspected
reason for the problem is that signal_struct maintains it's own utime and
stime (of exited tasks), these are not updated using the new task_utime()
routine, hence sig->utime can go backwards and cause the same problem
to occur (sig->utime, adds tsk->utime and not task_utime()). This patch
fixes the problem

TODO: using max(task->prev_utime, derived utime) works for now, but a more
generic solution is to implement cputime_max() and use the cputime_gt()
function for comparison.

Reported-by: spencer@bluehost.com
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-05 18:14:35 +02:00
KOSAKI Motohiro
4b8561521d mm: show quicklist usage in /proc/meminfo
Quicklists can consume several GB of memory.  We should provide a means of
monitoring this.

After this patch is applied, /proc/meminfo will output the following:

% cat /proc/meminfo

MemTotal:      7715392 kB
MemFree:       5401600 kB
Buffers:         80384 kB
Cached:         300800 kB
SwapCached:          0 kB
Active:         235584 kB
Inactive:       262656 kB
SwapTotal:     2031488 kB
SwapFree:      2031488 kB
Dirty:            3520 kB
Writeback:           0 kB
AnonPages:      117696 kB
Mapped:          38528 kB
Slab:          1589952 kB
SReclaimable:    23104 kB
SUnreclaim:    1566848 kB
PageTables:      14656 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   5889152 kB
Committed_AS:   393152 kB
VmallocTotal: 17592177655808 kB
VmallocUsed:     29056 kB
VmallocChunk: 17592177626432 kB
Quicklists:     130944 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:    262144 kB

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:38 -07:00
Adrian Bunk
169ccbd44e NTFS: update homepage
Update the location of the NTFS homepage in several files.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:37 -07:00
Linus Torvalds
e77295dc9e Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix buffer overrun decoding NFSv4 acl
  sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports
  nfsd: fix compound state allocation error handling
  svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet
2008-09-02 10:58:11 -07:00
J. Bruce Fields
91b80969ba nfsd: fix buffer overrun decoding NFSv4 acl
The array we kmalloc() here is not large enough.

Thanks to Johann Dahm and David Richter for bug report and testing.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Richter <richterd@citi.umich.edu>
Tested-by: Johann Dahm <jdahm@umich.edu>
2008-09-01 14:24:24 -04:00
Andy Adamson
c228c24bf1 nfsd: fix compound state allocation error handling
Move the cstate_alloc call so that if it fails, the response is setup to
encode the NFS error. The out label now means that the
nfsd4_compound_state has not been allocated.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-01 14:17:48 -04:00
Steve French
c76da9da1f [CIFS] Turn off Unicode during session establishment for plaintext authentication
LANMAN session setup did not support Unicode (after session setup, unicode can
still be used though).

Fixes samba bug# 5319

CC: Jeff Layton <jlayton@redhat.com>
CC: Stable Kernel <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-28 15:32:22 +00:00
Steve French
2e655021b8 [CIFS] update cifs change log
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-28 15:30:06 +00:00
Jeff Layton
838726c475 cifs: fix O_APPEND on directio mounts
The direct I/O write codepath for CIFS is done through
cifs_user_write(). That function does not currently call
generic_write_checks() so the file position isn't being properly set
when the file is opened with O_APPEND.  It's also not doing the other
"normal" checks that should be done for a write call.

The problem is currently that when you open a file with O_APPEND on a
mount with the directio mount option, the file position is set to the
beginning of the file. This makes any subsequent writes clobber the data
in the file starting at the beginning.

This seems to fix the problem in cursory testing. It is, however
important to note that NFS disallows the combination of
(O_DIRECT|O_APPEND). If my understanding is correct, the concern is
races with multiple clients appending to a file clobbering each others'
data. Since the write model for CIFS and NFS is pretty similar in this
regard, CIFS is probably subject to the same sort of races. What's
unclear to me is why this is a particular problem with O_DIRECT and not
with buffered writes...

Regardless, disallowing O_APPEND on an entire mount is probably not
reasonable, so we'll probably just have to deal with it and reevaluate
this flag combination when we get proper support for O_DIRECT. In the
meantime this patch at least fixes the existing problem.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-28 14:15:32 +00:00
Steve French
6405c9cd9b Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-08-28 02:47:00 +00:00
Linus Torvalds
325a9a3d39 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Add destroy routine for dns_resolver
  [CIFS] Reorder cifs config item for better clarity
  [CIFS] Correct keys dependency for cifs kerberos support
2008-08-27 14:34:49 -07:00
Linus Torvalds
5b51a7e9d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] deal with the first call of ->show() generating no output
  [PATCH] fix ->llseek() for a bunch of directories
  [PATCH] fix regular readdir() and friends
  [PATCH] fix hpux_getdents()
  [PATCH] fix osf_getdirents()
  [PATCH] ntfs: use d_add_ci
  [PATCH] change d_add_ci argument ordering
  [PATCH] fix efs_lookup()
  [PATCH] proc: inode number fixlet
2008-08-27 14:31:44 -07:00
Steve French
bcc55c6664 [CIFS] Fix plaintext authentication
The last eight bytes of the password field were not cleared when doing lanman plaintext password authentication. This patch fixes that.

I tested it with Samba by setting password
encryption to no in the server's smb.conf.  Other servers also can be
configured to force plaintext authentication.    Note that plaintexti
authentication requires setting /proc/fs/cifs/SecurityFlags to 0x30030
on the client (enabling both LANMAN and also plaintext password support).
Also note that LANMAN support (and thus plaintext password support) requires
CONFIG_CIFS_WEAK_PW_HASH to be enabled in menuconfig.

CC: Jeff Layton <jlayton@redhat.com>
CC: Stable Kernel <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-27 21:30:22 +00:00
Jeff Layton
87ed1d65fb [CIFS] Add destroy routine for dns_resolver
Otherwise, we're leaking the payload memory.

CC: Stable Kernel <stable@vger.kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-27 21:17:41 +00:00
Linus Torvalds
0559bc8e9b Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: remove blk_queue_tag_depth() and blk_queue_tag_queue()
  block: remove unused ->busy part of the block queue tag map
  bio: fix __bio_copy_iov() handling of bio->bv_len
  bio: fix bio_copy_kern() handling of bio->bv_len
  block: submit_bh() inadvertently discards barrier flag on a sync write
  block: clean up cmdfilter sysfs interface
  block: rename blk_scsi_cmd_filter to blk_cmd_filter
  sg: restore command permission for TYPE_SCANNER
  block: move cmdfilter from gendisk to request_queue
2008-08-27 13:55:35 -07:00
Linus Torvalds
e472233fc5 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Increment the reference count of an already-active stack.
  [PATCH] configfs: Consolidate locking around configfs_detach_prep() in configfs_rmdir()
  ocfs2: correctly set i_blocks after inline dir gets expanded
  ocfs2: Jump to correct label in ocfs2_expand_inline_dir()
  ocfs2: Fix sleep-with-spinlock recovery regression
  [PATCH] ocfs2/cluster/netdebug.c: fix warning
  [PATCH] ocfs2/cluster/tcp.c: make some functions static
2008-08-27 13:54:55 -07:00
FUJITA Tomonori
aefcc28a3a bio: fix __bio_copy_iov() handling of bio->bv_len
The commit c5dec1c303 introduced
__bio_copy_iov() to add bounce support to blk_rq_map_user_iov.

__bio_copy_iov() uses bio->bv_len to copy data for READ commands after
the completion but it doesn't work with a request that partially
completed. SCSI always completes a PC request as a whole but seems
some don't.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-08-27 09:50:19 +02:00
FUJITA Tomonori
76029ff37f bio: fix bio_copy_kern() handling of bio->bv_len
The commit 68154e90c9 introduced
bio_copy_kern() to add bounce support to blk_rq_map_kern.

bio_copy_kern() uses bio->bv_len to copy data for READ commands after
the completion but it doesn't work with a request that partially
completed. SCSI always completes a PC request as a whole but seems
some don't.

This patch fixes bio_copy_kern to handle the above case. As
bio_copy_user does, bio_copy_kern uses struct bio_map_data to store
struct bio_vec.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reported-by: Nix <nix@esperi.org.uk>
Tested-by: Nix <nix@esperi.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-08-27 09:50:19 +02:00
Jens Axboe
48fd4f93a0 block: submit_bh() inadvertently discards barrier flag on a sync write
Reported by Milan Broz <mbroz@redhat.com>, commit 18ce3751 inadvertently
made submit_bh() discard the barrier bit for a WRITE_SYNC request. Fix
that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-08-27 09:50:19 +02:00
Steve French
96c2a1137b [CIFS] Reorder cifs config item for better clarity
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-26 18:32:28 +00:00
Steve French
e9775843ec [CIFS] Correct keys dependency for cifs kerberos support
Must also depend on CIFS ...

Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-26 18:22:50 +00:00
Steve French
3dae49abef Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-08-26 16:56:05 +00:00
Steve French
6ce5eecb9c [CIFS] check version in spnego upcall response
Currently, we don't check the version in the SPNEGO upcall response
even though one is provided. Jeff and Q have made the corresponding
change to the Samba client (cifs.upcall).

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-26 00:37:14 +00:00
Joel Becker
d6817cdbd1 ocfs2: Increment the reference count of an already-active stack.
The ocfs2_stack_driver_request() function failed to increment the
refcount of an already-active stack.  It only did the increment on the
first reference.  Whoops.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Marcos Matsunaga <marcos.matsunaga@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-25 07:29:47 -07:00
Al Viro
4cdfe84b51 [PATCH] deal with the first call of ->show() generating no output
seq_read() has a subtle bug - we want the first loop there to go
until at least one *non-empty* record had fit entirely into buffer.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:10 -04:00
Al Viro
59af1584bf [PATCH] fix ->llseek() for a bunch of directories
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:09 -04:00
Al Viro
8f3f655da7 [PATCH] fix regular readdir() and friends
Handling of -EOVERFLOW.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:08 -04:00
Christoph Hellwig
2690421743 [PATCH] ntfs: use d_add_ci
d_add_ci was lifted 1:1 from ntfs.  Change ntfs to use the common
version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:06 -04:00
Christoph Hellwig
e45b590b97 [PATCH] change d_add_ci argument ordering
As pointed out during review d_add_ci argument order should match d_add,
so switch the dentry and inode arguments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:05 -04:00
Al Viro
2d8a10cd17 [PATCH] fix efs_lookup()
it needs to use d_splice_alias(), not d_add()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:04 -04:00
Alexey Dobriyan
cc99609917 [PATCH] proc: inode number fixlet
Ouch, if number taken from IDA is too big, the intent was to signal an
error, not check for overflow and still do overflowing addition.

One still needs 2^28 proc entries to notice this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-25 01:18:03 -04:00
Adrian Bunk
7a8fc9b248 removed unused #include <linux/version.h>'s
This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23 12:14:12 -07:00
Louis Rilling
de6bf18e9c [PATCH] configfs: Consolidate locking around configfs_detach_prep() in configfs_rmdir()
It appears that configfs_rmdir() can protect configfs_detach_prep() retries with
less calls to {spin,mutex}_{lock,unlock}, and a cleaner code.

This patch does not change any behavior, except that it removes two useless
lock/unlock pairs having nothing inside to protect and providing a useless
barrier.

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:09:02 -07:00
Mark Fasheh
9780eb6cfa ocfs2: correctly set i_blocks after inline dir gets expanded
We were setting i_blocks based on allocation before the extent insert, which
is wrong as the value is a calculation based on ip_clusters which gets
updated as a result of the insert. This patch moves the line in question
to just after the call to ocfs2_insert_extent().

Without this fix, inline directories were temporarily having an i_blocks
value of zero immediately after expansion to extents.

Reported-and-tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:09:02 -07:00
Tao Ma
83cab5338f ocfs2: Jump to correct label in ocfs2_expand_inline_dir()
When we fail to insert extent in ocfs2_expand_inline_dir(), we should go to
out_commit, not out.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:09:02 -07:00
Mark Fasheh
a1af7d15a1 ocfs2: Fix sleep-with-spinlock recovery regression
This fixes a bug introduced with 539d826409:
    [PATCH 2/2] ocfs2: Fix race between mount and recovery

ocfs2_mark_dead_nodes() was reading journal inodes while holding the
spinlock protecting our in-memory recovery state. The fix is very simple -
the disk state is protected by a cluster lock that's already held, so we
just move the spinlock down past the read.

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 11:08:38 -07:00
Alexander Beregalov
a57a874b04 [PATCH] ocfs2/cluster/netdebug.c: fix warning
ocfs2/cluster/netdebug.c: fix warning

fs/ocfs2/cluster/netdebug.c:154: warning: format '%lu' expects
     type 'long unsigned int', but argument 17 has type 'suseconds_t'

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 10:56:57 -07:00
Adrian Bunk
18496e80f7 [PATCH] ocfs2/cluster/tcp.c: make some functions static
Commit 0f475b2abe (ocfs2/net: Silence build
warnings) made sense as far as it fixed compile warnings, but it was not
required that it made the functions global.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-22 10:56:40 -07:00
Linus Torvalds
ee26562772 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Update documentation to remind users to update mke2fs.conf
  ext4: Fix small file fragmentation
  ext4: Initialize writeback_index to 0 when allocating a new inode
  ext4: make sure ext4_has_free_blocks returns 0 for ENOSPC
  ext4: journal credit fix for the delayed allocation's writepages() function
  ext4: Rework the ext4_da_writepages() function
  ext4: journal credits reservation fixes for DIO, fallocate
  ext4: journal credits reservation fixes for extent file writepage
  ext4: journal credits calulation cleanup and fix for non-extent writepage
  ext4: Fix bug where we return ENOSPC even though we have plenty of inodes
  ext4: don't try to resize if there are no reserved gdt blocks left
  ext4: Use ext4_discard_reservations instead of mballoc-specific call
  ext4: Fix ext4_dx_readdir hash collision handling
  ext4: Fix delalloc release block reservation for truncate
  ext4: Fix potential truncate BUG due to i_prealloc_list being non-empty
  ext4: Handle unwritten extent properly with delayed allocation
2008-08-22 08:37:07 -07:00
Al Viro
82d63fc9e3 cramfs: fix named-pipe handling
After commit a97c9bf33f (fix cramfs
making duplicate entries in inode cache) in kernel 2.6.14, named-pipe
on cramfs does not work properly.

It seems the commit make all named-pipe on cramfs share their inode
(and named-pipe buffer).

Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
== 1 immediately.

Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>		[2.6.14 and later]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:32 -07:00
Ken Chen
2d70b68d42 fix setpriority(PRIO_PGRP) thread iterator breakage
When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
process, only the task leader of the process is affected, all other
sibling LWP threads didn't receive the setting.  The problem was that the
iterator used in sys_setpriority() only iteartes over one task for each
process, ignoring all other sibling thread.

Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process.  Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:32 -07:00
Pavel Emelyanov
ff9bc512f1 binfmt_misc: fix false -ENOEXEC when coupled with other binary handlers
In case the binfmt_misc binary handler is registered *before* the e.g.
script one (when for example being compiled as a module) the following
situation may occur:

1. user launches a script, whose interpreter is a misc binary;
2. the load_misc_binary sets the misc_bang and returns -ENOEVEC,
   since the binary is a script;
3. the load_script_binary loads one and calls for search_binary_hander
   to run the interpreter;
4. the load_misc_binary is called again, but refuses to load the
   binary due to misc_bang bit set.

The fix is to move the misc_bang setting lower - prior to the actual
call to the search_binary_handler.

Caused by the commit 3a2e7f47 (binfmt_misc.c: avoid potential kernel
stack overflow)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: <stable@kernel.org>		[2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:31 -07:00
Clement Calmels
1804dc6e14 /proc/self/maps doesn't display the real file offset
This addresses

	http://bugzilla.kernel.org/show_bug.cgi?id=11318

In function show_map (file: fs/proc/task_mmu.c), if vma->vm_pgoff > 2^20
than (vma->vm_pgoff << PAGE_SIZE) is greater than 2^32 (with PAGE_SIZE
equal to 4096 (i.e.  2^12).  The next seq_printf use an unsigned long for
the conversion of (vma->vm_pgoff << PAGE_SIZE), as a result the offset
value displayed in /proc/self/maps is truncated if the page offset is
greater than 2^20.

A test that shows this issue:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define PAGE_SIZE (getpagesize())

#if __i386__
#   define U64_STR "%llx"
#elif __x86_64
#   define U64_STR "%lx"
#else
#   error "Architecture Unsupported"
#endif

int main(int argc, char *argv[])
{
	int fd;
	char *addr;
	off64_t offset = 0x10000000;
	char *filename = "/dev/zero";

	fd = open(filename, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	offset *= 0x10;
	printf("offset = " U64_STR "\n", offset);

	addr = (char*)mmap64(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd,
			     offset);
	if ((void*)addr == MAP_FAILED) {
		perror("mmap64");
		return 1;
	}

	{
		FILE *fmaps;
		char *line = NULL;
		size_t len = 0;
		ssize_t read;
		size_t filename_len = strlen(filename);

		fmaps = fopen("/proc/self/maps", "r");
		if (!fmaps) {
			perror("fopen");
			return 1;
		}
		while ((read = getline(&line, &len, fmaps)) != -1) {
			if ((read > filename_len + 1)
			    && (strncmp(&line[read - filename_len - 1], filename, filename_len) == 0))
				printf("%s", line);
		}

		if (line)
			free(line);

		fclose(fmaps);
	}

	close(fd);
	return 0;
}

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Clement Calmels <cboulte@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:30 -07:00
Linus Torvalds
1bbe44f69d Merge branch 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Provide a FLAT_PLAT_INIT() definition.
  binfmt_flat: Stub in a FLAT_PLAT_INIT().
  video: export sh_mobile_lcdc panel size
  sh: select memchunk size using kernel cmdline
  sh: export sh7723 VEU as VEU2H
  input: migor_ts compile and detection fix
  sh: remove MSTPCR defines from Migo-R header file
  sh: Update sh7763rdp defconfig
  sh: Add support sh7760fb to sh7763rdp board
  sh: Add support sh_eth to sh7763rdp board
  sh: Disable 64kB hugetlbpage size when using 64kB PAGE_SIZE.
  sh: Don't export __{s,u}divsi3_i4i from SH-2 libgcc.
  fix SH7705_CACHE_32KB compilation
  sh: mach-x3proto: Fix up smc91x platform data.
2008-08-20 08:46:11 -07:00
Linus Torvalds
5f22ca9b13 vfat: fix 'sync' mount deadlock due to BKL->lock_super conversion
There was another FAT BKL conversion deadlock reported by Bart
Trojanowski due to the BKL being used as a recursive lock by FAT, which
was missed because it only triggers with 'sync' (or 'dirsync') mounts.

The recursion worked for the BKL, but after the conversion to lock_super
(which uses a mutex), it just deadlocks.

Thanks to Bart for debugging this and testing the fix.  The lock
debugging information from the original report:

  =============================================
  [ INFO: possible recursive locking detected ]
  2.6.27-rc3-bisect-00448-ga7f5aaf #16
  ---------------------------------------------
  mv/4020 is trying to acquire lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  but task is already holding lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  other info that might help us debug this:
  3 locks held by mv/4020:
   #0:  (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140
   #1:  (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110
   #2:  (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  stack backtrace:
  Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf #16
   [<c014e694>] validate_chain+0x984/0xea0
   [<c0108d70>] ? native_sched_clock+0x0/0xf0
   [<c014ee9c>] __lock_acquire+0x2ec/0x9b0
   [<c014f5cf>] lock_acquire+0x6f/0x90
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c044e5fd>] mutex_lock_nested+0xad/0x300
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] lock_super+0x1e/0x20
   [<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat]
   [<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80
   [<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat]
   [<f8b3a962>] fat_sync_inode+0x12/0x20 [fat]
   [<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat]
   [<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat]
   [<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat]
   [<c01b0968>] vfs_unlink+0x98/0x110
   [<c01b2400>] do_unlinkat+0x130/0x140
   [<c016a8f5>] ? audit_syscall_entry+0x105/0x150
   [<c01b253b>] sys_unlinkat+0x3b/0x40
   [<c01040d3>] sysenter_do_call+0x12/0x3f
   =======================

where the deadlock is due to the nesting of lock_super from vfat_unlink
to fat_write_inode:

 - do_unlinkat
   - vfs_unlink
     - vfat_unlink
       * lock_super
       - fat_remove_entries
         - fat_sync_inode
           - fat_write_inode
             * lock_super

and the fix is to simply remove the use of lock_super() in fat_write_inode.

The lock_super() there had been just an automatic conversion of the
kernel lock to the superblock lock, but no locking was actually needed
there, since the code in fat_write_inode already protected all relevant
accesses with a spinlock (sbi->inode_hash_lock to be exact).  The only
code inside the BKL (and thus the superblock lock) was accesses tp local
variables or calls to functions that have long been SMP-safe (i.e.
sb_bread, mark_buffe_dirty and brlese).

Bart reports:
 "Looks good.  I ran 10 parallel processes creating 1M files truncating
  them, writing to them again and then deleting them.  This patch fixes
  the issue I ran into.

  Signed-off-by: Bart Trojanowski <bart@jukie.net>"

Reported-and-tested-by: Bart Trojanowski <bart@jukie.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 08:31:19 -07:00
Steve French
3d2af3465e [CIFS] Kerberos support not considered experimental anymore
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-19 20:51:09 +00:00
Steve French
c16fefa563 [CIFS] distinguish between Kerberos and MSKerberos in upcall
Properly handle MSKRB5 by passing sec=mskrb5 to the upcall so that the
spengo blob can be generated appropriately. Also, make
decode_negTokenInit prefer whichever mechanism is first in the list.

Needed for some NetApp servers, and possibly some older
versions of Windows which treat the two KRB5 mechanisms differently.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-08-19 19:35:33 +00:00