Commit graph

63 commits

Author SHA1 Message Date
Frederic Weisbecker
48f6ba5e69 kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency
While creating the reiserfs workqueue during the journal
initialization, we are holding the reiserfs lock, but
create_workqueue() also holds the cpu_add_remove_lock, creating
then the following dependency:

- reiserfs lock -> cpu_add_remove_lock

But we also have the following existing dependencies:

- mm->mmap_sem -> reiserfs lock
- cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

The merged dependency chain then becomes:

- mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock ->
	cpu_hotplug.lock -> slub_lock -> sysfs_mutex

But when we fill a dir entry in sysfs_readir(), we are holding the
sysfs_mutex and we also might fault while copying the directory entry
to the user, leading to the following dependency:

- sysfs_mutex -> mm->mmap_sem

The end result is then a lock inversion between sysfs_mutex and
mm->mmap_sem, as reported in the following lockdep warning:

[ INFO: possible circular locking dependency detected ]
2.6.31-07095-g25a3912 #4
-------------------------------------------------------
udevadm/790 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c1098942>] might_fault+0x72/0xc0

but task is already holding lock:
 (sysfs_mutex){+.+.+.}, at: [<c110813c>] sysfs_readdir+0x7c/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (sysfs_mutex){+.+.+.}:
      [...]

-> #4 (slub_lock){+++++.}:
      [...]

-> #3 (cpu_hotplug.lock){+.+.+.}:
      [...]

-> #2 (cpu_add_remove_lock){+.+.+.}:
      [...]

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
      [...]

-> #0 (&mm->mmap_sem){++++++}:
      [...]

This can be fixed by relaxing the reiserfs lock while creating the
workqueue.
This is fine to relax the lock here, we just keep it around to pass
through reiserfs lock checks and for paranoid reasons.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-10-05 16:31:37 +02:00
Frederic Weisbecker
193be0ee17 kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency
Alexander Beregalov reported the following warning:

	=======================================================
	[ INFO: possible circular locking dependency detected ]
	2.6.31-03149-gdcc030a #1
	-------------------------------------------------------
	udevadm/716 is trying to acquire lock:
	 (&mm->mmap_sem){++++++}, at: [<c107249a>] might_fault+0x4a/0xa0

	but task is already holding lock:
	 (sysfs_mutex){+.+.+.}, at: [<c10cb9aa>] sysfs_readdir+0x5a/0x200

	which lock already depends on the new lock.

	the existing dependency chain (in reverse order) is:

	-> #3 (sysfs_mutex){+.+.+.}:
	       [...]

	-> #2 (&bdev->bd_mutex){+.+.+.}:
	       [...]

	-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
	       [...]

	-> #0 (&mm->mmap_sem){++++++}:
	       [...]

On reiserfs mount path, we take the reiserfs lock and while
initializing the journal, we open the device, taking the
bdev->bd_mutex. Then rescan_partition() may signal the change
to sysfs.

We have then the following dependency:

	reiserfs_lock -> bd_mutex -> sysfs_mutex

Later, while entering reiserfs_readpage() after a pagefault in an
mmaped reiserfs file, we are holding the mm->mmap_sem, and we are going
to take the reiserfs lock too.
We have then the following dependency:

	mm->mmap_sem -> reiserfs_lock

which, expanded with the previous dependency gives us:

	mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex

Now while entering the sysfs readdir path, we are holding the
sysfs_mutex. And when we copy a directory entry to the user buffer, we
might fault and then take the mm->mmap_sem lock. Which leads to the
circular locking dependency reported.

We can fix that by relaxing the reiserfs lock during the call to
journal_init_dev(), which is the place where we open the mounted
device.

This is fine to relax the lock here because we are in the begining of
the reiserfs mount path and there is nothing to protect at this time,
the journal is not intialized.
We just keep this lock around for paranoid reasons.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-09-17 05:31:37 +02:00
Frederic Weisbecker
c72e05756b kill-the-bkl/reiserfs: acquire the inode mutex safely
While searching a pathname, an inode mutex can be acquired
in do_lookup() which calls reiserfs_lookup() which in turn
acquires the write lock.

On the other side reiserfs_fill_super() can acquire the write_lock
and then call reiserfs_lookup_privroot() which can acquire an
inode mutex (the root of the mount point).

So we theoretically risk an AB - BA lock inversion that could lead
to a deadlock.

As for other lock dependencies found since the bkl to mutex
conversion, the fix is to use reiserfs_mutex_lock_safe() which
drops the lock dependency to the write lock.

[ Impact: fix a possible deadlock with reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:24 +02:00
Frederic Weisbecker
c63e3c0b24 kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe
reiserfs_mutex_lock_safe() is a hack to avoid any dependency between
an internal reiserfs mutex and the write lock, it has been proposed
to follow the old bkl logic.

The code does the following:

while (!mutex_trylock(m)) {
	reiserfs_write_unlock(s);
	schedule();
	reiserfs_write_lock(s);
}

It then imitate the implicit behaviour of the lock when it was
a Bkl and hadn't such dependency:

mutex_lock(m) {
	if (fastpath)
		let's go
	else {
		wait_for_mutex() {
			schedule() {
				unlock_kernel()
				reacquire_lock_kernel()
			}
		}
	}
}

The problem is that by using such explicit schedule(), we don't
benefit of the adaptive mutex spinning on owner.

The logic in use now is:

reiserfs_write_unlock(s);
mutex_lock(m); // -> possible adaptive spinning
reiserfs_write_lock(s);

[ Impact: restore the use of adaptive spinning mutexes in reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:21 +02:00
Frederic Weisbecker
6e3647acb4 kill-the-BKL/reiserfs: release the write lock on flush_commit_list()
flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
ll_rw_block() might sleep, and the bkl was released at this point. Then
we can also relax the write lock at this point.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:13 +02:00
Frederic Weisbecker
e6950a4da3 kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end()
When do_journal_end() copies data to the journal blocks buffers in memory,
it reschedules if needed between each block copied and dirtyfied.

We can also release the write lock at this rescheduling stage,
like did the bkl implicitly.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:08 +02:00
Frederic Weisbecker
a412f9efdd reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock
Impact: fix a deadlock

The j_flush_mutex is acquired safely in journal.c:
if we can't take it, we free the reiserfs per superblock lock
and wait a bit.

But we have a remaining place in kupdate_transactions() where
j_flush_mutex is still acquired traditionnaly. Thus the following
scenario (warned by lockdep) can happen:

A						B

mutex_lock(&write_lock)			mutex_lock(&write_lock)
	mutex_lock(&j_flush_mutex)	mutex_lock(&j_flush_mutex) //block
	mutex_unlock(&write_lock)
	sleep...
	mutex_lock(&write_lock) //deadlock

Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:18:01 +02:00
Frederic Weisbecker
8ebc423238 reiserfs: kill-the-BKL
This patch is an attempt to remove the Bkl based locking scheme from
reiserfs and is intended.

It is a bit inspired from an old attempt by Peter Zijlstra:

   http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html

The bkl is heavily used in this filesystem to prevent from
concurrent write accesses on the filesystem.

Reiserfs makes a deep use of the specific properties of the Bkl:

- It can be acqquired recursively by a same task
- It is released on the schedule() calls and reacquired when schedule() returns

The two properties above are a roadmap for the reiserfs write locking so it's
very hard to simply replace it with a common mutex.

- We need a recursive-able locking unless we want to restructure several blocks
  of the code.
- We need to identify the sites where the bkl was implictly relaxed
  (schedule, wait, sync, etc...) so that we can in turn release and
  reacquire our new lock explicitly.
  Such implicit releases of the lock are often required to let other
  resources producer/consumer do their job or we can suffer unexpected
  starvations or deadlocks.

So the new lock that replaces the bkl here is a per superblock mutex with a
specific property: it can be acquired recursively by a same task, like the
bkl.

For such purpose, we integrate a lock owner and a lock depth field on the
superblock information structure.

The first axis on this patch is to turn reiserfs_write_(un)lock() function
into a wrapper to manage this mutex. Also some explicit calls to
lock_kernel() have been converted to reiserfs_write_lock() helpers.

The second axis is to find the important blocking sites (schedule...(),
wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
release of the write lock on these locations before blocking. Then we can
safely wait for those who can give us resources or those who need some.
Typically this is a fight between the current writer, the reiserfs workqueue
(aka the async commiter) and the pdflush threads.

The third axis is a consequence of the second. The write lock is usually
on top of a lock dependency chain which can include the journal lock, the
flush lock or the commit lock. So it's dangerous to release and trying to
reacquire the write lock while we still hold other locks.

This is fine with the bkl:

      T1                       T2

lock_kernel()
    mutex_lock(A)
    unlock_kernel()
    // do something
                            lock_kernel()
                                mutex_lock(A) -> already locked by T1
                                schedule() (and then unlock_kernel())
    lock_kernel()
    mutex_unlock(A)
    ....

This is not fine with a mutex:

      T1                       T2

mutex_lock(write)
    mutex_lock(A)
    mutex_unlock(write)
    // do something
                           mutex_lock(write)
                              mutex_lock(A) -> already locked by T1
                              schedule()

    mutex_lock(write) -> already locked by T2
    deadlock

The solution in this patch is to provide a helper which releases the write
lock and sleep a bit if we can't lock a mutex that depend on it. It's another
simulation of the bkl behaviour.

The last axis is to locate the fs callbacks that are called with the bkl held,
according to Documentation/filesystem/Locking.

Those are:

- reiserfs_remount
- reiserfs_fill_super
- reiserfs_put_super

Reiserfs didn't need to explicitly lock because of the context of these callbacks.
But now we must take care of that with the new locking.

After this patch, reiserfs suffers from a slight performance regression (for now).
On UP, a high volume write with dd reports an average of 27 MB/s instead
of 30 MB/s without the patch applied.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:17:59 +02:00
Jens Axboe
8aa7e847d8 Fix congestion_wait() sync/async vs read/write confusion
Commit 1faa16d228 accidentally broke
the bdi congestion wait queue logic, causing us to wait on congestion
for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-10 20:31:53 +02:00
Jeff Mahoney
a9dd364358 reiserfs: rename p_s_sb to sb
This patch is a simple s/p_s_sb/sb/g to the reiserfs code.  This is the
first in a series of patches to rip out some of the awful variable
naming in reiserfs.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:39 -07:00
Jeff Mahoney
0222e6571c reiserfs: strip trailing whitespace
This patch strips trailing whitespace from the reiserfs code.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:39 -07:00
Jeff Mahoney
32e8b10629 reiserfs: rearrange journal abort
This patch kills off reiserfs_journal_abort as it is never called, and
combines __reiserfs_journal_abort_{soft,hard} into one function called
reiserfs_abort_journal, which performs the same work. It is silent
as opposed to the old version, since the message was always issued
after a regular 'abort' message.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:36 -07:00
Jeff Mahoney
c3a9c2109f reiserfs: rework reiserfs_panic
ReiserFS panics can be somewhat inconsistent.
In some cases:
 * a unique identifier may be associated with it
 * the function name may be included
 * the device may be printed separately

This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:36 -07:00
Jeff Mahoney
45b03d5e8e reiserfs: rework reiserfs_warning
ReiserFS warnings can be somewhat inconsistent.
In some cases:
 * a unique identifier may be associated with it
 * the function name may be included
 * the device may be printed separately

This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:36 -07:00
Jeff Mahoney
600ed41675 reiserfs: audit transaction ids to always be unsigned ints
This patch fixes up the reiserfs code such that transaction ids are
always unsigned ints.  In places they can currently be signed ints or
unsigned longs.

The former just causes an annoying clm-2200 warning and may join a
transaction when it should wait.

The latter is just for correctness since the disk format uses a 32-bit
transaction id.  There aren't any runtime problems that result from it
not wrapping at the correct location since the value is truncated
correctly even on big endian systems.  The 0 value might make it to
disk, but the mount-time checks will bump it to 10 itself.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:35 -07:00
Al Viro
e5eb8caa83 [PATCH] remember mode of reiserfs journal
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:49:04 -04:00
Al Viro
30c40d2c01 [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
replace open_bdev_excl/close_bdev_excl with variants taking fmode_t.
superblock gets the value used to mount it stored in sb->s_mode

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:49:00 -04:00
Al Viro
9a1c354276 [PATCH] pass fmode_t to blkdev_put()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:48:58 -04:00
Al Viro
aeb5d72706 [PATCH] introduce fmode_t, do annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:47:06 -04:00
Nick Piggin
ca5de404ff fs: rename buffer trylock
Like the page lock change, this also requires name change, so convert the
raw test_and_set bitop to a trylock.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-04 21:56:09 -07:00
Nick Piggin
529ae9aaa0 mm: rename page trylock
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

This also facilitates lockdeping of page lock.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-04 21:31:34 -07:00
Jeff Mahoney
90415deac7 reiserfs: convert j_commit_lock to mutex
j_commit_lock is a semaphore but uses it as if it were a mutex.  This patch
converts it to a mutex.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:33 -07:00
Jeff Mahoney
afe7025907 reiserfs: convert j_flush_sem to mutex
j_flush_sem is a semaphore but uses it as if it were a mutex.  This patch
converts it to a mutex.

[akpm@linux-foundation.org: fix mutex_trylock retval treatment]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:33 -07:00
Jeff Mahoney
f68215c464 reiserfs: convert j_lock to mutex
j_lock is a semaphore but uses it as if it were a mutex.  This patch converts
it to a mutex.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:33 -07:00
Christoph Hellwig
86098fa011 reiserfs: use open_bdev_excl
Use the proper helper to open a blockdevice by name for filesystem use,
this makes sure it's properly claimed (also added for open-by-number) and
gets rid of the struct file abuse.

Tested by mounting a reiserfs filesystem with external journal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Harvey Harrison
fbe5498b3d reiserfs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Harvey Harrison
e13601bc6a reiserfs: fix sparse warning in journal.c
fs/reiserfs/journal.c:4319:2: warning: returning void-valued expression

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Matthew Wilcox
6188e10d38 Convert asm/semaphore.h users to linux/semaphore.h
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:22:54 -04:00
Jeff Mahoney
cb680c1be6 reiserfs: ignore on disk s_bmap_nr value
Implement support for file systems larger than 8 TiB.

The reiserfs superblock contains a 16 bit value for counting the number of
bitmap blocks.  The rest of the disk format supports file systems up to 2^32
blocks, but the bitmap block limitation artificially limits this to 8 TiB with
a 4KiB block size.

Rather than trust the superblock's 16-bit bitmap block count, we calculate it
dynamically based on the number of blocks in the file system.  When an
incorrect value is observed in the superblock, it is zeroed out, ensuring that
older kernels will not be able to mount the file system.

Userspace support has already been implemented and shipped in reiserfsprogs
3.6.20.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:35 -07:00
Jeff Mahoney
3ee1667042 reiserfs: fix usage of signed ints for block numbers
Do a quick signedness check for block numbers.  There are a number of places
where signed integers are used for block numbers, which limits the usable file
system size to 8 TiB.  The disk format, excepting a problem which will be
fixed in the following patch, supports file systems up to 16 TiB in size.
This patch cleans up those sites so that we can enable the full usable size.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:35 -07:00
Edward Shishkin
cf3d0b8182 reiserfs: do not repair wrong journal params
When mounting a file system with wrong journal params do not try to repair
them, suggest fsck instead.

Signed-off-by: Edward Shishkin <edward@namesys.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:01 -07:00
Chris Mason
398c95bdf2 try to reap reiserfs pages left around by invalidatepage
reiserfs_invalidatepage will refuse to free pages if they have been logged
in data=journal mode, or were pinned down by a data=ordered operation.  For
data=journal, this is fairly easy to trigger just with fsx-linux, and it
results in a large number of pages hanging around on the LRUs with
page->mapping == NULL.

Calling try_to_free_buffers when reiserfs decides it is done with the page
allows it to be freed earlier, and with much less VM thrashing.  Lock
ordering rules mean that reiserfs can't call lock_page when it is releasing
the buffers, so TestSetPageLocked is used instead.  Contention on these
pages should be rare, so it should be sufficient most of the time.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:57 -07:00
Adrian Bunk
deba0f49b9 fs/reiserfs/: cleanups
- remove the following no longer used functions:
  - bitmap.c: reiserfs_claim_blocks_to_be_allocated()
  - bitmap.c: reiserfs_release_claimed_blocks()
  - bitmap.c: reiserfs_can_fit_pages()

- make the following functions static:
  - inode.c: restart_transaction()
  - journal.c: reiserfs_async_progress_wait()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:46 -07:00
Robert P. J. Day
beb7dd86a1 Fix misspellings collected by members of KJ list.
Fix the misspellings of "propogate", "writting" and (oh, the shame
:-) "kenrel" in the source tree.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:14:03 +02:00
Milind Arun Choudhary
5ab2f7e0fd reiserfs: use __set_current_state()
use __set_current_state(TASK_*) instead of current->state = TASK_*, in
fs/reiserfs

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Matt LaPlante
0779bf2d2e Fix misc .c/.h comment typos
Fix various .c/.h typos in comments (no code changes).

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-30 05:24:39 +01:00
David Howells
c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Andrew Morton
3fcfab16c5 [PATCH] separate bdi congestion functions from queue congestion functions
Separate out the concept of "queue congestion" from "backing-dev congestion".
Congestion is a backing-dev concept, not a queue concept.

The blk_* congestion functions are retained, as wrappers around the core
backing-dev congestion functions.

This proper layering is needed so that NFS can cleanly use the congestion
functions, and so that CONFIG_BLOCK=n actually links.

Cc: "Thomas Maier" <balagi@justmail.de>
Cc: "Jens Axboe" <jens.axboe@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: David Howells <dhowells@redhat.com>
Cc: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:35 -07:00
Eric Sesterhenn
14a61442c2 BUG_ON conversion for fs/reiserfs
This patch converts several if () BUG(); construct to BUG_ON();
which occupies less space, uses unlikely and is safer when
BUG() is disabled. S_ISREG() has no side effects, so the
conversion is safe.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:36:38 +02:00
Chris Mason
a317202714 [PATCH] Fix reiserfs latencies caused by data=ordered
ReiserFS does periodic cleanup of old transactions in order to limit the
length of time a journal replay may take after a crash.  Sometimes, writing
metadata from an old (already committed) transaction may require committing
a newer transaction, which also requires writing all data=ordered buffers.
This can cause very long stalls on journal_begin.

This patch makes sure new transactions will not need to be committed before
trying a periodic reclaim of an old transaction.  It is low risk because if
a bad decision is made, it just means a slightly longer journal replay
after a crash.

Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:11 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Akinobu Mita
f116629d03 [PATCH] fs: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under fs/.

Cc: Ian Kent <raven@themaw.net>
Acked-by: Joel Becker <joel.becker@oracle.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Hans Reiser <reiserfs-dev@namesys.com>
Cc: Urban Widmark <urban@teststation.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Alexander Zarochentsev
a44c94a7b8 [PATCH] reiserfs: handle trans_id overflow
Reiserfs does not handle transaction ID overflow correctly.  Transaction ID
== 0 causes reiserfs to crash.  The patch fixes all places where the
transaction ID is incremented.

Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Hans Reiser <reiser@namesys.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Vladimir V. Saveliev
c499ec24c3 [PATCH] reiserfs: do not check if unsigned < 0
This patch fixes bugs in reiserfs where unsigned integers were checked
whether they are less then 0.

Signed-off-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-02 08:33:08 -08:00
Chris Mason
6ae1ea447d [PATCH] reiserfs: reiserfs fix journal accounting in journal_transaction_should_end
reiserfs: journal_transaction_should_end should increase the count of
blocks allocated so the transaction subsystem can keep new writers from
creating a transaction that is too large.

Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:26 -08:00
Chris Mason
3d4492f81d [PATCH] reiserfs: reiserfs write_ordered_buffers should not oops on dirty non-uptodate bh
write_ordered_buffers should handle dirty non-uptodate buffers without a
BUG()

Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:26 -08:00
Chris Mason
e0e851cf30 [PATCH] reiserfs: reiserfs hang and performance fix for data=journal mode
In data=journal mode, reiserfs writepage needs to make sure not to trigger
transactions while being run under PF_MEMALLOC.  This patch makes sure to
redirty the page instead of forcing a transaction start in this case.

Also, calling filemap_fdata* in order to trigger io on the block device can
cause lock inversions on the page lock.  Instead, do simple batching from
flush_commit_list.

Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:26 -08:00
Chris Mason
d62b1b87a7 [PATCH] resierfs: fix reiserfs_invalidatepage race against data=ordered
After a transaction has closed but before it has finished commit, there is
a window where data=ordered mode requires invalidatepage to pin pages
instead of freeing them.  This patch fixes a race between the
invalidatepage checks and data=ordered writeback, and it also adds a check
to the reiserfs write_ordered_buffers routines to write any anonymous
buffers that were dirtied after its first writeback loop.

That bug works like this:

proc1: transaction closes and a new one starts
proc1: write_ordered_buffers starts processing data=ordered list
proc1: buffer A is cleaned and written
proc2: buffer A is dirtied by another process
proc2: File is truncated to zero, page A goes through invalidatepage
proc2: reiserfs_invalidatepage sees dirty buffer A with reiserfs
       journal head, pins it
proc1: write_ordered_buffers frees the journal head on buffer A

At this point, buffer A stays dirty forever

Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:26 -08:00
Pekka Enberg
8c777cc4be [PATCH] reiserfs: use __GFP_NOFAIL instead of yield and retry loop for allocation
This patch replaces yield and retry loop with __GFP_NOFAIL in
alloc_journal_list().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:25 -08:00