Now that JFFS2 can be exported by NFS, we need to get this right.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Avoid calling the underlying ->readdir() again when we reached the end
already; keep going round the loop only if we stopped due to our own
buffer being full.
[AV: tidy the things up a bit, while we are there]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It's not the final state, but it allows moving ->readdir() instances
to passing filldir return value to caller of vfs_readdir().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Better pass parent and qstr to ext3_find_entry() explicitly than
use such kludges, especially since the stack footprint is nasty
enough and we have every chance to be deep in call chain.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that the readdir/lookup deadlock issues have been dealt with, we can
export JFFS2 file systems again.
(For now, you have to specify fsid manually; we should add a method to
the export_ops to handle that too.)
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we've moved the readdir hack to the nfsd code, we can
remove the local version from the XFS code.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Some file systems with their own internal locking have problems with the
way that nfsd calls the ->lookup() method from within a filldir function
called from their ->readdir() method. The recursion back into the file
system code can cause deadlock.
XFS has a fairly hackish solution to this which involves doing the
readdir() into a locally-allocated buffer, then going back through it
calling the filldir function afterwards. It's not ideal, but it works.
It's particularly suboptimal because XFS does this for local file
systems too, where it's completely unnecessary.
Copy this hack into the NFS code where it can be used only for NFS
export. In response to feedback, use it unconditionally rather than only
for the affected file systems.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The calling conventions of d_alloc_anon are rather unfortunate for all
users, and it's name is not very descriptive either.
Add d_obtain_alias as a new exported helper that drops the inode
reference in the failure case, too and allows to pass-through NULL
pointers and inodes to allow for tail-calls in the export operations.
Incidentally this helper already existed as a private function in
libfs.c as exportfs_d_alloc so kill that one and switch the callers
to d_obtain_alias.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add kerneldoc for generic_file_llseek and generic_file_llseek_unlocked,
use sane variable names and unclutter the code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use a single goto label for chrdev_put + return error cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reformat hpfs_notify_change to standard kernel style to make it readable
and rename it to hpfs_setattr as that's what the method is called.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
New flag: LOOKUP_EXCL. Set before doing the final step of pathname
resolution on the paths that have LOOKUP_CREATE and O_EXCL.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and don't pass bogus flags when we are just looking for parent.
Fold __path_lookup_intent_open() into path_lookup_open() while we
are at it; that's the only remaining caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit f06febc96b ("timers: fix itimer/
many thread hang") introduced a new task_cputime interface and
subsequently only converted binfmt_elf over to it. This results in the
build for binfmt_elf_fdpic blowing up given that p->signal->{u,s}time
have disappeared from underneath us.
Apply the same trivial fix from binfmt_elf to binfmt_elf_fdpic.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This merges branches irq/genirq, irq/sparseirq-v4, timers/hpet-percpu
and x86/uv.
The sparseirq branch is just preliminary groundwork: no sparse IRQs are
actually implemented by this tree anymore - just the new APIs are added
while keeping the old way intact as well (the new APIs map 1:1 to
irq_desc[]). The 'real' sparse IRQ support will then be a relatively
small patch ontop of this - with a v2.6.29 merge target.
* 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (178 commits)
genirq: improve include files
intr_remapping: fix typo
io_apic: make irq_mis_count available on 64-bit too
genirq: fix name space collisions of nr_irqs in arch/*
genirq: fix name space collision of nr_irqs in autoprobe.c
genirq: use iterators for irq_desc loops
proc: fixup irq iterator
genirq: add reverse iterator for irq_desc
x86: move ack_bad_irq() to irq.c
x86: unify show_interrupts() and proc helpers
x86: cleanup show_interrupts
genirq: cleanup the sparseirq modifications
genirq: remove artifacts from sparseirq removal
genirq: revert dynarray
genirq: remove irq_to_desc_alloc
genirq: remove sparse irq code
genirq: use inline function for irq_to_desc
genirq: consolidate nr_irqs and for_each_irq_desc()
x86: remove sparse irq from Kconfig
genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n
...
* 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
fix documentation of sysrq-q really
Fix documentation of sysrq-q
timer_list: add base address to clock base
timer_list: print cpu number of clockevents device
timer_list: print real timer address
NOHZ: restart tick device from irq_enter()
NOHZ: split tick_nohz_restart_sched_tick()
NOHZ: unify the nohz function calls in irq_enter()
timers: fix itimer/many thread hang, fix
timers: fix itimer/many thread hang, v3
ntp: improve adjtimex frequency rounding
timekeeping: fix rounding problem during clock update
ntp: let update_persistent_clock() sleep
hrtimer: reorder struct hrtimer to save 8 bytes on 64bit builds
posix-timers: lock_timer: make it readable
posix-timers: lock_timer: kill the bogus ->it_id check
posix-timers: kill ->it_sigev_signo and ->it_sigev_value
posix-timers: sys_timer_create: cleanup the error handling
posix-timers: move the initialization of timer->sigq from send to create path
posix-timers: sys_timer_create: simplify and s/tasklist/rcu/
...
Fix trivial conflicts due to sysrq-q description clahes in
Documentation/sysrq.txt and drivers/char/sysrq.c
Use fs/*/Kconfig more, which is good because everything related to one
filesystem is in one place and fs/Kconfig is quite fat.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (26 commits)
9p: add more conservative locking
9p: fix oops in protocol stat parsing error path.
9p: fix device file handling
9p: Improve debug support
9p: eliminate depricated conv functions
9p: rework client code to use new protocol support functions
9p: remove unnecessary tag field from p9_req_t structure
9p: remove 9p fcall debug prints
9p: add new protocol support code
9p: encapsulate version function
9p: move dirread to fs layer
9p: adjust 9p vfs write operation
9p: move readn meta-function from client to fs layer
9p: consolidate read/write functions
9p: drop broken unused error path from p9_conn_create()
9p: make rpc code common and rework flush code
9p: use the rcall structure passed in the request in trans_fd read_work
9p: apply common request code to trans_fd
9p: apply common tagpool handling to trans_fd
9p: move request management to client code
...
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: use correct fs type for v4 submounts and referrals
Make nfs_file_cred more robust.
NFS: Enable NFSv4 callback server to listen on AF_INET6 sockets
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits)
UBIFS: fix ubifs_compress commentary
UBIFS: amend printk
UBIFS: do not read unnecessary bytes when unpacking bits
UBIFS: check buffer length when scanning for LPT nodes
UBIFS: correct condition to eliminate unecessary assignment
UBIFS: add more debugging messages for LPT
UBIFS: fix bulk-read handling uptodate pages
UBIFS: improve garbage collection
UBIFS: allow for sync_fs when read-only
UBIFS: commit on sync_fs
UBIFS: correct comment for commit_on_unmount
UBIFS: update dbg_dump_inode
UBIFS: fix commentary
UBIFS: fix races in bit-fields
UBIFS: ensure data read beyond i_size is zeroed out correctly
UBIFS: correct key comparison
UBIFS: use bit-fields when possible
UBIFS: check data CRC when in error state
UBIFS: improve znode splitting rules
UBIFS: add no_chk_data_crc mount option
...
* git://git.infradead.org/mtd-2.6: (69 commits)
Revert "[MTD] m25p80.c code cleanup"
[MTD] [NAND] GPIO driver depends on ARM... for now.
[MTD] [NAND] sh_flctl: fix compile error
[MTD] [NOR] AT49BV6416 has swapped erase regions
[MTD] [NAND] GPIO NAND flash driver
[MTD] cmdlineparts documentation change - explain where mtd-id comes from
[MTD] cfi_cmdset_0002.c: Add Macronix CFI V1.0 TopBottom detection
[MTD] [NAND] Fix compilation warnings in drivers/mtd/nand/cs553x_nand.c
[JFFS2] Write buffer offset adjustment for NOR-ECC (Sibley) flash
[MTD] mtdoops: Fix a bug where block may not be erased
[MTD] mtdoops: Add a magic number to logged kernel oops
[MTD] mtdoops: Fix an off by one error
[JFFS2] Correct parameter names of jffs2_compress() in comments
[MTD] [NAND] sh_flctl: add support for Renesas SuperH FLCTL
[MTD] [NAND] Bug on atmel_nand HW ECC : OOB info not correctly written
[MTD] [MAPS] Remove unused variable after ROM API cleanup.
[MTD] m25p80.c extended jedec support (v2)
[MTD] remove unused mtd parameter in of_mtd_parse_partitions()
[MTD] [NAND] remove dead Kconfig associated with !CONFIG_PPC_MERGE
[MTD] [NAND] driver extension to support NAND on TQM85xx modules
...
The usage of elfcorehdr_addr has changed recently such that being set to
ELFCORE_ADDR_MAX is used by is_kdump_kernel() to indicate if the code is
executing in a kernel executed as a crash kernel.
However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will rest
elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent
calls to is_kdump_kernel() will return 0, even though they should return
1.
Ok, at this point in time there are no subsequent calls, but I think its
fair to say that there is ample scope for error or at the very least
confusion.
This patch add an extra state, ELFCORE_ADDR_ERR, which indicates that
elfcorehdr_addr was passed on the command line, and thus execution is
taking place in a crashdump kernel, but vmcore can't be used for some
reason. This is tested for using is_vmcore_usable() and set using
vmcore_unusable(). A subsequent patch makes use of this new code.
To summarise, the states that elfcorehdr_addr can now be in are as follows:
ELFCORE_ADDR_MAX: not a crashdump kernel
ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable
any other value: crash dump kernel and vmcore is usable
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o elfcorehdr_addr is used by not only the code under CONFIG_PROC_VMCORE
but also by the code which is not inside CONFIG_PROC_VMCORE. For
example, is_kdump_kernel() is used by powerpc code to determine if
kernel is booting after a panic then use previous kernel's TCE table.
So even if CONFIG_PROC_VMCORE is not set in second kernel, one should be
able to correctly determine that we are booting after a panic and setup
calgary iommu accordingly.
o So remove the assumption that elfcorehdr_addr is under
CONFIG_PROC_VMCORE.
o Move definition of elfcorehdr_addr to arch dependent crash files.
(Unfortunately crash dump does not have an arch independent file
otherwise that would have been the best place).
o kexec.c is not the right place as one can Have CRASH_DUMP enabled in
second kernel without KEXEC being enabled.
o I don't see sh setup code parsing the command line for
elfcorehdr_addr. I am wondering how does vmcore interface work on sh.
Anyway, I am atleast defining elfcoredhr_addr so that compilation is not
broken on sh.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a kconfig option to change the /proc/PID/coredump_filter default.
Fedora has been carrying a trivial patch to change the hard-wired value for
this default, since Fedora 8. The default default can't change safely
because there are old GDB versions out there (all before 6.7) that are
confused by the core dump files created by the MMF_DUMP_ELF_HEADERS setting.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the coredumping is multi-threaded, format_corename() appends .%pid to
the corename. This was needed before the proper multi-thread core dump
support, now all the threads in the mm go into a single unified core file.
Remove this special case, it is not even documented and we have "%p"
and core_uses_pid.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: La Monte Yarroll <piggy@laurelnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
seq_cpumask_list(), seq_nodemask_list() are very like seq_cpumask(),
seq_nodemask(), but they print human readable string.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"m->count + len < m->size" is true commonly, so bitmap_scnprintf()
is commonly called. this fix saves a call to bitmap_scnprintf_len().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A corrupted extent for the extent file itself may try to get an impossible
extent, causing a deadlock if I see it correctly.
Check the inode number after the first_blocks checks and fail if it's the
extent file, as according to the spec the extent file should have no
extent for itself.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A very large directory with many read failures (either due to storage
problems, or due to invalid size & blocks from corruption) will generate a
printk storm as the filesystem continues to try to read all the blocks.
This flood of messages can tie up the box until it is complete - which may
be a very long time, especially for very large corrupted values.
This is fixed by only reporting the corruption once each time we try to
read the directory.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For blocksize < pagesize we need to remove blocks that got allocated in
block_write_begin() if we fail with ENOSPC for later blocks.
block_write_begin() internally does this if it allocated page locally.
This makes sure we don't have blocks outside inode.i_size during ENOSPC.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a bug where readdir() would return a directory entry twice
if there was a hash collision in an hash tree indexed directory.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eugene Dashevsky <eugene@ibrix.com>
Signed-off-by: Mike Snitzer <msnitzer@ibrix.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ordered mode, if a file data buffer being dirtied exists in the
committing transaction, we write the buffer to the disk, move it from the
committing transaction to the running transaction, then dirty it. But we
don't have to remove the buffer from the committing transaction when the
buffer couldn't be written out, otherwise it would miss the error and the
committing transaction would not abort.
This patch adds an error check before removing the buffer from the
committing transaction.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the journal doesn't abort when it gets an IO error in file data blocks,
the file data corruption will spread silently. Because most of
applications and commands do buffered writes without fsync(), they don't
notice the IO error. It's scary for mission critical systems. On the
other hand, if the journal aborts whenever it gets an IO error in file
data blocks, the system will easily become inoperable. So this patch
introduces a filesystem option to determine whether it aborts the journal
or just call printk() when it gets an IO error in file data.
If you mount a ext3 fs with data_err=abort option, it aborts on file data
write error. If you mount it with data_err=ignore, it doesn't abort, just
call printk(). data_err=ignore is the default.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We could run into ENOSPC error on ext3, even when there is free blocks on
the filesystem.
The problem is triggered in the case the goal block group has 0 free
blocks , and the rest block groups are skipped due to the check of
"free_blocks < windowsz/2". Current code could fall back to non
reservation allocation to prevent early ENOSPC after examing all the block
groups with reservation on , but this code was bypassed if the reservation
window is turned off already, which is true in this case.
This patch fixed two issues:
1) We don't need to turn off block reservation if the goal block group has
0 free blocks left and continue search for the rest of block groups.
Current code the intention is to turn off the block reservation if the
goal allocation group has a few (some) free blocks left (not enough for
make the desired reservation window),to try to allocation in the goal
block group, to get better locality. But if the goal blocks have 0 free
blocks, it should leave the block reservation on, and continues search for
the next block groups,rather than turn off block reservation completely.
2) we don't need to check the window size if the block reservation is off.
The problem was originally found and fixed in ext4.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When trying to resize a ext3 fs and you run out of reserved gdt blocks,
you get an error that doesn't actually tell you what went wrong, it just
says that the gdb it picked is not correct, which is the case since you
don't have any reserved gdt blocks left. This patch adds a check to make
sure you have reserved gdt blocks to use, and if not prints out a more
relevant error.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Dilger <adilger@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>