The NFSv4.1 spec-29 (18.36.3) says that the server MUST use an ONC RPC
(program) version number equal to 4 in callbacks sent to the client.
For now we allow both versions 1 and 4.
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of having the sync and async handlers set the NFS4CLNT_SESSION_SETUP bit,
have them set NFS4CLNT_CHECK_LEASE, and let the state manager decide whether to reset the session.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Do not wake up the next slot_tbl_waitq task in nfs4_free_slot because we
may be draining the slot. Either signal the state manager that the session
is drained (the state manager wakes up tasks) OR wake up the next task.
In nfs41_sequence_done, the slot dereference is only needed in the sequence
operation success case.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the session is reset during state recovery, the state manager thread can
sleep on the slot_tbl_waitq causing a deadlock.
Add a completion framework to the session. Have the state manager thread set
a new session state (NFS4CLNT_SESSION_DRAINING) and wait for the session slot
table to drain.
Signal the state manager thread in nfs41_sequence_free_slot when the
NFS4CLNT_SESSION_DRAINING bit is set and the session is drained.
Reported-by: Trond Myklebust <trond@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs4_recover_session can put rpciod to sleep. Just use nfs4_schedule_recovery.
Reported-by: Trond Myklebust <trond.myklebust@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Do not fall through and set the NFS4CLNT_SESSION_RESET bit on NFS4ERR_EXPIRED.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Do not fall through and call nfs4_delay on session error handling.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs4_read_done returns zero on unhandled errors. nfs_readpage_result will
return on a negative tk_status without freeing the slot.
Call nfs4_sequence_free_slot on unhandled errors in nfs4_read_done.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs41_sequence_free_slot can be called multiple times on SEQUENCE operation
errors.
No reason to inline nfs4_restart_rpc
Reported-by: Trond Myklebust <trond.myklebust@netapp.com>
nfs_writeback_done and nfs_readpage_retry call nfs4_restart_rpc outside the
error handler, and the slot is not freed prior to restarting in the rpc_prepare
state during session reset.
Fix this by moving the call to nfs41_sequence_free_slot from the error
path of nfs41_sequence_done into nfs4_restart_rpc, and by removing the test
for NFS4CLNT_SESSION_SETUP.
Always free the slot and go to the rpc_prepare state on async errors.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make this clear by calling rpc_restart_call.
Prepare for nfs4_restart_rpc() to free slots.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The bit is no longer used for session setup, only for session reset.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reported-by: Trond Myklebust <trond.myklebust@netapp.com>
Resetting the clientid from the state manager could leave the clientid
unconfirmed, because create session was not being called.
Move the create session call from the state manager's NFS4CLNT_SESSION_SETUP
(initialize session) case into the establish_clid call in the
NFS4CLNT_LEASE_EXPIRED case.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS4ERR_FILE_OPEN is returned by the server when an operation cannot be
performed because the file is currently open and local (to the server)
semantics prohibit the operation while the file is open.
A typical case is a RENAME operation on an MS-Windows platform, which
prevents rename while the file is open.
While it is possible that such a condition is transitory, it is also
very possible that the file will be held open for an extended period
of time thus preventing the operation.
The current behaviour of Linux/NFS is to retry the operation
indefinitely. This is not appropriate - we do not expect a rename to
take an arbitrary amount of time to complete.
Rather, an error should be returned. The most obvious error code
would be EBUSY, which is a legal return value at least for 'rename' and
'unlink', and accurately captures the reason for the error.
This patch allows a few retries until about 2 seconds have elapsed,
then returns EBUSY.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The d_instantiate(new_dentry, NULL) is superfluous, the dentry is
already negative. Rehashing this dummy dentry isn't needed either,
d_move() works fine on an unhashed target.
The re-checking for busy after a failed nfs_sillyrename() is bogus
too: new_dentry->d_count < 2 would be a bug here.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move unhashing the target to after the check for existence and being a
non-directory.
If renaming a directory then the VFS already unhashes the target if it
is not busy. If it's busy then acquiring more references during the
rename makes no difference.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Comments are wrong or out of date. In particular, d_drop() doesn't
free the inode; it just unhashes the dentry. And if the target is a
directory, then it is not checked for being busy.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
VFS already checks if both source and target are directories.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When the "rsize=" or "wsize=" mount options are not specified,
text-based mounts have slightly different behavior than legacy binary
mounts. Text-based mounts use the smaller of the server's maximum
and the client's maximum, but binary mounts use the smaller of the
server's _preferred_ size and the client's maximum.
This difference is actually pretty subtle. Most servers advertise
the same value as their maximum and their preferred transfer size, so
the end result is the same in most cases.
The reason for this difference is that for text-based mounts, if
r/wsize are not specified, they are set to the largest value supported
by the client. For legacy mounts, the values are set to zero if these
options are not specified.
nfs_server_set_fsinfo() can negotiate the transfer size defaults
correctly in any case. There's no need to specify any particular
value as default in the text-based option parsing logic.
Note that nfs4 doesn't use nfs_server_set_fsinfo(), but the mount.nfs4
command does set rsize and wsize to 0 if the user didn't specify these
options. So, make the same change for text-based NFSv4 mounts.
Thanks to James Pearson <james-p@moving-picture.com> for reporting and
diagnosing the problem.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Recent changes to snprintf() introduced the %pI6c formatter, which can
display an IPv6 address with standard shorthanding. Use this new
formatter when displaying IPv6 server addresses in /proc/mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Solaris uses netids as values for the proto= option, so that when
someone specifies "tcp6" they get traffic over TCP + IPv6. Until
recently, this has never really been an issue for Linux since it didn't
support NFS over IPv6. The netid and the protocol name were generally
always the same (modulo any strange configuration in /etc/netconfig).
The Solaris manpage documents their proto= option as:
proto= _netid_ | rdma
This patch is intended to bring Linux closer to how the Solaris proto=
option works, by declaring a static netid mapping in the kernel and
converting the proto= and mountproto= options to follow it and display
the proper values in /proc/mounts.
Much of this functionality will need to be provided by a userspace
mount.nfs patch. Chuck Lever has a patch to change mount.nfs in
the same way. In principle, we could do *all* of this in userspace but
that would mean that the options in /proc/mounts may not match the
options used by userspace.
The alternative to the static mapping here is to add a mechanism to
upcall to userspace for netids. I'm not opposed to that option, but
it'll probably mean more overhead (and quite a bit more code). Rather
than shoot for that at first, I figured it was probably better to
start simply.
Comments welcome.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs4_state_manager should not be looking at the error values when
deciding whether or not to loop round in order to handle a higher priority
state recovery task. It should rather be looking at the clp->cl_state.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If our lease expires, and the server reboots while we're recovering, we
need to be able to wait until the grace period is over.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs4_recovery_handle_error() will correctly handle errors such as
NFS4ERR_CB_PATH_DOWN; however, because they are still passed back to the
main loop in nfs4_state_manager(), they can cause the latter to exit
prematurely.
Fix this by letting nfs4_recovery_handle_error() change the error value in
cases where there is no action required by the caller.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In practice, we need to ensure that we call nfs4_state_end_reclaim_reboot
in 2 cases:
- If we lose the lease while we were reclaiming state
OR
- After we're done with reboot recovery
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfsv4 state manager could potentially deadlock inside
__nfs_inode_return_delegation() if the server reboots, so that the calls to
nfs_msync_inode() end up waiting on state recovery to complete.
Also ensure that, if a server reboot or network partition forces us to
stop returning delegations, NFS4CLNT_DELEGRETURN is set so that
the state manager can resume any outstanding delegation returns after it
has dealt with the state recovery situation.
Finally, ensure that the state manager doesn't wait for the DELEGRETURN
call to complete. It doesn't need to, and that too can cause a deadlock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: [PATCH] nfs: fix acl decoding
Commit 28f566942c "NFS: use dynamically
computed compound_hdr.replen for xdr_inline_pages offset" accidentally
changed the amount of space to allow for the acl reply, resulting in an
IO error on attempts to get an acl.
Reported-by: Paul Rudin <paul@rudin.co.uk>
Cc: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When IMA is active, using dentry_open without updating the
IMA counters will result in free/open imbalance errors when
fput is eventually called.
Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While building 2.6.32-rc8-git2 for Fedora I noticed the following thinko
in commit 201a15428b ("FS-Cache: Handle
pages pending storage that get evicted under OOM conditions"):
fs/9p/cache.c: In function '__v9fs_fscache_release_page':
fs/9p/cache.c:346: error: 'vnode' undeclared (first use in this function)
fs/9p/cache.c:346: error: (Each undeclared identifier is reported only once
fs/9p/cache.c:346: error: for each function it appears in.)
make[2]: *** [fs/9p/cache.o] Error 1
Fix the 9P filesystem to correctly construct the argument to
fscache_maybe_release_page().
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com> [from identical patch]
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> [from identical patch]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Fix sparse warning
[CIFS] Duplicate data on appending to some Samba servers
[CIFS] fix oops in cifs_lookup during net boot
In 2.6.23 kernel, commit a32ea1e1f9
("Fix read/truncate race") fixed a race in the generic code, and as a
side effect, now do_generic_file_read() can ask us to readpage() past
the i_size. This seems to be correctly handled by the block routines
(e.g. block_read_full_page() fills the page with zeroes if
somebody tries to read past the inode's last block).
JFFS2 doesn't handle this; it assumes that it won't be asked to read
pages which don't exist -- and thus that there will be at least _one_
valid 'frag' on the page it's being asked to read. It will fill any
holes with the following memset:
memset(buf, 0, min(end, frag->ofs + frag->size) - offset);
When the 'closest smaller match' returned by jffs2_lookup_node_frag() is
actually on a previous page and ends before 'offset', that results in:
memset(buf, 0, <huge unsigned negative>);
Hopefully, in most cases the corruption is fatal and quickly causes
random oopses, like this:
root@10.0.0.4:~/ltp-fs-20090531# ./testcases/kernel/fs/ftest/ftest01
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc01cd980
Oops: Kernel access of bad area, sig: 11 [#1]
[...]
NIP [c01cd980] rb_insert_color+0x38/0x184
LR [c0043978] enqueue_hrtimer+0x88/0xc4
Call Trace:
[c6c63b60] [c004f9a8] tick_sched_timer+0xa0/0xe4 (unreliable)
[c6c63b80] [c0043978] enqueue_hrtimer+0x88/0xc4
[c6c63b90] [c0043a48] __run_hrtimer+0x94/0xbc
[c6c63bb0] [c0044628] hrtimer_interrupt+0x140/0x2b8
[c6c63c10] [c000f8e8] timer_interrupt+0x13c/0x254
[c6c63c30] [c001352c] ret_from_except+0x0/0x14
--- Exception: 901 at memset+0x38/0x5c
LR = jffs2_read_inode_range+0x144/0x17c
[c6c63cf0] [00000000] (null) (unreliable)
This patch fixes the issue, plus fixes all LTP tests on NAND/UBI with
JFFS2 filesystem that were failing since 2.6.23 (seems like the bug
above also broke the truncation).
Reported-By: Anton Vorontsov <avorontsov@ru.mvista.com>
Tested-By: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
CacheFiles: Don't log lookup/create failing with ENOBUFS
CacheFiles: Catch an overly long wait for an old active object
CacheFiles: Better showing of debugging information in active object problems
CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
CacheFiles: Handle truncate unlocking the page we're reading
CacheFiles: Don't write a full page if there's only a partial page to cache
FS-Cache: Actually requeue an object when requested
FS-Cache: Start processing an object's operations on that object's death
FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
FS-Cache: Add a retirement stat counter
FS-Cache: Handle pages pending storage that get evicted under OOM conditions
FS-Cache: Handle read request vs lookup, creation or other cache failure
FS-Cache: Don't delete pending pages from the page-store tracking tree
FS-Cache: Fix lock misorder in fscache_write_op()
FS-Cache: The object-available state can't rely on the cookie to be available
FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
...
The comment in fuse_open about O_DIRECT:
"VFS checks this, but only _after_ ->open()"
also holds for fuse_create, however, the same kind of check was missing there.
As an impact of this bug, open(newfile, O_RDWR|O_CREAT|O_DIRECT) fails, but a
stub newfile will remain if the fuse server handled the implied FUSE_CREATE
request appropriately.
Other impact: in the above situation, ima_file_free() will complain about an
open/free imbalance if CONFIG_IMA is set.
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Harshavardhana <harsha@gluster.com>
Cc: stable@kernel.org
SMB writes are sent with a starting offset and length. When the server
supports the newer SMB trans2 posix open (rather than SMB NTCreateX),
a file can be opened with the SMB_O_APPEND flag, and in that case the
Samba server assumes that the offset sent in SMBWriteX is unneeded,
since the write should go to the end of the file. This can cause
problems if the write was cached, since the beginning part of a
page could be written twice by the client mm. Jeff suggested that
masking the flag on posix open on the client is the easiest fix for the
time being. Note that recent Samba servers also had an unrelated problem
with SMB NTCreateX and append (see samba bugzilla bug number 6898), which
should not affect current Linux clients (unless cifs Unix Extensions
are disabled).
The cifs client did not send the O_APPEND flag on posix open
before 2.6.29 so the fix is unneeded on early kernels.
In the future, for the non-cached case (O_DIRECT and forcedirectio mounts),
it would be possible and useful to send O_APPEND on posix open (for the Windows
case: FILE_APPEND_DATA but not FILE_WRITE_DATA on SMB NTCreateX). For cached
writes, however, although the VFS sets the offset to the end of file, it
may fragment a write across pages, so we can't send O_APPEND on open
(doing so could result in sending part of a page twice).
CC: Stable <stable@kernel.org>
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fixes bugzilla.kernel.org bug number 14641
During network boot (network root filesystem for a diskless
workstation), cifs_lookup can be called with a NULL nd. This patch
adds a check for that case in cifs_lookup.
(Shirish noted that 2.6.30 and 2.6.31 stable need the same check)
Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Vladimir Stavrinov <vs@inist.ru>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Provide nop fscache_stat_d() macro if CONFIG_FSCACHE_STATS=n lest errors like
the following occur:
fs/fscache/cache.c: In function 'fscache_withdraw_cache':
fs/fscache/cache.c:386: error: implicit declaration of function 'fscache_stat_d'
fs/fscache/cache.c:386: error: 'fscache_n_cop_sync_cache' undeclared (first use in this function)
fs/fscache/cache.c:386: error: (Each undeclared identifier is reported only once
fs/fscache/cache.c:386: error: for each function it appears in.)
fs/fscache/cache.c:392: error: 'fscache_n_cop_dissociate_pages' undeclared (first use in this function)
Signed-off-by: David Howells <dhowells@redhat.com>
GFS2 has been altered to pass THIS_MODULE to slow_work_register_user(), but
hasn't been altered to #include <linux/module.h> to provide it, resulting in
the following error:
fs/gfs2/recovery.c:596: error: 'THIS_MODULE' undeclared here (not in a function)
Add the missing #include.
Signed-off-by: David Howells <dhowells@redhat.com>
As of the patch:
SLOW_WORK: Wait for outstanding work items belonging to a module to clear
Wait for outstanding slow work items belonging to a module to clear
when unregistering that module as a user of the facility. This
prevents the put_ref code of a work item from being taken away before
it returns.
slow_work_register_user() takes a module pointer as an argument. CIFS must now
pass THIS_MODULE as that argument, lest the following error be observed:
fs/cifs/cifsfs.c: In function 'init_cifs':
fs/cifs/cifsfs.c:1040: error: too few arguments to function 'slow_work_register_user'
Signed-off-by: David Howells <dhowells@redhat.com>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Trivial cleanup of jbd compatibility layer removal
ocfs2: Refresh documentation
ocfs2: return f_fsid info in ocfs2_statfs()
ocfs2: duplicate inline data properly during reflink.
ocfs2: Move ocfs2_complete_reflink to the right place.
ocfs2: Return -EINVAL when a device is not ocfs2.