Introduce a new data type, used by both the in-kernel NLM and NSM
implementations, that is used to manage the opaque "priv" argument
for the NSMPROC_MON and NLMPROC_SM_NOTIFY calls.
Construct the "priv" cookie when the nsm_handle is created.
The nsm_init_private() function may look a little strange, but it is
roughly equivalent to how the XDR encoder formed the "priv" argument.
It's going to go away soon.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nsm_release() function should never be called with a NULL handle
point. If it is, that's a bug.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nsm_find() function should never be called with a NULL IP address
pointer. If it is, that's a bug.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Introduce some dprintk() calls in fs/lockd/mon.c that are enabled by
the NLMDBG_MONITOR flag. These report when we find, create, and
release nsm_handles.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nsm_find() function sets up fresh nsm_handle entries. This is
where we will store the "priv" cookie used to lookup nsm_handles during
reboot recovery. The cookie will be constructed when nsm_find()
creates a new nsm_handle.
As much as possible, I would like to keep everything that handles a
"priv" cookie in fs/lockd/mon.c so that all the smarts are in one
source file. That organization should make it pretty simple to see how
all this works.
To me, it makes more sense than the current arrangement to keep
nsm_find() with nsm_monitor() and nsm_unmonitor().
So, start reorganizing by moving nsm_find() into fs/lockd/mon.c. The
nsm_release() function comes along too, since it shares the nsm_lock
global variable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Introduce xdr_stream-based XDR encoder and decoder functions, which are
more careful about preventing RPC buffer overflows.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Move the RPC program and procedure numbers for NSM into the
one source file that needs them: fs/lockd/mon.c.
And, as with NLM, NFS, and rpcbind calls, use NSMPROC_FOO instead of
SM_FOO for NSM procedure numbers.
Finally, make a couple of comments more precise: what is referred to
here as SM_NOTIFY is really the NLM (lockd) NLMPROC_SM_NOTIFY downcall,
not NSMPROC_NOTIFY.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: NSM's XDR data structures are used only in fs/lockd/mon.c,
so move them there.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Make sure any error returned by rpc.statd during an SM_UNMON call is
reported rather than ignored completely. There isn't much to do with
such an error, but we should log it in any case.
Similar to a recent change to nsm_monitor().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up.
Make the nlm_host argument "const," and move the public declaration to
lockd.h. Add a documenting comment.
Bruce observed that nsm_unmonitor()'s only caller doesn't care about
its return code, so make nsm_unmonitor() return void.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nsm_handle's reference count is bumped in nlm_lookup_host(). It
should be decremented in nlm_destroy_host() to make it easier to see
the balance of these two operations.
Move the nsm_release() call to fs/lockd/host.c.
The h_nsmhandle pointer is set in nlm_lookup_host(), and never cleared.
The nlm_destroy_host() function is never called for the same nlm_host
twice, so h_nsmhandle won't ever be NULL when nsm_unmonitor() is
called.
All references to the nlm_host are gone before it is freed. We can
skip making h_nsmhandle NULL just before the nlm_host is deallocated.
It's also likely we can remove the h_nsmhandle NULL check in
nlmsvc_is_client() as well, but we can do that later when rearchitect-
ing the nlm_host cache.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up.
Make the nlm_host argument "const," and move the public declaration to
lockd.h with other NSM public function (nsm_release, eg) and global
variable declarations.
Add a documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nsm_monitor() function reports an error and does not set sm_monitored
if the SM_MON upcall reply has a non-zero result code, but nsm_monitor()
does not return an error to its caller in this case.
Since sm_monitored is not set, the upcall is retried when the next NLM
request invokes nsm_monitor(). However, that may not come for a while.
In the meantime, at least one NLM request will potentially proceed
without the peer being monitored properly.
Have nsm_monitor() return an error if the result code is non-zero.
This will cause all NLM requests to fail immediately if the upcall
completed successfully but rpc.statd returned an error.
This may be inconvenient in some cases (for example if rpc.statd
cannot complete a proper DNS reverse lookup of the hostname), but will
make the reboot monitoring service more robust by forcing such issues
to be corrected by an admin.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Remove the BUG_ON() invocation in nsm_monitor(). It's not
likely that nsm_monitor() is ever called with a NULL host pointer, and
the code will die anyway if host is NULL.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nsm_monitor() function already generates a printk(KERN_NOTICE) if
the SM_MON upcall fails, so the similar printk() in the nlmclnt_lock()
function is redundant.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Use the sm_name field for reporting the hostname in nsm_monitor()
and nsm_unmonitor(), just as the other functions in fs/lockd/mon.c do.
The h_name field is just a copy of the sm_name pointer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The "mon_name" argument of the NSMPROC_MON and NSMPROC_UNMON upcalls
is a string that contains the hostname or IP address of the remote peer
to be notified when this host has rebooted. The sm-notify command uses
this identifier to contact the peer when we reboot, so it must be
either a well-qualified DNS hostname or a presentation format IP
address string.
When the "nsm_use_hostnames" sysctl is set to zero, the kernel's NSM
provides a presentation format IP address in the "mon_name" argument.
Otherwise, the "caller_name" argument from NLM requests is used,
which is usually just the DNS hostname of the peer.
To support IPv6 addresses for the mon_name argument, we use the
nsm_handle's address eye-catcher, which already contains an appropriate
presentation format address string. Using the eye-catcher string
obviates the need to use a large buffer on the stack to form the
presentation address string for the upcall.
This patch also addresses a subtle bug.
An NSMPROC_MON request and the subsequent NSMPROC_UNMON request for the
same peer are required to use the same value for the "mon_name"
argument. Otherwise, rpc.statd's NSMPROC_UNMON processing cannot
locate the database entry for that peer and remove it.
If the setting of nsm_use_hostnames is changed between the time the
kernel sends an NSMPROC_MON request and the time it sends the
NSMPROC_UNMON request for the same peer, the "mon_name" argument for
these two requests may not be the same. This is because the value of
"mon_name" is currently chosen at the moment the call is made based on
the setting of nsm_use_hostnames
To ensure both requests pass identical contents in the "mon_name"
argument, we now select which string to use for the argument in the
nsm_monitor() function. A pointer to this string is saved in the
nsm_handle so it can be used for a subsequent NSMPROC_UNMON upcall.
NB: There are other potential problems, such as how nlm_host_rebooted()
might behave if nsm_use_hostnames were changed while hosts are still
being monitored. This patch does not attempt to address those
problems.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: I'm about to add another "char *" field to the nsm_handle
structure. The sm_name field uses an older style of declaring a
"char *" field. If I match that style for the new field, checkpatch.pl
will complain.
So, fix the sm_name field to use the new style.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: make the printk(KERN_DEBUG) in nsm_mon_unmon() a dprintk,
and add another dprintk to note if creating an RPC client for the
upcall failed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Use a C99 structure initializer instead of open-coding the
initialization of nsm_args.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: introduce a helper function to generate IPv4 addresses using
the same style as the IPv6 helper function we just added.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Scope ID support is needed since the kernel's NSM implementation is
about to use these displayed addresses as a mon_name in some cases.
When nsm_use_hostnames is zero, without scope ID support NSM will fail
to handle peers that contact us via a link-local address. Link-local
addresses do not work without an interface ID, which is stored in the
sockaddr's sin6_scope_id field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
AF_UNSPEC support is no longer needed in nlm_display_address() now
that a presentation address is no longer generated for the h_srcaddr
field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The h_name field in struct nlm_host is a just copy of
h_nsmhandle->sm_name. Likewise, the contents of the h_addrbuf field
should be identical to the sm_addrbuf field.
The h_srcaddrbuf field is used only in one place for debugging. We can
live without this until we get %pI formatting for printk().
Currently these buffers are 48 bytes, but we need to support scope IDs
in IPv6 presentation addresses, which means making the buffers even
larger. Instead, let's find ways to eliminate them to save space.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: I'm about to add another "char *" field to the nlm_host
structure. The h_name field, for example, uses an older style of
declaring a "char *" field. If I match that style for the new field,
checkpatch.pl will complain.
So, fix pointer fields to use the new style.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The default method for calculating the number of connections allowed
per RPC service arbitrarily limits single-threaded services to 80
connections. This is too low for services like lockd and artificially
limits the number of TCP clients that it can support.
Have lockd set a default sv_maxconn value to 1024 (which is the typical
default value for RLIMIT_NOFILE. Also add a module parameter to allow an
admin to set this to an arbitrary value.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:
(serv->sv_nrthreads + 3) * 20
Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.
Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.
This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
cksum.data is not freed up in one error case. Compile tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Minor cleanup/rewrite of find_stateid. Compile tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Supply dm_add_exception as a callback to the read_metadata function.
Add a status function ready for a later patch and name the functions
consistently.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Move the existing snapshot exception store implementations out into
separate files. Later patches will place these behind a new
interface in preparation for alternative implementations.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Rename struct exception_store to dm_exception_store.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Pull structures that bridge the gap between snapshot and
exception store out of dm-snap.h and put them in a new
.h file - dm-exception-store.h. This file will define the
API for new exception stores.
Ultimately, dm-snap.h is unnecessary, since only dm-snap.c
should be using it.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The same workqueue is used both for sending uevents and processing queued I/O.
Deadlock has been reported in RHEL5 when sending a uevent was blocked waiting
for the queued I/O to be processed. Use scheduled_work() for the asynchronous
uevents instead.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Implement simple read-only sysfs entry for device-mapper block device.
This patch adds a simple sysfs directory named "dm" under block device
properties and implements
- name attribute (string containing mapped device name)
- uuid attribute (string containing UUID, or empty string if not set)
The kobject is embedded in mapped_device struct, so no additional
memory allocation is needed for initializing sysfs entry.
During the processing of sysfs attribute we need to lock mapped device
which is done by a new function dm_get_from_kobj, which returns the md
associated with kobject and increases the usage count.
Each 'show attribute' function is responsible for its own locking.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Rework table reference counting.
The existing code uses a reference counter. When the last reference is
dropped and the counter reaches zero, the table destructor is called.
Table reference counters are acquired/released from upcalls from other
kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
If the reference counter reaches zero in one of the upcalls, the table
destructor is called from almost random kernel code.
This leads to various problems:
* dm_any_congested being called under a spinlock, which calls the
destructor, which calls some sleeping function.
* the destructor attempting to take a lock that is already taken by the
same process.
* stale reference from some other kernel code keeps the table
constructed, which keeps some devices open, even after successful
return from "dmsetup remove". This can confuse lvm and prevent closing
of underlying devices or reusing device minor numbers.
The patch changes reference counting so that the table destructor can be
called only at predetermined places.
The table has always exactly one reference from either mapped_device->map
or hash_cell->new_map. After this patch, this reference is not counted
in table->holders. A pair of dm_create_table/dm_destroy_table functions
is used for table creation/destruction.
Temporary references from the other code increase table->holders. A pair
of dm_table_get/dm_table_put functions is used to manipulate it.
When the table is about to be destroyed, we wait for table->holders to
reach 0. Then, we call the table destructor. We use active waiting with
msleep(1), because the situation happens rarely (to one user in 5 years)
and removing the device isn't performance-critical task: the user doesn't
care if it takes one tick more or not.
This way, the destructor is called only at specific points
(dm_table_destroy function) and the above problems associated with lazy
destruction can't happen.
Finally remove the temporary protection added to dm_any_congested().
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Implement barrier support for single device DM devices
This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.
NB. Any DM device might cease to support barriers if it gets
reconfigured so code must continue to allow for a possible
-EOPNOTSUPP on every barrier bio submitted. - agk
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds the following target interfaces for request-based dm.
map_rq : for mapping a request
rq_end_io : for finishing a request
busy : for avoiding performance regression from bio-based dm.
Target can tell dm core not to map requests now, and
that may help requests in the block layer queue to be
bigger by I/O merging.
In bio-based dm, this behavior is done by device
drivers managing the block layer queue.
But in request-based dm, dm core has to do that
since dm core manages the block layer queue.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Allow NULL buffer in dm_copy_name_and_uuid if you only want to return one of
the fields.
(Required by a following patch that adds these fields to sysfs.)
Signed-off-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Check that the log bitmap will fit within the log device.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Move log size validation from mirror target to log constructor.
Removed PAGE_SIZE restriction we no longer think necessary.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
rw_header function updates three members of io_req data every time
when I/O is processed. bi_rw and notify.fn are never modified once
they get initialized, and so they can be set in advance.
header_to_disk() can also be pulled out of write_header() since only one
caller needs it and write_header() can be replaced by rw_header()
directly.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Change dm_unregister_target to return void and use BUG() for error
reporting.
dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.
This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.
This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Always increase the error count when I/O on a leg of a mirror fails.
The error count is used to decide whether to select an alternative
mirror leg. If the target doesn't use the "handle_errors" feature, the
error count is not updated and the bio can get requeued forever by the
read callback.
Fix it by increasing error_count before the handle_errors feature
checking.
Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
In create_log_context function, dm_io_client_destroy function needs
to be called, when memory allocation of disk_header, sync_bits and
recovering_bits failed, but dm_io_client_destroy is not called.
Cc: stable@kernel.org
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Acked-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Change yield() to msleep(1). If the thread had realtime priority,
yield() doesn't really yield, so the yielding process would loop
indefinitely and cause machine lockup.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>