Make it clear in the config message that MD_MULTIPATH is not under
active development.
Cc: Oren Held <orenhe@il.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.
With this patch I propose adding a per md device max number of
corrected read attempts. Each rdev will maintain a count of
read correction attempts in the rdev->read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.
In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev->read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.
For testing I used debugfs->fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.
Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.
Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
There is a sysfs file which allows bits in the write-intent
bitmap to be explicit set - indicating that the block is thought
to be 'dirty'.
When this happens we should really set recovery_cp backwards
to include the block to reflect this dirtiness.
In particular, a 'resync' process will refuse to start if
recovery_cp is beyond the end of the array, so this is needed
to allow a resync to be triggered.
Signed-off-by: NeilBrown <neilb@suse.de>
In this case, the metadata needs to not be in the same
sector as the bitmap.
md will not read/write any bitmap metadata. Config must be
done via sysfs and when a recovery makes the array non-degraded
again, writing 'true' to 'bitmap/can_clear' will allow bits in
the bitmap to be cleared again.
Signed-off-by: NeilBrown <neilb@suse.de>
Setting daemon_lastrun really has nothing to do with reading
the bitmap superblock, it just happens to be needed at the same time.
bitmap_read_sb is about to become options, so move that code out
to after the call to bitmap_read_sb.
Signed-off-by: NeilBrown <neilb@suse.de>
A new attribute directory 'bitmap' in 'md' is created which
contains files for configuring the bitmap.
'location' identifies where the bitmap is, either 'none',
or 'file' or 'sector offset from metadata'.
Writing 'location' can create or remove a bitmap.
Adding a 'file' bitmap this way is not yet supported.
'chunksize' and 'time_base' must be set before 'location'
can be set.
'chunksize' can be set before creating a bitmap, but is
currently always over-ridden by the bitmap superblock.
'time_base' and 'backlog' can be updated at any time.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
safe_delay_store can parse fixed point numbers (for fractions
of a second). We will want to do that for another sysfs
file soon, so factor out the code.
Signed-off-by: NeilBrown <neilb@suse.de>
For md arrays were metadata is managed externally, the kernel does not
know about a superblock so the superblock offset is 0.
If we want to have a write-intent-bitmap near the end of the
devices of such an array, we should support sector_t sized offset.
We need offset be possibly negative for when the bitmap is before
the metadata, so use loff_t instead.
Also add sanity check that bitmap does not overlap with data.
Signed-off-by: NeilBrown <neilb@suse.de>
As bitmap_create and bitmap_destroy already set thread->timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.
Signed-off-by: NeilBrown <neilb@suse.de>
... and into bitmap_info. These are all configuration parameters
that need to be set before the bitmap is created.
Signed-off-by: NeilBrown <neilb@suse.de>
In preparation for making bitmap fields configurable via sysfs,
start tidying up by making a single structure to contain the
configuration fields.
Signed-off-by: NeilBrown <neilb@suse.de>
This will allow us to stop writeout to portions of the array
while they are resynced by someone else - e.g. another node in
a cluster.
Signed-off-by: NeilBrown <neilb@suse.de>
The post-barrier-flush is sent by md as soon as make_request on the
barrier write completes. For raid5, the data might not be in the
per-device queues yet. So for barrier requests, wait for any
pre-reading to be done so that the request will be in the per-device
queues.
We use the 'preread_active' count to check that nothing is still in
the preread phase, and delay the decrement of this count until after
write requests have been submitted to the underlying devices.
Signed-off-by: NeilBrown <neilb@suse.de>
Previously barriers were only supported on RAID1. This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.
When a barrier arrives, we send a zero-length barrier to every active
device. When that completes - and if the original request was not
empty - we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.
The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.
The reason for clearing the barrier flag is that a barrier request is
allowed to fail. If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail. That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.
RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet. So we flush the stripe cache before
proceeding with the barrier.
Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted. Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.
That will be done in later patches.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
If a resync/recovery/check/repair is interrupted for some reason, it
can be useful to know exactly where it got up to.
So in that case, do not clear curr_resync_completed.
Initialise it when starting a resync/recovery/... instead.
Signed-off-by: NeilBrown <neilb@suse.de>
When a 'check' or 'repair' finished we should clear resync_min
so that a future check/repair will cover the whole array (by default).
However if it is interrupted, we should update resync_min to
where we got up to, so that when the check/repair continues it
just does the remainder of the array.
Signed-off-by: NeilBrown <neilb@suse.de>
A write intent bitmap can be removed from an array while the
array is active.
When this happens, all IO is suspended and flushed before the
bitmap is removed.
However it is possible that bitmap_daemon_work is still running to
clear old bits from the bitmap. If it is, it can dereference the
bitmap after it has been freed.
So introduce a new mutex to protect bitmap_daemon_work and get it
before destroying a bitmap.
This is suitable for any current -stable kernel.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
* 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6:
IXP4xx: GTWX5715 platform only has two PCI IRQ lines, not four.
IXP4xx: Introduce IXP4XX_GPIO_IRQ(n) macro and convert IXP4xx platform files.
IXP4xx: move Gemtek GTWX5715 platform macros to the platform code.
IXP4xx: Remove unused Motorola PrPMC1100 platform macros.
IXP4xx: move FSG platform macros to the platform code.
IXP4xx: move DSM G600 platform macros to the platform code.
IXP4xx: move NAS100D platform macros to the platform code.
IXP4xx: move NSLU2 platform macros to the platform code.
IXP4xx: move Coyote platform macros to the platform code.
IXP4xx: move AVILA platform macros to the platform code.
IXP4xx: move IXDP425 platform macros to the platform code.
IXP4xx: Extend PCI MMIO indirect address space to 1 GB.
IXP4xx: Fix compilation failure with CONFIG_IXP4XX_INDIRECT_PCI.
IXP4xx: Drop "__ixp4xx_" prefix from in/out/ioread/iowrite functions for clarity.
IXP4xx: Rename indirect MMIO primitives from __ixp4xx_* to __indirect_*.
IXP4xx: Ensure index is positive in irq_to_gpio() and npe_request().
ARM: fix insl() and outsl() endianness on IXP4xx architecture.
IXP4xx: Fix normally-disabled debugging text in drivers/net/arm/ixp4xx_eth.c.
IXP4xx: change the timer base frequency to 66.666000 MHz.
The fasync path takes the BKL (it probably doesn't need to in fact)
while holding the file_list spinlock. You can't do that with the kernel
lock: it causes lock inversions and deadlocks.
Leave the BKL over that bit for the moment.
Identified by AKPM.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-and-Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits)
powerpc: Fix usage of 64-bit instruction in 32-bit altivec code
MAINTAINERS: Add PowerPC patterns
powerpc/pseries: Track previous CPPR values to correctly EOI interrupts
powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP
powerpc: Make "intspec" pointers in irq_host->xlate() const
powerpc/8xx: DTLB Miss cleanup
powerpc/8xx: Remove DIRTY pte handling in DTLB Error.
powerpc/8xx: Start using dcbX instructions in various copy routines
powerpc/8xx: Restore _PAGE_WRITETHRU
powerpc/8xx: Add missing Guarded setting in DTLB Error.
powerpc/8xx: Fixup DAR from buggy dcbX instructions.
powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions.
powerpc/8xx: Update TLB asm so it behaves as linux mm expects.
powerpc/8xx: Invalidate non present TLBs
powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
pseries/pseries: Add code to online/offline CPUs of a DLPAR node
powerpc: stop_this_cpu: remove the cpu from the online map.
powerpc/pseries: Add kernel based CPU DLPAR handling
sysfs/cpu: Add probe/release files
powerpc/pseries: Kernel DLPAR Infrastructure
...
* 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (75 commits)
omap3: Fix OMAP35XX_REV macros
omap: serial: fix non-empty uart fifo read abort
omap3: Zoom2/3: Update hsmmc board config params
omap3 : Enable TWL4030 Keypad for Zoom2 and Zoom3 boards
omap3: id code detection 3525 vs 3515
omap3: rx51: Use wl1251 in SPI mode 3
omap3: zoom2/3: make MMC slot work again
omap1: htcherald: Update defconfig to include mux support
omap1: LCD_DMA: Use some define rather than a hexadecimal
omap: header: remove unused data-type
omap: arch/arm/plat-omap/devices.c - sort alphabetically
omap: Correcting GPMC_CONFIG1_DEVICETYPE_NAND
OMAP3: serial - allow platforms specify which UARTs to initialize
omap3: cm-t35: add mux initialization
OMAP4: Sync up omap4430 defconfig
OMAP4: Remove the secondary wait loop
OMAP4: AuxCoreBoot registers only accessible in secure mode
OMAP4: Fix SRAM base and size
OMAP4: Fix cpu detection
omap3: pandora: board file updates for .33
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
be2net: fix error in rx completion processing.
igbvf: avoid reset storms due to mailbox issues
igb: fix handling of mailbox collisions between PF/VF
usb: remove rare pm primitive for conversion to new API
There are certain skews of the NIC which have multiple bits set in
adapter->cap. Use & instead of == to process rx completions.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This change makes it so that reset/interrupt storms can be avoided when
there are mailbox issues. The new behavior is to only allow the device to
trigger mailbox related resets only once every 10 seconds.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the handling of collisions between the use of the
PF/VF sides of the mailbox.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a rare use of the USB power management API which
won't be supported after the conversion to the new generic runtime power
management framework. Functionality is not altered.
Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (75 commits)
net: Handle NETREG_UNINITIALIZED devices correctly
can: add the driver for Analog Devices Blackfin on-chip CAN controllers
xfrm: Fix truncation length of authentication algorithms installed via PF_KEY
net: use compat helper functions in compat_sys_recvmmsg
net: fix compat_sys_recvmmsg parameter type
cxgb3: Fixing EEH handlers
cnic: Zero out status block and Event Queue indices.
cnic: Send delete command when shutting down iSCSI ring.
net: smc91x: Fix up type mismatch in smc_drv_resume().
smc91x: fix unused flags warnings on UP systems
MAINTAINERS: Transfering maintainership of cdc-ether
net: Add missing TST_CFG_WRITE bits around sky2_pci_write
net: Fix Yukon-2 Optima TCP offload setup
net: niu uses crc32, so select CRC32
wireless: update old static regulatory domain rules
mac80211: Revert 'Use correct sign for mesh active path refresh'
mac80211: Fixed bug in mesh portal paths
net/mac80211: Correct size given to memset
b43: Remove reset after fatal DMA error
rtl8187: add radio led and fix warnings on suspend
...
The patch corrects the issue introduced with one of my earlier patches:
OMAP: DMA: Fix omapfb/lcdc on OMAP1510 broken when PM set[1]
as pointed out by OMAP subsystem maintainer.
Applies on top of my prevoius patch:
OMAP: DMA: move LCD DMA related code from plat-omap to mach-omap1[2]
Tested on Amstrad Delta
Compile tested with omap_generic_2420_defconfig
[1] http://patchwork.kernel.org/patch/57922/
[2] http://patchwork.kernel.org/patch/61952/
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Now that all OMAP boards are using the board resources, we don't need
to keep the arch/board specific crap in the driver header.
Cc: linux-net@vger.kernel.org
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
[CPUFREQ] make internal cpufreq_add_dev_* static
[CPUFREQ] use an enum for speedstep processor identification
[CPUFREQ] Document units for transition latency
[CPUFREQ] Use global sysfs cpufreq structure for conservative governor tunings
[CPUFREQ] Documentation: ABI: /sys/devices/system/cpu/cpu#/cpufreq/
[CPUFREQ] powernow-k6: set transition latency value so ondemand governor can be used
[CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c
The debug batman option needs to depend on the correct
config option.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ "No means no!" - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (58 commits)
tty: split the lock up a bit further
tty: Move the leader test in disassociate
tty: Push the bkl down a bit in the hangup code
tty: Push the lock down further into the ldisc code
tty: push the BKL down into the handlers a bit
tty: moxa: split open lock
tty: moxa: Kill the use of lock_kernel
tty: moxa: Fix modem op locking
tty: moxa: Kill off the throttle method
tty: moxa: Locking clean up
tty: moxa: rework the locking a bit
tty: moxa: Use more tty_port ops
tty: isicom: fix deadlock on shutdown
tty: mxser: Use the new locking rules to fix setserial properly
tty: mxser: use the tty_port_open method
tty: isicom: sort out the board init logic
tty: isicom: switch to the new tty_port_open helper
tty: tty_port: Add a kref object to the tty port
tty: istallion: tty port open/close methods
tty: stallion: Convert to the tty_port_open/close methods
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
ext3: Fix data / filesystem corruption when write fails to copy data
ext4: Support for 64-bit quota format
ext3: Support for vfsv1 quota format
quota: Implement quota format with 64-bit space and inode limits
quota: Move definition of QFMT_OCFS2 to linux/quota.h
ext2: fix comment in ext2_find_entry about return values
ext3: Unify log messages in ext3
ext2: clear uptodate flag on super block I/O error
ext2: Unify log messages in ext2
ext3: make "norecovery" an alias for "noload"
ext3: Don't update the superblock in ext3_statfs()
ext3: journal all modifications in ext3_xattr_set_handle
ext2: Explicitly assign values to on-disk enum of filetypes
quota: Fix WARN_ON in lookup_one_len
const: struct quota_format_ops
ubifs: remove manual O_SYNC handling
afs: remove manual O_SYNC handling
kill wait_on_page_writeback_range
vfs: Implement proper O_SYNC semantics
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (235 commits)
Staging: IIO: add selection of IIO_SW_RING to LIS3L02DQ as needed
Staging: IIO: Add tsl2560-2 support to tsl2563 driver.
Staging: IIO: Remove tsl2561 driver. Support merged with tsl2563.
Staging: wlags49_h2: fix up signal levels
+ drivers-staging-wlags49_h2-remove-cvs-metadata.patch added to -mm tree
Staging: samsung-laptop: add TODO file
Staging: samsung-laptop: remove old kernel code
Staging: add Samsung Laptop driver
staging: batman-adv meshing protocol
Staging: rtl8192u: depends on USB
Staging: rtl8192u: remove dead code
Staging: rtl8192u: remove bad whitespaces
Staging: rtl8192u: make it compile
Staging: Added Realtek rtl8192u driver to staging
Staging: dream: add gpio and pmem support
Staging: dream: add TODO file
Staging: android: delete android drivers
Staging: et131x: clean up the avail fields in the rx registers
Staging: et131x: Clean up number fields
Staging: et131x: kill RX_DMA_MAX_PKT_TIME
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (27 commits)
Driver core: fix race in dev_driver_string
Driver Core: Early platform driver buffer
sysfs: sysfs_setattr remove unnecessary permission check.
sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir
sysfs: Propagate renames to the vfs on demand
sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish
sysfs: In sysfs_chmod_file lazily propagate the mode change.
sysfs: Implement sysfs_getattr & sysfs_permission
sysfs: Nicely indent sysfs_symlink_inode_operations
sysfs: Update s_iattr on link and unlink.
sysfs: Fix locking and factor out sysfs_sd_setattr
sysfs: Simplify iattr time assignments
sysfs: Simplify sysfs_chmod_file semantics
sysfs: Use dentry_ops instead of directly playing with the dcache
sysfs: Rename sysfs_d_iput to sysfs_dentry_iput
sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex
debugfs: fix create mutex racy fops and private data
Driver core: Don't remove kobjects in device_shutdown.
firmware_class: make request_firmware_nowait more useful
Driver-Core: devtmpfs - set root directory mode to 0755
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: ohci: handle receive packets with a data length of zero
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: Always process the whole breakpoint list on activate or deactivate
kgdb: continue and warn on signal passing from gdb
kgdb,x86: do not set kgdb_single_step on x86
kgdb: allow for cpu switch when single stepping
kgdb,i386: Fix corner case access to ss with NMI watch dog exception
kgdb: Replace strstr() by strchr() for single-character needles
kgdbts: Read buffer overflow
kgdb: Read buffer overflow
kgdb,x86: remove redundant test
The tty count sanity check may need the BKL, that isn't clear. However it
is clear that the count use of the lock is internal and independant of the
bigger use of the lock.
Furthermore the file list locking is also separately locked already
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are two call points, both want to check that tty->signal->leader is
set. Move the test into disassociate_ctty() as that will make locking
changes easier in a bit
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We know that the redirect field is handled via its own locking in all
places
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>