Commit graph

156560 commits

Author SHA1 Message Date
Wu Fengguang
feb273404f ALSA: hda: remember last command for each codec
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-08-03 08:27:39 +02:00
Wu Fengguang
c32649feb4 ALSA: hda: read CORBWP inside reg_lock
This converts the last CORBWP access outside of reg_lock.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-08-03 08:26:55 +02:00
Wu Fengguang
cdb1fbf231 ALSA: hda: take reg_lock in azx_init_cmd_io/azx_free_cmd_io
Just for safety.  azx_init_cmd_io() and azx_free_cmd_io() may be
called when switching to single command mode.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-08-03 08:26:42 +02:00
Wu Fengguang
a678cdee25 ALSA: hda: take cmd_mutex in probe_codec()
Now that each codec will have its own module, it is possible
for the user to load one codec while another one is running.

So cmd_mutex would be a safe addition to probe_codec().

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-08-03 08:26:23 +02:00
Wu Fengguang
deadff1665 ALSA: hda: track CIRB/CORB command/response states for each codec
Recently we hit a bug in our dev board, whose HDMI codec#3 may emit
redundant/spurious responses, which were then taken as responses to
command for another onboard Realtek codec#2, and mess up both codecs.

Extend the azx_rb.cmds and azx_rb.res to array and track each codec's
commands/responses separately. This helps keep good codec safe from
broken ones.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-08-03 08:26:13 +02:00
Takashi Iwai
ce577e8cf5 ALSA: hda - Fix quirk for Toshiba Satellite A135-S4527
Use model=lenovo instead of model=dallas for Toshiba Satellite A135-S4527
with ALC861-VD codec.

Reference: Novell bnc#526325
	https://bugzilla.novell.com/show_bug.cgi?id=526325

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-08-03 08:23:52 +02:00
Linus Torvalds
a33a052f19 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: Use revalidate_disk to effect changes in size of device.
  md: allow raid5_quiesce to work properly when reshape is happening.
  md/raid5: set reshape_position correctly when reshape starts.
  md: Handle growth of v1.x metadata correctly.
  md: avoid array overflow with bad v1.x metadata
  md: when a level change reduces the number of devices, remove the excess.
  md: Push down data integrity code to personalities.
  md/raid6: release spare page at ->stop()
2009-08-02 21:31:40 -07:00
Yevgeny Petrilin
eb4ad82641 mlx4_en: Fix double pci unmapping.
In cases of fragmented skb, with the data pointers being wrapped around
the TX buffer, the completion handling code would not forward the data
pointer and the firs fragment was unmapped several times, while others
were not unmapped at all.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 20:22:18 -07:00
NeilBrown
449aad3e25 md: Use revalidate_disk to effect changes in size of device.
As revalidate_disk calls check_disk_size_change, it will cause
any capacity change of a gendisk to be propagated to the blockdev
inode.  So use that instead of mucking about with locks and
i_size_write.

Also add a call to revalidate_disk in do_md_run and a few other places
where the gendisk capacity is changed.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:58 +10:00
NeilBrown
64bd660b51 md: allow raid5_quiesce to work properly when reshape is happening.
The ->quiesce method is not supposed to stop resync/recovery/reshape,
just normal IO.
But in raid5 we don't have a way to know which stripes are being
used for normal IO and which for resync etc, so we need to wait for
all stripes to be idle to be sure that all writes have completed.

However reshape keeps at least some stripe busy for an extended period
of time, so a call to raid5_quiesce can block for several seconds
needlessly.
So arrange for reshape etc to pause briefly while raid5_quiesce is
trying to quiesce the array so that the active_stripes count can
drop to zero.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:58 +10:00
NeilBrown
e516402c0d md/raid5: set reshape_position correctly when reshape starts.
As the internal reshape_progress counter is the main driver
for reshape, the fact that reshape_position sometimes starts with the
wrong value has minimal effect.  It is visible in sysfs and that
is all.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:57 +10:00
NeilBrown
70471dafe3 md: Handle growth of v1.x metadata correctly.
The v1.x metadata does not have a fixed size and can grow
when devices are added.
If it grows enough to require an extra sector of storage,
we need to update the 'sb_size' to match.

Without this, md can write out an incomplete superblock with a
bad checksum, which will be rejected when trying to re-assemble
the array.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:57 +10:00
NeilBrown
3673f305fa md: avoid array overflow with bad v1.x metadata
We trust the 'desc_nr' field in v1.x metadata enough to use it
as an index in an array.  This isn't really safe.
So range-check the value first.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:56 +10:00
NeilBrown
3a981b03f3 md: when a level change reduces the number of devices, remove the excess.
When an array is changed from RAID6 to RAID5, fewer drives are
needed.  So any device that is made superfluous by the level
conversion must be marked as not-active.
For the RAID6->RAID5 conversion, this will be a drive which only
has 'Q' blocks on it.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:55 +10:00
Andre Noll
ac5e7113e7 md: Push down data integrity code to personalities.
This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.

md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity.  The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().

The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.

For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:47 +10:00
Linus Torvalds
4905f92ed7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [WATCHDOG] Fix COH 901 327 watchdog enablement
2009-08-02 14:15:46 -07:00
Linus Torvalds
0ce166b7b4 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  eeepc-laptop: fix hot-unplug on resume
  ACPI: Ingore the memory block with zero block size in course of memory hotplug
  ACPI: Don't treat generic error as ACPI error code in acpi memory hotplug driver
  ACPI: bind workqueues to CPU 0 to avoid SMI corruption
  ACPI: root-only read protection on /sys/firmware/acpi/tables/*
  thinkpad-acpi: fix incorrect use of TPACPI_BRGHT_MODE_ECNVRAM
  thinkpad-acpi: restrict procfs count value to sane upper limit
  thinkpad-acpi: remove dock and bay subdrivers
  thinkpad-acpi: disable broken bay and dock subdrivers
  hp-wmi: check that an input device exists in resume handler
  Revert "ACPICA: Remove obsolete acpi_os_validate_address interface"
2009-08-02 14:15:27 -07:00
Greg Kroah-Hartman
57d7f28227 TTY: Maintainer change
Clearly, I am a glutton for punishment.  I'll see if I can see Alan's
changes through to the end, otherwise I'll be fending off a lot of bug
reports for usb-serial devices.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-02 14:15:08 -07:00
Linus Torvalds
79896cf42f Make pci_claim_resource() use request_resource() rather than insert_resource()
This function has traditionally used "insert_resource()", because before
commit cebd78a8c5 ("Fix pci_claim_resource") it used to just insert the
resource into whatever root resource tree that was indicated by
"pcibios_select_root()".

So there Matthew fixed it to actually look up the proper parent
resource, which means that now it's actively wrong to then traverse the
resource tree any more: we already know exactly where the new resource
should go.

And when we then did commit a76117dfd6 ("x86: Use pci_claim_resource"),
which changed the x86 PCI code from the open-coded

	pr = pci_find_parent_resource(dev, r);
	if (!pr || request_resource(pr, r) < 0) {

to using

	if (pci_claim_resource(dev, idx) < 0) {

that "insert_resource()" now suddenly became a problem, and causes a
regression covered by

	http://bugzilla.kernel.org/show_bug.cgi?id=13891

which this fixes.

Reported-and-tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Linux PCI <linux-pci@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-02 14:10:18 -07:00
Andreas Eversberg
b564afcfb8 mISDN: Fix handling of receive buffer size in L1oIP
The size of receive buffer pointer was used to get size of
receive buffer instead of recvbuf_size itself, so only 4/8
bytes could be transfered.

This is a regression to 2.6.30 introduced by commit 8c90e11e35
mISDN: Use kernel_{send,recv}msg instead of open coding

Signed-off-by: Andreas Eversberg <andreas@eversberg.eu>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:59:32 -07:00
Linus Walleij
5973bee46f [WATCHDOG] Fix COH 901 327 watchdog enablement
Since the COH 901 327 found in U300 is clocked at 32 kHz we need
to wait for the interrupt clearing flag to propagate through
hardware in order not to accidentally fire off any interrupts
when we enable them.

Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2009-08-02 19:56:30 +00:00
Don Fry
63097b3ad8 pcnet32: VLB support fixes
VLB support has been broken since at least 2004-2005 period as some
changes introduced back then assumed that ->pci_dev is always valid,
lets try to fix it:

- remove duplicated SET_NETDEV_DEV() call

- call SET_NETDEV_DEV() only for PCI devices

- check for ->pci_dev validity in pcnet32_open()

[ Alternatively we may consider removing VLB support but there would not
  be much gain in it since an extra driver code needed for VLB support is
  minimal and quite simple. ]

This takes care of the following entry from Dan's list:

drivers/net/pcnet32.c +1889 pcnet32_probe1(298) warning: variable derefenced before check 'pdev'

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by:  Don Fry <pcnet32@verizon.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:23:06 -07:00
Don Fry
df4e7f72f5 pcnet32: remove superfluous NULL pointer check in pcnet32_probe1()
Move the debug printk() into the proper place and remove superfluous
NULL pointer check in pcnet32_probe1().

This takes care of the following entry from Dan's list:

drivers/net/pcnet32.c +1889 pcnet32_probe1(298) warning: variable derefenced before check 'pdev'

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by:  Don Fry <pcnet32@verizon.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:23:05 -07:00
Jiri Pirko
a6ac65db23 net: restore the original spinlock to protect unicast list
There is a path when an assetion in dev_unicast_sync() appears.

igmp6_group_added -> dev_mc_add -> __dev_set_rx_mode ->
-> vlan_dev_set_rx_mode -> dev_unicast_sync

Therefore we cannot protect this list with rtnl. This patch restores the
original protecting this list with spinlock.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:46 -07:00
Dhananjay Phadke
50c643e765 netxen: fix coherent dma mask setting
Change default dma mask for NX3031 to 39 bit with ability
to update it to 64-bit (if firmware indicates support). Old
code was restricting it under 4GB (32-bit), sometimes causing
failure to allocate descriptor rings on heavily populated
system. NX2031 based NICs will still get 32-bit coherent mask.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:44 -07:00
roel kluin
9bfdac94c7 mISDN: Read buffer overflow
Check whether index is within bounds before testing the element.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:42 -07:00
roel kluin
54706d9905 s6gmac: Read buffer overflow
Check whether index is within bounds before testing the element.
In the last iteration i is PHY_MAX_ADDR. the condition
`!(p = pd->mii.bus->phy_map[PHY_MAX_ADDR])' is undefined and may
evaluate to false, which leads to a dereference of this invalid
phy_map in the phy_connect() below.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:40 -07:00
roel kluin
1b994b5a1b tulip: Read buffer overflow
Check whether index is within bounds before testing the element.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:38 -07:00
Eric Dumazet
144586301f net: net_assign_generic() fix
memcpy() should take into account size of pointers,
not only number of pointers to copy.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:36 -07:00
Eric Dumazet
446e72f30e pppol2tp: calls unregister_pernet_gen_device() at unload time
Failure to call unregister_pernet_gen_device() can exhaust memory
if module is loaded/unloaded many times.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:34 -07:00
Ben McKeegan
a53a8b5682 ppp: fix lost fragments in ppp_mp_explode() (resubmit)
This patch fixes the corner cases where the sum of MTU of the free
channels (adjusted for fragmentation overheads) is less than the MTU
of PPP link.  There are at least 3 situations where this case might
arise:

- some of the channels are busy

- the multilink session is running in a degraded state (i.e. with less
than its full complement of active channels)

- by design, where multilink protocol is being used to artificially
increase the effective link MTU of a single link.

Without this patch, at most 1 fragment is ever sent per free channel
for a given PPP frame and any remaining part of the PPP frame that
does not fit into those fragments is silently discarded.

This patch restores the original behaviour which was broken by commit
9c705260fe 'ppp:ppp_mp_explode()
redesign'.  Once all 'free' channels have been given a fragment, an
additional fragment is queued to each available channel in turn, as many
times as necessary, until the entire PPP frame has been consumed.

Signed-off-by: Ben McKeegan <ben@netservers.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-02 12:20:31 -07:00
Len Brown
3be4ee5199 Merge branch 'misc-2.6.31' into release 2009-08-02 12:55:51 -04:00
Len Brown
95452a6ce1 Merge branch 'bugzilla-13825' into release 2009-08-02 12:36:01 -04:00
Alan Jenkins
7334546a52 eeepc-laptop: fix hot-unplug on resume
OOPS on resume when the wireless adaptor is disabled during suspend was
introduced by "eeepc-laptop: read rfkill soft-blocked state on resume".

Unable to handle kernel NULL pointer dereference

Process s2disk
Tainted: G W
IP: klist_put

Call trace:
? klist_del
? device_del
? device_unregister
? pci_stop_dev
? pci_stop_bus
? pci_remove_device
? eeepc_rfkill_hotplug [eeepc_laptop]
? eeepc_hotk_resume [eeepc_laptop]
? acpi_device_resume
? device_resume
? hibernation_snapshot

It appears the PCI device is removed twice.  The eeepc_rfkill_hotplug()
call from the resume handler is racing against the call from the ACPI
notifier callback.  The ACPI notification is triggered by the resume
handler when it refreshes the value of CM_ASL_WLAN.

The fix is to serialize hotplug calls using a workqueue.

http://bugzilla.kernel.org/show_bug.cgi?id=13825

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Corentin Chary <corentin.chary@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-08-02 12:35:53 -04:00
Len Brown
a571a79a7e Merge branch 'memhotplug-crash' into release 2009-08-02 12:27:26 -04:00
Zhao Yakui
5d2619fca7 ACPI: Ingore the memory block with zero block size in course of memory hotplug
If the memory block size is zero, ignore it and don't do the memory hotplug
flowchart. Otherwise it will complain the following warning message:
  >System RAM resource 0 - ffffffffffffffff cannot be added

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-08-02 12:25:12 -04:00
Zhao Yakui
aa7b2b2e97 ACPI: Don't treat generic error as ACPI error code in acpi memory hotplug driver
Don't treat the generic error as ACPI error code. Otherwise when the generic
code is returned, it will complain the following warning messag:
   >ACPI Exception (acpi_memhotplug-0171): UNKNOWN_STATUS_CODE,
		Cannot get acpi bus device [20080609]
   >ACPI: Cannot find driver data
   > ACPI Error (utglobal-0127): Unknown exception code: 0xFFFFFFED [20080609]
   > Pid: 85, comm: kacpi_notify Not tainted 2.6.27.19-5-default #1
     Call Trace:
     [<ffffffff8020da29>] show_trace_log_lvl+0x41/0x58
     [<ffffffff8049a3da>] dump_stack+0x69/0x6f
    .....

At the same time when the generic error code is returned, the ACPI_EXCEPTION
is replaced by the printk.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-08-02 12:24:59 -04:00
Len Brown
6a61487791 Merge branch 'bugzilla-13751' into release 2009-08-02 12:10:02 -04:00
Bjorn Helgaas
74b5820808 ACPI: bind workqueues to CPU 0 to avoid SMI corruption
On some machines, a software-initiated SMI causes corruption unless the
SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
done in GPE-related methods that are run via workqueues, so we can avoid
the known corruption cases by binding the workqueues to CPU 0.

References:
    http://bugzilla.kernel.org/show_bug.cgi?id=13751
    https://bugs.launchpad.net/bugs/157171
    https://bugs.launchpad.net/bugs/157691

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-08-02 12:08:50 -04:00
Len Brown
f63440eff0 Merge branch 'thinkpad' into release 2009-08-02 11:34:24 -04:00
Len Brown
437f8c8ab9 Merge branch 'bugzilla-13865' into release 2009-08-02 11:33:01 -04:00
Len Brown
b8a848ed7f Merge branch 'bugzilla-13620-revert' into release 2009-08-02 11:31:32 -04:00
Len Brown
d0006f3281 ACPI: root-only read protection on /sys/firmware/acpi/tables/*
they were world readable.

Signed-off-by: Len Brown <len.brown@intel.com>
2009-08-02 11:26:43 -04:00
Helge Deller
cae5a39f34 parisc: hppb.c - fix printk format strings
Fix those warnings:
drivers/parisc/hppb.c: In function 'hppb_probe':
drivers/parisc/hppb.c:65: warning: format '%x' expects type 'unsigned int', but argument 2 has type 'resource_size_t'
drivers/parisc/hppb.c:77: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'resource_size_t'
drivers/parisc/hppb.c:77: warning: format '%08x' expects type 'unsigned int', but argument 4 has type 'resource_size_t'

Signed-off-by: Helge Deller <deller@gmx.de>
2009-08-02 15:42:39 +02:00
Helge Deller
c43962321e parisc: parisc-agp.c - use correct page_mask function
Fix those compiler warnings, which indeed point to a bug:
drivers/char/agp/parisc-agp.c:228: warning: initialization from incompatible pointer type
drivers/char/agp/parisc-agp.c:201: warning: 'parisc_agp_page_mask_memory' defined but not used

Signed-off-by: Helge Deller <deller@gmx.de>
2009-08-02 15:35:43 +02:00
Helge Deller
1a1dba3241 parisc: sticore.c - check return values
Signed-off-by: Helge Deller <deller@gmx.de>
2009-08-02 15:26:51 +02:00
Ryusuke Konishi
01a261e09a nilfs2: fix missing unlock in error path of nilfs_mdt_write_page
This adds a missing unlock of nilfs->ns_writer_mutex in
nilfs_mdt_write_page() function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-02 22:24:15 +09:00
Helge Deller
1e0deabd35 parisc: dino.c - check return value of pci_assign_resource()
Signed-off-by: Helge Deller <deller@gmx.de>
2009-08-02 15:17:37 +02:00
Helge Deller
c6fe6b0783 parisc: hp_sdc_mlc.c - check return value of down_trylock()
Signed-off-by: Helge Deller <deller@gmx.de>
2009-08-02 15:13:29 +02:00
Gregory Haskins
07903af152 sched: Fix race in cpupri introduced by cpumask_var changes
Background:

Several race conditions in the scheduler have cropped up
recently, which Steven and I have tracked down using ftrace.
The most recent one turns out to be a race in how the scheduler
determines a suitable migration target for RT tasks, introduced
recently with commit:

    commit 68e74568fb
    Date:   Tue Nov 25 02:35:13 2008 +1030

        sched: convert struct cpupri_vec cpumask_var_t.

The original design of cpupri allowed lockless readers to
quickly determine a best-estimate target.  Races between the
pri_active bitmap and the vec->mask were handled in the
original code because we would detect and return "0" when this
occured.  The design was predicated on the *effective*
atomicity (*) of caching the result of cpus_and() between the
cpus_allowed and the vec->mask.

Commit 68e74568 changed the behavior such that vec->mask is
accessed multiple times.  This introduces a subtle race, the
result of which means we can have a result that returns "1",
but with an empty bitmap.

*) yes, we know cpus_and() is not a locked operator across the
   entire composite array, but it is implicitly atomic on a
   per-word basis which is all the design required to work.

Implementation:

Rather than forgoing the lockless design, or reverting to a
stack-based cpumask_t, we simply check for when the race has
been encountered and continue processing in the event that the
race is hit.  This renders the removal race as if the priority
bit had been atomically cleared as well, and allows the
algorithm to execute correctly.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:23:29 +02:00