aha/kernel
Linus Torvalds 04e2f1741d Add memory barrier semantics to wake_up() & co
Oleg Nesterov and others have pointed out that on some architectures,
the traditional sequence of

	set_current_state(TASK_INTERRUPTIBLE);
	if (CONDITION)
		return;
	schedule();

is racy wrt another CPU doing

	CONDITION = 1;
	wake_up_process(p);

because while set_current_state() has a memory barrier separating
setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION
variable, there is no such memory barrier on the wakeup side.

Now, wake_up_process() does actually take a spinlock before it reads and
sets the task state on the waking side, and on x86 (and many other
architectures) that spinlock is in fact equivalent to a memory barrier,
but that is not generally guaranteed.  The write that sets CONDITION
could move into the critical region protected by the runqueue spinlock.

However, adding a smp_wmb() to before the spinlock should now order the
writing of CONDITION wrt the lock itself, which in turn is ordered wrt
the accesses within the spinlock (which includes the reading of the old
state).

This should thus close the race (which probably has never been seen in
practice, but since smp_wmb() is a no-op on x86, it's not like this will
make anything worse either on the most common architecture where the
spinlock already gave the required protection).

Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 18:05:03 -08:00
..
irq genirq: do not leave interupts enabled on free_irq 2008-02-19 10:43:58 +01:00
power PM: Introduce PM_EVENT_HIBERNATE callback state 2008-02-23 10:40:04 -08:00
time timer_list: print relative expiry time signed 2008-02-17 17:29:38 +01:00
.gitignore Update kernel/.gitignore with new auto-generated files 2008-02-09 23:27:01 -08:00
acct.c acct: real_parent ppid 2008-01-07 14:55:37 -08:00
audit.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
audit.h [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit_tree.c Introduce path_put() 2008-02-14 21:13:33 -08:00
auditfilter.c Introduce path_put() 2008-02-14 21:13:33 -08:00
auditsc.c Audit: use == not = in if statements 2008-02-18 18:46:28 -08:00
backtracetest.c x86: add a simple backtrace test module 2008-01-30 13:33:08 +01:00
capability.c Add 64-bit capability support to the kernel 2008-02-05 09:44:20 -08:00
cgroup.c cgroup: remove dead code in cgroup_get_rootdir() 2008-02-23 17:13:25 -08:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
compat.c hrtimer: don't modify restart_block->fn in restart functions 2008-02-10 10:48:03 +01:00
configs.c
cpu.c cpu: fix section mismatch warnings for enable_nonboot_cpus 2008-02-08 09:22:41 -08:00
cpuset.c proc: seqfile convert proc_pid_status to properly handle pid namespaces 2008-02-08 09:22:24 -08:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c Use struct path in fs_struct 2008-02-14 21:13:33 -08:00
extable.c module: Don't report discarded init pages as kernel text. 2008-01-29 17:13:18 +11:00
fork.c Use struct path in fs_struct 2008-02-14 21:13:33 -08:00
futex.c futex: runtime enable pi and robust functionality 2008-02-23 17:12:15 -08:00
futex_compat.c futex: runtime enable pi and robust functionality 2008-02-23 17:12:15 -08:00
hrtimer.c hrtimer: catch expired CLOCK_REALTIME timers early 2008-02-14 22:08:30 +01:00
itimer.c ITIMER_REAL: convert to use struct pid 2008-02-08 09:22:29 -08:00
kallsyms.c remove support for un-needed _extratext section 2008-02-06 10:41:01 -08:00
Kconfig.hz sched: high-res preemption tick 2008-01-25 21:08:29 +01:00
Kconfig.preempt sched: remove the !PREEMPT_BKL code 2008-01-25 21:08:33 +01:00
kexec.c vmcoreinfo: add "VMCOREINFO_" to all the call for vmcoreinfo_append_str() 2008-02-07 08:42:25 -08:00
kfifo.c
kmod.c Dont touch fs_struct in usermodehelper 2008-02-14 21:13:32 -08:00
kprobes.c kprobes: kretprobe user entry-handler 2008-02-06 10:41:11 -08:00
ksysfs.c Kobject: convert remaining kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
kthread.c sched: fix, always create kernel threads with normal priority 2008-01-25 21:08:33 +01:00
latencytop.c sched: latencytop support 2008-01-25 21:08:34 +01:00
lockdep.c softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks 2008-01-25 21:08:02 +01:00
lockdep_internals.h
lockdep_proc.c
Makefile avoid overflows in kernel/time.c 2008-02-08 09:22:39 -08:00
marker.c markers: fix sparse warnings in markers.c 2008-02-23 17:12:14 -08:00
module.c modules: do not try to add sysfs attributes if !CONFIG_SYSFS 2008-02-21 15:27:08 -08:00
mutex-debug.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
mutex-debug.h
mutex.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
mutex.h
notifier.c kernel/notifier.c should #include <linux/reboot.h> 2008-02-06 10:41:02 -08:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c namespaces: move the IPC namespace under IPC_NS option 2008-02-08 09:22:23 -08:00
panic.c ACPI: Taint kernel on ACPI table override (format corrected) 2008-02-06 22:07:51 -05:00
params.c Add new string functions strict_strto* and convert kernel params to use them 2008-02-08 09:22:41 -08:00
pid.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
pid_namespace.c namespaces: cleanup the code managed with PID_NS option 2008-02-08 09:22:23 -08:00
pm_qos_params.c pm qos infrastructure and interface 2008-02-05 09:44:22 -08:00
posix-cpu-timers.c Use find_task_by_vpid in posix timers 2008-02-08 09:22:41 -08:00
posix-timers.c hrtimer: check relative timeouts for overflow 2008-02-14 22:08:30 +01:00
printk.c printk_ratelimit() functions should use CONFIG_PRINTK 2008-02-08 09:22:39 -08:00
profile.c Nuke a duplicate include from profile.c 2008-02-08 09:22:34 -08:00
ptrace.c ptrace_check_attach: remove unneeded ->signal != NULL check 2008-02-08 09:22:26 -08:00
rcuclassic.c Preempt-RCU: implementation 2008-01-25 21:08:24 +01:00
rcupdate.c rcupdate: fix comment 2008-02-13 16:21:18 -08:00
rcupreempt.c Preempt-RCU: CPU Hotplug handling 2008-01-25 21:08:25 +01:00
rcupreempt_trace.c Preempt-RCU: implementation 2008-01-25 21:08:24 +01:00
rcutorture.c cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() 2008-01-25 21:08:02 +01:00
relay.c relay: nopage 2008-02-06 10:41:07 -08:00
res_counter.c Memory controller improve user interface 2008-02-07 08:42:18 -08:00
resource.c [POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc 2008-02-08 19:52:48 +11:00
rtmutex-debug.c Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.h
rtmutex-tester.c Driver core: change sysdev classes to use dynamic kobject names 2008-01-24 20:40:40 -08:00
rtmutex.c hrtimer: more hrtimer_init_sleeper() fallout. 2008-02-13 15:45:36 +01:00
rtmutex.h
rtmutex_common.h Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rwsem.c sched: mark rwsem functions as __sched for wchan/profiling 2007-12-18 15:21:13 +01:00
sched.c Add memory barrier semantics to wake_up() & co 2008-02-23 18:05:03 -08:00
sched_debug.c sched: keep total / count stats in addition to the max for 2008-01-25 21:08:35 +01:00
sched_fair.c sched: let +nice tasks have smaller impact 2008-01-31 22:45:22 +01:00
sched_idletask.c sched: high-res preemption tick 2008-01-25 21:08:29 +01:00
sched_rt.c sched: rt-group: make rt groups scheduling configurable 2008-02-13 15:45:40 +01:00
sched_stats.h sched: clean up kernel/sched_stat.h 2007-11-28 15:52:56 +01:00
seccomp.c
signal.c remove final fastcall users 2008-02-13 16:21:18 -08:00
softirq.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
softlockup.c debug: softlockup looping fix 2008-02-02 14:27:45 +11:00
spinlock.c spinlock: lockbreak cleanup 2008-01-30 13:31:20 +01:00
srcu.c make srcu_readers_active() static 2008-02-06 10:41:02 -08:00
stacktrace.c
stop_machine.c stopmachine: semaphore to mutex 2008-02-06 10:41:08 -08:00
sys.c Pidns: make full use of xxx_vnr() calls 2008-02-08 09:22:29 -08:00
sys_ni.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
sysctl.c hugetlb: fix overcommit locking 2008-02-13 16:21:18 -08:00
sysctl_check.c constify tables in kernel/sysctl_check.c 2008-02-08 09:22:31 -08:00
taskstats.c kernel/taskstats.c: fix bogus nlmsg_free() 2007-11-14 18:45:44 -08:00
test_kprobes.c kprobes: kretprobe user entry-handler 2008-02-06 10:41:11 -08:00
time.c avoid overflows in kernel/time.c 2008-02-08 09:22:39 -08:00
timeconst.pl timeconst.pl: correct reversal of USEC_TO_HZ and HZ_TO_USEC 2008-02-12 14:29:26 -08:00
timer.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c
user.c sched: rt-group: make rt groups scheduling configurable 2008-02-13 15:45:40 +01:00
user_namespace.c namespaces: cleanup the code managed with the USER_NS option 2008-02-08 09:22:23 -08:00
utsname.c
utsname_sysctl.c Isolate the UTS namespace's domainname and hostname back 2007-11-29 09:24:53 -08:00
wait.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
workqueue.c workqueue: make delayed_work_timer_fn() static 2008-02-08 09:22:37 -08:00