mirror of
https://github.com/adulau/aha.git
synced 2024-12-29 12:16:20 +00:00
8558f8f816
Dhaval Giani reported this warning during cpu hotplug stress-tests: | On running kernel compiles in parallel with cpu hotplug: | | WARNING: at arch/x86/kernel/smp.c:118 | native_smp_send_reschedule+0x21/0x36() | Modules linked in: | Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1 | [...] | [<c0110355>] native_smp_send_reschedule+0x21/0x36 | [<c014fe8f>] force_quiescent_state+0x47/0x57 | [<c014fef0>] call_rcu+0x51/0x6d | [<c01713b3>] __fput+0x130/0x158 | [<c0171231>] fput+0x17/0x19 | [<c016fd99>] filp_close+0x4d/0x57 | [<c016fdff>] sys_close+0x5c/0x97 IMHO the warning is a spurious one. cpu_online_map is updated by the _cpu_down() using stop_machine_run(). Since force_quiescent_state is invoked from irqs disabled section, stop_machine_run() won't be executing while a cpu is executing force_quiescent_state(). Hence the cpu_online_map is stable while we're in the irq disabled section. However, a cpu might have been offlined _just_ before we disabled irqs while entering force_quiescent_state(). And rcu subsystem might not yet have handled the CPU_DEAD notification, leading to the offlined cpu's bit being set in the rcp->cpumask. Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent sending smp_reschedule() to an offlined CPU. Here's the timeline: CPU_A CPU_B -------------------------------------------------------------- cpu_down(): . . . . . stop_machine(): /* disables preemption, . * and irqs */ . . . . . take_cpu_down(); . . . . . . . cpu_disable(); /*this removes cpu . *from cpu_online_map . */ . . . . . restart_machine(); /* enables irqs */ . ------WINDOW DURING WHICH rcp->cpumask is stale --------------- . call_rcu(); . /* disables irqs here */ . .force_quiescent_state(); .CPU_DEAD: .for_each_cpu(rcp->cpumask) . . smp_send_reschedule(); . . . . WARN_ON() for offlined CPU! . . . rcu_cpu_notify: . -------- WINDOW ENDS ------------------------------------------ rcu_offline_cpu() /* Which calls cpu_quiet() * which removes * cpu from rcp->cpumask. */ If a new batch was started just before calling stop_machine_run(), the "tobe-offlined" cpu is still present in rcp-cpumask. During a cpu-offline, from take_cpu_down(), we queue an rt-prio idle task as the next task to be picked by the scheduler. We also call cpu_disable() which will disable any further interrupts and remove the cpu's bit from the cpu_online_map. Once the stop_machine_run() successfully calls take_cpu_down(), it calls schedule(). That's the last time a schedule is called on the offlined cpu, and hence the last time when rdp->passed_quiesc will be set to 1 through rcu_qsctr_inc(). But the cpu_quiet() will be on this cpu will be called only when the next RCU_SOFTIRQ occurs on this CPU. So at this time, the offlined CPU is still set in rcp->cpumask. Now coming back to the idle_task which truely offlines the CPU, it does check for a pending RCU and raises the softirq, since it will find rdp->passed_quiesc to be 0 in this case. However, since the cpu is offline I am not sure if the softirq will trigger on the CPU. Even if it doesn't the rcu_offline_cpu() will find that rcp->completed is not the same as rcp->cur, which means that our cpu could be holding up the grace period progression. Hence we call cpu_quiet() and move ahead. But because of the window explained in the timeline, we could still have a call_rcu() before the RCU subsystem executes it's CPU_DEAD notification, and we send smp_send_reschedule() to offlined cpu while trying to force the quiescent states. The appended patch adds comments and prevents checking for offlined cpu everytime. cpu_online_map is updated by the _cpu_down() using stop_machine_run(). Since force_quiescent_state is invoked from irqs disabled section, stop_machine_run() won't be executing while a cpu is executing force_quiescent_state(). Hence the cpu_online_map is stable while we're in the irq disabled section. Reported-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: laijs@cn.fujitsu.com Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russel <rusty@rustcorp.com.au> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> |
||
---|---|---|
.. | ||
irq | ||
power | ||
time | ||
.gitignore | ||
acct.c | ||
audit.c | ||
audit.h | ||
audit_tree.c | ||
auditfilter.c | ||
auditsc.c | ||
backtracetest.c | ||
bounds.c | ||
capability.c | ||
cgroup.c | ||
cgroup_debug.c | ||
compat.c | ||
configs.c | ||
cpu.c | ||
cpuset.c | ||
delayacct.c | ||
dma.c | ||
exec_domain.c | ||
exit.c | ||
extable.c | ||
fork.c | ||
futex.c | ||
futex_compat.c | ||
hrtimer.c | ||
itimer.c | ||
kallsyms.c | ||
Kconfig.hz | ||
Kconfig.preempt | ||
kexec.c | ||
kfifo.c | ||
kgdb.c | ||
kmod.c | ||
kprobes.c | ||
ksysfs.c | ||
kthread.c | ||
latencytop.c | ||
lockdep.c | ||
lockdep_internals.h | ||
lockdep_proc.c | ||
Makefile | ||
marker.c | ||
module.c | ||
mutex-debug.c | ||
mutex-debug.h | ||
mutex.c | ||
mutex.h | ||
notifier.c | ||
ns_cgroup.c | ||
nsproxy.c | ||
panic.c | ||
params.c | ||
pid.c | ||
pid_namespace.c | ||
pm_qos_params.c | ||
posix-cpu-timers.c | ||
posix-timers.c | ||
printk.c | ||
profile.c | ||
ptrace.c | ||
rcuclassic.c | ||
rcupdate.c | ||
rcupreempt.c | ||
rcupreempt_trace.c | ||
rcutorture.c | ||
relay.c | ||
res_counter.c | ||
resource.c | ||
rtmutex-debug.c | ||
rtmutex-debug.h | ||
rtmutex-tester.c | ||
rtmutex.c | ||
rtmutex.h | ||
rtmutex_common.h | ||
rwsem.c | ||
sched.c | ||
sched_clock.c | ||
sched_debug.c | ||
sched_fair.c | ||
sched_features.h | ||
sched_idletask.c | ||
sched_rt.c | ||
sched_stats.h | ||
seccomp.c | ||
semaphore.c | ||
signal.c | ||
softirq.c | ||
softlockup.c | ||
spinlock.c | ||
srcu.c | ||
stacktrace.c | ||
stop_machine.c | ||
sys.c | ||
sys_ni.c | ||
sysctl.c | ||
sysctl_check.c | ||
taskstats.c | ||
test_kprobes.c | ||
time.c | ||
timeconst.pl | ||
timer.c | ||
tsacct.c | ||
uid16.c | ||
user.c | ||
user_namespace.c | ||
utsname.c | ||
utsname_sysctl.c | ||
wait.c | ||
workqueue.c |