aha/kernel
Frederic Weisbecker 8a0be9ef82 sched: don't rebalance if attached on NULL domain
Impact: fix function graph trace hang / drop pointless softirq on UP

While debugging a function graph trace hang on an old PII, I saw
that it consumed most of its time on the timer interrupt. And
the domain rebalancing softirq was the most concerned.

The timer interrupt calls trigger_load_balance() which will
decide if it is worth to schedule a rebalancing softirq.

In case of builtin UP kernel, no problem arises because there is
no domain question.

In case of builtin SMP kernel running on an SMP box, still no
problem, the softirq will be raised each time we reach the
next_balance time.

In case of builtin SMP kernel running on a UP box (most distros
provide default SMP kernels, whatever the box you have), then
the CPU is attached to the NULL sched domain. So a kind of
unexpected behaviour happen:

trigger_load_balance() -> raises the rebalancing softirq later
on softirq: run_rebalance_domains() -> rebalance_domains() where
the for_each_domain(cpu, sd) is not taken because of the NULL
domain we are attached at. Which means rq->next_balance is never
updated. So on the next timer tick, we will enter
trigger_load_balance() which will always reschedule() the
rebalacing softirq:

if (time_after_eq(jiffies, rq->next_balance))
	raise_softirq(SCHED_SOFTIRQ);

So for each tick, we process this pointless softirq.

This patch fixes it by checking if we are attached to the null
domain before raising the softirq, another possible fix would be
to set the maximal possible JIFFIES value to rq->next_balance if
we are attached to the NULL domain.

v2: build fix on UP

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:04:44 +01:00
..
irq Merge branch 'core/xen' into x86/urgent 2009-02-04 14:54:56 +01:00
power PM: Split up sysdev_[suspend|resume] from device_power_[down|up] 2009-02-22 10:33:44 -08:00
time hrtimers: allow the hot-unplugging of all cpus 2009-01-30 22:35:29 +01:00
trace tracing: limit the number of loops the ring buffer self test can make 2009-02-18 22:50:01 -05:00
.gitignore
acct.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
async.c async: use list_move_tail 2009-02-08 10:00:26 -08:00
audit.c [PATCH] fix broken timestamps in AVC generated by kernel threads 2008-12-09 02:27:41 -05:00
audit.h fixing audit rule ordering mess, part 1 2009-01-04 15:14:41 -05:00
audit_tree.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
auditfilter.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
auditsc.c make sure that filterkey of task,always rules is reported 2009-01-04 15:14:42 -05:00
backtracetest.c
bounds.c
capability.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
cgroup.c cgroups: fix possible use after free 2009-02-18 15:37:54 -08:00
cgroup_debug.c
cgroup_freezer.c freezer_cg: disable writing freezer.state of root cgroup 2008-11-12 17:17:16 -08:00
compat.c Allow times and time system calls to return small negative values 2009-01-06 15:59:13 -08:00
configs.c
cpu.c stop_machine/cpu hotplug: fix disable_nonboot_cpus 2009-01-07 11:36:14 -08:00
cpuset.c cpuset: fix possible deadlock in async_rebuild_sched_domains 2009-01-19 02:44:00 +01:00
cred-internals.h CRED: Inaugurate COW credentials 2008-11-14 10:39:23 +11:00
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2009-01-09 13:59:25 -08:00
delayacct.c schedstat: consolidate per-task cpu runtime stats 2008-12-18 13:54:01 +01:00
dma-coherent.c dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. 2009-01-21 18:51:53 +09:00
dma.c
exec_domain.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
exit.c signal: re-add dead task accumulation stats. 2009-02-05 13:04:33 +01:00
extable.c Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-30 16:10:19 -08:00
fork.c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-02-11 08:24:32 -08:00
freezer.c
futex.c futex: fix reference leak 2009-02-11 18:24:08 +01:00
futex_compat.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
hrtimer.c hrtimer: prevent negative expiry value after clock_was_set() 2009-01-30 22:35:34 +01:00
itimer.c timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
kallsyms.c Revert "kbuild: strip generated symbols from *.ko" 2009-01-14 21:38:20 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.preempt rcu: provide RCU options on non-preempt architectures too 2008-12-25 09:31:28 +01:00
kexec.c PM: Split up sysdev_[suspend|resume] from device_power_[down|up] 2009-02-22 10:33:44 -08:00
kfifo.c
kgdb.c
kmod.c kmod: fix varargs kernel-doc 2009-01-06 15:59:27 -08:00
kprobes.c kprobes: check CONFIG_FREEZER instead of CONFIG_PM 2009-01-16 14:32:17 -05:00
ksysfs.c kernel/ksysfs.c:fix dependence on CONFIG_NET 2009-01-06 10:44:31 -08:00
kthread.c tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() 2008-11-16 09:01:36 +01:00
latencytop.c sched, latencytop: incorporate review feedback from Andrew Morton 2009-02-11 10:18:04 +01:00
lockdep.c Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-30 16:10:19 -08:00
lockdep_internals.h
lockdep_proc.c
Makefile PM: fix build for CONFIG_PM unset 2009-02-21 14:17:17 -08:00
marker.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
module.c modules: Use a better scheme for refcounting 2009-02-02 19:17:55 -08:00
mutex-debug.c
mutex-debug.h
mutex.c mutex: __used is needed for function referenced only from inline asm 2008-11-24 10:00:28 +01:00
mutex.h
notifier.c Merge commit 'v2.6.28-rc6' into core/debug 2008-11-26 08:22:50 +01:00
ns_cgroup.c ns_cgroup: remove unused spinlock 2009-01-08 08:31:02 -08:00
nsproxy.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
panic.c oops: increment the oops UUID every time we oops 2009-01-06 15:59:12 -08:00
params.c
pid.c pid: generalize task_active_pid_ns 2009-01-08 08:31:12 -08:00
pid_namespace.c
pm_qos_params.c
posix-cpu-timers.c timers: more consistently use clock vs timer 2009-02-13 13:04:05 +01:00
posix-timers.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
printk.c PM: Fix suspend_console and resume_console to use only one semaphore 2009-02-21 14:17:18 -08:00
profile.c profiling: fix broken profiling regression 2009-02-10 00:50:37 +01:00
ptrace.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
rcuclassic.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcupdate.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcupreempt.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcupreempt_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
rcutorture.c rcu: fix bug in rcutorture system-shutdown code 2009-01-07 23:36:25 +01:00
rcutree.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcutree_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
relay.c relay: fix lock imbalance in relay_late_setup_files 2009-01-18 20:29:35 +01:00
res_counter.c memcg: memory cgroup resource counters for hierarchy 2009-01-08 08:31:05 -08:00
resource.c resources: fix parameter name and kernel-doc 2009-01-15 16:39:38 -08:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c sched: don't rebalance if attached on NULL domain 2009-03-05 14:04:44 +01:00
sched_clock.c sched_clock: cleanups 2009-02-26 21:56:07 +01:00
sched_cpupri.c sched: fix section mismatch 2009-01-06 11:07:15 +01:00
sched_cpupri.h sched: convert struct cpupri_vec cpumask_var_t. 2008-11-24 17:52:22 +01:00
sched_debug.c sched: introduce avg_wakeup 2009-01-15 12:00:08 +01:00
sched_fair.c Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/core 2009-02-15 21:15:16 +01:00
sched_features.h sched: prefer wakers 2009-01-15 12:00:09 +01:00
sched_idletask.c
sched_rt.c Merge branches 'sched/rt' and 'sched/urgent' into sched/core 2009-02-08 20:12:46 +01:00
sched_stats.h timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
seccomp.c x86-64: seccomp: fix 32/64 syscall hole 2009-03-02 15:41:30 -08:00
semaphore.c
signal.c signal: re-add dead task accumulation stats. 2009-02-05 13:04:33 +01:00
smp.c generic-ipi: use per cpu data for single cpu ipi calls 2009-01-30 18:31:08 +01:00
softirq.c Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-03 12:04:39 -08:00
softlockup.c softlock: fix false panic which can occur if softlockup_thresh is reduced 2009-01-14 11:48:07 +01:00
spinlock.c
srcu.c
stacktrace.c stacktrace: provide save_stack_trace_tsk() weak alias 2008-12-25 11:44:43 +01:00
stop_machine.c stop_machine: introduce stop_machine_create/destroy. 2009-01-05 08:40:14 +10:30
sys.c sched: don't allow setuid to succeed if the user does not have rt bandwidth 2009-02-27 11:11:53 +01:00
sys_ni.c [CVE-2009-0029] Make sys_syslog a conditional system call 2009-01-14 14:15:16 +01:00
sysctl.c mm: fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches 2009-02-11 14:25:35 -08:00
sysctl_check.c
taskstats.c cpumask: convert rest of files in kernel/ 2009-01-01 10:12:28 +10:30
test_kprobes.c kprobes: add tests for register_kprobes 2009-01-06 15:59:20 -08:00
time.c [CVE-2009-0029] System call wrappers part 01 2009-01-14 14:15:18 +01:00
timeconst.pl
timer.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
tracepoint.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
tsacct.c mm: introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting 2009-01-06 15:59:09 -08:00
uid16.c [CVE-2009-0029] System call wrappers part 19 2009-01-14 14:15:26 +01:00
up.c smp_call_function_single(): be slightly less stupid, fix #2 2009-01-12 16:04:37 +01:00
user.c sched: don't allow setuid to succeed if the user does not have rt bandwidth 2009-02-27 11:11:53 +01:00
user_namespace.c Fix recursive lock in free_uid()/free_user_ns() 2009-02-27 16:26:21 -08:00
utsname.c
utsname_sysctl.c
wait.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
workqueue.c work_on_cpu: Use our own workqueue. 2009-01-19 22:36:07 +01:00