aha/kernel
Gregory Haskins 917b627d4d sched: create "pushable_tasks" list to limit pushing to one attempt
The RT scheduler employs a "push/pull" design to actively balance tasks
within the system (on a per disjoint cpuset basis).  When a task is
awoken, it is immediately determined if there are any lower priority
cpus which should be preempted.  This is opposed to the way normal
SCHED_OTHER tasks behave, which will wait for a periodic rebalancing
operation to occur before spreading out load.

When a particular RQ has more than 1 active RT task, it is said to
be in an "overloaded" state.  Once this occurs, the system enters
the active balancing mode, where it will try to push the task away,
or persuade a different cpu to pull it over.  The system will stay
in this state until the system falls back below the <= 1 queued RT
task per RQ.

However, the current implementation suffers from a limitation in the
push logic.  Once overloaded, all tasks (other than current) on the
RQ are analyzed on every push operation, even if it was previously
unpushable (due to affinity, etc).  Whats more, the operation stops
at the first task that is unpushable and will not look at items
lower in the queue.  This causes two problems:

1) We can have the same tasks analyzed over and over again during each
   push, which extends out the fast path in the scheduler for no
   gain.  Consider a RQ that has dozens of tasks that are bound to a
   core.  Each one of those tasks will be encountered and skipped
   for each push operation while they are queued.

2) There may be lower-priority tasks under the unpushable task that
   could have been successfully pushed, but will never be considered
   until either the unpushable task is cleared, or a pull operation
   succeeds.  The net result is a potential latency source for mid
   priority tasks.

This patch aims to rectify these two conditions by introducing a new
priority sorted list: "pushable_tasks".  A task is added to the list
each time a task is activated or preempted.  It is removed from the
list any time it is deactivated, made current, or fails to push.

This works because a task only needs to be attempted to push once.
After an initial failure to push, the other cpus will eventually try to
pull the task when the conditions are proper.  This also solves the
problem that we don't completely analyze all tasks due to encountering
an unpushable tasks.  Now every task will have a push attempted (when
appropriate).

This reduces latency both by shorting the critical section of the
rq->lock for certain workloads, and by making sure the algorithm
considers all eligible tasks in the system.

[ rostedt: added a couple more BUG_ONs ]

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
2008-12-29 09:39:53 -05:00
..
irq Merge branch 'irq/sparseirq' into cpus4096 2008-12-17 13:16:08 +01:00
power Merge branch 'sched/core' into cpus4096 2008-12-12 13:48:57 +01:00
time Merge ../linux-2.6-x86 2008-12-13 21:55:51 +10:30
trace Merge ../linux-2.6-x86 2008-12-13 21:55:51 +10:30
.gitignore
acct.c tty: Fix abusers of current->sighand->tty 2008-10-13 09:51:42 -07:00
audit.c [PATCH] fix broken timestamps in AVC generated by kernel threads 2008-12-09 02:27:41 -05:00
audit.h
audit_tree.c Fix inotify watch removal/umount races 2008-11-15 12:26:44 -08:00
auditfilter.c Fix inotify watch removal/umount races 2008-11-15 12:26:44 -08:00
auditsc.c [PATCH] fix broken timestamps in AVC generated by kernel threads 2008-12-09 02:27:41 -05:00
backtracetest.c
bounds.c
capability.c security: Fix setting of PF_SUPERPRIV by __capable() 2008-08-14 22:59:43 +10:00
cgroup.c cgroups: fix a race between rmdir and remount 2008-12-15 16:27:07 -08:00
cgroup_debug.c cgroups: fix probable race with put_css_set[_taskexit] and find_css_set 2008-10-20 08:52:38 -07:00
cgroup_freezer.c freezer_cg: disable writing freezer.state of root cgroup 2008-11-12 17:17:16 -08:00
compat.c Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus 2008-10-20 13:14:06 +02:00
configs.c kernel/configs.c: remove useless comments 2008-10-20 08:52:34 -07:00
cpu.c cpumask: centralize cpu_online_map and cpu_possible_map 2008-12-13 21:19:41 +10:30
cpuset.c cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. 2008-12-13 21:20:25 +10:30
delayacct.c per-task-delay-accounting: update taskstats for memory reclaim delay 2008-07-25 10:53:47 -07:00
dma-coherent.c dma-coherent: export dma_[alloc|release]_from_coherent methods 2008-08-22 08:34:53 +02:00
dma.c kernel/dma.c: remove a CVS keyword 2008-10-16 11:21:30 -07:00
exec_domain.c proc: move /proc/execdomains to kernel/exec_domain.c 2008-10-23 14:30:41 +04:00
exit.c Merge branches 'sched/core', 'core/core' and 'tracing/core' into cpus4096 2008-11-24 17:46:57 +01:00
extable.c Merge branch 'tracing/fastboot' into cpus4096 2008-12-12 12:43:05 +01:00
fork.c Merge branch 'sched/core' into cpus4096 2008-12-12 13:48:57 +01:00
freezer.c freezer_cg: use thaw_process() in unfreeze_cgroup() 2008-10-30 11:38:45 -07:00
futex.c Merge branches 'core/debug', 'core/futexes', 'core/locking', 'core/rcu', 'core/signal', 'core/urgent' and 'core/xen' into core/core 2008-11-24 17:44:55 +01:00
futex_compat.c
hrtimer.c hrtimer: clean up unused callback modes 2008-11-12 09:54:40 +01:00
itimer.c timers: fix itimer/many thread hang 2008-09-14 16:25:35 +02:00
kallsyms.c sprint_symbol(): use less stack 2008-11-19 18:49:58 -08:00
Kconfig.freezer container freezer: implement freezer cgroup subsystem 2008-10-20 08:52:34 -07:00
Kconfig.hz sched: fix SCHED_HRTICK dependency 2008-07-28 14:37:38 +02:00
Kconfig.preempt
kexec.c kexec: fix crash_save_vmcoreinfo_init build problem 2008-10-20 15:28:50 -07:00
kfifo.c
kgdb.c kgdb: call touch_softlockup_watchdog on resume 2008-10-06 13:50:59 -05:00
kmod.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus 2008-10-16 12:38:34 -07:00
kprobes.c kernel/kprobes.c: don't pad kretprobe_table_locks[] on uniprocessor builds 2008-11-12 17:17:17 -08:00
ksysfs.c profiling: dynamically enable readprofile at runtime 2008-10-16 11:21:31 -07:00
kthread.c tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() 2008-11-16 09:01:36 +01:00
latencytop.c KSYM_SYMBOL_LEN fixes 2008-12-10 08:01:54 -08:00
lockdep.c Merge branch 'tracing/fastboot' into cpus4096 2008-12-12 12:43:05 +01:00
lockdep_internals.h lockdep: build fix 2008-08-13 12:55:10 +02:00
lockdep_proc.c lockstat: contend with points 2008-10-20 15:43:10 +02:00
Makefile Merge branch 'tracing/fastboot' into cpus4096 2008-12-12 12:43:05 +01:00
marker.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
module.c tracing/function-graph-tracer: introduce __notrace_funcgraph to filter special functions 2008-12-08 15:11:44 +01:00
mutex-debug.c
mutex-debug.h
mutex.c mutex: __used is needed for function referenced only from inline asm 2008-11-24 10:00:28 +01:00
mutex.h
notifier.c Merge branches 'core/debug', 'core/futexes', 'core/locking', 'core/rcu', 'core/signal', 'core/urgent' and 'core/xen' into core/core 2008-11-24 17:44:55 +01:00
ns_cgroup.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
nsproxy.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
panic.c taint: add missing comment 2008-12-01 19:55:24 -08:00
params.c Fix compile warning in kernel/params.c 2008-10-23 12:09:00 -07:00
pid.c pidns: remove now unused find_pid function. 2008-07-25 10:53:45 -07:00
pid_namespace.c pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits 2008-09-02 19:21:38 -07:00
pm_qos_params.c pm_qos_requirement might sleep 2008-09-02 19:21:40 -07:00
posix-cpu-timers.c Merge branch 'sched/core' into cpus4096 2008-12-12 13:48:57 +01:00
posix-timers.c Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 2008-10-22 09:48:06 +02:00
printk.c printk: remove unused code from kernel/printk.c 2008-10-23 21:54:29 +02:00
profile.c Merge ../linux-2.6-x86 2008-12-13 21:55:51 +10:30
ptrace.c remove __ARCH_WANT_COMPAT_SYS_PTRACE 2008-11-30 11:00:15 -08:00
rcuclassic.c sched: convert nohz_cpu_mask to cpumask_var_t. 2008-11-24 17:51:10 +01:00
rcupdate.c rcupdate: fix bug of rcu_barrier*() 2008-10-21 15:59:53 +02:00
rcupreempt.c byteorder: remove direct includes of linux/byteorder/swab[b].h 2008-10-20 08:52:40 -07:00
rcupreempt_trace.c rcu: trace fix possible mem-leak 2008-08-15 17:54:40 +02:00
rcutorture.c byteorder: remove direct includes of linux/byteorder/swab[b].h 2008-10-20 12:51:53 -07:00
relay.c relayfs: fix infinite loop with splice() 2008-12-10 08:01:52 -08:00
res_counter.c cgroup files: convert res_counter_write() to be a cgroups write_string() handler 2008-07-25 10:53:36 -07:00
resource.c reserve_region_with_split: Fix GFP_KERNEL usage under spinlock 2008-11-01 09:53:58 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c hrtimer: convert kernel/* to the new hrtimer apis 2008-09-05 21:35:13 -07:00
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c sched: create "pushable_tasks" list to limit pushing to one attempt 2008-12-29 09:39:53 -05:00
sched_clock.c Revert "sched_clock: prevent scd->clock from moving backwards" 2008-12-14 16:23:17 -08:00
sched_cpupri.c sched: convert struct cpupri_vec cpumask_var_t. 2008-11-24 17:52:22 +01:00
sched_cpupri.h sched: convert struct cpupri_vec cpumask_var_t. 2008-11-24 17:52:22 +01:00
sched_debug.c sched: add uid information to sched_debug for CONFIG_USER_SCHED 2008-12-01 20:39:50 +01:00
sched_fair.c sched: bias task wakeups to preferred semi-idle packages 2008-12-19 09:21:52 +01:00
sched_features.h sched: backward looking buddy 2008-11-05 10:30:14 +01:00
sched_idletask.c sched: add CONFIG_SMP consistency 2008-10-22 10:01:52 +02:00
sched_rt.c sched: create "pushable_tasks" list to limit pushing to one attempt 2008-12-29 09:39:53 -05:00
sched_stats.h Merge ../linux-2.6-x86 2008-12-13 21:55:51 +10:30
seccomp.c
semaphore.c semaphore: __down_common: use signal_pending_state() 2008-08-05 14:33:47 -07:00
signal.c tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() 2008-11-16 09:01:36 +01:00
smp.c generic-ipi: fix the smp_mb() placement 2008-11-06 08:41:56 +01:00
softirq.c irq: call __irq_enter() before calling the tick_idle_check 2008-11-10 22:36:39 +01:00
softlockup.c Merge branch 'sched/core' into cpus4096 2008-12-12 13:48:57 +01:00
spinlock.c lockdep: spin_lock_nest_lock(), checkpatch fixes 2008-08-13 13:56:51 +02:00
srcu.c
stacktrace.c
stop_machine.c stop_machine: fix race with return value (fixes Bug #11989) 2008-11-16 15:09:52 -08:00
sys.c thread_group_cputime: move a couple of callsites outside of ->siglock 2008-11-17 16:55:55 +01:00
sys_ni.c reintroduce accept4 2008-11-19 18:49:57 -08:00
sysctl.c Merge commit 'v2.6.28-rc7' into tracing/core 2008-12-04 09:07:19 +01:00
sysctl_check.c sysctl: check for bogus modes 2008-07-25 10:53:45 -07:00
taskstats.c cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. 2008-12-13 21:20:25 +10:30
test_kprobes.c
time.c select: add a timespec_add_safe() function 2008-09-05 21:34:57 -07:00
timeconst.pl
timer.c Add round_jiffies_up and related routines 2008-11-06 08:42:48 +01:00
tracepoint.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
tsacct.c task IO accounting: move all IO statistics in struct task_io_accounting 2008-07-27 16:12:28 -07:00
uid16.c
user.c sched: add uid information to sched_debug for CONFIG_USER_SCHED 2008-12-01 20:39:50 +01:00
user_namespace.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
utsname.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
utsname_sysctl.c sysctl: simplify ->strategy 2008-10-16 11:21:47 -07:00
wait.c wait: kill is_sync_wait() 2008-10-16 11:21:31 -07:00
workqueue.c cpumask: introduce new API, without changing anything 2008-11-06 09:05:33 +01:00