Commit graph

1107 commits

Author SHA1 Message Date
Peter Zijlstra
18a3885fc1 sched: Remove reciprocal for cpu_power
Its a source of fail, also, now that cpu_power is dynamical,
its a waste of time.

before:
<idle>-0   [000]   132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1

after:
bash-1689  [001]   137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1

[ v2: build fix from From: Andreas Herrmann ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.425896304@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:56 +02:00
Gautham R Shenoy
d899a789c2 sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
sgs.group_capacity can now be 0, if for some reason
group->__cpu_power happens to be less than SCHED_LOAD_SCALE/2.

In that case, we need the following fix to make it work for
update_sd_power_savings_stats(). That's because both
sum_nr_running and group_capacity are unsigned longs.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:56 +02:00
Peter Zijlstra
bdb94aa5db sched: Try to deal with low capacity
When the capacity drops low, we want to migrate load away.
Allow the load-balancer to remove all tasks when we hit rock
bottom.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.342231003@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:55 +02:00
Peter Zijlstra
e9e9250bc7 sched: Scale down cpu_power due to RT tasks
Keep an average on the amount of time spend on RT tasks and use
that fraction to scale down the cpu_power for regular tasks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.287778431@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:55 +02:00
Peter Zijlstra
ab29230e67 sched: Implement dynamic cpu_power
Recompute the cpu_power for each cpu during load-balance.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.162033479@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:54 +02:00
Peter Zijlstra
a52bfd7358 sched: Add smt_gain
The idea is that multi-threading a core yields more work
capacity than a single thread, provide a way to express a
static gain for threads.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.073345955@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:54 +02:00
Peter Zijlstra
cc9fba7d76 sched: Update the cpu_power sum during load-balance
In order to prepare for a more dynamic cpu_power, update the
group sum while walking the sched domains during load-balance.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083825.985050292@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:53 +02:00
Peter Zijlstra
b5d978e0c7 sched: Add SD_PREFER_SIBLING
Do the placement thing using SD flags.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083825.897028974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:53 +02:00
Peter Zijlstra
f93e65c186 sched: Restore __cpu_power to a straight sum of power
cpu_power is supposed to be a representation of the process
capacity of the cpu, not a value to randomly tweak in order to
affect placement.

Remove the placement hacks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083825.810860576@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:09:53 +02:00
Ingo Molnar
9aa55fbd01 Merge branches 'sched/domains' and 'sched/clock' into sched/core
Merge reason: both topics are ready now, and we want to merge dependent
              changes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 10:08:47 +02:00
Arjan van de Ven
8f0dfc34e9 sched: Provide iowait counters
For counting how long an application has been waiting for
(disk) IO, there currently is only the HZ sample driven
information available, while for all other counters in this
class, a high resolution version is available via
CONFIG_SCHEDSTATS.

In order to make an improved bootchart tool possible, we also
need a higher resolution version of the iowait time.

This patch below adds this scheduler statistic to the kernel.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A64B813.1080506@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 08:44:08 +02:00
Anirban Sinha
84e9dabf6e sched: Rename init_cfs_rq => init_tg_cfs_rq
... so that it does not share a common name with a function
within the same scope.

Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
LKML-Reference: <DDFD17CC94A9BD49A82147DDF7D545C501EA98A6@exchange.ZeugmaSystems.local>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29 15:51:16 +02:00
Peter Zijlstra
34d76c4155 sched: Fix division by zero - really
When re-computing the shares for each task group's cpu
representation we need the ratio of weight on each cpu vs the
total weight of the sched domain.

Since load-balancing is loosely (read not) synchronized, the
weight of individual cpus can change between doing the sum and
calculating the ratio.

The previous patch dealt with only one of the race scenarios,
this patch side steps them all by saving a snapshot of all the
individual cpu weights, thereby always working on a consistent
set.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: torvalds@linux-foundation.org
Cc: jes@sgi.com
Cc: jens.axboe@oracle.com
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1251371336.18584.77.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-28 08:26:49 +02:00
Peter Zijlstra
a8af7246c1 sched: Avoid division by zero
Patch a5004278f0 (sched: Fix
cgroup smp fairness) introduced the possibility of a
divide-by-zero because load-balancing is not synchronized
between sched_domains.

This can cause the state of cpus to change between the first
and second loop over the sched domain in tg_shares_up().

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1250855934.7538.30.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21 14:15:10 +02:00
Hiroshi Shimamoto
cde7e5ca4e sched: Use for_each_class macro in move_one_task()
Replace for loop with the macro for_each_class to cleanup.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
LKML-Reference: <4A8A277D.4090304@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-20 13:00:30 +02:00
Andreas Herrmann
294b0c9619 sched: Consolidate definition of variable sd in __build_sched_domains
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818110229.GM29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:45 +02:00
Andreas Herrmann
0601a88d8f sched: Separate out build of NUMA sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818110111.GL29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:44 +02:00
Andreas Herrmann
de616e36c7 sched: Separate out build of ALLNODES sched groups from __build_sched_domains
For the sake of completeness.
Now all calls to init_sched_build_groups() are contained in
build_sched_groups().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818110013.GK29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:44 +02:00
Andreas Herrmann
86548096f2 sched: Separate out build of CPU sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105928.GJ29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:43 +02:00
Andreas Herrmann
a2af04cdbb sched: Separate out build of MC sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105838.GI29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:43 +02:00
Andreas Herrmann
0e8e85c941 sched: Separate out build of SMT sched groups from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105751.GH29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:42 +02:00
Andreas Herrmann
d817353555 sched: Separate out build of SMT sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105703.GG29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:42 +02:00
Andreas Herrmann
410c408108 sched: Separate out build of MC sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105614.GF29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:41 +02:00
Andreas Herrmann
87cce6622c sched: Separate out build of CPU sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105455.GE29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:40 +02:00
Andreas Herrmann
7f4588f3aa sched: Separate out build of NUMA sched domain from __build_sched_domains
... to further strip down __build_sched_domains().

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105406.GD29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:40 +02:00
Andreas Herrmann
2109b99ee1 sched: Separate out allocation/free/goto-hell from __build_sched_domains
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105300.GC29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:39 +02:00
Andreas Herrmann
49a02c514d sched: Use structure to store local data in __build_sched_domains
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090818105152.GB29515@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 18:35:39 +02:00
Peter Zijlstra
693525e3be sched: Ensure the migration task doesn't go away during use
Like sched_migrate_task(), set_cpus_allowed_ptr() should hold
onto the migration thread too.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:15 +02:00
Gregory Haskins
00aec93d10 sched: Fully integrate cpus_active_map and root-domain code
Reflect "active" cpus in the rq->rd->online field, instead of
the online_map.

The motivation is that things that use the root-domain code
(such as cpupri) only care about cpus classified as "active"
anyway. By synchronizing the root-domain state with the active
map, we allow several optimizations.

For instance, we can remove an extra cpumask_and from the
scheduler hotpath by utilizing rq->rd->online (since it is now
a cached version of cpu_active_map & rq->rd->span).

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:12 +02:00
Gregory Haskins
3f029d3c6d sched: Enhance the pre/post scheduling logic
We currently have an explicit "needs_post" vtable method which
returns a stack variable for whether we should later run
post-schedule.  This leads to an awkward exchange of the
variable as it bubbles back up out of the context switch. Peter
Zijlstra observed that this information could be stored in the
run-queue itself instead of handled on the stack.

Therefore, we revert to the method of having context_switch
return void, and update an internal rq->post_schedule variable
when we require further processing.

In addition, we fix a race condition where we try to access
current->sched_class without holding the rq->lock.  This is
technically racy, as the sched-class could change out from
under us.  Instead, we reference the per-rq post_schedule
variable with the runqueue unlocked, but with preemption
disabled to see if we need to reacquire the rq->lock.

Finally, we clean the code up slightly by removing the #ifdef
CONFIG_SMP conditionals from the schedule() call, and implement
some inline helper functions instead.

This patch passes checkpatch, and rt-migrate.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:10 +02:00
Steven Rostedt
da19ab5103 sched: Check for pushing rt tasks after all scheduling
The current method for pushing RT tasks after scheduling only
happens after a context switch. But we found cases where a task
is set up on a run queue to be pushed but the push never
happens because the schedule chooses the same task.

This bug was found with the help of Gregory Haskins and the use
of ftrace (trace_printk). It tooks several days for both of us
analyzing the code and the trace output to find this.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729042526.205923666@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:08 +02:00
Peter Zijlstra
e709715915 sched: Optimize unused cgroup configuration
When cgroup group scheduling is built in, skip some code paths
if we don't have any (but the root) cgroups configured.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:07 +02:00
Peter Zijlstra
a5004278f0 sched: Fix cgroup smp fairness
Commit ec4e0e2fe0 ("fix
inconsistency when redistribute per-cpu tg->cfs_rq shares")
broke cgroup smp fairness.

In order to avoid starvation of newly placed tasks, we never
quite set the share of an empty cpu group-task to 0, but
instead we set it as if there's a single NICE-0 task present.

If however we actually set this in cfs_rq[cpu]->shares, that
means the total shares for that group will be slightly inflated
every time we balance, causing the observed unfairness.

Fix this by setting cfs_rq[cpu]->shares to 0 but actually
setting the effective weight of the related se to the inflated
number.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248696557.6987.1615.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:26:06 +02:00
Ingo Molnar
8e9ed8b024 Merge branch 'sched/urgent' into sched/core
Merge reason: avoid upcoming patch conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:23:57 +02:00
Thomas Gleixner
a004cd4218 sched: Fix return value of migration_init()
migration_init() returns the return value of the hotplug notifier. In
the success case this is NOTIFY_OK which is 1. initcall_debug
evaluates that as an error code because init calls are expected to
return 0 on success.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-24 09:41:35 +02:00
Frederic Weisbecker
613afbf832 sched: Pull up the might_sleep() check into cond_resched()
might_sleep() is called late-ish in cond_resched(), after the
need_resched()/preempt enabled/system running tests are
checked.

It's better to check the sleeps while atomic earlier and not
depend on some environment datas that reduce the chances to
detect a problem.

Also define cond_resched_*() helpers as macros, so that the
FILE/LINE reported in the sleeping while atomic warning
displays the real origin and not sched.h

Changes in v2:

 - Call __might_sleep() directly instead of might_sleep() which
   may call cond_resched()

 - Turn cond_resched() into a macro so that the file:line
   couple reported refers to the caller of cond_resched() and
   not __cond_resched() itself.

Changes in v3:

 - Also propagate this __might_sleep() pull up to
   cond_resched_lock() and cond_resched_softirq()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-6-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:44 +02:00
Frederic Weisbecker
e4aafea2d4 sched: Add a preempt count base offset to __might_sleep()
Add a preempt count base offset to compare against the current
preempt level count. It prepares to pull up the might_sleep
check from cond_resched() to cond_resched_lock() and
cond_resched_bh().

For these two helpers, we need to respectively ensure that once
we'll unlock the given spinlock / reenable local softirqs, we
will reach a sleepable state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Move and rename preempt_count_equals() ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:42 +02:00
Frederic Weisbecker
e09758fae8 sched: Cover the CONFIG_DEBUG_SPINLOCK_SLEEP off-case for __might_sleep()
Cover the off case for __might_sleep(), so that we avoid
#ifdefs in files that make use of it. Especially, this prepares
for the __might_sleep() pull up on cond_resched().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:41 +02:00
Frederic Weisbecker
4b2155678d sched: Remove obsolete comment in __cond_resched()
Remove the outdated comment from __cond_resched() related to
the now removed Big Kernel Semaphore.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:39 +02:00
Frederic Weisbecker
e7aaaa6934 sched: Drop the need_resched() loop from cond_resched()
The schedule() function is a loop that reschedules the current
task while the TIF_NEED_RESCHED flag is set:

void schedule(void)
{
need_resched:
	/* schedule code */
	if (need_resched())
		goto need_resched;
}

And cond_resched() repeat this loop:

do {
	add_preempt_count(PREEMPT_ACTIVE);
	schedule();
	sub_preempt_count(PREEMPT_ACTIVE);
} while(need_resched());

This loop is needless because schedule() already did the check
and nothing can set TIF_NEED_RESCHED between schedule() exit
and the loop check in need_resched().

Then remove this needless loop.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:51:38 +02:00
Ingo Molnar
5304d5fc74 Merge branch 'linus' into sched/core
Merge reason: branch had an old upstream base (-rc1-ish), but also
              merge to avoid a conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 15:50:40 +02:00
Thomas Gleixner
a468d38934 sched: fix load average accounting vs. cpu hotplug
The new load average code clears rq->calc_load_active on
CPU_ONLINE. That's wrong as the new onlined CPU might have got a
scheduler tick already and accounted the delta to the stale value of
the time we offlined the CPU.

Clear the value when we cleanup the dead CPU instead. 

Also move the update of the calc_load_update time for the newly online
CPU to CPU_UP_PREPARE to avoid that the CPU plays catch up with the
stale update time value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-18 14:19:52 +02:00
Linus Torvalds
4b0a84043e Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-sched
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-sched:
  sched: Fix bug in SCHED_IDLE interaction with group scheduling
  sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq()
  sched: Reset sched stats on fork()
  sched_rt: Fix overload bug on rt group scheduling
  sched: Documentation/sched-rt-group: Fix style issues & bump version
2009-07-16 10:18:29 -07:00
Peter Zijlstra
d86ee4809d sched: optimize cond_resched()
Optimize cond_resched() by removing one conditional.

Currently cond_resched() checks system_state ==
SYSTEM_RUNNING in order to avoid scheduling before the
scheduler is running.

We can however, as per suggestion of Matt, use
PREEMPT_ACTIVE to accomplish that very same.

Suggested-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10 14:24:05 -07:00
Fabio Checconi
c20b08e398 sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq()
init_rt_rq() initializes only rq->rt.pushable_tasks, and not the
pushable_tasks field of the passed rt_rq.  The plist is not used
uninitialized since the only pushable_tasks plists used are the
ones of root rt_rqs; anyway reinitializing the list on every group
creation corrupts the root plist, losing its previous contents.

Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090615185638.GK21741@gandalf.sssup.it>
CC: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:43:30 +02:00
Lucas De Marchi
7793527b90 sched: Reset sched stats on fork()
The sched_stat fields are currently not reset upon fork.
Ingo's recent commit 6c594c21fc
did reset nr_migrations, but it didn't reset any of the
others.

This patch resets all sched_stat fields on fork.

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <193b0f820907090457s7a3662f4gcdecdc22fcae857b@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:43:29 +02:00
Peter Zijlstra
a1ba4d8ba9 sched_rt: Fix overload bug on rt group scheduling
Fixes an easily triggerable BUG() when setting process affinities.

Make sure to count the number of migratable tasks in the same place:
the root rt_rq. Otherwise the number doesn't make sense and we'll hit
the BUG in set_cpus_allowed_rt().

Also, make sure we only count tasks, not groups (this is probably
already taken care of by the fact that rt_se->nr_cpus_allowed will be 0
for groups, but be more explicit)

Tested-by: Thomas Gleixner <tglx@linutronix.de>
CC: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Gregory Haskins <ghaskins@novell.com>
LKML-Reference: <1247067476.9777.57.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:43:29 +02:00
Hitoshi Mitake
54d35f29f4 sched: Hide runqueues from direct reference at source code level for __raw_get_cpu_var()
Hide __raw_get_cpu_var() as well - thus all the direct
references to runqueues will abstracted out.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
LKML-Reference: <20090629.144457.886429910353660979.mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 09:19:27 +02:00
Ingo Molnar
348b346b23 Merge branch 'linus' into sched/core
Merge reason: we will merge a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 09:16:25 +02:00
Linus Torvalds
12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00