In massive parallel enviroment, res_counter can be a performance
bottleneck. One strong techinque to reduce lock contention is reducing
calls by coalescing some amount of calls into one.
Considering charge/uncharge chatacteristic,
- charge is done one by one via demand-paging.
- uncharge is done by
- in chunk at munmap, truncate, exit, execve...
- one by one via vmscan/paging.
It seems we have a chance to coalesce uncharges for improving scalability
at unmap/truncation.
This patch is a for coalescing uncharge. For avoiding scattering memcg's
structure to functions under /mm, this patch adds memcg batch uncharge
information to the task. A reason for per-task batching is for making use
of caller's context information. We do batched uncharge (deleyed
uncharge) when truncation/unmap occurs but do direct uncharge when
uncharge is called by memory reclaim (vmscan.c).
The degree of coalescing depends on callers
- at invalidate/trucate... pagevec size
- at unmap ....ZAP_BLOCK_SIZE
(memory itself will be freed in this degree.)
Then, we'll not coalescing too much.
On x86-64 8cpu server, I tested overheads of memcg at page fault by
running a program which does map/fault/unmap in a loop. Running
a task per a cpu by taskset and see sum of the number of page faults
in 60secs.
[without memcg config]
40156968 page-faults # 0.085 M/sec ( +- 0.046% )
27.67 cache-miss/faults
[root cgroup]
36659599 page-faults # 0.077 M/sec ( +- 0.247% )
31.58 miss/faults
[in a child cgroup]
18444157 page-faults # 0.039 M/sec ( +- 0.133% )
69.96 miss/faults
[child with this patch]
27133719 page-faults # 0.057 M/sec ( +- 0.155% )
47.16 miss/faults
We can see some amounts of improvement.
(root cgroup doesn't affected by this patch)
Another patch for "charge" will follow this and above will be improved more.
Changelog(since 2009/10/02):
- renamed filed of memcg_batch (as pages to bytes, memsw to memsw_bytes)
- some clean up and commentary/description updates.
- added initialize code to copy_process(). (possible bug fix)
Changelog(old):
- fixed !CONFIG_MEM_CGROUP case.
- rebased onto the latest mmotm + softlimit fix patches.
- unified patch for callers
- added commetns.
- make ->do_batch as bool.
- removed css_get() at el. We don't need it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ktime will overflow from 03:14:07 UTC on Tuesday, 19 January 2038,
ktime_add() in timecompare_update() will overflow a half earlier. As a
result, wrong offset will be gotten, then cause some strange problems.
Signed-off-by: Barry Song <21cnbao@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Patrick Ohly <patrick.ohly@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
clockevents: Convert to raw_spinlock
clockevents: Make tick_device_lock static
debugobjects: Convert to raw_spinlocks
perf_event: Convert to raw_spinlock
hrtimers: Convert to raw_spinlocks
genirq: Convert irq_desc.lock to raw_spinlock
smp: Convert smplocks to raw_spinlocks
rtmutes: Convert rtmutex.lock to raw_spinlock
sched: Convert pi_lock to raw_spinlock
sched: Convert cpupri lock to raw_spinlock
sched: Convert rt_runtime_lock to raw_spinlock
sched: Convert rq->lock to raw_spinlock
plist: Make plist debugging raw_spinlock aware
bkl: Fixup core_lock fallout
locking: Cleanup the name space completely
locking: Further name space cleanups
alpha: Fix fallout from locking changes
locking: Implement new raw_spinlock
locking: Convert raw_rwlock functions to arch_rwlock
locking: Convert raw_rwlock to arch_rwlock
...
Makes use of skip_spaces() defined in lib/string.c for removing leading
spaces from strings all over the tree.
It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
text data bss dec hex filename
64688 584 592 65864 10148 (TOTALS-BEFORE)
64641 584 592 65817 10119 (TOTALS-AFTER)
Also, while at it, if we see (*str && isspace(*str)), we can be sure to
remove the first condition (*str) as the second one (isspace(*str)) also
evaluates to 0 whenever *str == 0, making it redundant. In other words,
"a char equals zero is never a space".
Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
and found occurrences of this pattern on 3 more files:
drivers/leds/led-class.c
drivers/leds/ledtrig-timer.c
drivers/video/output.c
@@
expression str;
@@
( // ignore skip_spaces cases
while (*str && isspace(*str)) { \(str++;\|++str;\) }
|
- *str &&
isspace(*str)
)
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Julia Lawall <julia@diku.dk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel offers with TIOCL_GETKMSGREDIRECT ioctl() the possibility to
redirect the kernel messages to a specific console.
However, since it's not possible to switch to the kernel message console
after a panic(), it would be nice if the kernel would print the panic
message on the current console.
This patch series adds a new interface to access the global kmsg_redirect
variable by a function to be able to use it in code where
CONFIG_VT_CONSOLE is not set (kernel/panic.c).
This patch:
Instead of using and exporting a global value kmsg_redirect, introduce a
function vt_kmsg_redirect() that both can set and return the console where
messages are printed.
Change all users of kmsg_redirect (the VT code itself and kernel/power.c)
to the new interface.
The main advantage is that vt_kmsg_redirect() can also be used when
CONFIG_VT_CONSOLE is not set.
Signed-off-by: Bernhard Walle <bernhard@bwalle.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_each_thread/while_each_thread wrap a block of code that is in this format:
for (...)
do
...
while
If curly braces do not surround the inner loop the following warning is
generated by sparse:
warning: do-while statement is not a compound statement
Fix the warning by adding the braces.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use smp_processor_id() instead of get_cpu() and put_cpu() in
generic_smp_call_function_interrupt(), It's no need to disable preempt,
because we must call generic_smp_call_function_interrupt() with interrupts
disabled.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt reported we have this problem:
setting max_map_count to a value large enough results in programs dying at
first try. This is on 2.6.31.6:
15:59 borg:/proc/sys/vm # echo $[1<<31-1] >max_map_count
15:59 borg:/proc/sys/vm # cat max_map_count
1073741824
15:59 borg:/proc/sys/vm # echo $[1<<31] >max_map_count
15:59 borg:/proc/sys/vm # cat max_map_count
Killed
This is because we have a chance to make 'max_map_count' negative. but
it's meaningless. Make it only accept non-negative values.
Reported-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: James Morris <jmorris@namei.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch derives a "nodes_allowed" node mask from the numa mempolicy of
the task modifying the number of persistent huge pages to control the
allocation, freeing and adjusting of surplus huge pages when the pool page
count is modified via the new sysctl or sysfs attribute
"nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows:
* For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
is produced. This will cause the hugetlb subsystem to use
node_online_map as the "nodes_allowed". This preserves the
behavior before this patch.
* For "preferred" mempolicy, including explicit local allocation,
a nodemask with the single preferred node will be produced.
"local" policy will NOT track any internode migrations of the
task adjusting nr_hugepages.
* For "bind" and "interleave" policy, the mempolicy's nodemask
will be used.
* Other than to inform the construction of the nodes_allowed node
mask, the actual mempolicy mode is ignored. That is, all modes
behave like interleave over the resulting nodes_allowed mask
with no "fallback".
See the updated documentation [next patch] for more information
about the implications of this patch.
Examples:
Starting with:
Node 0 HugePages_Total: 0
Node 1 HugePages_Total: 0
Node 2 HugePages_Total: 0
Node 3 HugePages_Total: 0
Default behavior [with or without this patch] balances persistent
hugepage allocation across nodes [with sufficient contiguous memory]:
sysctl vm.nr_hugepages[_mempolicy]=32
yields:
Node 0 HugePages_Total: 8
Node 1 HugePages_Total: 8
Node 2 HugePages_Total: 8
Node 3 HugePages_Total: 8
Of course, we only have nr_hugepages_mempolicy with the patch,
but with default mempolicy, nr_hugepages_mempolicy behaves the
same as nr_hugepages.
Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
'--membind' because it allows multiple nodes to be specified
and it's easy to type]--we can allocate huge pages on
individual nodes or sets of nodes. So, starting from the
condition above, with 8 huge pages per node, add 8 more to
node 2 using:
numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40
This yields:
Node 0 HugePages_Total: 8
Node 1 HugePages_Total: 8
Node 2 HugePages_Total: 16
Node 3 HugePages_Total: 8
The incremental 8 huge pages were restricted to node 2 by the
specified mempolicy.
Similarly, we can use mempolicy to free persistent huge pages
from specified nodes:
numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32
yields:
Node 0 HugePages_Total: 4
Node 1 HugePages_Total: 4
Node 2 HugePages_Total: 16
Node 3 HugePages_Total: 8
The 8 huge pages freed were balanced over nodes 0 and 1.
[rientjes@google.com: accomodate reworked NODEMASK_ALLOC]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit d8e180dcd5 "bsdacct: switch
credentials for writing to the accounting file" introduced credential
switching during final acct data collecting. However, uid/gid pair
continued to be collected from current which became credentials of who
created acct file, not who exits.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14676
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Juho K. Juopperi <jkj@kapsi.fi>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
plists are used with spinlocks and raw_spinlocks. Change the plist
debugging to handle both types.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Make the name space hierarchy of locking functions consistent:
raw_spin* -> _raw_spin* -> __raw_spin*
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
The name space hierarchy for the internal lock functions is now a bit
backwards. raw_spin* functions map to _spin* which use __spin*, while
we would like to have _raw_spin* and __raw_spin*.
_raw_spin* is already used by lock debugging, so rename those funtions
to do_raw_spin* to free up the _raw_spin* name space.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Now that the raw_spin name space is freed up, we can implement
raw_spinlock and the related functions which are used to annotate the
locks which are not converted to sleeping spinlocks in preempt-rt.
A side effect is that only such locks can be used with the low level
lock fsunctions which circumvent lockdep.
For !rt spin_* functions are mapped to the raw_spin* implementations.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Further name space cleanup. No functional change
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.
Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Separate spin_lock and rw_lock functions. Preempt-RT needs to exclude
the rw_lock functions from being compiled. The reordering allows to do
that with a single #ifdef.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
itimer: Fix the itimer trace print format
hrtimer: move timer stats helper functions to hrtimer.c
hrtimer: Tune hrtimer_interrupt hang logic
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Avoid out of bounds array reference in save_trace()
futex: Take mmap_sem for get_user_pages in fault_in_user_writeable
lockstat: Add usage info to Documentation/lockstat.txt
lockstat: Fix min, max times in /proc/lock_stats
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Remove comparing of NULL to va_list in trace_array_vprintk()
tracing: Fix function graph trace_pipe to properly display failed entries
tracing: Add full state to trace_seq
tracing: Buffer the output of seq_file in case of filled buffer
tracing: Only call pipe_close if pipe_close is defined
tracing: Add pipe_close interface
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
x86, perf events: Check if we have APIC enabled
perf_event: Fix variable initialization in other codepaths
perf kmem: Fix unused argument build warning
perf symbols: perf_header__read_build_ids() offset'n'size should be u64
perf symbols: dsos__read_build_ids() should read both user and kernel buildids
perf tools: Align long options which have no short forms
perf kmem: Show usage if no option is specified
sched: Mark sched_clock() as notrace
perf sched: Add max delay time snapshot
perf tools: Correct size given to memset
perf_event: Fix perf_swevent_hrtimer() variable initialization
perf sched: Fix for getting task's execution time
tracing/kprobes: Fix field creation's bad error handling
perf_event: Cleanup for cpu_clock_perf_event_update()
perf_event: Allocate children's perf_event_ctxp at the right time
perf_event: Clean up __perf_event_init_context()
hw-breakpoints: Modify breakpoints without unregistering them
perf probe: Update perf-probe document
perf probe: Support --del option
trace-kprobe: Support delete probe syntax
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (58 commits)
tty: split the lock up a bit further
tty: Move the leader test in disassociate
tty: Push the bkl down a bit in the hangup code
tty: Push the lock down further into the ldisc code
tty: push the BKL down into the handlers a bit
tty: moxa: split open lock
tty: moxa: Kill the use of lock_kernel
tty: moxa: Fix modem op locking
tty: moxa: Kill off the throttle method
tty: moxa: Locking clean up
tty: moxa: rework the locking a bit
tty: moxa: Use more tty_port ops
tty: isicom: fix deadlock on shutdown
tty: mxser: Use the new locking rules to fix setserial properly
tty: mxser: use the tty_port_open method
tty: isicom: sort out the board init logic
tty: isicom: switch to the new tty_port_open helper
tty: tty_port: Add a kref object to the tty port
tty: istallion: tty port open/close methods
tty: stallion: Convert to the tty_port_open/close methods
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: Always process the whole breakpoint list on activate or deactivate
kgdb: continue and warn on signal passing from gdb
kgdb,x86: do not set kgdb_single_step on x86
kgdb: allow for cpu switch when single stepping
kgdb,i386: Fix corner case access to ss with NMI watch dog exception
kgdb: Replace strstr() by strchr() for single-character needles
kgdbts: Read buffer overflow
kgdb: Read buffer overflow
kgdb,x86: remove redundant test
There are two call points, both want to check that tty->signal->leader is
set. Move the test into disassociate_ctty() as that will make locking
changes easier in a bit
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes 2 edge cases in using kgdb in conjunction with gdb.
1) kgdb_deactivate_sw_breakpoints() should process the entire array of
breakpoints. The failure to do so results in breakpoints that you
cannot remove, because a break point can only be removed if its
state flag is set to BP_SET.
The easy way to duplicate this problem is to plant a break point in
a kernel module and then unload the kernel module.
2) kgdb_activate_sw_breakpoints() should process the entire array of
breakpoints. The failure to do so results in missed breakpoints
when a breakpoint cannot be activated.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
On some architectures for the segv trap, gdb wants to pass the signal
back on continue. For kgdb this is not the default behavior, because
it can cause the kernel to crash if you arbitrarily pass back a
exception outside of kgdb.
Instead of causing instability, pass a message back to gdb about the
supported kgdb signal passing and execute a standard kgdb continue
operation.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The kgdb core should not assume that a single step operation of a
kernel thread will complete on the same CPU. The single step flag is
set at the "thread" level and it is possible in a multi cpu system
that a kernel thread can get scheduled on another cpu the next time it
is run.
As a further safety net in case a slave cpu is hung, the debug master
cpu will try 100 times before giving up and assuming control of the
slave cpus is no longer possible. It is more useful to be able to get
some information out of kgdb instead of spinning forever.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Roel Kluin reported an error found with Parfait. Where we want to
ensure that that kgdb_info[-1] never gets accessed.
Also check to ensure any negative tid does not exceed the size of the
shadow CPU array, else report critical debug context because it is an
internal kgdb failure.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
This build warning:
kernel/sched.c: In function 'set_task_cpu':
kernel/sched.c:2070: warning: unused variable 'old_rq'
Made me realize that the forced2_migrations stat looks pretty
pointless (and a misnomer) - remove it.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the second in each of these pairs of allocations fails, then the
first one will not be freed in the error route out.
Found by a static code analysis tool.
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1260448177-28448-1-git-send-email-ext-phil.2.carmody@nokia.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no reason to make timer_stats_hrtimer_set_start_info and
friends visible to the rest of the kernel. So move all of them to
hrtimer.c. Also make timer_stats_hrtimer_set_start_info a static
inline function so it gets inlined and we avoid another function call.
Based on a patch by Thomas Gleixner.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20091210095629.GC4144@osiris.boeblingen.de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
execution time of the hrtimer callbacks.
This is error-prone for virtual machines, where a guest vcpu can be
scheduled out during the execution of the callbacks (and the callbacks
themselves can do operations that translate to blocking operations in
the hypervisor), which in can lead to large min_delta_ns rendering the
system unusable.
Replace the current heuristics with something more reliable. Allow the
interrupt code to try 3 times to catch up with the lost time. If that
fails use the total time spent in the interrupt handler to defer the
next timer interrupt so the system can catch up with other things
which got delayed. Limit that deferment to 100ms.
The retry events and the maximum time spent in the interrupt handler
are recorded and exposed via /proc/timer_list
Inspired by a patch from Marcelo.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org