adulau/aha

mirror of https://github.com/adulau/aha.git synced 2024-12-28 19:56:18 +00:00

Author	SHA1	Message	Date
Nick Piggin	cf40bd16fd	lockdep: annotate reclaim context (__GFP_NOFS) Here is another version, with the incremental patch rolled up, and added reclaim context annotation to kswapd, and allocation tracing to slab allocators (which may only ever reach the page allocator in rare cases, so it is good to put annotations here too). Haven't tested this version as such, but it should be getting closer to merge worthy ;) -- After noticing some code in mm/filemap.c accidentally perform a __GFP_FS allocation when it should not have been, I thought it might be a good idea to try to catch this kind of thing with lockdep. I coded up a little idea that seems to work. Unfortunately the system has to actually be in __GFP_FS page reclaim, then take the lock, before it will mark it. But at least that might still be some orders of magnitude more common (and more debuggable) than an actual deadlock condition, so we have some improvement I hope (the concept is no less complete than discovery of a lock's interrupt contexts). I guess we could even do the same thing with __GFP_IO (normal reclaim), and even GFP_NOIO locks too... but filesystems will have the most locks and fiddly code paths, so let's start there and see how it goes. It seems to work. I did a quick test. ================================= [ INFO: inconsistent lock state ] 2.6.28-rc6-00007-ged31348-dirty #26 --------------------------------- inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage. 
modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd] {in-reclaim-W} state was registered at: [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60 [<ffffffff80268f71>] lock_acquire+0x91/0xc0 [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310 [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd] [<ffffffff8020903b>] _stext+0x3b/0x170 [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0 [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 3929 hardirqs last enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310 hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310 softirqs last enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0 softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0 other info that might help us debug this: 1 lock held by modprobe/8526: #0: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd] stack backtrace: Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26 Call Trace: [<ffffffff80265483>] print_usage_bug+0x193/0x1d0 [<ffffffff80266530>] mark_lock+0xaf0/0xca0 [<ffffffff80266735>] mark_held_locks+0x55/0xc0 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60 [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580 [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd] [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffff8020903b>] _stext+0x3b/0x170 [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10 [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180 [<ffffffff802669ec>] ? 
trace_hardirqs_on_caller+0x12c/0x190 [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0 [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:49 +01:00
David Rientjes	3718909448	slub: fix per cpu kmem_cache_cpu array memory leak The per cpu array of kmem_cache_cpu structures accommodates NR_KMEM_CACHE_CPU such structs. When this array overflows and a struct is allocated by kmalloc(), it may have an address at the upper bound of this array. If this happens, it does not get freed and the per cpu kmem_cache_cpu_free pointer will be out of bounds after kmem_cache_destroy() or cpu offlining. Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-01-28 10:43:42 +02:00
Frederik Schwarzer	0211a9c850	trivial: fix an -> a typos in documentation and comments It is always "an" if there is a vowel _spoken_ (not written). So it is: "an hour" (spoken vowel) but "a uniform" (spoken 'j') Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-01-06 11:28:07 +01:00
Rusty Russell	174596a0b9	cpumask: convert mm/ Impact: Use new API Convert kernel mm functions to use struct cpumask. We skip include/linux/percpu.h and mm/allocpercpu.c, which are in flux. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org>	2009-01-01 10:12:29 +10:30
Rusty Russell	2ca1a61583	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: arch/x86/kernel/io_apic.c	2008-12-31 23:05:57 +10:30
Pekka Enberg	3c506efd7e	Merge branch 'topic/failslab' into for-linus Conflicts: mm/slub.c Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 11:47:05 +02:00
Pekka Enberg	fd37617e69	Merge branches 'topic/fixes', 'topic/cleanups' and 'topic/documentation' into for-linus	2008-12-29 11:45:47 +02:00
David Rientjes	7b8f3b66d9	slub: avoid leaking caches or refcounts on sysfs error If a slab cache is mergeable and the sysfs alias cannot be added, the target cache shall have its refcount decremented. kmem_cache_create() will return NULL, so if kmem_cache_destroy() is ever called on the target cache, it will never be freed if the refcount has been leaked. Likewise, if a slab cache is not mergeable and the sysfs link cannot be added, the new cache shall be removed from the slab_caches list. kmem_cache_create() will return NULL, so it will be impossible to call kmem_cache_destroy() on it. Both of these operations require slub_lock since refcount of all slab caches and slab_caches are protected by the lock. In the mergeable case, it would be better to restore objsize and offset back to their original values, but this could race with another merge since slub_lock was dropped. Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 11:40:58 +02:00
OGAWA Hirofumi	89124d706d	slub: Add might_sleep_if() to slab_alloc() Currently SLUB doesn't warn about __GFP_WAIT. Add it into slab_alloc(). Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 11:40:51 +02:00
Akinobu Mita	773ff60e84	SLUB: failslab support Currently fault-injection capability for SLAB allocator is only available to SLAB. This patch makes it available to SLUB, too. [penberg@cs.helsinki.fi: unify slab and slub implementations] Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 11:27:46 +02:00
Rusty Russell	29c0177e6a	cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. Impact: change calling convention of existing cpumask APIs Most cpumask functions started with cpus_: these have been replaced by cpumask_ ones which take struct cpumask pointers as expected. These four functions don't have good replacement names; fortunately they're rarely used, so we just change them over. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: mingo@redhat.com Cc: tony.luck@intel.com Cc: ralf@linux-mips.org Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: cl@linux-foundation.org Cc: srostedt@redhat.com	2008-12-13 21:20:25 +10:30
Hugh Dickins	9c24624727	KSYM_SYMBOL_LEN fixes Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my `966c8c12dc` sprint_symbol(): use less stack exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided. The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before. Latencytop and ftrace are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them. [akpm@linux-foundation.org: ftrace.h needs module.h] Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Miles Lane <miles.lane@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:54 -08:00
Nick Andrew	9f6c708e5c	slub: Fix incorrect use of loose It should be 'lose', not 'loose'. Signed-off-by: Nick Andrew <nick@nick-andrew.net> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-08 10:41:10 +02:00
KAMEZAWA Hiroyuki	dc19f9db38	memcg: memory hotplug fix for notifier callback Fixes for memcg/memory hotplug. While memory hotplug allocate/free memmap, page_cgroup doesn't free page_cgroup at OFFLINE when page_cgroup is allocated via bootmem. (Because freeing bootmem requires special care.) Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated by memory hotplug, page_cgroup->page == page is no longer true. But current MEM_ONLINE handler doesn't check it and update page_cgroup->page if it's not necessary to allocate page_cgroup. (This was not found because memmap is not freed if SPARSEMEM_VMEMMAP is y.) And I noticed that MEM_ONLINE can be called against "part of section". So, freeing page_cgroup at CANCEL_ONLINE will cause trouble. (freeing used page_cgroup) Don't rollback at CANCEL. One more, current memory hotplug notifier is stopped by slub because it sets NOTIFY_STOP_MASK to return value. So, page_cgroup's callback never be called. (low priority than slub now.) I think this slub's behavior is not intentional(BUG). and fixes it. Another way to be considered about page_cgroup allocation: - free page_cgroup at OFFLINE even if it's from bootmem and remove special handler. But it requires more changes. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041 Signed-off-by: KAMEZAWA Hiruyoki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-01 19:55:24 -08:00
David Rientjes	0094de92a4	slub: make early_kmem_cache_node_alloc void The return value for early_kmem_cache_node_alloc() is unused, so it is better defined as void. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-11-26 16:47:26 +02:00
Cyrill Gorcunov	e9beef1815	slub - fix get_object_page comment Use 'slab page' instead of 'slab object'. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-11-26 16:47:25 +02:00
Eduard - Gabriel Munteanu	ce71e27c6f	SLUB: Replace __builtin_return_address(0) with _RET_IP_. This patch replaces __builtin_return_address(0) with _RET_IP_, since a previous patch moved _RET_IP_ and _THIS_IP_ to include/linux/kernel.h and they're widely available now. This makes for shorter and easier to read code. [penberg@cs.helsinki.fi: remove _RET_IP_ casts to void pointer] Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-11-26 16:47:25 +02:00
Cyrill Gorcunov	210b5c0613	SLUB: cleanup - define macros instead of hardcoded numbers Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-11-26 16:47:24 +02:00
Alexey Dobriyan	7b3c3a50a3	proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c Lose dummy ->write hook in case of SLUB, it's possible now. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-10-23 15:20:06 +04:00
Salman Qazi	02b71b7012	slub: fixed uninitialized counter in struct kmem_cache_node Initialized total objects atomic for the node in init_kmem_cache_node. The uninitialized value was ruining the stats in /proc/slabinfo. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-09-15 09:49:05 +03:00
Christoph Lameter	e2cb96b7ec	slub: Disable NUMA remote node defragmentation by default Switch remote node defragmentation off by default. The current settings can cause excessive node local allocations with hackbench: SLAB: % cat /proc/meminfo MemTotal: 7701760 kB MemFree: 5940096 kB Slab: 123840 kB SLUB: % cat /proc/meminfo MemTotal: 7701376 kB MemFree: 4740928 kB Slab: 1591680 kB [Note: this feature is not related to slab defragmentation.] You can find the original discussion here: http://lkml.org/lkml/2008/8/4/308 Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-08-20 21:50:21 +03:00
Pekka Enberg	5595cffc82	SLUB: dynamic per-cache MIN_PARTIAL This patch changes the static MIN_PARTIAL to a dynamic per-cache ->min_partial value that is calculated from object size. The bigger the object size, the more pages we keep on the partial list. I tested SLAB, SLUB, and SLUB with this patch on Jens Axboe's 'netio' example script of the fio benchmarking tool. The script stresses the networking subsystem which should also give a fairly good beating of kmalloc() et al. To run the test yourself, first clone the fio repository: git clone git://git.kernel.dk/fio.git and then run the following command n times on your machine: time ./fio examples/netio The results on my 2-way 64-bit x86 machine are as follows: [ the minimum, maximum, and average are captured from 50 individual runs ] real time (seconds) min max avg sd SLAB 22.76 23.38 22.98 0.17 SLUB 22.80 25.78 23.46 0.72 SLUB (dynamic) 22.74 23.54 23.00 0.20 sys time (seconds) min max avg sd SLAB 6.90 8.28 7.70 0.28 SLUB 7.42 16.95 8.89 2.28 SLUB (dynamic) 7.17 8.64 7.73 0.29 user time (seconds) min max avg sd SLAB 36.89 38.11 37.50 0.29 SLUB 30.85 37.99 37.06 1.67 SLUB (dynamic) 36.75 38.07 37.59 0.32 As you can see from the above numbers, this patch brings SLUB to the same level as SLAB for this particular workload fixing a ~2% regression. I'd expect this change to help similar workloads that allocate a lot of objects that are close to the size of a page. Cc: Matthew Wilcox <matthew@wil.cx> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-08-05 09:28:47 +03:00
Adrian Bunk	231367fd9b	mm: unexport ksize This patch removes the obsolete and no longer used exports of ksize. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-29 23:44:26 +03:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexers. Nobody uses this "feature", nor does anybody use passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Andy Whitcroft	8a38082d21	slub: record page flag overlays explicitly SLUB reuses two page bits for internal purposes, it overlays PG_active and PG_error. This is hidden away in slub.c. Document these overlays explicitly in the main page-flags enum along with all the others. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Pekka Enberg	0ebd652b35	slub: dump more data on slab corruption The limit of 128 bytes is too small when debugging slab corruption of the skb cache, for example. So increase the limit to PAGE_SIZE to make debugging corruptions easier. Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-19 14:17:22 +03:00
Alexey Dobriyan	41ab8592ca	SLUB: simplify re on_each_cpu() on_each_cpu() expands to function call on UP, too. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-16 23:55:00 +03:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Alexey Dobriyan	88e4ccf294	slub: current is always valid Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-15 20:36:01 +03:00
Christoph Lameter	0937502af7	slub: Add check for kfree() of non slab objects. We can detect kfree()s on non slab objects by checking for PageCompound(). Works in the same way as for ksize. This helped me catch an invalid kfree(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-15 20:36:01 +03:00
Linus Torvalds	7daf705f36	Start using the new '%pS' infrastructure to print symbols This simplifies the code significantly, and was the whole point of the exercise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-14 12:12:53 -07:00
Dmitry Adamushko	bdb2192851	slub: Fix use-after-preempt of per-CPU data structure Vegard Nossum reported a crash in kmem_cache_alloc(): BUG: unable to handle kernel paging request at da87d000 IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0 pde = 28180163 pte = 1a87d160 Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5) EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0 EIP is at kmem_cache_alloc+0xc7/0xe0 EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 and analyzed it: "The register %ecx looks innocent but is very important here. The disassembly: mov %edx,%ecx shr $0x2,%ecx rep stos %eax,%es:(%edi) <-- the fault So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE. (0x6b6b6b6b >> 2 == 0x1adadada.) %ecx is the counter for the memset, from here: memset(object, 0, c->objsize); i.e. %ecx was loaded from c->objsize, so "c" must have been freed. Where did "c" come from? Uh-oh... c = get_cpu_slab(s, smp_processor_id()); This looks like it has very much to do with CPU hotplug/unplug. Is there a race between SLUB/hotplug since the CPU slab is used after it has been freed?" Good analysis. Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc() can be migrated on another CPU right after local_irq_restore() and before memset(). The initial cpu can become offline in the meantime (or a migration is a consequence of the CPU going offline) so its 'kmem_cache_cpu' structure gets freed ( slab_cpuup_callback). At some point of time the caller continues on another CPU having an obsolete pointer... Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-10 15:18:50 -07:00
Christoph Lameter	cde5353599	Christoph has moved Remove all clameter@sgi.com addresses from the kernel tree since they will become invalid on June 27th. Change my maintainer email address for the slab allocators to cl@linux-foundation.org (which will be the new email address for the future). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-04 10:40:04 -07:00
Christoph Lameter	41d54d3bf8	slub: Do not use 192 byte sized cache if minimum alignment is 128 byte The 192 byte cache is not necessary if we have a basic alignment of 128 byte. If it would be used then the 192 would be aligned to the next 128 byte boundary which would result in another 256 byte cache. Two 256 kmalloc caches cause sysfs to complain about a duplicate entry. MIPS needs 128 byte aligned kmalloc caches and spits out warnings on boot without this patch. Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-03 19:01:55 +03:00
Jens Axboe	15c8b6c1aa	on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:38 +02:00
Pekka Enberg	76994412f8	slub: ksize() abuse checks Add a WARN_ON for pages that don't have PageSlab nor PageCompound set to catch the worst abusers of ksize() in the kernel. Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-05-22 19:52:18 +03:00
Benjamin Herrenschmidt	4ea33e2dc2	slub: fix atomic usage in any_slab_objects() any_slab_objects() does an atomic_read on an atomic_long_t, this fixes it to use atomic_long_read instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-08 10:46:56 -07:00
Christoph Lameter	f6acb63508	slub: #ifdef simplification If we make SLUB_DEBUG depend on SYSFS then we can simplify some #ifdefs and avoid others. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-05-02 00:27:13 +03:00
Christoph Lameter	0121c619d0	slub: Whitespace cleanup and use of strict_strtoul Fix some issues with wrapping and use strict_strtoul to make parameter passing from sysfs safer. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-05-02 00:26:31 +03:00
Roman Zippel	f8bd2258e2	remove div_long_long_rem x86 is the only arch right now, which provides an optimized version of div_long_long_rem and it has the downside that one has to be very careful that the divide doesn't overflow. The API is a little awkward, as the arguments for the unsigned divide are signed. The signed version also doesn't handle a negative divisor and produces worse code on 64bit archs. There is little incentive to keep this API alive, so this converts the few users to the new API. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
Thomas Gleixner	3ac7fe5a4a	infrastructure to debug (dynamic) objects We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. 
The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers, a warning message and a stack trace are printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has its own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:53 -07:00
Nadia Derbey	0c40ba4fd6	ipc: define the slab_memory_callback priority as a constant This is a trivial patch that defines the priority of slab_memory_callback in the callback chain as a constant. This is to prepare for next patch in the series. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Linus Torvalds	e97e386b12	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: pack objects denser slub: Calculate min_objects based on number of processors. slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS slub: Simplify any_slab_object checks slub: Make the order configurable for each slab cache slub: Drop fallback to page allocator method slub: Fallback to minimal order during slab page allocation slub: Update statistics handling for variable order slabs slub: Add kmem_cache_order_objects struct slub: for_each_object must be passed the number of objects in a slab slub: Store max number of objects in the page struct. slub: Dump list of objects not freed on kmem_cache_close() slub: free_list() cleanup slub: improve kmem_cache_destroy() error message slob: fix bug - when slob allocates "struct kmem_cache", it does not force alignment.	2008-04-28 14:08:56 -07:00
Pekka Enberg	1b27d05b6e	mm: move cache_line_size() to <linux/cache.h> Not all architectures define cache_line_size() so as suggested by Andrew move the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Mel Gorman	dd1a239f6f	mm: have zonelist contains structs with both a zone pointer and zone_idx Filtering zonelists requires very frequent use of zone_idx(). This is costly as it involves a lookup of another structure and a subtraction operation. As the zone_idx is often required, it should be quickly accessible. The node idx could also be stored here if it was found that accessing zone->node is significant which may be the case on workloads where nodemasks are heavily used. This patch introduces a struct zoneref to store a zone pointer and a zone index. The zonelist then consists of an array of these struct zonerefs which are looked up as necessary. Helpers are given for accessing the zone index as well as the node index. [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers] [hugh@veritas.com: mm-have-zonelist: fix memcg ooms] [hugh@veritas.com: just return do_try_to_free_pages] [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	54a6eb5c47	mm: use two zonelist that are filtered by GFP mask Currently a node has two sets of zonelists, one for each zone type in the system and a second set for GFP_THISNODE allocations. Based on the zones allowed by a gfp mask, one of these zonelists is selected. All of these zonelists consume memory and occupy cache lines. This patch replaces the multiple zonelists per-node with two zonelists. The first contains all populated zones in the system, ordered by distance, for fallback allocations when the target/preferred node has no free pages. The second contains all populated zones in the node suitable for GFP_THISNODE allocations. An iterator macro called for_each_zone_zonelist() is introduced that iterates through each zone allowed by the GFP flags in the selected zonelist. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	0e88460da6	mm: introduce node_zonelist() for accessing the zonelist for a GFP mask Introduce a node_zonelist() helper function. It is used to look up the appropriate zonelist given a node and a GFP mask. The patch on its own is a cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If necessary, it can be merged with the next patch in this set without problems. Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Christoph Lameter	c124f5b54f	slub: pack objects denser Since we now have more orders available use a denser packing. Increase slab order if more than 1/16th of a slab would be wasted. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:40 +03:00
Christoph Lameter	9b2cd506e5	slub: Calculate min_objects based on number of processors. The minimum number of objects per slab is calculated based on the number of processors that may come online. Processors -> min_objects: 1 -> 8, 2 -> 12, 4 -> 16, 8 -> 20, 16 -> 24, 32 -> 28, 64 -> 32, 1024 -> 48, 4096 -> 56. The higher the number of processors, the larger the order sizes used for various slab caches will become. This has been shown to address the performance issues in hackbench on 16p etc. The calculation is only performed if slub_min_objects is zero (default). If one specifies a slub_min_objects on boot then that setting is taken. As suggested by Zhang Yanmin's performance tests on 16-core Tigerton, use the formula '4 * (fls(nr_cpu_ids) + 1)': ./hackbench 100 process 2000: 1) 2.6.25-rc6slab: 23.5 seconds 2) 2.6.25-rc7SLUB+slub_min_objects=20: 31 seconds 3) 2.6.25-rc7SLUB+slub_min_objects=24: 23.5 seconds Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:40 +03:00
Christoph Lameter	114e9e89e6	slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS We can now fall back to order 0 slabs. So set the slub_max_order to PAGE_CACHE_ORDER_COSTLY but keep the slub_min_objects at 4. This will mostly preserve the orders used in 2.6.25. For example, the 2k kmalloc slab will use order 1 allocs and the 4k kmalloc slab order 2. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:39 +03:00
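
The formula quoted in commit 9b2cd506e5, '4 * (fls(nr_cpu_ids) + 1)', can be checked against the processor table in that message with a small userspace sketch. This is not the kernel code: fls_sketch() is a hypothetical portable stand-in for the kernel's fls() (1-based index of the highest set bit, 0 for an input of 0), and min_objects_for() is a name invented here for illustration.

```c
/*
 * Userspace sketch of the min_objects formula from commit 9b2cd506e5.
 * Both function names below are illustrative assumptions, not kernel APIs.
 */

/* Stand-in for the kernel's fls(): position of the highest set bit,
 * counting from 1; returns 0 when x is 0. */
static int fls_sketch(unsigned int x)
{
    int bit = 0;

    while (x != 0) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Formula quoted in the commit message: 4 * (fls(nr_cpu_ids) + 1). */
static int min_objects_for(unsigned int nr_cpu_ids)
{
    return 4 * (fls_sketch(nr_cpu_ids) + 1);
}
```

Calling min_objects_for() with the processor counts listed in the commit message reproduces the min_objects column of its table, for example 8 for one processor and 24 for 16 processors. According to the message, the kernel only applies this calculation when slub_min_objects is left at zero.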


219 commits