Commit graph

777 commits

Author SHA1 Message Date
Nick Piggin
643b52b9c0 radix-tree: fix small lockless radix-tree bug
We shrink a radix tree when its root node has only one child, in the left
most slot.  The child becomes the new root node.  To perform this
operation in a manner compatible with concurrent lockless lookups, we
atomically switch the root pointer from the parent to its child.

However a concurrent lockless lookup may now have loaded a pointer to the
parent (and is presently deciding what to do next).  For this reason, we
also have to keep the parent node in a valid state after shrinking the
tree, until the next RCU grace period -- otherwise this lookup with the
parent pointer may not do the right thing.  Notably, we need to keep the
child in the left most slot there in case that is requested by the lookup.

This is all pretty standard RCU stuff.  It is worth repeating because in
my eagerness to obey the radix tree node constructor scheme, I had broken
it by zeroing the radix tree node before the grace period.

What could happen is that a lookup can load the parent pointer, then
decide it wants to follow the left most child slot, only to find the slot
contained NULL due to the concurrent shrinker having zeroed the parent
node before waiting for a grace period.  The lookup would return a false
negative as a result.

Fix it by doing that clearing in the RCU callback.  I would normally want
to rip out the constructor entirely, but radix tree nodes are one of those
places where they make sense (only few cachelines will be touched soon
after allocation).

This was never actually found in any lockless pagecache testing or by the
test harness, but by seeing the odd problem with my scalable vmap rewrite.
 I have not tickled the test harness into reproducing it yet, but I'll
keep working at it.

Fortunately, it is not a problem anywhere lockless pagecache is used in
mainline kernels (pagecache probe is not a guarantee, and brd does not
have concurrent lookups and deletes).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12 18:05:41 -07:00
Jeremy Fitzhardinge
d5e181f78a add an inlined version of iter_div_u64_rem
iter_div_u64_rem is used in the x86-64 vdso, which cannot call other
kernel code.  For this case, provide the always_inlined version,
__iter_div_u64_rem.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:47:58 +02:00
Jeremy Fitzhardinge
f595ec964d common implementation of iterative div/mod
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.

The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.

This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:47:56 +02:00
Alex Chiang
8344b568f5 PCI: ACPI PCI slot detection driver
Detect all physical PCI slots as described by ACPI, and create entries in
/sys/bus/pci/slots/.

Not all physical slots are hotpluggable, and the acpiphp module does not
detect them.  Now we know the physical PCI geography of our system, without
caring about hotplug.

[kaneshige.kenji@jp.fujitsu.com: export-kobject_rename-for-pci_hotplug_core]
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: Greg KH <greg@kroah.com>
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix build with CONFIG_DMI=n]
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 14:37:14 -07:00
Harvey Harrison
3527fb326f lib: export bitrev16
Bluetooth will be able to use this.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-06 11:29:10 -07:00
Ingo Molnar
886dd58258 debugging: make stacktrace independent from DEBUG_KERNEL
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 15:55:20 +02:00
Ingo Molnar
9c44bc03ff softlockup: allow panic on lockup
allow users to configure the softlockup detector to generate a panic
instead of a warning message.

high-availability systems might opt for this strict method (combined
with panic_timeout= boot option/sysctl), instead of generating
softlockup warnings ad infinitum.

also, automated tests work better if the system reboots reliably (into
a safe kernel) in case of a lockup.

The full spectrum of configurability is supported: boot option, sysctl
option and Kconfig option.

it's default-disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 06:34:44 +02:00
Steven Rostedt
654e478768 ftrace: use the new kbuild CFLAGS_REMOVE for lib directory
This patch removes the Makefile turd and uses the nice CFLAGS_REMOVE macro
in the lib directory.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 22:46:23 +02:00
Steven Rostedt
9d0a420b73 ftrace: remove function tracing from spinlock debug
The debug functions in spin_lock debugging pollute the output of the
function tracer. This patch adds the debug files in the lib director
to those that should not be compiled with mcount tracing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:14:28 +02:00
Steven Rostedt
3594136ad6 ftrace: do not profile lib/string.o
Most archs define the string and memory compare functions in assembly.
Some do not. But these functions may be used in some archs at early
boot up.

Since most archs define this code in assembly and they are not usually
traced, there's no need to trace them when they are not defined in
assembly.

This patch removes the -pg from the CFLAGS for lib/string.o.
This prevents the string functions use in either vdso or early bootup
from crashing the system.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 20:56:43 +02:00
Steven Rostedt
5568b139f4 ftrace: debug smp_processor_id, use notrace preempt disable
The debug smp_processor_id caused a recursive fault in debugging
the irqsoff tracer. The tracer used a smp_processor_id in the
ftrace callback, and this function called preempt_disable which
also is traced. This caused a recursive fault (stack overload).

Since using smp_processor_id without debugging on does not cause
faults with the tracer (even when the tracer is wrong), the
debug version should not cause a system reboot.

This changes the debug_smp_processor_id to use the notrace versions
of preempt_disable and enable.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 20:39:17 +02:00
Arnaldo Carvalho de Melo
16444a8a40 ftrace: add basic support for gcc profiler instrumentation
If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value the ftrace routine will be called everytime
we enter a kernel function that is not marked with the "notrace"
attribute.

The ftrace routine will then call a registered function if a function
happens to be registered.

[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
  so don't blame Arnaldo for all of this ;-) ]

Update:
  It is now possible to register more than one ftrace function.
  If only one ftrace function is registered, that will be the
  function that ftrace calls directly. If more than one function
  is registered, then ftrace will call a function that will loop
  through the functions to call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 20:31:58 +02:00
Arnaldo Carvalho de Melo
6e766410c4 ftrace: annotate core code that should not be traced
Mark with "notrace" functions in core code that should not be
traced.  The "notrace" attribute will prevent gcc from adding
a call to ftrace on the annotated funtions.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 20:31:48 +02:00
Mike Travis
41df0d61c2 x86: Add performance variants of cpumask operators
* Increase performance for systems with large count NR_CPUS by limiting
    the range of the cpumask operators that loop over the bits in a cpumask_t
    variable.  This removes a large amount of wasted cpu cycles.

  * Add performance variants of the cpumask operators:

    int cpus_weight_nr(mask)	     Same using nr_cpu_ids instead of NR_CPUS
    int first_cpu_nr(mask)	     Number lowest set bit, or nr_cpu_ids
    int next_cpu_nr(cpu, mask)	     Next cpu past 'cpu', or nr_cpu_ids
    for_each_cpu_mask_nr(cpu, mask)  for-loop cpu over mask using nr_cpu_ids

  * Modify following to use performance variants:

    #define num_online_cpus()	cpus_weight_nr(cpu_online_map)
    #define num_possible_cpus()	cpus_weight_nr(cpu_possible_map)
    #define num_present_cpus()	cpus_weight_nr(cpu_present_map)

    #define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
    #define for_each_online_cpu(cpu)   for_each_cpu_mask_nr((cpu), ...)
    #define for_each_present_cpu(cpu)  for_each_cpu_mask_nr((cpu), ...)

  * Comment added to include/linux/cpumask.h:

    Note: The alternate operations with the suffix "_nr" are used
	  to limit the range of the loop to nr_cpu_ids instead of
	  NR_CPUS when NR_CPUS > 64 for performance reasons.
	  If NR_CPUS is <= 64 then most assembler bitmask
	  operators execute faster with a constant range, so
	  the operator will continue to use NR_CPUS.

	  Another consideration is that nr_cpu_ids is initialized
	  to NR_CPUS and isn't lowered until the possible cpus are
	  discovered (including any disabled cpus).  So early uses
	  will span the entire range of NR_CPUS.

    (The net effect is that for systems with 64 or less CPU's there are no
     functional changes.)

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-23 18:23:38 +02:00
Franck Bui-Huu
82524746c2 rcu: split list.h and move rcu-protected lists into rculist.h
Move rcu-protected lists from list.h into a new header file rculist.h.

This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.

For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h.  It actually compiles because users of
rcu_dereference() are macros.  Others RCU functions could be used too but
aren't probably because of this.

Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-19 10:01:37 +02:00
Kumar Gala
f9ebcd9d41 lmb: Fix compile warning
lib/lmb.c: In function 'lmb_dump_all':
lib/lmb.c:51: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'u64'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-05-18 23:35:43 -05:00
Linus Torvalds
8f40f672e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix error path during early mount
  9p: make cryptic unknown error from server less scary
  9p: fix flags length in net
  9p: Correct fidpool creation failure in p9_client_create
  9p: use struct mutex instead of struct semaphore
  9p: propagate parse_option changes to client and transports
  fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure.
  9p: Documentation updates
  add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust
2008-05-14 19:30:51 -07:00
Linus Torvalds
8978a31883 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Use a TS_RESTORE_SIGMASK
  lmb: Make lmb debugging more useful.
  lmb: Fix inconsistent alignment of size argument.
  sparc: Fix mremap address range validation.
2008-05-14 19:11:36 -07:00
Harvey Harrison
3fc957721d lib: create common ascii hex array
Add a common hex array in hexdump.c so everyone can use it.

Add a common hi/lo helper to avoid the shifting masking that is
done to get the upper and lower nibbles of a byte value.

Pull the pack_hex_byte helper from kgdb as it is opencoded many
places in the tree that will be consolidated.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-14 19:11:14 -07:00
Markus Armbruster
b32a09db4f add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust
match_strcpy() is a somewhat creepy function: the caller needs to make sure
that the destination buffer is big enough, and when he screws up or
forgets, match_strcpy() happily overruns the buffer.

There's exactly one customer: v9fs_parse_options().  I believe it currently
can't overflow its buffer, but that's not exactly obvious.

The source string is a substing of the mount options.  The kernel silently
truncates those to PAGE_SIZE bytes, including the terminating zero.  See
compat_sys_mount() and do_mount().

The destination buffer is obtained from __getname(), which allocates from
name_cachep, which is initialized by vfs_caches_init() for size PATH_MAX.

We're safe as long as PATH_MAX <= PAGE_SIZE.  PATH_MAX is 4096.  As far as
I know, the smallest PAGE_SIZE is also 4096.

Here's a patch that makes the code a bit more obviously correct.  It
doesn't depend on PATH_MAX <= PAGE_SIZE.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Jim Meyering <meyering@redhat.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-14 19:23:25 -05:00
Paul Jackson
f4ed0deae8 cpumask: remove bitmap_scnprintf_len and cpumask_scnprintf_len
They aren't used.  They were briefly used as part of some other patches to
provide an alternative format for displaying some /proc and /sys cpumasks.
They probably should have been removed when those other patches were dropped,
in favor of a different solution.

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: "Mike Travis" <travis@sgi.com>
Cc: "Bert Wesarg" <bert.wesarg@googlemail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-13 08:02:25 -07:00
David S. Miller
faa6cfde74 lmb: Make lmb debugging more useful.
Having to muck with the build and set DEBUG just to
get lmb_dump_all() to print things isn't very useful.

So use pr_info() and use an early boot param
"lmb=debug" so we can simply ask users to reboot
with this option when we need some debugging from
them.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12 17:21:55 -07:00
David S. Miller
4978db5bd9 lmb: Fix inconsistent alignment of size argument.
When allocating, if we will align up the size when making
the reservation, we should also align the size for the
check that the space is actually available.

The simplest thing is to just aling the size up from
the beginning, then we can use plain 'size' throughout.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12 16:51:15 -07:00
Linus Torvalds
8e3e076c5a BKL: revert back to the old spinlock implementation
The generic semaphore rewrite had a huge performance regression on AIM7
(and potentially other BKL-heavy benchmarks) because the generic
semaphores had been rewritten to be simple to understand and fair.  The
latter, in particular, turns a semaphore-based BKL implementation into a
mess of scheduling.

The attempt to fix the performance regression failed miserably (see the
previous commit 00b41ec261 'Revert
"semaphore: fix"'), and so for now the simple and sane approach is to
instead just go back to the old spinlock-based BKL implementation that
never had any issues like this.

This patch also has the advantage of being reported to fix the
regression completely according to Yanmin Zhang, unlike the semaphore
hack which still left a couple percentage point regression.

As a spinlock, the BKL obviously has the potential to be a latency
issue, but it's not really any different from any other spinlock in that
respect.  We do want to get rid of the BKL asap, but that has been the
plan for several years.

These days, the biggest users are in the tty layer (open/release in
particular) and Alan holds out some hope:

  "tty release is probably a few months away from getting cured - I'm
   afraid it will almost certainly be the very last user of the BKL in
   tty to get fixed as it depends on everything else being sanely locked."

so while we're not there yet, we do have a plan of action.

Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alexander Viro <viro@ftp.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-10 20:58:02 -07:00
Linus Torvalds
2e83fc4df5 Merge branch 'powerpc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'powerpc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Assign PDE->data before gluing PDE into /proc tree
  [POWERPC] devres: Add devm_ioremap_prot()
  [POWERPC] macintosh: ADB driver: adb_handler_sem semaphore to mutex
  [POWERPC] macintosh: windfarm_smu_sat: semaphore to mutex
  [POWERPC] macintosh: therm_pm72: driver_lock semaphore to mutex
2008-05-05 15:48:53 -07:00
Jan Engelhardt
e024cbd257 kgdb: kconfig fix xconfig/menuconfig element
Kconfig.kgdb: fix menuconfig element

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-05-05 07:13:21 -05:00
Emil Medve
b41e5fffe8 [POWERPC] devres: Add devm_ioremap_prot()
We provide an ioremap_flags, so this provides a corresponding
devm_ioremap_prot.  The slight name difference is at Ben
Herrenschmidt's request as he plans on changing ioremap_flags to
ioremap_prot in the future.

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Tejun Heo <htejun@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-05 16:47:14 +10:00
Nadia Derbey
af8e2a4cb9 idr: fix idr_remove()
The return inside the loop makes us free only a single layer.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:00 -07:00
David Brownell
34990cf702 Add a new sysfs_streq() string comparison function
Add a new sysfs_streq() string comparison function, which ignores
the trailing newlines found in sysfs inputs.  By example:

	sysfs_streq("a", "b")	==> false
	sysfs_streq("a", "a")	==> true
	sysfs_streq("a", "a\n")	==> true
	sysfs_streq("a\n", "a")	==> true

This is intended to simplify parsing of sysfs inputs, letting them
avoid the need to manually strip off newlines from inputs.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
Roman Zippel
6f6d6a1a6a rename div64_64 to div64_u64
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide.  Move its definition
to math64.h as currently no architecture overrides the generic implementation.
 They can still override it of course, but the duplicated declarations are
avoided.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel
2418f4f28f introduce explicit signed/unsigned 64bit divide
The current do_div doesn't explicitly say that it's unsigned and the signed
counterpart is missing, which is e.g.  needed when dealing with time values.

This introduces 64bit signed/unsigned divide functions which also attempts to
cleanup the somewhat awkward calling API, which often requires the use of
temporary variables for the dividend.  To avoid the need for temporary
variables everywhere for the remainder, each divide variant also provides a
version which doesn't return the remainder.

Each architecture can now provide optimized versions of these function,
otherwise generic fallback implementations will be used.

As an example I provided an alternative for the current x86 divide, which
avoids the asm casts and using an union allows gcc to generate better code.
It also avoids the upper divde in a few more cases, where the result is known
(i.e.  upper quotient is zero).

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Greg Kroah-Hartman
c3bb7fadaf klist: fix coding style errors in klist.h and klist.c
Finally clean up the odd spacing in these files.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:58 -07:00
Kumar Gala
4f452e8aa4 devres: support addresses greater than an unsigned long via dev_ioremap
Use a resource_size_t instead of unsigned long since some arch's are
capable of having ioremap deal with addresses greater than the size of a
unsigned long.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:48 -07:00
Kay Sievers
a4ca661742 kobject: do not copy vargs, just pass them around
This prevents a few unneeded copies.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:48 -07:00
Tejun Heo
93dd40013f klist: implement klist_add_{after|before}()
Add klist_add_after() and klist_add_before() which puts a new node
after and before an existing node, respectively.  This is useful for
callers which need to keep klist ordered.  Note that synchronizing
between simultaneous additions for ordering is the caller's
responsibility.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:47 -07:00
Harvey Harrison
810304db75 lib: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
Thomas Gleixner
c6f3a97f86 debugobjects: add timer specific object debugging code
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Thomas Gleixner
3ac7fe5a4a infrastructure to debug (dynamic) objects
We can see an ever repeating problem pattern with objects of any kind in the
kernel:

1) freeing of active objects
2) reinitialization of active objects

Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore.  One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.

While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause.  This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.

The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.

Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.

The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash.  Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.

Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.

The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.

The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list

Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation.  When the sanity check triggers a warning message
and a stack trace is printed.

The list of operations can be extended if the need arises.  For now it's
limited to the requirements of the first user (timers).

The core code enqueues the objects into hash buckets.  The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
global lock.

The debug code can be compiled in without being active.  The runtime overhead
is minimal and could be optimized by asm alternatives.  A kernel command line
option enables the debugging code.

Thanks to Ingo Molnar for review, suggestions and cleanup patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Peter Zijlstra
a42dde0415 mm: bdi: allow setting a maximum for the bdi dirty limit
Add "max_ratio" to /sys/class/bdi.  This indicates the maximum percentage of
the global dirty threshold allocated to this bdi.

[mszeredi@suse.cz]

 - fix parsing in max_ratio_store().
 - export bdi_set_max_ratio() to modules
 - limit bdi_dirty with bdi->max_ratio
 - document new sysfs attribute

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Peter Zijlstra
cf0ca9fe5d mm: bdi: export BDI attributes in sysfs
Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.

In particular this properly exposes the read-ahead window for all relevant
users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.

With patient help from Kay Sievers and Greg KH

[mszeredi@suse.cz]

 - split off NFS and FUSE changes into separate patches
 - document new sysfs attributes under Documentation/ABI
 - do bdi_class_init as a core_initcall, otherwise the "default" BDI
   won't be initialized
 - remove bdi_init_fmt macro, it's not used very much

[akpm@linux-foundation.org: fix ia64 warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Linus Torvalds
867a89e0b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
  [RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
  [RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
  [RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
  [RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
  [RAPIDIO] Auto-probe the RapidIO system size
  [RAPIDIO] Add OF-tree support to RapidIO controller driver
  [RAPIDIO] Add RapidIO multi mport support
  [RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
  [RAPIDIO] Add RapidIO option to kernel configuration
  [RAPIDIO] Change RIO function mpc85xx_ to fsl_
  [POWERPC] Provide walk_memory_resource() for powerpc
  [POWERPC] Update lmb data structures for hotplug memory add/remove
  [POWERPC] Hotplug memory remove notifications for powerpc
  [POWERPC] windfarm: Add PowerMac 12,1 support
  [POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
  [POWERPC] Add IRQSTACKS support on ppc32
  [POWERPC] Use __always_inline for xchg* and cmpxchg*
  [POWERPC] Add fast little-endian switch system call
2008-04-29 08:19:14 -07:00
Thomas Gleixner
fee4b19fb3 bitops: remove "optimizations"
The mapsize optimizations which were moved from x86 to the generic
code in commit 64970b68d2 increased the
binary size on non x86 architectures.

Looking into the real effects of the "optimizations" it turned out
that they are not used in find_next_bit() and find_next_zero_bit().

The ones in find_first_bit() and find_first_zero_bit() are used in a
couple of places but none of them is a real hot path.

Remove the "optimizations" all together and call the library functions
unconditionally.

Boot-tested on x86 and compile tested on every cross compiler I have.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:11:16 -07:00
Akinobu Mita
199f0ca514 idr: create idr_layer_cache at boot time
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.

This change also enables us to eliminate the check every time idr_init() is
called.

[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:25 -07:00
Arthur Kepner
309df0c503 dma/ia64: update ia64 machvecs, swiotlb.c
Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces.
Implement the old dma_*map_*() interfaces in terms of the corresponding new
interfaces.  For ia64/sn, make use of one dma attribute,
DMA_ATTR_WRITE_BARRIER.  Introduce swiotlb_*map*_attrs() functions.

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Dave Young
5f97a5a879 isolate ratelimit from printk.c for other use
Due to the rcupreempt.h WARN_ON trigged, I got 2G syslog file.  For some
serious complaining of kernel, we need repeat the warnings, so here I isolate
the ratelimit part of printk.c to a standalone file.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
FUJITA Tomonori
a852250920 swiotlb: use iommu_is_span_boundary helper function
iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
(commit 3715863aa1).  SWIOTLB can use it instead
of the homegrown function.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Andrew Morton
a7133a1558 lib/swiotlb.c: cleanups
There's a pointlessly braced block of code in there.  Remove the braces and
save a tabstop.

Cc: Andi Kleen <ak@suse.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Benjamin Herrenschmidt
b70d3a2c59 iomap: fix 64 bits resources on 32 bits
Almost all implementations of pci_iomap() in the kernel, including the generic
lib/iomap.c one, copies the content of a struct resource into unsigned long's
which will break on 32 bits platforms with 64 bits resources.

This fixes all definitions of pci_iomap() to use resource_size_t.  I also
"fixed" the 64bits arch for consistency.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Jim Meyering
22caa0417d lib/inflate.c: handle failed malloc()
lib/inflate.c (inflate_dynamic): Don't deref NULL upon failed malloc.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Badari Pulavarty
9d88a2eb6e [POWERPC] Provide walk_memory_resource() for powerpc
Provide walk_memory_resource() for 64-bit powerpc.  PowerPC maintains
logical memory region mapping in the lmb.memory structure.  Walk
through these structures and do the callbacks for the contiguous
chunks.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Badari Pulavarty
98d5c21c81 [POWERPC] Update lmb data structures for hotplug memory add/remove
The powerpc kernel maintains information about logical memory blocks
in the lmb.memory structure, which is initialized and updated at boot
time, but not when memory is added or removed while the kernel is
running.

This adds a hotplug memory notifier which updates lmb.memory when
memory is added or removed.  This information is useful for eHEA
driver to find out the memory layout and holes.

NOTE: No special locking is needed for lmb_add() and lmb_remove().
Calls to these are serialized by caller. (pSeries_reconfig_chain).

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Paul Jackson
7ea931c9fc mempolicy: add bitmap_onto() and bitmap_fold() operations
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.

The bitmap_onto() operator computes one bitmap relative to another.  If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.

The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.

There are two substantive changes between this patch and its
predecessor bitmap_relative:
 1) Renamed bitmap_relative() to be bitmap_onto().
 2) Added bitmap_fold().

The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask.  Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N.  The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES.  These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.

Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
 1) A task cannot at present reliably establish a cpuset
    relative mempolicy because there is an essential race
    condition, in that the tasks cpuset may be changed in
    between the time the task can query its cpuset placement,
    and the time the task can issue the applicable mbind or
    set_memplicy system call.
 2) A task cannot at present establish what cpuset relative
    mempolicy it would like to have, if it is in a smaller
    cpuset than it might have mempolicy preferences for,
    because the existing interface only allows specifying
    mempolicies for nodes currently allowed by the cpuset.

Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.

The motivation for the added bitmap_fold() can be seen in the following
example.

Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15.  Then lets say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7.  If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes.  That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy.  In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <kosaki.motohiro@jp.fujitsu.com>
Cc: <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Christoph Lameter
488514d179 Remove set_migrateflags()
Migrate flags must be set on slab creation as agreed upon when the antifrag
logic was reviewed.  Otherwise some slabs of a slabcache will end up in the
unmovable and others in the reclaimable section depending on which flag was
active when a new slab page was allocated.

This likely slid in somehow when antifrag was merged. Remove it.

The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
SLAB_RECLAIM_ACCOUNT option is set.  The set_migrateflags() never had any
effect there.

Radix tree allocations are not directly reclaimable but they are allocated
with __GFP_RECLAIMABLE set on each allocation.  We now set
SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
tree slabs are consistently placed in the reclaimable section.  Radix tree
slabs will also be accounted as such.

There is then no user left of set_migratepages. So remove it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
Alexander van Heukelum
19870def58 x86, bitops: select the generic bitmap search functions
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.

I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:17 +02:00
Alexander van Heukelum
77b9bd9c49 x86: generic versions of find_first_(zero_)bit, convert i386
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.

The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.

This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum
64970b68d2 x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.

On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:

In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.

On x86_64, 52 locations are optimized with a minimal increase in
code size:

Current #testing defconfig:
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392637  846592  724424 6963653  6a41c5 vmlinux

After removing the x86_64 specific optimization for find_next_*bit:
	94 x bsf, 79 x find_next_*bit
   text    data     bss     dec     hex filename
   5392358  846592  724424 6963374  6a40ae vmlinux

After this patch (making the optimization generic):
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392396  846592  724424 6963412  6a40d4 vmlinux

[ tglx@linutronix.de: build fixes ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum
6fd92b63d0 x86: change x86 to use generic find_next_bit
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.

Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.

		Athlon		Xeon		Opteron 32/64bit
x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s

If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.

Using the generic version does give a slightly bigger kernel, though.

defconfig:	   text    data     bss     dec     hex filename
x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Andi Kleen
35bb5b1e0e Add option to enable -Wframe-larger-than= on gcc 4.4
Add option to enable -Wframe-larger-than= on gcc 4.4

gcc mainline (upcoming 4.4) added a new -Wframe-larger-than=...
option to warn at build time about too large stack frames. Add a config
option to enable this warning, since this very useful for the kernel.

I choose (somewhat arbitarily) 2048 as default warning threshold for 64bit
and 1024 as default for 32bit architectures.  With some research and
fixing all the code for smaller values these defaults should be probably
lowered.

With the default allyesconfigs have some new warnings, but I think
that is all code that should be just fixed.

At some point (when gcc 4.4 is released and widely used) this should
obsolete make checkstack

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-04-25 20:23:47 +02:00
David S. Miller
7347aefbcc [LMB]: Fix lmb allocation regression.
Changeset d9024df02f ("[LMB] Restructure
allocation loops to avoid unsigned underflow") removed the alignment
of the 'size' argument to call lmb_add_region() done by __lmb_alloc_base().

In doing so it reintroduced the bug fixed by changeset
eea89e13a9 ("[LMB]: Fix bug in
__lmb_alloc_base().").

This puts it back.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-23 23:30:59 -07:00
Linus Torvalds
9a64388d83 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
  [POWERPC] Fix compile breakage for 64-bit UP configs
  [POWERPC] Define copy_siginfo_from_user32
  [POWERPC] Add compat handler for PTRACE_GETSIGINFO
  [POWERPC] i2c: Fix build breakage introduced by OF helpers
  [POWERPC] Optimize fls64() on 64-bit processors
  [POWERPC] irqtrace support for 64-bit powerpc
  [POWERPC] Stacktrace support for lockdep
  [POWERPC] Move stackframe definitions to common header
  [POWERPC] Fix device-tree locking vs. interrupts
  [POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
  [POWERPC] Remove unused __max_memory variable
  [POWERPC] Simplify xics direct/lpar irq_host setup
  [POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
  [POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
  [POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
  [POWERPC] Use asm-generic/bitops/find.h in bitops.h
  [POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
  [POWERPC] 85xx: Fix the size of qe muram for MPC8568E
  [POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
  [POWERPC] 86xx: mark functions static, other minor cleanups
  ...
2008-04-21 15:50:49 -07:00
Linus Torvalds
e80ab411e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
  SCSI: convert struct class_device to struct device
  DRM: remove unused dev_class
  IB: rename "dev" to "srp_dev" in srp_host structure
  IB: convert struct class_device to struct device
  memstick: convert struct class_device to struct device
  driver core: replace remaining __FUNCTION__ occurrences
  sysfs: refill attribute buffer when reading from offset 0
  PM: Remove destroy_suspended_device()
  Firmware: add iSCSI iBFT Support
  PM: Remove legacy PM (fix)
  Kobject: Replace list_for_each() with list_for_each_entry().
  SYSFS: Explicitly include required header file slab.h.
  Driver core: make device_is_registered() work for class devices
  PM: Convert wakeup flag accessors to inline functions
  PM: Make wakeup flags available whenever CONFIG_PM is set
  PM: Fix misuse of wakeup flag accessors in serial core
  Driver core: Call device_pm_add() after bus_add_device() in device_add()
  PM: Handle device registrations during suspend/resume
  block: send disk "change" event for rescan_partitions()
  sysdev: detect multiple driver registrations
  ...

Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
2008-04-21 15:49:58 -07:00
Linus Torvalds
429f731dea Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
  Deprecate the asm/semaphore.h files in feature-removal-schedule.
  Convert asm/semaphore.h users to linux/semaphore.h
  security: Remove unnecessary inclusions of asm/semaphore.h
  lib: Remove unnecessary inclusions of asm/semaphore.h
  kernel: Remove unnecessary inclusions of asm/semaphore.h
  include: Remove unnecessary inclusions of asm/semaphore.h
  fs: Remove unnecessary inclusions of asm/semaphore.h
  drivers: Remove unnecessary inclusions of asm/semaphore.h
  net: Remove unnecessary inclusions of asm/semaphore.h
  arch: Remove unnecessary inclusions of asm/semaphore.h
2008-04-21 15:41:27 -07:00
Linus Torvalds
ec965350bb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
  sched: build fix
  sched: better rt-group documentation
  sched: features fix
  sched: /debug/sched_features
  sched: add SCHED_FEAT_DEADLINE
  sched: debug: show a weight tree
  sched: fair: weight calculations
  sched: fair-group: de-couple load-balancing from the rb-trees
  sched: fair-group scheduling vs latency
  sched: rt-group: optimize dequeue_rt_stack
  sched: debug: add some debug code to handle the full hierarchy
  sched: fair-group: SMP-nice for group scheduling
  sched, cpuset: customize sched domains, core
  sched, cpuset: customize sched domains, docs
  sched: prepatory code movement
  sched: rt: multi level group constraints
  sched: task_group hierarchy
  sched: fix the task_group hierarchy for UID grouping
  sched: allow the group scheduler to have multiple levels
  sched: mix tasks and groups
  ...
2008-04-21 15:40:24 -07:00
Robert P. J. Day
c6a2a3dc26 Kobject: Replace list_for_each() with list_for_each_entry().
Use the more concise list_for_each_entry(), which allows for the
deletion of the to_kobj() routine at the same time.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:27 -07:00
Greg Kroah-Hartman
c1ebdae514 kobject: catch kobjects that are not initialized
Add warnings to kobject_put() to catch kobjects that are cleaned up but
were never initialized to begin with.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:17 -07:00
Mike Travis
30ca60c15a cpumask: add cpumask_scnprintf_len function
Add a new function cpumask_scnprintf_len() to return the number of
characters needed to display "len" cpumask bits.  The current method
of allocating NR_CPUS bytes is incorrect as what's really needed is
9 characters per 32-bit word of cpumask bits (8 hex digits plus the
seperator [','] or the terminating NULL.)  This function provides the
caller the means to allocate the correct string length.

Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:58 +02:00
Dave Hansen
ad775f5a8f [PATCH] r/o bind mounts: debugging for missed calls
There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.

This patch provides a very simple debugging framework to catch these kinds of
bugs.  It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.

I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.

[hch: made it conditional on a debug option.
      But it's still a little bit too ugly]

[hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:28 -04:00
Matthew Wilcox
6188e10d38 Convert asm/semaphore.h users to linux/semaphore.h
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:22:54 -04:00
Matthew Wilcox
f42b38009e lib: Remove unnecessary inclusions of asm/semaphore.h
reed_solomon doesn't use any of the functionality promised by
asm/semaphore.h.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:17:17 -04:00
Linus Torvalds
334d094504 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
  [NET]: Fix and allocate less memory for ->priv'less netdevices
  [IPV6]: Fix dangling references on error in fib6_add().
  [NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
  [PKT_SCHED]: Fix datalen check in tcf_simp_init().
  [INET]: Uninline the __inet_inherit_port call.
  [INET]: Drop the inet_inherit_port() call.
  SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
  [netdrvr] forcedeth: internal simplifications; changelog removal
  phylib: factor out get_phy_id from within get_phy_device
  PHY: add BCM5464 support to broadcom PHY driver
  cxgb3: Fix __must_check warning with dev_dbg.
  tc35815: Statistics cleanup
  natsemi: fix MMIO for PPC 44x platforms
  [TIPC]: Cleanup of TIPC reference table code
  [TIPC]: Optimized initialization of TIPC reference table
  [TIPC]: Remove inlining of reference table locking routines
  e1000: convert uint16_t style integers to u16
  ixgb: convert uint16_t style integers to u16
  sb1000.c: make const arrays static
  sb1000.c: stop inlining largish static functions
  ...
2008-04-18 18:02:35 -07:00
Linus Torvalds
2cca775bae Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
  [SCSI] iscsi: bidi support for iscsi_tcp
  [SCSI] iscsi: bidi support at the generic libiscsi level
  [SCSI] iscsi: extended cdb support
  [SCSI] zfcp: Fix error handling for blocked unit for send FCP command
  [SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
  [SCSI] zfcp: fix 31 bit compile warnings
  [SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
  [SCSI] bsg: remove minor in struct bsg_device
  [SCSI] bsg: use better helper list functions
  [SCSI] bsg: replace kobject_get with blk_get_queue
  [SCSI] bsg: takes a ref to struct device in fops->open
  [SCSI] qla1280: remove version check
  [SCSI] libsas: fix endianness bug in sas_ata
  [SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
  [SCSI] aacraid: Do not describe check_reset parameter with its value
  [SCSI] aacraid: Fix down_interruptible() to check the return value
  [SCSI] sun3_scsi_vme: add MODULE_LICENSE
  [SCSI] st: rename flush_write_buffer()
  [SCSI] tgt: use KMEM_CACHE macro
  [SCSI] initio: fix big endian problems for auto request sense
  ...
2008-04-18 11:25:31 -07:00
Linus Torvalds
eddeb0e2d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (43 commits)
  firewire: cleanups
  firewire: fix synchronization of gap counts
  firewire: wait until PHY configuration packet was transmitted (fix bus reset loop)
  firewire: remove unused struct member
  firewire: use bitwise and to get reg in handle_registers
  firewire: replace more hex values with defined csr constants
  firewire: reread config ROM when device reset the bus
  firewire: replace static ROM cache by allocated cache
  firewire: fw-ohci: work around generation bug in TI controllers (fix AV/C and more)
  firewire: fw-ohci: extend logging of bus generations and node ID
  firewire: fw-ohci: conditionally log busReset interrupts
  firewire: fw-ohci: don't append to AT context when it's not active
  firewire: fw-ohci: log regAccessFail events
  firewire: fw-ohci: make sure HCControl register LPS bit is set
  firewire: fw-ohci: missing PPC PMac feature calls in failure path
  firewire: fw-ohci: untangle a mixed unsigned/signed expression
  firewire: debug interrupt events
  firewire: fw-ohci: catch self_id_count == 0
  firewire: fw-ohci: add self ID error check
  firewire: fw-ohci: refactor probe, remove, suspend, resume
  ...
2008-04-18 11:24:29 -07:00
Stefan Richter
080de8c2c5 firewire: fw-ohci: add option for remote debugging
This way firewire-ohci can be used for remote debugging like ohci1394.
Version with amendment from Fri, 11 Apr 2008 00:08:08 +0200.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Bernhard Kaindl <bk@suse.de>
2008-04-18 17:55:33 +02:00
Linus Torvalds
9732b61123 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb:
  kgdb: always use icache flush for sw breakpoints
  kgdb: fix SMP NMI kgdb_handle_exception exit race
  kgdb: documentation fixes
  kgdb: allow static kgdbts boot configuration
  kgdb: add documentation
  kgdb: Kconfig fix
  kgdb: add kgdb internal test suite
  kgdb: fix several kgdb regressions
  kgdb: kgdboc pl011 I/O module
  kgdb: fix optional arch functions and probe_kernel_*
  kgdb: add x86 HW breakpoints
  kgdb: print breakpoint removed on exception
  kgdb: clocksource watchdog
  kgdb: fix NMI hangs
  kgdb: fix kgdboc dynamic module configuration
  kgdb: document parameters
  x86: kgdb support
  consoles: polling support, kgdboc
  kgdb: core
  uaccess: add probe_kernel_write()
2008-04-18 08:37:01 -07:00
Linus Torvalds
d7bb545d86 Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
  Remove DEBUG_SEMAPHORE from Kconfig
  Improve semaphore documentation
  Simplify semaphore implementation
  Add down_timeout and change ACPI to use it
  Introduce down_killable()
  Generic semaphore implementation
  Add semaphore.h to kernel_lock.c
  Fix quota.h includes
2008-04-18 08:25:29 -07:00
David S. Miller
1e42198609 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-04-17 23:56:30 -07:00
Jason Wessel
974460c5bf kgdb: allow static kgdbts boot configuration
This patch adds in the ability to compile the kgdb internal test
string into the kernel so as to run the tests at boot without changing
the kernel boot arguments.  This patch also changes all the error
paths to invoke WARN_ON(1) which will emit the line number of the file
and dump the kernel stack when an error occurs.

You can disable the tests in a kernel that is built this way
using "kgdbts="

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 20:05:43 +02:00
Jason Wessel
e8d31c204e kgdb: add kgdb internal test suite
This patch adds regression tests for testing the kgdb core and arch
specific implementation.

The kgdb test suite is designed to be built into the kernel and not as
a module because it uses a number of low level kernel and kgdb
primitives which should not be exported externally.

The kgdb test suite is designed as a KGDB I/O module which
simulates the communications that a debugger would have with kgdb.
The tests are broken up in to a line by line and referenced here as
a "get" which is kgdb requesting input and "put" which is kgdb
sending a response.

The kgdb suite can be invoked from the kernel command line
arguments system or executed dynamically at run time.  The test
suite uses the variable "kgdbts" to obtain the information about
which tests to run and to configure the verbosity level.  The
following are the various characters you can use with the kgdbts=
line:

When using the "kgdbts=" you only choose one of the following core
test types:
A = Run all the core tests silently
V1 = Run all the core tests with minimal output
V2 = Run all the core tests in debug mode

You can also specify optional tests:
N## = Go to sleep with interrupts of for ## seconds
      to test the HW NMI watchdog
F## = Break at do_fork for ## iterations
S## = Break at sys_open for ## iterations

NOTE: that the do_fork and sys_open tests are mutually exclusive.

To invoke the kgdb test suite from boot you use a kernel start
argument as follows:
	kgdbts=V1 kgdbwait
Or if you wanted to perform the NMI test for 6 seconds and do_fork
test for 100 forks, you could use:
	kgdbts=V1N6F100 kgdbwait

The test suite can also be invoked at run time with:
echo kgdbts=V1N6F100 > /sys/module/kgdbts/parameters/kgdbts
Or as another example:
echo kgdbts=V2 > /sys/module/kgdbts/parameters/kgdbts

When developing a new kgdb arch specific implementation or
using these tests for the purpose of regression testing,
several invocations are required.

1) Boot with the test suite enabled by using the kernel arguments
      "kgdbts=V1F100 kgdbwait"
   ## If kgdb arch specific implementation has NMI use
      "kgdbts=V1N6F100

2) After the system boot run the basic test.
echo kgdbts=V1 > /sys/module/kgdbts/parameters/kgdbts

3) Run the concurrency tests.  It is best to use n+1
   while loops where n is the number of cpus you have
   in your system.  The example below uses only two
   loops.

## This tests break points on sys_open
while [ 1 ] ; do find / > /dev/null 2>&1 ; done &
while [ 1 ] ; do find / > /dev/null 2>&1 ; done &
echo kgdbts=V1S10000 > /sys/module/kgdbts/parameters/kgdbts
fg # and hit control-c
fg # and hit control-c
## This tests break points on do_fork
while [ 1 ] ; do date > /dev/null ; done &
while [ 1 ] ; do date > /dev/null ; done &
echo kgdbts=V1F1000 > /sys/module/kgdbts/parameters/kgdbts
fg # and hit control-c

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 20:05:42 +02:00
Jason Wessel
dc7d552705 kgdb: core
kgdb core code. Handles the protocol and the arch details.

[ mingo@elte.hu: heavily modified, simplified and cleaned up. ]
[ xemul@openvz.org: use find_task_by_pid_ns ]

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 20:05:37 +02:00
Matthew Wilcox
2342e51ba2 Remove DEBUG_SEMAPHORE from Kconfig
Alpha and FRV mutexes had an option to print lots of debugging messages
in their semaphore implementation.  This feature has not been carried
over to the generic semaphores, so remove the stale Kconfig option.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-17 10:53:01 -04:00
Matthew Wilcox
64ac24e738 Generic semaphore implementation
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility.  Thanks to Peter Zijlstra for fixing the lockdep
warning.  Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 10:42:34 -04:00
Matthew Wilcox
e48b3deee4 Add semaphore.h to kernel_lock.c
kernel_lock.c uses DECLARE_MUTEX, up() and down() without explicitly
including asm/semaphore.h.  This is fragile and leaves it vulnerable
to breakage during header reorganisations.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-17 10:42:27 -04:00
Paul Mackerras
d9024df02f [LMB] Restructure allocation loops to avoid unsigned underflow
There is a potential bug in __lmb_alloc_base where we subtract `size'
from the base address of a reserved region without checking whether
the subtraction could wrap around and produce a very large unsigned
value.  In fact it probably isn't possible to hit the bug in practice
since it would only occur in the situation where we can't satisfy the
allocation request and there is a reserved region starting at 0.

This fixes the potential bug by breaking out of the loop when we get
to the point where the base of the reserved region is less than the
size requested.  This also restructures the loop to be a bit easier to
follow.

The same logic got copied into lmb_alloc_nid_unreserved, so this makes
a similar change there.  Here the bug is more likely to be hit because
the outer loop  (in lmb_alloc_nid) goes through the memory regions in
increasing order rather than decreasing order as __lmb_alloc_base
does, and we are therefore more likely to hit the case where we are
testing against a reserved region with a base address of 0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-15 21:22:17 +10:00
Paul Mackerras
300613e523 [LMB] Fix some whitespace and other formatting issues, use pr_debug
This makes no semantic changes.  It fixes the whitespace and formatting
a bit, gets rid of a local DBG macro and uses the equivalent pr_debug
instead, and restructures one while loop that had a function call and
assignment in the condition to be a bit more readable.  Some comments
about functions being called with relocation disabled were also removed
as they would just be confusing to most readers now that the code is
in lib/.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-15 21:22:17 +10:00
David S. Miller
c50f68c8ae [LMB] Add lmb_alloc_nid()
A variant of lmb_alloc() that tries to allocate memory on a specified
NUMA node 'nid' but falls back to normal lmb_alloc() if that fails.

The caller provides a 'nid_range' function pointer which assists the
allocator.  It is given args 'start', 'end', and pointer to integer
'this_nid'.

It places at 'this_nid' the NUMA node id that corresponds to 'start',
and returns the end address within 'start' to 'end' at which memory
assosciated with 'nid' ends.

This callback allows a platform to use lmb_alloc_nid() in just
about any context, even ones in which early_pfn_to_nid() might
not be working yet.

This function will be used by the NUMA setup code on sparc64, and also
it can be used by powerpc, replacing it's hand crafted
"careful_allocation()" function in arch/powerpc/mm/numa.c

If x86 ever converts it's NUMA support over to using the LMB helpers,
it can use this too as it has something entirely similar.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-15 21:22:17 +10:00
Christoph Lameter
5b06c853ad slub: Deal with config variable dependencies
count_partial() is used by both slabinfo and the sysfs proc support. Move
the function directly before the beginning of the sysfs code so that it can
be easily found. Rework the preprocessor conditional to take into account
that slub sysfs support depends on CONFIG_SYSFS *and* CONFIG_SLUB_DEBUG.

Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There
is no point of keeping statistics if no one can restrive them.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-04-14 18:51:34 +03:00
Paul Mackerras
ac7c5353b1 Merge branch 'linux-2.6' 2008-04-14 21:11:02 +10:00
David S. Miller
df39e8ba56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ehea/ehea_main.c
	drivers/net/wireless/iwlwifi/Kconfig
	drivers/net/wireless/rt2x00/rt61pci.c
	net/ipv4/inet_timewait_sock.c
	net/ipv6/raw.c
	net/mac80211/ieee80211_sta.c
2008-04-14 02:30:23 -07:00
Harvey Harrison
a1e58bbdc9 lzo: fix typo in decompressor
Shift of a LE value seems strange, probably meant to shift the cpu-order
variable as in the prvious section of the switch statement.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10 15:34:05 -07:00
FUJITA Tomonori
b1adaf65ba [SCSI] block: add sg buffer copy helper functions
This patch adds new three helper functions to copy data between an SG
list and a linear buffer.

- sg_copy_from_buffer copies data from linear buffer to an SG list

- sg_copy_to_buffer copies data from an SG list to a linear buffer

When the APIs copy data from a linear buffer to an SG list,
flush_kernel_dcache_page is called. It's not necessary for everyone
but it's a no-op on most architectures and in general the API is not
used in performance critical path.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-04-07 12:15:45 -05:00
David S. Miller
3bb5da3837 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-04-03 14:33:42 -07:00
Andi Kleen
61407f80f7 [NET]: srandom32 fixes for networking v2
- Let it update the state of all CPUs. The network stack goes
into pains to feed the current IP addresses in, but it is not very
effective if that is only done for some random CPU instead of all.
So change it to feed bits into all CPUs.  I decided to do that lockless 
because well somewhat random results are ok.

v2: Drop rename so that this patch doesn't depend on x86 maintainers

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:07:02 -07:00
Denis V. Lunev
a4aa834a91 [NETNS]: Declare init_net even without CONFIG_NET defined.
This does not look good, but there is no other choice. The compilation
without CONFIG_NET is broken and can not be fixed with ease.

After that there is no need for the following commits:
1567ca7eec
3edf8fa5cc
2d38f9a4f8

Revert them.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 13:04:33 -07:00
Mark Lord
a9edadbf79 fix uevent action-string regression
Mark Lord wrote:
>
> On boot, syslog is flooded with "uevent: unsupported action-string;" messages.
..
> Mar 28 14:43:29 shrimp kernel: tty ptyqd: uevent: unsupported
> action-string; this will be ignored in a future kernel version
> Mar 28 14:43:29 shrimp kernel: tty ptyqe: uevent: unsupported
> action-string; this will be ignored in a future kernel version
> Mar 28 14:43:29 shrimp kernel: tty ptyqf: uevent: unsupported
> action-string; this will be ignored in a future kernel version
> Mar 28 14:43:29 shrimp kernel: tty ptyr0: uevent: unsupported
> action-string; this will be ignored in a future kernel version
..

These messages are a regression compared with 2.6.24, which did not
flood the syslog with them.

The actual underlying problem was introduced in 2.6.23, when somebody
made the string parsing no longer accept nul-terminated strings as a
valid input to store_uevent().

Eg.  "add\0" was valid prior to 2.6.23, where the code regressed to
require "add" without the '\0'.

This patch fixes the 2.6.23 / 2.6.24 regressions, by having the code
once again tolerate the trailing '\0', if present.

According to GregKH, this mainly affects older Ubuntu systems, such as
the one I have here that requires this fix.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30 14:55:49 -07:00
Pavel Emelyanov
095d911201 [LIB]: Drop the pcounter itself.
The knock-out. The pcounter abstraction is not used any longer in the
kernel.

Not sure whether this should go via netdev tree, but as far as I
remember it was added via this one, and besides Eric thinks that
Andrew shouldn't mind this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:39:58 -07:00
Denis V. Lunev
2d38f9a4f8 [NETNS]: Do no include NET related headers if CONFIG_NET is not set.
This fix broken compilation for 'allnoconfig'. This was introduced by
Introduced by commit 1218854afa ("[NET]
NETNS: Omit seq_net_private->net without CONFIG_NET_NS.")

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 14:26:30 -07:00
Paul Mackerras
54f53f2b94 Merge branch 'linux-2.6' 2008-03-26 08:44:18 +11:00
Linus Torvalds
b9e76a0074 x86-32: Pass the full resource data to ioremap()
It appears that 64-bit PCI resources cannot possibly ever have worked on
x86-32 even when the RESOURCES_64BIT config option was set, because any
driver that tried to [pci_]ioremap() the resource would have been unable
to do so because the high 32 bits would have been silently dropped on
the floor by the ioremap() routines that only used "unsigned long".

Change them to use "resource_size_t" instead, which properly encodes the
whole 64-bit resource data if RESOURCES_64BIT is enabled.

Acked-by: H. Peter Anvin <hpa@kernel.org>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-24 11:22:39 -07:00
Tejun Heo
916fbfb7ae devres: implement pcim_iomap_regions_request_all()
Some drivers need to reserve all PCI BARs to prevent other drivers
misusing unoccupied BARs.  pcim_iomap_regions_request_all() requests
all BARs and iomap specified BARs.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-03-17 08:26:44 -04:00
Jan Beulich
b15a3891c9 avoid endless loops in lib/swiotlb.c
Commit 681cc5cd3e ("iommu sg merging:
swiotlb: respect the segment boundary limits") introduced two
possibilities for entering an endless loop in lib/swiotlb.c:

 - if max_slots is zero (possible if mask is ~0UL)
 - if the number of slots requested fits into a swiotlb segment, but is
   too large for the part of a segment which remains after considering
   offset_slots

This fixes them

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-13 13:15:52 -07:00
Paul Mackerras
bed04a4413 Merge branch 'linux-2.6' 2008-03-13 15:26:33 +11:00
Linus Torvalds
2c6f2db13a Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  debugfs: fix sparse warnings
  Driver core: Fix cleanup when failing device_add().
  driver core: Remove dpm_sysfs_remove() from error path of device_add()
  PM: fix new mutex-locking bug in the PM core
  PM: Do not acquire device semaphores upfront during suspend
  kobject: properly initialize ksets
  sysfs: CONFIG_SYSFS_DEPRECATED fix
  driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
2008-03-04 16:37:35 -08:00
FUJITA Tomonori
3715863aa1 iommu: export iommu_is_span_boundary helper function
iommu_is_span_boundary is used internally in the IOMMU helper
(lib/iommu-helper.c), a primitive function that judges whether a memory area
spans LLD's segment boundary or not.

It's difficult to convert some IOMMUs to use the IOMMU helper but
iommu_is_span_boundary is still useful for them.  So this patch exports it.

This is needed for the parisc iommu fixes.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:17 -08:00
Greg Kroah-Hartman
a4573c488d kobject: properly initialize ksets
kset_initialize was calling kobject_init_internal() which didn't
initialize the kobject as well as kobject_init() was.  So have
kobject_init() call kobject_init_internal() and move the logic to
initalize the kobject there.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-04 14:47:05 -08:00
Paul Mackerras
f8303dd3db Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/lmb-2.6 2008-02-26 21:08:45 +11:00
Hoang-Nam Nguyen
4f9d5f4a35 lib/vsprintf.c: fix bug omitting minus sign of numbers (module_param)
lib/vsprintf.c: Fix bug omitting minus sign of numbers (module_param)

Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:14 -08:00
Chris Snook
fddd9cf82c make LKDTM depend on BLOCK
Make LKDTM depend on BLOCK to prevent build failures with certain configs.

Signed-off-by: Chris Snook <csnook@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:13 -08:00
Kumar Gala
74b20dad1c [LMB]: Fix lmb_add_region if region should be added at the head
We introduced a bug in fixing lmb_add_region to handle an initial
region being non-zero.  Before that fix it was impossible to insert a
region at the head of the list since the first region always started
at zero.

Now that its possible for the first region to be non-zero we need to
check to see if the new region should be added at the head and if so
actually add it.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 21:28:18 -08:00
Sam Ravnborg
fa2144ba9a kbuild: explain why DEBUG_SECTION_MISMATCH is UNDEFINED
We started to see patches enabling this - so explain why
it is disabled and the condition to enable it again.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-15 13:53:11 +01:00
Becky Bruce
e5f2709543 [LMB]: Make lmb support large physical addressing
Convert the lmb code to use u64 instead of unsigned long for physical
addresses and sizes.  This is needed to support large amounts of RAM
on 32-bit systems that support 36-bit physical addressing.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 16:58:39 -08:00
Kumar Gala
27e6672bb9 [LMB]: Fix initial lmb add region with a non-zero base
If we add to an empty lmb region with a non-zero base we will not
coalesce the number of regions down to one.  This causes problems on
ppc32 for the memory region as its assumed to only have one region.

We can fix this be easily specially casing the initial add to just
replace the dummy region.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 16:58:11 -08:00
David S. Miller
eea89e13a9 [LMB]: Fix bug in __lmb_alloc_base().
We need to check lmb_add_region() for errors, it can run out
of regions etc.

Also, the size needs to be padded to the given alignment
or else the lmb.reserved regions don't get expanded and
instead we get tons of holes and eventually run out of
regions prematurely.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 16:57:09 -08:00
David S. Miller
d9b2b2a277 [LIB]: Make PowerPC LMB code generic so sparc64 can use it too.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 16:56:49 -08:00
Denys Vlasenko
9b706aee7d x86: trivial printk optimizations
In arch/x86/boot/printf.c gets rid of unused tail of digits: const char
*digits = "0123456789abcdefghijklmnopqrstuvwxyz"; (we are using 0-9a-f
only)

Uses smaller/faster lowercasing (by ORing with 0x20)
if we know that we work on numbers/digits. Makes
strtoul smaller, and also we are getting rid of

  static const char small_digits[] = "0123456789abcdefx";
  static const char large_digits[] = "0123456789ABCDEFX";

since this works equally well:

  static const char digits[16] = "0123456789ABCDEF";

Size savings:

$ size vmlinux.org vmlinux
   text    data     bss     dec     hex filename
 877320  112252   90112 1079684  107984 vmlinux.org
 877048  112252   90112 1079412  107874 vmlinux

It may be also a tiny bit faster because code has less
branches now, but I doubt it is measurable.

[ hugh@veritas.com: uppercase pointers fix ]

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:09 +01:00
Harvey Harrison
185c045c24 x86, core: remove CONFIG_FORCED_INLINING
Other than the defconfigs, remove the entry in compiler-gcc4.h,
Kconfig.debug and feature-removal-schedule.txt.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:09 +01:00
Guennadi Liakhovetski
4600ecfcf3 lib/scatterlist.o needed by a module only - link it in unconditionally
lib/scatterlist.c is needed by drivers/media/video/videobuf-dma-sg.c, and
we would like to be able to use the latter without PCI too, for example, on
PXA270 ARM CPU.  It is then possible to create a configuration with
CONFIG_BLOCK=n, where only module code will need scatterlist.c.  Therefore
it must be in obj-y.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 15:33:33 -08:00
Yi Yang
06b2a76d25 Add new string functions strict_strto* and convert kernel params to use them
Currently, for every sysfs node, the callers will be responsible for
implementing store operation, so many many callers are doing duplicate
things to validate input, they have the same mistakes because they are
calling simple_strtol/ul/ll/uul, especially for module params, they are
just numeric, but you can echo such values as 0x1234xxx, 07777888 and
1234aaa, for these cases, module params store operation just ignores
succesive invalid char and converts prefix part to a numeric although input
is acctually invalid.

This patch tries to fix the aforementioned issues and implements
strict_strtox serial functions, kernel/params.c uses them to strictly
validate input, so module params will reject such values as 0x1234xxxx and
returns an error:

write error: Invalid argument

Any modules which export numeric sysfs node can use strict_strtox instead of
simple_strtox to reject any invalid input.

Here are some test results:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#

After applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#

[akpm@linux-foundation.org: fix compiler warnings]
[akpm@linux-foundation.org: fix off-by-one found by tiwai@suse.de]
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:41 -08:00
Christoph Hellwig
8b88b0998e libfs: allow error return from simple attributes
Sometimes simple attributes might need to return an error, e.g. for
acquiring a mutex interruptibly.  In fact we have that situation in
spufs already which is the original user of the simple attributes.  This
patch merged the temporarily forked attributes in spufs back into the
main ones and allows to return errors.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:34 -08:00
Harvey Harrison
9f741cb8fe lib: remove fastcall from lib/*
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
David Howells
b920de1b77 mn10300: add the MN10300/AM33 architecture to the kernel
Add architecture support for the MN10300/AM33 CPUs produced by MEI to the
kernel.

This patch also adds board support for the ASB2303 with the ASB2308 daughter
board, and the ASB2305.  The only processor supported is the MN103E010, which
is an AM33v2 core plus on-chip devices.

[akpm@linux-foundation.org: nuke cvs control strings]
Signed-off-by: Masakazu Urade <urade.masakazu@jp.panasonic.com>
Signed-off-by: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:30 -08:00
Christoph Lameter
8ff12cfc00 SLUB: Support for performance statistics
The statistics provided here allow the monitoring of allocator behavior but
at the cost of some (minimal) loss of performance. Counters are placed in
SLUB's per cpu data structure. The per cpu structure may be extended by the
statistics to grow larger than one cacheline which will increase the cache
footprint of SLUB.

There is a compile option to enable/disable the inclusion of the runtime
statistics and its off by default.

The slabinfo tool is enhanced to support these statistics via two options:

-D 	Switches the line of information displayed for a slab from size
	mode to activity mode.

-A	Sorts the slabs displayed by activity. This allows the display of
	the slabs most important to the performance of a certain load.

-r	Report option will report detailed statistics on

Example (tbench load):

slabinfo -AD		->Shows the most active slabs

Name                   Objects    Alloc     Free   %Fast
skbuff_fclone_cache         33 111953835 111953835  99  99
:0000192                  2666  5283688  5281047  99  99
:0001024                   849  5247230  5246389  83  83
vm_area_struct            1349   119642   118355  91  22
:0004096                    15    66753    66751  98  98
:0000064                  2067    25297    23383  98  78
dentry                   10259    28635    18464  91  45
:0000080                 11004    18950     8089  98  98
:0000096                  1703    12358    10784  99  98
:0000128                   762    10582     9875  94  18
:0000512                   184     9807     9647  95  81
:0002048                   479     9669     9195  83  65
anon_vma                   777     9461     9002  99  71
kmalloc-8                 6492     9981     5624  99  97
:0000768                   258     7174     6931  58  15

So the skbuff_fclone_cache is of highest importance for the tbench load.
Pretty high load on the 192 sized slab. Look for the aliases

slabinfo -a | grep 000192
:0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
	request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili

Likely skbuff_head_cache.


Looking into the statistics of the skbuff_fclone_cache is possible through

slabinfo skbuff_fclone_cache	->-r option implied if cache name is mentioned


.... Usual output ...

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath             111953360 111946981  99  99
Slowpath                 1044     7423   0   0
Page Alloc                272      264   0   0
Add partial                25      325   0   0
Remove partial             86      264   0   0
RemoteObj/SlabFrozen      350     4832   0   0
Total                111954404 111954404

Flushes       49 Refill        0
Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

Looks good because the fastpath is overwhelmingly taken.


skbuff_head_cache:

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath              5297262  5259882  99  99
Slowpath                 4477    39586   0   0
Page Alloc                937      824   0   0
Add partial                 0     2515   0   0
Remove partial           1691      824   0   0
RemoteObj/SlabFrozen     2621     9684   0   0
Total                 5301739  5299468

Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)


Descriptions of the output:

Total:		The total number of allocation and frees that occurred for a
		slab

Fastpath:	The number of allocations/frees that used the fastpath.

Slowpath:	Other allocations

Page Alloc:	Number of calls to the page allocator as a result of slowpath
		processing

Add Partial:	Number of slabs added to the partial list through free or
		alloc (occurs during cpuslab flushes)

Remove Partial:	Number of slabs removed from the partial list as a result of
		allocations retrieving a partial slab or by a free freeing
		the last object of a slab.

RemoteObj/Froz:	How many times were remotely freed object encountered when a
		slab was about to be deactivated. Frozen: How many times was
		free able to skip list processing because the slab was in use
		as the cpuslab of another processor.

Flushes:	Number of times the cpuslab was flushed on request
		(kmem_cache_shrink, may result from races in __slab_alloc)

Refill:		Number of times we were able to refill the cpuslab from
		remotely freed objects for the same slab.

Deactivate:	Statistics how slabs were deactivated. Shows how they were
		put onto the partial list.

In general fastpath is very good. Slowpath without partial list processing is
also desirable. Any touching of partial list uses node specific locks which
may potentially cause list lock contention.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-02-07 17:47:41 -08:00
Andrew Morton
b41ecbebd4 debug_smp_processor_id() fixlets
- Account for debug_smp_processor_id()'s own preempt_disable() when
  displaying the preempt_count().

- 80 cols, not 800.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:09 -08:00
Eric Dumazet
15ae02baf0 lib/extable.c: remove an expensive integer divide in search_extable()
Actual code let compiler generates idiv instruction on x86.

Using a right shift is OK here and readable as well.

Before patch
   10:   57                      push   %edi
   11:   56                      push   %esi
   12:   89 d6                   mov    %edx,%esi
   14:   53                      push   %ebx
   15:   89 c3                   mov    %eax,%ebx
   17:   eb 22                   jmp    3b <search_extable+0x2b>
   19:   89 f0                   mov    %esi,%eax
   1b:   ba 02 00 00 00          mov    $0x2,%edx
   20:   29 d8                   sub    %ebx,%eax
   22:   89 d7                   mov    %edx,%edi
   24:   c1 f8 03                sar    $0x3,%eax
   27:   99                      cltd
   28:   f7 ff                   idiv   %edi
   2a:   8d 04 c3                lea    (%ebx,%eax,8),%eax
   2d:   39 08                   cmp    %ecx,(%eax)
...

After patch

00000010 <search_extable>:
   10:   53                      push   %ebx
   11:   89 c3                   mov    %eax,%ebx
   13:   eb 18                   jmp    2d <search_extable+0x1d>
   15:   89 d0                   mov    %edx,%eax
   17:   29 d8                   sub    %ebx,%eax
   19:   c1 f8 04                sar    $0x4,%eax
   1c:   8d 04 c3                lea    (%ebx,%eax,8),%eax
   1f:   39 08                   cmp    %ecx,(%eax)
...

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:08 -08:00
Nick Piggin
e2848a0efe radix-tree: avoid atomic allocations for preloaded insertions
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags.  But the
preallocation is squandered when it comes time to allocate a node, we
default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we don't
need to be using.

Another upshot of this is that it removes the sometimes highly contended
zone->lock from underneath tree_lock.  Pagecache insertions are always
performed with a radix tree preload, and after this change, such a
situation will never fall back to kmem_cache_alloc within
radix_tree_node_alloc.

David Miller reports seeing this allocation fail on a highly threaded
sparc64 system:

[527319.459981] dd: page allocation failure. order:0, mode:0x20
[527319.460403] Call Trace:
[527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
[527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
[527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
[527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
[527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
[527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
[527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
[527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
[527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
[527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
[527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
[527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
[527319.461361]  [00000000004bb920] sys_read+0x34/0x60
[527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40

The calltrace is significant: __do_page_cache_readahead allocates a number
of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
goes to mpage_readpages, there can be significant intervals (including disk
IO) before all the pages are inserted into the radix-tree.  So the reserves
can easily be depleted at that point.  The patch is confirmed to fix the
problem.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
FUJITA Tomonori
681cc5cd3e iommu sg merging: swiotlb: respect the segment boundary limits
This patch makes swiotlb not allocate a memory area spanning LLD's segment
boundary.

is_span_boundary() judges whether a memory area spans LLD's segment boundary.
If map_single finds such a area, map_single tries to find the next available
memory area.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:12 -08:00
FUJITA Tomonori
0291df8cc9 iommu sg: add IOMMU helper functions for the free area management
This adds IOMMU helper functions for the free area management.  These
functions take care of LLD's segment boundary limit for IOMMUs.  They would be
useful for IOMMUs that use bitmap for the free area management.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:11 -08:00
Linus Torvalds
f5bb3a5e9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits)
  Jesper Juhl is the new trivial patches maintainer
  Documentation: mention email-clients.txt in SubmittingPatches
  fs/binfmt_elf.c: spello fix
  do_invalidatepage() comment typo fix
  Documentation/filesystems/porting fixes
  typo fixes in net/core/net_namespace.c
  typo fix in net/rfkill/rfkill.c
  typo fixes in net/sctp/sm_statefuns.c
  lib/: Spelling fixes
  kernel/: Spelling fixes
  include/scsi/: Spelling fixes
  include/linux/: Spelling fixes
  include/asm-m68knommu/: Spelling fixes
  include/asm-frv/: Spelling fixes
  fs/: Spelling fixes
  drivers/watchdog/: Spelling fixes
  drivers/video/: Spelling fixes
  drivers/ssb/: Spelling fixes
  drivers/serial/: Spelling fixes
  drivers/scsi/: Spelling fixes
  ...
2008-02-04 07:58:52 -08:00
Linus Torvalds
519cb68807 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
  scsi: fix dependency bug in aic7 Makefile
  kbuild: add svn revision information to setlocalversion
  kbuild: do not warn about __*init/__*exit symbols being exported
  Move Kconfig.instrumentation to arch/Kconfig and init/Kconfig
  Add HAVE_KPROBES
  Add HAVE_OPROFILE
  Create arch/Kconfig
  Fix ARM to play nicely with generic Instrumentation menu
  kconfig: ignore select of unknown symbol
  kconfig: mark config as changed when loading an alternate config
  kbuild: Spelling/grammar fixes for config DEBUG_SECTION_MISMATCH
  Remove __INIT_REFOK and __INITDATA_REFOK
  kbuild: print only total number of section mismatces found
2008-02-04 07:56:17 -08:00
Joe Perches
643d1f7fe3 lib/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:48:52 +02:00
Geert Uytterhoeven
d6fbfa4fce kbuild: Spelling/grammar fixes for config DEBUG_SECTION_MISMATCH
Including additional fixes from Randy Dunlap <randy.dunlap@oracle.com>

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03 08:58:07 +01:00
Sam Ravnborg
e5f95c8b77 kbuild: print only total number of section mismatces found
We have too many section mismatches detected at the moment.
So silence modpost and prevent the option from being
set in a typical allyesconfig build.

Tell the user how to see all the deteils in the summary
message from modpost.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03 08:58:07 +01:00
Dave Young
f70701a34e kobject: kerneldoc comment fix
Fix kerneldoc comment of kobject_create.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-02 15:14:48 -08:00
Heiko Carstens
aa7d93506c latencytop: Change Kconfig dependency.
Change latencytop Kconfig entry so it doesn't list the archictectures
that support it. Instead introduce HAVE_LATENCY_SUPPORT which any
architecture can set. Should reduce patch conflicts.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Holger Wolf <wolf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:45:14 +01:00
Bernhard Kaindl
f212ec4b7b x86: early boot debugging via FireWire (ohci1394_dma=early)
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.

If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.

With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.

In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.

An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.

A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt

It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff

Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Arjan van de Ven
6dab27784b x86: add a simple backtrace test module
During the work on the x86 32 and 64 bit backtrace code I found it useful
to have a simple test module to test a process and irq context backtrace.
Since the existing backtrace code was buggy, I figure it might be useful
to have such a test module in the kernel so that maybe we can even
detect such bugs earlier..

[ mingo@elte.hu: build fix ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:08 +01:00
Ingo Molnar
d50efc6c40 x86: fix UML and -regparm=3
introduce the "asmregparm" calling convention: for functions
implemented in assembly with a fixed regparm input parameters
calling convention.

mark the semaphore and rwsem slowpath functions with that.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:00 +01:00
Ananth N Mavinakayanahalli
8c1c935642 x86: kprobes: add kprobes smoke tests that run on boot
Here is a quick and naive smoke test for kprobes. This is intended to
just verify if some unrelated change broke the *probes subsystem. It is
self contained, architecture agnostic and isn't of any great use by itself.

This needs to be built in the kernel and runs a basic set of tests to
verify if kprobes, jprobes and kretprobes run fine on the kernel. In case
of an error, it'll print out a message with a "BUG" prefix.

This is a start; we intend to add more tests to this bucket over time.

Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.

Tested on x86 (32/64) and powerpc.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:32:53 +01:00
Linus Torvalds
0ba6c33bcd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
  [IPV6] ADDRLABEL: Fix double free on label deletion.
  [PPP]: Sparse warning fixes.
  [IPV4] fib_trie: remove unneeded NULL check
  [IPV4] fib_trie: More whitespace cleanup.
  [NET_SCHED]: Use nla_policy for attribute validation in ematches
  [NET_SCHED]: Use nla_policy for attribute validation in actions
  [NET_SCHED]: Use nla_policy for attribute validation in classifiers
  [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
  [NET_SCHED]: sch_api: introduce constant for rate table size
  [NET_SCHED]: Use typeful attribute parsing helpers
  [NET_SCHED]: Use typeful attribute construction helpers
  [NET_SCHED]: Use NLA_PUT_STRING for string dumping
  [NET_SCHED]: Use nla_nest_start/nla_nest_end
  [NET_SCHED]: Propagate nla_parse return value
  [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
  [NET_SCHED]: act_api: use nlmsg_parse
  [NET_SCHED]: act_api: fix netlink API conversion bug
  [NET_SCHED]: sch_netem: use nla_parse_nested_compat
  [NET_SCHED]: sch_atm: fix format string warning
  [NETNS]: Add namespace for ICMP replying code.
  ...
2008-01-29 22:54:01 +11:00
Linus Torvalds
5ea293a904 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (79 commits)
  Remove references to "make dep"
  kconfig: document use of HAVE_*
  Introduce new section reference annotations tags: __ref, __refdata, __refconst
  kbuild: warn about ld added unique sections
  kbuild: add verbose option to Section mismatch reporting in modpost
  kconfig: tristate choices with mixed tristate and boolean values
  asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies
  remove __attribute_used__
  kbuild: support ARCH=x86 in buildtar
  kconfig: remove "enable"
  kbuild: simplified warning report in modpost
  kbuild: introduce a few helpers in modpost
  kbuild: use simpler section mismatch warnings in modpost
  kbuild: link vmlinux.o before kallsyms passes
  kbuild: introduce new option to enhance section mismatch analysis
  Use separate sections for __dev/__cpu/__mem code/data
  compiler.h: introduce __section()
  all archs: consolidate init and exit sections in vmlinux.lds.h
  kbuild: check section names consistently in modpost
  kbuild: introduce blacklisting in modpost
  ...
2008-01-29 22:46:14 +11:00
Aneesh Kumar K.V
aa02ad67d9 ext4: Add ext4_find_next_bit()
This function is used by the ext4 multi block allocator patches.

Also add generic_find_next_le_bit

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-01-28 23:58:27 -05:00
Eric Dumazet
571e768202 [LIB] pcounter : unline too big functions
Before pushing pcounter to Linus tree, I would like to make some adjustments.

Goal is to reduce kernel text size, by unlining too big functions.

When a pcounter is bound to a statically defined per_cpu variable,
we define two small helpers functions. (No more folding function
using the fat for_each_possible_cpu(cpu) ... )

static DEFINE_PER_CPU(int, NAME##_pcounter_values);
static void NAME##_pcounter_add(struct pcounter *self, int val)
{
       __get_cpu_var(NAME##_pcounter_values) += val;
}
static int NAME##_pcounter_getval(const struct pcounter *self, int cpu)
{
       return per_cpu(NAME##_pcounter_values, cpu);
}

Fast path is therefore unchanged, while folding/alloc/free is now unlined.

This saves 228 bytes on i386

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:35 -08:00
Arnaldo Carvalho de Melo
de4d1db369 [LIB]: Introduce struct pcounter
This just generalises what was introduced by Eric Dumazet for the struct proto
inuse field in 286ab3d460:

    [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way.

Please look at the comment in there to see the rationale.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:39 -08:00
Sam Ravnborg
588ccd732b kbuild: add verbose option to Section mismatch reporting in modpost
If the config option CONFIG_SECTION_MISMATCH is not set and
we see a Section mismatch present the following to the user:

modpost: Found 1 section mismatch(es).
To see additional details select "Enable full Section mismatch analysis"
in the Kernel Hacking menu (CONFIG_SECTION_MISMATCH).

If the option CONFIG_SECTION_MISMATCH is selected
then be verbose in the Section mismatch reporting from mdopost.
Sample outputs:

WARNING: o-x86_64/vmlinux.o(.text+0x7396): Section mismatch in reference from the function discover_ebda() to the variable .init.data:ebda_addr
The function  discover_ebda() references
the variable __initdata ebda_addr.
This is often because discover_ebda lacks a __initdata
annotation or the annotation of ebda_addr is wrong.

WARNING: o-x86_64/vmlinux.o(.data+0x74d58): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit()
The variable pci_serial_quirks references
the function __devexit pci_plx9050_exit()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: o-x86_64/vmlinux.o(__ksymtab+0x630): Section mismatch in reference from the variable __ksymtab_arch_register_cpu to the function .cpuinit.text:arch_register_cpu()
The symbol arch_register_cpu is exported and annotated __cpuinit
Fix this by removing the __cpuinit annotation of arch_register_cpu or drop the export.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:21:18 +01:00
Sam Ravnborg
91341d4b2c kbuild: introduce new option to enhance section mismatch analysis
Setting the option DEBUG_SECTION_MISMATCH will
report additional section mismatch'es but this
should in the end makes it possible to get rid of
all of them.

See help text in lib/Kconfig.debug for details.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:21:18 +01:00
James Bottomley
7cedb1f17f SG: work with the SCSI fixed maximum allocations.
SCSI sg table allocation has a maximum size (of SCSI_MAX_SG_SEGMENTS,
currently 128) and this will cause a BUG_ON() in SCSI if something
tries an allocation over it.  This patch adds a size limit to the
chaining allocator to allow the specification of the maximum
allocation size for chaining, so we always chain in units of the
maximum SCSI allocation size.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:54:49 +01:00
Jens Axboe
0db9299f48 SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
Manually doing chained sg lists is not trivial, so add some helpers
to make sure that drivers get it right.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:05:27 +01:00
Arjan van de Ven
9745512ce7 sched: latencytop support
LatencyTOP kernel infrastructure; it measures latencies in the
scheduler and tracks it system wide and per process.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Ingo Molnar
6478d8800b sched: remove the !PREEMPT_BKL code
remove the !PREEMPT_BKL code.

this removes 160 lines of legacy code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:33 +01:00
Greg Kroah-Hartman
e374a2bfeb Kobject: fix coding style issues in kobject c files
Clean up the kobject.c and kobject_uevent.c files to follow the
proper coding style rules.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 21:59:04 -08:00
Kay Sievers
af5ca3f4ec Driver core: change sysdev classes to use dynamic kobject names
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00