Commit graph

398 commits

Author SHA1 Message Date
Nick Piggin
db64fe0225 mm: rewrite vmap layer
Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap.  Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache.  This is all done under a global lock.  As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
 This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock.  It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems.  The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping.  vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated.  So the addresses aren't allocated again until
a subsequent TLB flush.  A single TLB flush then can flush multiple
vunmaps from each CPU.

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap.  Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages.  Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron.  Results are
in nanoseconds per map+touch+unmap.

threads           vanilla         vmap rewrite
1                 14700           2900
2                 33600           3000
4                 49500           2800
8                 70631           2900

So with a 8 cores, the rewritten version is already 25x faster.

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram...  along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system.  I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now.  vmap is pretty well blown off the
profiles.

Before:
1352059 total                                      0.1401
798784 _write_lock                              8320.6667 <- vmlist_lock
529313 default_idle                             1181.5022
 15242 smp_call_function                         15.8771  <- vmap tlb flushing
  2472 __get_vm_area_node                         1.9312  <- vmap
  1762 remove_vm_area                             4.5885  <- vunmap
   316 map_vm_area                                0.2297  <- vmap
   312 kfree                                      0.1950
   300 _spin_lock                                 3.1250
   252 sn_send_IPI_phys                           0.4375  <- tlb flushing
   238 vmap                                       0.8264  <- vmap
   216 find_lock_page                             0.5192
   196 find_next_bit                              0.3603
   136 sn2_send_IPI                               0.2024
   130 pio_phys_write_mmr                         2.0312
   118 unmap_kernel_range                         0.1229

After:
 78406 total                                      0.0081
 40053 default_idle                              89.4040
 33576 ia64_spinlock_contention                 349.7500
  1650 _spin_lock                                17.1875
   319 __reg_op                                   0.5538
   281 _atomic_dec_and_lock                       1.0977
   153 mutex_unlock                               1.5938
   123 iget_locked                                0.1671
   117 xfs_dir_lookup                             0.1662
   117 dput                                       0.1406
   114 xfs_iget_core                              0.0268
    92 xfs_da_hashname                            0.1917
    75 d_alloc                                    0.0670
    68 vmap_page_range                            0.0462 <- vmap
    58 kmem_cache_alloc                           0.0604
    57 memset                                     0.0540
    52 rb_next                                    0.1625
    50 __copy_user                                0.0208
    49 bitmap_find_free_region                    0.2188 <- vmap
    46 ia64_sn_udelay                             0.1106
    45 find_inode_fast                            0.1406
    42 memcmp                                     0.2188
    42 finish_task_switch                         0.1094
    42 __d_lookup                                 0.0410
    40 radix_tree_lookup_slot                     0.1250
    37 _spin_unlock_irqrestore                    0.3854
    36 xfs_bmapi                                  0.0050
    36 kmem_cache_free                            0.0256
    35 xfs_vn_getattr                             0.0322
    34 radix_tree_lookup                          0.1062
    33 __link_path_walk                           0.0035
    31 xfs_da_do_buf                              0.0091
    30 _xfs_buf_find                              0.0204
    28 find_get_page                              0.0875
    27 xfs_iread                                  0.0241
    27 __strncpy_from_user                        0.2812
    26 _xfs_buf_initialize                        0.0406
    24 _xfs_buf_lookup_pages                      0.0179
    24 vunmap_page_range                          0.0250 <- vunmap
    23 find_lock_page                             0.0799
    22 vm_map_ram                                 0.0087 <- vmap
    20 kfree                                      0.0125
    19 put_page                                   0.0330
    18 __kmalloc                                  0.0176
    17 xfs_da_node_lookup_int                     0.0086
    17 _read_lock                                 0.0885
    17 page_waitqueue                             0.0664

vmap has gone from being the top 5 on the profiles and flushing the crap
out of all TLBs, to using less than 1% of kernel time.

[akpm@linux-foundation.org: cleanups, section fix]
[akpm@linux-foundation.org: fix build on alpha]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:32 -07:00
Adrian Bunk
73b4a24f5f init/do_mounts_md.c must #include <linux/delay.h>
This patch fixes the following compile error caused by commit
589f800bb1 ("fastboot: make the raid
autodetect code wait for all devices to init"):

    CC      init/do_mounts_md.o
  init/do_mounts_md.c: In function 'autodetect_raid':
  init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'
  make[2]: *** [init/do_mounts_md.o] Error 1

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 13:35:34 -07:00
Thomas Petazzoni
ebf3f09c63 Configure out AIO support
This patchs adds the CONFIG_AIO option which allows to remove support
for asynchronous I/O operations, that are not necessarly used by
applications, particularly on embedded devices. As this is a
size-reduction option, it depends on CONFIG_EMBEDDED. It allows to
save ~7 kilobytes of kernel code/data:

   text	   data	    bss	    dec	    hex	filename
1115067	 119180	 217088	1451335	 162547	vmlinux
1108025	 119048	 217088	1444161	 160941	vmlinux.new
  -7042    -132       0   -7174   -1C06 +/-

This patch has been originally written by Matt Mackall
<mpm@selenic.com>, and is part of the Linux Tiny project.

[randy.dunlap@oracle.com: build fix]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:51 -07:00
Nye Liu
889d51a107 initramfs: add option to preserve mtime from initramfs cpio images
When unpacking the cpio into the initramfs, mtimes are not preserved by
default.  This patch adds an INITRAMFS_PRESERVE_MTIME option that allows
mtimes stored in the cpio image to be used when constructing the
initramfs.

For embedded applications that run exclusively out of the initramfs, this
is invaluable:

When building embedded application initramfs images, its nice to know when
the files were actually created during the build process - that makes it
easier to see what files were modified when so we can compare the files
that are being used on the image with the files used during the build
process.  This might help (for example) to determine if the target system
has all the updated files you expect to see w/o having to check MD5s etc.

In our environment, the whole system runs off the initramfs partition, and
seeing the modified times of the shared libraries (for example) helps us
find bugs that may have been introduced by the build system incorrectly
propogating outdated shared libraries into the image.

Similarly, many of the initializion/configuration files in /etc might be
dynamically built by the build system, and knowing when they were modified
helps us sanity check whether the target system has the "latest" files
etc.

Finally, we might use last modified times to determine whether a hot fix
should be applied or not to the running ramfs.

Signed-off-by: Nye Liu <nyet@nyet.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:31 -07:00
Geert Uytterhoeven
93fd85d005 identify_ramdisk_image(): correct typo about return value in comment
identify_ramdisk_image() returns 0 (not -1) if a gzipped ramdisk is found:

	if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
		printk(KERN_NOTICE
		       "RAMDISK: Compressed image found at block %d\n",
		       start_block);
		nblocks = 0;
		^^^^^^^^^^^
		goto done;
	}

	...

done:
	sys_lseek(fd, start_block * BLOCK_SIZE, 0);
	kfree(buf);
	return nblocks;
	^^^^^^^^^^^^^^

Hence correct the typo in the comment, which has existed since the
addition of compressed ramdisk support in 1.3.48.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:30 -07:00
Linus Torvalds
2d51b75370 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot:
  raid, fastboot: hide RAID autodetect option if MD is compiled as a module
  raid: make RAID autodetect default a KConfig option
  warning: fix init do_mounts_md c
  fastboot: make the RAID autostart code print a message just before waiting
  fastboot: make the raid autodetect code wait for all devices to init
  fastboot: Fix bootgraph.pl initcall name regexp
  fastboot: fix issues and improve output of bootgraph.pl
  Add a script to visualize the kernel boot process / time
2008-10-14 12:28:02 -07:00
Linus Torvalds
20272c8994 Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc
* 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
  proc: remove kernel.maps_protect
  proc: remove now unneeded ADDBUF macro
  [PATCH] proc: show personality via /proc/pid/personality
  [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
  proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
  proc: make grab_header() static
  proc: remove unused get_dma_list()
  proc: remove dummy vmcore_open()
  proc: proc_sys_root tweak
  proc: fix return value of proc_reg_open() in "too late" case

Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h
2008-10-13 10:04:04 -07:00
Arjan van de Ven
a364092a41 raid: make RAID autodetect default a KConfig option
RAID autodetect has the side effect of requiring synchronisation
of all device drivers, which can make the boot several seconds longer
(I've measured 7 on one of my laptops).... even for systems that don't
have RAID setup for the root filesystem (the only FS where this matters).

This patch makes the default for autodetect a config option; either way
the user can always override via the kernel command line.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: NeilBrown <neilb@suse.de>
2008-10-12 08:25:02 -07:00
Ingo Molnar
82cbc11a41 warning: fix init do_mounts_md c
fix warning:

  init/do_mounts_md.c: In function ‘md_run_setup’:
  init/do_mounts_md.c:282: warning: ISO C90 forbids mixed declarations and code

also, use the opportunity to put the RAID autodetection code
into a separate function - this also solves a checkpatch style warning.

No code changed:

md5:
   aa36a35faef371b05f1974ad583bdbbd  do_mounts_md.o.before.asm
   aa36a35faef371b05f1974ad583bdbbd  do_mounts_md.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 08:24:34 -07:00
Arjan van de Ven
02c15def84 fastboot: make the RAID autostart code print a message just before waiting
As requested/suggested by Neil Brown: make the raid code print that it's
about to wait for probing to be done as well as give a suggestion on how
to disable the probing if the user doesn't use raid.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com
2008-10-12 08:23:53 -07:00
Arjan van de Ven
589f800bb1 fastboot: make the raid autodetect code wait for all devices to init
The raid autodetect code really needs to have all devices probed before
it can detect raid arrays; not doing so would give rather messy situations
where arrays would get detected as degraded while they shouldn't be etc.

This is in preparation of removing the "wait for everything to init"
code that makes everyone pay, not just raid users.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-12 08:23:04 -07:00
Arjan van de Ven
f9b9796ade Add a script to visualize the kernel boot process / time
When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with some of the initializing
going asynchronous soon, it's valuable to track/print which worker thread
is executing the initialization.

This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-12 08:07:20 -07:00
Alexey Dobriyan
53167a3ef2 proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-10 04:18:57 +04:00
Tejun Heo
55dc7db70a init: DEBUG_BLOCK_EXT_DEVT requires explicit root= param
DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers and root
device number set using rdev become meaningless.  Root devices should
be explicitly specified using textual names.  Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled.  Also, add warning
to the help text.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-09 08:56:11 +02:00
Linus Torvalds
96d746c68f Fix init/main.c to use regular printk with '%pF' for initcall fn
.. small detail, but the silly e1000e initcall warning debugging caused
me to look at this code.  Rather than gouge my eyes out with a spoon, I
just fixed it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-03 13:38:07 -07:00
Andi Kleen
9e94cd325b Move sysctl check into debugging section and don't make it default y
I noticed that sysctl_check.o was the largest object file in
a allnoconfig build in kernel/*.

  36243       0       0   36243    8d93 kernel/sysctl_check.o

This is because it was default y and && EMBEDDED. But I don't
really see a need for a non kernel developer to have their
sysctls checked all the time.

So move the Kconfig into the kernel debugging section and
also drop the default y and the EMBEDDED check.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-16 17:13:43 -07:00
Arjan van de Ven
59f9415ffb modules: extend initcall_debug functionality to the module loader
The kernel has this really nice facility where if you put "initcall_debug"
on the kernel commandline, it'll print which function it's going to
execute just before calling an initcall, and then after the call completes
it will

1) print if it had an error code

2) checks for a few simple bugs (like leaving irqs off)
and

3) print how long the init call took in milliseconds.

While trying to optimize the boot speed of my laptop, I have been loving
number 3 to figure out what to optimize...  ...  and then I wished that
the same thing was done for module loading.

This patch makes the module loader use this exact same functionality; it's
a logical extension in my view (since modules are just sort of late
binding initcalls anyway) and so far I've found it quite useful in finding
where things are too slow in my boot.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-08-12 17:52:54 +10:00
Robert P. J. Day
0b0de14433 Kconfig: Extend "menuconfig" for modules to simplify Kconfig file
Given that the init/Kconfig file uses a "menuconfig" directive for
modules already, might as well wrap all the submenu entries in an "if"
to toss all those dependencies.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-08-06 22:14:04 +02:00
Bartlomiej Zolnierkiewicz
b5b9309d34 remove unnecessary <linux/hdreg.h> includes
Following files don't need <linux/hdreg.h> at all:

- arch/mips/jazz/setup.c
- arch/sh/boards/mach-systemh/irq.c
- drivers/macintosh/mediabay.c
- drivers/scsi/hptiop.c
- drivers/usb/storage/freecom.c
- arch/powerpc/include/asm/ide.h
- init/main.c

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-08-05 18:16:58 +02:00
Linus Torvalds
e811603feb Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: scripts/ver_linux: don't set PATH
  Kconfig/init: change help text to match default value
  kbuild: genksyms: Include extern information in dumps
  kbuild: genksyms parser: fix the __attribute__ rule
  kbuild: scripts/genksyms/lex.l: add %option noinput
  kconfig: scripts/kconfig/zconf.l: add %option noinput
  kbuild: fix O=... build of um
2008-08-01 11:50:21 -07:00
Linus Torvalds
57b1494d2b Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  generic, x86: fix add iommu_num_pages helper function
  x86: remove stray <6> in BogoMIPS printk
  x86: move dma32_reserve_bootmem() after reserve_crashkernel()
2008-08-01 10:28:17 -07:00
jkacur
775a7229ac Kconfig/init: change help text to match default value
Change the "If unsure" message to match the default value.

Signed-off-by: John Kacur <jkacur at gmail dot com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-31 23:33:10 +02:00
Geert Uytterhoeven
bd673c7c3b initrd: cast initrd_start' to void *'
commit fb6624ebd9 (initrd: Fix virtual/physical
mix-up in overwrite test) introduced the compiler warning below on mips,
as its virt_to_page() doesn't cast the passed address to unsigned long
internally, unlike on most other architectures:

init/main.c: In function `start_kernel':
init/main.c:633: warning: passing argument 1 of `virt_to_phys' makes pointer from integer without a cast
init/main.c:636: warning: passing argument 1 of `virt_to_phys' makes pointer from integer without a cast

For now, kill the warning by explicitly casting initrd_start to `void *', as
that's the type it should really be.

Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30 09:41:45 -07:00
Ingo Molnar
35780c8ea7 Merge commit 'v2.6.27-rc1' into x86/urgent 2008-07-29 12:10:50 +02:00
Ingo Molnar
cb28a1bbdb Merge branch 'linus' into core/generic-dma-coherent
Conflicts:

	arch/x86/Kconfig

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-29 00:07:55 +02:00
Joe Perches
d7ba11d01c x86: remove stray <6> in BogoMIPS printk
Rabin Vincent noticed that there's a stray <6> in BogoMIPS printk:

> Remove the extra KERN_INFO which causes this:
> Calibrating delay loop... <6>179.40 BogoMIPS (lpj=897024)
> -	printk(KERN_INFO "%lu.%02lu BogoMIPS (lpj=%lu)\n",
> -			loops_per_jiffy/(500000/HZ),
> -			(loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
> +	printk("%lu.%02lu BogoMIPS (lpj=%lu)\n",
> +		loops_per_jiffy/(500000/HZ),
> +		(loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
>  }

How about just using KERN_CONT and leaving the whitespace
for a patch that does the entire file?

Reported-by: Rabin Vincent <rabin@rab.in>
2008-07-28 14:22:26 +02:00
Linus Torvalds
6948385cbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
  setlocalversion: do not describe if there is nothing to describe
  kconfig: fix typos: "Suport" -> "Support"
  kconfig: make defconfig is no longer chatty
  kconfig: make oldconfig is now less chatty
  kconfig: speed up all*config + randconfig
  kconfig: set all new symbols automatically
  kconfig: add diffconfig utility
  kbuild: remove Module.markers during mrproper
  kbuild: sparse needs CF not CHECKFLAGS
  kernel-doc: handle/strip __init
  vmlinux.lds: move __attribute__((__cold__)) functions back into final .text section
  init: fix URL of "The GNU Accounting Utilities"
  kbuild: add arch/$ARCH/include to search path
  kbuild: asm symlink support for arch/$ARCH/include
  kbuild: support arch/$ARCH/include for tags, cscope
  kbuild: prepare headers_* for arch/$ARCH/include
  kbuild: install all headers when arch is changed
  kbuild: make clean removes *.o.* as well
  kbuild: optimize headers_* targets
  kbuild: only one call for include/ in make headers_*
  ...
2008-07-27 09:59:59 -07:00
Adrian Bunk
f56f6d30c7 make init/do_mounts.c:root_device_name static
This patch makes the needlessly global root_device_name static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:12 -07:00
Eduard - Gabriel Munteanu
7babe8db99 Full conversion to early_initcall() interface, remove old interface
A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls.  Now we remove the older interface, converting all
existing users to the new one.

[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu
c2147a5092 Better interface for hooking early initcalls
Added early initcall (pre-SMP) support, using an identical interface to
that of regular initcalls.  Functions called from do_pre_smp_initcalls()
could be converted to use this cleaner interface.

This is required by CPU hotplug, because early users have to register
notifiers before going SMP.  One such CPU hotplug user is the relay
interface with buffer-only channels, which needs to register such a
notifier, to be usable in early code.  This in turn is used by kmemtrace.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Heikki Orsila
12d2b8f951 kconfig: fix typos: "Suport" -> "Support"
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-25 22:12:52 +02:00
S.Çağlar Onur
37a4c94074 init: fix URL of "The GNU Accounting Utilities"
Following patch corrects URL of "The GNU Accounting Utilities" in init/Kconfig.

Noticed by: Bart Van Assche" <bart.vanassche@gmail.com>

Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-25 22:12:36 +02:00
Adrian Bunk
3ae4eed34b proper pid{hash,map}_init() prototypes
This patch adds proper prototypes for pid{hash,map}_init() in
include/linux/pid_namespace.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:45 -07:00
Daniel Guilak
197dcffc8b init/version.c: define version_string only if CONFIG_KALLSYMS is not defined
int Version_* is only used with ksymoops, which is only needed (according
to README and Documentation/Changes) if CONFIG_KALLSYMS is NOT defined.
Therefore this patch defines version_string only if CONFIG_KALLSYMS is not
defined.

Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:29 -07:00
Daniel Guilak
277e2c6959 init/version.c: silence sparse warning by declaring the version string
Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:28 -07:00
Thomas Petazzoni
2d6ffcca62 inflate: refactor inflate malloc code
Inflate requires some dynamic memory allocation very early in the boot
process and this is provided with a set of four functions:
malloc/free/gzip_mark/gzip_release.

The old inflate code used a mark/release strategy rather than implement
free.  This new version instead keeps a count on the number of outstanding
allocations and when it hits zero, it resets the malloc arena.

This allows removing all the mark and release implementations and unifying
all the malloc/free implementations.

The architecture-dependent code must define two addresses:
 - free_mem_ptr, the address of the beginning of the area in which
   allocations should be made
 - free_mem_end_ptr, the address of the end of the area in which
   allocations should be made. If set to 0, then no check is made on
   the number of allocations, it just grows as much as needed

The architecture-dependent code can also provide an arch_decomp_wdog()
function call.  This function will be called several times during the
decompression process, and allow to notify the watchdog that the system is
still running.  If an architecture provides such a call, then it must
define ARCH_HAS_DECOMP_WDOG so that the generic inflate code calls
arch_decomp_wdog().

Work initially done by Matt Mackall, updated to a recent version of the
kernel and improved by me.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikael Starvik <mikael.starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:28 -07:00
Robert P. J. Day
cb345d7352 init/: delete hard-coded setting and testing of BUILD_CRAMDISK
There seems to be little point in explicitly setting, then testing the macro
BUILD_CRAMDISK within the context of a single source file.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Adrian Bunk
82c8253ac2 init/do_mounts.c should #include <linux/initrd.h>
Every file should include the headers containing the externs for its
global code (in this case for rd_doload).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Linus Torvalds
7f9dce3837 Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: hrtick_enabled() should use cpu_active()
  sched, x86: clean up hrtick implementation
  sched: fix build error, provide partition_sched_domains() unconditionally
  sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
  cpu hotplug: Make cpu_active_map synchronization dependency clear
  cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
  sched: rework of "prioritize non-migratable tasks over migratable ones"
  sched: reduce stack size in isolated_cpu_setup()
  Revert parts of "ftrace: do not trace scheduler functions"

Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
2008-07-23 19:36:53 -07:00
Johannes Berg
baabaae981 make CONFIG_KMOD invisible
... as preparation for removing it completely, make it an
invisible bool defaulting to yes.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:28 +10:00
Denys Vlasenko
f7f5b67557 Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module.c and module.h conatains code for finding
exported symbols which are declared with EXPORT_UNUSED_SYMBOL,
and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set
and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway
(because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then).

This patch adds required #ifdefs.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:27 +10:00
Geert Uytterhoeven
fb6624ebd9 initrd: Fix virtual/physical mix-up in overwrite test
On recent kernels, I get the following error when using an initrd:

| initrd overwritten (0x00b78000 < 0x07668000) - disabling it.

My Amiga 4000 has 12 MiB of RAM at physical address 0x07400000 (virtual
0x00000000).
The initrd is located at the end of RAM: 0x00b78000 - 0x00c00000 (virtual).
The overwrite test compares the (virtual) initrd location to the (physical)
first available memory location, which fails.

This patch converts initrd_start to a page frame number, so it can safely be
compared with min_low_pfn.

Before the introduction of discontiguous memory support on m68k
(12d810c1b8), min_low_pfn was just left
untouched by the m68k-specific code (zero, I guess), and everything worked
fine.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:24:40 -07:00
Ingo Molnar
f6dc8ccaab Merge branch 'linus' into core/generic-dma-coherent
Conflicts:

	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 21:13:20 +02:00
Max Krasnyansky
e761b77252 cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
This is based on Linus' idea of creating cpu_active_map that prevents
scheduler load balancer from migrating tasks to the cpu that is going
down.

It allows us to simplify domain management code and avoid unecessary
domain rebuilds during cpu hotplug event handling.

Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. Although I did simplfy and unify domain
reinitialization logic. We now simply call partition_sched_domains() in
all the cases. This means that we're using exact same code paths as in
cpusets case and hence the test below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in the separate patch (right next after this one).

This not only boots but also easily handles
	while true; do make clean; make -j 8; done
and
	while true; do on-off-cpu 1; done
at the same time.
(on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).

Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
this on right now in gnome-terminal and things are moving just fine.

Also this is running with most of the debug features enabled (lockdep,
mutex, etc) no BUG_ONs or lockdep complaints so far.

I believe I addressed all of the Dmitry's comments for original Linus'
version. I changed both fair and rt balancer to mask out non-active cpus.
And replaced cpu_is_offline() with !cpu_active() in the main scheduler
code where it made sense (to me).

Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Cc: dmitry.adamushko@gmail.com
Cc: pj@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 13:22:25 +02:00
Linus Torvalds
9c1be0c471 Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
  UBIFS: include to compilation
  UBIFS: add new flash file system
  UBIFS: add brief documentation
  MAINTAINERS: add UBIFS section
  do_mounts: allow UBI root device name
  VFS: export sync_sb_inodes
  VFS: move inode_lock into sync_sb_inodes
2008-07-16 15:02:57 -07:00
Linus Torvalds
59190f4213 Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
  generic-ipi: more merge fallout
  generic-ipi: merge fix
  x86, visws: use mach-default/entry_arch.h
  x86, visws: fix generic-ipi build
  generic-ipi: fixlet
  generic-ipi: fix s390 build bug
  generic-ipi: fix linux-next tree build failure
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix "smp_call_function: get rid of the unused nonatomic/retry argument"
  on_each_cpu(): kill unused 'retry' parameter
  smp_call_function: get rid of the unused nonatomic/retry argument
  sh: convert to generic helpers for IPI function calls
  parisc: convert to generic helpers for IPI function calls
  mips: convert to generic helpers for IPI function calls
  m32r: convert to generic helpers for IPI function calls
  arm: convert to generic helpers for IPI function calls
  alpha: convert to generic helpers for IPI function calls
  ia64: convert to generic helpers for IPI function calls
  powerpc: convert to generic helpers for IPI function calls
  ...

Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
2008-07-15 14:12:03 -07:00
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Ingo Molnar
6c9fcaf2ee Merge branch 'core/rcu' into core/rcu-for-linus 2008-07-15 21:10:12 +02:00
Adrian Hunter
2d62f48858 do_mounts: allow UBI root device name
Similarly to MTD devices, allow UBI devices.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-07-14 19:10:52 +03:00
Dmitry Baryshkov
ee7e5516be generic: per-device coherent dma allocator
Currently x86_32, sh and cris-v32 provide per-device coherent dma
memory allocator.

However their implementation is nearly identical. Refactor out
common code to be reused by them.

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 12:51:05 +02:00