ACPI defines a hardware signature. BIOS calculates the signature according to
hardware configure and if hardware changes while hibernated, the signature
will change. In that case, S4 resume should fail.
Still, there may be systems on which this mechanism does not work correctly,
so it is better to provide a workaround for them. For this reason, add a new
switch to the acpi_sleep= command line argument allowing one to disable
hardware signature checking.
[shaohua.li@intel.com: build fix]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <Valdis.Kletnieks@vt.edu>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boot-time test for system suspend states (STR or standby). The generic
RTC framework triggers wakeup alarms, which are used to exit those states.
- Measures some aspects of suspend time ... this uses "jiffies" until
someone converts it to use a timebase that works properly even while
timer IRQs are disabled.
- Triggered by a command line parameter. By default nothing even
vaguely troublesome will happen, but "test_suspend=mem" will give
you a brief STR test during system boot. (Or you may need to use
"test_suspend=standby" instead, if your hardware needs that.)
This isn't without problems. It fires early enough during boot that for
example both PCMCIA and MMC stacks have misbehaved. The workaround in
those cases was to boot without such media cards inserted.
[matthltc@us.ibm.com: fix compile failure in boot time suspend selftest]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of using the variable mmu_huge_psize to keep track of the huge
page size we use an array of MMU_PAGE_* values. For each supported huge
page size we need to know the hugepte_shift value and have a
pgtable_cache. The hstate or an mmu_huge_psizes index is passed to
functions so that they know which huge page size they should use.
The hugepage sizes 16M and 64K are setup(if available on the hardware) so
that they don't have to be set on the boot cmd line in order to use them.
The number of 16G pages have to be specified at boot-time though (e.g.
hugepagesz=16G hugepages=5).
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow configurations with the default huge page size which is different to
the traditional HPAGE_SIZE size. The default huge page size is the one
represented in the legacy /proc ABIs, SHM, and which is defaulted to when
mounting hugetlbfs filesystems.
This is implemented with a new kernel option default_hugepagesz=, which
defaults to HPAGE_SIZE if not specified.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an hugepagesz=... option similar to IA64, PPC etc. to x86-64.
This finally allows to select GB pages for hugetlbfs in x86 now that all
the infrastructure is in place.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boot initialisation is very complex, with significant numbers of
architecture-specific routines, hooks and code ordering. While significant
amounts of the initialisation is architecture-independent, it trusts the data
received from the architecture layer. This is a mistake, and has resulted in
a number of difficult-to-diagnose bugs.
This patchset adds some validation and tracing to memory initialisation. It
also introduces a few basic defensive measures. The validation code can be
explicitly disabled for embedded systems.
This patch:
Add additional debugging and verification code for memory initialisation.
Once enabled, the verification checks are always run and when required
additional debugging information may be outputted via a mminit_loglevel=
command-line parameter.
The verification code is placed in a new file mm/mm_init.c. Ideally other mm
initialisation code will be moved here over time.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softlockup: fix invalid proc_handler for softlockup_panic
softlockup: fix watchdog task wakeup frequency
softlockup: fix watchdog task wakeup frequency
softlockup: show irqtrace
softlockup: print a module list on being stuck
softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
softlockup: fix softlockup_thresh fix
softlockup: fix softlockup_thresh unaligned access and disable detection at runtime
softlockup: allow panic on lockup
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netfilter: nf_conntrack_sctp: fix sparse warnings
netfilter: nf_nat_sip: c= is optional for session
netfilter: xt_TCPMSS: collapse tcpmss_reverse_mtu{4,6} into one function
netfilter: nfnetlink_log: send complete hardware header
netfilter: xt_time: fix time's time_mt()'s use of do_div()
netfilter: accounting rework: ct_extend + 64bit counters (v4)
netlink: add NLA_PUT_BE64 macro
netfilter: nf_nat_core: eliminate useless find_appropriate_src for IP_NAT_RANGE_PROTO_RANDOM
hdlcdrv: Fix CRC calculation.
Revert "pkt_sched: Make default qdisc nonshared-multiqueue safe."
net: In __netif_schedule() use WARN_ON instead of BUG_ON
net: Improve simple_tx_hash().
pkt_sched: Remove unused variable skb in dev_deactivate_queue function.
sunhme: Remove stop/wake TX queue calls in set-multicast-list handler.
ucc_geth: do not touch net queue in adjust_link phylib callback
gianfar: do not touch net queue in adjust_link phylib callback
atl1: Do not wake queue before queue has been started.
Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for "connbytes" extension. However, 64bit counters
waste a lot of memory and it was not possible to enable/disable it runtime.
This patch:
- reimplements accounting with respect to the extension infrastructure,
- makes one global version of seq_print_acct() instead of two seq_print_counters(),
- makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n),
- makes it possible to enable/disable it at runtime by sysctl or sysfs,
- extends counters from 32bit to 64bit,
- renames ip_conntrack_counter -> nf_conn_counter,
- enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT),
- set initial accounting enable state based on CONFIG_NF_CT_ACCT
- removes buggy IPCT_COUNTER_FILLING event handling.
If accounting is enabled newly created connections get additional acct extend.
Old connections are not changed as it is not possible to add a ct_extend area
to confirmed conntrack. Accounting is performed for all connections with
acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct".
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not possible to enable the unknown_nmi_panic sysctl option
until init is run. It's useful to be able to panic the kernel
during boot too, this adds a parameter to enable this option.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is against linux-2.6-tip, branch pci-ioapic-boot-irq-quirks.
From: Stefan Assmann <sassmann@suse.de>
Subject: Introduce config option for pci reroute quirks
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS is introduced to
enable (or disable) the redirection of the interrupt handler to the boot
interrupt line by default. Depending on the existence of interrupt
masking / threaded interrupt handling in the kernel (vanilla, rt, ...)
and the maturity of the rerouting patch, users can enable or disable the
redirection by default.
This means that the reroute quirk can be applied to any kernel without
changing it.
Interrupt sharing could be increased if this option is enabled. However this
option is vital for threaded interrupt handling, as done by the RT kernel.
It should simplify the consolidation with the RT kernel.
The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute.
Signed-off-by: Stefan Assmann <sassmann@suse.de>
Signed-off-by: Olaf Dabrunz <od@suse.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jon Masters <jonathan@jonmasters.org>
Cc: Ihno Krumreich <ihno@suse.de>
Cc: Sven Dietrich <sdietrich@suse.de>
Cc: Daniel Gollub <dgollub@suse.de>
Cc: Felix Foerster <ffoerster@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
"idle=nomwait" disables the use of the MWAIT
instruction from both C1 (C1_FFH) and deeper (C2C3_FFH)
C-states.
When MWAIT is unavailable, the BIOS and OS generally
negotiate to use the HALT instruction for C1,
and use IO accesses for deeper C-states.
This option is useful for power and performance
comparisons, and also to work around BIOS bugs
where broken MWAIT support is advertised.
http://bugzilla.kernel.org/show_bug.cgi?id=10807http://bugzilla.kernel.org/show_bug.cgi?id=10914
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
"idle=halt" limits the idle loop to using
the halt instruction. No MWAIT, no IO accesses,
no C-states deeper than C1.
If something is broken in the idle code,
"idle=halt" is a less severe workaround
than "idle=poll" which disables all power savings.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (80 commits)
ide-floppy: fix unfortunate function naming
ide-tape: unify idetape_create_read/write_cmd
ide: add ide_pc_intr() helper
ide-{floppy,scsi}: read Status Register before stopping DMA engine
ide-scsi: add more debugging to idescsi_pc_intr()
ide-scsi: use pc->callback
ide-floppy: add more debugging to idefloppy_pc_intr()
ide-tape: always log debug info in idetape_pc_intr() if debugging is enabled
ide-tape: add ide_tape_io_buffers() helper
ide-tape: factor out DSC handling from idetape_pc_intr()
ide-{floppy,tape}: move checking of ->failed_pc to ->callback
ide: add ide_issue_pc() helper
ide: add PC_FLAG_DRQ_INTERRUPT pc flag
ide-scsi: move idescsi_map_sg() call out from idescsi_issue_pc()
ide: add ide_transfer_pc() helper
ide-scsi: set drive->scsi flag for devices handled by the driver
ide-{cd,floppy,tape}: remove checking for drive->scsi
ide: add PC_FLAG_ZIP_DRIVE pc flag
ide-tape: factor out waiting for good ireason from idetape_transfer_pc()
ide-tape: set PC_FLAG_DMA_IN_PROGRESS flag in idetape_transfer_pc()
...
* 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: add PCI ID for 6300ESB force hpet
x86: add another PCI ID for ICH6 force-hpet
kernel-paramaters: document pmtmr= command line option
acpi_pm clccksource: fix printk format warning
nohz: don't stop idle tick if softirqs are pending.
pmtmr: allow command line override of ioport
nohz: reduce jiffies polling overhead
hrtimer: Remove unused variables in ktime_divns()
hrtimer: remove warning in hres_timers_resume
posix-timers: print RT watchdog message
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (241 commits)
[ARM] 5171/1: ep93xx: fix compilation of modules using clocks
[ARM] 5133/2: at91sam9g20 defconfig file
[ARM] 5130/4: Support for the at91sam9g20
[ARM] 5160/1: IOP3XX: gpio/gpiolib support
[ARM] at91: Fix NAND FLASH timings for at91sam9x evaluation kits.
[ARM] 5084/1: zylonite: Register AC97 device
[ARM] 5085/2: PXA: Move AC97 over to the new central device declaration model
[ARM] 5120/1: pxa: correct platform driver names for PXA25x and PXA27x UDC drivers
[ARM] 5147/1: pxaficp_ir: drop pxa_gpio_mode calls, as pin setting
[ARM] 5145/1: PXA2xx: provide api to control IrDA pins state
[ARM] 5144/1: pxaficp_ir: cleanup includes
[ARM] pxa: remove pxa_set_cken()
[ARM] pxa: allow clk aliases
[ARM] Feroceon: don't disable BPU on boot
[ARM] Orion: LED support for HP mv2120
[ARM] Orion: add RD88F5181L-FXO support
[ARM] Orion: add RD88F5181L-GE support
[ARM] Orion: add Netgear WNR854T support
[ARM] s3c2410_defconfig: update for current build
[ARM] Acer n30: Minor style and indentation fixes.
...
x2apic support. Interrupt-remapping must be enabled before enabling x2apic,
this is needed to ensure that IO interrupts continue to work properly after the
cpu mode is changed to x2apic(which uses 32bit extended physical/cluster
apic id).
On systems where apicid's are > 255, BIOS can handover the control to OS in
x2apic mode. Or if the OS handover was in legacy xapic mode, check
if it is capable of x2apic mode. And if we succeed in enabling
Interrupt-remapping, then we can enable x2apic mode in the CPU.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce pci=ioapicreroute kernel cmdline option to enable rerouting of boot
interrupts to the primary io-apic.
Signed-off-by: Stefan Assmann <sassmann@suse.de>
Signed-off-by: Olaf Dabrunz <od@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Applies on top of the previous patch:
x86 boot: add code to add BIOS provided EFI memory entries to kernel
Instead of always adding EFI memory map entries (if present) to the
memory map after initially finding either E820 BIOS memory map entries
and/or kernel command line memmap entries, -instead- only add such
additional EFI memory map entries if the kernel boot option:
add_efi_memmap
is specified.
Requiring this 'add_efi_memmap' option is backward compatible with
kernels that didn't load such additional EFI memory map entries in
the first place, and it doesn't override a configuration that tries
to replace all E820 or EFI BIOS memory map entries with ones given
entirely on the kernel command line.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: "Yinghai Lu" <yhlu.kernel@gmail.com>
Cc: "Jack Steiner" <steiner@sgi.com>
Cc: "Mike Travis" <travis@sgi.com>
Cc: "Huang
Cc: Ying" <ying.huang@intel.com>
Cc: "Andi Kleen" <andi@firstfloor.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Document the kernel boot parameter: relax_domain_level=.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix stale references to source files in kernel-parameters.txt.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ACPI PM: Add possibility to change suspend sequence
There are some systems out there that don't work correctly with
our current suspend/hibernation code ordering. Provide a workaround
for these systems allowing them to pass 'acpi_sleep=old_ordering' in
the kernel command line so that it will use the pre-ACPI 2.0 ("old")
suspend code ordering.
Unfortunately, this requires us to add a platform hook to the
resuming of devices for recovering the platform in case one of the
device drivers' .suspend() routines returns error code. Namely,
ACPI 1.0 specifies that _PTS should be called before suspending
devices, but _WAK still should be called before resuming them in
order to undo the changes made by _PTS. However, if there is an
error during suspending devices, they are automatically resumed
without returning control to the PM core, so the _WAK has to be
called from within device_resume() in that cases.
The patch also reorders and refactors the ACPI suspend/hibernation
code to avoid duplication as far as reasonably possible.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Contention for scarce PCI memory resources has been growing
due to an increasing number of PCI slots in large multi-node
systems. The kernel currently attempts by default to
allocate memory for all PCI expansion ROMs so there has
also been an increasing number of PCI memory allocation
failures seen on these systems. This occurs because the
BIOS either (1) provides insufficient PCI memory resource
for all the expansion ROMs or (2) provides adequate PCI
memory resource for expansion ROMs but provides the
space in kernel unexpected BIOS assigned P2P non-prefetch
windows.
The resulting PCI memory allocation failures may be benign
when related to memory requests for expansion ROMs themselves
but in some cases they can occur when attempting to allocate
space for more critical BARs. This can happen when a successful
expansion ROM allocation request consumes memory resource
that was intended for a non-ROM BAR. We have seen this
happen during PCI hotplug of an adapter that contains a
P2P bridge where successful memory allocation for an
expansion ROM BAR on device behind the bridge consumed
memory that was intended for a non-ROM BAR on the P2P bridge.
In all cases the allocation failure messages can be very
confusing for users.
This patch provides a new 'pci=norom' kernel boot parameter
that can be used to disable the default PCI expansion ROM memory
resource allocation. This provides a way to avoid the above
described issues on systems that do not contain PCI devices
for which drivers or user-level applications depend on the
default PCI expansion ROM memory resource allocation behavior.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Loop through mtrr chunk_size and gran_size from 1M to 2G to find out
the optimal value so user does not need to add mtrr_chunk_size and
mtrr_gran_size to the kernel command line.
If optimal value is not found, print out all list to help select less
optimal value.
Add mtrr_spare_reg_nr= so user could set 2 instead of 1, if the card
need more entries.
v2: find the one with more spare entries
v3: fix hole_basek offset
v4: tight the compare between range and range_new
loop stop with 4g
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Mika Fischer <mika.fischer@zoopnet.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
some BIOS like to use continus MTRR layout, and X driver can not add
WB entries for graphical cards when 4g or more RAM installed.
the patch will change MTRR to discrete.
mtrr_chunk_size= could be used to have smaller continuous block to hold holes.
default is 256m, could be set according to size of graphics card memory.
mtrr_gran_size= could be used to send smallest mtrr block to avoid run out of MTRRs
v2: fix -1 for UC checking
v3: default to disable, and need use enable_mtrr_cleanup to enable this feature
skip the var state change warning.
remove next_basek in range_to_mtrr()
v4: correct warning mask.
v5: CONFIG_MTRR_SANITIZER
v6: fix 1g, 2g, 512 aligment with extra hole
v7: gran_sizek to prevent running out of MTRRs.
v8: fix hole_basek caculation caused when removing next_basek
gran_sizek using when basek is 0.
need to apply
[PATCH] x86: fix trimming e820 with MTRR holes.
right after this one.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
allow users to configure the softlockup detector to generate a panic
instead of a warning message.
high-availability systems might opt for this strict method (combined
with panic_timeout= boot option/sysctl), instead of generating
softlockup warnings ad infinitum.
also, automated tests work better if the system reboots reliably (into
a safe kernel) in case of a lockup.
The full spectrum of configurability is supported: boot option, sysctl
option and Kconfig option.
it's default-disabled.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cio_msg= is gone, also remove it from kernel-parameters.txt.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The sequence executed in check_sal_cache_flush:
- pend a timer interrupt
- call SAL_CACHE_FLUSH
- see if interrupt is still pending
can hang HP machines with buggy SAL_CACHE_FLUSH implementations.
Provide a kernel command-line argument to allow users skip this
check if desired. Using this parameter will force ia64_sal_cache_flush
to call ia64_pal_cache_flush() instead of SAL_CACHE_FLUSH.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86 PCI: call dmi_check_pciprobe()
x86/pci: add pci=skip_isa_align command lines.
x86/pci: remove flag in pci_cfg_space_size_ext
x86: fix section mismatch in pci_scan_bus
Remove the rest of the old mac_esp driver. Also ditch the rest of the
machw mechanism, it needs to be replaced by a fake openfirmware tree.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
so we don't align the io port start address for pci cards.
also move out dmi check out acpi.c, because it has nothing to do with acpi.
it could spare some calling when we have several peer root buses.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We can see an ever repeating problem pattern with objects of any kind in the
kernel:
1) freeing of active objects
2) reinitialization of active objects
Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore. One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.
While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause. This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.
The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.
Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.
The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash. Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.
Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.
The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.
The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list
Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation. When the sanity check triggers a warning message
and a stack trace is printed.
The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).
The core code enqueues the objects into hash buckets. The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a
global lock.
The debug code can be compiled in without being active. The runtime overhead
is minimal and could be optimized by asm alternatives. A kernel command line
option enables the debugging code.
Thanks to Ingo Molnar for review, suggestions and cleanup patches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a minimalistic braille screen reader support. This is meant to
be used by blind people e.g. on boot failures or when / cannot be mounted
etc and thus the userland screen readers can not work.
[akpm@linux-foundation.org: fix exports]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jiri Kosina <jikos@jikos.cz>
Cc: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a kernel parameter option to 'edd' to enable/disable BIOS Enhanced Disk
Drive Services. CONFIG_EDD_OFF disables EDD while still compiling EDD into
the kernel. Default behavior can be forced using 'edd=on' or 'edd=off' as
a kernel parameter.
[akpm@linux-foundation.org: fix kernel-parameters.txt]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for OLPC XO hardware. Open Firmware on XOs don't contain
the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also
adds functionality for running EC commands, and a CONFIG_OLPC.
A number of OLPC drivers depend upon CONFIG_OLPC.
olpc_ec_timeout is a hack to work around Embedded Controller bugs.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: geode_has_vsa build fix]
[akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
This patch is for batching up the flushing of the IOTLB for the DMAR
implementation found in the Intel VT-d hardware. It works by building a list
of to be flushed IOTLB entries and a bitmap list of which DMAR engine they are
from.
After either a high water mark (250 accessible via debugfs) or 10ms the list
of iova's will be reclaimed and the DMAR engines associated are IOTLB-flushed.
This approach recovers 15 to 20% of the performance lost when using the IOMMU
for my netperf udp stream benchmark with small packets. It can be disabled
with a kernel boot parameter "intel_iommu=strict".
Its use does weaken the IOMMU protections a bit.
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We currently keep 2 lists of PCI devices in the system, one in the
driver core, and one all on its own. This second list is sorted at boot
time, in "BIOS" order, to try to remain compatible with older kernels
(2.2 and earlier days). There was also a "nosort" option to turn this
sorting off, to remain compatible with even older kernel versions, but
that just ends up being what we have been doing from 2.5 days...
Unfortunately, the second list of devices is not really ever used to
determine the probing order of PCI devices or drivers[1]. That is done
using the driver core list instead. This change happened back in the
early 2.5 days.
Relying on BIOS ording for the binding of drivers to specific device
names is problematic for many reasons, and userspace tools like udev
exist to properly name devices in a persistant manner if that is needed,
no reliance on the BIOS is needed.
Matt Domsch and others at Dell noticed this back in 2006, and added a
boot option to sort the PCI device lists (both of them) in a
breadth-first manner to help remain compatible with the 2.4 order, if
needed for any reason. This option is not going away, as some systems
rely on them.
This patch removes the sorting of the internal PCI device list in "BIOS"
mode, as it's not needed at all anymore, and hasn't for many years.
I've also removed the PCI flags for this from some other arches that for
some reason defined them, but never used them.
This should not change the ordering of any drivers or device probing.
[1] The old-style pci_get_device and pci_find_device() still used this
sorting order, but there are very few drivers that use these functions,
as they are deprecated for use in this manner. If for some reason, a
driver rely on the order and uses these functions, the breadth-first
boot option will resolve any problem.
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- noexec32 is on by default for years already
- add noexec32 to kernel-parameters and fix noexec typo in there
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the security= boot parameter. This is done to avoid LSM
registration clashes in case of more than one bult-in module.
User can choose a security module to enable at boot. If no
security= boot parameter is specified, only the first LSM
asking for registration will be loaded. An invalid security
module name will be treated as if no module has been chosen.
LSM modules must check now if they are allowed to register
by calling security_module_enable(ops) first. Modify SELinux
and SMACK to do so.
Do not let SMACK register smackfs if it was not chosen on
boot. Smackfs assumes that smack hooks are registered and
the initial task security setup (swapper->security) is done.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
This option is obsolete and can be removed safely.
It allows us to remove the pci_get_device_reverse() function from the
PCI core.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Fix coding style in pci-dma_64.c and add stubs for documentation. I
hope someone fills the rest, I understand maybe off and soft...
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'docs' of git://git.lwn.net/linux-2.6:
Add additional examples in Documentation/spinlocks.txt
Move sched-rt-group.txt to scheduler/
Documentation: move rpc-cache.txt to filesystems/
Documentation: move nfsroot.txt to filesystems/
Spell out behavior of atomic_dec_and_lock() in kerneldoc
Fix a typo in highres.txt
Fixes to the seq_file document
Fill out information on patch tags in SubmittingPatches
Add the seq_file documentation
Documentation/ is a little large, and filesystems/ seems an obvious
place for this file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The effects of cgroup_disable=foo are:
- foo isn't auto-mounted if you mount all cgroups in a single hierarchy
- foo isn't visible as an individually mountable subsystem
As a result there will only ever be one call to foo->create(), at init time;
all processes will stay in this group, and the group will never be mounted on
a visible hierarchy. Any additional effects (e.g. not allocating metadata)
are up to the foo subsystem.
This doesn't handle early_init subsystems (their "disabled" bit isn't set be,
but it could easily be extended to do so if any of the early_init systems
wanted it - I think it would just involve some nastier parameter processing
since it would occur before the command-line argument parser had been run.
Hugh said:
Ballpark figures, I'm trying to get this question out rather than
processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
to the affected paths, booting with cgroup_disable=memory cuts that back to
1% overhead (due to slightly bigger struct page).
I'm no expert on distros, they may have no interest whatever in
CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
without it, or apply the cgroup_disable=memory patches.
Unix bench's execl test result on x86_64 was
== just after boot without mounting any cgroup fs.==
mem_cgorup=off : Execl Throughput 43.0 3150.1 732.6
mem_cgroup=on : Execl Throughput 43.0 2932.6 682.0
==
[lizf@cn.fujitsu.com: fix boot option parsing]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch defines kernel parameter "nptcg=". The parameter overrides max number
of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some time ago it turned out that our suspend code ordering broke some
NVidia-based systems that hung if _PTS was executed with one of the PCI
devices, specifically a USB controller, in a low power state.
Then, it was noticed that the suspend code ordering was not compliant
with ACPI 1.0, although it was compliant with ACPI 2.0 (and later), and
it was argued that the code had to be changed for that reason (ref.
http://bugzilla.kernel.org/show_bug.cgi?id=9528).
So we did, but evidently we did wrong, because it's now turning out that
some systems have been broken by this change. Refs:
http://bugzilla.kernel.org/show_bug.cgi?id=10340https://bugzilla.novell.com/show_bug.cgi?id=374217#c16
[ I said at that time that something like this might happend, but the
majority of people involved thought that it was improbable due to the
necessity to preserve the compliance of hardware with ACPI 1.0. ]
This actually is a quite serious regression from 2.6.24.
Moreover, the ACPI 1.0 ordering of suspend code introduced another issue
that I have only noticed recently. Namely, if the suspend of one of
devices fails, the already suspended devices will be resumed without
executing _WAK before, which leads to problems on some systems (for
example, in such situations thermal management is broken on my HP
nx6325). Consequently, it also breaks suspend debugging on the affected
systems.
Note also, that the requirement to execute _PTS before suspending
devices does not really make sense, because the device in question may
be put into a low power state at run time for a reason unrelated to a
system-wide suspend.
For the reasons outlined above, the change of the suspend ordering
should be reverted, which is done by the patch below.
[ Felix Möller: "I am the reporter from the original Novell Bug:
https://bugzilla.novell.com/show_bug.cgi?id=374217
I just tried current git head (two hours ago) with the patch (the one
from the beginning of this thread) from Rafael and without it. With
the patch my MacBook does suspend without it does not." ]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Felix Möller <felix@derklecks.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Old-world powermacs don't set L2CR or L3CR on processor upgrade cards.
This simple patch allows the setting of L3CR via a kernel parameter
(like the existing kernel parameter to set L2CR).
Signed-off-by: Robert Brose <bob@qbjnet.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Provide example for memmap exclude option (it is slightly strange and
non-trivial) and provide nice small HOWTO for people with bad memory.
Signed-off-by: Jan-Simon Moeller <dl9pf@gmx.de>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This essentially reverts commit 71fc47a9ad
("ACPI: basic initramfs DSDT override support"), because the code simply
isn't ready.
It did ugly things to the init sequence to populate the rootfs image
early, but that just ended up showing other problems with the whole
approach. The fact is, the VFS layer simply isn't initialized this
early, and the relevant ACPI code should either run much later, or this
shouldn't be done at all.
For 2.6.25, we'll just pick the latter option. We can revisit this
concept later if necessary.
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Markus Gaugusch <dsdt@gaugusch.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move 00-INDEX entries to power/00-INDEX (and add entry for
pm_qos_interface.txt).
Update references to moved filenames.
Fix some trailing whitespace.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch implements libata.force module parameter which can
selectively override ATA port, link and device configurations
including cable type, SATA PHY SPD limit, transfer mode and NCQ.
For example, you can say "use 1.5Gbps for all fan-out ports attached
to the second port but allow 3.0Gbps for the PMP device itself, oh,
the device attached to the third fan-out port chokes on NCQ and
shouldn't go over UDMA4" by the following.
libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch removes the mca-pentium boot option that was a noop.
besides the source code cleanup factor, this saves some text as well:
arch/x86/kernel/cpu/bugs.o:
text data bss dec hex filename
651 77 4 732 2dc bugs.o.before
631 53 4 688 2b0 bugs.o.after
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The scheduled removal of the 'time' option.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The acpi_no_initrd_override parameter permits to disable the load of an ACPI
table from the initramfs.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Len Brown <len.brown@intel.com>
with module_param macro, the __setup code can be killed now:
const __setup("all-generic-ide", ide_generic_all_on);
and the module name "generic.ko" is not descriptive to its functionality,
can be changed in Makefile, the "ide-pci-generic.ko" is better.
the ide-pci-generic.all-generic-ide parameter also documented
in Documentation/kernel-parameters.txt
Signed-off-by: Denis Cheng <crquan@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The ACPI 1.0 specification wants us to put devices into low power
states after executing the _PTS global control method, while ACPI
2.0 and later want us to do that in the reverse order. The current
suspend code follows ACPI 2.0 in that respect which causes some
ACPI 1.0x systems to hang during suspend (ref.
http://bugzilla.kernel.org/show_bug.cgi?id=9528).
Make the suspend code execute _PTS before putting devices into low
power states (ie. in accordance with ACPI 1.0x) and provide a command
line option to override the default if need be.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
The new "mfgptfix" boot command line option may be usd to fix MFGPT
timers on AMD Geode platforms when the BIOS has incorrectly applied
a workaround. TinyBIOS version 0.98 is known to be affected, 0.99
fixes the problem by letting the user disable the workaround.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
when MTRRs are not covering the whole e820 table, we need to trim the
RAM and need to update e820.
reuse some code on 64-bit as well.
here need to add early_get_cap and use it in early_cpu_detect, and move
mtrr_bp_init early.
The code successfully trimmed the memory map on Justin's system:
from:
[ 0.000000] BIOS-e820: 0000000100000000 - 000000022c000000 (usable)
to:
[ 0.000000] modified: 0000000100000000 - 0000000228000000 (usable)
[ 0.000000] modified: 0000000228000000 - 000000022c000000 (reserved)
According to Justin it makes quite a difference:
| When I boot the box without any trimming it acts like a 286 or 386,
| takes about 10 minutes to boot (using raptor disks).
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Tested-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add a generic option to clear any cpuid bit. I added it because it was
very easy to add with the new generic cpuid disable bitmap and perhaps
it will be useful in the future.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To disable CLFLUSH usage, especially in change_page_attr().
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On some machines, buggy BIOSes don't properly setup WB MTRRs to cover all
available RAM, meaning the last few megs (or even gigs) of memory will be
marked uncached. Since Linux tends to allocate from high memory addresses
first, this causes the machine to be unusably slow as soon as the kernel
starts really using memory (i.e. right around init time).
This patch works around the problem by scanning the MTRRs at boot and
figuring out whether the current end_pfn value (setup by early e820 code)
goes beyond the highest WB MTRR range, and if so, trimming it to match. A
fairly obnoxious KERN_WARNING is printed too, letting the user know that
not all of their memory is available due to a likely BIOS bug.
Something similar could be done on i386 if needed, but the boot ordering
would be slightly different, since the MTRR code on i386 depends on the
boot_cpu_data structure being setup.
This patch fixes a bug in the last patch that caused the code to run on
non-Intel machines (AMD machines apparently don't need it and it's untested
on other non-Intel machines, so best keep it off).
Further enhancements and fixes from:
Yinghai Lu <Yinghai.Lu@Sun.COM>
Andi Kleen <ak@suse.de>
Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Tested-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For K8 system: 4G RAM with memory hole remapping enabled, or more than
4G RAM installed.
when try to use kexec second kernel, and the first doesn't include
gart_shutdown. the second kernel could have different aper position than
the first kernel. and second kernel could use that hole as RAM that is
still used by GART set by the first kernel. esp. when try to kexec
2.6.24 with sparse mem enable from previous kernel (from RHEL 5 or SLES
10). the new kernel will use aper by GART (set by first kernel) for
vmemmap. and after new kernel setting one new GART. the position will be
real RAM. the _mapcount set is lost.
Bad page state in process 'swapper'
page:ffffe2000e600020 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 0, comm: swapper Not tainted 2.6.24-rc7-smp-gcdf71a10-dirty #13
Call Trace:
[<ffffffff8026401f>] bad_page+0x63/0x8d
[<ffffffff80264169>] __free_pages_ok+0x7c/0x2a5
[<ffffffff80ba75d1>] free_all_bootmem_core+0xd0/0x198
[<ffffffff80ba3a42>] numa_free_all_bootmem+0x3b/0x76
[<ffffffff80ba3461>] mem_init+0x3b/0x152
[<ffffffff80b959d3>] start_kernel+0x236/0x2c2
[<ffffffff80b9511a>] _sinittext+0x11a/0x121
and
[ffffe2000e600000-ffffe2000e7fffff] PMD ->ffff81001c200000 on node 0
phys addr is : 0x1c200000
RHEL 5.1 kernel -53 said:
PCI-DMA: aperture base @ 1c000000 size 65536 KB
new kernel said:
Mapping aperture over 65536 KB of RAM @ 3c000000
So could try to disable that GART if possible.
According to Ingo
> hm, i'm wondering, instead of modifying the GART, why dont we simply
> _detect_ whatever GART settings we have inherited, and propagate that
> into our e820 maps? I.e. if there's inconsistency, then punch that out
> from the memory maps and just dont use that memory.
>
> that way it would not matter whether the GART settings came from a [old
> or crashing] Linux kernel that has not called gart_iommu_shutdown(), or
> whether it's a BIOS that has set up an aperture hole inconsistent with
> the memory map it passed. (or the memory map we _think_ i tried to pass
> us)
>
> it would also be more robust to only read and do a memory map quirk
> based on that, than actively trying to change the GART so early in the
> bootup. Later on we have to re-enable the GART _anyway_ and have to
> punch a hole for it.
>
> and as a bonus, we would have shored up our defenses against crappy
> BIOSes as well.
add e820 modification for gart inconsistent setting.
gart_fix_e820=off could be used to disable e820 fix.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The 32 bit x86 tree has a very useful feature that prints the Code: line
for the code even before the trapping instrution (and the start of the
trapping instruction is then denoted with a <>). Unfortunately, the 64 bit
x86 tree does not yet have this feature, making diagnosing backtraces harder
than needed.
This patch adds this feature in the same was as the 32 bit tree has
(including the same kernel boot parameter), and including a bugfix
to make the code use probe_kernel_address() rarther than a buggy (deadlocking)
__get_user.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
support according to fixes of x86_64 support.
- Delete efi_rt_lock because it is used during system early boot,
before SMP is initialized.
- Change local_flush_tlb() to __flush_tlb_all() to flush global page
mapping.
- Clean up includes.
- Revise Kconfig description.
- Enable noefi kernel parameter on i386.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This makes x86_64's ia32 emulation support share the sources used in the
32-bit kernel for the 32-bit vDSO and much of its setup code.
The 32-bit vDSO mapping now behaves the same on x86_64 as on native 32-bit.
The abi.syscall32 sysctl on x86_64 now takes the same values that
vm.vdso_enabled takes on the 32-bit kernel. That is, 1 means a randomized
vDSO location, 2 means the fixed old address. The CONFIG_COMPAT_VDSO
option is now available to make this the default setting, the same meaning
it has for the 32-bit kernel. (This does not affect the 64-bit vDSO.)
The argument vdso32=[012] can be used on both 32-bit and 64-bit kernels to
set this paramter at boot time. The vdso=[012] argument still does this
same thing on the 32-bit kernel.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
various changes to the in_p/out_p delay details:
- add the io_delay=none method
- make each method selectable from the kernel config
- simplify the delay code a bit by getting rid of an indirect function call
- add the /proc/sys/kernel/io_delay_type sysctl
- change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed
- make the io delay config not depend on CONFIG_DEBUG_KERNEL
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "David P. Reed" <dpreed@reed.com>
x86: provide a DMI based port 0x80 I/O delay override.
Certain (HP) laptops experience trouble from our port 0x80 I/O delay
writes. This patch provides for a DMI based switch to the "alternate
diagnostic port" 0xed (as used by some BIOSes as well) for these.
David P. Reed confirmed that port 0xed works for him and provides a
proper delay. The symptoms of _not_ working are a hanging machine,
with "hwclock" use being a direct trigger.
Earlier versions of this attempted to simply use udelay(2), with the
2 being a value tested to be a nicely conservative upper-bound with
help from many on the linux-kernel mailinglist but that approach has
two problems.
First, pre-loops_per_jiffy calibration (which is post PIT init while
some implementations of the PIT are actually one of the historically
problematic devices that need the delay) udelay() isn't particularly
well-defined. We could initialise loops_per_jiffy conservatively (and
based on CPU family so as to not unduly delay old machines) which
would sort of work, but...
Second, delaying isn't the only effect that a write to port 0x80 has.
It's also a PCI posting barrier which some devices may be explicitly
or implicitly relying on. Alan Cox did a survey and found evidence
that additionally some drivers may be racy on SMP without the bus
locking outb.
Switching to an inb() makes the timing too unpredictable and as such,
this DMI based switch should be the safest approach for now. Any more
invasive changes should get more rigid testing first. It's moreover
only very few machines with the problem and a DMI based hack seems
to fit that situation.
This also introduces a command-line parameter "io_delay" to override
the DMI based choice again:
io_delay=<standard|alternate>
where "standard" means using the standard port 0x80 and "alternate"
port 0xed.
This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and
command-line ("io_delay=udelay") choice for testing purposes as well.
This does not change the io_delay() in the boot code which is using
the same port 0x80 I/O delay but those do not appear to be a problem
as David P. Reed reported the problem was already gone after using the
udelay version. He moreover reported that booting with "acpi=off" also
fixed things and seeing as how ACPI isn't touched until after this DMI
based I/O port switch I believe it's safe to leave the ones in the boot
code be.
The DMI strings from David's HP Pavilion dv9000z are in there already
and we need to get/verify the DMI info from other machines with the
problem, notably the HP Pavilion dv6000z.
This patch is partly based on earlier patches from Pavel Machek and
David P. Reed.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Information about a ccw device will be dumped in
case of a ccw timeout. This can be enabled with
the kernel parameter ccw_timeout_log.
Signed-off-by: Sebastian Ott <sebott@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
[SCSI] usbstorage: use last_sector_bug flag universally
[SCSI] libsas: abstract STP task status into a function
[SCSI] ultrastor: clean up inline asm warnings
[SCSI] aic7xxx: fix firmware build
[SCSI] aacraid: fib context lock for management ioctls
[SCSI] ch: remove forward declarations
[SCSI] ch: fix device minor number management bug
[SCSI] ch: handle class_device_create failure properly
[SCSI] NCR5380: fix section mismatch
[SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
[SCSI] IB/iSER: add logical unit reset support
[SCSI] don't use __GFP_DMA for sense buffers if not required
[SCSI] use dynamically allocated sense buffer
[SCSI] scsi.h: add macro for enclosure bit of inquiry data
[SCSI] sd: add fix for devices with last sector access problems
[SCSI] fix pcmcia compile problem
[SCSI] aacraid: add Voodoo Lite class of cards.
[SCSI] aacraid: add new driver features flags
[SCSI] qla2xxx: Update version number to 8.02.00-k7.
[SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
...
Change the NMI handler to use the die notifier chain to signal anyone
who cares. Add a simple "nmi debugger" which hooks into this chain and
that may dump registers, task state, etc. when it happens.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
This adds the hugepagesz boot-time parameter for ppc64. It lets one
pick the size for huge pages. The choices available are 64K and 16M
when the base page size is 4k. It defaults to 16M (previously the
only only choice) if nothing or an invalid choice is specified.
Tested 64K huge pages successfully with the libhugetlbfs 1.2.
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Minor corrections and additions to 'scsi_logging_level', as pointed out
by Chuck Ebbert.
Also point out the IBM S390-tools 'scsi_logging_level' script.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The console is now by default in UTF-8 mode. Fix the documentation on
the default value, so that we can explain behaviour that otherwise
causes bug-reports like this:
http://bugzilla.kernel.org/show_bug.cgi?id=9319
Also add the needed "vt." prefix, so that the boot-time config options
to switch back to the legacy 8-bit mode is actually documented
correctly.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Start in POLL mode, and if we receive confirmation GPE,
switch to INT mode.
If confirmations are not sent, switch back to POLL.
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes
the limitation in Documentation/kernel-parameters.txt and prints a
warning at boot-time if profile=sleep is used without CONFIG_SCHEDSTAT.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds a quirk from LinuxBIOS to force enable HPET on
the nVidia CK804 (nForce 4) chipset.
This quirk can very likely support more than just nForce 4
(LinuxBIOS use the same code for nForce 5), and possibly nForce 3,
but I don't have those chipsets, so cannot add and test them.
Tested on an Abit KN9 (CK804).
Signed-off-by: Carlos Corbacho <cathectic@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Documentation/kernel-parameters.txt | 3 +-
arch/x86/kernel/quirks.c | 37 +++++++++++++++++++++++++++++++++++-
2 files changed, 38 insertions(+), 2 deletions(-)
Actual intel IOMMU driver. Hardware spec can be found at:
http://www.intel.com/technology/virtualization
This driver sets X86_64 'dma_ops', so hook into standard DMA APIs. In this
way, PCI driver will get virtual DMA address. This change is transparent to
PCI drivers.
[akpm@linux-foundation.org: remove unneeded cast]
[akpm@linux-foundation.org: build fix]
[bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
fix do_sys_open() prototype
sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
Documentation: Fix typo in SubmitChecklist.
Typo: depricated -> deprecated
Add missing profile=kvm option to Documentation/kernel-parameters.txt
fix typo about TBI in e1000 comment
proc.txt: Add /proc/stat field
small documentation fixes
Fix compiler warning in smount example program from sharedsubtree.txt
docs/sysfs: add missing word to sysfs attribute explanation
documentation/ext3: grammar fixes
Documentation/java.txt: typo and grammar fixes
Documentation/filesystems/vfs.txt: typo fix
include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
trivial copy_data_pages() tidy up
Fix typo in arch/x86/kernel/tsc_32.c
file link fix for Pegasus USB net driver help
remove unused return within void return function
Typo fixes retrun -> return
x86 hpet.h: remove broken links
...
Whilst looking up what profile=sleep did, I noticed that we missed
adding docs for the most recent addition to the profiler.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (33 commits)
x86: convert cpuinfo_x86 array to a per_cpu array
x86: introduce frame_pointer() and stack_pointer()
x86 & generic: change to __builtin_prefetch()
i386: do not BUG_ON() when MSR is unknown
x86: acpi use cpu_physical_id
x86: convert cpu_llc_id to be a per cpu variable
x86: convert cpu_to_apicid to be a per cpu variable
i386: introduce "used_vectors" bitmap which can be used to reserve vectors.
x86: use raw locks during oopses
x86: honor _PAGE_PSE bit on page walks
i386: do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
x86: implement missing x86_64 function smp_call_function_mask()
x86: use descriptor's functions instead of inline assembly
i386: consolidate show_regs and show_registers for i386
i386: make callgraph use dump_trace() on i386/x86_64
x86: enable iommu_merge by default
i386: i386 add AMD64 Barcelona PMU MSR definitions to msr.h
x86: Unify i386 and x86-64 early quirks
x86: enable HPET on ICH3 and ICH4
x86: force enable HPET on VT8235/8237 chipsets
...
Manually fix trivial conflict with task pid container helper changes in
arch/x86/kernel/process_32.c
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
given the following:
$ grep -rw sg_def_reserved_size *
Documentation/kernel-parameters.txt: sg_def_reserved_size= [SCSI]
$
that kernel parameter looks exceedingly dead.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
>From commit cd8c93a4e0:
"Similar functionality was turned on by acpi_fake_ecdt=1 command line
before. Now it is on all the time."
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Since the OSS-related "sb" kernel parameter doesn't seem to be
supported, drop its reference from the kernel-parameters.txt file.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
This adds the documentation for the extended crashkernel syntax into
Documentation/kdump/kdump.txt.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
add force_hpet boot option.
(this will be useful to make the forced-enable quirks depend on.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop
the serial console from being suspended when the rest of the machine goes
to sleep. This is incredibly useful for debugging power management-related
things; however, having it as a compile-time option has proved to be
incredibly inconvenient for us (OLPC). There are plenty of times that we
want serial console to not suspend, but for the most part we'd like serial
console to be suspended.
This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel
boot parameter (no_console_suspend). By default, the serial console will
be suspended along with the rest of the system; by passing
'no_console_suspend' to the kernel during boot, serial console will remain
alive during suspend.
For now, this is pretty serial console specific; further fixes could be
applied to make this work for things like netconsole.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the = into the __setup line.
Document the option in kernel-parameters.txt by adding a pointer
to the x86-64 specific documentation.
[ tglx: arch/x86 adaptation ]
Pointed out by Robert Day
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Convert LSM into a static interface, as the ability to unload a security
module is not required by in-tree users and potentially complicates the
overall security architecture.
Needlessly exported LSM symbols have been unexported, to help reduce API
abuse.
Parameters for the capability and root_plug modules are now specified
at boot.
The SECURITY_FRAMEWORK_VERSION macro has also been removed.
In a nutshell, there is no safe way to unload an LSM. The modular interface
is thus unecessary and broken infrastructure. It is used only by out-of-tree
modules, which are often binary-only, illegal, abusive of the API and
dangerous, e.g. silently re-vectoring SELinux.
[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: USB Kconfig fix]
[randy.dunlap@oracle.com: fix LSM kernel-doc]
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the "ramdisk" kernel parameter has been officially deprecated
since at least 2.6.18, might as well finally get rid of it.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add logo.nologo kernel boot option to disable the logo in order to provide
more screen space for kernel messages; especially useful when debugging and
screen space is more critical.
newport_con driver changes are untested.
[akpm@linux-foundation.org: cleanups, coding-style fixes]
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Optionally add a boot delay after each kernel printk() call, crudely
measured in milliseconds, with a maximum delay of 10 seconds per printk.
Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.):
"lpj=loops_per_jiffy boot_delay=100"
to the kernel command line.
It has been useful in cases like "during boot, my machine just reboots or the
screen goes black" by slowing down printk, (and adding initcall_debug), we can
usually see the last thing that happened before the lights went out which is
usually a valuable clue.
[akpm@linux-foundation.org: not all architectures implement CONFIG_HZ]
[akpm@linux-foundation.org: fix lots of stuff]
[bunk@stusta.de: kernel/printk.c: make 2 variables static]
[heiko.carstens@de.ibm.com: fix slow down printk on boot compile error]
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits)
Input: use full RCU API
Input: remove tsdev interface
Input: add support for Blackfin BF54x Keypad controller
Input: appletouch - another fix for idle reset logic
HWMON: hdaps - switch to using input-polldev
Input: add support for SEGA Dreamcast keyboard
Input: omap-keyboard - don't pretend we support changing keymap
Input: lifebook - fix X and Y axis range
Input: usbtouchscreen - add support for GeneralTouch devices
Input: fix open count handling in input interfaces
Input: keyboard - add CapsShift lock
Input: adbhid - produce all CapsLock key events
Input: ALPS - add signature for ThinkPad R61
Input: jornada720_kbd - send MSC_SCAN events
Input: add support for the HP Jornada 7xx (710/720/728) touchscreen
Input: add support for HP Jornada 7xx onboard keyboard
Input: add support for HP Jornada onboard keyboard (HP6XX)
Input: ucb1400_ts - use schedule_timeout_uninterruptible
Input: xpad - fix dependancy on LEDS class
Input: auto-select INPUT for MAC_EMUMOUSEBTN option
...
Resolved conflicts manually in drivers/hwmon/applesmc.c: converting from
a class device to a device and converting to use input-polldev created a
few apparently trivial clashes..
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
NFSv4: Fix a typo in nfs_inode_reclaim_delegation
NFS: Add a boot parameter to disable 64 bit inode numbers
NFS: nfs_refresh_inode should clear cache_validity flags on success
NFS: Fix a connectathon regression in NFSv3 and NFSv4
NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
SUNRPC: Don't call xprt_release in call refresh
SUNRPC: Don't call xprt_release() if call_allocate fails
SUNRPC: Fix buggy UDP transmission
[23/37] Clean up duplicate includes in
[2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
SUNRPC: Use correct type in buffer length calculations
SUNRPC: Fix default hostname created in rpc_create()
nfs: add server port to rpc_pipe info file
NFS: Get rid of some obsolete macros
NFS: Simplify filehandle revalidation
NFS: Ensure that nfs_link() returns a hashed dentry
NFS: Be strict about dentry revalidation when doing exclusive create
NFS: Don't zap the readdir caches upon error
NFS: Remove the redundant nfs_reval_fsid()
NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
...
Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (119 commits)
[libata] struct pci_dev related cleanups
libata: use ata_exec_internal() for PMP register access
libata: implement ATA_PFLAG_RESETTING
libata: add @timeout to ata_exec_internal[_sg]()
ahci: fix notification handling
ahci: clean up PORT_IRQ_BAD_PMP enabling
ahci: kill leftover from enabling NCQ over PMP
libata: wrap schedule_timeout_uninterruptible() in loop
libata: skip suppress reporting if ATA_EHI_QUIET
libata: clear ehi description after initial host report
pata_jmicron: match vendor and class code only
libata: add ST9160821AS / 3.ALD to NCQ blacklist
pata_acpi: ACPI driver support
libata-core: Expose gtm methods for driver use
libata: add HDT722516DLA380 to NCQ blacklist
libata: blacklist NCQ on Seagate Barracuda ST380817AS
[libata] Turn on ACPI by default
libata_scsi: Fix ATAPI transfer lengths
libata: correct handling of SRST reset sequences
libata: Integrate ACPI-based PATA/SATA hotplug - version 5
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (37 commits)
PCI: merge almost all of pci_32.h and pci_64.h together
PCI: X86: Introduce and enable PCI domain support
PCI: Add 'nodomains' boot option, and pci_domains_supported global
PCI: modify PCI bridge control ISA flag for clarity
PCI: use _CRS for PCI resource allocation
PCI: avoid P2P prefetch window for expansion ROMs
PCI: skip ISA ioresource alignment on some systems
PCI: remove transparent bridge sizing
pci: write file size to inode on proc bus file write
pci: use size stored in proc_dir_entry for proc bus files
pci: implement "pci=noaer"
PCI: fix IDE legacy mode resources
MSI: Use correct data offset for 32-bit MSI in read_msi_msg()
PCI: Fix incorrect argument order to list_add_tail() in PCI dynamic ID code
PCI: i386: Compaq EVO N800c needs PCI bus renumbering
PCI: Remove no longer correct documentation regarding MSI vector assignment
PCI: re-enable onboard sound on "MSI K8T Neo2-FIR"
PCI: quirk_vt82c586_acpi: Omit reading PCI revision ID
PCI: quirk amd_8131_mmrbc: Omit reading pci revision ID
cpqphp: Use PCI_CLASS_REVISION instead of PCI_REVISION_ID for read
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
PM: merge device power-management source files
sysfs: add copyrights
kobject: update the copyrights
kset: add some kerneldoc to help describe what these strange things are
Driver core: rename ktype_edd and ktype_efivar
Driver core: rename ktype_driver
Driver core: rename ktype_device
Driver core: rename ktype_class
driver core: remove subsystem_init()
sysfs: move sysfs file poll implementation to sysfs_open_dirent
sysfs: implement sysfs_open_dirent
sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
sysfs: make sysfs_root a regular directory dirent
sysfs: open code sysfs_attach_dentry()
sysfs: make s_elem an anonymous union
sysfs: make bin attr open get active reference of parent too
sysfs: kill unnecessary NULL pointer check in sysfs_release()
sysfs: kill unnecessary sysfs_get() in open paths
sysfs: reposition sysfs_dirent->s_mode.
sysfs: kill sysfs_update_file()
...
* Introduce pci_domains_supported global, hardcoded to zero if
!CONFIG_PCI_DOMAINS.
* Introduce 'nodomains' boot option, which clears pci_domains_supported
on platforms that enable it by default (x86, x86-64, and others when
they are converted to use this).
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use _CRS for PCI resource allocation
This patch resolves an issue where incorrect PCI memory and i/o ranges
are being assigned to hotplugged PCI devices on some IBM systems. The
resource mis-allocation not only makes the PCI device unuseable but
often makes the entire system unuseable due to resulting machine checks.
The hotplug capable PCI slots on the affected systems are not located
under a standard P2P bridge but are instead located under PCI root
bridges or subtractive decode P2P bridges. For example, the IBM x3850
contains 2 hotplug capable PCI-X slots and 4 hotplug capable PCIe slots
with the PCI-X slots each located under a PCI root bridge and the PCIe
slots each located under a subtractive decode P2P bridge.
The current i386/x86_64 PCI resource allocation code does not use _CRS
returned resource information. No other resource information source is
available for slots that are not below a standard P2P bridge so
incorrect ranges are being allocated from e820 hole causing the bad
result.
This patch causes the kernel to use _CRS returned resource info. It is
roughly based on a change provided by Matthew Wilcox for the ia64 kernel
in 2005. Due to possible buggy BIOS factor and possible yet to be
discovered kernel issues the function is disabled by default and can be
enabled with pci=use_crs.
Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For cases in which CONFIG_PCIEAER=y (such as distro kernels), allow users
to disable PCIE Advanced Error Reporting by using "pci=noaer" on the
kernel command line.
This can be used to work around hardware or (kernel) software problems.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add support for an MFGPT clock event device; this allows us to use MFGPTs as
the basis for high-resolution timers.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds support for Multi-Function General Purpose Timers. It detects the
available timers during southbridge init, and provides an API for allocating
and setting the timers. They're higher resolution than the standard PIT, so
the MFGPTs come in handy for quite a few things.
Note that we never clobber the timers that the BIOS might have opted to use;
we just check for unused timers.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This boot parameter will allow legacy 32-bit applications which call stat()
to continue to function even if the NFSv3/v4 server uses 64-bit inode
numbers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since this boot-time option was removed in commit
9ab7e323af, delete the reference to it.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Since this boot-time option was removed in commit
9ab7e323af, delete the reference to it.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled.
However, in ACPI mode, these parameters didn't completely disable
the IO APIC initialization code and boot failed.
init/main.c:
Disable the IO_APIC if "nosmp" or "maxcpus=0"
undefine disable_ioapic_setup() when it doesn't apply.
i386:
delete ioapic_setup(), it was a duplicate of parse_noapic()
delete undefinition of disable_ioapic_setup()
x86_64:
rename disable_ioapic_setup() to parse_noapic() to match i386
define disable_ioapic_setup() in header to match i386
http://bugzilla.kernel.org/show_bug.cgi?id=1641
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Some hardware will malfunction at a temperature below
the BIOS provided critical shutdown threshold.
This hook allows moving the critical trip points down
to a temperature which provokes a graceful shutdown
before the hardware malfunction.
http://bugzilla.kernel.org/show_bug.cgi?id=8884
WARNING: A trip-point override will not get noticed
until the system delivers a temperature change event,
or unless thermal zone polling is enabled.
eg. "thermal.tzp=10"
Signed-off-by: Len Brown <len.brown@intel.com>
thermal.act=-1 disables all active trip points
in all ACPI thermal zones.
thermal.act=C, where C > 0, overrides all lowest temperature
active trip points in all thermal zones to C degrees Celsius.
Raising this trip-point may allow you to keep your system silent
up to a higher temperature. However, it will not allow you to
raise the lowest temperature trip point above the next higher
trip point (if there is one). Lowering this trip point may
kick in the fan sooner.
Note that overriding this trip-point will disable any BIOS attempts
to implement hysteresis around the lowest temperature trip point.
This may result in the fan starting and stopping frequently
if temperature frequently crosses C.
WARNING: raising trip points above the manufacturer's defaults
may cause the system to run at higher temperature and shorten
its life.
Signed-off-by: Len Brown <len.brown@intel.com>
thermal.nocrt=1 disables actions on _CRT and _HOT
ACPI thermal zone trip-points. They will be marked
as <disabled> in /proc/acpi/thermal_zone/*/trip_points.
There are two cases where this option is used:
1. Debugging a hot system crossing valid trip point.
If your system fan is spinning at full speed,
be sure that the vent is not clogged with dust.
Many laptops have very fine thermal fins that are easily blocked.
Check that the processor fan-sink is properly seated,
has the proper thermal grease, and is really spinning.
Check for fan related options in BIOS SETUP.
Sometimes there is a performance vs quiet option.
Defaults are generally the most conservative.
If your fan is not spinning, yet /proc/acpi/fan/
has files in it, please file a Linux/ACPI bug.
WARNING: you risk shortening the lifetime of your
hardware if you use this parameter on a hot system.
Note that this refers to all system components,
including the disk drive.
2. Working around a cool system crossing critical
trip point due to erroneous temperature reading.
Try again with CONFIG_HWMON=n
There is known potential for conflict between the
the hwmon sub-system and the ACPI BIOS.
If this fixes it, notify lm-sensors@lm-sensors.org
and linux-acpi@vger.kernel.org
Otherwise, file a Linux/ACPI bug, or notify
just linux-acpi@vger.kernel.org.
Signed-off-by: Len Brown <len.brown@intel.com>
"thermal.psv=-1" disables passive trip points
for all ACPI thermal zones.
"thermal.psv=C", where 'C' is degrees Celsius,
overrides all existing passive trip points
for all ACPI thermal zones.
thermal.psv is checked at module load time,
and in response to trip-point change events.
Note that if the system does not deliver thermal zone
temperature change events near the new trip-point,
then it will not be noticed. To force your custom
trip point to be noticed, you may need to enable polling:
eg. thermal.tzp=3000 invokes polling every 5 minutes.
Note that once passive thermal throttling is invoked,
it has its own internal Thermal Sampling Period (_TSP),
that is unrelated to _TZP.
WARNING: disabling or raising a thermal trip point
may result in increased running temperature and
shorter hardware lifetime on some systems.
Signed-off-by: Len Brown <len.brown@intel.com>
Thermal Zone Polling frequency (_TZP) is an optional ACPI object
recommending the rate that the OS should poll the associated thermal zone.
If _TZP is 0, no polling should be used.
If _TZP is non-zero, then the platform recommends that
the OS poll the thermal zone at the specified rate.
The minimum period is 30 seconds.
The maximum period is 5 minutes.
(note _TZP and thermal.tzp units are in deci-seconds,
so _TZP = 300 corresponds to 30 seconds)
If _TZP is not present, ACPI 3.0b recommends that the
thermal zone be polled at an "OS provided default frequency".
However, common industry practice is:
1. The BIOS never specifies any _TZP
2. High volume OS's from this century never poll any thermal zones
Ie. The OS depends on the platform's ability to
provoke thermal events when necessary, and
the "OS provided default frequency" is "never":-)
There is a proposal that ACPI 4.0 be updated to reflect
common industry practice -- ie. no _TZP, no polling.
The Linux kernel already follows this practice --
thermal zones are not polled unless _TZP is present and non-zero.
But thermal zone polling is useful as a workaround for systems
which have ACPI thermal control, but have an issue preventing
thermal events. Indeed, some Linux distributions still
set a non-zero thermal polling frequency for this reason.
But rather than ask the user to write a polling frequency
into all the /proc/acpi/thermal_zone/*/polling_frequency
files, here we simply document and expose the already
existing module parameter to do the same at system level,
to simplify debugging those broken platforms.
Note that thermal.tzp is a module-load time parameter only.
Signed-off-by: Len Brown <len.brown@intel.com>
"thermal.off=1" disables all ACPI thermal support at boot time.
CONFIG_ACPI_THERMAL=n can do this at build time.
"# rmmod thermal" can do this at run time,
as long as thermal is built as a module.
WARNING: On some systems, disabling ACPI thermal support
will cause the system to run hotter and reduce the
lifetime of the hardware.
Signed-off-by: Len Brown <len.brown@intel.com>
Documentation/watchdog/watchdog.txt does not exist, it is Documentation/watchdog/wdt.txt
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows debugging of problems which happen eary in the kernel
boot process (after bootargs are parsed, but before serial subsystem
is fully initialized)
Signed-off-by: Robin Getz <robin.getz@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Revert 7e92b4fc34. It broke Sébastien Dugué's
machine and Jeff said (persuasively)
This seems like it will break decades-long-working stuff, in favor of
breaking new ground in our favorite area, "trusting the BIOS."
It's just not worth it for serial ports, IMO. Serial ports are something
that just shouldn't break at this late stage in the game. My new Intel
platform boxes don't even have serial ports, so I question the value of
messing with serial port probing even more... because... just wait a year,
and your box won't have a serial port either! :)
I certainly don't object to the use of platform devices (or isa_driver),
but the probe change seems questionable. That's sorta analagous to
rewriting the floppy driver probe routine. Sure you could do it... but why
risk all that damage and go through debugging all over again?
It seems clear from this report that we cannot, should not, trust BIOS for
something (a) so simple and (b) that has been working for over a decade.
Much discussion ensued and we've decided to have another go at all of this.
Cc: Sébastien Dugué <sebastien.dugue@bull.net>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jeff Garzik <jeff@garzik.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Sascha Sommer <saschasommer@freenet.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"acpi_no_auto_ssdt" prevents Linux from automatically loading
all the SSDTs listed in the RSDT/XSDT.
This is needed for debugging. In particular,
it allows a DSDT override to optionally be a DSDT+SSDT override.
http://bugzilla.kernel.org/show_bug.cgi?id=3774
Signed-off-by: Len Brown <len.brown@intel.com>
This implements new vDSO for x86-64. The concept is similar
to the existing vDSOs on i386 and PPC. x86-64 has had static
vsyscalls before, but these are not flexible enough anymore.
A vDSO is a ELF shared library supplied by the kernel that is mapped into
user address space. The vDSO mapping is randomized for each process
for security reasons.
Doing this was needed for clock_gettime, because clock_gettime
always needs a syscall fallback and having one at a fixed
address would have made buffer overflow exploits too easy to write.
The vdso can be disabled with vdso=0
It currently includes a new gettimeofday implemention and optimized
clock_gettime(). The gettimeofday implementation is slightly faster
than the one in the old vsyscall. clock_gettime is significantly faster
than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
The new calls are generally faster than the old vsyscall.
Advantages over the old x86-64 vsyscalls:
- Extensible
- Randomized
- Cleaner
- Easier to virtualize (the old static address range previously causes
overhead e.g. for Xen because it has to create special page tables for it)
Weak points:
- glibc support still to be written
The VM interface is partly based on Ingo Molnar's i386 version.
Includes compile fix from Joachim Deguara
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a merge of Peter Keilty's initial patch (which was
revived by Bob Picco) for this with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds a new parameter for sizing ZONE_MOVABLE called
movablecore=. While kernelcore= is used to specify the minimum amount of
memory that must be available for all allocation types, movablecore= is
used to specify the minimum amount of memory that is used for migratable
allocations. The amount of memory used for migratable allocations
determines how large the huge page pool could be dynamically resized to at
runtime for example.
How movablecore is actually handled is that the total number of pages in
the system is calculated and a value is set for kernelcore that is
kernelcore == totalpages - movablecore
Both kernelcore= and movablecore= can be safely specified at the same time.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the kernelcore= parameter for x86.
Once all patches are applied, a new command-line parameter exist and a new
sysctl. This patch adds the necessary documentation.
From: Yasunori Goto <y-goto@jp.fujitsu.com>
When "kernelcore" boot option is specified, kernel can't boot up on ia64
because of an infinite loop. In addition, the parsing code can be handled
in an architecture-independent manner.
This patch uses common code to handle the kernelcore= parameter. It is
only available to architectures that support arch-independent zone-sizing
(i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
ignore the boot parameter.
[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-CPU vector domain support for IA64_GENERIC. It is enabled by
adding the "vector=percpu" boot option.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
splice: direct splicing updates ppos twice
more ACSI removal
umem: Fix match of pci_ids in umem driver
umem: Remove references to dead CONFIG_MM_MAP_MEMORY variable
remove the documentation for the legacy CDROM drivers
It's useful sometimes to disable the softlockup checker at boottime.
Especially if it triggers during a distro install.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some buses (e.g. USB and MMC) do their scanning of devices in the
background, causing a race between them and prepare_namespace(). In order
to be able to use these buses without an initrd, we now wait for the device
specified in root= to actually show up.
If the device never shows up than we will hang in an infinite loop. In
order to not mess with setups that reboot on panic, the feature must be
turned on via the command line option "rootwait".
[bunk@stusta.de: root_wait can become static]
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow printk_time to be enabled or disabled at boot time. Previously it
could be enabled only, but not disabled.
Change printk_time from an int to a bool since that's what it is. Make its
logical (exposed) name just be "time" (was "printk_time").
Note: Changes kernel boot option syntax from "time" to "printk.time=value".
Since printk_time is declared as a module_param, it can also be
changed at run-time by modifying
/sys/module/printk/parameters/time
to a value of 1/Y/y to enabled it or 0/N/n to disable it.
Since printk_time is declared as a module_param, its value can also
be set at boot-time by using
linux printk.time=<bool>
If the "time" boot option is used, print a message that it is deprecated
and will be removed.
Note its planned removal in feature-removal-schedule.txt.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the print-fatal-signals=1 boot option and the
/proc/sys/kernel/print-fatal-signals runtime switch.
This feature prints some minimal information about userspace segfaults to
the kernel console. This is useful to find early bootup bugs where
userspace debugging is very hard.
Defaults to off.
[akpm@linux-foundation.org: Don't add new sysctl numbers]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch contains the scheduled removal of OSS drivers that:
- have ALSA drivers for the same hardware without known regressions and
- whose Kconfig options have been removed in 2.6.20.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new configuration variable
CONFIG_SLUB_DEBUG_ON
If set then the kernel will be booted by default with slab debugging
switched on. Similar to CONFIG_SLAB_DEBUG. By default slab debugging
is available but must be enabled by specifying "slub_debug" as a
kernel parameter.
Also add support to switch off slab debugging for a kernel that was
built with CONFIG_SLUB_DEBUG_ON. This works by specifying
slub_debug=-
as a kernel parameter.
Dave Jones wanted this feature.
http://marc.info/?l=linux-kernel&m=118072189913045&w=2
[akpm@linux-foundation.org: clean up switch statement]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make zonelist creation policy selectable from sysctl/boot option v6.
This patch makes NUMA's zonelist (of pgdat) order selectable.
Available order are Default(automatic)/ Node-based / Zone-based.
[Default Order]
The kernel selects Node-based or Zone-based order automatically.
[Node-based Order]
This policy treats the locality of memory as the most important parameter.
Zonelist order is created by each zone's locality. This means lower zones
(ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion.
IOW. ZONE_DMA will be in the middle of zonelist.
current 2.6.21 kernel uses this.
Pros.
* A user can expect local memory as much as possible.
Cons.
* lower zone will be exhansted before higher zone. This may cause OOM_KILL.
Maybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL
because of ZONE_DMA exhaution and you need the best locality.
(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
*node(0)'s memory allocation order:
node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.
*node(1)'s memory allocation order:
node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
[Zone-based order]
This policy treats the zone type as the most important parameter.
Zonelist order is created by zone-type order. This means lower zone
never be used bofere higher zone exhaustion.
IOW. ZONE_DMA will be always at the tail of zonelist.
Pros.
* OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted.
Cons.
* memory locality may not be best.
(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
*node(0)'s memory allocation order:
node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.
*node(1)'s memory allocation order:
node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
bootoption "numa_zonelist_order=" and proc/sysctl is supporetd.
command:
%echo N > /proc/sys/vm/numa_zonelist_order
Will rebuild zonelist in Node-based order.
command:
%echo Z > /proc/sys/vm/numa_zonelist_order
Will rebuild zonelist in Zone-based order.
Thanks to Lee Schermerhorn, he gives me much help and codes.
[Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
[akpm@linux-foundation.org: build fix]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and
include/asm-x86_64/serial.h. the serial8250_ports need to be probed late in
serial initializing stage. the console_init=>serial8250_console_init=>
register_console=>serial8250_console_setup will return -ENDEV, and console
ttyS0 can not be enabled at that time. need to wait till uart_add_one_port in
drivers/serial/serial_core.c to call register_console to get console ttyS0.
that is too late.
Make early_uart to use early_param, so uart console can be used earlier. Make
it to be bootconsole with CON_BOOT flag, so can use console handover feature.
and it will switch to corresponding normal serial console automatically.
new command line will be:
console=uart8250,io,0x3f8,9600n8
console=uart8250,mmio,0xff5e0000,115200n8
or
earlycon=uart8250,io,0x3f8,9600n8
earlycon=uart8250,mmio,0xff5e0000,115200n8
it will print in very early stage:
Early serial console at I/O port 0x3f8 (options '9600n8')
console [uart0] enabled
later for console it will print:
console handover: boot [uart0] -> real [ttyS0]
Signed-off-by: <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the documentation for the removed legacy CDROM drivers.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.
this code is fundamental fragile: the boot-time migration cost detector
doesnt really work on systems that had large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty undeterministic as well.
(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)
under CFS the same purpose of cache affinity can be achieved without
any special cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This looks like leftover text in the kernel parameter in documentation.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update documentation to describe how to read a SLUB error report.
Add slub parameters to Documentation/kernel-parameters.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The boot option "acpi_osi=" has always disabled Linux _OSI support,
thus disabling all OS Interface strings which are advertised
by Linux to the BIOS.
Now...
acpi_osi="string" adds the interface string, and
acpi_osi="!string" invalidates the pre-defined interface string
eg. acpi_osi="!Windows 2006"
would disable Linux's claim of Vista compatibility.
Signed-off-by: Len Brown <len.brown@intel.com>
Document the available clocksources per platform and move clocksource= into
the correct (alpha) location in the file.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Looks like you removed the combined_mode quirk (yay!) but didn't update
kernel-parameters.txt... might confuse people. Here's a patch to remove
mention of it from the documentation.
Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add description to Documentation/kernel-parameters.txt on new options
default_blue, default_grn, default_red, and default_utf8.
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make x86 COM ports into platform devices and don't probe for them
if we have PNP.
This prevents double discovery, where a device was found both by
the legacy probe and by 8250_pnp, e.g.,
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
This also means IRDA devices without a UART PNP ID will no longer be
claimed by the serial driver, which might require changes in IRDA
drivers and administration.
In addition to this patch, you may need to configure a setserial init
script, e.g., /etc/init.d/setserial, so it doesn't poke legacy UART
stuff back in. On Debian, "dpkg-reconfigure setserial" with the "kernel"
option does this.
To force the old legacy probe behavior even when we have PNPBIOS or
ACPI, load the new legacy_serial module (or build 8250 static) with
the "legacy_serial.force" option.
[akpm@linux-foundation.org: fix makefiles]
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Jean Tourrilhes <jt@hpl.hp.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Russell King <rmk+serial@arm.linux.org.uk>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Claim devices using PNP, unless the user explicitly specified device
addresses. This can be disabled with the "smsc-ircc2.nopnp" option.
This removes the need for probing legacy addresses and helps untangle IR
devices from serial8250 devices.
Sometimes the SMC device is at a legacy COM port address but does not use the
legacy COM IRQ. In this case, claiming the device using PNP rather than 8250
legacy probe means we can automatically use the correct IRQ rather than
forcing the user to use "setserial" to set the IRQ manually.
If the PNP claim doesn't work, make sure you don't have a setserial init
script, e.g., /etc/init.d/setserial, configured to poke in legacy COM port
resources for the IRDA device. That causes the serial driver to claim
resources needed by this driver.
Based on this patch by Ville Syrjälä:
http://www.hpl.hp.com/personal/Jean_Tourrilhes/IrDA/ir260_smsc_pnp.diff
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Jean Tourrilhes <jt@hpl.hp.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Russell King <rmk+serial@arm.linux.org.uk>
Acked-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It doesn't put the CPU into deeper sleep states, so it's better to use the standard
idle loop to save power. But allow to reenable it anyways for benchmarking.
I also removed the obsolete idle=halt on i386
Cc: andreas.herrmann@amd.com
Signed-off-by: Andi Kleen <ak@suse.de>
Now that relocation of the VDSO for COMPAT_VDSO users is done at
runtime rather than compile time, it is possible to enable/disable
compat mode at runtime.
This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the
kernel command line, or via sysctl. (Switching on a running system
shouldn't be done lightly; any process which was relying on the compat
VDSO will be upset if it goes away.)
The COMPAT_VDSO config option still exists, but if enabled it just
makes vdso_enabled default to VDSO_COMPAT.
+From: Hugh Dickins <hugh@veritas.com>
Fix oops from i386-make-compat_vdso-runtime-selectable.patch.
Even mingetty at system startup finds it easy to trigger an oops
while reading /proc/PID/maps: though it has a good hold on the mm
itself, that cannot stop exit_mm() from resetting tsk->mm to NULL.
(It is usually show_map()'s call to get_gate_vma() which oopses,
and I expect we could change that to check priv->tail_vma instead;
but no matter, even m_start()'s call just after get_task_mm() is racy.)
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (105 commits)
sonypi: use mutex instead of semaphore
sony-laptop: remove user visible camera controls as platform attributes
meye: make meye use sony-laptop instead of sonypi
sony-laptop: add a meye-usable include file for camera ops
sony-laptop: complete the motion eye camera support in sony-laptop
sonypi: try to detect if sony-laptop has already taken one of the known ioports
sonypi: suggest sonypi users to try sony-laptop instead
sony-laptop: add edge modem support (also called WWAN)
sony-laptop: add locking on accesses to the ioport and global vars
sony-laptop: add camera enable/disable parameter, better handle possible infinite loop
thinkpad-acpi: make drivers/misc/thinkpad_acpi:fan_mutex static
ACPI: thinkpad-acpi: add sysfs support to wan and bluetooth subdrivers
ACPI: thinkpad-acpi: add sysfs support to hotkey subdriver
ACPI: thinkpad-acpi: improve dock subdriver initialization
ACPI: thinkpad-acpi: improve debugging for acpi helpers
ACPI: thinkpad-acpi: improve fan control documentation
ACPI: thinkpad-acpi: map ENXIO to EINVAL for fan sysfs
ACPI: thinkpad-acpi: fix a fan watchdog invocation
ACPI: thinkpad-acpi: do not arm fan watchdog if it would not work
ACPI: thinkpad-acpi: add a fan-control feature master toggle
...
This patch (as867) adds an entry for the new power/autosuspend
attribute in Documentation/ABI/testing, and it changes the behavior of
the delay value. Now a delay of 0 means to autosuspend as soon as
possible, and negative values will prevent autosuspend.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now we use acpi.debug_level and acpi.debug_layer as kernel boot
parameters instead of acpi_dbg_level and acpi_dbg_layer.
Thanks to Andi Kleen for pointing it out.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Needed for any architecture that claims ARCH_APICTIMER_STOPS_ON_C3,
not just i386.
I'm hoping Thomas will clean this up a bit later..
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turned out that it is almost impossible to trust ACPI, BIOS & Co.
regarding the C states. This was the reason to switch the local apic
timer off in C2 state already. OTOH there are sane and well behaving
systems, which get punished by that decision.
Allow the user to confirm that the local apic timer is trustworthy in C2
state. This keeps the default behaviour on the safe side.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: IA64: fix %ll build warnings
ACPI: IA64: fix allnoconfig build
ACPI: Only use IPI on known broken machines (AMD, Dothan/BaniasPentium M)
ACPI: ibm-acpi: allow module to load when acpi notifiers can't be set (v2)
ACPI: parse 2nd MADT by default
ACPICA: revert "acpi_serialize" changes
sony-laptop: MAINTAINERS fix entry, add L: and W:
ACPI: resolve HP nx6125 S3 immediate wakeup regression
ACPI: Add support to parse 2nd MADT
The local APIC timer stops to work in deeper C-States. This is handled by
the ACPI code and a broadcast mechanism in the clockevents / tick managment
code.
Some systems do not expose the deeper C-States to the kernel, but switch
into deeper C-States behind the kernels back. This delays the local apic
timer interrupts for ever and makes the systems unusable.
Add a command line option to disable the local apic timer and a dmi
quirk for known broken systems.
Andi sayeth:
While not wrong by itself i think it is still better to use some heuristic
-- like "has battery in ACPI" With the DMI table if the problem is more wide
spread we will just continue extending it.
But anyways should be ok now for .21 although I'm not really happy with
it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Grudgingly-acked-by: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a BIOS bug presents multiple APIC/MADTs,
Linux currently uses the 1st and ignores the 2nd.
But some machines work better if we use the 2nd.
http://bugzilla.kernel.org/show_bug.cgi?id=7465
Add a warning and boot parameter "acpi_apic_instance=2"
to allow parsing the 2nd.
No change to default behaviour in this patch.
Signed-off-by: Len Brown <len.brown@intel.com>
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Kill off I/O cruft for R7780RP.
sh: Revert lazy dcache writeback changes.
sh: Enable SM501 support for RTS7751R2D.
sh: Use L1_CACHE_BYTES for .data.cacheline_aligned.
sysctl: Support vdso_enabled sysctl on SH.
sh: Fix kernel thread stack corruption with preempt.
doc: Add SH to vdso and earlyprintk in kernel-parameters.txt
sh: Fix sigmask trampling in signal delivery.
sh: Clear UBC when not in use.
Provide a module param "pool_mode" for sunrpc.ko which allows a sysadmin to
choose the mode for mapping NFS thread service pools to CPUs. Values are:
auto choose a mapping mode heuristically
global (default, same as the pre-2.6.19 code) a single global pool
percpu one pool per CPU
pernode one pool per NUMA node
Note that since 2.6.19 the hardcoded behaviour has been "auto", this patch
makes the default "global".
The pool mode can be changed after boot/modprobe using /sys, if the NFS and
lockd services have been shut down. A useful side effect of this change is to
fix a small memory leak when unloading the module.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (as859) makes the default USB autosuspend delay a module
parameter of usbcore. By setting the delay value at boot time, users
will be able to prevent the system from autosuspending devices which
for some reason can't handle it.
The patch also stores the autosuspend delay as a per-device value. A
later patch will allow the user to change the value, tailoring the
delay for each individual device. A delay value of 0 will prevent
autosuspend.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
* 'acpi' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] libata: wrong sizeof for BUFFER
[PATCH] libata: change order of _SDD/_GTF execution (resend #3)
[PATCH] libata: ACPI _SDD support
[PATCH] libata: ACPI and _GTF support
Some people are confused about maxcpus=1 and maxcpus=0,
so put the documentation text from init/main.c into
Documentation/kernel-parameters.txt also.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
CARDBUS_MEM_SIZE was increased to 64MB on 2.6.20-rc2, but larger size might
result in allocation failure for the reserving itself on some platforms
(for example typical 32bit MIPS). Make it (and CARDBUS_IO_SIZE too)
customizable by "pci=" option for such platforms.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
_GTF is an acpi method that is used to reinitialize the drive. It returns
a task file containing ata commands that are sent back to the drive to restore
it to boot up defaults.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
(cherry picked from 9c69cab24b51a89664f4c0dfaf8a436d32117624 commit)
Implement high resolution timers on top of the hrtimers infrastructure and the
clockevents / tick-management framework. This provides accurate timers for
all hrtimer subsystem users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With Ingo Molnar <mingo@elte.hu>
Add functions to provide dynamic ticks and high resolution timers. The code
which keeps track of jiffies and handles the long idle periods is shared
between tick based and high resolution timer based dynticks. The dyntick
functionality can be disabled on the kernel commandline. Provide also the
infrastructure to support high resolution timers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes developers need to see more object code in an oops report,
e.g. when kernel may be corrupted at runtime.
Add the "code_bytes" option for this.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- add SWIOTLB config help text
- mention Documentation/x86_64/boot-options.txt in
Documentation/kernel-parameters.txt
- remove the duplication of the iommu kernel parameter documentation.
- Better explanation of some of the iommu kernel parameter options.
- "32MB<<order" instead of "32MB^order".
- Mention the default "order" value.
- list the four existing PCI-DMA mapping implementations of arch x86_64
- group the iommu= option keywords by PCI-DMA mapping implementation.
- Distinguish iommu= option keywords from number arguments.
- Explain the meaning of DAC and SAC.
Signed-off-by: Karsten Weiss <knweiss@science-computing.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Add retain_initrd option to control freeing of initrd memory after
extraction. By default, free memory as previously.
The first boot will need to hold a copy of the in memory fs for the second
boot. This image can be large (much larger than the kernel), hence we can
save time when the memory loader is slow. Also, it reduces the memory
footprint while extracting the first boot since you don't need another copy
of the fs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Certain boards seem to like to issue false overcurrent notifications,
for example on ports that don't have anything connected to them. This
looks like a hardware error, at the level of noise to those ports'
overcurrent input signals (or non-debounced VBUS comparators). This
surfaces to users as truly massive amounts of syslog spam from khubd
(which is appropriate for real hardware problems, except for the
volume from multiple ports).
Using this new "ignore_oc" flag helps such systems work more sanely,
by preventing such indications from getting to khubd (and spamming
syslog). The downside is of course that true overcurrent errors will
be masked; they'll appear as spontaneous disconnects, without the
diagnostics that will let users troubleshoot issues like
short-circuited cables. In addition, controllers with no devices
attached will be forced to poll for new devices rather than relying on
interrupts, since each overcurrent event would generate a new
interrupt.
This patch (as826) is essentially a copy of David Brownell's ignore_oc
patch for ehci-hcd, ported to uhci-hcd.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Most distributions enable sysrq support but set it to 0 by default. Add a
sysrq_always_enabled boot option to always-enable sysrq keys. Useful for
debugging - without having to modify the disribution's config files (which
might not be possible if the kernel is on a live CD, etc.).
Also, while at it, clean up the sysrq interfaces.
[bunk@stusta.de: make sysrq_always_enabled_setup() static]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch set provides some fault-injection capabilities.
- kmalloc() failures
- alloc_pages() failures
- disk IO errors
We can see what really happens if those failures happen.
In order to enable these fault-injection capabilities:
1. Enable relevant config options (CONFIG_FAILSLAB, CONFIG_PAGE_ALLOC,
CONFIG_MAKE_REQUEST) and if you want to configure them via debugfs,
enable CONFIG_FAULT_INJECTION_DEBUG_FS.
2. Build and boot with this kernel
3. Configure fault-injection capabilities behavior by boot option or debugfs
- Boot option
failslab=
fail_page_alloc=
fail_make_request=
- Debugfs
/debug/failslab/*
/debug/fail_page_alloc/*
/debug/fail_make_request/*
Please refer to the Documentation/fault-injection/fault-injection.txt
for details.
4. See what really happens.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Don Mullis <dwm@meer.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sometimes the kernel prints something interesting while userspace bootup
keeps messages turned off via loglevel. Enable the printing of /all/
kernel messages via the "ignore_loglevel" boot option. Off by default.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement prof=sleep profiling. TASK_UNINTERRUPTIBLE sleeps will be taken
as a profile hit, and every millisecond spent sleeping causes a profile-hit
for the call site that initiated the sleep.
Sample readprofile output on i386:
306 ps2_sendbyte 1.3973
432 call_usermodehelper_keys 1.9548
484 ps2_command 0.6453
790 __driver_attach 4.7879
1593 msleep 44.2500
3976 sync_buffer 64.1290
4076 do_lookup 12.4648
8587 sync_page 122.6714
20820 total 0.0067
(NOTE: architectures need to check whether get_wchan() can be called from
deep within the wakeup path.)
akpm: we need to mark more functions __sched. lock_sock(), msleep(), others..
akpm: the contention in do_lookup() is a surprise. Presumably doing disk
reads for directory contents while holding i_mutex.
[akpm@osdl.org: various fixes]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows a hyphenated range of positive numbers in the string passed
to command line helper function, get_options.
Currently the command line option "isolcpus=" takes as its argument a
list of cpus.
Format: <cpu number>,...,<cpu number>
Valid values of <cpu_number> include all cpus, 0 to "number of CPUs in
system - 1". This can get extremely long when isolating the majority of
cpus on a large system. The kernel isolcpus code would not need any
changing to use this feature. To use it, the change would be in the
command line format for 'isolcpus='
Format:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number> (must be a positive range in ascending
order.)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
Signed-off-by: Derek Fults <dfults@sgi.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Document the "resume_offset=" command line parameter as well as the way in
which swap files are supported by swsusp.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When using numa=fake on non-NUMA hardware there is no benefit to having the
alien caches, and they consume much memory.
Add a kernel boot option to disable them.
Christoph sayeth "This is good to have even on large NUMA. The problem is
that the alien caches grow by the square of the size of the system in terms of
nodes."
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add debugging printks to the unwinder to allow easier debugging
when something goes wrong with it.
This can be controlled with the new unwinder_debug=N option
Most output is given by N=1
AK: Added documentation of unwinder_debug=
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add a way to disable the timer IRQ routing check via a boot option. The
VMI timer code uses this to avoid triggering the pester Mingo code, which
probes for some very unusual and broken motherboard routings. It fires
100% of the time when using a paravirtual delay mechanism instead of using
a realtime delay, since there is no elapsed real time, and the 4 timer IRQs
have not yet been delivered.
In addition, it is entirely possible, though improbable, that this bug
could surface on real hardware which picks a particularly bad time to enter
SMM mode, causing a long latency during one of the timer IRQs.
While here, make check_timer be __init.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
[chrisw: use no_timer_check to bring inline with x86_64 as per Andi's request]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
It's bitrotten, long unmaintained, long hidden under BROKEN_ON_SMP,
etc. As scheduled in feature-removal-schedule.txt, and ack'd several
times on lkml.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Timer overrides are normally disabled on Nvidia board because
they are commonly wrong, except on new ones with HPET support.
Unfortunately there are quite some Asus boards around that
don't have HPET, but need a timer override.
We don't know yet how to handle this transparently,
but at least add a command line option to force the timer override
and let them boot.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Problem:
New Dell PowerEdge servers have 2 embedded ethernet ports, which are
labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
in the printed documentation. Assuming no other add-in ethernet ports
in the system, Linux 2.4 kernels name these eth0 and eth1
respectively. Many people have come to expect this naming. Linux 2.6
kernels name these eth1 and eth0 respectively (backwards from
expectations). I also have reports that various Sun and HP servers
have similar behavior.
Root cause:
Linux 2.4 kernels walk the pci_devices list, which happens to be
sorted in breadth-first order (or pcbios_find_device order on i386,
which most often is breadth-first also). 2.6 kernels have both the
pci_devices list and the pci_bus_type.klist_devices list, the latter
is what is walked at driver load time to match the pci_id tables; this
klist happens to be in depth-first order.
On systems where, for physical routing reasons, NIC1 appears on a
lower bus number than NIC2, but NIC2's bridge is discovered first in
the depth-first ordering, NIC2 will be discovered before NIC1. If the
list were sorted breadth-first, NIC1 would be discovered before NIC2.
A PowerEdge 1955 system has the following topology which easily
exhibits the difference between depth-first and breadth-first device
lists.
-[0000:00]-+-00.0 Intel Corporation 5000P Chipset Memory Controller Hub
+-02.0-[0000:03-08]--+-00.0-[0000:04-07]--+-00.0-[0000:05-06]----00.0-[0000:06]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
+-1c.0-[0000:01-02]----00.0-[0000:02]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)
Other factors, such as device driver load order and the presence of
PCI slots at various points in the bus hierarchy further complicate
this problem; I'm not trying to solve those here, just restore the
device order, and thus basic behavior, that 2.4 kernels had.
Solution:
The solution can come in multiple steps.
Suggested fix#1: kernel
Patch below optionally sorts the two device lists into breadth-first
ordering to maintain compatibility with 2.4 kernels. It adds two new
command line options:
pci=bfsort
pci=nobfsort
to force the sort order, or not, as you wish. It also adds DMI checks
for the specific Dell systems which exhibit "backwards" ordering, to
make them "right".
Suggested fix#2: udev rules from userland
Many people also have the expectation that embedded NICs are always
discovered before add-in NICs (which this patch does not try to do).
Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
determine which PCI devices are embedded, or if add-in, which PCI slot
they're in. I'm working on a tool that would allow udev to name
ethernet devices in ascending embedded, slot 1 .. slot N order,
subsort by PCI bus/dev/fn breadth-first. It'll be possible to use it
independent of udev as well for those distributions that don't use
udev in their installers.
Suggested fix#3: system board routing rules
One can constrain the system board layout to put NIC1 ahead of NIC2
regardless of breadth-first or depth-first discovery order. This adds
a significant level of complexity to board routing, and may not be
possible in all instances (witness the above systems from several
major manufacturers). I don't want to encourage this particular train
of thought too far, at the expense of not doing #1 or #2 above.
Feedback appreciated. Patch tested on a Dell PowerEdge 1955 blade
with 2.6.18.
You'll also note I took some liberty and temporarily break the klist
abstraction to simplify and speed up the sort algorithm. I think
that's both safe and appropriate in this instance.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since it often takes around 20-30 seconds to scan a scsi bus, it's
highly advantageous to do this in parallel with other things. The bulk
of this patch is ensuring that devices don't change numbering, and that
all devices are discovered prior to trying to start init. For those
who build SCSI as modules, there's a new scsi_wait_scan module that will
ensure all bus scans are finished.
This patch only handles drivers which call scsi_scan_host. Fibre Channel,
SAS, SATA, USB and Firewire all need additional work.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch contains the scheduled removal of OSS drivers that:
- have ALSA drivers for the same hardware without known regressions and
- whose Kconfig options have been removed in 2.6.17.
[michal.k.k.piotrowski@gmail.com: build fix]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kill a hard-to-calculate 'rsinterval' boot parameter and per-cpu
rcu_data.last_rs_qlen. Instead, it adds adds a flag rcu_ctrlblk.signaled,
which records the fact that one of CPUs has sent a resched IPI since the
last rcu_start_batch().
Roughly speaking, we need two rcu_start_batch()s in order to move callbacks
from ->nxtlist to ->donelist. This means that when ->qlen exceeds qhimark
and continues to grow, we should send a resched IPI, and then do it again
after we gone through a quiescent state.
On the other hand, if it was already sent, we don't need to do it again
when another CPU detects overflow of the queue.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Documentation fix for the arm and arm26 architectures,
in which the reboot kernel parameter is set in arch/*/kernel/process.c
Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Randy brought it to my attention that in proper english "can not" should always
be written "cannot". I donot see any reason to argue, even if I mightnot
understand why this rule exists. This patch fixes "can not" in several
Documentation files as well as three Kconfigs.
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
This patch fixes typos in various Documentation txts.
This patch addresses some words starting with the letter 'A'.
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Fix "quiet" parameter doc. No trailing '=' sign, no value after it. And
it disables "most" kernel messages, not all of them.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Resetting the devices during driver initialization can be a costly
operation in terms of time (especially scsi devices). This option can be
used by drivers to know that user forcibly wants the devices to be reset
during initialization.
This option can be useful while kernel is booting in unreliable
environment. For ex. during kdump boot where devices are in unknown
random state and BIOS execution has been skipped.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
Add a boot parameter to reserve high linear address space for hypervisors.
This is necessary to allow dynamically loaded hypervisor modules, which might
not happen until userspace is already running, and also provides a useful tool
to benchmark the performance impact of reduced lowmem address space.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some buggy systems can machine check when config space accesses
happen for some non existent devices. i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.
This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.
Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
This reverts commits 11012d419c and
40dd2d20f2, which allowed us to use the
MMIO accesses for PCI config cycles even without the area being marked
reserved in the e820 memory tables.
Those changes were needed for EFI-environment Intel macs, but broke some
newer Intel 965 boards, so for now it's better to revert to our old
2.6.17 behaviour and at least avoid introducing any new breakage.
Andi Kleen has a set of patches that work with both EFI and the broken
Intel 965 boards, which will be applied once they get wider testing.
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm not sure if documenting this here is appropriate, but
if it is, here is some text to put there.
Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a replacement for the earlier removal of the e820 MCFG check
we blacklist the Intel SDV with the original BIOS bug that
motivated that check. On those machines don't use MMCONFIG.
This also adds a new pci=mmconf parameter to override the blacklist.
Cc: Greg KH <gregkh@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enable delay accounting by default so that feature gets coverage testing
without requiring special measures.
Earlier, it was off by default and had to be enabled via a boot time param.
This patch reverses the default behaviour to improve coverage testing. It
can be removed late in the kernel development cycle if its believed users
shouldn't have to incur any cost if they don't want delay accounting. Or
it can be retained forever if the utility of the stats is deemed common
enough to warrant keeping the feature on.
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initialization code related to collection of per-task "delay" statistics which
measure how long it had to wait for cpu, sync block io, swapping etc. The
collection of statistics and the interface are in other patches. This patch
sets up the data structures and allows the statistics collection to be
disabled through a kernel boot parameter.
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce DEBUG_LOCKING_API_SELFTESTS, which uses the generic lock debugging
code's silent-failure feature to run a matrix of testcases. There are 210
testcases currently:
+-----------------------
| Locking API testsuite:
+------------------------------+------+------+------+------+------+------+
| spin |wlock |rlock |mutex | wsem | rsem |
-------------------------------+------+------+------+------+------+------+
A-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-B-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok |
A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok |
double unlock: ok | ok | ok | ok | ok | ok |
bad unlock order: ok | ok | ok | ok | ok | ok |
--------------------------------------+------+------+------+------+------+
recursive read-lock: | ok | | ok |
--------------------------------------+------+------+------+------+------+
non-nested unlock: ok | ok | ok | ok |
--------------------------------------+------+------+------+
hard-irqs-on + irq-safe-A/12: ok | ok | ok |
soft-irqs-on + irq-safe-A/12: ok | ok | ok |
hard-irqs-on + irq-safe-A/21: ok | ok | ok |
soft-irqs-on + irq-safe-A/21: ok | ok | ok |
sirq-safe-A => hirqs-on/12: ok | ok | ok |
sirq-safe-A => hirqs-on/21: ok | ok | ok |
hard-safe-A + irqs-on/12: ok | ok | ok |
soft-safe-A + irqs-on/12: ok | ok | ok |
hard-safe-A + irqs-on/21: ok | ok | ok |
soft-safe-A + irqs-on/21: ok | ok | ok |
hard-safe-A + unsafe-B #1/123: ok | ok | ok |
soft-safe-A + unsafe-B #1/123: ok | ok | ok |
hard-safe-A + unsafe-B #1/132: ok | ok | ok |
soft-safe-A + unsafe-B #1/132: ok | ok | ok |
hard-safe-A + unsafe-B #1/213: ok | ok | ok |
soft-safe-A + unsafe-B #1/213: ok | ok | ok |
hard-safe-A + unsafe-B #1/231: ok | ok | ok |
soft-safe-A + unsafe-B #1/231: ok | ok | ok |
hard-safe-A + unsafe-B #1/312: ok | ok | ok |
soft-safe-A + unsafe-B #1/312: ok | ok | ok |
hard-safe-A + unsafe-B #1/321: ok | ok | ok |
soft-safe-A + unsafe-B #1/321: ok | ok | ok |
hard-safe-A + unsafe-B #2/123: ok | ok | ok |
soft-safe-A + unsafe-B #2/123: ok | ok | ok |
hard-safe-A + unsafe-B #2/132: ok | ok | ok |
soft-safe-A + unsafe-B #2/132: ok | ok | ok |
hard-safe-A + unsafe-B #2/213: ok | ok | ok |
soft-safe-A + unsafe-B #2/213: ok | ok | ok |
hard-safe-A + unsafe-B #2/231: ok | ok | ok |
soft-safe-A + unsafe-B #2/231: ok | ok | ok |
hard-safe-A + unsafe-B #2/312: ok | ok | ok |
soft-safe-A + unsafe-B #2/312: ok | ok | ok |
hard-safe-A + unsafe-B #2/321: ok | ok | ok |
soft-safe-A + unsafe-B #2/321: ok | ok | ok |
hard-irq lock-inversion/123: ok | ok | ok |
soft-irq lock-inversion/123: ok | ok | ok |
hard-irq lock-inversion/132: ok | ok | ok |
soft-irq lock-inversion/132: ok | ok | ok |
hard-irq lock-inversion/213: ok | ok | ok |
soft-irq lock-inversion/213: ok | ok | ok |
hard-irq lock-inversion/231: ok | ok | ok |
soft-irq lock-inversion/231: ok | ok | ok |
hard-irq lock-inversion/312: ok | ok | ok |
soft-irq lock-inversion/312: ok | ok | ok |
hard-irq lock-inversion/321: ok | ok | ok |
soft-irq lock-inversion/321: ok | ok | ok |
hard-irq read-recursion/123: ok |
soft-irq read-recursion/123: ok |
hard-irq read-recursion/132: ok |
soft-irq read-recursion/132: ok |
hard-irq read-recursion/213: ok |
soft-irq read-recursion/213: ok |
hard-irq read-recursion/231: ok |
soft-irq read-recursion/231: ok |
hard-irq read-recursion/312: ok |
soft-irq read-recursion/312: ok |
hard-irq read-recursion/321: ok |
soft-irq read-recursion/321: ok |
--------------------------------+-----+----------------
Good, all 210 testcases passed! |
--------------------------------+
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits)
[PATCH] devfs: Remove it from the feature_removal.txt file
[PATCH] devfs: Last little devfs cleanups throughout the kernel tree.
[PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV
[PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed
[PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed
[PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed
[PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
[PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed
[PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
[PATCH] devfs: Remove devfs_remove() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
[PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree
[PATCH] devfs: Remove devfs support from the sound subsystem
[PATCH] devfs: Remove devfs support from the ide subsystem.
[PATCH] devfs: Remove devfs support from the serial subsystem
[PATCH] devfs: Remove devfs from the init code
[PATCH] devfs: Remove devfs from the partition code
...
Implementation of new kernel parameter vmpanic that provides a means to
perform a z/VM CP command after a kernel panic occurred.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the i386 VDSO down into a vma and thus randomize it.
Besides the security implications, this feature also helps debuggers, which
can COW a vma-backed VDSO just like a normal DSO and can thus do
single-stepping and other debugging features.
It's good for hypervisors (Xen, VMWare) too, which typically live in the same
high-mapped address space as the VDSO, hence whenever the VDSO is used, they
get lots of guest pagefaults and have to fix such guest accesses up - which
slows things down instead of speeding things up (the primary purpose of the
VDSO).
There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
for older glibcs that still rely on a prelinked high-mapped VDSO. Newer
distributions (using glibc 2.3.3 or later) can turn this option off. Turning
it off is also recommended for security reasons: attackers cannot use the
predictable high-mapped VDSO page as syscall trampoline anymore.
There is a new vdso=[0|1] boot option as well, and a runtime
/proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
on/off.
(This version of the VDSO-randomization patch also has working ELF
coredumping, the previous patch crashed in the coredumping code.)
This code is a combined work of the exec-shield VDSO randomization
code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
started this patch and i completed it.
[akpm@osdl.org: cleanups]
[akpm@osdl.org: compile fix]
[akpm@osdl.org: compile fix 2]
[akpm@osdl.org: compile fix 3]
[akpm@osdl.org: revernt MAXMEM change]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Cc: Gerd Hoffmann <kraxel@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the time sources for i386 (acpi_pm, cyclone, hpet, pit, and tsc).
With this patch, the conversion of the i386 arch to the generic timekeeping
code should be complete.
The patch should be fairly straight forward, only adding the new clocksources.
[hirofumi@mail.parknet.co.jp: acpi_pm cleanup]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces the clocksource management infrastructure. A clocksource is a
driver-like architecture generic abstraction of a free-running counter. This
code defines the clocksource structure, and provides management code for
registering, selecting, accessing and scaling clocksources.
Additionally, this includes the trivial jiffies clocksource, a lowest common
denominator clocksource, provided mainly for use as an example.
[hirofumi@mail.parknet.co.jp: Don't enable IRQ too early]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
ACPI: suppress power button event on S3 resume
ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
ACPI: use for_each_possible_cpu() instead of for_each_cpu()
ACPI: delete newly added debugging macros in processor_perflib.c
ACPI: UP build fix for bugzilla-5737
Enable P-state software coordination via _PDC
P-state software coordination for speedstep-centrino
P-state software coordination for acpi-cpufreq
P-state software coordination for ACPI core
ACPI: create acpi_thermal_resume()
ACPI: create acpi_fan_suspend()/acpi_fan_resume()
ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
ACPI: create acpi_device_suspend()/acpi_device_resume()
ACPI: replace spin_lock_irq with mutex for ec poll mode
ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
sem2mutex: acpi, acpi_link_lock
ACPI: delete unused acpi_bus_drivers_lock
sem2mutex: drivers/acpi/processor_perflib.c
ACPI add ia64 exports to build acpi_memhotplug as a module
ACPI: asus_acpi_init(): propagate correct return value
...
Manual resolve of conflicts in:
arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
include/acpi/processor.h
Add new per-packet access controls to SELinux, replacing the old
packet controls.
Packets are labeled with the iptables SECMARK and CONNSECMARK targets,
then security policy for the packets is enforced with these controls.
To allow for a smooth transition to the new controls, the old code is
still present, but not active by default. To restore previous
behavior, the old controls may be activated at runtime by writing a
'1' to /selinux/compat_net, and also via the kernel boot parameter
selinux_compat_net. Switching between the network control models
requires the security load_policy permission. The old controls will
probably eventually be removed and any continued use is discouraged.
With this patch, the new secmark controls for SElinux are disabled by
default, so existing behavior is entirely preserved, and the user is
not affected at all.
It also provides a config option to enable the secmark controls by
default (which can always be overridden at boot and runtime). It is
also noted in the kconfig help that the user will need updated
userspace if enabling secmark controls for SELinux and that they'll
probably need the SECMARK and CONNMARK targets, and conntrack protocol
helpers, although such decisions are beyond the scope of kernel
configuration.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch somewhat diverted the train of thought.
Here I am trying to bring the valued reader back on track.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Doc/kernel-parameters.txt: mention modinfo and sysfs
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Doc/kernel-parameters.txt: delete false version information and history
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
My patch to add brief documentation of the nomca boot parameter
added it out of alphabetical order.
Signed-Off-By: Horms <horms@verge.net.au>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
This can sometimes be used to work around broken BIOS.
Use "Microsoft Windows" to take the same path
through the BIOS as Windows98 would.
The default is "Microsoft Windows NT", which
is what NT and later versions of Windows use,
and is the most tested path through most BIOS.
Set it to anything else, including "Linux", at your
own risk, as it seems that virtually no BIOS
has been tested with anything but the two options above.
Note that this uses the legacy _OS interface, so
we don't expect this to ever change.
Signed-off-by: Len Brown <len.brown@intel.com>
Add info on flow control for serial consoles. Refer to netconsole option
also.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several drivers are starting to grow options to disable MSI. However,
it's often a host chipset issue, not something which individual drivers
should handle. So we add the pci=nomsi kernel parameter to allow the user
to disable MSI modes for systems we haven't added to the quirk list yet.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.
If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.
The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow the x86 "sep" feature to be disabled at bootup. This forces use of the
int80 vsyscall. Mainly for testing or benchmarking the int80 vsyscall code.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ATI chipsets tend to generate double timer interrupts for the local APIC
timer when both the 8254 and the IO-APIC timer pins are enabled. This is
because they route it to both and the result is anded together and the CPU
ends up processing it twice.
This patch changes check_timer to disable the 8254 routing for interrupt 0.
I think it would be safe on all chipsets actually (i tested it on a couple
and it worked everywhere) and Windows seems to do it in a similar way, but
to be conservative this patch only enables this mode on ATI (and adds
options to enable/disable too)
Ported over from a similar x86-64 change.
I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but
tweaked it a bit to work even without ACPI.
Inspired by a patch from Chuck Ebbert, but redone.
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds new tunables for RCU queue and finished batches. There are
two types of controls - number of completed RCU updates invoked in a batch
(blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark,
qlowmark).
By default, the per-cpu batch limit is set to a small value. If the input
RCU rate exceeds the high watermark, we do two things - force quiescent
state on all cpus and set the batch limit of the CPU to INTMAX. Setting
batch limit to INTMAX forces all finished RCUs to be processed in one shot.
If we have more than INTMAX RCUs queued up, then we have bigger problems
anyway. Once the incoming queued RCUs fall below the low watermark, the
batch limit is set to the default.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.
Also I moved the variable to a more appropiate place and make
it independent from sysctl
And marked __read_mostly which it is.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Ingo Molnar <mingo@elte.hu>
This is the latest version of the scheduler cache-hot-auto-tune patch.
The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:
- I've added a 'domain distance' function, which is used to cache
measurement results. Each distance is only measured once. This means
that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
distances 0 and 1, and on SMP distance 0 is measured. The code walks
the domain tree to determine the distance, so it automatically follows
whatever hierarchy an architecture sets up. This cuts down on the boot
time significantly and removes the O(N^2) limit. The only assumption
is that migration costs can be expressed as a function of domain
distance - this covers the overwhelming majority of existing systems,
and is a good guess even for more assymetric systems.
[ People hacking systems that have assymetries that break this
assumption (e.g. different CPU speeds) should experiment a bit with
the cpu_distance() function. Adding a ->migration_distance factor to
the domain structure would be one possible solution - but lets first
see the problem systems, if they exist at all. Lets not overdesign. ]
Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:
- Instead of relying on a single cache-size provided by the platform and
sticking to it, the code now auto-detects the 'effective migration
cost' between two measured CPUs, via iterating through a wide range of
cachesizes. The code searches for the maximum migration cost, which
occurs when the working set of the test-workload falls just below the
'effective cache size'. I.e. real-life optimized search is done for
the maximum migration cost, between two real CPUs.
This, amongst other things, has the positive effect hat if e.g. two
CPUs share a L2/L3 cache, a different (and accurate) migration cost
will be found than between two CPUs on the same system that dont share
any caches.
(The reliable measurement of migration costs is tricky - see the source
for details.)
Furthermore i've added various boot-time options to override/tune
migration behavior.
Firstly, there's a blanket override for autodetection:
migration_cost=1000,2000,3000
will override the depth 0/1/2 values with 1msec/2msec/3msec values.
Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:
migration_factor=120
will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.
I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:
Dual Celeron (128K L2 cache):
---------------------
migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
---------------------
[00] [01]
[00]: - 1.7(1)
[01]: 1.7(1) -
---------------------
cacheflush times [2]: 0.0 (0) 1.7 (1784008)
---------------------
Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.
Dual HT P4 (512K L2 cache):
---------------------
migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
---------------------
[00] [01] [02] [03]
[00]: - 0.4(1) 0.0(0) 0.4(1)
[01]: 0.4(1) - 0.4(1) 0.0(0)
[02]: 0.0(0) 0.4(1) - 0.4(1)
[03]: 0.4(1) 0.0(0) 0.4(1) -
---------------------
cacheflush times [2]: 0.0 (33900) 0.4 (448514)
---------------------
Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
8-way P3/Xeon [2MB L2 cache]:
---------------------
migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
[00] [01] [02] [03] [04] [05] [06] [07]
[00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1)
[05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1)
[06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1)
[07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) -
---------------------
cacheflush times [2]: 0.0 (0) 19.2 (19281756)
---------------------
This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- elfcorehdr= specifies the location of elf core header stored by the
crashed kernel. This command line option will be passed by the kexec-tools
to capture kernel.
Changes in this version :
- Added more comments in kernel-parameters.txt and in code.
Signed-off-by: Murali M Chakravarthy <muralim@in.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>