adulau/aha - Forgejo: adulau git carryall

adulau/aha

mirror of https://github.com/adulau/aha.git synced 2025-01-05 15:43:22 +00:00

Author	SHA1	Message	Date
Sheng Yang	756975bbfd	KVM: Fix apic_mmio_write return for unaligned write Some in-famous OS do unaligned writing for APIC MMIO, and the return value has been missed in recent change, then the OS hangs. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:08 +03:00
Gleb Natapov	0105d1a526	KVM: x2apic interface to lapic This patch implements MSR interface to local apic as defines by x2apic Intel specification. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:08 +03:00
Gleb Natapov	fc61b800f9	KVM: Add Directed EOI support to APIC emulation Directed EOI is specified by x2APIC, but is available even when lapic is in xAPIC mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:07 +03:00
Avi Kivity	cb24772140	KVM: Trace apic registers using their symbolic names Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:07 +03:00
Avi Kivity	aec51dc4f1	KVM: Trace mmio Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:07 +03:00
Andre Przywara	c323c0e5f0	KVM: Ignore PCI ECS I/O enablement Linux guests will try to enable access to the extended PCI config space via the I/O ports 0xCF8/0xCFC on AMD Fam10h CPU. Since we (currently?) don't use ECS, simply ignore write and read attempts. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:06 +03:00
Michael S. Tsirkin	bda9020e24	KVM: remove in_range from io devices This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write functions. in_range now becomes unused so it is removed from device ops in favor of read/write callbacks performing range checks internally. This allows aliasing (mostly for in-kernel virtio), as well as better error handling by making it possible to pass errors up to userspace. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:05 +03:00
Michael S. Tsirkin	6c47469453	KVM: convert bus to slots_lock Use slots_lock to protect device list on the bus. slots_lock is already taken for read everywhere, so we only need to take it for write when registering devices. This is in preparation to removing in_range and kvm->lock around it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:05 +03:00
Michael S. Tsirkin	108b56690f	KVM: switch pit creation to slots_lock switch pit creation to slots_lock. slots_lock is already taken for read everywhere, so we only need to take it for write when creating pit. This is in preparation to removing in_range and kvm->lock around it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:05 +03:00
Marcelo Tosatti	2023a29cbe	KVM: remove old KVMTRACE support code Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Andre Przywara	ed85c06853	KVM: introduce module parameter for ignoring unknown MSRs accesses KVM will inject a #GP into the guest if that tries to access unhandled MSRs. This will crash many guests. Although it would be the correct way to actually handle these MSRs, we introduce a runtime switchable module param called "ignore_msrs" (defaults to 0). If this is Y, unknown MSR reads will return 0, while MSR writes are simply dropped. In both cases we print a message to dmesg to inform the user about that. You can change the behaviour at any time by saying: # echo 1 > /sys/modules/kvm/parameters/ignore_msrs Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Andre Przywara	1fdbd48c24	KVM: ignore reads from AMDs C1E enabled MSR If the Linux kernel detects an C1E capable AMD processor (K8 RevF and higher), it will access a certain MSR on every attempt to go to halt. Explicitly handle this read and return 0 to let KVM run a Linux guest with the native AMD host CPU propagated to the guest. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Andre Przywara	8f1589d95e	KVM: ignore AMDs HWCR register access to set the FFDIS bit Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM does not handle the needed MSR, the injected #GP will panic the Linux kernel. Ignore setting of the HWCR.FFDIS bit in this MSR to let Linux boot with an AMD K8 family guest CPU. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
Marcelo Tosatti	894a9c5543	KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths Correct missing locking in a few places in x86's vm_ioctl handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
Joerg Roedel	ec04b2604c	KVM: Prepare memslot data structures for multiple hugepage sizes [avi: fix build on non-x86] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
Andre Przywara	4668f05078	KVM: x86 emulator: Add sysexit emulation Handle #UD intercept of the sysexit instruction in 64bit mode returning to 32bit compat mode on an AMD host. Setup the segment descriptors for CS and SS and the EIP/ESP registers according to the manual. Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:01 +03:00
Andre Przywara	8c60435261	KVM: x86 emulator: Add sysenter emulation Handle #UD intercept of the sysenter instruction in 32bit compat mode on an AMD host. Setup the segment descriptors for CS and SS and the EIP/ESP registers according to the manual. Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:01 +03:00
Andre Przywara	e66bb2ccdc	KVM: x86 emulator: add syscall emulation Handle #UD intercept of the syscall instruction in 32bit compat mode on an Intel host. Setup the segment descriptors for CS and SS and the EIP/ESP registers according to the manual. Save the RIP and EFLAGS to the correct registers. [avi: fix build on i386 due to missing R11] Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:00 +03:00
Andre Przywara	e99f050712	KVM: x86 emulator: Prepare for emulation of syscall instructions Add the flags needed for syscall, sysenter and sysexit to the opcode table. Catch (but for now ignore) the opcodes in the emulation switch/case. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:00 +03:00
Andre Przywara	b1d861431e	KVM: x86 emulator: Add missing EFLAGS bit definitions Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:00 +03:00
Andre Przywara	0cb5762ed2	KVM: Allow emulation of syscalls instructions on #UD Add the opcodes for syscall, sysenter and sysexit to the list of instructions handled by the undefined opcode handler. Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:59 +03:00
Marcelo Tosatti	229456fc34	KVM: convert custom marker based tracing to event traces This allows use of the powerful ftrace infrastructure. See Documentation/trace/ for usage information. [avi, stephen: various build fixes] [sheng: fix control register breakage] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:59 +03:00
Alexander Graf	219b65dcf6	KVM: SVM: Improve nested interrupt injection While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct. This patch makes nested SVM's interrupt injection behave more like on a real machine. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:59 +03:00
Alexander Graf	ff092385e8	KVM: SVM: Implement INVLPGA SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it! For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:58 +03:00
Alexander Graf	3c5d0a44b0	KVM: Implement MSRs used by Hyper-V Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage. But let's be nice today and have it its way, because otherwise it fails terribly. [jaswinder: fix build for linux-next changes] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:58 +03:00
Alexander Graf	0367b4330e	x86: Add definition for IGNNE MSR Hyper-V accesses MSR_IGNNE while running under KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:58 +03:00
Avi Kivity	b3dbf89e67	KVM: SVM: Don't save/restore host cr2 The host never reads cr2 in process context, so are free to clobber it. The vmx code does this, so we can safely remove the save/restore code. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:58 +03:00
Avi Kivity	d3edefc003	KVM: VMX: Only reload guest cr2 if different from host cr2 cr2 changes only rarely, and writing it is expensive. Avoid the costly cr2 writes by checking if it does not already hold the desired value. Shaves 70 cycles off the vmexit latency. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:57 +03:00
Jan Kiszka	681405bfc7	KVM: Drop useless atomic test from timer function The current code tries to optimize the setting of KVM_REQ_PENDING_TIMER but used atomic_inc_and_test - which always returns true unless pending had the invalid value of -1 on entry. This patch drops the test part preserving the original semantic but expressing it less confusingly. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:57 +03:00
Jan Kiszka	f7104db26a	KVM: Fix racy event propagation in timer Minor issue that likely had no practical relevance: the kvm timer function so far incremented the pending counter and then may reset it again to 1 in case reinjection was disabled. This opened a small racy window with the corresponding VCPU loop that may have happened to run on another (real) CPU and already consumed the value. Fix it by skipping the incrementation in case pending is already > 0. This opens a different race windows, but may only rarely cause lost events in case we do not care about them anyway (!reinject). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:57 +03:00
Gleb Natapov	33e4c68656	KVM: Optimize searching for highest IRR Most of the time IRR is empty, so instead of scanning the whole IRR on each VM entry keep a variable that tells us if IRR is not empty. IRR will have to be scanned twice on each IRQ delivery, but this is much more rare than VM entry. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:57 +03:00
Gleb Natapov	6edf14d8d0	KVM: Replace pending exception by PF if it happens serially Replace previous exception with a new one in a hope that instruction re-execution will regenerate lost exception. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:56 +03:00
Marcelo Tosatti	54dee9933e	KVM: VMX: conditionally disable 2M pages Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled. [avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:56 +03:00
Marcelo Tosatti	68f89400bc	KVM: VMX: EPT misconfiguration handler Handler for EPT misconfiguration which checks for valid state in the shadow pagetables, printing the spte on each level. The separate WARN_ONs are useful for kerneloops.org. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:56 +03:00
Marcelo Tosatti	94d8b056a2	KVM: MMU: add kvm_mmu_get_spte_hierarchy helper Required by EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:56 +03:00
Marcelo Tosatti	4d88954d62	KVM: MMU: make for_each_shadow_entry aware of largepages This way there is no need to add explicit checks in every for_each_shadow_entry user. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:55 +03:00
Marcelo Tosatti	e799794e02	KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits Required for EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:55 +03:00
Andre Przywara	71db602322	KVM: Move performance counter MSR access interception to generic x86 path The performance counter MSRs are different for AMD and Intel CPUs and they are chosen mainly by the CPUID vendor string. This patch catches writes to all addresses (regardless of VMX/SVM path) and handles them in the generic MSR handler routine. Writing a 0 into the event select register is something we perfectly emulate ;-), so don't print out a warning to dmesg in this case. This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:54 +03:00
Marcelo Tosatti	2920d72857	KVM: MMU audit: largepage handling Make the audit code aware of largepages. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:54 +03:00
Marcelo Tosatti	2aaf65e8c4	KVM: MMU audit: audit_mappings tweaks - Fail early in case gfn_to_pfn returns is_error_pfn. - For the pre pte write case, avoid spurious "gva is valid but spte is notrap" messages (the emulation code does the guest write first, so this particular case is OK). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:54 +03:00
Marcelo Tosatti	48fc03174b	KVM: MMU audit: nontrapping ptes in nonleaf level It is valid to set non leaf sptes as notrap. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:54 +03:00
Marcelo Tosatti	e58b0f9e0e	KVM: MMU audit: update audit_write_protection - Unsync pages contain writable sptes in the rmap. - rmaps do not exclusively contain writable sptes anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:53 +03:00
Marcelo Tosatti	08a3732bf2	KVM: MMU audit: update count_writable_mappings / count_rmaps Under testing, count_writable_mappings returns a value that is 2 integers larger than what count_rmaps returns. Suspicion is that either of the two functions is counting a duplicate (either positively or negatively). Modifying check_writable_mappings_rmap to check for rmap existance on all present MMU pages fails to trigger an error, which should keep Avi happy. Also introduce mmu_spte_walk to invoke a callback on all present sptes visible to the current vcpu, might be useful in the future. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:53 +03:00
Marcelo Tosatti	776e663336	KVM: MMU: introduce is_last_spte helper Hiding some of the last largepage / level interaction (which is useful for gbpages and for zero based levels). Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:53 +03:00
Avi Kivity	3f5d18a965	KVM: Return to userspace on emulation failure Instead of mindlessly retrying to execute the instruction, report the failure to userspace. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:52 +03:00
Gleb Natapov	988a2cae6a	KVM: Use macro to iterate over vcpus. [christian: remove unused variables on s390] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:52 +03:00
Gleb Natapov	73880c80aa	KVM: Break dependency between vcpu index in vcpus array and vcpu_id. Archs are free to use vcpu_id as they see fit. For x86 it is used as vcpu's apic id. New ioctl is added to configure boot vcpu id that was assumed to be 0 till now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:52 +03:00
Gleb Natapov	1ed0ce000a	KVM: Use pointer to vcpu instead of vcpu_id in timer code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:52 +03:00
Gleb Natapov	c5af89b68a	KVM: Introduce kvm_vcpu_is_bsp() function. Use it instead of open code "vcpu_id zero is BSP" assumption. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:51 +03:00
Avi Kivity	d555c333aa	KVM: MMU: s/shadow_pte/spte/ We use shadow_pte and spte inconsistently, switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:51 +03:00
Avi Kivity	43a3795a3a	KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:51 +03:00
Avi Kivity	439e218a6f	KVM: MMU: Fix is_dirty_pte() is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid shadow_dirty_mask and use PT_DIRTY_MASK instead. Misdetecting dirty pages could lead to unnecessarily setting the dirty bit under EPT. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:50 +03:00
Avi Kivity	7ffd92c53c	KVM: VMX: Move rmode structure to vmx-specific code rmode is only used in vmx, so move it to vmx.c Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:50 +03:00
Nitin A Kamble	3a624e29c7	KVM: VMX: Support Unrestricted Guest feature "Unrestricted Guest" feature is added in the VMX specification. Intel Westmere and onwards processors will support this feature. It allows kvm guests to run real mode and unpaged mode code natively in the VMX mode when EPT is turned on. With the unrestricted guest there is no need to emulate the guest real mode code in the vm86 container or in the emulator. Also the guest big real mode code works like native. The attached patch enhances KVM to use the unrestricted guest feature if available on the processor. It also adds a new kernel/module parameter to disable the unrestricted guest feature at the boot time. Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:49 +03:00
Marcelo Tosatti	fa40a8214b	KVM: switch irq injection/acking data structures to irq_lock Protect irq injection/acking data structures with a separate irq_lock mutex. This fixes the following deadlock: CPU A CPU B kvm_vm_ioctl_deassign_dev_irq() mutex_lock(&kvm->lock); worker_thread() -> kvm_deassign_irq() -> kvm_assigned_dev_interrupt_work_handler() -> deassign_host_irq() mutex_lock(&kvm->lock); -> cancel_work_sync() [blocked] [gleb: fix ia64 path] Reported-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:49 +03:00
Marcelo Tosatti	9f4cc12765	KVM: Grab pic lock in kvm_pic_clear_isr_ack isr_ack is protected by kvm_pic->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:48 +03:00
Jan Kiszka	238adc7705	KVM: Cleanup LAPIC interface None of the interface services the LAPIC emulation provides need to be exported to modules, and kvm_lapic_get_base is even totally unused today. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:48 +03:00
Avi Kivity	596ae89565	KVM: VMX: Fix reporting of unhandled EPT violations Instead of returning -ENOTSUPP, exit normally but indicate the hardware exit reason. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:46 +03:00
Avi Kivity	6de4f3ada4	KVM: Cache pdptrs Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx, guest memory access on svm) extract them on demand. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:46 +03:00
Avi Kivity	8f5d549f02	KVM: VMX: Simplify pdptr and cr3 management Instead of reading the PDPTRs from memory after every exit (which is slow and wrong, as the PDPTRs are stored on the cpu), sync the PDPTRs from memory to the VMCS before entry, and from the VMCS to memory after exit. Do the same for cr3. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:46 +03:00
Avi Kivity	2d84e993a8	KVM: VMX: Avoid duplicate ept tlb flush when setting cr3 vmx_set_cr3() will call vmx_tlb_flush(), which will flush the ept context. So there is no need to call ept_sync_context() explicitly. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:46 +03:00
Gregory Haskins	6b66ac1ae3	KVM: do not register i8254 PIO regions until we are initialized We currently publish the i8254 resources to the pio_bus before the devices are fully initialized. Since we hold the pit_lock, its probably not a real issue. But lets clean this up anyway. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:45 +03:00
Gregory Haskins	d76685c4a0	KVM: cleanup io_device code We modernize the io_device code so that we use container_of() instead of dev->private, and move the vtable to a separate ops structure (theoretically allows better caching for multiple instances of the same ops structure) Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:45 +03:00
Avi Kivity	6c8166a77c	KVM: SVM: Fold kvm_svm.h info svm.c kvm_svm.h is only included from svm.c, so fold it in. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:44 +03:00
Andre Przywara	017cb99e87	KVM: SVM: use explicit 64bit storage for sysenter values Since AMD does not support sysenter in 64bit mode, the VMCB fields storing the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values in a separate 64bit storage to avoid truncation. [andre: fix amd->amd migration] Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:43 +03:00
Jan Kiszka	c5ff41ce66	KVM: Allow PIT emulation without speaker port The in-kernel speaker emulation is only a dummy and also unneeded from the performance point of view. Rather, it takes user space support to generate sound output on the host, e.g. console beeps. To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel speaker port emulation via a flag passed along the new IOCTL. It also leaves room for future extensions of the PIT configuration interface. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:41 +03:00
Gregory Haskins	721eecbf4f	KVM: irqfd KVM provides a complete virtual system environment for guests, including support for injecting interrupts modeled after the real exception/interrupt facilities present on the native platform (such as the IDT on x86). Virtual interrupts can come from a variety of sources (emulated devices, pass-through devices, etc) but all must be injected to the guest via the KVM infrastructure. This patch adds a new mechanism to inject a specific interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal on the irqfd (using eventfd semantics from either userspace or kernel) will translate into an injected interrupt in the guest at the next available interrupt window. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:41 +03:00
Avi Kivity	0ba12d1081	KVM: Move common KVM Kconfig items to new file virt/kvm/Kconfig Reduce Kconfig code duplication. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:41 +03:00
Gleb Natapov	787ff73637	KVM: Drop interrupt shadow when single stepping should be done only on VMX The problem exists only on VMX. Also currently we skip this step if there is pending exception. The patch fixes this too. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:41 +03:00
Christoph Hellwig	284e9b0f5a	KVM: cleanup arch/x86/kvm/Makefile Use proper foo-y style list additions to cleanup all the conditionals, move module selection after compound object selection and remove the superflous comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:40 +03:00
Avi Kivity	ee3d29e8be	KVM: x86 emulator: fix jmp far decoding (opcode 0xea) The jump target should not be sign extened; use an unsigned decode flag. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:40 +03:00
Avi Kivity	c9eaf20f26	KVM: x86 emulator: Implement zero-extended immediate decoding Absolute jumps use zero extended immediate operands. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:39 +03:00
Mark McLoughlin	cb007648de	KVM: fix cpuid E2BIG handling for extended request types If we run out of cpuid entries for extended request types we should return -E2BIG, just like we do for the standard request types. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:39 +03:00
Jaswinder Singh Rajput	60af2ecdc5	KVM: Use MSR names in place of address Replace 0xc0010010 with MSR_K8_SYSCFG and 0xc0010015 with MSR_K7_HWCR. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:39 +03:00
Huang Ying	890ca9aefa	KVM: Add MCE support The related MSRs are emulated. MCE capability is exported via extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation such as the mcg_cap. MCE is injected via vcpu ioctl command KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are not implemented. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:39 +03:00
Jaswinder Singh Rajput	af24a4e4ae	KVM: Replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h Use standard msr-index.h's MSR declaration. MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves 80 column issue. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:38 +03:00
Gleb Natapov	ae0bb3e011	KVM: VMX: Properly handle software interrupt re-injection in real mode When reinjecting a software interrupt or exception, use the correct instruction length provided by the hardware instead of a hardcoded 1. Fixes problems running the suse 9.1 livecd boot loader. Problem introduced by commit f0a3602c20 ("KVM: Move interrupt injection logic to x86.c"). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:38 +03:00
Yang Xiaowei	2496afbf1e	xen: use stronger barrier after unlocking lock We need to have a stronger barrier between releasing the lock and checking for any waiting spinners. A compiler barrier is not sufficient because the CPU's ordering rules do not prevent the read xl->spinners from happening before the unlock assignment, as they are different memory locations. We need to have an explicit barrier to enforce the write-read ordering to different memory locations. Because of it, I can't bring up > 4 HVM guests on one SMP machine. [ Code and commit comments expanded -J ] [ Impact: avoid deadlock when using Xen PV spinlocks ] Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-09 16:38:44 -07:00
Jeremy Fitzhardinge	4d576b57b5	xen: only enable interrupts while actually blocking for spinlock Where possible we enable interrupts while waiting for a spinlock to become free, in order to reduce big latency spikes in interrupt handling. However, at present if we manage to pick up the spinlock just before blocking, we'll end up holding the lock with interrupts enabled for a while. This will cause a deadlock if we recieve an interrupt in that window, and the interrupt handler tries to take the lock too. Solve this by shrinking the interrupt-enabled region to just around the blocking call. [ Impact: avoid race/deadlock when using Xen PV spinlocks ] Reported-by: "Yang, Xiaowei" <xiaowei.yang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-09 16:38:11 -07:00
Jeremy Fitzhardinge	577eebeae3	xen: make -fstack-protector work under Xen -fstack-protector uses a special per-cpu "stack canary" value. gcc generates special code in each function to test the canary to make sure that the function's stack hasn't been overrun. On x86-64, this is simply an offset of %gs, which is the usual per-cpu base segment register, so setting it up simply requires loading %gs's base as normal. On i386, the stack protector segment is %gs (rather than the usual kernel percpu %fs segment register). This requires setting up the full kernel GDT and then loading %gs accordingly. We also need to make sure %gs is initialized when bringing up secondary cpus too. To keep things consistent, we do the full GDT/segment register setup on both architectures. Because we need to avoid -fstack-protected code before setting up the GDT and because there's no way to disable it on a per-function basis, several files need to have stack-protector inhibited. [ Impact: allow Xen booting with stack-protector enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-09 16:37:39 -07:00
Rafael J. Wysocki	bf992fa2bc	Merge branch 'master' into for-linus	2009-09-10 00:02:02 +02:00
Jiri Slaby	748df9a4c6	x86/PCI: pci quirks, fix pci refcounting Stanse found a pci reference leak in quirk_amd_nb_node. Instead of putting nb_ht, there is a put of dev passed as an argument. http://stanse.fi.muni.cz/ Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-09-09 14:11:02 -07:00
Jack Steiner	fa526d0d64	x86, pat: Fix cacheflush address in change_page_attr_set_clr() Fix address passed to cpa_flush_range() when changing page attributes from WB to UC. The address (*addr) is modified by __change_page_attr_set_clr(). The result is that the pages being flushed start at the _end_ of the changed range instead of the beginning. This should be considered for 2.6.30-stable and 2.6.31-stable. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Stable team <stable@kernel.org>	2009-09-09 14:05:24 -07:00
Alex Williamson	80286879c2	PCI iommu: iommu=pt is a valid early param This avoids a "Malformed early option 'iommu'" on boot when trying to use pass-through mode. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-09-09 13:29:28 -07:00
Jesse Barnes	2547089ca2	x86/PCI: initialize PCI bus node numbers early The current mp_bus_to_node array is initialized only by AMD specific code, since AMD platforms have registers that can be used for determining mode numbers. On new Intel platforms it's necessary to initialize this array as well though, otherwise all PCI node numbers will be 0, when in fact they should be -1 (indicating that I/O isn't tied to any particular node). So move the mp_bus_to_node code into the common PCI code, and initialize it early with a default value of -1. This may be overridden later by arch code (e.g. the AMD code). With this change, PCI consistent memory and other node specific allocations (e.g. skbuff allocs) should occur on the "current" node. If, for performance reasons, applications want to be bound to specific nodes, they should open their devices only after being pinned to the CPU where they'll run, for maximum locality. Acked-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Jesse Brandeburg <jesse.brandeburg@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-09-09 13:29:21 -07:00
Alex Chiang	a7db504052	PCI: remove pcibios_scan_all_fns() This was #define'd as 0 on all platforms, so let's get rid of it. This change makes pci_scan_slot() slightly easier to read. Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Tony Luck <tony.luck@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Russell King <linux@arm.linux.org.uk> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-09-09 13:29:18 -07:00
Tejun Heo	3e5cd1f257	dmi: extend dmi_get_year() to dmi_get_date() There are cases where full date information is required instead of just the year. Add month and day parsing to dmi_get_year() and rename it to dmi_get_date(). As the original function only required '/' followed by any number of parseable characters at the end of the string, keep that behavior to avoid upsetting existing users. The new function takes dates of format [mm[/dd]]/yy[yy]. Year, month and date are checked to be in the ranges of [1-9999], [1-12] and [1-31] respectively and any invalid or out-of-range component is returned as zero. The dummy implementation is updated accordingly but the return value is updated to indicate field not found which is consistent with how other dummy functions behave. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-09-08 21:17:48 -04:00
Peter Zijlstra	a8fae3ec5f	sched: enable SD_WAKE_IDLE Now that SD_WAKE_IDLE doesn't make pipe-test suck anymore, enable it by default for MC, CPU and NUMA domains. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-07 22:00:17 +02:00
Tobias Klauser	d535e4319a	x86: Make memtype_seq_ops const Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-06 06:33:33 +02:00
Rafael J. Wysocki	23b6c52cf5	x86: Decrease the level of some NUMA messages to KERN_DEBUG Some NUMA messages in srat_32.c are confusing to users, because they seem to indicate errors, while in fact they reflect normal behaviour. Decrease the level of these messages to KERN_DEBUG so that they don't show up unnecessarily. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <200909050107.45175.rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-06 06:32:23 +02:00
Ingo Molnar	ed011b22ce	Merge commit 'v2.6.31-rc9' into tracing/core Merge reason: move from -rc5 to -rc9. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-06 06:11:42 +02:00
H. Peter Anvin	b19ae39998	x86, msr: change msr-reg.o to obj-y, and export its symbols Change msr-reg.o to obj-y (it will be included in virtually every kernel since it is used by the initialization code for AMD processors) and add a separate C file to export its symbols to modules, so that msr.ko can use them; on uniprocessors we bypass the helper functions in msr.o and use the accessor functions directly via inlines. Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <20090904140834.GA15789@elte.hu> Cc: Borislav Petkov <petkovbb@googlemail.com>	2009-09-04 10:00:09 -07:00
Pekka Enberg	8e019366ba	kmemleak: Don't scan uninitialized memory when kmemcheck is enabled Ingo Molnar reported the following kmemcheck warning when running both kmemleak and kmemcheck enabled: PM: Adding info for No Bus:vcsa7 WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f6f6e1a4) d873f9f600000000c42ae4c1005c87f70000000070665f666978656400000000 i i i i u u u u i i i i i i i i i i i i i i i i i i i i i u u u ^ Pid: 3091, comm: kmemleak Not tainted (2.6.31-rc7-tip #1303) P4DC6 EIP: 0060:[<c110301f>] EFLAGS: 00010006 CPU: 0 EIP is at scan_block+0x3f/0xe0 EAX: f40bd700 EBX: f40bd780 ECX: f16b46c0 EDX: 00000001 ESI: f6f6e1a4 EDI: 00000000 EBP: f10f3f4c ESP: c2605fcc DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: e89a4844 CR3: 30ff1000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff4ff0 DR7: 00000400 [<c110313c>] scan_object+0x7c/0xf0 [<c1103389>] kmemleak_scan+0x1d9/0x400 [<c1103a3c>] kmemleak_scan_thread+0x4c/0xb0 [<c10819d4>] kthread+0x74/0x80 [<c10257db>] kernel_thread_helper+0x7/0x3c [<ffffffff>] 0xffffffff kmemleak: 515 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 42 new suspected memory leaks (see /sys/kernel/debug/kmemleak) The problem here is that kmemleak will scan partially initialized objects that makes kmemcheck complain. Fix that up by skipping uninitialized memory regions when kmemcheck is enabled. Reported-by: Ingo Molnar <mingo@elte.hu> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-09-04 16:05:55 +01:00
Ingo Molnar	695a461296	Merge branch 'amd-iommu/2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu	2009-09-04 14:44:16 +02:00
Ingo Molnar	840a065310	sched: Turn on SD_BALANCE_NEWIDLE Start the re-tuning of the balancer by turning on newidle. It improves hackbench performance and parallelism on a 4x4 box. The "perf stat --repeat 10" measurements give us: domain0 domain1 ....................................... -SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE: 2041.273208 task-clock-msecs # 9.354 CPUs ( +- 0.363% ) +SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE: 2086.326925 task-clock-msecs # 11.934 CPUs ( +- 0.301% ) +SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE: 2115.289791 task-clock-msecs # 12.158 CPUs ( +- 0.263% ) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 11:52:54 +02:00
Ingo Molnar	47734f89be	sched: Clean up topology.h Re-organize the flag settings so that it's visible at a glance which sched-domains flags are set and which not. With the new balancer code we'll need to re-tune these details anyway, so make it cleaner to make fewer mistakes down the road ;-) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 11:52:53 +02:00
Yinghai Lu	0d96b9ff74	x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus Otherwise, system with apci id lifting will have wrong apicid in /proc/cpuinfo. and use that in srat_detect_node(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <4A998CCA.1040407@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:55:29 +02:00
markus.t.metzger@intel.com	1653192f51	x86, perf_counter, bts: Do not allow kernel BTS tracing for now Kernel BTS tracing generates too much data too fast for us to handle, causing the kernel to hang. Fail for BTS requests for kernel code. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Acked-by: Peter Zijlstra <a.p.zjilstra@chello.nl> LKML-Reference: <20090902140616.901253000@intel.com> [ This is really a workaround - but we want BTS tracing in .32 so make sure we dont regress. The lockup should be fixed ASAP. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:26:40 +02:00
markus.t.metzger@intel.com	596da17f94	x86, perf_counter, bts: Correct pointer-to-u64 casts On 32bit, pointers in the DS AREA configuration are cast to u64. The current (long) cast to avoid compiler warnings results in a signed 64bit address. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090902140615.305889000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:26:39 +02:00
markus.t.metzger@intel.com	747b50aaf7	x86, perf_counter, bts: Fail if BTS is not available Reserve PERF_COUNT_HW_BRANCH_INSTRUCTIONS with sample_period == 1 for BTS tracing and fail, if BTS is not available. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090902140612.943801000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:26:39 +02:00
Jeremy Fitzhardinge	53f824520b	x86/i386: Put aligned stack-canary in percpu shared_aligned section Pack aligned things together into a special section to minimize padding holes. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <4AA035C0.9070202@goop.org> [ queued up in tip:x86/asm because it depends on this commit: x86/i386: Make sure stack-protector segment base is cache aligned ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 07:10:31 +02:00
Andreas Herrmann	5a925b4282	x86, sched: Workaround broken sched domain creation for AMD Magny-Cours Current sched domain creation code can't handle multi-node processors. When switching to power_savings scheduling errors show up and system might hang later on (due to broken sched domain hierarchy): # echo 0 >> /sys/devices/system/cpu/sched_mc_power_savings CPU0 attaching sched-domain: domain 0: span 0-5 level MC groups: 0 1 2 3 4 5 domain 1: span 0-23 level NODE groups: 0-5 6-11 18-23 12-17 ... # echo 1 >> /sys/devices/system/cpu/sched_mc_power_savings CPU0 attaching sched-domain: domain 0: span 0-11 level MC groups: 0 1 2 3 4 5 6 7 8 9 10 11 ERROR: parent span is not a superset of domain->span domain 1: span 0-5 level CPU ERROR: domain->groups does not contain CPU0 groups: 6-11 (__cpu_power = 12288) ERROR: groups don't span domain->span domain 2: span 0-23 level NODE groups: ERROR: domain->cpu_power not set ERROR: groups don't span domain->span ... Fixing all aspects of power-savings scheduling for Magny-Cours needs some larger changes in the sched domain creation code. As a short-term and temporary workaround avoid the problems by extending "the worst possible hack" ;-( and always use llc_shared_map on AMD Magny-Cours when MC domain span is calculated. With this I get: # echo 1 >> /sys/devices/system/cpu/sched_mc_power_savings CPU0 attaching sched-domain: domain 0: span 0-5 level MC groups: 0 1 2 3 4 5 domain 1: span 0-5 level CPU groups: 0-5 (__cpu_power = 6144) domain 2: span 0-23 level NODE groups: 0-5 (__cpu_power = 6144) 6-11 (__cpu_power = 6144) 18-23 (__cpu_power = 6144) 12-17 (__cpu_power = 6144) ... I.e. no errors during sched domain creation, no system hangs, and also mc_power_savings scheduling works to a certain extend. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:10:14 -07:00
Andreas Herrmann	cb9805ab5b	x86, mcheck: Use correct cpumask for shared bank4 This fixes threshold_bank4 support on multi-node processors. The correct mask to use is llc_shared_map, representing an internal node on Magny-Cours. We need to create 2 sets of symlinks for sibling shared banks -- one set for each internal node, symlinks of each set should target the first core on same internal node. Currently only one set is created where all symlinks are targeting the first core of the entire socket. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:10:08 -07:00
Andreas Herrmann	a326e948c5	x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors L3 cache size, associativity and shared_cpu information need to be adapted to show information for an internal node instead of the entire physical package. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:10:03 -07:00
Andreas Herrmann	4a376ec3a2	x86: Fix CPU llc_shared_map information for AMD Magny-Cours Construct entire NodeID and use it as cpu_llc_id. Thus internal node siblings are stored in llc_shared_map. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:09:59 -07:00
Jeremy Fitzhardinge	1ea0d14e48	x86/i386: Make sure stack-protector segment base is cache aligned The Intel Optimization Reference Guide says: In Intel Atom microarchitecture, the address generation unit assumes that the segment base will be 0 by default. Non-zero segment base will cause load and store operations to experience a delay. - If the segment base isn't aligned to a cache line boundary, the max throughput of memory operations is reduced to one [e]very 9 cycles. [...] Assembly/Compiler Coding Rule 15. (H impact, ML generality) For Intel Atom processors, use segments with base set to 0 whenever possible; avoid non-zero segment base address that is not aligned to cache line boundary at all cost. We can't avoid having a non-zero base for the stack-protector segment, but we can make it cache-aligned. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: <stable@kernel.org> LKML-Reference: <4AA01893.6000507@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-03 21:30:51 +02:00
Ingo Molnar	8adf65cfae	x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too The macro was defined in the 32-bit path as well - breaking the build on 32-bit platforms: arch/x86/lib/msr-reg.S: Assembler messages: arch/x86/lib/msr-reg.S:53: Error: Bad macro parameter list arch/x86/lib/msr-reg.S💯 Error: invalid character '_' in mnemonic arch/x86/lib/msr-reg.S:101: Error: invalid character '_' in mnemonic Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <tip-f6909f394c2d4a0a71320797df72d54c49c5927e@git.kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-03 21:26:34 +02:00
Joerg Roedel	2b681fafcc	Merge branch 'amd-iommu/pagetable' into amd-iommu/2.6.32 Conflicts: arch/x86/kernel/amd_iommu.c	2009-09-03 17:14:57 +02:00
Joerg Roedel	03362a05c5	Merge branch 'amd-iommu/passthrough' into amd-iommu/2.6.32 Conflicts: arch/x86/kernel/amd_iommu.c arch/x86/kernel/amd_iommu_init.c	2009-09-03 16:34:23 +02:00
Joerg Roedel	85da07c409	Merge branches 'gart/fixes', 'amd-iommu/fixes+cleanups' and 'amd-iommu/fault-handling' into amd-iommu/2.6.32	2009-09-03 16:32:00 +02:00
Pavel Vasilyev	6ac162d6c0	x86/gart: Do not select AGP for GART_IOMMU There is no dependency from the gart code to the agp code. And since a lot of systems today do not have agp anymore remove this dependency from the kernel configuration. Signed-off-by: Pavel Vasilyev <pavel@pavlinux.ru> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:20:55 +02:00
Joerg Roedel	4751a95134	x86/amd-iommu: Initialize passthrough mode when requested This patch enables the passthrough mode for AMD IOMMU by running the initialization function when iommu=pt is passed on the kernel command line. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:46 +02:00
Joerg Roedel	a1ca331c8a	x86/amd-iommu: Don't detach device from pt domain on driver unbind This patch makes sure a device is not detached from the passthrough domain when the device driver is unloaded or does otherwise release the device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:46 +02:00
Joerg Roedel	21129f786f	x86/amd-iommu: Make sure a device is assigned in passthrough mode When the IOMMU driver runs in passthrough mode it has to make sure that every device not assigned to an IOMMU-API domain must be put into the passthrough domain instead of keeping it unassigned. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:45 +02:00
Joerg Roedel	eba6ac60ba	x86/amd-iommu: Align locking between attach_device and detach_device This patch makes the locking behavior between the functions attach_device and __attach_device consistent with the locking behavior between detach_device and __detach_device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:44 +02:00
Joerg Roedel	aa879fff5d	x86/amd-iommu: Fix device table write order The V bit of the device table entry has to be set after the rest of the entry is written to not confuse the hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:43 +02:00
Joerg Roedel	0feae533dd	x86/amd-iommu: Add passthrough mode initialization functions When iommu=pt is passed on kernel command line the devices should run untranslated. This requires the allocation of a special domain for that purpose. This patch implements the allocation and initialization path for iommu=pt. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:42 +02:00
Joerg Roedel	2650815fb0	x86/amd-iommu: Add core functions for pd allocation/freeing This patch factors some code of protection domain allocation into seperate functions. This way the logic can be used to allocate the passthrough domain later. As a side effect this patch fixes an unlikely domain id leakage bug. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:34 +02:00
Joerg Roedel	ac0101d396	x86/dma: Mark iommu_pass_through as __read_mostly This variable is read most of the time. This patch marks it as such. It also documents the meaning the this variable while at it. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:13:46 +02:00
Joerg Roedel	abdc5eb3d6	x86/amd-iommu: Change iommu_map_page to support multiple page sizes This patch adds a map_size parameter to the iommu_map_page function which makes it generic enough to handle multiple page sizes. This also requires a change to alloc_pte which is also done in this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:11:17 +02:00
Joerg Roedel	a6b256b413	x86/amd-iommu: Support higher level PTEs in iommu_page_unmap This patch changes fetch_pte and iommu_page_unmap to support different page sizes too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:11:08 +02:00
Joerg Roedel	674d798a80	x86/amd-iommu: Remove old page table handling macros These macros are not longer required. So remove them. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:49 +02:00
Joerg Roedel	8f7a017ce0	x86/amd-iommu: Use 2-level page tables for dma_ops domains The driver now supports a dynamic number of levels for IO page tables. This allows to reduce the number of levels for dma_ops domains by one because a dma_ops domain has usually an address space size between 128MB and 4G. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:49 +02:00
Joerg Roedel	bad1cac28a	x86/amd-iommu: Remove bus_addr check in iommu_map_page The driver now supports full 64 bit device address spaces. So this check is not longer required. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:48 +02:00
Joerg Roedel	8c8c143cdc	x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX This change allows to remove these old macros later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:47 +02:00
Joerg Roedel	8bc3e12742	x86/amd-iommu: Change alloc_pte to support 64 bit address space This patch changes the alloc_pte function to be able to map pages into the whole 64 bit address space supported by AMD IOMMU hardware from the old limit of 2**39 bytes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:46 +02:00
Joerg Roedel	50020fb632	x86/amd-iommu: Introduce increase_address_space function This function will be used to increase the address space size of a protection domain. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:46 +02:00
Joerg Roedel	04bfdd8406	x86/amd-iommu: Flush domains if address space size was increased Thist patch introduces the update_domain function which propagates the larger address space of a protection domain to the device table and flushes all relevant DTEs and the domain TLB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:45 +02:00
Joerg Roedel	407d733e30	x86/amd-iommu: Introduce set_dte_entry function This function factors out some logic of attach_device to a seperate function. This new function will be used to update device table entries when necessary. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:44 +02:00
Joerg Roedel	6a0dbcbe4e	x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices This patch adds a generic variant of amd_iommu_flush_all_devices function which flushes only the DTEs for a given protection domain. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:43 +02:00
Joerg Roedel	a6d41a4027	x86/amd-iommu: Use fetch_pte in amd_iommu_iova_to_phys Don't reimplement the page table walker in this function. Use the generic one. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:42 +02:00
Joerg Roedel	38a76eeeaf	x86/amd-iommu: Use fetch_pte in iommu_unmap_page Instead of reimplementing existing logic use fetch_pte to walk the page table in iommu_unmap_page. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:41 +02:00
Joerg Roedel	9355a08186	x86/amd-iommu: Make fetch_pte aware of dynamic mapping levels This patch changes the fetch_pte function in the AMD IOMMU driver to support dynamic mapping levels. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:34 +02:00
Joerg Roedel	6a1eddd2f9	x86/amd-iommu: Reset command buffer if wait loop fails Instead of a panic on an comletion wait loop failure, try to recover from that event from resetting the command buffer. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:56:19 +02:00
Joerg Roedel	b26e81b871	x86/amd-iommu: Panic if IOMMU command buffer reset fails To prevent the driver from doing recursive command buffer resets, just panic when that recursion happens. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:35 +02:00
Joerg Roedel	a345b23b79	x86/amd-iommu: Reset command buffer on ILLEGAL_COMMAND_ERROR On an ILLEGAL_COMMAND_ERROR the IOMMU stops executing further commands. This patch changes the code to handle this case better by resetting the command buffer in the IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:34 +02:00
Joerg Roedel	93f1cc67cf	x86/amd-iommu: Add reset function for command buffers This patch factors parts of the command buffer initialization code into a seperate function which can be used to reset the command buffer later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:34 +02:00
Joerg Roedel	d586d7852c	x86/amd-iommu: Add function to flush all DTEs on one IOMMU This function flushes all DTE entries on one IOMMU for all devices behind this IOMMU. This is required for command buffer resetting later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:23 +02:00
Joerg Roedel	e0faf54ee8	x86/amd-iommu: fix broken check in amd_iommu_flush_all_devices The amd_iommu_pd_table is indexed by protection domain number and not by device id. So this check is broken and must be removed. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:56 +02:00
Joerg Roedel	ae908c22aa	x86/amd-iommu: Remove redundant 'IOMMU' string The 'IOMMU: ' prefix is not necessary because the DUMP_printk macro already prints its own prefix. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:56 +02:00
Joerg Roedel	4c6f40d4e0	x86/amd-iommu: replace "AMD IOMMU" by "AMD-Vi" This patch replaces the "AMD IOMMU" printk strings with the official name for the hardware: "AMD-Vi". Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:56 +02:00
Joerg Roedel	f2430bd104	x86/amd-iommu: Remove some merge helper code This patch removes some left-overs which where put into the code to simplify merging code which also depends on changes in other trees. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:55 +02:00
Joerg Roedel	e394d72aa8	x86/amd-iommu: Introduce function for iommu-local domain flush This patch introduces a function to flush all domain tlbs for on one given IOMMU. This is required later to reset the command buffer on one IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:41:34 +02:00
Joerg Roedel	945b4ac44e	x86/amd-iommu: Dump illegal command on ILLEGAL_COMMAND_ERROR This patch adds code to dump the command which caused an ILLEGAL_COMMAND_ERROR raised by the IOMMU hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 14:28:04 +02:00
Joerg Roedel	e3e59876e8	x86/amd-iommu: Dump fault entry on DTE error This patch adds code to dump the content of the device table entry which caused an ILLEGAL_DEV_TABLE_ENTRY error from the IOMMU hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 14:17:08 +02:00
Ingo Molnar	f76bd108e5	Merge branch 'perfcounters/urgent' into perfcounters/core Merge reason: We are going to modify a place modified by perfcounters/urgent. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-02 21:42:59 +02:00
David Howells	ee18d64c1f	KEYS: Add a keyctl to install a process's session keyring on its parent [try #6 ] Add a keyctl to install a process's session keyring onto its parent. This replaces the parent's session keyring. Because the COW credential code does not permit one process to change another process's credentials directly, the change is deferred until userspace next starts executing again. Normally this will be after a wait() syscall. To support this, three new security hooks have been provided: cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in the blank security creds and key_session_to_parent() - which asks the LSM if the process may replace its parent's session keyring. The replacement may only happen if the process has the same ownership details as its parent, and the process has LINK permission on the session keyring, and the session keyring is owned by the process, and the LSM permits it. Note that this requires alteration to each architecture's notify_resume path. This has been done for all arches barring blackfin, m68k and xtensa, all of which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the replacement to be performed at the point the parent process resumes userspace execution. This allows the userspace AFS pioctl emulation to fully emulate newpag() and the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to alter the parent process's PAG membership. However, since kAFS doesn't use PAGs per se, but rather dumps the keys into the session keyring, the session keyring of the parent must be replaced if, for example, VIOCSETTOK is passed the newpag flag. This can be tested with the following program: #include <stdio.h> #include <stdlib.h> #include <keyutils.h> #define KEYCTL_SESSION_TO_PARENT 18 #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0) int main(int argc, char **argv) { key_serial_t keyring, key; long ret; keyring = keyctl_join_session_keyring(argv[1]); OSERROR(keyring, "keyctl_join_session_keyring"); key = add_key("user", "a", "b", 1, keyring); OSERROR(key, "add_key"); ret = keyctl(KEYCTL_SESSION_TO_PARENT); OSERROR(ret, "KEYCTL_SESSION_TO_PARENT"); return 0; } Compiled and linked with -lkeyutils, you should see something like: [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 355907932 --alswrv 4043 -1 \_ keyring: _uid.4043 [dhowells@andromeda ~]$ /tmp/newpag [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 1055658746 --alswrv 4043 4043 \_ user: a [dhowells@andromeda ~]$ /tmp/newpag hello [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: hello 340417692 --alswrv 4043 4043 \_ user: a Where the test program creates a new session keyring, sticks a user key named 'a' into it and then installs it on its parent. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-09-02 21:29:22 +10:00
Ingo Molnar	936e894a97	Merge commit 'v2.6.31-rc8' into x86/txt Conflicts: arch/x86/kernel/reboot.c security/Kconfig Merge reason: resolve the conflicts, bump up from rc3 to rc8. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-02 08:17:56 +02:00
Huang Ying	ae4b688db2	x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h This function measures whether the FPU/SSE state can be touched in interrupt context. If the interrupted code is in user space or has no valid FPU/SSE context (CR0.TS == 1), FPU/SSE state can be used in IRQ or soft_irq context too. This is used by AES-NI accelerated AES implementation and PCLMULQDQ accelerated GHASH implementation. v3: - Renamed to irq_fpu_usable to reflect the purpose of the function. v2: - Renamed to irq_is_fpu_using to reflect the real situation. Signed-off-by: Huang Ying <ying.huang@intel.com> CC: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-01 21:39:15 -07:00
Shane Wang	69575d3886	x86, intel_txt: clean up the impact on generic code, unbreak non-x86 Move tboot.h from asm to linux to fix the build errors of intel_txt patch on non-X86 platforms. Remove the tboot code from generic code init/main.c and kernel/cpu.c. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-01 18:25:07 -07:00
H. Peter Anvin	f6909f394c	x86, msr: fix msr-reg.S compilation with gas 2.16.1 msr-reg.S used the :req option on a macro argument, which wasn't supported by gas 2.16.1 (but apparently by some earlier versions of gas, just to be confusing.) It isn't necessary, so just remove it. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <petkovbb@googlemail.com>	2009-09-01 13:37:21 -07:00
Ingo Molnar	c931aaf0e1	Merge branch 'x86/paravirt' into x86/cpu Conflicts: arch/x86/include/asm/paravirt.h Manual merge: arch/x86/include/asm/paravirt_types.h Merge reason: x86/paravirt conflicts non-trivially with x86/cpu, resolve it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-01 12:13:30 +02:00
Catalin Marinas	acde31dc46	kmemleak: Ignore the aperture memory hole on x86_64 This block is allocated with alloc_bootmem() and scanned by kmemleak but the kernel direct mapping may no longer exist. This patch tells kmemleak to ignore this memory hole. The dma32_bootmem_ptr in dma32_reserve_bootmem() is also ignored. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2009-09-01 11:12:32 +01:00
H. Peter Anvin	ff55df53df	x86, msr: Export the register-setting MSR functions via /dev/*/msr Make it possible to access the all-register-setting/getting MSR functions via the MSR driver. This is implemented as an ioctl() on the standard MSR device node. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <petkovbb@gmail.com>	2009-08-31 16:16:04 -07:00
H. Peter Anvin	8b956bf1f0	x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs() Create _on_cpu helpers for {rw,wr}msr_safe_regs() analogously with the other MSR functions. This will be necessary to add support for these to the MSR driver. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <petkovbb@gmail.com>	2009-08-31 16:15:57 -07:00
H. Peter Anvin	0cc0213e73	x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT For some reason, the _safe MSR functions returned -EFAULT, not -EIO. However, the only user which cares about the return code as anything other than a boolean is the MSR driver, which wants -EIO. Change it to -EIO across the board. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-08-31 15:15:23 -07:00
H. Peter Anvin	79c5dca361	x86, msr: CFI annotations, cleanups for msr-reg.S Add CFI annotations for native_{rd,wr}msr_safe_regs(). Simplify the 64-bit implementation: we don't allow the upper half registers to be set, and so we can use them to carry state across the operation. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com>	2009-08-31 15:14:47 -07:00
H. Peter Anvin	709972b1f6	x86, asm: Make _ASM_EXTABLE() usable from assembly code We have had this convenient macro _ASM_EXTABLE() to generate exception table entry in inline assembly. Make it also usable for pure assembly. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:30 -07:00
H. Peter Anvin	fe9b4e4e40	x86, asm: Add 32-bit versions of the combined CFI macros Add 32-bit versions of the combined CFI macros, equivalent to the 64-bit ones except, obviously, operating on 32-bit stack words. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:29 -07:00
Borislav Petkov	6b0f43ddfa	x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit `fbd8b1819e` turns off the bit for /proc/cpuinfo. However, a proper/full fix would be to additionally turn off the bit in the CPUID output so that future callers get correct CPU features info. Do that by basically reversing what the BIOS wrongfully does at boot. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1251705011-18636-3-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:29 -07:00
Borislav Petkov	177fed1ee8	x86, msr: Rewrite AMD rd/wrmsr variants Switch them to native_{rd,wr}msr_safe_regs and remove pv_cpu_ops.read_msr_amd. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:28 -07:00
Borislav Petkov	132ec92f3f	x86, msr: Add rd/wrmsr interfaces with preset registers native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow presetting of a subset of eight x86 GPRs before executing the rd/wrmsr instructions. This is needed at least on AMD K8 for accessing an erratum workaround MSR. Originally based on an idea by H. Peter Anvin. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:26 -07:00
Michal Schmidt	23386d63bb	x86: Detect stack protector for i386 builds on x86_64 Stack protector support was not detected when building with ARCH=i386 on x86_64 systems: arch/x86/Makefile:80: stack protector enabled but no compiler support The "-m32" argument needs to be passed to the detection script. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Arjan van de Ven <arjan@infradead.org> LKML-Reference: <20090829182718.10f566b1@leela> Signed-off-by: Ingo Molnar <mingo@elte.hu> --	2009-08-30 20:39:48 +02:00
Jan Beulich	47d25003cb	x86: Fix earlyprintk=dbgp for machines without NX Since parse_early_param() may (e.g. for earlyprintk=dbgp) involve calls to page table manipulation functions (here set_fixmap_nocache()), NX hardware support must be determined before calling that function (so that __supported_pte_mask gets properly set up). But the call after parse_early_param() can also not go away, as that will honor eventual command line specified disabling of the NX functionality. ( This will then just result in whatever mappings got established during parse_early_param() having the NX bit set despite it being disabled on the command line, but I think that's tolerable). Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <4A97F3BD02000078000121B9@vpn.id2.novell.com> [ merged to x86/pat to resolve a conflict. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-29 15:47:32 +02:00
Ingo Molnar	eebc57f73d	Merge branch 'for-ingo' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 into x86/apic Merge reason: the SFI (Simple Firmware Interface) feature in the ACPI tree needs this cleanup, pull it into the APIC branch as well so that there's no interactions. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-29 09:31:47 +02:00
Feng Tang	2a4ab640d3	ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n Some IO-APIC routines are ACPI specific now, but need to be exposed when CONFIG_ACPI=n for the benefit of SFI. Remove #ifdef ACPI around these routines: io_apic_get_unique_id(int ioapic, int apic_id); io_apic_get_version(int ioapic); io_apic_get_redir_entries(int ioapic); Move these routines from ACPI-specific boot.c to io_apic.c: uniq_ioapic_id(u8 id) mp_find_ioapic() mp_find_ioapic_pin() mp_register_ioapic() Also, since uniq_ioapic_id() is now no longer static, re-name it to io_apic_unique_id() for consistency with the other public io_apic routines. For simplicity, do not #ifdef the resulting code ACPI \|\| SFI, thought that could be done in the future if it is important to optimize the !ACPI !SFI IO-APIC x86 kernel for size. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2009-08-28 19:57:27 -04:00
H. Peter Anvin	b855192c08	Merge branch 'x86/urgent' into x86/pat Reason: Change to is_new_memtype_allowed() in x86/urgent Resolved semantic conflicts in: arch/x86/mm/pat.c arch/x86/mm/ioremap.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 17:24:28 -07:00
Venkatesh Pallipadi	d886c73cd4	x86, pat: Sanity check remap_pfn_range for RAM region Add sanity check for remap_pfn_range of RAM regions using lookup_memtype(). Previously, we did not have anyway to get the type of RAM memory regions as they were tracked using a single bit in page_struct (WB, nonWB). Now we can get the actual type from page struct (WB, WC, UC_MINUS) and make sure the requester gets that type. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:35 -07:00
Venkatesh Pallipadi	1087637616	x86, pat: Lookup the protection from memtype list on vm_insert_pfn() Lookup the reserved memtype during vm_insert_pfn and use that memtype for the new mapping. This takes care or handling of vm_insert_pfn() interface in track_pfn_vma*/untrack_pfn_vma. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:32 -07:00
Venkatesh Pallipadi	637b86e75f	x86, pat: Add lookup_memtype to get the current memtype of a paddr Add a new routine lookup_memtype() to get the current memtype based on the PAT reserves and frees. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:28 -07:00
Venkatesh Pallipadi	f584174096	x86, pat: Use page flags to track memtypes of RAM pages Change reserve_ram_pages_type and free_ram_pages_type to use 2 page flags to track UC_MINUS, WC, WB and default types. Previous RAM tracking just tracked WB or NonWB, which was not complete and did not allow tracking of RAM fully and there was no way to get the actual type reserved by looking at the page flags. We use the memtype_lock spinlock for atomicity in dealing with memtype tracking in struct page. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:24 -07:00
Venkatesh Pallipadi	46cf98cdae	x86, pat: Generalize the use of page flag PG_uncached Only IA64 was using PG_uncached as of now. We now intend to use this bit in x86 as well, to keep track of memory type of those addresses that have page struct for them. So, generalize the use of that bit across ia64 and x86. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:22 -07:00
Venkatesh Pallipadi	335ef896d4	x86, pat: Add rbtree to do quick lookup in memtype tracking PAT memtype tracking uses a linear link list to keep track of IO (non-RAM) regions and their memtypes. The code used a last_accessed pointer as a cache to speedup the lookup. As per discussions with H. Peter Anvin a while back, having a rbtree here will avoid bad performances in pathological cases where we may end up with huge linked list. This may not add any noticable performance speedup in normal case as the number of entires in PAT memtype list tend to be ~20-30 range. The patch removes the "cached_entry" logic as with rbtree we have more generic way of speeding up the lookup. With this patch, we use rbtree to do the quick lookup. We still use linked list as the memtype range tracked can be of different sizes and can overlap in different ways. We also keep track of usage counts with linked list. Example: Multiple ioremaps with different sizes uncached-minus @ 0xfffff00000-0xfffff04000 uncached-minus @ 0xfffff02000-0xfffff03000 And one userlevel mmap and the thread forks a new process uncached-minus @ 0xbf453000-0xbf454000 uncached-minus @ 0xbf453000-0xbf454000 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:19 -07:00
Venkatesh Pallipadi	9e36fda0b3	x86, pat: Add PAT reserve free to io_mapping* APIs io_mapping_* interfaces were added, mainly for graphics drivers. Make this interface go through the PAT reserve/free, instead of hardcoding WC mapping. This makes sure that there are no aliases due to unconditional WC setting. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:16 -07:00
Venkatesh Pallipadi	9fd126bc74	x86, pat: New i/f for driver to request memtype for IO regions Add new routines to request memtype for IO regions. This will currently be a backend for io_mapping_* routines. But, it can also be made available to drivers directly in future, in case it is needed. reserve interface reserves the memory, makes sure we have a compatible memory type available and keeps the identity map in sync when needed. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:10 -07:00
Venkatesh Pallipadi	279e669b3f	x86, pat: ioremap to follow same PAT restrictions as other PAT users ioremap has this hard-coded check for new type and requested type. That check differs from other PAT users like /dev/mem mmap, remap_pfn_range in only one condition where requested type is UC_MINUS and new type is WC. Under that condition, ioremap fails. But other PAT interfaces succeed with a WC mapping. Change to make ioremap be in sync with other PAT APIs and use the same macro as others. Also changes the error print to KERN_ERR instead of pr_debug. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:07 -07:00
Venkatesh Pallipadi	5fc517466d	x86, pat: Keep identity maps consistent with mmaps even when pat_disabled Make reserve_memtype internally take care of pat disabled case and fallback to default return values. Remove the specific pat_disabled checks in track_* routines. Change kernel_map_sync_memtype to sync identity map even when pat_disabled. This change ensures that, even for pat_disabled case, we take care of keeping identity map in sync. Before this patch, in pat disabled case, ioremap() keeps the identity maps in sync and other APIs like pci and /dev/mem mmap don't, which is not a very consistent behavior. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:40:58 -07:00
Jason Baron	117226d158	tracing: Remove FTRACE_SYSCALL_MAX definitions Remove the FTRACE_SYSCALL_MAX definitions now that we have converted the syscall event tracing code to use NR_syscalls. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Josh Stone <jistone@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anwin <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <f2240cdc8f0b1ca7617390c8f5ec90ba2bd348cf.1251146513.git.jbaron@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 21:30:39 +02:00
Jason Baron	57421dbbdc	tracing: Convert event tracing code to use NR_syscalls Convert the syscalls event tracing code to use NR_syscalls, instead of FTRACE_SYSCALL_MAX. NR_syscalls is standard accross most arches, and reduces code confusion/complexity. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Josh Stone <jistone@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anwin <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <9b4f1a84ecae57cc6599412772efa36f0d2b815b.1251146513.git.jbaron@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 21:30:02 +02:00
Jason Baron	a5a2f8e2ac	tracing: Define NR_syscalls for x86_64 Express the available number of syscalls in a standard way by defining NR_syscalls. The common way to define it is to place its definition in asm/unistd.h However, the number of syscalls is defined using __NR_syscall_max in x86-64 after building a dynamic header file "asm-offsets.h" The source file that generates this header, asm-offsets-64.c includes unistd.h, then if we want to express NR_syscalls from __NR_syscall_max in unistd.h only after generating the dynamic header file, we need a watchguard. If unistd.h is included from asm-offsets-64.c, then we are generating asm-offset.h which defines __NR_syscall_max. At this time, we don't want to (we can't) define NR_syscalls, then we do nothing. Otherwise we define NR_syscalls because we know asm-offsets.h has been generated. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Josh Stone <jistone@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anwin <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <20090826160910.GB2658@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 21:29:58 +02:00
Jason Baron	dd86dda24c	tracing: Define NR_syscalls for x86 (32) Add a NR_syscalls #define for x86. This is used in the syscall events tracing code. Todo: make it dynamic like x86_64. NR_syscalls is the usual name used to determine the number of syscalls supported by the current arch. We want to unify the use of this number across archs that support the syscall tracing. This also prepare to move some of the arch code to core code in the syscall tracing area. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Josh Stone <jistone@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anwin <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <0f33c0f96d198fccc3ddd9ff7f5334ff5cb42706.1251146513.git.jbaron@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 21:29:55 +02:00
Cyrill Gorcunov	d3a247bfb2	x86, apic: Slim down stack usage in early_init_lapic_mapping() As far as I see there is no external poking of mp_lapic_addr in this procedure which could lead to unpredited changes and require local storage unit for it. Lets use it plain forward. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090826171324.GC4548@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 20:26:23 +02:00
Hidetoshi Seto	680b6cfd3c	x86, mce: CE in last bank prevents panic by unknown MCE If MCE handler is called but none of mces_seen have machine check event which might signal the MCE (i.e. event higher than MCE_KEEP_SEVERITY), panic with "Machine check from unknown source" will be taken since the MCE is assumed to be signaled from external agent or so. Usually mces_seen never point MCE_KEEP_SEVERITY event such as CE. But it can happen because initial value of mces_seen is accidentally modified by mce_no_way_out() - in case if mce_no_way_out() run through all banks and the last bank has the CE, mces_seen points the CE and the "panic by unknown" will not be taken. This patch fixes this undesired behavior, and clarifies the logic. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com> LKML-Reference: <4A94E244.3020301@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>	2009-08-26 20:21:11 +02:00
Yinghai Lu	295594e9cf	x86: Fix vSMP boot crash 2.6.31-rc7 does not boot on vSMP systems: [ 8.501108] CPU31: Thermal monitoring enabled (TM1) [ 8.501127] CPU 31 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8 [ 8.650254] CPU31: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz stepping 04 [ 8.710324] Brought up 32 CPUs [ 8.713916] Total of 32 processors activated (162314.96 BogoMIPS). [ 8.721489] ERROR: parent span is not a superset of domain->span [ 8.727686] ERROR: domain->groups does not contain CPU0 [ 8.733091] ERROR: groups don't span domain->span [ 8.737975] ERROR: domain->cpu_power not set [ 8.742416] Ravikiran Thirumalai bisected it to: \| commit `2759c3287d` \| x86: don't call read_apic_id if !cpu_has_apic The problem is that on vSMP systems the CPUID derived initial-APICIDs are overlapping - so we need to fall back on hard_smp_processor_id() which reads the local APIC. Both come from the hardware (influenced by firmware though) so it's a tough call which one to trust. Doing the quirk expresses the vSMP property properly and also does not affect other systems, so we go for this solution instead of a revert. Reported-and-Tested-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shai Fultheim <shai@scalex86.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4A944D3C.5030100@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 10:13:17 +02:00
Cyrill Gorcunov	5051fd6977	x86, e820: Guard against array overflowed in __e820_add_region() Better to be paranoid against unpredicted nr_map modifications. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090824175551.146070377@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 08:17:47 +02:00
Cyrill Gorcunov	ffc438366c	x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources() alloc_bootmem() already panics on allocation failure. There is no need to check the result. Also there is a way to unbind global variable from its body and use it as a parameter which allow us to simplify ioapic_init_mappings as well -- "for" cycle already uses nr_ioapics as a conditional variable and there is no need to check if ioapic_setup_resources was returning NULL again. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090824175551.493629148@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 08:16:38 +02:00
Cyrill Gorcunov	8f3e1df48b	x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant We already have APIC_DEFAULT_PHYS_BASE so just to be consistent. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090824175550.927946757@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 08:16:38 +02:00
H. Peter Anvin	7adb4df410	x86, xen: Initialize cx to suppress warning Initialize cx before calling xen_cpuid(), in order to suppress the "may be used uninitialized in this function" warning. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>	2009-08-25 21:10:32 -07:00
Jeremy Fitzhardinge	d560bc6157	x86, xen: Suppress WP test on Xen Xen always runs on CPUs which properly support WP enforcement in privileged mode, so there's no need to test for it. This also works around a crash reported by Arnd Hannemann, though I think its just a band-aid for that case. Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-25 21:10:32 -07:00
H. Peter Anvin	ab94fcf528	x86: allow "=rm" in native_save_fl() This is a partial revert of `f1f029c7bf`. "=rm" is allowed in this context, because "pop" is explicitly defined to adjust the stack pointer before it evaluates its effective address, if it has one. Thus, we do end up writing to the correct address even if we use an on-stack memory argument. The original reporter for `f1f029c7bf` was apparently using a broken x86 simulator. [ Impact: performance ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Gabe Black <spamforgabe@umich.edu>	2009-08-25 16:47:16 -07:00
Josh Stone	1c569f0264	tracing: Create generic syscall TRACE_EVENTs This converts the syscall_enter/exit tracepoints into TRACE_EVENTs, so you can have generic ftrace events that capture all system calls with arguments and return values. These generic events are also renamed to sys_enter/exit, so they're more closely aligned to the specific sys_enter_foo events. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-5-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 00:41:48 +02:00
H. Peter Anvin	e8a2eb47e6	Merge commit 'origin/x86/urgent' into x86/asm	2009-08-25 15:40:29 -07:00
Josh Stone	9741987586	tracing: Move tracepoint callbacks from declaration to definition It's not strictly correct for the tracepoint reg/unreg callbacks to occur when a client is hooking up, because the actual tracepoint may not be present yet. This happens to be fine for syscall, since that's in the core kernel, but it would cause problems for tracepoints defined in a module that hasn't been loaded yet. It also means the reg/unreg has to be EXPORTed for any modules to use the tracepoint (as in SystemTap). This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces DEFINE_TRACE_FN which stores the callbacks in struct tracepoint. The callbacks are used now when the active state of the tracepoint changes in set_tracepoint & disable_tracepoint. This also introduces TRACE_EVENT_FN, so ftrace events can also provide registration callbacks if needed. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 00:36:41 +02:00
Josh Stone	6670000119	tracing: Rename FTRACE_SYSCALLS for tracepoints s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g The syscall enter/exit tracing is no longer specific to just ftrace, so they now have names that reflect their tie to tracepoints instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 00:17:35 +02:00
Linus Torvalds	44afa9a4b8	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clockevent: Prevent dead lock on clockevents_lock timers: Drop write permission on /proc/timer_list	2009-08-25 11:24:04 -07:00
Linus Torvalds	9f459fadbb	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix build with older binutils and consolidate linker script x86: Fix an incorrect argument of reserve_bootmem() x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile xen: rearrange things to fix stackprotector x86: make sure load_percpu_segment has no stackprotector i386: Fix section mismatches for init code with !HOTPLUG_CPU x86, pat: Allow ISA memory range uncacheable mapping requests	2009-08-25 11:23:25 -07:00
Roel Kluin	005155b1f6	x86: Fix x86_model test in es7000_apic_is_cluster() For the x86_model to be greater than 6 or less than 12 is logically always true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-25 15:58:12 +02:00
Jan Beulich	c62e43202e	x86: Fix build with older binutils and consolidate linker script binutils prior to 2.17 can't deal with the currently possible situation of a new segment following the per-CPU segment, but that new segment being empty - objcopy misplaces the .bss (and perhaps also the .brk) sections outside of any segment. However, the current ordering of sections really just appears to be the effect of cumulative unrelated changes; re-ordering things allows to easily guarantee that the segment following the per-CPU one is non-empty, and at once eliminates the need for the bogus data.init2 segment. Once touching this code, also use the various data section helper macros from include/asm-generic/vmlinux.lds.h. -v2: fix !SMP builds. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: <sam@ravnborg.org> LKML-Reference: <4A94085D02000078000119A5@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-25 15:54:16 +02:00
Amerigo Wang	a6a06f7b57	x86: Fix an incorrect argument of reserve_bootmem() This line looks suspicious, because if this is true, then the 'flags' parameter of function reserve_bootmem_generic() will be unused when !CONFIG_NUMA. I don't think this is what we want. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-24 20:22:55 +02:00
Alexey Dobriyan	10f02d1168	x86: uv: Clean up uv_ptc_init(), use proc_create() create_proc_entry() is getting duhprecated. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: cpw@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-24 12:26:46 +02:00

... 2 3 4 5 6 ...

8764 commits