Some code cleanliness issues found by Andrew Morton (thanks!) which should
not affect functionality, but which should help make the code more
maintainable.
In particular, we now:
* convert all #define's w/ a parameter to static inlines
* use 1UL rather than 1ULL when calculating an unsigned long
* use pci_disable_device
The resulting code is tested and seems to work fine...
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Explicitly unmask ECC errors we are interested in reporting.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is possible that the BIOS did not enable ECC at boot time. We check
for that case and fail to load if it is true.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The error mask we use to trigger ECC notifications is missing many bits of
interest. We add these bits here so that all possible ECC errors can be
reported.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Preliminary support for the Intel 5100 MCH. CE and UE errors are reported
along with the current DIMM label information and other memory parameters.
Reasons why this is preliminary:
1) This chip has 2 independent memory controllers which, for best
perforance, use interleaved accesses to the DDR2 memory. This
architecture does not map very well to the current edac data structures
which depend on symmetric channel access to the interleaved data.
Without core changes, the best I could do for now is to map both memory
controllers to different csrows (first all ranks of controller 0, then
all ranks of controller 1). Someone much more familiar with the edac
core than I will probably need to come up with a more general data
structure to handle the interleaving and de-interleaving of the two
memory controllers.
2) I have not yet tackled the de-interleaving of the rank/controller
address space into the physical address space of the CPU. There is
nothing fundamentally missing, it is just ending up to be a lot of
code, and I'd rather keep it separate for now, esp since it doesn't
work yet...
3) The code depends on a particular i5100 chip select to DIMM mainboard
chip select mapping. This mapping seems obvious to me in order to
support dual and single ranked memory, but it is not unique and DIMM
labels could be wrong on other mainboards. There is no way to query
this mapping that I know of.
4) The code requires that the i5100 is in 32GB mode. Only 4 ranks per
controller, 2 ranks per DIMM are supported. I do not have hardware
(nor do I expect to have hardware anytime soon) for the 48GB (6 ranks
per controller) mode.
5) The serial presence detect code should be broken out into a "real"
i2c driver so that decode-dimms.pl can work.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If correctable error occurs the syndrome code was logged as 0. This patch
lets EDAC to log a correct syndrome code to make problem investigation
easier.
Signed-off-by: Maxim Shchetynin <maxim@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
including of <asm/mpc85xx.h> causes build problems since it doesn't exist.
Also removed warning:
drivers/edac/mpc85xx_edac.c:45: warning: 'mpc85xx_ctl_name' defined but not used
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 06916639e2 ("driver-core: add
dev_name() to help transition away from using bus_id") added a static
inline dev_name() and used it in dev_printk.
Unfortunately, drivers/edac/edac_core.h defines a macro called
dev_name(). Rename the latter.
Diagnosis by Tony Breeds and Michael Ellerman.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c3c52bce69 ("edac: fix module
initialization on several modules 2nd time") added a call to opstate_init
but did not include linux/edac.h that declares it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I implemented opstate_init() as a inline function in linux/edac.h.
added calling opstate_init() to:
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c
I wrote a fixed patch of
edac-fix-module-initialization-on-several-modules.patch,
and tested building 2.6.25-rc7 with applying this. It was succeed.
I think the patch is now correct.
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Collection of patches, merged into one, from Adrian that do the following:
1) This patch makes the following needlessly global functions static:
- edac_pci_get_log_pe()
- edac_pci_get_log_npe()
- edac_pci_get_panic_on_pe()
- edac_pci_unregister_sysfs_instance_kobj()
- edac_pci_main_kobj_setup()
2) Remove unneeded function edac_device_find()
3) Added #if 0 around function edac_pci_find()
4) make the needlessly global edac_pci_generic_check() static
5) Removed function edac_check_mc_devices()
Doug Thompson modified Adrian's patches, to bettern represent
the direction of EDAC, and make them one patch.
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a module parameter "sysbus_parity" to allow forcing system bus parity
error checking on or off. Also add support to automatically disable system
bus parity errors for processors which do not support it.
If the sysbus_parity parameter is specified, sysbus parity detection will be
forced on or off. If it is not specified, the driver will attempt to look at
the CPU identifier string and determine if the CPU supports system bus parity.
A blacklist was used instead of a whitelist so that system bus parity would
be enabled by default and to minimize the chances of breaking things for those
people already using the driver which for some reason have a processor that
does not have a valid CPU identifier string.
[akpm@linux-foundation.org: coding-style fixes]
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By popular request, add a comment documenting the implicit type promotion
here.
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a missing sequence of initialization code during startup.
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Doug Thompson <dougthompson@xmisson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Made a previous global variable, static in scope
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modified to run on x86_64 as well as x86
i3000_edac builds (and runs) fine on x86_64.
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using the EDAC code in kernel.org kernel version 2.6.23.8 I am seeing the
following problem:
In the kernel there is a pci device attribute located in sysfs that is
checked by the EDAC PCI scanning code. If that attribute is set,
PCI parity/error scannining is skipped for that device. The attribute
is:
broken_parity_status
as is located in /sys/devices/pci<XXX>/0000:XX:YY.Z directorys for
PCI devices.
I don't think this check was actually implemented. I have a misbehaved card
that reports a parity error every 1000 ms:
Nov 25 07:28:43 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:44 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:45 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Setting that card's broken_parity_status bit did not mask the error:
echo "1" > /sys/bus/pci/devices/0000:05:01.0/broken_parity_status
I looked through the EDAC code and did not readily see any reference to
broken_parity_status at all (which makes sense based on the behavior I am
seeing). I applied the following patch as a proof-of-concept and now EDAC's
PCI parity error reporting behaves as documented:
bryan
Good regression find, bryan. It used to work. sigh.
I added more logic to your patch, for more coverage of the error.
Doug T
Signed-off-by: Bryan Boatright <b1@omega71.com>
Signed-off-by: Doug Thompson <dougthompson@xmisson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marvell mv64x60 SoC support for EDAC. Used on PPC and MIPS platforms.
Development and testing done on PPC Motorola prpmc2800 ATCA board.
[akpm@linux-foundation.org: make mv64x60_ctl_name static]
Signed-off-by: Dave Jiang <djiang@mvista.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds driver for the Cell memory controller when used without a Hypervisor such
as on the IBM Cell blades. There might still be some improvements to do to
this such as finding if it's possible to properly obtain more details about
the address of the error but it's good enough already to report CE counts
which is our main priority at the moment.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the definitions for the Rambus XDR memory type used by the Cell processor.
It's a pre-requisite for the followup Cell EDAC patch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When rounding a relative timeout we need to use round_jiffies_relative().
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ENABLE the 'logging' of CE and UE events for the EDAC_DEVICE class of error
harvester in EDAC
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a "default" ktype for a kset. We should set this
explicitly every time for each kset. This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumption that the kset/kobject/ktype model has.
This patch is based on a lot of help from Kay Sievers.
Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There will be more product numbers in the future than just PA6T-1682M,
but they will share much of the features. Remove some of the explicit
references and compatibility checks with 1682M, and replace most of them
with the more generic term "PWRficient".
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Doug Thompson <dougthompson@xmission.com>
The i5000_edac driver's PCI registration structure has the name
""i5000_edac"" (with extra set of double-quotes) which is probably not
intentional. Get rid of __stringify.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
define global BIT macro
move all local BIT defines to the new globally define macro.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A kset should not have its name set directly, so dynamically set the
name at runtime.
This is needed to remove the static array in the kobject structure which
will be changed in a future patch.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch changes the error code when dev0:fun1 was hidden by BIOS to one
more appropriate.
Signed-off-by: Aristeu Rozanski <aris@ruivo.org>
Signed-off-by: Mark Gross <mark.gross@intel.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When EDAC is configured for EDAC DEBUGGING, the debug printk output level
was set TOO high (EMERG). This patch brings it down to a DEBUG level
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixed 'depends on PPC_PASEMI' in EDAC Kconfig. Module PASEMI depends ONLY on
the PASEMI on PPC.
Was previously enabled for ALL PPC
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Egor N. Martovetsky <egor@pasemi.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes sysfs exit code for the EDAC PCI device in a similiar manner
and the previous fixes for EDAC_MC and EDAC_DEVICE.
It removes the old (and incorrect) completion model and uses reference counts
on per instance kobjects and on the edac core module.
This pattern was applied to the edac_mc and edac_device code, but the EDAC PCI
code was missed. In addition, this fixes a system hang after a low level
driver was unloaded. (A cleanup function was called twice, which really
screwed things up)
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a deadlock that could occur on a 'setup' and 'teardown' sequence of
the workq for a edac_mc control structure instance. A similiar fix was
previously implemented for the edac_device code.
In addition, the edac_mc device code there was missing code to allow the workq
period valu to be altered via sysfs control.
This patch adds that fix on the code, and allows for the changing of the
period value as well.
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/edac/edac_stub.c:15:22: asm/edac.h: No such file or directory
was it even supposed to work?
Cc: Douglas Thompson <dougthompson@xmission.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some simple fixes to properly reference counter values from the block
attribute level of edac_device objects. Properly sequencing the array pointer
was added, resulting in correct identification of block level attributes from
their base class functions.
Added more verbose debug statement for event tracking.
Also during some corner testing, found a bug in the store/show sequence
of operations for the block attribute/controls management.
An old intermediate structure for 'blocks' was still in the processing
pipeline. This patch removes that old structure and correctly utilizes the
new struct edac_dev_sysfs_block_attribute for passing control from the sysfs
to the low level store/show function of the edac driver.
Now the proper kobj pointer to passed downward to the store/show
functions.
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix mutex locking deadlock on the device controller linked list. Was calling
a lock then a function that could call the same lock. Moved the cancel workq
function to outside the lock
Added some short circuit logic in the workq code
Added comments of description
Code tidying
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With feedback, this patch corrects operation of the kobject release operation
on kobjects, attributes and controls for the edac_device.
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch refactors the 'releasing' of kobjects for the edac_mc type of
device. The correct pattern of kobject release is followed.
As internal kobjs are allocated they bump a ref count on the top level kobj.
It in turn has a module ref count on the edac_core module. When internal
kobjects are released, they dec the ref count on the top level kobj. When the
top level kobj reaches zero, it decrements the ref count on the edac_core
object, allow it to be unloaded, as all resources have all now been released.
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Refactoring of sysfs code necessitated the refactoring of the
edac_device_alloc() and edac_device_add_device() apis, of moving the index
value to the alloc() function. This patch alters the in tree drivers to
utilize this new api signature.
Having the index value performed later created a chicken-and-the-egg issue.
Moving it to the alloc() function allows for creating the necessary sysfs
entries with the proper index number
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Refactoring of sysfs code necessitated the refactoring of the edac_mc_alloc()
and edac_mc_add_mc() apis, of moving the index value to the alloc() function.
This patch alters the in tree drivers to utilize this new api signature.
Having the index value performed later created a chicken-and-the-egg issue.
Moving it to the alloc() function allows for creating the necessary sysfs
entries with the proper index number
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes and enhances the driver level set of sysfs attributes that
can be added to the 'block' level of an edac_device type of driver.
There is a controller information structure, which contains one or more
instances of device. Each instance will have one or more blocks of device
specific counters. This patch fixes the ability to have more detailed
attributes/controls for each of the 'blocks', providing for the addition of
controls/attributes from the low level driver to user space via sysfs.
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NEW EDAC driver for the memory controllers on PA Semi PA6T-1682M.
Changes since last submission:
* Rebased on top of 2.6.22-rc4-mm2 with the EDAC changes merged there.
* Minor checkpatch.pl cleanups
* Renamed ctl_name
* Added dev_name
* edac_mc.h -> edac_core.h
[akpm@linux-foundation.org: make printk more informative]
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Egor Martovetsky <egor@pasemi.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Doug Thompson <dougthompson@xmission.com
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found a 'reversal' decoding bug in the driver. This patch fixes that mapping
to correctly display the CSROW entries in their proper order. Users will be
enable to correctly identifiy the failing DIMM with this fix.
[akpm@linux-foundation.org: unneeded (and undesirable) cast of void*]
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Mark Grondona <mgrondona@llnl.gov>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A previous patch changed the edac_mc src file from semaphore usage to mutex
This patch changes the edac_device src file as well, from semaphore use to
mutex operation.
Use a mutex primitive for mutex operations, as it does not require a
semaphore
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Refactored the function edac_op_state_toString() to be edac_op_state_to_string()
for consistent style, and its callers
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Refactor the edac_align_ptr() function to reduce the noise of casting the
aligned pointer to the various types of data objects and modified its callers
to its new signature
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For the file edac_device.c perform some coding style enhancements
Add some function header comments
Made for better readability commands
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Various code style conformance patches on the i5000 driver
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patches to conform to coding style, namely static don't need to be initialized
to NULL nor '0', as that is the default
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found a typo in one of the #defines in the driver
MTR_DIM_RANKS --> MTR_DIMM_RANK
Signed-off-by: Marisuz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compiling this module gave a warning that the return value of
'pci_bus_add_device()' was not checked.
This patch adds that check and an output message
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If ERRSTS indicates that there's no error then we don't need to bother reading
the other registers.
In addition to making the common case faster, this actually fixes a small race
where we don't see an error but we clear the error bits anyway, potentially
wiping away info on an error that happened in the interim (or where a CE
arrives between the first and second read of ERRSTS, causing us to falsely
claim "UE overwrote CE").
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) Remove an old CVS ID string
2) change EDAC from a tristate option to a simple bool option
3) In addition to the X86 arch, PPC and MIPS also have drivers in the
submission queue. This patch turns on the EDAC flag for those archs. Each
driver will have its respective 'depends on ARCH' set.
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes some remnant spaces inserted by the use of Lindent.
Seems Lindent adds some spaces when it shoulded. These have been fixed.
In addition, goto targets have issues, these have been fixed
in this patch.
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kconfig - modified the help of EDAC
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The error handling output strings needed to be refactored for better
displaying of the error informaton.
Also needed to added offset_value for output as well
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added new controls for the edac_device and edac_mc sysfs folder.
These can be initialized by the low level driver to provide misc
controls into the low level driver for its use
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move x86 drivers to new pci controller setup
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Run r82600_edac.c file through Lindent for cleanup
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Run i82443bxgx.c file through Lindent for cleanup
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Run e752x_edac.c file through Lindent for cleanup
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ran e752x_edac.c file through Lindent for cleanup
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ran this driver through Lindent for cleanup
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The origin of this code comes from patches at sourceforge, that
allow EDAC to be updated to various kernels. With kernel version 2.6.20 a
new workq system was installed, thus the patches needed to be modified
based on the kernel version. For submitting to the latest kernel.org
those #ifdefs are removed
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Removal of some old dead and disabled code from the edac_device sysfs code
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Run the EDAC CORE files through Lindent for cleanup
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixup poll values for MC and PCI.
Also make mc function names unique to mc.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmissin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change error check and clear variable from an atomic to an int
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Moving PCI to a per-instance device model
This should include the correct sysfs setup as well. Please review.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the memory controller object to work queue based implementation from the
kernel thread based.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's a driver for the Intel 3000 and 3010 memory controllers,
relative to today's Sourceforge code drop. This has only had light
testing (I've yet to actually see it handle a memory error) but it
detects my hardware correctly.
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move dev_name() macro to a more generic interface since it's not possible
to determine whether a device is pci, platform, or of_device easily.
Now each low level driver sets the name into the control structure, and
the EDAC core references the control structure for the information.
Better abstraction.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the refactoring of edac_mc.c into several subsystem files,
the header file edac_mc.h became meaningless. A new header file
edac_core.h was created. All the files that previously included
"edac_mc.h" are changed to include "edac_core.h".
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provides a way for NMI reported errors on x86 to notify the EDAC
subsystem pending ECC errors by writing to a software state variable.
Here's the reworked patch. I added an EDAC stub to the kernel so we can
have variables that are in the kernel even if EDAC is a module. I also
implemented the idea of using the chip driver to select error detection
mode via module parameter and eliminate the kernel compile option.
Please review/test. Thx!
Also, I only made changes to some of the chipset drivers since I am
unfamiliar with the other ones. We can add similar changes as we go.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It will claim the PCI devices from under intel_agp.ko's feet. Greg is brewing
some fix for that.
Cc: Douglas Thompson <dougthompson@xmission.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a NEW EDAC Memory Controller driver for the 440BX chipset (I82443BXGX)
created and submitted by Timm Small
Signed-off-by: Tim Small <tim@buttersideup.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch to fix some scrubbing #defines in the edac_core.h file
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Wollesen ported the Bluesmoke Memory Controller driver (written by Doug
Thompson) for the Intel 5000X/V/P (Blackford/Greencreek) chipset to the in
kernel EDAC model.
This patch incorporates the module for the 5000X/V/P chipset family
[m.kozlowski@tuxland.pl: edac i5000 parenthesis balance fix]
Signed-off-by: Eric Wollesen <ericw@xmtp.net>
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The EDAC core code uses a semaphore as mutex. use the mutex API
instead of the (binary) semaphore.
Matthaias wrote this, but since I had some patches ahead of it,
I need to modify it to follow my patches.
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding missing mem types for use in the sysfs presentation file for
Memory Controller device objects.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the new 'class' of object to be managed, named: 'edac_device'.
As a peer of the 'edac_mc' class of object, it provides a non-memory centric
view of an ERROR DETECTING device in hardware. It provides a sysfs interface
and an abstraction for varioius EDAC type devices.
Multiple 'instances' within the class are possible, with each 'instance'
able to have multiple 'blocks', and each 'block' having 'attributes'.
At the 'block' level there are the 'ce_count' and 'ue_count' fields
which the device driver can update and/or call edac_device_handle_XX()
functions. At each higher level are additional 'total' count fields,
which are a summation of counts below that level.
This 'edac_device' has been used to capture and present ECC errors
which are found in a a L1 and L2 system on a per CORE/CPU basis.
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a large patch to refactor the original EDAC module in the kernel
and to break it up into better file granularity, such that each source
file contains a given subsystem of the EDAC CORE.
Originally, the EDAC 'core' was contained in one source file: edac_mc.c
with it corresponding edac_mc.h file.
Now, there are the following files:
edac_module.c The main module init/exit function and other overhead
edac_mc.c Code handling the edac_mc class of object
edac_mc_sysfs.c Code handling for sysfs presentation
edac_pci_sysfs.c Code handling for PCI sysfs presentation
edac_core.h CORE .h include file for 'edac_mc' and 'edac_device' drivers
edac_module.h Internal CORE .h include file
This forms a foundation upon which a later patch can create the 'edac_device'
class of object code in a new file 'edac_device.c'.
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes needlessly global code static, in the edac core
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This simple patch adds an important CORE API for EDAC that EDAC drivers can
use to find their edac_mc control structure by passing a mem_ctl_info
'instance' value
Needed for subsequent patches
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change Kconfig objects from "menu, config" into "menuconfig" so
that the user can disable the whole feature without having to
enter the menu first.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add "depends on HAS_IOMEM" to a number of menus to make them
disappear for s390 which does not have I/O memory.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The 82875 EDAC driver enables an otherwise-hidden PCI device, but doesn't
register it as a PCI device properly. Therefore, the device list in
/proc/bus/pci/devices is different than the tree in /sys/bus/pci. This
usually manifests as the X server failing to start, since it expects the
two lists to be consistent.
Signed-off-by: Adam Jackson <ajackson@redhat.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Doug Thompson <norsk5@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Wollesen ported the Bluesmoke Memory Controller driver for the Intel
5000X/V/P (Blackford/Greencreek) chipset to the in kernel EDAC model.
This patch incorporates those required changes to the edac_mc.c and edac_mc.h
core files by added new Fully Buffered DIMM interface to the EDAC Core module.
Signed-off-by: eric wollesen <ericw@xmtp.net>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an attempt of providing an interface for memory scrubbing control in
EDAC.
This patch modifies the EDAC Core to provide the Interface for memory
controller modules to implment.
The following things are still outstanding:
- K8 is the first implemenation,
The patch provide a method of configuring the K8 hardware memory scrubber
via the 'mcX' sysfs directory. There should be some fallback to a generic
scrubber implemented in software if the hardware does not support
scrubbing.
Or .. the scrubbing sysfs entry should not be visible at all.
- Only works with SDRAM, not cache,
The K8 can scrub cache and l2cache also - but I think this is not so
useful as the cache is busy all the time (one hopes).
One would also expect that cache scrubbing requires hardware support.
- Error Handling,
I would like that errors are returned to the user in "terms of file
system".
- Presentation,
I chose Bandwidth in Bytes/Second as a representation of the scrubbing
rate for the following reasons:
I like that the sysfs entries are sort-of textual, related to something
that makes sense instead of magical values that must be looked up.
"My People" wants "% main memory scrubbed per hour" others prefer "%
memory bandwidth used" as representation, "bandwith used" makes it easy to
calculate both versions in one-liner scripts.
If one later wants to scrub cache, the scaling becomes wierd for K8
changing from "blocks of 64 byte memory" to "blocks of 64 cache lines" to
"blocks of 64 bit". Using "bandwidth used" makes sense in all three cases,
(I.M.O. anyway ;-).
- Discovery,
There is no way to discover the possible settings and what they do
without reading the code and the documentation.
*I* do not know how to make that work in a practical way.
- Bugs(??),
other tools can set invalid values in the memory scrub control register,
those will read back as '-1', requiring the user to reset the scrub rate.
This is how *I* think it should be.
- Afflicting other areas of code,
I made changes to edac_mc.c and edac_mc.h which will show up globally -
this is not nice, it would be better that the memory scrubbing fuctionality
and interface could be entirely contained within the memory controller it
applies to.
Frithiof Jensen
edac_mc.c and its .h file is a CORE helper module for EDAC
driver modules. This provides the abstraction for device specific
drivers. It is fine to modify this CORE to provide help for
new features of the the drivers
doug thompson
Signed-off-by: Frithiof Jensen <frithiof.jensen@ericson.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fix/change returns the offset into the page for the ce/ue error, instead
of just 0. The e752x dram controller reads 34:6 of the linear address with
the error.
Signed-off-by: Mike Chan <mikechan@google.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reading of the DRA registers should be a byte at a time (one register at a
time) instead of 4 bytes at a time (four registers). Reading a dword at a
time retrieves erroneous information from all but the first register. A
change was made to read in each register in a loop prior to using the data in
those registers.
Signed-off-by: Brian Pomerantz <bapper@mvista.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fatal vs. non-fatal mask for the sysbus FERR status is incorrect
according to the E7520 datasheet. This patch corrects the mask to correctly
handle fatal and non-fatal errors.
Signed-off-by: Brian Pomerantz <bapper@mvista.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.
[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Call sysdev_class_unregister() on failure in edac_sysfs_memctrl_setup()
and decrease identation level for clear logic.
Acked-by: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_PCI=n:
CC drivers/edac/edac_mc.o
drivers/edac/edac_mc.c: In function âadd_mc_to_global_listâ:
drivers/edac/edac_mc.c:1362: error: implicit declaration of function âto_platform_deviceâ
drivers/edac/edac_mc.c:1362: error: invalid type argument of â->â
drivers/edac/edac_mc.c: In function âedac_mc_add_mcâ:
drivers/edac/edac_mc.c:1467: error: invalid type argument of â->â
drivers/edac/edac_mc.c: In function âedac_mc_del_mcâ:
drivers/edac/edac_mc.c:1504: error: invalid type argument of â->â
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When EDAC was first introduced into the kernel it had a sysfs interface,
but due to some problems it was disabled in 2.6.16 and remained disabled in
2.6.17.
With feedback, several of the control and attribute files of that interface
had some good constructive feedback. PCI Blacklist/Whitelist was a major
set which has design issues and it has been removed in this patch. Instead
of storing PCI broken parity status in EDAC, it has been moved to the
pci_dev structure itself by a previous PCI patch. A future patch will
enable that feature in EDAC by utilizing the pci_dev info.
The sysfs is now enabled in this patch, with a minimal set of control and
attribute files for examining EDAC state and for enabling/disabling the
memory and PCI operations.
The Documentation for EDAC has also been updated to reflect the new state
of EDAC operation.
Signed-off-by:Doug Thompson <norsk5@xmisson.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the quoted module name in the sysfs for EDAC modules and reported by several
people.
Instead of ../_edac_e752x_/ now the following will be presented, like other
modules: ../edac_e752x/
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Add lower-level functions that handle various parts of the initialization
done by the xxx_probe1() functions. Some of the xxx_probe1() functions are
much too long and complicated (see "Chapter 5: Functions" in
Documentation/CodingStyle).
- Cleanup of probe1() functions in EDAC
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove add_mc_to_global_list(). In next patch, this function will be
reimplemented with different semantics.
1 Reimplement add_mc_to_global_list() with semantics that allow the caller to
determine the ID number for a mem_ctl_info structure. Then modify
edac_mc_add_mc() so that the caller specifies the ID number for the new
mem_ctl_info structure. Platform-specific code should be able to assign the
ID numbers in a platform-specific manner. For instance, on Opteron it makes
sense to have the ID of the mem_ctl_info structure match the ID of the node
that the memory controller belongs to.
2 Modify callers of edac_mc_add_mc() so they use the new semantics.
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change MC drivers from using CVS revision strings for their version number,
Now each driver has its own local string.
Remove some PCI dependencies from the core EDAC module. Made the code 'struct
device' centric instead of 'struct pci_dev' Most of the code changes here are
from a patch by Dave Jiang. It may be best to eventually move the
PCI-specific code into a separate source file.
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address the issue of EDAC/BIOS coexistence for the e752x chip-sets.
We have found a problem where the BIOS will start the system with the error
registers (dev0:fun1) hidden and assuming it has exclusive access to them.
The edac driver violates this assumption.
The workaround this patch offers is to honor the hidden-ness as an
indication that it is not safe to use those registers.
Signed-off-by: Mark Gross <mark.gross@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
EDAC_752X uses pci_scan_single_device(), which is only available if
CONFIG_HOTPLUG is enabled, so limit this driver with HOTPLUG.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Dave Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change all instances of EXPORT_SYMBOL() in the core EDAC module to
EXPORT_SYMBOL_GPL().
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Dave Jiang <djiang@mvista.com>: Fix EDAC e752x driver so it
outputs sysbus-specific error message when sysbus error detected.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cosmetic indentation/formatting cleanup for EDAC code. Make sure we
are using tabs rather than spaces to indent, etc.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix EDAC code so EXPORT_SYMBOL comes after the function that is being
exported. This is to maintain consistency with the rest of the kernel.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Add x86 dependency in drivers/edac/Kconfig for all current
platform-specific modules.
- Add PCI dependency to Radisys 82600 driver
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Fix code so we always hold mem_ctls_mutex while we are stepping
through the list of mem_ctl_info structures. Otherwise bad things
may happen if one task is stepping through the list while another
task is modifying it. We may eventually want to use reference
counting to manage the mem_ctl_info structures. In the meantime we
may as well fix this bug.
- Don't disable interrupts while we are walking the list of
mem_ctl_info structures in check_mc_devices(). This is unnecessary.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- After we unregister a kobject, wait for our kobject release method
to call complete(). This causes us to wait until the kobject
reference count reaches 0. Otherwise, a task accessing the EDAC
sysfs interface can hold the reference count above 0 until after the
EDAC module has been unloaded. When the reference count finally
drops to 0, this will result in an attempt to call our release
method inside the EDAC module after the module has already been
unloaded.
This isn't the best fix, since a process can get stuck sleeping forever
uninterruptibly if the user does the following:
rmmod my_module < /sys/my_sysfs/file
I'll go back and implement a better fix later. However this should
be ok for now.
- Call edac_remove_sysfs_mci_device() from edac_mc_del_mc() rather
than from edac_mc_free(). Since edac_mc_add_mc() calls
edac_create_sysfs_mci_device(), edac_mc_del_mc() should call
edac_remove_sysfs_mci_device().
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Remove calls to kobject_init(). These are unnecessary because
kobject_register() calls kobject_init().
- Remove extra calls to kobject_put(). When we call
kobject_unregister(), this releases our reference to the kobject.
The extra calls to kobject_put() may cause the reference count to
drop to 0 while a kobject is still in use.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is part 2 of a 2-part patch set.
Fix edac_mc_add_mc() so it cleans up properly if call to
edac_create_sysfs_mci_device() fails.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is part 1 of a 2-part patch set. The code changes are split into
two parts to make the patches more readable.
Move complete_mc_list_del() and del_mc_from_global_list() so we can
call del_mc_from_global_list() from edac_mc_add_mc() without forward
declarations. Perhaps using forward declarations would be better?
I'm doing things this way because the rest of the code is missing
them.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix xxx_probe1() functions so they call xxx_get_error_info() functions
to clear initial errors. This is simpler and cleaner than duplicating
the low-level code for accessing PCI config space.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix minor logic bug in e7xxx_remove_one().
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Fix i82875p_probe1() so it calls pci_get_device() instead of
pci_find_device().
- Fix i82875p_probe1() so it cleans up properly on failure.
- Fix i82875p_init() so it cleans up properly on failure.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Fix i82860_init() so it cleans up properly on failure.
- Fix i82860_exit() so it cleans up properly.
- Fix typo in comment (i.e. www.redhat.com.com).
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Add ctl_dev field to "struct e752x_dev_info". Then we can eliminate
ugly switch statement from e752x_probe1().
- Remove code from e752x_probe1() that clears initial PCI bus parity
errors. The core EDAC module already does this.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate unnecessary calls to pci_dev_get() and pci_dev_put() from
amd76x driver.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Perform the following name substitutions on all source files:
sed 's/BS_MOD_STR/EDAC_MOD_STR/g'
sed 's/bs_thread_info/edac_thread_info/g'
sed 's/bs_thread/edac_thread/g'
sed 's/bs_xstr/edac_xstr/g'
sed 's/bs_str/edac_str/g'
The names that start with BS_ or bs_ are artifacts of when the code
was called "bluesmoke".
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This implements the following idea:
On Monday 30 January 2006 19:22, Eric W. Biederman wrote:
> One piece missing from this conversation is the issue that we need errors
> in a uniform format. That is why edac_mc has helper functions.
>
> However there will always be errors that don't fit any particular model.
> Could we add a edac_printk(dev, ); That is similar to dev_printk but
> prints out an EDAC header and the device on which the error was found?
> Letting the rest of the string be user specified.
>
> For actual control that interface may be to blunt, but at least for people
> looking in the logs it allows all of the errors to be detected and
> harvested.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch was originally posted by Christoph Hellwig (see
http://lkml.org/lkml/2006/2/14/331):
"Christoph Hellwig" <hch@lst.de> wrote:
> Use the kthread_ API instead of opencoding lots of hairy code for kernel
> thread creation and teardown, including tasklist_lock abuse.
>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: <dave_peterson@pobox.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Disable the EDAC sysfs code. The sysfs interface that EDAC presents to
user space needs more thought, and is likely to change substantially.
Therefore disable it for now so users don't start depending on it in its
current form.
- Disable the default behavior of calling panic() when an uncorrectible
error is detected (since for now, there is no sysfs interface that allows
the user to configure this behavior).
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Disable (via ugly #if 0's) the 3 sysfs files that I think by now we all
agree are very much wrong. These files shouldn't become part of the ABI by
the 2.6.16 release, so I rather have this minimal patch merged to disable
them for now, the real fix can then come during the 2.6.17 devel window.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
EDAC is still causing a few problems and the code is relatively green. Mark
it as experimental until thing settle down.
Also, provide some documentation pointers in Kconfig help.
Signed-off-by: Tim Small <tim@buttersideup.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By including version.h edac_mc was rebuilding on every incremental build.
Which defeats the point of incremental builds.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The AMD76x chipsets aren't used in 64-bit, so don't offer the driver to the
user.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a subset of the bluesmoke project core code, stripped of the NMI work
which isn't ready to merge and some of the "interesting" proc functionality
that needs reworking or just has no place in kernel. It requires no core
kernel changes except the added scrub functions already posted.
The goal is to merge further functionality only after the core code is
accepted and proven in the base kernel, and only at the point the upstream
extras are really ready to merge.
From: doug thompson <norsk5@xmission.com>
This converts EDAC to sysfs and is the final chunk neccessary before EDAC
has a stable user space API and can be considered for submission into the
base kernel.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>