Merge branch 'tracing/core' into tracing/hw-breakpoints

Conflicts:
	arch/Kconfig
	kernel/trace/trace.h

Merge reason: resolve the conflicts, plus adopt to the new
              ring-buffer APIs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
This commit is contained in:
Ingo Molnar 2009-09-07 08:19:51 +02:00
commit a1922ed661
4885 changed files with 363495 additions and 305501 deletions

1
.gitignore vendored
View file

@ -27,6 +27,7 @@
*.gz
*.lzma
*.patch
*.gcno
#
# Top-level generic files

View file

@ -1856,7 +1856,7 @@ E: rfkoenig@immd4.informatik.uni-erlangen.de
D: The Linux Support Team Erlangen
N: Andreas Koensgen
E: ajk@iehk.rwth-aachen.de
E: ajk@comnets.uni-bremen.de
D: 6pack driver for AX.25
N: Harald Koerfgen
@ -2006,6 +2006,9 @@ E: paul@laufernet.com
D: Soundblaster driver fixes, ISAPnP quirk
S: California, USA
N: Jonathan Layes
D: ARPD support
N: Tom Lees
E: tom@lpsg.demon.co.uk
W: http://www.lpsg.demon.co.uk/
@ -3802,6 +3805,9 @@ S: van Bronckhorststraat 12
S: 2612 XV Delft
S: The Netherlands
N: Thomas Woller
D: CS461x Cirrus Logic sound driver
N: David Woodhouse
E: dwmw2@infradead.org
D: JFFS2 file system, Memory Technology Device subsystem,

View file

@ -94,28 +94,37 @@ What: /sys/block/<disk>/queue/physical_block_size
Date: May 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
This is the smallest unit the storage device can write
without resorting to read-modify-write operation. It is
usually the same as the logical block size but may be
bigger. One example is SATA drives with 4KB sectors
that expose a 512-byte logical block size to the
operating system.
This is the smallest unit a physical storage device can
write atomically. It is usually the same as the logical
block size but may be bigger. One example is SATA
drives with 4KB sectors that expose a 512-byte logical
block size to the operating system. For stacked block
devices the physical_block_size variable contains the
maximum physical_block_size of the component devices.
What: /sys/block/<disk>/queue/minimum_io_size
Date: April 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Storage devices may report a preferred minimum I/O size,
which is the smallest request the device can perform
without incurring a read-modify-write penalty. For disk
drives this is often the physical block size. For RAID
arrays it is often the stripe chunk size.
Storage devices may report a granularity or preferred
minimum I/O size which is the smallest request the
device can perform without incurring a performance
penalty. For disk drives this is often the physical
block size. For RAID arrays it is often the stripe
chunk size. A properly aligned multiple of
minimum_io_size is the preferred request size for
workloads where a high number of I/O operations is
desired.
What: /sys/block/<disk>/queue/optimal_io_size
Date: April 2009
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Storage devices may report an optimal I/O size, which is
the device's preferred unit of receiving I/O. This is
rarely reported for disk drives. For RAID devices it is
usually the stripe width or the internal block size.
the device's preferred unit for sustained I/O. This is
rarely reported for disk drives. For RAID arrays it is
usually the stripe width or the internal track size. A
properly aligned multiple of optimal_io_size is the
preferred request size for workloads where sustained
throughput is desired. If no optimal I/O size is
reported this file contains 0.

View file

@ -122,3 +122,10 @@ Description:
This symbolic link appears when a device is a Virtual Function.
The symbolic link points to the PCI device sysfs entry of the
Physical Function this device associates with.
What: /sys/bus/pci/slots/.../module
Date: June 2009
Contact: linux-pci@vger.kernel.org
Description:
This symbolic link points to the PCI hotplug controller driver
module that manages the hotplug slot.

View file

@ -0,0 +1,125 @@
What: /sys/class/mtd/
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
The mtd/ class subdirectory belongs to the MTD subsystem
(MTD core).
What: /sys/class/mtd/mtdX/
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
The /sys/class/mtd/mtd{0,1,2,3,...} directories correspond
to each /dev/mtdX character device. These may represent
physical/simulated flash devices, partitions on a flash
device, or concatenated flash devices. They exist regardless
of whether CONFIG_MTD_CHAR is actually enabled.
What: /sys/class/mtd/mtdXro/
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
These directories provide the corresponding read-only device
nodes for /sys/class/mtd/mtdX/ . They are only created
(for the benefit of udev) if CONFIG_MTD_CHAR is enabled.
What: /sys/class/mtd/mtdX/dev
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
Major and minor numbers of the character device corresponding
to this MTD device (in <major>:<minor> format). This is the
read-write device so <minor> will be even.
What: /sys/class/mtd/mtdXro/dev
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
Major and minor numbers of the character device corresponding
to the read-only variant of thie MTD device (in
<major>:<minor> format). In this case <minor> will be odd.
What: /sys/class/mtd/mtdX/erasesize
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
"Major" erase size for the device. If numeraseregions is
zero, this is the eraseblock size for the entire device.
Otherwise, the MEMGETREGIONCOUNT/MEMGETREGIONINFO ioctls
can be used to determine the actual eraseblock layout.
What: /sys/class/mtd/mtdX/flags
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
A hexadecimal value representing the device flags, ORed
together:
0x0400: MTD_WRITEABLE - device is writable
0x0800: MTD_BIT_WRITEABLE - single bits can be flipped
0x1000: MTD_NO_ERASE - no erase necessary
0x2000: MTD_POWERUP_LOCK - always locked after reset
What: /sys/class/mtd/mtdX/name
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
A human-readable ASCII name for the device or partition.
This will match the name in /proc/mtd .
What: /sys/class/mtd/mtdX/numeraseregions
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
For devices that have variable eraseblock sizes, this
provides the total number of erase regions. Otherwise,
it will read back as zero.
What: /sys/class/mtd/mtdX/oobsize
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
Number of OOB bytes per page.
What: /sys/class/mtd/mtdX/size
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
Total size of the device/partition, in bytes.
What: /sys/class/mtd/mtdX/type
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
One of the following ASCII strings, representing the device
type:
absent, ram, rom, nor, nand, dataflash, ubi, unknown
What: /sys/class/mtd/mtdX/writesize
Date: April 2009
KernelVersion: 2.6.29
Contact: linux-mtd@lists.infradead.org
Description:
Minimal writable flash unit size. This will always be
a positive integer.
In the case of NOR flash it is 1 (even though individual
bits can be cleared).
In the case of NAND flash it is one NAND page (or a
half page, or a quarter page).
In the case of ECC NOR, it is the ECC block size.

View file

@ -79,3 +79,13 @@ Description:
This file is read-only and shows the number of
kilobytes of data that have been written to this
filesystem since it was mounted.
What: /sys/fs/ext4/<disk>/inode_goal
Date: June 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
Tuning parameter which (if non-zero) controls the goal
inode used by the inode allocator in p0reference to
all other allocation hueristics. This is intended for
debugging use only, and should be 0 on production
systems.

View file

@ -0,0 +1,73 @@
What: /sys/class/pps/
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ directory will contain files and
directories that will provide a unified interface to
the PPS sources.
What: /sys/class/pps/ppsX/
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ppsX/ directory is related to X-th
PPS source into the system. Each directory will
contain files to manage and control its PPS source.
What: /sys/class/pps/ppsX/assert
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ppsX/assert file reports the assert events
and the assert sequence number of the X-th source in the form:
<secs>.<nsec>#<sequence>
If the source has no assert events the content of this file
is empty.
What: /sys/class/pps/ppsX/clear
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ppsX/clear file reports the clear events
and the clear sequence number of the X-th source in the form:
<secs>.<nsec>#<sequence>
If the source has no clear events the content of this file
is empty.
What: /sys/class/pps/ppsX/mode
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ppsX/mode file reports the functioning
mode of the X-th source in hexadecimal encoding.
Please, refer to linux/include/linux/pps.h for further
info.
What: /sys/class/pps/ppsX/echo
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ppsX/echo file reports if the X-th does
or does not support an "echo" function.
What: /sys/class/pps/ppsX/name
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ppsX/name file reports the name of the
X-th source.
What: /sys/class/pps/ppsX/path
Date: February 2008
Contact: Rodolfo Giometti <giometti@linux.it>
Description:
The /sys/class/pps/ppsX/path file reports the path name of
the device connected with the X-th source.
If the source is not connected with any device the content
of this file is empty.

View file

@ -72,6 +72,13 @@ assembling the 16-bit boot code, removing the need for as86 to compile
your kernel. This change does, however, mean that you need a recent
release of binutils.
Perl
----
You will need perl 5 and the following modules: Getopt::Long, Getopt::Std,
File::Basename, and File::Find to build the kernel.
System utilities
================

View file

@ -449,8 +449,8 @@ printk(KERN_INFO "i = %u\n", i);
</para>
<programlisting>
__u32 ipaddress;
printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
__be32 ipaddress;
printk(KERN_INFO "my ip: %pI4\n", &amp;ipaddress);
</programlisting>
<para>

View file

@ -184,8 +184,6 @@ usage should require reading the full document.
!Finclude/net/mac80211.h ieee80211_ctstoself_get
!Finclude/net/mac80211.h ieee80211_ctstoself_duration
!Finclude/net/mac80211.h ieee80211_generic_frame_duration
!Finclude/net/mac80211.h ieee80211_get_hdrlen_from_skb
!Finclude/net/mac80211.h ieee80211_hdrlen
!Finclude/net/mac80211.h ieee80211_wake_queue
!Finclude/net/mac80211.h ieee80211_stop_queue
!Finclude/net/mac80211.h ieee80211_wake_queues

View file

@ -61,6 +61,10 @@ be initiated although firmwares have no _OSC support. To enable the
walkaround, pls. add aerdriver.forceload=y to kernel boot parameter line
when booting kernel. Note that forceload=n by default.
nosourceid, another parameter of type bool, can be used when broken
hardware (mostly chipsets) has root ports that cannot obtain the reporting
source ID. nosourceid=n by default.
2.3 AER error output
When a PCI-E AER error is captured, an error message will be outputed to
console. If it's a correctable error, it is outputed as a warning.
@ -246,3 +250,24 @@ with the PCI Express AER Root driver?
A: It could call the helper functions to enable AER in devices and
cleanup uncorrectable status register. Pls. refer to section 3.3.
4. Software error injection
Debugging PCIE AER error recovery code is quite difficult because it
is hard to trigger real hardware errors. Software based error
injection can be used to fake various kinds of PCIE errors.
First you should enable PCIE AER software error injection in kernel
configuration, that is, following item should be in your .config.
CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
After reboot with new kernel or insert the module, a device file named
/dev/aer_inject should be created.
Then, you need a user space tool named aer-inject, which can be gotten
from:
http://www.kernel.org/pub/linux/utils/pci/aer-inject/
More information about aer-inject can be found in the document comes
with its source code.

View file

@ -83,11 +83,12 @@ not detect it missed following items in original chain.
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
atomic_inc(&obj->refcnt);
/*
* we need to make sure obj->key is updated before obj->next
* or obj->refcnt
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
@ -159,6 +160,10 @@ out:
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* changes to obj->key must be visible before refcnt one
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
/*
* insert obj in RCU way (readers might be traversing chain)

View file

@ -54,7 +54,7 @@ kernel patches.
CONFIG_PREEMPT.
14: If the patch affects IO/Disk, etc: has been tested with and without
CONFIG_LBD.
CONFIG_LBDAF.
15: All codepaths have been exercised with all lockdep features enabled.

View file

@ -21,6 +21,8 @@ ffff8000 ffffffff copy_user_page / clear_user_page use.
For SA11xx and Xscale, this is used to
setup a minicache mapping.
ffff4000 ffffffff cache aliasing on ARMv6 and later CPUs.
ffff1000 ffff7fff Reserved.
Platforms must not use this address range.

View file

@ -50,7 +50,7 @@ encouraged them to allow separation of the data and integrity metadata
scatter-gather lists.
The controller will interleave the buffers on write and split them on
read. This means that the Linux can DMA the data buffers to and from
read. This means that Linux can DMA the data buffers to and from
host memory without changes to the page cache.
Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
@ -66,7 +66,7 @@ software RAID5).
The IP checksum is weaker than the CRC in terms of detecting bit
errors. However, the strength is really in the separation of the data
buffers and the integrity metadata. These two distinct buffers much
buffers and the integrity metadata. These two distinct buffers must
match up for an I/O to complete.
The separation of the data and integrity metadata buffers as well as

View file

@ -777,6 +777,18 @@ in cpuset directories:
# /bin/echo 1-4 > cpus -> set cpus list to cpus 1,2,3,4
# /bin/echo 1,2,3,4 > cpus -> set cpus list to cpus 1,2,3,4
To add a CPU to a cpuset, write the new list of CPUs including the
CPU to be added. To add 6 to the above cpuset:
# /bin/echo 1-4,6 > cpus -> set cpus list to cpus 1,2,3,4,6
Similarly to remove a CPU from a cpuset, write the new list of CPUs
without the CPU to be removed.
To remove all the CPUs:
# /bin/echo "" > cpus -> clear cpus list
2.3 Setting flags
-----------------

View file

@ -152,14 +152,19 @@ When swap is accounted, following files are added.
usage of mem+swap is limited by memsw.limit_in_bytes.
Note: why 'mem+swap' rather than swap.
* why 'mem+swap' rather than swap.
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
mem+swap.
mem+swap. In other words, when we want to limit the usage of swap without
affecting global LRU, mem+swap limit is better than just limiting swap from
OS point of view.
In other words, when we want to limit the usage of swap without affecting
global LRU, mem+swap limit is better than just limiting swap from OS point
of view.
* What happens when a cgroup hits memory.memsw.limit_in_bytes
When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
in this cgroup. Then, swap-out will not be done by cgroup routine and file
caches are dropped. But as mentioned above, global LRU can do swapout memory
from it for sanity of the system's memory management state. You can't forbid
it by cgroup.
2.5 Reclaim
@ -204,6 +209,7 @@ We can alter the memory limit:
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes.
NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
# cat /cgroups/0/memory.limit_in_bytes
4194304

View file

@ -1,7 +1,7 @@
/*
* cn_test.c
*
* 2004-2005 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru>
* 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
* All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
@ -41,6 +41,12 @@ void cn_test_callback(void *data)
msg->seq, msg->ack, msg->len, (char *)msg->data);
}
/*
* Do not remove this function even if no one is using it as
* this is an example of how to get notifications about new
* connector user registration
*/
#if 0
static int cn_test_want_notify(void)
{
struct cn_ctl_msg *ctl;
@ -117,6 +123,7 @@ nlmsg_failure:
kfree_skb(skb);
return -EINVAL;
}
#endif
static u32 cn_test_timer_counter;
static void cn_test_timer_func(unsigned long __data)
@ -187,5 +194,5 @@ module_init(cn_test_init);
module_exit(cn_test_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Evgeniy Polyakov <johnpol@2ka.mipt.ru>");
MODULE_AUTHOR("Evgeniy Polyakov <zbr@ioremap.net>");
MODULE_DESCRIPTION("Connector's test module");

View file

@ -1,7 +1,7 @@
/*
* ucon.c
*
* Copyright (c) 2004+ Evgeniy Polyakov <johnpol@2ka.mipt.ru>
* Copyright (c) 2004+ Evgeniy Polyakov <zbr@ioremap.net>
*
*
* This program is free software; you can redistribute it and/or modify

View file

@ -155,7 +155,7 @@ actual frequency must be determined using the following rules:
- if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal
target_freq. ("H for highest, but no higher than")
Here again the frequency table helper might assist you - see section 3
Here again the frequency table helper might assist you - see section 2
for details.

View file

@ -119,10 +119,6 @@ want the kernel to look at the CPU usage and to make decisions on
what to do about the frequency. Typically this is set to values of
around '10000' or more. It's default value is (cmp. with users-guide.txt):
transition_latency * 1000
The lowest value you can set is:
transition_latency * 100 or it may get restricted to a value where it
makes not sense for the kernel anymore to poll that often which depends
on your HZ config variable (HZ=1000: max=20000us, HZ=250: max=5000).
Be aware that transition latency is in ns and sampling_rate is in us, so you
get the same sysfs value by default.
Sampling rate should always get adjusted considering the transition latency
@ -131,14 +127,20 @@ in the bash (as said, 1000 is default), do:
echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
>ondemand/sampling_rate
show_sampling_rate_(min|max): THIS INTERFACE IS DEPRECATED, DON'T USE IT.
You can use wider ranges now and the general
cpuinfo_transition_latency variable (cmp. with user-guide.txt) can be
used to obtain exactly the same info:
show_sampling_rate_min = transtition_latency * 500 / 1000
show_sampling_rate_max = transtition_latency * 500000 / 1000
(divided by 1000 is to illustrate that sampling rate is in us and
transition latency is exported ns).
show_sampling_rate_min:
The sampling rate is limited by the HW transition latency:
transition_latency * 100
Or by kernel restrictions:
If CONFIG_NO_HZ is set, the limit is 10ms fixed.
If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us (20ms)
HZ=250: min=80000us (80ms)
HZ=100: min=200000us (200ms)
The highest value of kernel and HW latency restrictions is shown and
used as the minimum sampling rate.
show_sampling_rate_max: THIS INTERFACE IS DEPRECATED, DON'T USE IT.
up_threshold: defines what the average CPU usage between the samplings
of 'sampling_rate' needs to be for the kernel to make a decision on

View file

@ -31,7 +31,6 @@ Contents:
3. How to change the CPU cpufreq policy and/or speed
3.1 Preferred interface: sysfs
3.2 Deprecated interfaces

View file

@ -0,0 +1,54 @@
Device-Mapper Logging
=====================
The device-mapper logging code is used by some of the device-mapper
RAID targets to track regions of the disk that are not consistent.
A region (or portion of the address space) of the disk may be
inconsistent because a RAID stripe is currently being operated on or
a machine died while the region was being altered. In the case of
mirrors, a region would be considered dirty/inconsistent while you
are writing to it because the writes need to be replicated for all
the legs of the mirror and may not reach the legs at the same time.
Once all writes are complete, the region is considered clean again.
There is a generic logging interface that the device-mapper RAID
implementations use to perform logging operations (see
dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different
logging implementations are available and provide different
capabilities. The list includes:
Type Files
==== =====
disk drivers/md/dm-log.c
core drivers/md/dm-log.c
userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
The "disk" log type
-------------------
This log implementation commits the log state to disk. This way, the
logging state survives reboots/crashes.
The "core" log type
-------------------
This log implementation keeps the log state in memory. The log state
will not survive a reboot or crash, but there may be a small boost in
performance. This method can also be used if no storage device is
available for storing log state.
The "userspace" log type
------------------------
This log type simply provides a way to export the log API to userspace,
so log implementations can be done there. This is done by forwarding most
logging requests to userspace, where a daemon receives and processes the
request.
The structure used for communication between kernel and userspace are
located in include/linux/dm-log-userspace.h. Due to the frequency,
diversity, and 2-way communication nature of the exchanges between
kernel and userspace, 'connector' is used as the interface for
communication.
There are currently two userspace log implementations that leverage this
framework - "clustered_disk" and "clustered_core". These implementations
provide a cluster-coherent log for shared-storage. Device-mapper mirroring
can be used in a shared-storage environment when the cluster log implementations
are employed.

View file

@ -0,0 +1,39 @@
dm-queue-length
===============
dm-queue-length is a path selector module for device-mapper targets,
which selects a path with the least number of in-flight I/Os.
The path selector name is 'queue-length'.
Table parameters for each path: [<repeat_count>]
<repeat_count>: The number of I/Os to dispatch using the selected
path before switching to the next path.
If not given, internal default is used. To check
the default value, see the activated table.
Status for each path: <status> <fail-count> <in-flight>
<status>: 'A' if the path is active, 'F' if the path is failed.
<fail-count>: The number of path failures.
<in-flight>: The number of in-flight I/Os on the path.
Algorithm
=========
dm-queue-length increments/decrements 'in-flight' when an I/O is
dispatched/completed respectively.
dm-queue-length selects a path with the minimum 'in-flight'.
Examples
========
In case that 2 paths (sda and sdb) are used with repeat_count == 128.
# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0

View file

@ -0,0 +1,91 @@
dm-service-time
===============
dm-service-time is a path selector module for device-mapper targets,
which selects a path with the shortest estimated service time for
the incoming I/O.
The service time for each path is estimated by dividing the total size
of in-flight I/Os on a path with the performance value of the path.
The performance value is a relative throughput value among all paths
in a path-group, and it can be specified as a table argument.
The path selector name is 'service-time'.
Table parameters for each path: [<repeat_count> [<relative_throughput>]]
<repeat_count>: The number of I/Os to dispatch using the selected
path before switching to the next path.
If not given, internal default is used. To check
the default value, see the activated table.
<relative_throughput>: The relative throughput value of the path
among all paths in the path-group.
The valid range is 0-100.
If not given, minimum value '1' is used.
If '0' is given, the path isn't selected while
other paths having a positive value are available.
Status for each path: <status> <fail-count> <in-flight-size> \
<relative_throughput>
<status>: 'A' if the path is active, 'F' if the path is failed.
<fail-count>: The number of path failures.
<in-flight-size>: The size of in-flight I/Os on the path.
<relative_throughput>: The relative throughput value of the path
among all paths in the path-group.
Algorithm
=========
dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
dispatched and substracts when completed.
Basically, dm-service-time selects a path having minimum service time
which is calculated by:
('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
However, some optimizations below are used to reduce the calculation
as much as possible.
1. If the paths have the same 'relative_throughput', skip
the division and just compare the 'in-flight-size'.
2. If the paths have the same 'in-flight-size', skip the division
and just compare the 'relative_throughput'.
3. If some paths have non-zero 'relative_throughput' and others
have zero 'relative_throughput', ignore those paths with zero
'relative_throughput'.
If such optimizations can't be applied, calculate service time, and
compare service time.
If calculated service time is equal, the path having maximum
'relative_throughput' may be better. So compare 'relative_throughput'
then.
Examples
========
In case that 2 paths (sda and sdb) are used with repeat_count == 128
and sda has an average throughput 1GB/s and sdb has 4GB/s,
'relative_throughput' value may be '1' for sda and '4' for sdb.
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
Or '2' for sda and '8' for sdb would be also true.
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
dmsetup create test
#
# dmsetup table
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
#
# dmsetup status
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8

View file

@ -207,8 +207,8 @@ Attributes
~~~~~~~~~~
struct driver_attribute {
struct attribute attr;
ssize_t (*show)(struct device_driver *, char * buf, size_t count, loff_t off);
ssize_t (*store)(struct device_driver *, const char * buf, size_t count, loff_t off);
ssize_t (*show)(struct device_driver *driver, char *buf);
ssize_t (*store)(struct device_driver *, const char * buf, size_t count);
};
Device drivers can export attributes via their sysfs directories.

View file

@ -25,7 +25,7 @@ use IO::Handle;
"tda10046lifeview", "av7110", "dec2000t", "dec2540t",
"dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004",
"or51211", "or51132_qam", "or51132_vsb", "bluebird",
"opera1", "cx231xx", "cx18", "cx23885", "pvrusb2" );
"opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718" );
# Check args
syntax() if (scalar(@ARGV) != 1);
@ -381,6 +381,57 @@ sub cx18 {
$allfiles;
}
sub mpc718 {
my $archive = 'Yuan MPC718 TV Tuner Card 2.13.10.1016.zip';
my $url = "ftp://ftp.work.acer-euro.com/desktop/aspire_idea510/vista/Drivers/$archive";
my $fwfile = "dvb-cx18-mpc718-mt352.fw";
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
checkstandard();
wgetfile($archive, $url);
unzip($archive, $tmpdir);
my $sourcefile = "$tmpdir/Yuan MPC718 TV Tuner Card 2.13.10.1016/mpc718_32bit/yuanrap.sys";
my $found = 0;
open IN, '<', $sourcefile or die "Couldn't open $sourcefile to extract $fwfile data\n";
binmode IN;
open OUT, '>', $fwfile;
binmode OUT;
{
# Block scope because we change the line terminator variable $/
my $prevlen = 0;
my $currlen;
# Buried in the data segment are 3 runs of almost identical
# register-value pairs that end in 0x5d 0x01 which is a "TUNER GO"
# command for the MT352.
# Pull out the middle run (because it's easy) of register-value
# pairs to make the "firmware" file.
local $/ = "\x5d\x01"; # MT352 "TUNER GO"
while (<IN>) {
$currlen = length($_);
if ($prevlen == $currlen && $currlen <= 64) {
chop; chop; # Get rid of "TUNER GO"
s/^\0\0//; # get rid of leading 00 00 if it's there
printf OUT "$_";
$found = 1;
last;
}
$prevlen = $currlen;
}
}
close OUT;
close IN;
if (!$found) {
unlink $fwfile;
die "Couldn't find valid register-value sequence in $sourcefile for $fwfile\n";
}
$fwfile;
}
sub cx23885 {
my $url = "http://linuxtv.org/downloads/firmware/";

View file

@ -6,6 +6,20 @@ be removed from this file.
---------------------------
What: IRQF_SAMPLE_RANDOM
Check: IRQF_SAMPLE_RANDOM
When: July 2009
Why: Many of IRQF_SAMPLE_RANDOM users are technically bogus as entropy
sources in the kernel's current entropy model. To resolve this, every
input point to the kernel's entropy pool needs to better document the
type of entropy source it actually is. This will be replaced with
additional add_*_randomness functions in drivers/char/random.c
Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com>
---------------------------
What: The ieee80211_regdom module parameter
When: March 2010 / desktop catchup
@ -354,16 +368,6 @@ Who: Krzysztof Piotr Oledzki <ole@ans.pl>
---------------------------
What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client(),
i2c_adapter->client_register(), i2c_adapter->client_unregister
When: 2.6.30
Check: i2c_attach_client i2c_detach_client
Why: Deprecated by the new (standard) device driver binding model. Use
i2c_driver->probe() and ->remove() instead.
Who: Jean Delvare <khali@linux-fr.org>
---------------------------
What: fscher and fscpos drivers
When: June 2009
Why: Deprecated by the new fschmd driver.
@ -454,3 +458,13 @@ Why: Remove the old legacy 32bit machine check code. This has been
but the old version has been kept around for easier testing. Note this
doesn't impact the old P5 and WinChip machine check handlers.
Who: Andi Kleen <andi@firstfloor.org>
----------------------------
What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
exported interface anymore.
When: 2.6.33
Why: cpu_policy_rwsem has a new cleaner definition making it local to
cpufreq core and contained inside cpufreq.c. Other dependent
drivers should not use it in order to safely avoid lockdep issues.
Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

View file

@ -66,6 +66,10 @@ mandatory-locking.txt
- info on the Linux implementation of Sys V mandatory file locking.
ncpfs.txt
- info on Novell Netware(tm) filesystem using NCP protocol.
nfs41-server.txt
- info on the Linux server implementation of NFSv4 minor version 1.
nfs-rdma.txt
- how to install and setup the Linux NFS/RDMA client and server software.
nfsroot.txt
- short guide on setting up a diskless box with NFS root filesystem.
nilfs2.txt

View file

@ -123,6 +123,9 @@ available from the same CVS repository.
There are user and developer mailing lists available through the v9fs project
on sourceforge (http://sourceforge.net/projects/v9fs).
A stand-alone version of the module (which should build for any 2.6 kernel)
is available via (http://github.com/ericvh/9p-sac/tree/master)
News and other information is maintained on SWiK (http://swik.net/v9fs).
Bug reports may be issued through the kernel.org bugzilla

View file

@ -109,27 +109,28 @@ prototypes:
locking rules:
All may block.
BKL s_lock s_umount
alloc_inode: no no no
destroy_inode: no
dirty_inode: no (must not sleep)
write_inode: no
drop_inode: no !!!inode_lock!!!
delete_inode: no
put_super: yes yes no
write_super: no yes read
sync_fs: no no read
freeze_fs: ?
unfreeze_fs: ?
statfs: no no no
remount_fs: yes yes maybe (see below)
clear_inode: no
umount_begin: yes no no
show_options: no (vfsmount->sem)
quota_read: no no no (see below)
quota_write: no no no (see below)
None have BKL
s_umount
alloc_inode:
destroy_inode:
dirty_inode: (must not sleep)
write_inode:
drop_inode: !!!inode_lock!!!
delete_inode:
put_super: write
write_super: read
sync_fs: read
freeze_fs: read
unfreeze_fs: read
statfs: no
remount_fs: maybe (see below)
clear_inode:
umount_begin: no
show_options: no (namespace_sem)
quota_read: no (see below)
quota_write: no (see below)
->remount_fs() will have the s_umount lock if it's already mounted.
->remount_fs() will have the s_umount exclusive lock if it's already mounted.
When called from get_sb_single, it does NOT have the s_umount lock.
->quota_read() and ->quota_write() functions are both guaranteed to
be the only ones operating on the quota file by the quota code (via
@ -187,7 +188,7 @@ readpages: no
write_begin: no locks the page yes
write_end: no yes, unlocks yes
perform_write: no n/a yes
bmap: yes
bmap: no
invalidatepage: no yes
releasepage: no yes
direct_IO: no

View file

@ -23,16 +23,14 @@ it does support include:
(*) Security (currently only AFS kaserver and KerberosIV tickets).
(*) File reading.
(*) File reading and writing.
(*) Automounting.
(*) Local caching (via fscache).
It does not yet support the following AFS features:
(*) Write support.
(*) Local caching.
(*) pioctl() system call.
@ -56,7 +54,7 @@ They permit the debugging messages to be turned on dynamically by manipulating
the masks in the following files:
/sys/module/af_rxrpc/parameters/debug
/sys/module/afs/parameters/debug
/sys/module/kafs/parameters/debug
=====
@ -66,9 +64,9 @@ USAGE
When inserting the driver modules the root cell must be specified along with a
list of volume location server IP addresses:
insmod af_rxrpc.o
insmod rxkad.o
insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
modprobe af_rxrpc
modprobe rxkad
modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
The first module is the AF_RXRPC network protocol driver. This provides the
RxRPC remote operation protocol and may also be accessed from userspace. See:
@ -81,7 +79,7 @@ is the actual filesystem driver for the AFS filesystem.
Once the module has been loaded, more modules can be added by the following
procedure:
echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
Where the parameters to the "add" command are the name of a cell and a list of
volume location servers within that cell, with the latter separated by colons.
@ -101,7 +99,7 @@ The name of the volume can be suffixes with ".backup" or ".readonly" to
specify connection to only volumes of those types.
The name of the cell is optional, and if not given during a mount, then the
named volume will be looked up in the cell specified during insmod.
named volume will be looked up in the cell specified during modprobe.
Additional cells can be added through /proc (see later section).
@ -163,14 +161,14 @@ THE CELL DATABASE
The filesystem maintains an internal database of all the cells it knows and the
IP addresses of the volume location servers for those cells. The cell to which
the system belongs is added to the database when insmod is performed by the
the system belongs is added to the database when modprobe is performed by the
"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
the kernel command line.
Further cells can be added by commands similar to the following:
echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
No other cell database operations are available at this time.
@ -233,7 +231,7 @@ insmod /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.91
mount -t afs \%root.afs. /afs
mount -t afs \%cambridge.redhat.com:root.cell. /afs/cambridge.redhat.com/
echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells
echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 > /proc/fs/afs/cells
mount -t afs "#grand.central.org:root.cell." /afs/grand.central.org/
mount -t afs "#grand.central.org:root.archive." /afs/grand.central.org/archive
mount -t afs "#grand.central.org:root.contrib." /afs/grand.central.org/contrib

View file

@ -322,7 +322,7 @@ an upper limit on the block size imposed by the page size of the kernel,
so 8kB blocks are only allowed on Alpha systems (and other architectures
which support larger pages).
There is an upper limit of 32768 subdirectories in a single directory.
There is an upper limit of 32000 subdirectories in a single directory.
There is a "soft" upper limit of about 10-15k files in a single directory
with the current linear linked-list directory implementation. This limit

View file

@ -235,6 +235,10 @@ minixdf Make 'df' act like Minix.
debug Extra debugging information is sent to syslog.
abort Simulate the effects of calling ext4_abort() for
debugging purposes. This is normally used while
remounting a filesystem which is already mounted.
errors=remount-ro Remount the filesystem read-only on an error.
errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs.

View file

@ -23,8 +23,13 @@ Mount options unique to the isofs filesystem.
map=off Do not map non-Rock Ridge filenames to lower case
map=normal Map non-Rock Ridge filenames to lower case
map=acorn As map=normal but also apply Acorn extensions if present
mode=xxx Sets the permissions on files to xxx
dmode=xxx Sets the permissions on directories to xxx
mode=xxx Sets the permissions on files to xxx unless Rock Ridge
extensions set the permissions otherwise
dmode=xxx Sets the permissions on directories to xxx unless Rock Ridge
extensions set the permissions otherwise
overriderockperm Set permissions on files and directories according to
'mode' and 'dmode' even though Rock Ridge extensions are
present.
nojoliet Ignore Joliet extensions if they are present.
norock Ignore Rock Ridge extensions if they are present.
hide Completely strip hidden files from the file system.

View file

@ -5,11 +5,12 @@
Bodo Bauer <bb@ricochet.net>
2.4.x update Jorge Nerin <comandante@zaralinux.com> November 14 2000
move /proc/sys Shen Feng <shen@cn.fujitsu.com> April 1 2009
move /proc/sys Shen Feng <shen@cn.fujitsu.com> April 1 2009
------------------------------------------------------------------------------
Version 1.3 Kernel version 2.2.12
Kernel version 2.4.0-test11-pre4
------------------------------------------------------------------------------
fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009
Table of Contents
-----------------
@ -116,7 +117,7 @@ The link self points to the process reading the file system. Each process
subdirectory has the entries listed in Table 1-1.
Table 1-1: Process specific entries in /proc
Table 1-1: Process specific entries in /proc
..............................................................................
File Content
clear_refs Clears page referenced bits shown in smaps output
@ -134,46 +135,103 @@ Table 1-1: Process specific entries in /proc
status Process status in human readable form
wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan
stack Report full stack trace, enable via CONFIG_STACKTRACE
smaps Extension based on maps, the rss size for each mapped file
smaps a extension based on maps, showing the memory consumption of
each mapping
..............................................................................
For example, to get the status information of a process, all you have to do is
read the file /proc/PID/status:
>cat /proc/self/status
Name: cat
State: R (running)
Pid: 5452
PPid: 743
>cat /proc/self/status
Name: cat
State: R (running)
Tgid: 5452
Pid: 5452
PPid: 743
TracerPid: 0 (2.4)
Uid: 501 501 501 501
Gid: 100 100 100 100
Groups: 100 14 16
VmSize: 1112 kB
VmLck: 0 kB
VmRSS: 348 kB
VmData: 24 kB
VmStk: 12 kB
VmExe: 8 kB
VmLib: 1044 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 00000000fffffeff
CapPrm: 0000000000000000
CapEff: 0000000000000000
Uid: 501 501 501 501
Gid: 100 100 100 100
FDSize: 256
Groups: 100 14 16
VmPeak: 5004 kB
VmSize: 5004 kB
VmLck: 0 kB
VmHWM: 476 kB
VmRSS: 476 kB
VmData: 156 kB
VmStk: 88 kB
VmExe: 68 kB
VmLib: 1412 kB
VmPTE: 20 kb
Threads: 1
SigQ: 0/28578
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 00000000fffffeff
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
information. The statm file contains more detailed information about the
process memory usage. Its seven fields are explained in Table 1-2. The stat
file contains details information about the process itself. Its fields are
explained in Table 1-3.
information. But you get a more detailed view of the process by reading the
file /proc/PID/status. It fields are described in table 1-2.
The statm file contains more detailed information about the process
memory usage. Its seven fields are explained in Table 1-3. The stat file
contains details information about the process itself. Its fields are
explained in Table 1-4.
Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
..............................................................................
Field Content
Name filename of the executable
State state (R is running, S is sleeping, D is sleeping
in an uninterruptible wait, Z is zombie,
T is traced or stopped)
Tgid thread group ID
Pid process id
PPid process id of the parent process
TracerPid PID of process tracing this process (0 if not)
Uid Real, effective, saved set, and file system UIDs
Gid Real, effective, saved set, and file system GIDs
FDSize number of file descriptor slots currently allocated
Groups supplementary group list
VmPeak peak virtual memory size
VmSize total program size
VmLck locked memory size
VmHWM peak resident set size ("high water mark")
VmRSS size of memory portions
VmData size of data, stack, and text segments
VmStk size of data, stack, and text segments
VmExe size of text segment
VmLib size of shared library code
VmPTE size of page table entries
Threads number of threads
SigQ number of signals queued/max. number for queue
SigPnd bitmap of pending signals for the thread
ShdPnd bitmap of shared pending signals for the process
SigBlk bitmap of blocked signals
SigIgn bitmap of ignored signals
SigCgt bitmap of catched signals
CapInh bitmap of inheritable capabilities
CapPrm bitmap of permitted capabilities
CapEff bitmap of effective capabilities
CapBnd bitmap of capabilities bounding set
Cpus_allowed mask of CPUs on which this process may run
Cpus_allowed_list Same as previous, but in "list format"
Mems_allowed mask of memory nodes allowed to this process
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
..............................................................................
Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
..............................................................................
Field Content
size total program size (pages) (same as VmSize in status)
@ -188,7 +246,7 @@ Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
..............................................................................
Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
..............................................................................
Field Content
pid process id
@ -222,10 +280,10 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
start_stack address of the start of the stack
esp current value of ESP
eip current value of EIP
pending bitmap of pending signals (obsolete)
blocked bitmap of blocked signals (obsolete)
sigign bitmap of ignored signals (obsolete)
sigcatch bitmap of catched signals (obsolete)
pending bitmap of pending signals
blocked bitmap of blocked signals
sigign bitmap of ignored signals
sigcatch bitmap of catched signals
wchan address where process went to sleep
0 (place holder)
0 (place holder)
@ -234,19 +292,99 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
rt_priority realtime priority
policy scheduling policy (man sched_setscheduler)
blkio_ticks time spent waiting for block IO
gtime guest time of the task in jiffies
cgtime guest time of the task children in jiffies
..............................................................................
The /proc/PID/map file containing the currently mapped memory regions and
their access permissions.
The format is:
address perms offset dev inode pathname
08048000-08049000 r-xp 00000000 03:00 8312 /opt/test
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
a7cb2000-a7eb2000 rw-p 00000000 00:00 0
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
a8008000-a800a000 r--p 00133000 03:00 4222 /lib/libc.so.6
a800a000-a800b000 rw-p 00135000 03:00 4222 /lib/libc.so.6
a800b000-a800e000 rw-p 00000000 00:00 0
a800e000-a8022000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
a8022000-a8023000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
a8023000-a8024000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
a8024000-a8027000 rw-p 00000000 00:00 0
a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
aff35000-aff4a000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
where "address" is the address space in the process that it occupies, "perms"
is a set of permissions:
r = read
w = write
x = execute
s = shared
p = private (copy on write)
"offset" is the offset into the mapping, "dev" is the device (major:minor), and
"inode" is the inode on that device. 0 indicates that no inode is associated
with the memory region, as the case would be with BSS (uninitialized data).
The "pathname" shows the name associated file for this mapping. If the mapping
is not associated with a file:
[heap] = the heap of the program
[stack] = the stack of the main process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
or if empty, the mapping is anonymous.
The /proc/PID/smaps is an extension based on maps, showing the memory
consumption for each of the process's mappings. For each of mappings there
is a series of lines such as the following:
08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash
Size: 1084 kB
Rss: 892 kB
Pss: 374 kB
Shared_Clean: 892 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 892 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
The first of these lines shows the same information as is displayed for the
mapping in /proc/PID/maps. The remaining lines show the size of the mapping,
the amount of the mapping that is currently resident in RAM, the "proportional
set size” (divide each shared page by the number of processes sharing it), the
number of clean and dirty shared pages in the mapping, and the number of clean
and dirty private pages in the mapping. The "Referenced" indicates the amount
of memory currently marked as referenced or accessed.
This file is only present if the CONFIG_MMU kernel configuration option is
enabled.
1.2 Kernel data
---------------
Similar to the process entries, the kernel data files give information about
the running kernel. The files used to obtain this information are contained in
/proc and are listed in Table 1-4. Not all of these will be present in your
/proc and are listed in Table 1-5. Not all of these will be present in your
system. It depends on the kernel configuration and the loaded modules, which
files are there, and which are missing.
Table 1-4: Kernel info in /proc
Table 1-5: Kernel info in /proc
..............................................................................
File Content
apm Advanced power management info
@ -283,6 +421,7 @@ Table 1-4: Kernel info in /proc
rtc Real time clock
scsi SCSI info (see text)
slabinfo Slab pool info
softirqs softirq usage
stat Overall statistics
swaps Swap space utilization
sys See chapter 2
@ -597,6 +736,25 @@ on the kind of area :
0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 ...
pages=10 vmalloc N0=10
..............................................................................
softirqs:
Provides counts of softirq handlers serviced since boot time, for each cpu.
> cat /proc/softirqs
CPU0 CPU1 CPU2 CPU3
HI: 0 0 0 0
TIMER: 27166 27120 27097 27034
NET_TX: 0 0 0 17
NET_RX: 42 0 0 39
BLOCK: 0 0 107 1121
TASKLET: 0 0 0 290
SCHED: 27035 26983 26971 26746
HRTIMER: 0 0 0 0
RCU: 1678 1769 2178 2250
1.3 IDE devices in /proc/ide
----------------------------
@ -614,10 +772,10 @@ IDE devices:
More detailed information can be found in the controller specific
subdirectories. These are named ide0, ide1 and so on. Each of these
directories contains the files shown in table 1-5.
directories contains the files shown in table 1-6.
Table 1-5: IDE controller info in /proc/ide/ide?
Table 1-6: IDE controller info in /proc/ide/ide?
..............................................................................
File Content
channel IDE channel (0 or 1)
@ -627,11 +785,11 @@ Table 1-5: IDE controller info in /proc/ide/ide?
..............................................................................
Each device connected to a controller has a separate subdirectory in the
controllers directory. The files listed in table 1-6 are contained in these
controllers directory. The files listed in table 1-7 are contained in these
directories.
Table 1-6: IDE device information
Table 1-7: IDE device information
..............................................................................
File Content
cache The cache
@ -673,12 +831,12 @@ the drive parameters:
1.4 Networking info in /proc/net
--------------------------------
The subdirectory /proc/net follows the usual pattern. Table 1-6 shows the
The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the
additional values you get for IP version 6 if you configure the kernel to
support this. Table 1-7 lists the files and their meaning.
support this. Table 1-9 lists the files and their meaning.
Table 1-6: IPv6 info in /proc/net
Table 1-8: IPv6 info in /proc/net
..............................................................................
File Content
udp6 UDP sockets (IPv6)
@ -693,7 +851,7 @@ Table 1-6: IPv6 info in /proc/net
..............................................................................
Table 1-7: Network info in /proc/net
Table 1-9: Network info in /proc/net
..............................................................................
File Content
arp Kernel ARP table
@ -817,10 +975,10 @@ The directory /proc/parport contains information about the parallel ports of
your system. It has one subdirectory for each port, named after the port
number (0,1,2,...).
These directories contain the four files shown in Table 1-8.
These directories contain the four files shown in Table 1-10.
Table 1-8: Files in /proc/parport
Table 1-10: Files in /proc/parport
..............................................................................
File Content
autoprobe Any IEEE-1284 device ID information that has been acquired.
@ -838,10 +996,10 @@ Table 1-8: Files in /proc/parport
Information about the available and actually used tty's can be found in the
directory /proc/tty.You'll find entries for drivers and line disciplines in
this directory, as shown in Table 1-9.
this directory, as shown in Table 1-11.
Table 1-9: Files in /proc/tty
Table 1-11: Files in /proc/tty
..............................................................................
File Content
drivers list of drivers and their usage
@ -883,6 +1041,7 @@ since the system first booted. For a quick look, simply cat the file:
processes 2915
procs_running 1
procs_blocked 0
softirq 183433 0 21755 12 39 1137 231 21459 2263
The very first "cpu" line aggregates the numbers in all of the other "cpuN"
lines. These numbers identify the amount of time the CPU has spent performing
@ -918,6 +1077,11 @@ CPUs.
The "procs_blocked" line gives the number of processes currently blocked,
waiting for I/O to complete.
The "softirq" line gives counts of softirqs serviced since boot time, for each
of the possible system softirqs. The first column is the total of all
softirqs serviced; each subsequent column is the total for that particular
softirq.
1.9 Ext4 file system parameters
------------------------------
@ -926,9 +1090,9 @@ Information about mounted ext4 file systems can be found in
/proc/fs/ext4. Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
in Table 1-10, below.
in Table 1-12, below.
Table 1-10: Files in /proc/fs/ext4/<devname>
Table 1-12: Files in /proc/fs/ext4/<devname>
..............................................................................
File Content
mb_groups details of multiblock allocator buddy cache of free blocks
@ -1003,13 +1167,11 @@ CHAPTER 3: PER-PROCESS PARAMETERS
3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
------------------------------------------------------
This file can be used to adjust the score used to select which processes should
be killed in an out-of-memory situation. The oom_adj value is a characteristic
of the task's mm, so all threads that share an mm with pid will have the same
oom_adj value. A high value will increase the likelihood of this process being
killed by the oom-killer. Valid values are in the range -16 to +15 as
explained below and a special value of -17, which disables oom-killing
altogether for threads sharing pid's mm.
This file can be used to adjust the score used to select which processes
should be killed in an out-of-memory situation. Giving it a high score will
increase the likelihood of this process being killed by the oom-killer. Valid
values are in the range -16 to +15, plus the special value -17, which disables
oom-killing altogether for this process.
The process to be killed in an out-of-memory situation is selected among all others
based on its badness score. This value equals the original memory size of the process
@ -1023,9 +1185,6 @@ the parent's score if they do not share the same memory. Thus forking servers
are the prime candidates to be killed. Having only one 'hungry' child will make
parent less preferable than the child.
/proc/<pid>/oom_adj cannot be changed for kthreads since they are immune from
oom-killing already.
/proc/<pid>/oom_score shows process' current badness score.
The following heuristics are then applied:

View file

@ -23,7 +23,8 @@ interface.
Using sysfs
~~~~~~~~~~~
sysfs is always compiled in. You can access it by doing:
sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
it by doing:
mount -t sysfs sysfs /sys

253
Documentation/gcov.txt Normal file
View file

@ -0,0 +1,253 @@
Using gcov with the Linux kernel
================================
1. Introduction
2. Preparation
3. Customization
4. Files
5. Modules
6. Separated build and test machines
7. Troubleshooting
Appendix A: sample script: gather_on_build.sh
Appendix B: sample script: gather_on_test.sh
1. Introduction
===============
gcov profiling kernel support enables the use of GCC's coverage testing
tool gcov [1] with the Linux kernel. Coverage data of a running kernel
is exported in gcov-compatible format via the "gcov" debugfs directory.
To get coverage data for a specific file, change to the kernel build
directory and use gcov with the -o option as follows (requires root):
# cd /tmp/linux-out
# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
This will create source code files annotated with execution counts
in the current directory. In addition, graphical gcov front-ends such
as lcov [2] can be used to automate the process of collecting data
for the entire kernel and provide coverage overviews in HTML format.
Possible uses:
* debugging (has this line been reached at all?)
* test improvement (how do I change my test to cover these lines?)
* minimizing kernel configurations (do I need this option if the
associated code is never run?)
--
[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
[2] http://ltp.sourceforge.net/coverage/lcov.php
2. Preparation
==============
Configure the kernel with:
CONFIG_DEBUGFS=y
CONFIG_GCOV_KERNEL=y
and to get coverage data for the entire kernel:
CONFIG_GCOV_PROFILE_ALL=y
Note that kernels compiled with profiling flags will be significantly
larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported
on all architectures.
Profiling data will only become accessible once debugfs has been
mounted:
mount -t debugfs none /sys/kernel/debug
3. Customization
================
To enable profiling for specific files or directories, add a line
similar to the following to the respective kernel Makefile:
For a single file (e.g. main.o):
GCOV_PROFILE_main.o := y
For all files in one directory:
GCOV_PROFILE := y
To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL
is specified, use:
GCOV_PROFILE_main.o := n
and:
GCOV_PROFILE := n
Only files which are linked to the main kernel image or are compiled as
kernel modules are supported by this mechanism.
4. Files
========
The gcov kernel support creates the following files in debugfs:
/sys/kernel/debug/gcov
Parent directory for all gcov-related files.
/sys/kernel/debug/gcov/reset
Global reset file: resets all coverage data to zero when
written to.
/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda
The actual gcov data file as understood by the gcov
tool. Resets file coverage data to zero when written to.
/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno
Symbolic link to a static data file required by the gcov
tool. This file is generated by gcc when compiling with
option -ftest-coverage.
5. Modules
==========
Kernel modules may contain cleanup code which is only run during
module unload time. The gcov mechanism provides a means to collect
coverage data for such code by keeping a copy of the data associated
with the unloaded module. This data remains available through debugfs.
Once the module is loaded again, the associated coverage counters are
initialized with the data from its previous instantiation.
This behavior can be deactivated by specifying the gcov_persist kernel
parameter:
gcov_persist=0
At run-time, a user can also choose to discard data for an unloaded
module by writing to its data file or the global reset file.
6. Separated build and test machines
====================================
The gcov kernel profiling infrastructure is designed to work out-of-the
box for setups where kernels are built and run on the same machine. In
cases where the kernel runs on a separate machine, special preparations
must be made, depending on where the gcov tool is used:
a) gcov is run on the TEST machine
The gcov tool version on the test machine must be compatible with the
gcc version used for kernel build. Also the following files need to be
copied from build to test machine:
from the source tree:
- all C source files + headers
from the build tree:
- all C source files + headers
- all .gcda and .gcno files
- all links to directories
It is important to note that these files need to be placed into the
exact same file system location on the test machine as on the build
machine. If any of the path components is symbolic link, the actual
directory needs to be used instead (due to make's CURDIR handling).
b) gcov is run on the BUILD machine
The following files need to be copied after each test case from test
to build machine:
from the gcov directory in sysfs:
- all .gcda files
- all links to .gcno files
These files can be copied to any location on the build machine. gcov
must then be called with the -o option pointing to that directory.
Example directory setup on the build machine:
/tmp/linux: kernel source tree
/tmp/out: kernel build directory as specified by make O=
/tmp/coverage: location of the files copied from the test machine
[user@build] cd /tmp/out
[user@build] gcov -o /tmp/coverage/tmp/out/init main.c
7. Troubleshooting
==================
Problem: Compilation aborts during linker step.
Cause: Profiling flags are specified for source files which are not
linked to the main kernel or which are linked by a custom
linker procedure.
Solution: Exclude affected source files from profiling by specifying
GCOV_PROFILE := n or GCOV_PROFILE_basename.o := n in the
corresponding Makefile.
Problem: Files copied from sysfs appear empty or incomplete.
Cause: Due to the way seq_file works, some tools such as cp or tar
may not correctly copy files from sysfs.
Solution: Use 'cat' to read .gcda files and 'cp -d' to copy links.
Alternatively use the mechanism shown in Appendix B.
Appendix A: gather_on_build.sh
==============================
Sample script to gather coverage meta files on the build machine
(see 6a):
#!/bin/bash
KSRC=$1
KOBJ=$2
DEST=$3
if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then
echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2
exit 1
fi
KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \
-perm /u+r,g+r | tar cfz $DEST -P -T -
if [ $? -eq 0 ] ; then
echo "$DEST successfully created, copy to test system and unpack with:"
echo " tar xfz $DEST -P"
else
echo "Could not create file $DEST"
fi
Appendix B: gather_on_test.sh
=============================
Sample script to gather coverage data files on the test machine
(see 6b):
#!/bin/bash -e
DEST=$1
GCDA=/sys/kernel/debug/gcov
if [ -z "$DEST" ] ; then
echo "Usage: $0 <output.tar.gz>" >&2
exit 1
fi
TEMPDIR=$(mktemp -d)
echo Collecting data..
find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \;
find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \;
find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \;
tar czf $DEST -C $TEMPDIR sys
rm -rf $TEMPDIR
echo "$DEST successfully created, copy to build system and unpack with:"
echo " tar xfz $DEST"

View file

@ -165,3 +165,47 @@ was done there. Two significant differences are:
Once again, method 3 should be avoided wherever possible. Explicit device
instantiation (methods 1 and 2) is much preferred for it is safer and
faster.
Method 4: Instantiate from user-space
-------------------------------------
In general, the kernel should know which I2C devices are connected and
what addresses they live at. However, in certain cases, it does not, so a
sysfs interface was added to let the user provide the information. This
interface is made of 2 attribute files which are created in every I2C bus
directory: new_device and delete_device. Both files are write only and you
must write the right parameters to them in order to properly instantiate,
respectively delete, an I2C device.
File new_device takes 2 parameters: the name of the I2C device (a string)
and the address of the I2C device (a number, typically expressed in
hexadecimal starting with 0x, but can also be expressed in decimal.)
File delete_device takes a single parameter: the address of the I2C
device. As no two devices can live at the same address on a given I2C
segment, the address is sufficient to uniquely identify the device to be
deleted.
Example:
# echo eeprom 0x50 > /sys/class/i2c-adapter/i2c-3/new_device
While this interface should only be used when in-kernel device declaration
can't be done, there is a variety of cases where it can be helpful:
* The I2C driver usually detects devices (method 3 above) but the bus
segment your device lives on doesn't have the proper class bit set and
thus detection doesn't trigger.
* The I2C driver usually detects devices, but your device lives at an
unexpected address.
* The I2C driver usually detects devices, but your device is not detected,
either because the detection routine is too strict, or because your
device is not officially supported yet but you know it is compatible.
* You are developing a driver on a test board, where you soldered the I2C
device yourself.
This interface is a replacement for the force_* module parameters some I2C
drivers implement. Being implemented in i2c-core rather than in each
device driver individually, it is much more efficient, and also has the
advantage that you do not have to reload the driver to change a setting.
You can also instantiate the device before the driver is loaded or even
available, and you don't need to know what driver the device needs.

View file

@ -126,19 +126,9 @@ different) configuration information, as do drivers handling chip variants
that can't be distinguished by protocol probing, or which need some board
specific information to operate correctly.
Accordingly, the I2C stack now has two models for associating I2C devices
with their drivers: the original "legacy" model, and a newer one that's
fully compatible with the Linux 2.6 driver model. These models do not mix,
since the "legacy" model requires drivers to create "i2c_client" device
objects after SMBus style probing, while the Linux driver model expects
drivers to be given such device objects in their probe() routines.
The legacy model is deprecated now and will soon be removed, so we no
longer document it here.
Standard Driver Model Binding ("New Style")
-------------------------------------------
Device/Driver Binding
---------------------
System infrastructure, typically board-specific initialization code or
boot firmware, reports what I2C devices exist. For example, there may be
@ -201,7 +191,7 @@ a given I2C bus. This is for example the case of hardware monitoring
devices on a PC's SMBus. In that case, you may want to let your driver
detect supported devices automatically. This is how the legacy model
was working, and is now available as an extension to the standard
driver model (so that we can finally get rid of the legacy model.)
driver model.
You simply have to define a detect callback which will attempt to
identify supported devices (returning 0 for supported ones and -ENODEV

View file

@ -278,7 +278,7 @@ struct input_event {
};
'time' is the timestamp, it returns the time at which the event happened.
Type is for example EV_REL for relative moment, REL_KEY for a keypress or
Type is for example EV_REL for relative moment, EV_KEY for a keypress or
release. More types are defined in include/linux/input.h.
'code' is event code, for example REL_X or KEY_BACKSPACE, again a complete

View file

@ -67,7 +67,12 @@ data with it.
struct rotary_encoder_platform_data is declared in
include/linux/rotary-encoder.h and needs to be filled with the number of
steps the encoder has and can carry information about externally inverted
signals (because of used invertig buffer or other reasons).
signals (because of an inverting buffer or other reasons). The encoder
can be set up to deliver input information as either an absolute or relative
axes. For relative axes the input event returns +/-1 for each step. For
absolute axes the position of the encoder can either roll over between zero
and the number of steps or will clamp at the maximum and zero depending on
the configuration.
Because GPIO to IRQ mapping is platform specific, this information must
be given in seperately to the driver. See the example below.
@ -85,6 +90,8 @@ be given in seperately to the driver. See the example below.
static struct rotary_encoder_platform_data my_rotary_encoder_info = {
.steps = 24,
.axis = ABS_X,
.relative_axis = false,
.rollover = false,
.gpio_a = GPIO_ROTARY_A,
.gpio_b = GPIO_ROTARY_B,
.inverted_a = 0,

View file

@ -139,6 +139,7 @@ Code Seq# Include File Comments
'm' all linux/synclink.h conflict!
'm' 00-1F net/irda/irmod.h conflict!
'n' 00-7F linux/ncp_fs.h
'n' 80-8F linux/nilfs2_fs.h NILFS2
'n' E0-FF video/matrox.h matroxfb
'o' 00-1F fs/ocfs2/ocfs2_fs.h OCFS2
'o' 00-03 include/mtd/ubi-user.h conflict! (OCFS2 and UBI overlaps)
@ -149,6 +150,8 @@ Code Seq# Include File Comments
'p' 40-7F linux/nvram.h
'p' 80-9F user-space parport
<mailto:tim@cyberelk.net>
'p' a1-a4 linux/pps.h LinuxPPS
<mailto:giometti@linux.it>
'q' 00-1F linux/serio.h
'q' 80-FF Internet PhoneJACK, Internet LineJACK
<http://www.quicknet.net>

View file

@ -14,25 +14,14 @@ README
- general info on what you need and what to do for Linux ISDN.
README.FAQ
- general info for FAQ.
README.audio
- info for running audio over ISDN.
README.fax
- info for using Fax over ISDN.
README.gigaset
- info on the drivers for Siemens Gigaset ISDN adapters.
README.icn
- info on the ICN-ISDN-card and its driver.
>>>>>>> 93af7aca44f0e82e67bda10a0fb73d383edcc8bd:Documentation/isdn/00-INDEX
README.HiSax
- info on the HiSax driver which replaces the old teles.
README.act2000
- info on driver for IBM ACT-2000 card.
README.audio
- info for running audio over ISDN.
README.avmb1
- info on driver for AVM-B1 ISDN card.
README.act2000
- info on driver for IBM ACT-2000 card.
README.eicon
- info on driver for Eicon active cards.
README.concap
- info on "CONCAP" encapsulation protocol interface used for X.25.
README.diversion
@ -59,7 +48,3 @@ README.x25
- info for running X.25 over ISDN.
syncPPP.FAQ
- frequently asked questions about running PPP over ISDN.
README.hysdn
- info on driver for Hypercope active HYSDN cards
README.mISDN
- info on the Modular ISDN subsystem (mISDN).

View file

@ -75,7 +75,7 @@ Linux カーネルパッチ投稿者向けチェックリスト
ビルドした上、動作確認を行ってください。
14: もしパッチがディスクのI/O性能などに影響を与えるようであれば、
'CONFIG_LBD'オプションを有効にした場合と無効にした場合の両方で
'CONFIG_LBDAF'オプションを有効にした場合と無効にした場合の両方で
テストを実施してみてください。
15: lockdepの機能を全て有効にした上で、全てのコードパスを評価してください。

View file

@ -48,6 +48,7 @@ parameter is applicable:
EFI EFI Partitioning (GPT) is enabled
EIDE EIDE/ATAPI support is enabled.
FB The frame buffer device is enabled.
GCOV GCOV profiling is enabled.
HW Appropriate hardware is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
@ -228,14 +229,6 @@ and is between 256 and 4096 characters. It is defined in the file
to assume that this machine's pmtimer latches its value
and always returns good values.
acpi.power_nocheck= [HW,ACPI]
Format: 1/0 enable/disable the check of power state.
On some bogus BIOS the _PSC object/_STA object of
power resource can't return the correct device power
state. In such case it is unneccessary to check its
power state again in power transition.
1 : disable the power state check
acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
Format: { level | edge | high | low }
@ -796,6 +789,12 @@ and is between 256 and 4096 characters. It is defined in the file
Format: off | on
default: on
gcov_persist= [GCOV] When non-zero (default), profiling data for
kernel modules is saved and remains accessible via
debugfs, even when the module is unloaded/reloaded.
When zero, profiling data is discarded and associated
debugfs files are removed at module unload time.
gdth= [HW,SCSI]
See header of drivers/scsi/gdth.c.
@ -999,6 +998,7 @@ and is between 256 and 4096 characters. It is defined in the file
nomerge
forcesac
soft
pt [x86, IA64]
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
@ -1115,6 +1115,10 @@ and is between 256 and 4096 characters. It is defined in the file
libata.dma=4 Compact Flash DMA only
Combinations also work, so libata.dma=3 enables DMA
for disks and CDROMs, but not CFs.
libata.ignore_hpa= [LIBATA] Ignore HPA limit
libata.ignore_hpa=0 keep BIOS limits (default)
libata.ignore_hpa=1 ignore limits, using full disk
libata.noacpi [LIBATA] Disables use of ACPI in libata suspend/resume
when set.
@ -1362,6 +1366,27 @@ and is between 256 and 4096 characters. It is defined in the file
min_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory below this
physical address is ignored.
mini2440= [ARM,HW,KNL]
Format:[0..2][b][c][t]
Default: "0tb"
MINI2440 configuration specification:
0 - The attached screen is the 3.5" TFT
1 - The attached screen is the 7" TFT
2 - The VGA Shield is attached (1024x768)
Leaving out the screen size parameter will not load
the TFT driver, and the framebuffer will be left
unconfigured.
b - Enable backlight. The TFT backlight pin will be
linked to the kernel VESA blanking code and a GPIO
LED. This parameter is not necessary when using the
VGA shield.
c - Enable the s3c camera interface.
t - Reserved for enabling touchscreen support. The
touchscreen support is not enabled in the mainstream
kernel as of 2.6.30, a preliminary port can be found
in the "bleeding edge" mini2440 support kernel at
http://repo.or.cz/w/linux-2.6/mini2440.git
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
parameter allows control of the logging verbosity for
@ -1403,6 +1428,16 @@ and is between 256 and 4096 characters. It is defined in the file
mtdparts= [MTD]
See drivers/mtd/cmdlinepart.c.
onenand.bdry= [HW,MTD] Flex-OneNAND Boundary Configuration
Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]
boundary - index of last SLC block on Flex-OneNAND.
The remaining blocks are configured as MLC blocks.
lock - Configure if Flex-OneNAND boundary should be locked.
Once locked, the boundary cannot be changed.
1 indicates lock status, 0 indicates unlock status.
mtdset= [ARM]
ARM/S3C2412 JIVE boot control
@ -1689,8 +1724,8 @@ and is between 256 and 4096 characters. It is defined in the file
oprofile.cpu_type= Force an oprofile cpu type
This might be useful if you have an older oprofile
userland or if you want common events.
Format: { archperfmon }
archperfmon: [X86] Force use of architectural
Format: { arch_perfmon }
arch_perfmon: [X86] Force use of architectural
perfmon on Intel CPUs instead of the
CPU specific event set.
@ -1769,6 +1804,9 @@ and is between 256 and 4096 characters. It is defined in the file
root domains (aka PCI segments, in ACPI-speak).
nommconf [X86] Disable use of MMCONFIG for PCI
Configuration
check_enable_amd_mmconf [X86] check for and enable
properly configured MMIO access to PCI
config space on AMD family 10h CPU
nomsi [MSI] If the PCI_MSI kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of MSI interrupts system-wide.
@ -1858,6 +1896,12 @@ and is between 256 and 4096 characters. It is defined in the file
PAGE_SIZE is used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
ecrc= Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
the default.
off: Turn ECRC off
on: Turn ECRC on.
pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
Management.
@ -1875,6 +1919,12 @@ and is between 256 and 4096 characters. It is defined in the file
Format: { 0 | 1 }
See arch/parisc/kernel/pdc_chassis.c
percpu_alloc= [X86] Select which percpu first chunk allocator to use.
Allowed values are one of "lpage", "embed" and "4k".
See comments in arch/x86/kernel/setup_percpu.c for
details on each allocator. This parameter is primarily
for debugging and performance comparison.
pf. [PARIDE]
See Documentation/blockdev/paride.txt.
@ -2427,7 +2477,13 @@ and is between 256 and 4096 characters. It is defined in the file
tp720= [HW,PS2]
trace_buf_size=nn[KMG] [ftrace] will set tracing buffer size.
trace_buf_size=nn[KMG]
[FTRACE] will set tracing buffer size.
trace_event=[event-list]
[FTRACE] Set and start specified trace events in order
to facilitate early boot debugging.
See also Documentation/trace/events.txt
trix= [HW,OSS] MediaTrix AudioTrix Pro
Format:

View file

@ -16,13 +16,17 @@ Usage
-----
CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
thread scans the memory every 10 minutes (by default) and prints any new
unreferenced objects found. To trigger an intermediate scan and display
all the possible memory leaks:
thread scans the memory every 10 minutes (by default) and prints the
number of new unreferenced objects found. To display the details of all
the possible memory leaks:
# mount -t debugfs nodev /sys/kernel/debug/
# cat /sys/kernel/debug/kmemleak
To trigger an intermediate memory scan:
# echo scan > /sys/kernel/debug/kmemleak
Note that the orphan objects are listed in the order they were allocated
and one object at the beginning of the list may cause other subsequent
objects to be reported as orphan.
@ -31,16 +35,21 @@ Memory scanning parameters can be modified at run-time by writing to the
/sys/kernel/debug/kmemleak file. The following parameters are supported:
off - disable kmemleak (irreversible)
stack=on - enable the task stacks scanning
stack=on - enable the task stacks scanning (default)
stack=off - disable the tasks stacks scanning
scan=on - start the automatic memory scanning thread
scan=on - start the automatic memory scanning thread (default)
scan=off - stop the automatic memory scanning thread
scan=<secs> - set the automatic memory scanning period in seconds (0
to disable it)
scan=<secs> - set the automatic memory scanning period in seconds
(default 600, 0 to stop the automatic scanning)
scan - trigger a memory scan
Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
the kernel command line.
Memory may be allocated or freed before kmemleak is initialised and
these actions are stored in an early log buffer. The size of this buffer
is configured via the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option.
Basic Algorithm
---------------

View file

@ -36,8 +36,6 @@ detailed description):
- Bluetooth enable and disable
- video output switching, expansion control
- ThinkLight on and off
- limited docking and undocking
- UltraBay eject
- CMOS/UCMS control
- LED control
- ACPI sounds
@ -729,131 +727,6 @@ cannot be read or if it is unknown, thinkpad-acpi will report it as "off".
It is impossible to know if the status returned through sysfs is valid.
Docking / undocking -- /proc/acpi/ibm/dock
------------------------------------------
Docking and undocking (e.g. with the X4 UltraBase) requires some
actions to be taken by the operating system to safely make or break
the electrical connections with the dock.
The docking feature of this driver generates the following ACPI events:
ibm/dock GDCK 00000003 00000001 -- eject request
ibm/dock GDCK 00000003 00000002 -- undocked
ibm/dock GDCK 00000000 00000003 -- docked
NOTE: These events will only be generated if the laptop was docked
when originally booted. This is due to the current lack of support for
hot plugging of devices in the Linux ACPI framework. If the laptop was
booted while not in the dock, the following message is shown in the
logs:
Mar 17 01:42:34 aero kernel: thinkpad_acpi: dock device not present
In this case, no dock-related events are generated but the dock and
undock commands described below still work. They can be executed
manually or triggered by Fn key combinations (see the example acpid
configuration files included in the driver tarball package available
on the web site).
When the eject request button on the dock is pressed, the first event
above is generated. The handler for this event should issue the
following command:
echo undock > /proc/acpi/ibm/dock
After the LED on the dock goes off, it is safe to eject the laptop.
Note: if you pressed this key by mistake, go ahead and eject the
laptop, then dock it back in. Otherwise, the dock may not function as
expected.
When the laptop is docked, the third event above is generated. The
handler for this event should issue the following command to fully
enable the dock:
echo dock > /proc/acpi/ibm/dock
The contents of the /proc/acpi/ibm/dock file shows the current status
of the dock, as provided by the ACPI framework.
The docking support in this driver does not take care of enabling or
disabling any other devices you may have attached to the dock. For
example, a CD drive plugged into the UltraBase needs to be disabled or
enabled separately. See the provided example acpid configuration files
for how this can be accomplished.
There is no support yet for PCI devices that may be attached to a
docking station, e.g. in the ThinkPad Dock II. The driver currently
does not recognize, enable or disable such devices. This means that
the only docking stations currently supported are the X-series
UltraBase docks and "dumb" port replicators like the Mini Dock (the
latter don't need any ACPI support, actually).
UltraBay eject -- /proc/acpi/ibm/bay
------------------------------------
Inserting or ejecting an UltraBay device requires some actions to be
taken by the operating system to safely make or break the electrical
connections with the device.
This feature generates the following ACPI events:
ibm/bay MSTR 00000003 00000000 -- eject request
ibm/bay MSTR 00000001 00000000 -- eject lever inserted
NOTE: These events will only be generated if the UltraBay was present
when the laptop was originally booted (on the X series, the UltraBay
is in the dock, so it may not be present if the laptop was undocked).
This is due to the current lack of support for hot plugging of devices
in the Linux ACPI framework. If the laptop was booted without the
UltraBay, the following message is shown in the logs:
Mar 17 01:42:34 aero kernel: thinkpad_acpi: bay device not present
In this case, no bay-related events are generated but the eject
command described below still works. It can be executed manually or
triggered by a hot key combination.
Sliding the eject lever generates the first event shown above. The
handler for this event should take whatever actions are necessary to
shut down the device in the UltraBay (e.g. call idectl), then issue
the following command:
echo eject > /proc/acpi/ibm/bay
After the LED on the UltraBay goes off, it is safe to pull out the
device.
When the eject lever is inserted, the second event above is
generated. The handler for this event should take whatever actions are
necessary to enable the UltraBay device (e.g. call idectl).
The contents of the /proc/acpi/ibm/bay file shows the current status
of the UltraBay, as provided by the ACPI framework.
EXPERIMENTAL warm eject support on the 600e/x, A22p and A3x (To use
this feature, you need to supply the experimental=1 parameter when
loading the module):
These models do not have a button near the UltraBay device to request
a hot eject but rather require the laptop to be put to sleep
(suspend-to-ram) before the bay device is ejected or inserted).
The sequence of steps to eject the device is as follows:
echo eject > /proc/acpi/ibm/bay
put the ThinkPad to sleep
remove the drive
resume from sleep
cat /proc/acpi/ibm/bay should show that the drive was removed
On the A3x, both the UltraBay 2000 and UltraBay Plus devices are
supported. Use "eject2" instead of "eject" for the second bay.
Note: the UltraBay eject support on the 600e/x, A22p and A3x is
EXPERIMENTAL and may not work as expected. USE WITH CAUTION!
CMOS/UCMS control
-----------------
@ -920,7 +793,7 @@ The available commands are:
echo '<LED number> off' >/proc/acpi/ibm/led
echo '<LED number> blink' >/proc/acpi/ibm/led
The <LED number> range is 0 to 7. The set of LEDs that can be
The <LED number> range is 0 to 15. The set of LEDs that can be
controlled varies from model to model. Here is the common ThinkPad
mapping:
@ -932,6 +805,11 @@ mapping:
5 - UltraBase battery slot
6 - (unknown)
7 - standby
8 - dock status 1
9 - dock status 2
10, 11 - (unknown)
12 - thinkvantage
13, 14, 15 - (unknown)
All of the above can be turned on and off and can be made to blink.
@ -940,10 +818,12 @@ sysfs notes:
The ThinkPad LED sysfs interface is described in detail by the LED class
documentation, in Documentation/leds-class.txt.
The leds are named (in LED ID order, from 0 to 7):
The LEDs are named (in LED ID order, from 0 to 12):
"tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
"tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
"tpacpi::unknown_led", "tpacpi::standby".
"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
"tpacpi::thinkvantage".
Due to limitations in the sysfs LED class, if the status of the LED
indicators cannot be read due to an error, thinkpad-acpi will report it as
@ -958,6 +838,12 @@ ThinkPad indicator LED should blink in hardware accelerated mode, use the
"timer" trigger, and leave the delay_on and delay_off parameters set to
zero (to request hardware acceleration autodetection).
LEDs that are known not to exist in a given ThinkPad model are not
made available through the sysfs interface. If you have a dock and you
notice there are LEDs listed for your ThinkPad that do not exist (and
are not in the dock), or if you notice that there are missing LEDs,
a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
ACPI sounds -- /proc/acpi/ibm/beep
----------------------------------
@ -1156,17 +1042,19 @@ may not be distinct. Later Lenovo models that implement the ACPI
display backlight brightness control methods have 16 levels, ranging
from 0 to 15.
There are two interfaces to the firmware for direct brightness control,
EC and UCMS (or CMOS). To select which one should be used, use the
brightness_mode module parameter: brightness_mode=1 selects EC mode,
brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
mode with NVRAM backing (so that brightness changes are remembered
across shutdown/reboot).
For IBM ThinkPads, there are two interfaces to the firmware for direct
brightness control, EC and UCMS (or CMOS). To select which one should be
used, use the brightness_mode module parameter: brightness_mode=1 selects
EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
mode with NVRAM backing (so that brightness changes are remembered across
shutdown/reboot).
The driver tries to select which interface to use from a table of
defaults for each ThinkPad model. If it makes a wrong choice, please
report this as a bug, so that we can fix it.
Lenovo ThinkPads only support brightness_mode=2 (UCMS).
When display backlight brightness controls are available through the
standard ACPI interface, it is best to use it instead of this direct
ThinkPad-specific interface. The driver will disable its native
@ -1254,7 +1142,7 @@ Fan control and monitoring: fan speed, fan enable/disable
procfs: /proc/acpi/ibm/fan
sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1,
pwm1_enable
pwm1_enable, fan2_input
sysfs hwmon driver attributes: fan_watchdog
NOTE NOTE NOTE: fan control operations are disabled by default for
@ -1267,6 +1155,9 @@ from the hardware registers of the embedded controller. This is known
to work on later R, T, X and Z series ThinkPads but may show a bogus
value on other models.
Some Lenovo ThinkPads support a secondary fan. This fan cannot be
controlled separately, it shares the main fan control.
Fan levels:
Most ThinkPad fans work in "levels" at the firmware interface. Level 0
@ -1397,6 +1288,11 @@ hwmon device attribute fan1_input:
which can take up to two minutes. May return rubbish on older
ThinkPads.
hwmon device attribute fan2_input:
Fan tachometer reading, in RPM, for the secondary fan.
Available only on some ThinkPads. If the secondary fan is
not installed, will always read 0.
hwmon driver attribute fan_watchdog:
Fan safety watchdog timer interval, in seconds. Minimum is
1 second, maximum is 120 seconds. 0 disables the watchdog.
@ -1555,3 +1451,7 @@ Sysfs interface changelog:
0x020300: hotkey enable/disable support removed, attributes
hotkey_bios_enabled and hotkey_enable deprecated and
marked for removal.
0x020400: Marker for 16 LEDs support. Also, LEDs that are known
to not exist in a given model are not registered with
the LED sysfs class anymore.

View file

@ -0,0 +1,50 @@
Kernel driver lp3944
====================
* National Semiconductor LP3944 Fun-light Chip
Prefix: 'lp3944'
Addresses scanned: None (see the Notes section below)
Datasheet: Publicly available at the National Semiconductor website
http://www.national.com/pf/LP/LP3944.html
Authors:
Antonio Ospite <ospite@studenti.unina.it>
Description
-----------
The LP3944 is a helper chip that can drive up to 8 leds, with two programmable
DIM modes; it could even be used as a gpio expander but this driver assumes it
is used as a led controller.
The DIM modes are used to set _blink_ patterns for leds, the pattern is
specified supplying two parameters:
- period: from 0s to 1.6s
- duty cycle: percentage of the period the led is on, from 0 to 100
Setting a led in DIM0 or DIM1 mode makes it blink according to the pattern.
See the datasheet for details.
LP3944 can be found on Motorola A910 smartphone, where it drives the rgb
leds, the camera flash light and the lcds power.
Notes
-----
The chip is used mainly in embedded contexts, so this driver expects it is
registered using the i2c_board_info mechanism.
To register the chip at address 0x60 on adapter 0, set the platform data
according to include/linux/leds-lp3944.h, set the i2c board info:
static struct i2c_board_info __initdata a910_i2c_board_info[] = {
{
I2C_BOARD_INFO("lp3944", 0x60),
.platform_data = &a910_lp3944_leds,
},
};
and register it in the platform init function
i2c_register_board_info(0, a910_i2c_board_info,
ARRAY_SIZE(a910_i2c_board_info));

File diff suppressed because it is too large Load diff

View file

@ -30,9 +30,9 @@ State
The validator tracks lock-class usage history into 4n + 1 separate state bits:
- 'ever held in STATE context'
- 'ever head as readlock in STATE context'
- 'ever head with STATE enabled'
- 'ever head as readlock with STATE enabled'
- 'ever held as readlock in STATE context'
- 'ever held with STATE enabled'
- 'ever held as readlock with STATE enabled'
Where STATE can be either one of (kernel/lockdep_states.h)
- hardirq

View file

@ -1,7 +1,7 @@
This is the 6pack-mini-HOWTO, written by
Andreas Könsgen DG3KQ
Internet: ajk@iehk.rwth-aachen.de
Internet: ajk@comnets.uni-bremen.de
AMPR-net: dg3kq@db0pra.ampr.org
AX.25: dg3kq@db0ach.#nrw.deu.eu

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,148 @@
4xx/Axon EMAC ethernet nodes
The EMAC ethernet controller in IBM and AMCC 4xx chips, and also
the Axon bridge. To operate this needs to interact with a ths
special McMAL DMA controller, and sometimes an RGMII or ZMII
interface. In addition to the nodes and properties described
below, the node for the OPB bus on which the EMAC sits must have a
correct clock-frequency property.
i) The EMAC node itself
Required properties:
- device_type : "network"
- compatible : compatible list, contains 2 entries, first is
"ibm,emac-CHIP" where CHIP is the host ASIC (440gx,
405gp, Axon) and second is either "ibm,emac" or
"ibm,emac4". For Axon, thus, we have: "ibm,emac-axon",
"ibm,emac4"
- interrupts : <interrupt mapping for EMAC IRQ and WOL IRQ>
- interrupt-parent : optional, if needed for interrupt mapping
- reg : <registers mapping>
- local-mac-address : 6 bytes, MAC address
- mal-device : phandle of the associated McMAL node
- mal-tx-channel : 1 cell, index of the tx channel on McMAL associated
with this EMAC
- mal-rx-channel : 1 cell, index of the rx channel on McMAL associated
with this EMAC
- cell-index : 1 cell, hardware index of the EMAC cell on a given
ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on
each Axon chip)
- max-frame-size : 1 cell, maximum frame size supported in bytes
- rx-fifo-size : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec
operations.
For Axon, 2048
- tx-fifo-size : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec
operations.
For Axon, 2048.
- fifo-entry-size : 1 cell, size of a fifo entry (used to calculate
thresholds).
For Axon, 0x00000010
- mal-burst-size : 1 cell, MAL burst size (used to calculate thresholds)
in bytes.
For Axon, 0x00000100 (I think ...)
- phy-mode : string, mode of operations of the PHY interface.
Supported values are: "mii", "rmii", "smii", "rgmii",
"tbi", "gmii", rtbi", "sgmii".
For Axon on CAB, it is "rgmii"
- mdio-device : 1 cell, required iff using shared MDIO registers
(440EP). phandle of the EMAC to use to drive the
MDIO lines for the PHY used by this EMAC.
- zmii-device : 1 cell, required iff connected to a ZMII. phandle of
the ZMII device node
- zmii-channel : 1 cell, required iff connected to a ZMII. Which ZMII
channel or 0xffffffff if ZMII is only used for MDIO.
- rgmii-device : 1 cell, required iff connected to an RGMII. phandle
of the RGMII device node.
For Axon: phandle of plb5/plb4/opb/rgmii
- rgmii-channel : 1 cell, required iff connected to an RGMII. Which
RGMII channel is used by this EMAC.
Fox Axon: present, whatever value is appropriate for each
EMAC, that is the content of the current (bogus) "phy-port"
property.
Optional properties:
- phy-address : 1 cell, optional, MDIO address of the PHY. If absent,
a search is performed.
- phy-map : 1 cell, optional, bitmap of addresses to probe the PHY
for, used if phy-address is absent. bit 0x00000001 is
MDIO address 0.
For Axon it can be absent, though my current driver
doesn't handle phy-address yet so for now, keep
0x00ffffff in it.
- rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec
operations (if absent the value is the same as
rx-fifo-size). For Axon, either absent or 2048.
- tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec
operations (if absent the value is the same as
tx-fifo-size). For Axon, either absent or 2048.
- tah-device : 1 cell, optional. If connected to a TAH engine for
offload, phandle of the TAH device node.
- tah-channel : 1 cell, optional. If appropriate, channel used on the
TAH engine.
Example:
EMAC0: ethernet@40000800 {
device_type = "network";
compatible = "ibm,emac-440gp", "ibm,emac";
interrupt-parent = <&UIC1>;
interrupts = <1c 4 1d 4>;
reg = <40000800 70>;
local-mac-address = [00 04 AC E3 1B 1E];
mal-device = <&MAL0>;
mal-tx-channel = <0 1>;
mal-rx-channel = <0>;
cell-index = <0>;
max-frame-size = <5dc>;
rx-fifo-size = <1000>;
tx-fifo-size = <800>;
phy-mode = "rmii";
phy-map = <00000001>;
zmii-device = <&ZMII0>;
zmii-channel = <0>;
};
ii) McMAL node
Required properties:
- device_type : "dma-controller"
- compatible : compatible list, containing 2 entries, first is
"ibm,mcmal-CHIP" where CHIP is the host ASIC (like
emac) and the second is either "ibm,mcmal" or
"ibm,mcmal2".
For Axon, "ibm,mcmal-axon","ibm,mcmal2"
- interrupts : <interrupt mapping for the MAL interrupts sources:
5 sources: tx_eob, rx_eob, serr, txde, rxde>.
For Axon: This is _different_ from the current
firmware. We use the "delayed" interrupts for txeob
and rxeob. Thus we end up with mapping those 5 MPIC
interrupts, all level positive sensitive: 10, 11, 32,
33, 34 (in decimal)
- dcr-reg : < DCR registers range >
- dcr-parent : if needed for dcr-reg
- num-tx-chans : 1 cell, number of Tx channels
- num-rx-chans : 1 cell, number of Rx channels
iii) ZMII node
Required properties:
- compatible : compatible list, containing 2 entries, first is
"ibm,zmii-CHIP" where CHIP is the host ASIC (like
EMAC) and the second is "ibm,zmii".
For Axon, there is no ZMII node.
- reg : <registers mapping>
iv) RGMII node
Required properties:
- compatible : compatible list, containing 2 entries, first is
"ibm,rgmii-CHIP" where CHIP is the host ASIC (like
EMAC) and the second is "ibm,rgmii".
For Axon, "ibm,rgmii-axon","ibm,rgmii"
- reg : <registers mapping>
- revision : as provided by the RGMII new version register if
available.
For Axon: 0x0000012a

View file

@ -10,6 +10,8 @@ Required properties:
- interrupts : should contain eSDHC interrupt.
- interrupt-parent : interrupt source phandle.
- clock-frequency : specifies eSDHC base clock frequency.
- sdhci,1-bit-only : (optional) specifies that a controller can
only handle 1-bit data transfers.
Example:

View file

@ -0,0 +1,50 @@
Specifying GPIO information for devices
============================================
1) gpios property
-----------------
Nodes that makes use of GPIOs should define them using `gpios' property,
format of which is: <&gpio-controller1-phandle gpio1-specifier
&gpio-controller2-phandle gpio2-specifier
0 /* holes are permitted, means no GPIO 3 */
&gpio-controller4-phandle gpio4-specifier
...>;
Note that gpio-specifier length is controller dependent.
gpio-specifier may encode: bank, pin position inside the bank,
whether pin is open-drain and whether pin is logically inverted.
Example of the node using GPIOs:
node {
gpios = <&qe_pio_e 18 0>;
};
In this example gpio-specifier is "18 0" and encodes GPIO pin number,
and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller.
2) gpio-controller nodes
------------------------
Every GPIO controller node must have #gpio-cells property defined,
this information will be used to translate gpio-specifiers.
Example of two SOC GPIO banks defined as gpio-controller nodes:
qe_pio_a: gpio-controller@1400 {
#gpio-cells = <2>;
compatible = "fsl,qe-pario-bank-a", "fsl,qe-pario-bank";
reg = <0x1400 0x18>;
gpio-controller;
};
qe_pio_e: gpio-controller@1460 {
#gpio-cells = <2>;
compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank";
reg = <0x1460 0x18>;
gpio-controller;
};

View file

@ -16,10 +16,17 @@ LED sub-node properties:
string defining the trigger assigned to the LED. Current triggers are:
"backlight" - LED will act as a back-light, controlled by the framebuffer
system
"default-on" - LED will turn on
"default-on" - LED will turn on, but see "default-state" below
"heartbeat" - LED "double" flashes at a load average based rate
"ide-disk" - LED indicates disk activity
"timer" - LED flashes at a fixed, configurable rate
- default-state: (optional) The initial state of the LED. Valid
values are "on", "off", and "keep". If the LED is already on or off
and the default-state property is set the to same value, then no
glitch should be produced where the LED momentarily turns off (or
on). The "keep" setting will keep the LED at whatever its current
state is, without producing a glitch. The default is off if this
property is not present.
Examples:
@ -30,14 +37,22 @@ leds {
gpios = <&mcu_pio 0 1>; /* Active low */
linux,default-trigger = "ide-disk";
};
fault {
gpios = <&mcu_pio 1 0>;
/* Keep LED on if BIOS detected hardware fault */
default-state = "keep";
};
};
run-control {
compatible = "gpio-leds";
red {
gpios = <&mpc8572 6 0>;
default-state = "off";
};
green {
gpios = <&mpc8572 7 0>;
default-state = "on";
};
}

View file

@ -0,0 +1,19 @@
MDIO on GPIOs
Currently defined compatibles:
- virtual,gpio-mdio
MDC and MDIO lines connected to GPIO controllers are listed in the
gpios property as described in section VIII.1 in the following order:
MDC, MDIO.
Example:
mdio {
compatible = "virtual,mdio-gpio";
#address-cells = <1>;
#size-cells = <0>;
gpios = <&qe_pio_a 11
&qe_pio_c 6>;
};

View file

@ -0,0 +1,521 @@
Marvell Discovery mv64[345]6x System Controller chips
===========================================================
The Marvell mv64[345]60 series of system controller chips contain
many of the peripherals needed to implement a complete computer
system. In this section, we define device tree nodes to describe
the system controller chip itself and each of the peripherals
which it contains. Compatible string values for each node are
prefixed with the string "marvell,", for Marvell Technology Group Ltd.
1) The /system-controller node
This node is used to represent the system-controller and must be
present when the system uses a system controller chip. The top-level
system-controller node contains information that is global to all
devices within the system controller chip. The node name begins
with "system-controller" followed by the unit address, which is
the base address of the memory-mapped register set for the system
controller chip.
Required properties:
- ranges : Describes the translation of system controller addresses
for memory mapped registers.
- clock-frequency: Contains the main clock frequency for the system
controller chip.
- reg : This property defines the address and size of the
memory-mapped registers contained within the system controller
chip. The address specified in the "reg" property should match
the unit address of the system-controller node.
- #address-cells : Address representation for system controller
devices. This field represents the number of cells needed to
represent the address of the memory-mapped registers of devices
within the system controller chip.
- #size-cells : Size representation for for the memory-mapped
registers within the system controller chip.
- #interrupt-cells : Defines the width of cells used to represent
interrupts.
Optional properties:
- model : The specific model of the system controller chip. Such
as, "mv64360", "mv64460", or "mv64560".
- compatible : A string identifying the compatibility identifiers
of the system controller chip.
The system-controller node contains child nodes for each system
controller device that the platform uses. Nodes should not be created
for devices which exist on the system controller chip but are not used
Example Marvell Discovery mv64360 system-controller node:
system-controller@f1000000 { /* Marvell Discovery mv64360 */
#address-cells = <1>;
#size-cells = <1>;
model = "mv64360"; /* Default */
compatible = "marvell,mv64360";
clock-frequency = <133333333>;
reg = <0xf1000000 0x10000>;
virtual-reg = <0xf1000000>;
ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */
0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */
0xa0000000 0xa0000000 0x4000000 /* User FLASH */
0x00000000 0xf1000000 0x0010000 /* Bridge's regs */
0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */
[ child node definitions... ]
}
2) Child nodes of /system-controller
a) Marvell Discovery MDIO bus
The MDIO is a bus to which the PHY devices are connected. For each
device that exists on this bus, a child node should be created. See
the definition of the PHY node below for an example of how to define
a PHY.
Required properties:
- #address-cells : Should be <1>
- #size-cells : Should be <0>
- device_type : Should be "mdio"
- compatible : Should be "marvell,mv64360-mdio"
Example:
mdio {
#address-cells = <1>;
#size-cells = <0>;
device_type = "mdio";
compatible = "marvell,mv64360-mdio";
ethernet-phy@0 {
......
};
};
b) Marvell Discovery ethernet controller
The Discover ethernet controller is described with two levels
of nodes. The first level describes an ethernet silicon block
and the second level describes up to 3 ethernet nodes within
that block. The reason for the multiple levels is that the
registers for the node are interleaved within a single set
of registers. The "ethernet-block" level describes the
shared register set, and the "ethernet" nodes describe ethernet
port-specific properties.
Ethernet block node
Required properties:
- #address-cells : <1>
- #size-cells : <0>
- compatible : "marvell,mv64360-eth-block"
- reg : Offset and length of the register set for this block
Example Discovery Ethernet block node:
ethernet-block@2000 {
#address-cells = <1>;
#size-cells = <0>;
compatible = "marvell,mv64360-eth-block";
reg = <0x2000 0x2000>;
ethernet@0 {
.......
};
};
Ethernet port node
Required properties:
- device_type : Should be "network".
- compatible : Should be "marvell,mv64360-eth".
- reg : Should be <0>, <1>, or <2>, according to which registers
within the silicon block the device uses.
- interrupts : <a> where a is the interrupt number for the port.
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
- phy : the phandle for the PHY connected to this ethernet
controller.
- local-mac-address : 6 bytes, MAC address
Example Discovery Ethernet port node:
ethernet@0 {
device_type = "network";
compatible = "marvell,mv64360-eth";
reg = <0>;
interrupts = <32>;
interrupt-parent = <&PIC>;
phy = <&PHY0>;
local-mac-address = [ 00 00 00 00 00 00 ];
};
c) Marvell Discovery PHY nodes
Required properties:
- device_type : Should be "ethernet-phy"
- interrupts : <a> where a is the interrupt number for this phy.
- interrupt-parent : the phandle for the interrupt controller that
services interrupts for this device.
- reg : The ID number for the phy, usually a small integer
Example Discovery PHY node:
ethernet-phy@1 {
device_type = "ethernet-phy";
compatible = "broadcom,bcm5421";
interrupts = <76>; /* GPP 12 */
interrupt-parent = <&PIC>;
reg = <1>;
};
d) Marvell Discovery SDMA nodes
Represent DMA hardware associated with the MPSC (multiprotocol
serial controllers).
Required properties:
- compatible : "marvell,mv64360-sdma"
- reg : Offset and length of the register set for this device
- interrupts : <a> where a is the interrupt number for the DMA
device.
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
Example Discovery SDMA node:
sdma@4000 {
compatible = "marvell,mv64360-sdma";
reg = <0x4000 0xc18>;
virtual-reg = <0xf1004000>;
interrupts = <36>;
interrupt-parent = <&PIC>;
};
e) Marvell Discovery BRG nodes
Represent baud rate generator hardware associated with the MPSC
(multiprotocol serial controllers).
Required properties:
- compatible : "marvell,mv64360-brg"
- reg : Offset and length of the register set for this device
- clock-src : A value from 0 to 15 which selects the clock
source for the baud rate generator. This value corresponds
to the CLKS value in the BRGx configuration register. See
the mv64x60 User's Manual.
- clock-frequence : The frequency (in Hz) of the baud rate
generator's input clock.
- current-speed : The current speed setting (presumably by
firmware) of the baud rate generator.
Example Discovery BRG node:
brg@b200 {
compatible = "marvell,mv64360-brg";
reg = <0xb200 0x8>;
clock-src = <8>;
clock-frequency = <133333333>;
current-speed = <9600>;
};
f) Marvell Discovery CUNIT nodes
Represent the Serial Communications Unit device hardware.
Required properties:
- reg : Offset and length of the register set for this device
Example Discovery CUNIT node:
cunit@f200 {
reg = <0xf200 0x200>;
};
g) Marvell Discovery MPSCROUTING nodes
Represent the Discovery's MPSC routing hardware
Required properties:
- reg : Offset and length of the register set for this device
Example Discovery CUNIT node:
mpscrouting@b500 {
reg = <0xb400 0xc>;
};
h) Marvell Discovery MPSCINTR nodes
Represent the Discovery's MPSC DMA interrupt hardware registers
(SDMA cause and mask registers).
Required properties:
- reg : Offset and length of the register set for this device
Example Discovery MPSCINTR node:
mpsintr@b800 {
reg = <0xb800 0x100>;
};
i) Marvell Discovery MPSC nodes
Represent the Discovery's MPSC (Multiprotocol Serial Controller)
serial port.
Required properties:
- device_type : "serial"
- compatible : "marvell,mv64360-mpsc"
- reg : Offset and length of the register set for this device
- sdma : the phandle for the SDMA node used by this port
- brg : the phandle for the BRG node used by this port
- cunit : the phandle for the CUNIT node used by this port
- mpscrouting : the phandle for the MPSCROUTING node used by this port
- mpscintr : the phandle for the MPSCINTR node used by this port
- cell-index : the hardware index of this cell in the MPSC core
- max_idle : value needed for MPSC CHR3 (Maximum Frame Length)
register
- interrupts : <a> where a is the interrupt number for the MPSC.
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
Example Discovery MPSCINTR node:
mpsc@8000 {
device_type = "serial";
compatible = "marvell,mv64360-mpsc";
reg = <0x8000 0x38>;
virtual-reg = <0xf1008000>;
sdma = <&SDMA0>;
brg = <&BRG0>;
cunit = <&CUNIT>;
mpscrouting = <&MPSCROUTING>;
mpscintr = <&MPSCINTR>;
cell-index = <0>;
max_idle = <40>;
interrupts = <40>;
interrupt-parent = <&PIC>;
};
j) Marvell Discovery Watch Dog Timer nodes
Represent the Discovery's watchdog timer hardware
Required properties:
- compatible : "marvell,mv64360-wdt"
- reg : Offset and length of the register set for this device
Example Discovery Watch Dog Timer node:
wdt@b410 {
compatible = "marvell,mv64360-wdt";
reg = <0xb410 0x8>;
};
k) Marvell Discovery I2C nodes
Represent the Discovery's I2C hardware
Required properties:
- device_type : "i2c"
- compatible : "marvell,mv64360-i2c"
- reg : Offset and length of the register set for this device
- interrupts : <a> where a is the interrupt number for the I2C.
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
Example Discovery I2C node:
compatible = "marvell,mv64360-i2c";
reg = <0xc000 0x20>;
virtual-reg = <0xf100c000>;
interrupts = <37>;
interrupt-parent = <&PIC>;
};
l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes
Represent the Discovery's PIC hardware
Required properties:
- #interrupt-cells : <1>
- #address-cells : <0>
- compatible : "marvell,mv64360-pic"
- reg : Offset and length of the register set for this device
- interrupt-controller
Example Discovery PIC node:
pic {
#interrupt-cells = <1>;
#address-cells = <0>;
compatible = "marvell,mv64360-pic";
reg = <0x0 0x88>;
interrupt-controller;
};
m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes
Represent the Discovery's MPP hardware
Required properties:
- compatible : "marvell,mv64360-mpp"
- reg : Offset and length of the register set for this device
Example Discovery MPP node:
mpp@f000 {
compatible = "marvell,mv64360-mpp";
reg = <0xf000 0x10>;
};
n) Marvell Discovery GPP (General Purpose Pins) nodes
Represent the Discovery's GPP hardware
Required properties:
- compatible : "marvell,mv64360-gpp"
- reg : Offset and length of the register set for this device
Example Discovery GPP node:
gpp@f000 {
compatible = "marvell,mv64360-gpp";
reg = <0xf100 0x20>;
};
o) Marvell Discovery PCI host bridge node
Represents the Discovery's PCI host bridge device. The properties
for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE
1275-1994. A typical value for the compatible property is
"marvell,mv64360-pci".
Example Discovery PCI host bridge node
pci@80000000 {
#address-cells = <3>;
#size-cells = <2>;
#interrupt-cells = <1>;
device_type = "pci";
compatible = "marvell,mv64360-pci";
reg = <0xcf8 0x8>;
ranges = <0x01000000 0x0 0x0
0x88000000 0x0 0x01000000
0x02000000 0x0 0x80000000
0x80000000 0x0 0x08000000>;
bus-range = <0 255>;
clock-frequency = <66000000>;
interrupt-parent = <&PIC>;
interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
interrupt-map = <
/* IDSEL 0x0a */
0x5000 0 0 1 &PIC 80
0x5000 0 0 2 &PIC 81
0x5000 0 0 3 &PIC 91
0x5000 0 0 4 &PIC 93
/* IDSEL 0x0b */
0x5800 0 0 1 &PIC 91
0x5800 0 0 2 &PIC 93
0x5800 0 0 3 &PIC 80
0x5800 0 0 4 &PIC 81
/* IDSEL 0x0c */
0x6000 0 0 1 &PIC 91
0x6000 0 0 2 &PIC 93
0x6000 0 0 3 &PIC 80
0x6000 0 0 4 &PIC 81
/* IDSEL 0x0d */
0x6800 0 0 1 &PIC 93
0x6800 0 0 2 &PIC 80
0x6800 0 0 3 &PIC 81
0x6800 0 0 4 &PIC 91
>;
};
p) Marvell Discovery CPU Error nodes
Represent the Discovery's CPU error handler device.
Required properties:
- compatible : "marvell,mv64360-cpu-error"
- reg : Offset and length of the register set for this device
- interrupts : the interrupt number for this device
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
Example Discovery CPU Error node:
cpu-error@0070 {
compatible = "marvell,mv64360-cpu-error";
reg = <0x70 0x10 0x128 0x28>;
interrupts = <3>;
interrupt-parent = <&PIC>;
};
q) Marvell Discovery SRAM Controller nodes
Represent the Discovery's SRAM controller device.
Required properties:
- compatible : "marvell,mv64360-sram-ctrl"
- reg : Offset and length of the register set for this device
- interrupts : the interrupt number for this device
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
Example Discovery SRAM Controller node:
sram-ctrl@0380 {
compatible = "marvell,mv64360-sram-ctrl";
reg = <0x380 0x80>;
interrupts = <13>;
interrupt-parent = <&PIC>;
};
r) Marvell Discovery PCI Error Handler nodes
Represent the Discovery's PCI error handler device.
Required properties:
- compatible : "marvell,mv64360-pci-error"
- reg : Offset and length of the register set for this device
- interrupts : the interrupt number for this device
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
Example Discovery PCI Error Handler node:
pci-error@1d40 {
compatible = "marvell,mv64360-pci-error";
reg = <0x1d40 0x40 0xc28 0x4>;
interrupts = <12>;
interrupt-parent = <&PIC>;
};
s) Marvell Discovery Memory Controller nodes
Represent the Discovery's memory controller device.
Required properties:
- compatible : "marvell,mv64360-mem-ctrl"
- reg : Offset and length of the register set for this device
- interrupts : the interrupt number for this device
- interrupt-parent : the phandle for the interrupt controller
that services interrupts for this device.
Example Discovery Memory Controller node:
mem-ctrl@1400 {
compatible = "marvell,mv64360-mem-ctrl";
reg = <0x1400 0x60>;
interrupts = <17>;
interrupt-parent = <&PIC>;
};

View file

@ -0,0 +1,25 @@
PHY nodes
Required properties:
- device_type : Should be "ethernet-phy"
- interrupts : <a b> where a is the interrupt number and b is a
field that represents an encoding of the sense and level
information for the interrupt. This should be encoded based on
the information in section 2) depending on the type of interrupt
controller you have.
- interrupt-parent : the phandle for the interrupt controller that
services interrupts for this device.
- reg : The ID number for the phy, usually a small integer
- linux,phandle : phandle for this node; likely referenced by an
ethernet controller node.
Example:
ethernet-phy@0 {
linux,phandle = <2452000>
interrupt-parent = <40000>;
interrupts = <35 1>;
reg = <0>;
device_type = "ethernet-phy";
};

View file

@ -0,0 +1,57 @@
SPI (Serial Peripheral Interface) busses
SPI busses can be described with a node for the SPI master device
and a set of child nodes for each SPI slave on the bus. For this
discussion, it is assumed that the system's SPI controller is in
SPI master mode. This binding does not describe SPI controllers
in slave mode.
The SPI master node requires the following properties:
- #address-cells - number of cells required to define a chip select
address on the SPI bus.
- #size-cells - should be zero.
- compatible - name of SPI bus controller following generic names
recommended practice.
No other properties are required in the SPI bus node. It is assumed
that a driver for an SPI bus device will understand that it is an SPI bus.
However, the binding does not attempt to define the specific method for
assigning chip select numbers. Since SPI chip select configuration is
flexible and non-standardized, it is left out of this binding with the
assumption that board specific platform code will be used to manage
chip selects. Individual drivers can define additional properties to
support describing the chip select layout.
SPI slave nodes must be children of the SPI master node and can
contain the following properties.
- reg - (required) chip select address of device.
- compatible - (required) name of SPI device following generic names
recommended practice
- spi-max-frequency - (required) Maximum SPI clocking speed of device in Hz
- spi-cpol - (optional) Empty property indicating device requires
inverse clock polarity (CPOL) mode
- spi-cpha - (optional) Empty property indicating device requires
shifted clock phase (CPHA) mode
- spi-cs-high - (optional) Empty property indicating device requires
chip select active high
SPI example for an MPC5200 SPI bus:
spi@f00 {
#address-cells = <1>;
#size-cells = <0>;
compatible = "fsl,mpc5200b-spi","fsl,mpc5200-spi";
reg = <0xf00 0x20>;
interrupts = <2 13 0 2 14 0>;
interrupt-parent = <&mpc5200_pic>;
ethernet-switch@0 {
compatible = "micrel,ks8995m";
spi-max-frequency = <1000000>;
reg = <0>;
};
codec@1 {
compatible = "ti,tlv320aic26";
spi-max-frequency = <100000>;
reg = <1>;
};
};

View file

@ -0,0 +1,25 @@
USB EHCI controllers
Required properties:
- compatible : should be "usb-ehci".
- reg : should contain at least address and length of the standard EHCI
register set for the device. Optional platform-dependent registers
(debug-port or other) can be also specified here, but only after
definition of standard EHCI registers.
- interrupts : one EHCI interrupt should be described here.
If device registers are implemented in big endian mode, the device
node should have "big-endian-regs" property.
If controller implementation operates with big endian descriptors,
"big-endian-desc" property should be specified.
If both big endian registers and descriptors are used by the controller
implementation, "big-endian" property can be specified instead of having
both "big-endian-regs" and "big-endian-desc".
Example (Sequoia 440EPx):
ehci@e0000300 {
compatible = "ibm,usb-ehci-440epx", "usb-ehci";
interrupt-parent = <&UIC0>;
interrupts = <1a 4>;
reg = <0 e0000300 90 0 e0000390 70>;
big-endian;
};

View file

@ -0,0 +1,295 @@
d) Xilinx IP cores
The Xilinx EDK toolchain ships with a set of IP cores (devices) for use
in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range
of standard device types (network, serial, etc.) and miscellaneous
devices (gpio, LCD, spi, etc). Also, since these devices are
implemented within the fpga fabric every instance of the device can be
synthesised with different options that change the behaviour.
Each IP-core has a set of parameters which the FPGA designer can use to
control how the core is synthesized. Historically, the EDK tool would
extract the device parameters relevant to device drivers and copy them
into an 'xparameters.h' in the form of #define symbols. This tells the
device drivers how the IP cores are configured, but it requres the kernel
to be recompiled every time the FPGA bitstream is resynthesized.
The new approach is to export the parameters into the device tree and
generate a new device tree each time the FPGA bitstream changes. The
parameters which used to be exported as #defines will now become
properties of the device node. In general, device nodes for IP-cores
will take the following form:
(name): (generic-name)@(base-address) {
compatible = "xlnx,(ip-core-name)-(HW_VER)"
[, (list of compatible devices), ...];
reg = <(baseaddr) (size)>;
interrupt-parent = <&interrupt-controller-phandle>;
interrupts = < ... >;
xlnx,(parameter1) = "(string-value)";
xlnx,(parameter2) = <(int-value)>;
};
(generic-name): an open firmware-style name that describes the
generic class of device. Preferably, this is one word, such
as 'serial' or 'ethernet'.
(ip-core-name): the name of the ip block (given after the BEGIN
directive in system.mhs). Should be in lowercase
and all underscores '_' converted to dashes '-'.
(name): is derived from the "PARAMETER INSTANCE" value.
(parameter#): C_* parameters from system.mhs. The C_ prefix is
dropped from the parameter name, the name is converted
to lowercase and all underscore '_' characters are
converted to dashes '-'.
(baseaddr): the baseaddr parameter value (often named C_BASEADDR).
(HW_VER): from the HW_VER parameter.
(size): the address range size (often C_HIGHADDR - C_BASEADDR + 1).
Typically, the compatible list will include the exact IP core version
followed by an older IP core version which implements the same
interface or any other device with the same interface.
'reg', 'interrupt-parent' and 'interrupts' are all optional properties.
For example, the following block from system.mhs:
BEGIN opb_uartlite
PARAMETER INSTANCE = opb_uartlite_0
PARAMETER HW_VER = 1.00.b
PARAMETER C_BAUDRATE = 115200
PARAMETER C_DATA_BITS = 8
PARAMETER C_ODD_PARITY = 0
PARAMETER C_USE_PARITY = 0
PARAMETER C_CLK_FREQ = 50000000
PARAMETER C_BASEADDR = 0xEC100000
PARAMETER C_HIGHADDR = 0xEC10FFFF
BUS_INTERFACE SOPB = opb_7
PORT OPB_Clk = CLK_50MHz
PORT Interrupt = opb_uartlite_0_Interrupt
PORT RX = opb_uartlite_0_RX
PORT TX = opb_uartlite_0_TX
PORT OPB_Rst = sys_bus_reset_0
END
becomes the following device tree node:
opb_uartlite_0: serial@ec100000 {
device_type = "serial";
compatible = "xlnx,opb-uartlite-1.00.b";
reg = <ec100000 10000>;
interrupt-parent = <&opb_intc_0>;
interrupts = <1 0>; // got this from the opb_intc parameters
current-speed = <d#115200>; // standard serial device prop
clock-frequency = <d#50000000>; // standard serial device prop
xlnx,data-bits = <8>;
xlnx,odd-parity = <0>;
xlnx,use-parity = <0>;
};
Some IP cores actually implement 2 or more logical devices. In
this case, the device should still describe the whole IP core with
a single node and add a child node for each logical device. The
ranges property can be used to translate from parent IP-core to the
registers of each device. In addition, the parent node should be
compatible with the bus type 'xlnx,compound', and should contain
#address-cells and #size-cells, as with any other bus. (Note: this
makes the assumption that both logical devices have the same bus
binding. If this is not true, then separate nodes should be used
for each logical device). The 'cell-index' property can be used to
enumerate logical devices within an IP core. For example, the
following is the system.mhs entry for the dual ps2 controller found
on the ml403 reference design.
BEGIN opb_ps2_dual_ref
PARAMETER INSTANCE = opb_ps2_dual_ref_0
PARAMETER HW_VER = 1.00.a
PARAMETER C_BASEADDR = 0xA9000000
PARAMETER C_HIGHADDR = 0xA9001FFF
BUS_INTERFACE SOPB = opb_v20_0
PORT Sys_Intr1 = ps2_1_intr
PORT Sys_Intr2 = ps2_2_intr
PORT Clkin1 = ps2_clk_rx_1
PORT Clkin2 = ps2_clk_rx_2
PORT Clkpd1 = ps2_clk_tx_1
PORT Clkpd2 = ps2_clk_tx_2
PORT Rx1 = ps2_d_rx_1
PORT Rx2 = ps2_d_rx_2
PORT Txpd1 = ps2_d_tx_1
PORT Txpd2 = ps2_d_tx_2
END
It would result in the following device tree nodes:
opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "xlnx,compound";
ranges = <0 a9000000 2000>;
// If this device had extra parameters, then they would
// go here.
ps2@0 {
compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
reg = <0 40>;
interrupt-parent = <&opb_intc_0>;
interrupts = <3 0>;
cell-index = <0>;
};
ps2@1000 {
compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
reg = <1000 40>;
interrupt-parent = <&opb_intc_0>;
interrupts = <3 0>;
cell-index = <0>;
};
};
Also, the system.mhs file defines bus attachments from the processor
to the devices. The device tree structure should reflect the bus
attachments. Again an example; this system.mhs fragment:
BEGIN ppc405_virtex4
PARAMETER INSTANCE = ppc405_0
PARAMETER HW_VER = 1.01.a
BUS_INTERFACE DPLB = plb_v34_0
BUS_INTERFACE IPLB = plb_v34_0
END
BEGIN opb_intc
PARAMETER INSTANCE = opb_intc_0
PARAMETER HW_VER = 1.00.c
PARAMETER C_BASEADDR = 0xD1000FC0
PARAMETER C_HIGHADDR = 0xD1000FDF
BUS_INTERFACE SOPB = opb_v20_0
END
BEGIN opb_uart16550
PARAMETER INSTANCE = opb_uart16550_0
PARAMETER HW_VER = 1.00.d
PARAMETER C_BASEADDR = 0xa0000000
PARAMETER C_HIGHADDR = 0xa0001FFF
BUS_INTERFACE SOPB = opb_v20_0
END
BEGIN plb_v34
PARAMETER INSTANCE = plb_v34_0
PARAMETER HW_VER = 1.02.a
END
BEGIN plb_bram_if_cntlr
PARAMETER INSTANCE = plb_bram_if_cntlr_0
PARAMETER HW_VER = 1.00.b
PARAMETER C_BASEADDR = 0xFFFF0000
PARAMETER C_HIGHADDR = 0xFFFFFFFF
BUS_INTERFACE SPLB = plb_v34_0
END
BEGIN plb2opb_bridge
PARAMETER INSTANCE = plb2opb_bridge_0
PARAMETER HW_VER = 1.01.a
PARAMETER C_RNG0_BASEADDR = 0x20000000
PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF
PARAMETER C_RNG1_BASEADDR = 0x60000000
PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF
PARAMETER C_RNG2_BASEADDR = 0x80000000
PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF
PARAMETER C_RNG3_BASEADDR = 0xC0000000
PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF
BUS_INTERFACE SPLB = plb_v34_0
BUS_INTERFACE MOPB = opb_v20_0
END
Gives this device tree (some properties removed for clarity):
plb@0 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "xlnx,plb-v34-1.02.a";
device_type = "ibm,plb";
ranges; // 1:1 translation
plb_bram_if_cntrl_0: bram@ffff0000 {
reg = <ffff0000 10000>;
}
opb@20000000 {
#address-cells = <1>;
#size-cells = <1>;
ranges = <20000000 20000000 20000000
60000000 60000000 20000000
80000000 80000000 40000000
c0000000 c0000000 20000000>;
opb_uart16550_0: serial@a0000000 {
reg = <a00000000 2000>;
};
opb_intc_0: interrupt-controller@d1000fc0 {
reg = <d1000fc0 20>;
};
};
};
That covers the general approach to binding xilinx IP cores into the
device tree. The following are bindings for specific devices:
i) Xilinx ML300 Framebuffer
Simple framebuffer device from the ML300 reference design (also on the
ML403 reference design as well as others).
Optional properties:
- resolution = <xres yres> : pixel resolution of framebuffer. Some
implementations use a different resolution.
Default is <d#640 d#480>
- virt-resolution = <xvirt yvirt> : Size of framebuffer in memory.
Default is <d#1024 d#480>.
- rotate-display (empty) : rotate display 180 degrees.
ii) Xilinx SystemACE
The Xilinx SystemACE device is used to program FPGAs from an FPGA
bitstream stored on a CF card. It can also be used as a generic CF
interface device.
Optional properties:
- 8-bit (empty) : Set this property for SystemACE in 8 bit mode
iii) Xilinx EMAC and Xilinx TEMAC
Xilinx Ethernet devices. In addition to general xilinx properties
listed above, nodes for these devices should include a phy-handle
property, and may include other common network device properties
like local-mac-address.
iv) Xilinx Uartlite
Xilinx uartlite devices are simple fixed speed serial ports.
Required properties:
- current-speed : Baud rate of uartlite
v) Xilinx hwicap
Xilinx hwicap devices provide access to the configuration logic
of the FPGA through the Internal Configuration Access Port
(ICAP). The ICAP enables partial reconfiguration of the FPGA,
readback of the configuration information, and some control over
'warm boots' of the FPGA fabric.
Required properties:
- xlnx,family : The family of the FPGA, necessary since the
capabilities of the underlying ICAP hardware
differ between different families. May be
'virtex2p', 'virtex4', or 'virtex5'.
vi) Xilinx Uart 16550
Xilinx UART 16550 devices are very similar to the NS16550 but with
different register spacing and an offset from the base address.
Required properties:
- clock-frequency : Frequency of the clock input
- reg-offset : A value of 3 is required
- reg-shift : A value of 2 is required

172
Documentation/pps/pps.txt Normal file
View file

@ -0,0 +1,172 @@
PPS - Pulse Per Second
----------------------
(C) Copyright 2007 Rodolfo Giometti <giometti@enneenne.com>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Overview
--------
LinuxPPS provides a programming interface (API) to define in the
system several PPS sources.
PPS means "pulse per second" and a PPS source is just a device which
provides a high precision signal each second so that an application
can use it to adjust system clock time.
A PPS source can be connected to a serial port (usually to the Data
Carrier Detect pin) or to a parallel port (ACK-pin) or to a special
CPU's GPIOs (this is the common case in embedded systems) but in each
case when a new pulse arrives the system must apply to it a timestamp
and record it for userland.
Common use is the combination of the NTPD as userland program, with a
GPS receiver as PPS source, to obtain a wallclock-time with
sub-millisecond synchronisation to UTC.
RFC considerations
------------------
While implementing a PPS API as RFC 2783 defines and using an embedded
CPU GPIO-Pin as physical link to the signal, I encountered a deeper
problem:
At startup it needs a file descriptor as argument for the function
time_pps_create().
This implies that the source has a /dev/... entry. This assumption is
ok for the serial and parallel port, where you can do something
useful besides(!) the gathering of timestamps as it is the central
task for a PPS-API. But this assumption does not work for a single
purpose GPIO line. In this case even basic file-related functionality
(like read() and write()) makes no sense at all and should not be a
precondition for the use of a PPS-API.
The problem can be simply solved if you consider that a PPS source is
not always connected with a GPS data source.
So your programs should check if the GPS data source (the serial port
for instance) is a PPS source too, and if not they should provide the
possibility to open another device as PPS source.
In LinuxPPS the PPS sources are simply char devices usually mapped
into files /dev/pps0, /dev/pps1, etc..
Coding example
--------------
To register a PPS source into the kernel you should define a struct
pps_source_info_s as follows:
static struct pps_source_info pps_ktimer_info = {
.name = "ktimer",
.path = "",
.mode = PPS_CAPTUREASSERT | PPS_OFFSETASSERT | \
PPS_ECHOASSERT | \
PPS_CANWAIT | PPS_TSFMT_TSPEC,
.echo = pps_ktimer_echo,
.owner = THIS_MODULE,
};
and then calling the function pps_register_source() in your
intialization routine as follows:
source = pps_register_source(&pps_ktimer_info,
PPS_CAPTUREASSERT | PPS_OFFSETASSERT);
The pps_register_source() prototype is:
int pps_register_source(struct pps_source_info_s *info, int default_params)
where "info" is a pointer to a structure that describes a particular
PPS source, "default_params" tells the system what the initial default
parameters for the device should be (it is obvious that these parameters
must be a subset of ones defined in the struct
pps_source_info_s which describe the capabilities of the driver).
Once you have registered a new PPS source into the system you can
signal an assert event (for example in the interrupt handler routine)
just using:
pps_event(source, &ts, PPS_CAPTUREASSERT, ptr)
where "ts" is the event's timestamp.
The same function may also run the defined echo function
(pps_ktimer_echo(), passing to it the "ptr" pointer) if the user
asked for that... etc..
Please see the file drivers/pps/clients/ktimer.c for example code.
SYSFS support
-------------
If the SYSFS filesystem is enabled in the kernel it provides a new class:
$ ls /sys/class/pps/
pps0/ pps1/ pps2/
Every directory is the ID of a PPS sources defined in the system and
inside you find several files:
$ ls /sys/class/pps/pps0/
assert clear echo mode name path subsystem@ uevent
Inside each "assert" and "clear" file you can find the timestamp and a
sequence number:
$ cat /sys/class/pps/pps0/assert
1170026870.983207967#8
Where before the "#" is the timestamp in seconds; after it is the
sequence number. Other files are:
* echo: reports if the PPS source has an echo function or not;
* mode: reports available PPS functioning modes;
* name: reports the PPS source's name;
* path: reports the PPS source's device path, that is the device the
PPS source is connected to (if it exists).
Testing the PPS support
-----------------------
In order to test the PPS support even without specific hardware you can use
the ktimer driver (see the client subsection in the PPS configuration menu)
and the userland tools provided into Documentaion/pps/ directory.
Once you have enabled the compilation of ktimer just modprobe it (if
not statically compiled):
# modprobe ktimer
and the run ppstest as follow:
$ ./ppstest /dev/pps0
trying PPS source "/dev/pps1"
found PPS source "/dev/pps1"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1186592699.388832443, sequence: 364 - clear 0.000000000, sequence: 0
source 0 - assert 1186592700.388931295, sequence: 365 - clear 0.000000000, sequence: 0
source 0 - assert 1186592701.389032765, sequence: 366 - clear 0.000000000, sequence: 0
Please, note that to compile userland programs you need the file timepps.h
(see Documentation/pps/).

View file

@ -3,9 +3,8 @@ rfkill - RF kill switch support
1. Introduction
2. Implementation details
3. Kernel driver guidelines
4. Kernel API
5. Userspace support
3. Kernel API
4. Userspace support
1. Introduction
@ -19,82 +18,62 @@ disable all transmitters of a certain type (or all). This is intended for
situations where transmitters need to be turned off, for example on
aircraft.
The rfkill subsystem has a concept of "hard" and "soft" block, which
differ little in their meaning (block == transmitters off) but rather in
whether they can be changed or not:
- hard block: read-only radio block that cannot be overriden by software
- soft block: writable radio block (need not be readable) that is set by
the system software.
2. Implementation details
The rfkill subsystem is composed of various components: the rfkill class, the
rfkill-input module (an input layer handler), and some specific input layer
events.
The rfkill subsystem is composed of three main components:
* the rfkill core,
* the deprecated rfkill-input module (an input layer handler, being
replaced by userspace policy code) and
* the rfkill drivers.
The rfkill class is provided for kernel drivers to register their radio
transmitter with the kernel, provide methods for turning it on and off and,
optionally, letting the system know about hardware-disabled states that may
be implemented on the device. This code is enabled with the CONFIG_RFKILL
Kconfig option, which drivers can "select".
The rfkill core provides API for kernel drivers to register their radio
transmitter with the kernel, methods for turning it on and off and, letting
the system know about hardware-disabled states that may be implemented on
the device.
The rfkill class code also notifies userspace of state changes, this is
achieved via uevents. It also provides some sysfs files for userspace to
check the status of radio transmitters. See the "Userspace support" section
below.
The rfkill-input code implements a basic response to rfkill buttons -- it
implements turning on/off all devices of a certain class (or all).
The rfkill core code also notifies userspace of state changes, and provides
ways for userspace to query the current states. See the "Userspace support"
section below.
When the device is hard-blocked (either by a call to rfkill_set_hw_state()
or from query_hw_block) set_block() will be invoked but drivers can well
ignore the method call since they can use the return value of the function
rfkill_set_hw_state() to sync the software state instead of keeping track
of calls to set_block().
or from query_hw_block) set_block() will be invoked for additional software
block, but drivers can ignore the method call since they can use the return
value of the function rfkill_set_hw_state() to sync the software state
instead of keeping track of calls to set_block(). In fact, drivers should
use the return value of rfkill_set_hw_state() unless the hardware actually
keeps track of soft and hard block separately.
The entire functionality is spread over more than one subsystem:
* The kernel input layer generates KEY_WWAN, KEY_WLAN etc. and
SW_RFKILL_ALL -- when the user presses a button. Drivers for radio
transmitters generally do not register to the input layer, unless the
device really provides an input device (i.e. a button that has no
effect other than generating a button press event)
* The rfkill-input code hooks up to these events and switches the soft-block
of the various radio transmitters, depending on the button type.
* The rfkill drivers turn off/on their transmitters as requested.
* The rfkill class will generate userspace notifications (uevents) to tell
userspace what the current state is.
3. Kernel API
3. Kernel driver guidelines
Drivers for radio transmitters normally implement only the rfkill class.
These drivers may not unblock the transmitter based on own decisions, they
should act on information provided by the rfkill class only.
Drivers for radio transmitters normally implement an rfkill driver.
Platform drivers might implement input devices if the rfkill button is just
that, a button. If that button influences the hardware then you need to
implement an rfkill class instead. This also applies if the platform provides
implement an rfkill driver instead. This also applies if the platform provides
a way to turn on/off the transmitter(s).
During suspend/hibernation, transmitters should only be left enabled when
wake-on wlan or similar functionality requires it and the device wasn't
blocked before suspend/hibernate. Note that it may be necessary to update
the rfkill subsystem's idea of what the current state is at resume time if
the state may have changed over suspend.
For some platforms, it is possible that the hardware state changes during
suspend/hibernation, in which case it will be necessary to update the rfkill
core with the current state is at resume time.
To create an rfkill driver, driver's Kconfig needs to have
depends on RFKILL || !RFKILL
4. Kernel API
To build a driver with rfkill subsystem support, the driver should depend on
(or select) the Kconfig symbol RFKILL.
The hardware the driver talks to may be write-only (where the current state
of the hardware is unknown), or read-write (where the hardware can be queried
about its current state).
to ensure the driver cannot be built-in when rfkill is modular. The !RFKILL
case allows the driver to be built when rfkill is not configured, which which
case all rfkill API can still be used but will be provided by static inlines
which compile to almost nothing.
Calling rfkill_set_hw_state() when a state change happens is required from
rfkill drivers that control devices that can be hard-blocked unless they also
@ -105,10 +84,35 @@ device). Don't do this unless you cannot get the event in any other way.
5. Userspace support
The following sysfs entries exist for every rfkill device:
The recommended userspace interface to use is /dev/rfkill, which is a misc
character device that allows userspace to obtain and set the state of rfkill
devices and sets of devices. It also notifies userspace about device addition
and removal. The API is a simple read/write API that is defined in
linux/rfkill.h, with one ioctl that allows turning off the deprecated input
handler in the kernel for the transition period.
Except for the one ioctl, communication with the kernel is done via read()
and write() of instances of 'struct rfkill_event'. In this structure, the
soft and hard block are properly separated (unlike sysfs, see below) and
userspace is able to get a consistent snapshot of all rfkill devices in the
system. Also, it is possible to switch all rfkill drivers (or all drivers of
a specified type) into a state which also updates the default state for
hotplugged devices.
After an application opens /dev/rfkill, it can read the current state of
all devices, and afterwards can poll the descriptor for hotplug or state
change events.
Applications must ignore operations (the "op" field) they do not handle,
this allows the API to be extended in the future.
Additionally, each rfkill device is registered in sysfs and there has the
following attributes:
name: Name assigned by driver to this key (interface or driver name).
type: Name of the key type ("wlan", "bluetooth", etc).
type: Driver type string ("wlan", "bluetooth", etc).
persistent: Whether the soft blocked state is initialised from
non-volatile storage at startup.
state: Current state of the transmitter
0: RFKILL_STATE_SOFT_BLOCKED
transmitter is turned off by software
@ -117,7 +121,12 @@ The following sysfs entries exist for every rfkill device:
2: RFKILL_STATE_HARD_BLOCKED
transmitter is forced off by something outside of
the driver's control.
claim: 0: Kernel handles events (currently always reads that value)
This file is deprecated because it can only properly show
three of the four possible states, soft-and-hard-blocked is
missing.
claim: 0: Kernel handles events
This file is deprecated because there no longer is a way to
claim just control over a single rfkill instance.
rfkill devices also issue uevents (with an action of "change"), with the
following environment variables set:
@ -128,9 +137,3 @@ RFKILL_TYPE
The contents of these variables corresponds to the "name", "state" and
"type" sysfs files explained above.
An alternative userspace interface exists as a misc device /dev/rfkill,
which allows userspace to obtain and set the state of rfkill devices and
sets of devices. It also notifies userspace about device addition and
removal. The API is a simple read/write API that is defined in
linux/rfkill.h.

View file

@ -135,7 +135,7 @@ manipulating this list), the user code must observe the following
protocol on 'lock entry' insertion and removal:
On insertion:
1) set the 'list_op_pending' word to the address of the 'lock word'
1) set the 'list_op_pending' word to the address of the 'lock entry'
to be inserted,
2) acquire the futex lock,
3) add the lock entry, with its thread id (TID) in the bottom 29 bits
@ -143,7 +143,7 @@ On insertion:
4) clear the 'list_op_pending' word.
On removal:
1) set the 'list_op_pending' word to the address of the 'lock word'
1) set the 'list_op_pending' word to the address of the 'lock entry'
to be removed,
2) remove the lock entry for this lock from the 'head' list,
2) release the futex lock, and

View file

@ -73,7 +73,7 @@ The remaining CPU time will be used for user input and other tasks. Because
realtime tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphics or audio can be eliminated.
NOTE: the above example is not fully implemented as of yet (2.6.25). We still
NOTE: the above example is not fully implemented yet. We still
lack an EDF scheduler to make non-uniform periods usable.
@ -140,14 +140,15 @@ The other option is:
.o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups")
This uses the /cgroup virtual file system and "/cgroup/<cgroup>/cpu.rt_runtime_us"
to control the CPU time reserved for each control group instead.
This uses the /cgroup virtual file system and
"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
control group instead.
For more information on working with control groups, you should read
Documentation/cgroups/cgroups.txt as well.
Group settings are checked against the following limits in order to keep the configuration
schedulable:
Group settings are checked against the following limits in order to keep the
configuration schedulable:
\Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
@ -189,7 +190,7 @@ Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current linux PI infrastructure is geared towards
the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now).
deadline delta (deadline - now)).
This means the whole PI machinery will have to be reworked - and that is one of
the most complex pieces of code we have.

View file

@ -1,10 +1,11 @@
SCSI FC Tansport
=============================================
Date: 4/12/2007
Date: 11/18/2008
Kernel Revisions for features:
rports : <<TBS>>
vports : 2.6.22 (? TBD)
vports : 2.6.22
bsg support : 2.6.30 (?TBD?)
Introduction
@ -15,6 +16,7 @@ The FC transport can be found at:
drivers/scsi/scsi_transport_fc.c
include/scsi/scsi_transport_fc.h
include/scsi/scsi_netlink_fc.h
include/scsi/scsi_bsg_fc.h
This file is found at Documentation/scsi/scsi_fc_transport.txt
@ -472,6 +474,14 @@ int
fc_vport_terminate(struct fc_vport *vport)
FC BSG support (CT & ELS passthru, and more)
========================================================================
<< To Be Supplied >>
Credits
=======
The following people have contributed to this document:

View file

@ -1271,6 +1271,11 @@ of interest:
hostdata[0] - area reserved for LLD at end of struct Scsi_Host. Size
is set by the second argument (named 'xtr_bytes') to
scsi_host_alloc() or scsi_register().
vendor_id - a unique value that identifies the vendor supplying
the LLD for the Scsi_Host. Used most often in validating
vendor-specific message requests. Value consists of an
identifier type and a vendor-specific value.
See scsi_netlink.h for a description of valid formats.
The scsi_host structure is defined in include/scsi/scsi_host.h

View file

@ -139,6 +139,7 @@ ALC883/888
acer Acer laptops (Travelmate 3012WTMi, Aspire 5600, etc)
acer-aspire Acer Aspire 9810
acer-aspire-4930g Acer Aspire 4930G
acer-aspire-6530g Acer Aspire 6530G
acer-aspire-8930g Acer Aspire 8930G
medion Medion Laptops
medion-md2 Medion MD2
@ -239,6 +240,7 @@ AD1986A
laptop-automute 2-channel with EAPD and HP-automute (Lenovo N100)
ultra 2-channel with EAPD (Samsung Ultra tablet PC)
samsung 2-channel with EAPD (Samsung R65)
samsung-p50 2-channel with HP-automute (Samsung P50)
AD1988/AD1988B/AD1989A/AD1989B
==============================

View file

@ -101,6 +101,8 @@ card*/pcm*/xrun_debug
bit 0 = Enable XRUN/jiffies debug messages
bit 1 = Show stack trace at XRUN / jiffies check
bit 2 = Enable additional jiffies check
bit 3 = Log hwptr update at each period interrupt
bit 4 = Log hwptr update at each snd_pcm_update_hw_ptr()
When the bit 0 is set, the driver will show the messages to
kernel log when an xrun is detected. The debug message is
@ -117,6 +119,9 @@ card*/pcm*/xrun_debug
buggy) hardware that doesn't give smooth pointer updates.
This feature is enabled via the bit 2.
Bits 3 and 4 are for logging the hwptr records. Note that
these will give flood of kernel messages.
card*/pcm*/sub*/info
The general information of this PCM sub-stream.

View file

@ -99,11 +99,13 @@ void parse_opts(int argc, char *argv[])
{ "lsb", 0, 0, 'L' },
{ "cs-high", 0, 0, 'C' },
{ "3wire", 0, 0, '3' },
{ "no-cs", 0, 0, 'N' },
{ "ready", 0, 0, 'R' },
{ NULL, 0, 0, 0 },
};
int c;
c = getopt_long(argc, argv, "D:s:d:b:lHOLC3", lopts, NULL);
c = getopt_long(argc, argv, "D:s:d:b:lHOLC3NR", lopts, NULL);
if (c == -1)
break;
@ -139,6 +141,12 @@ void parse_opts(int argc, char *argv[])
case '3':
mode |= SPI_3WIRE;
break;
case 'N':
mode |= SPI_NO_CS;
break;
case 'R':
mode |= SPI_READY;
break;
default:
print_usage(argv[0]);
break;

View file

@ -66,7 +66,8 @@ On all - write a character to /proc/sysrq-trigger. e.g.:
'b' - Will immediately reboot the system without syncing or unmounting
your disks.
'c' - Will perform a kexec reboot in order to take a crashdump.
'c' - Will perform a system crash by a NULL pointer dereference.
A crashdump will be taken if configured.
'd' - Shows all locks that are held.
@ -141,8 +142,8 @@ useful when you want to exit a program that will not let you switch consoles.
re'B'oot is good when you're unable to shut down. But you should also 'S'ync
and 'U'mount first.
'C'rashdump can be used to manually trigger a crashdump when the system is hung.
The kernel needs to have been built with CONFIG_KEXEC enabled.
'C'rash can be used to manually trigger a crashdump when the system is hung.
Note that this just triggers a crash if there is no dump mechanism available.
'S'ync is great when your system is locked up, it allows you to sync your
disks and will certainly lessen the chance of data loss and fscking. Note

View file

@ -83,6 +83,15 @@ When reading one of these enable files, there are four results:
X - there is a mixture of events enabled and disabled
? - this file does not affect any event
2.3 Boot option
---------------
In order to facilitate early boot debugging, use boot option:
trace_event=[event-list]
The format of this boot option is the same as described in section 2.1.
3. Defining an event-enabled tracepoint
=======================================

View file

@ -85,26 +85,19 @@ of ftrace. Here is a list of some of the key files:
This file holds the output of the trace in a human
readable format (described below).
latency_trace:
This file shows the same trace but the information
is organized more to display possible latencies
in the system (described below).
trace_pipe:
The output is the same as the "trace" file but this
file is meant to be streamed with live tracing.
Reads from this file will block until new data
is retrieved. Unlike the "trace" and "latency_trace"
files, this file is a consumer. This means reading
from this file causes sequential reads to display
more current data. Once data is read from this
file, it is consumed, and will not be read
again with a sequential read. The "trace" and
"latency_trace" files are static, and if the
tracer is not adding more data, they will display
the same information every time they are read.
Reads from this file will block until new data is
retrieved. Unlike the "trace" file, this file is a
consumer. This means reading from this file causes
sequential reads to display more current data. Once
data is read from this file, it is consumed, and
will not be read again with a sequential read. The
"trace" file is static, and if the tracer is not
adding more data,they will display the same
information every time they are read.
trace_options:
@ -117,10 +110,10 @@ of ftrace. Here is a list of some of the key files:
Some of the tracers record the max latency.
For example, the time interrupts are disabled.
This time is saved in this file. The max trace
will also be stored, and displayed by either
"trace" or "latency_trace". A new max trace will
only be recorded if the latency is greater than
the value in this file. (in microseconds)
will also be stored, and displayed by "trace".
A new max trace will only be recorded if the
latency is greater than the value in this
file. (in microseconds)
buffer_size_kb:
@ -210,7 +203,7 @@ Here is the list of current tracers that may be configured.
the trace with the longest max latency.
See tracing_max_latency. When a new max is recorded,
it replaces the old trace. It is best to view this
trace via the latency_trace file.
trace with the latency-format option enabled.
"preemptoff"
@ -307,8 +300,8 @@ the lowest priority thread (pid 0).
Latency trace format
--------------------
For traces that display latency times, the latency_trace file
gives somewhat more information to see why a latency happened.
When the latency-format option is enabled, the trace file gives
somewhat more information to see why a latency happened.
Here is a typical trace.
# tracer: irqsoff
@ -380,9 +373,10 @@ explains which is which.
The above is mostly meaningful for kernel developers.
time: This differs from the trace file output. The trace file output
includes an absolute timestamp. The timestamp used by the
latency_trace file is relative to the start of the trace.
time: When the latency-format option is enabled, the trace file
output includes a timestamp relative to the start of the
trace. This differs from the output when latency-format
is disabled, which includes an absolute timestamp.
delay: This is just to help catch your eye a bit better. And
needs to be fixed to be only relative to the same CPU.
@ -440,7 +434,8 @@ Here are the available options:
sym-addr:
bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
verbose - This deals with the latency_trace file.
verbose - This deals with the trace file when the
latency-format option is enabled.
bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
(+0.000ms): simple_strtoul (strict_strtoul)
@ -472,7 +467,7 @@ Here are the available options:
the app is no longer running
The lookup is performed when you read
trace,trace_pipe,latency_trace. Example:
trace,trace_pipe. Example:
a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
@ -481,6 +476,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
every scheduling event. Will add overhead if
there's a lot of tasks running at once.
latency-format - This option changes the trace. When
it is enabled, the trace displays
additional information about the
latencies, as described in "Latency
trace format".
sched_switch
------------
@ -596,12 +596,13 @@ To reset the maximum, echo 0 into tracing_max_latency. Here is
an example:
# echo irqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: irqsoff
#
irqsoff latency trace v1.1.5 on 2.6.26
@ -703,12 +704,13 @@ which preemption was disabled. The control of preemptoff tracer
is much like the irqsoff tracer.
# echo preemptoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: preemptoff
#
preemptoff latency trace v1.1.5 on 2.6.26-rc8
@ -850,12 +852,13 @@ Again, using this trace is much like the irqsoff and preemptoff
tracers.
# echo preemptirqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: preemptirqsoff
#
preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
@ -1012,11 +1015,12 @@ Instead of performing an 'ls', we will run 'sleep 1' under
'chrt' which changes the priority of the task.
# echo wakeup > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# chrt -f 5 sleep 1
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: wakeup
#
wakeup latency trace v1.1.5 on 2.6.26-rc8

View file

@ -0,0 +1,42 @@
" Enable folding for ftrace function_graph traces.
"
" To use, :source this file while viewing a function_graph trace, or use vim's
" -S option to load from the command-line together with a trace. You can then
" use the usual vim fold commands, such as "za", to open and close nested
" functions. While closed, a fold will show the total time taken for a call,
" as would normally appear on the line with the closing brace. Folded
" functions will not include finish_task_switch(), so folding should remain
" relatively sane even through a context switch.
"
" Note that this will almost certainly only work well with a
" single-CPU trace (e.g. trace-cmd report --cpu 1).
function! FunctionGraphFoldExpr(lnum)
let line = getline(a:lnum)
if line[-1:] == '{'
if line =~ 'finish_task_switch() {$'
return '>1'
endif
return 'a1'
elseif line[-1:] == '}'
return 's1'
else
return '='
endif
endfunction
function! FunctionGraphFoldText()
let s = split(getline(v:foldstart), '|', 1)
if getline(v:foldend+1) =~ 'finish_task_switch() {$'
let s[2] = ' task switch '
else
let e = split(getline(v:foldend), '|', 1)
let s[2] = e[2]
endif
return join(s, '|')
endfunction
setlocal foldexpr=FunctionGraphFoldExpr(v:lnum)
setlocal foldtext=FunctionGraphFoldText()
setlocal foldcolumn=12
setlocal foldmethod=expr

View file

@ -0,0 +1,955 @@
Lockless Ring Buffer Design
===========================
Copyright 2009 Red Hat Inc.
Author: Steven Rostedt <srostedt@redhat.com>
License: The GNU Free Documentation License, Version 1.2
(dual licensed under the GPL v2)
Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
and Frederic Weisbecker.
Written for: 2.6.31
Terminology used in this Document
---------------------------------
tail - where new writes happen in the ring buffer.
head - where new reads happen in the ring buffer.
producer - the task that writes into the ring buffer (same as writer)
writer - same as producer
consumer - the task that reads from the buffer (same as reader)
reader - same as consumer.
reader_page - A page outside the ring buffer used solely (for the most part)
by the reader.
head_page - a pointer to the page that the reader will use next
tail_page - a pointer to the page that will be written to next
commit_page - a pointer to the page with the last finished non nested write.
cmpxchg - hardware assisted atomic transaction that performs the following:
A = B iff previous A == C
R = cmpxchg(A, C, B) is saying that we replace A with B if and only if
current A is equal to C, and we put the old (current) A into R
R gets the previous A regardless if A is updated with B or not.
To see if the update was successful a compare of R == C may be used.
The Generic Ring Buffer
-----------------------
The ring buffer can be used in either an overwrite mode or in
producer/consumer mode.
Producer/consumer mode is where the producer were to fill up the
buffer before the consumer could free up anything, the producer
will stop writing to the buffer. This will lose most recent events.
Overwrite mode is where the produce were to fill up the buffer
before the consumer could free up anything, the producer will
overwrite the older data. This will lose the oldest events.
No two writers can write at the same time (on the same per cpu buffer),
but a writer may interrupt another writer, but it must finish writing
before the previous writer may continue. This is very important to the
algorithm. The writers act like a "stack". The way interrupts works
enforces this behavior.
writer1 start
<preempted> writer2 start
<preempted> writer3 start
writer3 finishes
writer2 finishes
writer1 finishes
This is very much like a writer being preempted by an interrupt and
the interrupt doing a write as well.
Readers can happen at any time. But no two readers may run at the
same time, nor can a reader preempt/interrupt another reader. A reader
can not preempt/interrupt a writer, but it may read/consume from the
buffer at the same time as a writer is writing, but the reader must be
on another processor to do so. A reader may read on its own processor
and can be preempted by a writer.
A writer can preempt a reader, but a reader can not preempt a writer.
But a reader can read the buffer at the same time (on another processor)
as a writer.
The ring buffer is made up of a list of pages held together by a link list.
At initialization a reader page is allocated for the reader that is not
part of the ring buffer.
The head_page, tail_page and commit_page are all initialized to point
to the same page.
The reader page is initialized to have its next pointer pointing to
the head page, and its previous pointer pointing to a page before
the head page.
The reader has its own page to use. At start up time, this page is
allocated but is not attached to the list. When the reader wants
to read from the buffer, if its page is empty (like it is on start up)
it will swap its page with the head_page. The old reader page will
become part of the ring buffer and the head_page will be removed.
The page after the inserted page (old reader_page) will become the
new head page.
Once the new page is given to the reader, the reader could do what
it wants with it, as long as a writer has left that page.
A sample of how the reader page is swapped: Note this does not
show the head page in the buffer, it is for demonstrating a swap
only.
+------+
|reader| RING BUFFER
|page |
+------+
+---+ +---+ +---+
| |-->| |-->| |
| |<--| |<--| |
+---+ +---+ +---+
^ | ^ |
| +-------------+ |
+-----------------+
+------+
|reader| RING BUFFER
|page |-------------------+
+------+ v
| +---+ +---+ +---+
| | |-->| |-->| |
| | |<--| |<--| |<-+
| +---+ +---+ +---+ |
| ^ | ^ | |
| | +-------------+ | |
| +-----------------+ |
+------------------------------------+
+------+
|reader| RING BUFFER
|page |-------------------+
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | |-->| |-->| |
| | | | | |<--| |<-+
| | +---+ +---+ +---+ |
| | | ^ | |
| | +-------------+ | |
| +-----------------------------+ |
+------------------------------------+
+------+
|buffer| RING BUFFER
|page |-------------------+
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | | | |-->| |
| | New | | | |<--| |<-+
| | Reader +---+ +---+ +---+ |
| | page ----^ | |
| | | |
| +-----------------------------+ |
+------------------------------------+
It is possible that the page swapped is the commit page and the tail page,
if what is in the ring buffer is less than what is held in a buffer page.
reader page commit page tail page
| | |
v | |
+---+ | |
| |<----------+ |
| |<------------------------+
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
This case is still valid for this algorithm.
When the writer leaves the page, it simply goes into the ring buffer
since the reader page still points to the next location in the ring
buffer.
The main pointers:
reader page - The page used solely by the reader and is not part
of the ring buffer (may be swapped in)
head page - the next page in the ring buffer that will be swapped
with the reader page.
tail page - the page where the next write will take place.
commit page - the page that last finished a write.
The commit page only is updated by the outer most writer in the
writer stack. A writer that preempts another writer will not move the
commit page.
When data is written into the ring buffer, a position is reserved
in the ring buffer and passed back to the writer. When the writer
is finished writing data into that position, it commits the write.
Another write (or a read) may take place at anytime during this
transaction. If another write happens it must finish before continuing
with the previous write.
Write reserve:
Buffer page
+---------+
|written |
+---------+ <--- given back to writer (current commit)
|reserved |
+---------+ <--- tail pointer
| empty |
+---------+
Write commit:
Buffer page
+---------+
|written |
+---------+
|written |
+---------+ <--- next positon for write (current commit)
| empty |
+---------+
If a write happens after the first reserve:
Buffer page
+---------+
|written |
+---------+ <-- current commit
|reserved |
+---------+ <--- given back to second writer
|reserved |
+---------+ <--- tail pointer
After second writer commits:
Buffer page
+---------+
|written |
+---------+ <--(last full commit)
|reserved |
+---------+
|pending |
|commit |
+---------+ <--- tail pointer
When the first writer commits:
Buffer page
+---------+
|written |
+---------+
|written |
+---------+
|written |
+---------+ <--(last full commit and tail pointer)
The commit pointer points to the last write location that was
committed without preempting another write. When a write that
preempted another write is committed, it only becomes a pending commit
and will not be a full commit till all writes have been committed.
The commit page points to the page that has the last full commit.
The tail page points to the page with the last write (before
committing).
The tail page is always equal to or after the commit page. It may
be several pages ahead. If the tail page catches up to the commit
page then no more writes may take place (regardless of the mode
of the ring buffer: overwrite and produce/consumer).
The order of pages are:
head page
commit page
tail page
Possible scenario:
tail page
head page commit page |
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
There is a special case that the head page is after either the commit page
and possibly the tail page. That is when the commit (and tail) page has been
swapped with the reader page. This is because the head page is always
part of the ring buffer, but the reader page is not. When ever there
has been less than a full page that has been committed inside the ring buffer,
and a reader swaps out a page, it will be swapping out the commit page.
reader page commit page tail page
| | |
v | |
+---+ | |
| |<----------+ |
| |<------------------------+
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
In this case, the head page will not move when the tail and commit
move back into the ring buffer.
The reader can not swap a page into the ring buffer if the commit page
is still on that page. If the read meets the last commit (real commit
not pending or reserved), then there is nothing more to read.
The buffer is considered empty until another full commit finishes.
When the tail meets the head page, if the buffer is in overwrite mode,
the head page will be pushed ahead one. If the buffer is in producer/consumer
mode, the write will fail.
Overwrite mode:
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
Note, the reader page will still point to the previous head page.
But when a swap takes place, it will use the most recent head page.
Making the Ring Buffer Lockless:
--------------------------------
The main idea behind the lockless algorithm is to combine the moving
of the head_page pointer with the swapping of pages with the reader.
State flags are placed inside the pointer to the page. To do this,
each page must be aligned in memory by 4 bytes. This will allow the 2
least significant bits of the address to be used as flags. Since
they will always be zero for the address. To get the address,
simply mask out the flags.
MASK = ~3
address & MASK
Two flags will be kept by these two bits:
HEADER - the page being pointed to is a head page
UPDATE - the page being pointed to is being updated by a writer
and was or is about to be a head page.
reader page
|
v
+---+
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The above pointer "-H->" would have the HEADER flag set. That is
the next page is the next page to be swapped out by the reader.
This pointer means the next page is the head page.
When the tail page meets the head pointer, it will use cmpxchg to
change the pointer to the UPDATE state:
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
"-U->" represents a pointer in the UPDATE state.
Any access to the reader will need to take some sort of lock to serialize
the readers. But the writers will never take a lock to write to the
ring buffer. This means we only need to worry about a single reader,
and writes only preempt in "stack" formation.
When the reader tries to swap the page with the ring buffer, it
will also use cmpxchg. If the flag bit in the pointer to the
head page does not have the HEADER flag set, the compare will fail
and the reader will need to look for the new head page and try again.
Note, the flag UPDATE and HEADER are never set at the same time.
The reader swaps the reader page as follows:
+------+
|reader| RING BUFFER
|page |
+------+
+---+ +---+ +---+
| |--->| |--->| |
| |<---| |<---| |
+---+ +---+ +---+
^ | ^ |
| +---------------+ |
+-----H-------------+
The reader sets the reader page next pointer as HEADER to the page after
the head page.
+------+
|reader| RING BUFFER
|page |-------H-----------+
+------+ v
| +---+ +---+ +---+
| | |--->| |--->| |
| | |<---| |<---| |<-+
| +---+ +---+ +---+ |
| ^ | ^ | |
| | +---------------+ | |
| +-----H-------------+ |
+--------------------------------------+
It does a cmpxchg with the pointer to the previous head page to make it
point to the reader page. Note that the new pointer does not have the HEADER
flag set. This action atomically moves the head page forward.
+------+
|reader| RING BUFFER
|page |-------H-----------+
+------+ v
| ^ +---+ +---+ +---+
| | | |-->| |-->| |
| | | |<--| |<--| |<-+
| | +---+ +---+ +---+ |
| | | ^ | |
| | +-------------+ | |
| +-----------------------------+ |
+------------------------------------+
After the new head page is set, the previous pointer of the head page is
updated to the reader page.
+------+
|reader| RING BUFFER
|page |-------H-----------+
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | |-->| |-->| |
| | | | | |<--| |<-+
| | +---+ +---+ +---+ |
| | | ^ | |
| | +-------------+ | |
| +-----------------------------+ |
+------------------------------------+
+------+
|buffer| RING BUFFER
|page |-------H-----------+ <--- New head page
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | | | |-->| |
| | New | | | |<--| |<-+
| | Reader +---+ +---+ +---+ |
| | page ----^ | |
| | | |
| +-----------------------------+ |
+------------------------------------+
Another important point. The page that the reader page points back to
by its previous pointer (the one that now points to the new head page)
never points back to the reader page. That is because the reader page is
not part of the ring buffer. Traversing the ring buffer via the next pointers
will always stay in the ring buffer. Traversing the ring buffer via the
prev pointers may not.
Note, the way to determine a reader page is simply by examining the previous
pointer of the page. If the next pointer of the previous page does not
point back to the original page, then the original page is a reader page:
+--------+
| reader | next +----+
| page |-------->| |<====== (buffer page)
+--------+ +----+
| | ^
| v | next
prev | +----+
+------------->| |
+----+
The way the head page moves forward:
When the tail page meets the head page and the buffer is in overwrite mode
and more writes take place, the head page must be moved forward before the
writer may move the tail page. The way this is done is that the writer
performs a cmpxchg to convert the pointer to the head page from the HEADER
flag to have the UPDATE flag set. Once this is done, the reader will
not be able to swap the head page from the buffer, nor will it be able to
move the head page, until the writer is finished with the move.
This eliminates any races that the reader can have on the writer. The reader
must spin, and this is why the reader can not preempt the writer.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The following page will be made into the new head page.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the new head page has been set, we can set the old head page
pointer back to NORMAL.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the head page has been moved, the tail page may now move forward.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The above are the trivial updates. Now for the more complex scenarios.
As stated before, if enough writes preempt the first write, the
tail page may make it all the way around the buffer and meet the commit
page. At this time, we must start dropping writes (usually with some kind
of warning to the user). But what happens if the commit was still on the
reader page? The commit page is not part of the ring buffer. The tail page
must account for this.
reader page commit page
| |
v |
+---+ |
| |<----------+
| |
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
tail page
If the tail page were to simply push the head page forward, the commit when
leaving the reader page would not be pointing to the correct page.
The solution to this is to test if the commit page is on the reader page
before pushing the head page. If it is, then it can be assumed that the
tail page wrapped the buffer, and we must drop new writes.
This is not a race condition, because the commit page can only be moved
by the outter most writer (the writer that was preempted).
This means that the commit will not move while a writer is moving the
tail page. The reader can not swap the reader page if it is also being
used as the commit page. The reader can simply check that the commit
is off the reader page. Once the commit page leaves the reader page
it will never go back on it unless a reader does another swap with the
buffer page that is also the commit page.
Nested writes
-------------
In the pushing forward of the tail page we must first push forward
the head page if the head page is the next page. If the head page
is not the next page, the tail page is simply updated with a cmpxchg.
Only writers move the tail page. This must be done atomically to protect
against nested writers.
temp_page = tail_page
next_page = temp_page->next
cmpxchg(tail_page, temp_page, next_page)
The above will update the tail page if it is still pointing to the expected
page. If this fails, a nested write pushed it forward, the the current write
does not need to push it.
temp page
|
v
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Nested write comes in and moves the tail page forward:
tail page (moved by nested writer)
temp page |
| |
v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The above would fail the cmpxchg, but since the tail page has already
been moved forward, the writer will just try again to reserve storage
on the new tail page.
But the moving of the head page is a bit more complex.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But if a nested writer preempts here. It will see that the next
page is a head page, but it is also nested. It will detect that
it is nested and will save that information. The detection is the
fact that it sees the UPDATE flag instead of a HEADER or NORMAL
pointer.
The nested writer will set the new head page pointer.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But it will not reset the update back to normal. Only the writer
that converted a pointer from HEAD to UPDATE will convert it back
to NORMAL.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the nested writer finishes, the outer most writer will convert
the UPDATE pointer to NORMAL.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
It can be even more complex if several nested writes came in and moved
the tail page ahead several pages:
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Next writer comes in, and sees the update and sets up the new
head page.
(second writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The nested writer moves the tail page forward. But does not set the old
update page to NORMAL because it is not the outer most writer.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Another writer preempts and sees the page after the tail page is a head page.
It changes it from HEAD to UPDATE.
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The writer will move the head page forward:
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But now that the third writer did change the HEAD flag to UPDATE it
will convert it to normal:
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Then it will move the tail page, and return back to the second writer.
(second writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The second writer will fail to move the tail page because it was already
moved, so it will try again and add its data to the new tail page.
It will return to the first writer.
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The first writer can not know atomically test if the tail page moved
while it updates the HEAD page. It will then update the head page to
what it thinks is the new head page.
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Since the cmpxchg returns the old value of the pointer the first writer
will see it succeeded in updating the pointer from NORMAL to HEAD.
But as we can see, this is not good enough. It must also check to see
if the tail page is either where it use to be or on the next page:
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
If tail page != A and tail page does not equal B, then it must reset the
pointer back to NORMAL. The fact that it only needs to worry about
nested writers, it only needs to check this after setting the HEAD page.
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Now the writer can update the head page. This is also why the head page must
remain in UPDATE and only reset by the outer most writer. This prevents
the reader from seeing the incorrect head page.
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+

View file

@ -6,8 +6,8 @@
5 -> Leadtek Winfast 2000XP Expert [107d:6611,107d:6613]
6 -> AverTV Studio 303 (M126) [1461:000b]
7 -> MSI TV-@nywhere Master [1462:8606]
8 -> Leadtek Winfast DV2000 [107d:6620]
9 -> Leadtek PVR 2000 [107d:663b,107d:663c,107d:6632]
8 -> Leadtek Winfast DV2000 [107d:6620,107d:6621]
9 -> Leadtek PVR 2000 [107d:663b,107d:663c,107d:6632,107d:6630,107d:6638,107d:6631,107d:6637,107d:663d]
10 -> IODATA GV-VCP3/PCI [10fc:d003]
11 -> Prolink PlayTV PVR
12 -> ASUS PVR-416 [1043:4823,1461:c111]
@ -59,7 +59,7 @@
58 -> Pinnacle PCTV HD 800i [11bd:0051]
59 -> DViCO FusionHDTV 5 PCI nano [18ac:d530]
60 -> Pinnacle Hybrid PCTV [12ab:1788]
61 -> Winfast TV2000 XP Global [107d:6f18]
61 -> Leadtek TV2000 XP Global [107d:6f18,107d:6618]
62 -> PowerColor RA330 [14f1:ea3d]
63 -> Geniatech X8000-MT DVBT [14f1:8852]
64 -> DViCO FusionHDTV DVB-T PRO [18ac:db30]

View file

@ -1,5 +1,5 @@
0 -> Unknown EM2800 video grabber (em2800) [eb1a:2800]
1 -> Unknown EM2750/28xx video grabber (em2820/em2840) [eb1a:2820,eb1a:2821,eb1a:2860,eb1a:2861,eb1a:2870,eb1a:2881,eb1a:2883]
1 -> Unknown EM2750/28xx video grabber (em2820/em2840) [eb1a:2710,eb1a:2820,eb1a:2821,eb1a:2860,eb1a:2861,eb1a:2870,eb1a:2881,eb1a:2883]
2 -> Terratec Cinergy 250 USB (em2820/em2840) [0ccd:0036]
3 -> Pinnacle PCTV USB 2 (em2820/em2840) [2304:0208]
4 -> Hauppauge WinTV USB 2 (em2820/em2840) [2040:4200,2040:4201]
@ -20,7 +20,7 @@
19 -> EM2860/SAA711X Reference Design (em2860)
20 -> AMD ATI TV Wonder HD 600 (em2880) [0438:b002]
21 -> eMPIA Technology, Inc. GrabBeeX+ Video Encoder (em2800) [eb1a:2801]
22 -> Unknown EM2750/EM2751 webcam grabber (em2750) [eb1a:2750,eb1a:2751]
22 -> EM2710/EM2750/EM2751 webcam grabber (em2750) [eb1a:2750,eb1a:2751]
23 -> Huaqi DLCW-130 (em2750)
24 -> D-Link DUB-T210 TV Tuner (em2820/em2840) [2001:f112]
25 -> Gadmei UTV310 (em2820/em2840)
@ -65,3 +65,5 @@
67 -> Terratec Grabby (em2860) [0ccd:0096]
68 -> Terratec AV350 (em2860) [0ccd:0084]
69 -> KWorld ATSC 315U HDTV TV Box (em2882) [eb1a:a313]
70 -> Evga inDtube (em2882)
71 -> Silvercrest Webcam 1.3mpix (em2820/em2840)

View file

@ -153,8 +153,8 @@
152 -> Asus Tiger Rev:1.00 [1043:4857]
153 -> Kworld Plus TV Analog Lite PCI [17de:7128]
154 -> Avermedia AVerTV GO 007 FM Plus [1461:f31d]
155 -> Hauppauge WinTV-HVR1120 ATSC/QAM-Hybrid [0070:6706,0070:6708]
156 -> Hauppauge WinTV-HVR1110r3 DVB-T/Hybrid [0070:6707,0070:6709,0070:670a]
155 -> Hauppauge WinTV-HVR1150 ATSC/QAM-Hybrid [0070:6706,0070:6708]
156 -> Hauppauge WinTV-HVR1120 DVB-T/Hybrid [0070:6707,0070:6709,0070:670a]
157 -> Avermedia AVerTV Studio 507UA [1461:a11b]
158 -> AVerMedia Cardbus TV/Radio (E501R) [1461:b7e9]
159 -> Beholder BeholdTV 505 RDS [0000:505B]

View file

@ -44,7 +44,9 @@ zc3xx 0458:7007 Genius VideoCam V2
zc3xx 0458:700c Genius VideoCam V3
zc3xx 0458:700f Genius VideoCam Web V2
sonixj 0458:7025 Genius Eye 311Q
sn9c20x 0458:7029 Genius Look 320s
sonixj 0458:702e Genius Slim 310 NB
sn9c20x 045e:00f4 LifeCam VX-6000 (SN9C20x + OV9650)
sonixj 045e:00f5 MicroSoft VX3000
sonixj 045e:00f7 MicroSoft VX1000
ov519 045e:028c Micro$oft xbox cam
@ -282,6 +284,28 @@ sonixj 0c45:613a Microdia Sonix PC Camera
sonixj 0c45:613b Surfer SN-206
sonixj 0c45:613c Sonix Pccam168
sonixj 0c45:6143 Sonix Pccam168
sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001)
sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111)
sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655)
sn9c20x 0c45:624e PC Camera (SN9C201 + SOI968)
sn9c20x 0c45:624f PC Camera (SN9C201 + OV9650)
sn9c20x 0c45:6251 PC Camera (SN9C201 + OV9650)
sn9c20x 0c45:6253 PC Camera (SN9C201 + OV9650)
sn9c20x 0c45:6260 PC Camera (SN9C201 + OV7670)
sn9c20x 0c45:6270 PC Camera (SN9C201 + MT9V011/MT9V111/MT9V112)
sn9c20x 0c45:627b PC Camera (SN9C201 + OV7660)
sn9c20x 0c45:627c PC Camera (SN9C201 + HV7131R)
sn9c20x 0c45:627f PC Camera (SN9C201 + OV9650)
sn9c20x 0c45:6280 PC Camera (SN9C202 + MT9M001)
sn9c20x 0c45:6282 PC Camera (SN9C202 + MT9M111)
sn9c20x 0c45:6288 PC Camera (SN9C202 + OV9655)
sn9c20x 0c45:628e PC Camera (SN9C202 + SOI968)
sn9c20x 0c45:628f PC Camera (SN9C202 + OV9650)
sn9c20x 0c45:62a0 PC Camera (SN9C202 + OV7670)
sn9c20x 0c45:62b0 PC Camera (SN9C202 + MT9V011/MT9V111/MT9V112)
sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655)
sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660)
sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R)
sunplus 0d64:0303 Sunplus FashionCam DXG
etoms 102c:6151 Qcam Sangha CIF
etoms 102c:6251 Qcam xxxxxx VGA
@ -290,6 +314,7 @@ spca561 10fd:7e50 FlyCam Usb 100
zc3xx 10fd:8050 Typhoon Webshot II USB 300k
ov534 1415:2000 Sony HD Eye for PS3 (SLEH 00201)
pac207 145f:013a Trust WB-1300N
sn9c20x 145f:013d Trust WB-3600R
vc032x 15b8:6001 HP 2.0 Megapixel
vc032x 15b8:6002 HP 2.0 Megapixel rz406aa
spca501 1776:501c Arowana 300K CMOS Camera
@ -300,4 +325,11 @@ spca500 2899:012c Toptro Industrial
spca508 8086:0110 Intel Easy PC Camera
spca500 8086:0630 Intel Pocket PC Camera
spca506 99fa:8988 Grandtec V.cap
sn9c20x a168:0610 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
sn9c20x a168:0611 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
sn9c20x a168:0613 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
sn9c20x a168:0618 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
sn9c20x a168:0614 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
sn9c20x a168:0615 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
sn9c20x a168:0617 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
spca561 abcd:cdee Petcam

View file

@ -390,6 +390,30 @@ later date. It differs between i2c drivers and as such can be confusing.
To see which chip variants are supported you can look in the i2c driver code
for the i2c_device_id table. This lists all the possibilities.
There are two more helper functions:
v4l2_i2c_new_subdev_cfg: this function adds new irq and platform_data
arguments and has both 'addr' and 'probed_addrs' arguments: if addr is not
0 then that will be used (non-probing variant), otherwise the probed_addrs
are probed.
For example: this will probe for address 0x10:
struct v4l2_subdev *sd = v4l2_i2c_new_subdev_cfg(v4l2_dev, adapter,
"module_foo", "chipid", 0, NULL, 0, I2C_ADDRS(0x10));
v4l2_i2c_new_subdev_board uses an i2c_board_info struct which is passed
to the i2c driver and replaces the irq, platform_data and addr arguments.
If the subdev supports the s_config core ops, then that op is called with
the irq and platform_data arguments after the subdev was setup. The older
v4l2_i2c_new_(probed_)subdev functions will call s_config as well, but with
irq set to 0 and platform_data set to NULL.
Note that in the next kernel release the functions v4l2_i2c_new_subdev,
v4l2_i2c_new_probed_subdev and v4l2_i2c_new_probed_subdev_addr will all be
replaced by a single v4l2_i2c_new_subdev that is identical to
v4l2_i2c_new_subdev_cfg but without the irq and platform_data arguments.
struct video_device
-------------------

View file

@ -2,7 +2,7 @@
obj- := dummy.o
# List of programs to build
hostprogs-y := slabinfo slqbinfo page-types
hostprogs-y := slabinfo page-types
# Tell kbuild to always build the programs
always := $(hostprogs-y)

View file

@ -0,0 +1,95 @@
Last reviewed: 06/02/2009
HP iLO2 NMI Watchdog Driver
NMI sourcing for iLO2 based ProLiant Servers
Documentation and Driver by
Thomas Mingarelli <thomas.mingarelli@hp.com>
The HP iLO2 NMI Watchdog driver is a kernel module that provides basic
watchdog functionality and the added benefit of NMI sourcing. Both the
watchdog functionality and the NMI sourcing capability need to be enabled
by the user. Remember that the two modes are not dependant on one another.
A user can have the NMI sourcing without the watchdog timer and vice-versa.
Watchdog functionality is enabled like any other common watchdog driver. That
is, an application needs to be started that kicks off the watchdog timer. A
basic application exists in the Documentation/watchdog/src directory called
watchdog-test.c. Simply compile the C file and kick it off. If the system
gets into a bad state and hangs, the HP ProLiant iLO 2 timer register will
not be updated in a timely fashion and a hardware system reset (also known as
an Automatic Server Recovery (ASR)) event will occur.
The hpwdt driver also has four (4) module parameters. They are the following:
soft_margin - allows the user to set the watchdog timer value
allow_kdump - allows the user to save off a kernel dump image after an NMI
nowayout - basic watchdog parameter that does not allow the timer to
be restarted or an impending ASR to be escaped.
priority - determines whether or not the hpwdt driver is first on the
die_notify list to handle NMIs or last. The default value
for this module parameter is 0 or LAST. If the user wants to
enable NMI sourcing then reload the hpwdt driver with
priority=1 (and boot with nmi_watchdog=0).
NOTE: More information about watchdog drivers in general, including the ioctl
interface to /dev/watchdog can be found in
Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
The priority parameter was introduced due to other kernel software that relied
on handling NMIs (like oprofile). Keeping hpwdt's priority at 0 (or LAST)
enables the users of NMIs for non critical events to be work as expected.
The NMI sourcing capability is disabled by default due to the inability to
distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the
Linux kernel. What this means is that the hpwdt nmi handler code is called
each time the NMI signal fires off. This could amount to several thousands of
NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and
confused" message in the logs or if the system gets into a hung state, then
the hpwdt driver can be reloaded with the "priority" module parameter set
(priority=1).
1. If the kernel has not been booted with nmi_watchdog turned off then
edit /boot/grub/menu.lst and place the nmi_watchdog=0 at the end of the
currently booting kernel line.
2. reboot the sever
3. Once the system comes up perform a rmmod hpwdt
4. insmod /lib/modules/`uname -r`/kernel/drivers/char/watchdog/hpwdt.ko priority=1
Now, the hpwdt can successfully receive and source the NMI and provide a log
message that details the reason for the NMI (as determined by the HP BIOS).
Below is a list of NMIs the HP BIOS understands along with the associated
code (reason):
No source found 00h
Uncorrectable Memory Error 01h
ASR NMI 1Bh
PCI Parity Error 20h
NMI Button Press 27h
SB_BUS_NMI 28h
ILO Doorbell NMI 29h
ILO IOP NMI 2Ah
ILO Watchdog NMI 2Bh
Proc Throt NMI 2Ch
Front Side Bus NMI 2Dh
PCI Express Error 2Fh
DMA controller NMI 30h
Hypertransport/CSI Error 31h
-- Tom Mingarelli
(thomas.mingarelli@hp.com)

View file

@ -2,3 +2,5 @@
- this file
mtrr.txt
- how to use x86 Memory Type Range Registers to increase performance
exception-tables.txt
- why and how Linux kernel uses exception tables on x86

View file

@ -1,123 +1,123 @@
Kernel level exception handling in Linux 2.1.8
Kernel level exception handling in Linux
Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com>
When a process runs in kernel mode, it often has to access user
mode memory whose address has been passed by an untrusted program.
When a process runs in kernel mode, it often has to access user
mode memory whose address has been passed by an untrusted program.
To protect itself the kernel has to verify this address.
In older versions of Linux this was done with the
int verify_area(int type, const void * addr, unsigned long size)
In older versions of Linux this was done with the
int verify_area(int type, const void * addr, unsigned long size)
function (which has since been replaced by access_ok()).
This function verified that the memory area starting at address
This function verified that the memory area starting at address
'addr' and of size 'size' was accessible for the operation specified
in type (read or write). To do this, verify_read had to look up the
virtual memory area (vma) that contained the address addr. In the
normal case (correctly working program), this test was successful.
in type (read or write). To do this, verify_read had to look up the
virtual memory area (vma) that contained the address addr. In the
normal case (correctly working program), this test was successful.
It only failed for a few buggy programs. In some kernel profiling
tests, this normally unneeded verification used up a considerable
amount of time.
To overcome this situation, Linus decided to let the virtual memory
To overcome this situation, Linus decided to let the virtual memory
hardware present in every Linux-capable CPU handle this test.
How does this work?
Whenever the kernel tries to access an address that is currently not
accessible, the CPU generates a page fault exception and calls the
page fault handler
Whenever the kernel tries to access an address that is currently not
accessible, the CPU generates a page fault exception and calls the
page fault handler
void do_page_fault(struct pt_regs *regs, unsigned long error_code)
in arch/i386/mm/fault.c. The parameters on the stack are set up by
the low level assembly glue in arch/i386/kernel/entry.S. The parameter
regs is a pointer to the saved registers on the stack, error_code
in arch/x86/mm/fault.c. The parameters on the stack are set up by
the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter
regs is a pointer to the saved registers on the stack, error_code
contains a reason code for the exception.
do_page_fault first obtains the unaccessible address from the CPU
control register CR2. If the address is within the virtual address
space of the process, the fault probably occurred, because the page
was not swapped in, write protected or something similar. However,
we are interested in the other case: the address is not valid, there
is no vma that contains this address. In this case, the kernel jumps
to the bad_area label.
do_page_fault first obtains the unaccessible address from the CPU
control register CR2. If the address is within the virtual address
space of the process, the fault probably occurred, because the page
was not swapped in, write protected or something similar. However,
we are interested in the other case: the address is not valid, there
is no vma that contains this address. In this case, the kernel jumps
to the bad_area label.
There it uses the address of the instruction that caused the exception
(i.e. regs->eip) to find an address where the execution can continue
(fixup). If this search is successful, the fault handler modifies the
return address (again regs->eip) and returns. The execution will
There it uses the address of the instruction that caused the exception
(i.e. regs->eip) to find an address where the execution can continue
(fixup). If this search is successful, the fault handler modifies the
return address (again regs->eip) and returns. The execution will
continue at the address in fixup.
Where does fixup point to?
Since we jump to the contents of fixup, fixup obviously points
to executable code. This code is hidden inside the user access macros.
I have picked the get_user macro defined in include/asm/uaccess.h as an
example. The definition is somewhat hard to follow, so let's peek at
Since we jump to the contents of fixup, fixup obviously points
to executable code. This code is hidden inside the user access macros.
I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h
as an example. The definition is somewhat hard to follow, so let's peek at
the code generated by the preprocessor and the compiler. I selected
the get_user call in drivers/char/console.c for a detailed examination.
the get_user call in drivers/char/sysrq.c for a detailed examination.
The original code in console.c line 1405:
The original code in sysrq.c line 587:
get_user(c, buf);
The preprocessor output (edited to become somewhat readable):
(
{
long __gu_err = - 14 , __gu_val = 0;
const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
(((sizeof(*(buf))) <= 0xC0000000UL) &&
((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
{
long __gu_err = - 14 , __gu_val = 0;
const __typeof__(*( ( buf ) )) *__gu_addr = ((buf));
if (((((0 + current_set[0])->tss.segment) == 0x18 ) ||
(((sizeof(*(buf))) <= 0xC0000000UL) &&
((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
do {
__gu_err = 0;
switch ((sizeof(*(buf)))) {
case 1:
__asm__ __volatile__(
"1: mov" "b" " %2,%" "b" "1\n"
"2:\n"
".section .fixup,\"ax\"\n"
"3: movl %3,%0\n"
" xor" "b" " %" "b" "1,%" "b" "1\n"
" jmp 2b\n"
".section __ex_table,\"a\"\n"
" .align 4\n"
" .long 1b,3b\n"
".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ;
break;
case 2:
__gu_err = 0;
switch ((sizeof(*(buf)))) {
case 1:
__asm__ __volatile__(
"1: mov" "w" " %2,%" "w" "1\n"
"2:\n"
".section .fixup,\"ax\"\n"
"3: movl %3,%0\n"
" xor" "w" " %" "w" "1,%" "w" "1\n"
" jmp 2b\n"
".section __ex_table,\"a\"\n"
" .align 4\n"
" .long 1b,3b\n"
"1: mov" "b" " %2,%" "b" "1\n"
"2:\n"
".section .fixup,\"ax\"\n"
"3: movl %3,%0\n"
" xor" "b" " %" "b" "1,%" "b" "1\n"
" jmp 2b\n"
".section __ex_table,\"a\"\n"
" .align 4\n"
" .long 1b,3b\n"
".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ;
break;
case 2:
__asm__ __volatile__(
"1: mov" "w" " %2,%" "w" "1\n"
"2:\n"
".section .fixup,\"ax\"\n"
"3: movl %3,%0\n"
" xor" "w" " %" "w" "1,%" "w" "1\n"
" jmp 2b\n"
".section __ex_table,\"a\"\n"
" .align 4\n"
" .long 1b,3b\n"
".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err ));
break;
case 4:
__asm__ __volatile__(
"1: mov" "l" " %2,%" "" "1\n"
"2:\n"
".section .fixup,\"ax\"\n"
"3: movl %3,%0\n"
" xor" "l" " %" "" "1,%" "" "1\n"
" jmp 2b\n"
".section __ex_table,\"a\"\n"
" .align 4\n" " .long 1b,3b\n"
( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err ));
break;
case 4:
__asm__ __volatile__(
"1: mov" "l" " %2,%" "" "1\n"
"2:\n"
".section .fixup,\"ax\"\n"
"3: movl %3,%0\n"
" xor" "l" " %" "" "1,%" "" "1\n"
" jmp 2b\n"
".section __ex_table,\"a\"\n"
" .align 4\n" " .long 1b,3b\n"
".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err));
break;
default:
(__gu_val) = __get_user_bad();
}
} while (0) ;
((c)) = (__typeof__(*((buf))))__gu_val;
( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err));
break;
default:
(__gu_val) = __get_user_bad();
}
} while (0) ;
((c)) = (__typeof__(*((buf))))__gu_val;
__gu_err;
}
);
@ -127,12 +127,12 @@ see what code gcc generates:
> xorl %edx,%edx
> movl current_set,%eax
> cmpl $24,788(%eax)
> je .L1424
> cmpl $24,788(%eax)
> je .L1424
> cmpl $-1073741825,64(%esp)
> ja .L1423
> ja .L1423
> .L1424:
> movl %edx,%eax
> movl %edx,%eax
> movl 64(%esp),%ebx
> #APP
> 1: movb (%ebx),%dl /* this is the actual user access */
@ -149,17 +149,17 @@ see what code gcc generates:
> .L1423:
> movzbl %dl,%esi
The optimizer does a good job and gives us something we can actually
understand. Can we? The actual user access is quite obvious. Thanks
to the unified address space we can just access the address in user
The optimizer does a good job and gives us something we can actually
understand. Can we? The actual user access is quite obvious. Thanks
to the unified address space we can just access the address in user
memory. But what does the .section stuff do?????
To understand this we have to look at the final kernel:
> objdump --section-headers vmlinux
>
>
> vmlinux: file format elf32-i386
>
>
> Sections:
> Idx Name Size VMA LMA File off Algn
> 0 .text 00098f40 c0100000 c0100000 00001000 2**4
@ -198,18 +198,18 @@ final kernel executable:
The whole user memory access is reduced to 10 x86 machine instructions.
The instructions bracketed in the .section directives are no longer
in the normal execution path. They are located in a different section
in the normal execution path. They are located in a different section
of the executable file:
> objdump --disassemble --section=.fixup vmlinux
>
>
> c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax
> c0199ffa <.fixup+10ba> xorb %dl,%dl
> c0199ffc <.fixup+10bc> jmp c017e7a7 <do_con_write+e3>
And finally:
> objdump --full-contents --section=__ex_table vmlinux
>
>
> c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................
> c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................
> c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................
@ -235,8 +235,8 @@ sections in the ELF object file. So the instructions
ended up in the .fixup section of the object file and the addresses
.long 1b,3b
ended up in the __ex_table section of the object file. 1b and 3b
are local labels. The local label 1b (1b stands for next label 1
backward) is the address of the instruction that might fault, i.e.
are local labels. The local label 1b (1b stands for next label 1
backward) is the address of the instruction that might fault, i.e.
in our case the address of the label 1 is c017e7a5:
the original assembly code: > 1: movb (%ebx),%dl
and linked in vmlinux : > c017e7a5 <do_con_write+e1> movb (%ebx),%dl
@ -254,7 +254,7 @@ The assembly code
becomes the value pair
> c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................
^this is ^this is
1b 3b
1b 3b
c017e7a5,c0199ff5 in the exception table of the kernel.
So, what actually happens if a fault from kernel mode with no suitable
@ -266,9 +266,9 @@ vma occurs?
3.) CPU calls do_page_fault
4.) do page fault calls search_exception_table (regs->eip == c017e7a5);
5.) search_exception_table looks up the address c017e7a5 in the
exception table (i.e. the contents of the ELF section __ex_table)
exception table (i.e. the contents of the ELF section __ex_table)
and returns the address of the associated fault handle code c0199ff5.
6.) do_page_fault modifies its own return address to point to the fault
6.) do_page_fault modifies its own return address to point to the fault
handle code and returns.
7.) execution continues in the fault handling code.
8.) 8a) EAX becomes -EFAULT (== -14)

File diff suppressed because it is too large Load diff

View file

@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 30
EXTRAVERSION =
SUBLEVEL = 31
EXTRAVERSION = -rc9
NAME = Man-Eating Seals of Antiquity
# *DOCUMENTATION*
@ -140,15 +140,13 @@ _all: modules
endif
srctree := $(if $(KBUILD_SRC),$(KBUILD_SRC),$(CURDIR))
TOPDIR := $(srctree)
# FIXME - TOPDIR is obsolete, use srctree/objtree
objtree := $(CURDIR)
src := $(srctree)
obj := $(objtree)
VPATH := $(srctree)$(if $(KBUILD_EXTMOD),:$(KBUILD_EXTMOD))
export srctree objtree VPATH TOPDIR
export srctree objtree VPATH
# SUBARCH tells the usermode build what the underlying arch is. That is set
@ -330,6 +328,7 @@ AFLAGS_MODULE = $(MODFLAGS)
LDFLAGS_MODULE =
CFLAGS_KERNEL =
AFLAGS_KERNEL =
CFLAGS_GCOV = -fprofile-arcs -ftest-coverage
# Use LINUXINCLUDE when you must reference the include/ directory.
@ -343,7 +342,9 @@ KBUILD_CPPFLAGS := -D__KERNEL__
KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
-fno-strict-aliasing -fno-common \
-Werror-implicit-function-declaration
-Werror-implicit-function-declaration \
-Wno-format-security \
-fno-delete-null-pointer-checks
KBUILD_AFLAGS := -D__ASSEMBLY__
# Read KERNELRELEASE from include/config/kernel.release (if it exists)
@ -356,7 +357,7 @@ export CPP AR NM STRIP OBJCOPY OBJDUMP MAKE AWK GENKSYMS PERL UTS_MACHINE
export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS LDFLAGS
export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_GCOV
export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
# When compiling out-of-tree modules, put MODVERDIR in the module
@ -565,7 +566,7 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,)
KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,)
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fwrapv)
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
# revert to pre-gcc-4.4 behaviour of .eh_frame
KBUILD_CFLAGS += $(call cc-option,-fno-dwarf2-cfi-asm)
@ -1216,8 +1217,8 @@ clean: archclean $(clean-dirs)
\( -name '*.[oas]' -o -name '*.ko' -o -name '.*.cmd' \
-o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \
-o -name '*.symtypes' -o -name 'modules.order' \
-o -name 'Module.markers' -o -name '.tmp_*.o.*' \) \
-type f -print | xargs rm -f
-o -name 'Module.markers' -o -name '.tmp_*.o.*' \
-o -name '*.gcno' \) -type f -print | xargs rm -f
# mrproper - Delete all generated files, including .config
#
@ -1421,8 +1422,8 @@ clean: $(clean-dirs)
$(call cmd,rmfiles)
@find $(KBUILD_EXTMOD) $(RCS_FIND_IGNORE) \
\( -name '*.[oas]' -o -name '*.ko' -o -name '.*.cmd' \
-o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \) \
-type f -print | xargs rm -f
-o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \
-o -name '*.gcno' \) -type f -print | xargs rm -f
help:
@echo ' Building external modules.'

View file

@ -15,7 +15,10 @@ worry too much about getting the wrong person. If you are unsure send it
to the person responsible for the code relevant to what you were doing.
If it occurs repeatably try and describe how to recreate it. That is
worth even more than the oops itself. The list of maintainers and
mailing lists is in the MAINTAINERS file in this directory.
mailing lists is in the MAINTAINERS file in this directory. If you
know the file name that causes the problem you can use the following
command in this directory to find some of the maintainers of that file:
perl scripts/get_maintainer.pl -f <filename>
If it is a security bug, please copy the Security Contact listed
in the MAINTAINERS file. They can help coordinate bugfix and disclosure.

View file

@ -116,3 +116,5 @@ config HAVE_DEFAULT_NO_SPIN_MUTEXES
config HAVE_HW_BREAKPOINT
bool
source "kernel/gcov/Kconfig"

View file

@ -237,19 +237,6 @@ extern void pcibios_resource_to_bus(struct pci_dev *, struct pci_bus_region *,
extern void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
struct pci_bus_region *region);
static inline struct resource *
pcibios_select_root(struct pci_dev *pdev, struct resource *res)
{
struct resource *root = NULL;
if (res->flags & IORESOURCE_IO)
root = &ioport_resource;
if (res->flags & IORESOURCE_MEM)
root = &iomem_resource;
return root;
}
#define pci_domain_nr(bus) ((struct pci_controller *)(bus)->sysdata)->index
static inline int pci_proc_domain(struct pci_bus *bus)

View file

@ -30,7 +30,7 @@ extern unsigned long __per_cpu_offset[NR_CPUS];
#ifndef MODULE
#define SHIFT_PERCPU_PTR(var, offset) RELOC_HIDE(&per_cpu_var(var), (offset))
#define PER_CPU_ATTRIBUTES
#define PER_CPU_DEF_ATTRIBUTES
#else
/*
* To calculate addresses of locally defined variables, GCC uses 32-bit
@ -49,7 +49,7 @@ extern unsigned long __per_cpu_offset[NR_CPUS];
: "=&r"(__ptr), "=&r"(tmp_gp)); \
(typeof(&per_cpu_var(var)))(__ptr + (offset)); })
#define PER_CPU_ATTRIBUTES __used
#define PER_CPU_DEF_ATTRIBUTES __used
#endif /* MODULE */
@ -71,7 +71,7 @@ extern unsigned long __per_cpu_offset[NR_CPUS];
#define __get_cpu_var(var) per_cpu_var(var)
#define __raw_get_cpu_var(var) per_cpu_var(var)
#define PER_CPU_ATTRIBUTES
#define PER_CPU_DEF_ATTRIBUTES
#endif /* SMP */

View file

@ -37,6 +37,7 @@ struct thread_info {
.task = &tsk, \
.exec_domain = &default_exec_domain, \
.addr_limit = KERNEL_DS, \
.preempt_count = INIT_PREEMPT_COUNT, \
.restart_block = { \
.fn = do_no_restart_syscall, \
}, \

View file

@ -9,7 +9,7 @@
#include <asm-generic/tlb.h>
#define __pte_free_tlb(tlb, pte) pte_free((tlb)->mm, pte)
#define __pmd_free_tlb(tlb, pmd) pmd_free((tlb)->mm, pmd)
#define __pte_free_tlb(tlb, pte, address) pte_free((tlb)->mm, pte)
#define __pmd_free_tlb(tlb, pmd, address) pmd_free((tlb)->mm, pmd)
#endif

View file

@ -8,7 +8,6 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/smp_lock.h>
#include <linux/errno.h>
#include <linux/ptrace.h>
#include <linux/user.h>

View file

@ -146,7 +146,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
/* If for any reason at all we couldn't handle the fault,
make sure we exit gracefully rather than endlessly redo
the fault. */
fault = handle_mm_fault(mm, vma, address, cause > 0);
fault = handle_mm_fault(mm, vma, address, cause > 0 ? FAULT_FLAG_WRITE : 0);
up_read(&mm->mmap_sem);
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)

View file

@ -1241,7 +1241,7 @@ endmenu
menu "CPU Power Management"
if (ARCH_SA1100 || ARCH_INTEGRATOR || ARCH_OMAP || ARCH_PXA)
if (ARCH_SA1100 || ARCH_INTEGRATOR || ARCH_OMAP || ARCH_PXA || ARCH_S3C64XX)
source "drivers/cpufreq/Kconfig"
@ -1272,6 +1272,10 @@ config CPU_FREQ_PXA
default y
select CPU_FREQ_DEFAULT_GOV_USERSPACE
config CPU_FREQ_S3C64XX
bool "CPUfreq support for Samsung S3C64XX CPUs"
depends on CPU_FREQ && CPU_S3C6410
endif
source "drivers/cpuidle/Kconfig"

View file

@ -99,14 +99,6 @@ config DEBUG_CLPS711X_UART2
output to the second serial port on these devices. Saying N will
cause the debug messages to appear on the first serial port.
config DEBUG_S3C_PORT
depends on DEBUG_LL && PLAT_S3C
bool "Kernel low-level debugging messages via S3C UART"
help
Say Y here if you want debug print routines to go to one of the
S3C internal UARTs. The chosen UART must have been configured
before it is used.
config DEBUG_S3C_UART
depends on PLAT_S3C
int "S3C UART to use for low-level debug"

View file

@ -674,6 +674,15 @@ proc_types:
b __armv4_mmu_cache_off
b __armv5tej_mmu_cache_flush
#ifdef CONFIG_CPU_FEROCEON_OLD_ID
/* this conflicts with the standard ARMv5TE entry */
.long 0x41009260 @ Old Feroceon
.long 0xff00fff0
b __armv4_mmu_cache_on
b __armv4_mmu_cache_off
b __armv5tej_mmu_cache_flush
#endif
.word 0x66015261 @ FA526
.word 0xff01fff1
b __fa526_cache_on

View file

@ -29,7 +29,6 @@ unsigned int __machine_arch_type;
static void putstr(const char *ptr);
#include <linux/compiler.h>
#include <mach/uncompress.h>
#ifdef CONFIG_DEBUG_ICEDCC

Some files were not shown because too many files have changed in this diff Show more