Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
This commit is contained in:
Linus Torvalds 2005-04-16 15:20:36 -07:00
commit 1da177e4c3
17291 changed files with 6718755 additions and 0 deletions

356
COPYING Normal file
View file

@ -0,0 +1,356 @@
NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".
Also note that the GPL below is copyrighted by the Free Software
Foundation, but the instance of code that it refers to (the Linux
kernel) is copyrighted by me and others who actually wrote it.
Also note that the only valid version of the GPL as far as the kernel
is concerned is _this_ particular version of the license (ie v2, not
v2.2 or v3.x or whatever), unless explicitly otherwise stated.
Linus Torvalds
----------------------------------------
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.

3743
CREDITS Normal file

File diff suppressed because it is too large Load diff

294
Documentation/00-INDEX Normal file
View file

@ -0,0 +1,294 @@
This is a brief list of all the files in ./linux/Documentation and what
they contain. If you add a documentation file, please list it here in
alphabetical order as well, or risk being hunted down like a rabid dog.
Please try and keep the descriptions small enough to fit on one line.
Thanks -- Paul G.
Following translations are available on the WWW:
- Japanese, maintained by the JF Project (JF@linux.or.jp), at
http://www.linux.or.jp/JF/
00-INDEX
- this file.
BK-usage/
- directory with info on BitKeeper.
BUG-HUNTING
- brute force method of doing binary search of patches to find bug.
Changes
- list of changes that break older software packages.
CodingStyle
- how the boss likes the C code in the kernel to look.
DMA-API.txt
- DMA API, pci_ API & extensions for non-consistent memory machines.
DMA-mapping.txt
- info for PCI drivers using DMA portably across all platforms.
DocBook/
- directory with DocBook templates etc. for kernel documentation.
IO-mapping.txt
- how to access I/O mapped memory from within device drivers.
IPMI.txt
- info on Linux Intelligent Platform Management Interface (IPMI) Driver.
IRQ-affinity.txt
- how to select which CPU(s) handle which interrupt events on SMP.
ManagementStyle
- how to (attempt to) manage kernel hackers.
MSI-HOWTO.txt
- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
RCU/
- directory with info on RCU (read-copy update).
README.DAC960
- info on Mylex DAC960/DAC1100 PCI RAID Controller Driver for Linux.
SAK.txt
- info on Secure Attention Keys.
SubmittingDrivers
- procedure to get a new driver source included into the kernel tree.
SubmittingPatches
- procedure to get a source patch included into the kernel tree.
VGA-softcursor.txt
- how to change your VGA cursor from a blinking underscore.
arm/
- directory with info about Linux on the ARM architecture.
basic_profiling.txt
- basic instructions for those who wants to profile Linux kernel.
binfmt_misc.txt
- info on the kernel support for extra binary formats.
block/
- info on the Block I/O (BIO) layer.
cachetlb.txt
- describes the cache/TLB flushing interfaces Linux uses.
cciss.txt
- info, major/minor #'s for Compaq's SMART Array Controllers.
cdrom/
- directory with information on the CD-ROM drivers that Linux has.
cli-sti-removal.txt
- cli()/sti() removal guide.
computone.txt
- info on Computone Intelliport II/Plus Multiport Serial Driver.
cpqarray.txt
- info on using Compaq's SMART2 Intelligent Disk Array Controllers.
cpu-freq/
- info on CPU frequency and voltage scaling.
cris/
- directory with info about Linux on CRIS architecture.
crypto/
- directory with info on the Crypto API.
debugging-modules.txt
- some notes on debugging modules after Linux 2.6.3.
device-mapper/
- directory with info on Device Mapper.
devices.txt
- plain ASCII listing of all the nodes in /dev/ with major minor #'s.
digiepca.txt
- info on Digi Intl. {PC,PCI,EISA}Xx and Xem series cards.
dnotify.txt
- info about directory notification in Linux.
driver-model/
- directory with info about Linux driver model.
dvb/
- info on Linux Digital Video Broadcast (DVB) subsystem.
early-userspace/
- info about initramfs, klibc, and userspace early during boot.
eisa.txt
- info on EISA bus support.
exception.txt
- how Linux v2.2 handles exceptions without verify_area etc.
fb/
- directory with info on the frame buffer graphics abstraction layer.
filesystems/
- directory with info on the various filesystems that Linux supports.
firmware_class/
- request_firmware() hotplug interface info.
floppy.txt
- notes and driver options for the floppy disk driver.
ftape.txt
- notes about the floppy tape device driver.
hayes-esp.txt
- info on using the Hayes ESP serial driver.
highuid.txt
- notes on the change from 16 bit to 32 bit user/group IDs.
hpet.txt
- High Precision Event Timer Driver for Linux.
hw_random.txt
- info on Linux support for random number generator in i8xx chipsets.
i2c/
- directory with info about the I2C bus/protocol (2 wire, kHz speed).
i2o/
- directory with info about the Linux I2O subsystem.
i386/
- directory with info about Linux on Intel 32 bit architecture.
ia64/
- directory with info about Linux on Intel 64 bit architecture.
ide.txt
- important info for users of ATA devices (IDE/EIDE disks and CD-ROMS).
initrd.txt
- how to use the RAM disk as an initial/temporary root filesystem.
input/
- info on Linux input device support.
io_ordering.txt
- info on ordering I/O writes to memory-mapped addresses.
ioctl-number.txt
- how to implement and register device/driver ioctl calls.
iostats.txt
- info on I/O statistics Linux kernel provides.
isapnp.txt
- info on Linux ISA Plug & Play support.
isdn/
- directory with info on the Linux ISDN support, and supported cards.
java.txt
- info on the in-kernel binary support for Java(tm).
kbuild/
- directory with info about the kernel build process.
kernel-doc-nano-HOWTO.txt
- mini HowTo on generation and location of kernel documentation files.
kernel-docs.txt
- listing of various WWW + books that document kernel internals.
kernel-parameters.txt
- summary listing of command line / boot prompt args for the kernel.
kobject.txt
- info of the kobject infrastructure of the Linux kernel.
laptop-mode.txt
- How to conserve battery power using laptop-mode.
ldm.txt
- a brief description of LDM (Windows Dynamic Disks).
locks.txt
- info on file locking implementations, flock() vs. fcntl(), etc.
logo.gif
- Full colour GIF image of Linux logo (penguin).
logo.txt
- Info on creator of above logo & site to get additional images from.
m68k/
- directory with info about Linux on Motorola 68k architecture.
magic-number.txt
- list of magic numbers used to mark/protect kernel data structures.
mandatory.txt
- info on the Linux implementation of Sys V mandatory file locking.
mca.txt
- info on supporting Micro Channel Architecture (e.g. PS/2) systems.
md.txt
- info on boot arguments for the multiple devices driver.
memory.txt
- info on typical Linux memory problems.
mips/
- directory with info about Linux on MIPS architecture.
mono.txt
- how to execute Mono-based .NET binaries with the help of BINFMT_MISC.
moxa-smartio
- info on installing/using Moxa multiport serial driver.
mtrr.txt
- how to use PPro Memory Type Range Registers to increase performance.
nbd.txt
- info on a TCP implementation of a network block device.
networking/
- directory with info on various aspects of networking with Linux.
nfsroot.txt
- short guide on setting up a diskless box with NFS root filesystem.
nmi_watchdog.txt
- info on NMI watchdog for SMP systems.
numastat.txt
- info on how to read Numa policy hit/miss statistics in sysfs.
oops-tracing.txt
- how to decode those nasty internal kernel error dump messages.
paride.txt
- information about the parallel port IDE subsystem.
parisc/
- directory with info on using Linux on PA-RISC architecture.
parport.txt
- how to use the parallel-port driver.
parport-lowlevel.txt
- description and usage of the low level parallel port functions.
pci.txt
- info on the PCI subsystem for device driver authors.
pm.txt
- info on Linux power management support.
pnp.txt
- Linux Plug and Play documentation.
power/
- directory with info on Linux PCI power management.
powerpc/
- directory with info on using Linux with the PowerPC.
preempt-locking.txt
- info on locking under a preemptive kernel.
ramdisk.txt
- short guide on how to set up and use the RAM disk.
riscom8.txt
- notes on using the RISCom/8 multi-port serial driver.
rocket.txt
- info on the Comtrol RocketPort multiport serial driver.
rpc-cache.txt
- introduction to the caching mechanisms in the sunrpc layer.
rtc.txt
- notes on how to use the Real Time Clock (aka CMOS clock) driver.
s390/
- directory with info on using Linux on the IBM S390.
sched-coding.txt
- reference for various scheduler-related methods in the O(1) scheduler.
sched-design.txt
- goals, design and implementation of the Linux O(1) scheduler.
sched-domains.txt
- information on scheduling domains.
sched-stats.txt
- information on schedstats (Linux Scheduler Statistics).
scsi/
- directory with info on Linux scsi support.
serial/
- directory with info on the low level serial API.
serial-console.txt
- how to set up Linux with a serial line console as the default.
sgi-visws.txt
- short blurb on the SGI Visual Workstations.
sh/
- directory with info on porting Linux to a new architecture.
smart-config.txt
- description of the Smart Config makefile feature.
smp.txt
- a few notes on symmetric multi-processing.
sonypi.txt
- info on Linux Sony Programmable I/O Device support.
sound/
- directory with info on sound card support.
sparc/
- directory with info on using Linux on Sparc architecture.
specialix.txt
- info on hardware/driver for specialix IO8+ multiport serial card.
spinlocks.txt
- info on using spinlocks to provide exclusive access in kernel.
stallion.txt
- info on using the Stallion multiport serial driver.
svga.txt
- short guide on selecting video modes at boot via VGA BIOS.
sx.txt
- info on the Specialix SX/SI multiport serial driver.
sysctl/
- directory with info on the /proc/sys/* files.
sysrq.txt
- info on the magic SysRq key.
telephony/
- directory with info on telephony (e.g. voice over IP) support.
time_interpolators.txt
- info on time interpolators.
tipar.txt
- information about Parallel link cable for Texas Instruments handhelds.
tty.txt
- guide to the locking policies of the tty layer.
unicode.txt
- info on the Unicode character/font mapping used in Linux.
uml/
- directory with infomation about User Mode Linux.
usb/
- directory with info regarding the Universal Serial Bus.
video4linux/
- directory with info regarding video/TV/radio cards and linux.
vm/
- directory with info on the Linux vm code.
voyager.txt
- guide to running Linux on the Voyager architecture.
watchdog/
- how to auto-reboot Linux if it has "fallen and can't get up". ;-)
x86_64/
- directory with info on Linux support for AMD x86-64 (Hammer) machines.
xterm-linux.xpm
- XPM image of penguin logo (see logo.txt) sitting on an xterm.
zorro.txt
- info on writing drivers for Zorro bus devices found on Amigas.

View file

@ -0,0 +1,51 @@
bk-kernel-howto.txt: Description of kernel workflow under BitKeeper
bk-make-sum: Create summary of changesets in one repository and not
another, typically in preparation to be sent to an upstream maintainer.
Typical usage:
cd my-updated-repo
bk-make-sum ~/repo/original-repo
mv /tmp/linus.txt ../original-repo.txt
bksend: Create readable text output containing summary of changes, GNU
patch of the changes, and BK metadata of changes (as needed for proper
importing into BitKeeper by an upstream maintainer). This output is
suitable for emailing BitKeeper changes. The recipient of this output
may pipe it directly to 'bk receive'.
bz64wrap: helper script. Uncompressed input is piped to this script,
which compresses its input, and then outputs the uu-/base64-encoded
version of the compressed input.
cpcset: Copy changeset between unrelated repositories.
Attempts to preserve changeset user, user address, description, in
addition to the changeset (the patch) itself.
Typical usage:
cd my-updated-repo
bk changes # looking for a changeset...
cpcset 1.1511 . ../another-repo
csets-to-patches: Produces a delta of two BK repositories, in the form
of individual files, each containing a single cset as a GNU patch.
Output is several files, each with the filename "/tmp/rev-$REV.patch"
Typical usage:
cd my-updated-repo
bk changes -L ~/repo/original-repo 2>&1 | \
perl csets-to-patches
cset-to-linus: Produces a delta of two BK repositories, in the form of
changeset descriptions, with 'diffstat' output created for each
individual changset.
Typical usage:
cd my-updated-repo
bk changes -L ~/repo/original-repo 2>&1 | \
perl cset-to-linus > summary.txt
gcapatch: Generates patch containing changes in local repository.
Typical usage:
cd my-updated-repo
gcapatch > foo.patch
unbz64wrap: Reverse an encoded, compressed data stream created by
bz64wrap into an uncompressed, typically text/plain output.

View file

@ -0,0 +1,283 @@
Doing the BK Thing, Penguin-Style
This set of notes is intended mainly for kernel developers, occasional
or full-time, but sysadmins and power users may find parts of it useful
as well. It assumes at least a basic familiarity with CVS, both at a
user level (use on the cmd line) and at a higher level (client-server model).
Due to the author's background, an operation may be described in terms
of CVS, or in terms of how that operation differs from CVS.
This is -not- intended to be BitKeeper documentation. Always run
"bk help <command>" or in X "bk helptool <command>" for reference
documentation.
BitKeeper Concepts
------------------
In the true nature of the Internet itself, BitKeeper is a distributed
system. When applied to revision control, this means doing away with
client-server, and changing to a parent-child model... essentially
peer-to-peer. On the developer's end, this also represents a
fundamental disruption in the standard workflow of changes, commits,
and merges. You will need to take a few minutes to think about
how to best work under BitKeeper, and re-optimize things a bit.
In some sense it is a bit radical, because it might described as
tossing changes out into a maelstrom and having them magically
land at the right destination... but I'm getting ahead of myself.
Let's start with this progression:
Each BitKeeper source tree on disk is a repository unto itself.
Each repository has a parent (except the root/original, of course).
Each repository contains a set of a changesets ("csets").
Each cset is one or more changed files, bundled together.
Each tree is a repository, so all changes are checked into the local
tree. When a change is checked in, all modified files are grouped
into a logical unit, the changeset. Internally, BK links these
changesets in a tree, representing various converging and diverging
lines of development. These changesets are the bread and butter of
the BK system.
After the concept of changesets, the next thing you need to get used
to is having multiple copies of source trees lying around. This -really-
takes some getting used to, for some people. Separate source trees
are the means in BitKeeper by which you delineate parallel lines
of development, both minor and major. What would be branches in
CVS become separate source trees, or "clones" in BitKeeper [heh,
or Star Wars] terminology.
Clones and changesets are the tools from which most of the power of
BitKeeper is derived. As mentioned earlier, each clone has a parent,
the tree used as the source when the new clone was created. In a
CVS-like setup, the parent would be a remote server on the Internet,
and the child is your local clone of that tree.
Once you have established a common baseline between two source trees --
a common parent -- then you can merge changesets between those two
trees with ease. Merging changes into a tree is called a "pull", and
is analagous to 'cvs update'. A pull downloads all the changesets in
the remote tree you do not have, and merges them. Sending changes in
one tree to another tree is called a "push". Push sends all changes
in the local tree the remote does not yet have, and merges them.
From these concepts come some initial command examples:
1) bk clone -q http://linux.bkbits.net/linux-2.5 linus-2.5
Download a 2.5 stock kernel tree, naming it "linus-2.5" in the local dir.
The "-q" disables listing every single file as it is downloaded.
2) bk clone -ql linus-2.5 alpha-2.5
Create a separate source tree for the Alpha AXP architecture.
The "-l" uses hard links instead of copying data, since both trees are
on the local disk. You can also replace the above with "bk lclone -q ..."
You only clone a tree -once-. After cloning the tree lives a long time
on disk, being updating by pushes and pulls.
3) cd alpha-2.5 ; bk pull http://gkernel.bkbits.net/alpha-2.5
Download changes in "alpha-2.5" repository which are not present
in the local repository, and merge them into the source tree.
4) bk -r co -q
Because every tree is a repository, files must be checked out before
they will be in their standard places in the source tree.
5) bk vi fs/inode.c # example change...
bk citool # checkin, using X tool
bk push bk://gkernel@bkbits.net/alpha-2.5 # upload change
Typical example of a BK sequence that would replace the analagous CVS
situation,
vi fs/inode.c
cvs commit
As this is just supposed to be a quick BK intro, for more in-depth
tutorials, live working demos, and docs, see http://www.bitkeeper.com/
BK and Kernel Development Workflow
----------------------------------
Currently the latest 2.5 tree is available via "bk clone $URL"
and "bk pull $URL" at http://linux.bkbits.net/linux-2.5
This should change in a few weeks to a kernel.org URL.
A big part of using BitKeeper is organizing the various trees you have
on your local disk, and organizing the flow of changes among those
trees, and remote trees. If one were to graph the relationships between
a desired BK setup, you are likely to see a few-many-few graph, like
this:
linux-2.5
|
merge-to-linus-2.5
/ | |
/ | |
vm-hacks bugfixes filesys personal-hacks
\ | | /
\ | | /
\ | | /
testing-and-validation
Since a "bk push" sends all changes not in the target tree, and
since a "bk pull" receives all changes not in the source tree, you want
to make sure you are only pushing specific changes to the desired tree,
not all changes from "peer parent" trees. For example, pushing a change
from the testing-and-validation tree would probably be a bad idea,
because it will push all changes from vm-hacks, bugfixes, filesys, and
personal-hacks trees into the target tree.
One would typically work on only one "theme" at a time, either
vm-hacks or bugfixes or filesys, keeping those changes isolated in
their own tree during development, and only merge the isolated with
other changes when going upstream (to Linus or other maintainers) or
downstream (to your "union" trees, like testing-and-validation above).
It should be noted that some of this separation is not just recommended
practice, it's actually [for now] -enforced- by BitKeeper. BitKeeper
requires that changesets maintain a certain order, which is the reason
that "bk push" sends all local changesets the remote doesn't have. This
separation may look like a lot of wasted disk space at first, but it
helps when two unrelated changes may "pollute" the same area of code, or
don't follow the same pace of development, or any other of the standard
reasons why one creates a development branch.
Small development branches (clones) will appear and disappear:
-------- A --------- B --------- C --------- D -------
\ /
-----short-term devel branch-----
While long-term branches will parallel a tree (or trees), with period
merge points. In this first example, we pull from a tree (pulls,
"\") periodically, such as what occurs when tracking changes in a
vendor tree, never pushing changes back up the line:
-------- A --------- B --------- C --------- D -------
\ \ \
----long-term devel branch-----------------
And then a more common case in Linux kernel development, a long term
branch with periodic merges back into the tree (pushes, "/"):
-------- A --------- B --------- C --------- D -------
\ \ / \
----long-term devel branch-----------------
Submitting Changes to Linus
---------------------------
There's a bit of an art, or style, of submitting changes to Linus.
Since Linus's tree is now (you might say) fully integrated into the
distributed BitKeeper system, there are several prerequisites to
properly submitting a BitKeeper change. All these prereq's are just
general cleanliness of BK usage, so as people become experts at BK, feel
free to optimize this process further (assuming Linus agrees, of
course).
0) Make sure your tree was originally cloned from the linux-2.5 tree
created by Linus. If your tree does not have this as its ancestor, it
is impossible to reliably exchange changesets.
1) Pay attention to your commit text. The commit message that
accompanies each changeset you submit will live on forever in history,
and is used by Linus to accurately summarize the changes in each
pre-patch. Remember that there is no context, so
"fix for new scheduler changes"
would be too vague, but
"fix mips64 arch for new scheduler switch_to(), TIF_xxx semantics"
would be much better.
You can and should use the command "bk comment -C<rev>" to update the
commit text, and improve it after the fact. This is very useful for
development: poor, quick descriptions during development, which get
cleaned up using "bk comment" before issuing the "bk push" to submit the
changes.
2) Include an Internet-available URL for Linus to pull from, such as
Pull from: http://gkernel.bkbits.net/net-drivers-2.5
3) Include a summary and "diffstat -p1" of each changeset that will be
downloaded, when Linus issues a "bk pull". The author auto-generates
these summaries using "bk changes -L <parent>", to obtain a listing
of all the pending-to-send changesets, and their commit messages.
It is important to show Linus what he will be downloading when he issues
a "bk pull", to reduce the time required to sift the changes once they
are downloaded to Linus's local machine.
IMPORTANT NOTE: One of the features of BK is that your repository does
not have to be up to date, in order for Linus to receive your changes.
It is considered a courtesy to keep your repository fairly recent, to
lessen any potential merge work Linus may need to do.
4) Split up your changes. Each maintainer<->Linus situation is likely
to be slightly different here, so take this just as general advice. The
author splits up changes according to "themes" when merging with Linus.
Simultaneous pushes from local development go to special trees which
exist solely to house changes "queued" for Linus. Example of the trees:
net-drivers-2.5 -- on-going net driver maintenance
vm-2.5 -- VM-related changes
fs-2.5 -- filesystem-related changes
Linus then has much more freedom for pulling changes. He could (for
example) issue a "bk pull" on vm-2.5 and fs-2.5 trees, to merge their
changes, but hold off net-drivers-2.5 because of a change that needs
more discussion.
Other maintainers may find that a single linus-pull-from tree is
adequate for passing BK changesets to him.
Frequently Answered Questions
-----------------------------
1) How do I change the e-mail address shown in the changelog?
A. When you run "bk citool" or "bk commit", set environment
variables BK_USER and BK_HOST to the desired username
and host/domain name.
2) How do I use tags / get a diff between two kernel versions?
A. Pass the tags Linus uses to 'bk export'.
ChangeSets are in a forward-progressing order, so it's pretty easy
to get a snapshot starting and ending at any two points in time.
Linus puts tags on each release and pre-release, so you could use
these two examples:
bk export -tpatch -hdu -rv2.5.4,v2.5.5 | less
# creates patch-2.5.5 essentially
bk export -tpatch -du -rv2.5.5-pre1,v2.5.5 | less
# changes from pre1 to final
A tag is just an alias for a specific changeset... and since changesets
are ordered, a tag is thus a marker for a specific point in time (or
specific state of the tree).
3) Is there an easy way to generate One Big Patch versus mainline,
for my long-lived kernel branch?
A. Yes. This requires BK 3.x, though.
bk export -tpatch -r`bk repogca bk://linux.bkbits.net/linux-2.5`,+

View file

@ -0,0 +1,34 @@
#!/bin/sh -e
# DIR=$HOME/BK/axp-2.5
# cd $DIR
LINUS_REPO=$1
DIRBASE=`basename $PWD`
{
cat <<EOT
Please do a
bk pull bk://gkernel.bkbits.net/$DIRBASE
This will update the following files:
EOT
bk export -tpatch -hdu -r`bk repogca $LINUS_REPO`,+ | diffstat -p1 2>/dev/null
cat <<EOT
through these ChangeSets:
EOT
bk changes -L -d'$unless(:MERGE:){ChangeSet|:CSETREV:\n}' $LINUS_REPO |
bk -R prs -h -d'$unless(:MERGE:){<:P:@:HOST:> (:D: :I:)\n$each(:C:){ (:C:)\n}\n}' -
} > /tmp/linus.txt
cat <<EOT
Mail text in /tmp/linus.txt; please check and send using your favourite
mailer.
EOT

36
Documentation/BK-usage/bksend Executable file
View file

@ -0,0 +1,36 @@
#!/bin/sh
# A script to format BK changeset output in a manner that is easy to read.
# Andreas Dilger <adilger@turbolabs.com> 13/02/2002
#
# Add diffstat output after Changelog <adilger@turbolabs.com> 21/02/2002
PROG=bksend
usage() {
echo "usage: $PROG -r<rev>"
echo -e "\twhere <rev> is of the form '1.23', '1.23..', '1.23..1.27',"
echo -e "\tor '+' to indicate the most recent revision"
exit 1
}
case $1 in
-r) REV=$2; shift ;;
-r*) REV=`echo $1 | sed 's/^-r//'` ;;
*) echo "$PROG: no revision given, you probably don't want that";;
esac
[ -z "$REV" ] && usage
echo "You can import this changeset into BK by piping this whole message to:"
echo "'| bk receive [path to repository]' or apply the patch as usual."
SEP="\n===================================================================\n\n"
echo -e $SEP
env PAGER=/bin/cat bk changes -r$REV
echo
bk export -tpatch -du -h -r$REV | diffstat
echo; echo
bk export -tpatch -du -h -r$REV
echo -e $SEP
bk send -wgzip_uu -r$REV -

41
Documentation/BK-usage/bz64wrap Executable file
View file

@ -0,0 +1,41 @@
#!/bin/sh
# bz64wrap - the sending side of a bzip2 | base64 stream
# Andreas Dilger <adilger@clusterfs.com> Jan 2002
PATH=$PATH:/usr/bin:/usr/local/bin:/usr/freeware/bin
# A program to generate base64 encoding on stdout
BASE64_ENCODE="uuencode -m /dev/stdout"
BASE64_BEGIN=
BASE64_END=
BZIP=NO
BASE64=NO
# Test if we have the bzip program installed
bzip2 -c /dev/null > /dev/null 2>&1 && BZIP=YES
# Test if uuencode can handle the -m (MIME) encoding option
$BASE64_ENCODE < /dev/null > /dev/null 2>&1 && BASE64=YES
if [ $BASE64 = NO ]; then
BASE64_ENCODE=mimencode
BASE64_BEGIN="begin-base64 644 -"
BASE64_END="===="
$BASE64_ENCODE < /dev/null > /dev/null 2>&1 && BASE64=YES
fi
if [ $BZIP = NO -o $BASE64 = NO ]; then
echo "$0: can't use bz64 encoding: bzip2=$BZIP, $BASE64_ENCODE=$BASE64"
exit 1
fi
# Sadly, mimencode does not appear to have good "begin" and "end" markers
# like uuencode does, and it is picky about getting the right start/end of
# the base64 stream, so we handle this internally.
echo "$BASE64_BEGIN"
bzip2 -9 | $BASE64_ENCODE
echo "$BASE64_END"

36
Documentation/BK-usage/cpcset Executable file
View file

@ -0,0 +1,36 @@
#!/bin/sh
#
# Purpose: Copy changeset patch and description from one
# repository to another, unrelated one.
#
# usage: cpcset [revision] [from-repository] [to-repository]
#
REV=$1
FROM=$2
TO=$3
TMPF=/tmp/cpcset.$$
rm -f $TMPF*
CWD_SAVE=`pwd`
cd $FROM
bk changes -r$REV | \
grep -v '^ChangeSet' | \
sed -e 's/^ //g' > $TMPF.log
USERHOST=`bk changes -r$REV | grep '^ChangeSet' | awk '{print $4}'`
export BK_USER=`echo $USERHOST | awk '-F@' '{print $1}'`
export BK_HOST=`echo $USERHOST | awk '-F@' '{print $2}'`
bk export -tpatch -hdu -r$REV > $TMPF.patch && \
cd $CWD_SAVE && \
cd $TO && \
bk import -tpatch -CFR -y"`cat $TMPF.log`" $TMPF.patch . && \
bk commit -y"`cat $TMPF.log`"
rm -f $TMPF*
echo changeset $REV copied.
echo ""

View file

@ -0,0 +1,49 @@
#!/usr/bin/perl -w
use strict;
my ($lhs, $rev, $tmp, $rhs, $s);
my @cset_text = ();
my @pipe_text = ();
my $have_cset = 0;
while (<>) {
next if /^---/;
if (($lhs, $tmp, $rhs) = (/^(ChangeSet\@)([^,]+)(, .*)$/)) {
&cset_rev if ($have_cset);
$rev = $tmp;
$have_cset = 1;
push(@cset_text, $_);
}
elsif ($have_cset) {
push(@cset_text, $_);
}
}
&cset_rev if ($have_cset);
exit(0);
sub cset_rev {
my $empty_cset = 0;
open PIPE, "bk export -tpatch -hdu -r $rev | diffstat -p1 2>/dev/null |" or die;
while ($s = <PIPE>) {
$empty_cset = 1 if ($s =~ /0 files changed/);
push(@pipe_text, $s);
}
close(PIPE);
if (! $empty_cset) {
print @cset_text;
print @pipe_text;
print "\n\n";
}
@pipe_text = ();
@cset_text = ();
}

View file

@ -0,0 +1,44 @@
#!/usr/bin/perl -w
use strict;
my ($lhs, $rev, $tmp, $rhs, $s);
my @cset_text = ();
my @pipe_text = ();
my $have_cset = 0;
while (<>) {
next if /^---/;
if (($lhs, $tmp, $rhs) = (/^(ChangeSet\@)([^,]+)(, .*)$/)) {
&cset_rev if ($have_cset);
$rev = $tmp;
$have_cset = 1;
push(@cset_text, $_);
}
elsif ($have_cset) {
push(@cset_text, $_);
}
}
&cset_rev if ($have_cset);
exit(0);
sub cset_rev {
my $empty_cset = 0;
system("bk export -tpatch -du -r $rev > /tmp/rev-$rev.patch");
if (! $empty_cset) {
print @cset_text;
print @pipe_text;
print "\n\n";
}
@pipe_text = ();
@cset_text = ();
}

View file

@ -0,0 +1,8 @@
#!/bin/sh
#
# Purpose: Generate GNU diff of local changes versus canonical top-of-tree
#
# Usage: gcapatch > foo.patch
#
bk export -tpatch -hdu -r`bk repogca bk://linux.bkbits.net/linux-2.5`,+

View file

@ -0,0 +1,25 @@
#!/bin/sh
# unbz64wrap - the receiving side of a bzip2 | base64 stream
# Andreas Dilger <adilger@clusterfs.com> Jan 2002
# Sadly, mimencode does not appear to have good "begin" and "end" markers
# like uuencode does, and it is picky about getting the right start/end of
# the base64 stream, so we handle this explicitly here.
PATH=$PATH:/usr/bin:/usr/local/bin:/usr/freeware/bin
if mimencode -u < /dev/null > /dev/null 2>&1 ; then
SHOW=
while read LINE; do
case $LINE in
begin-base64*) SHOW=YES ;;
====) SHOW= ;;
*) [ "$SHOW" ] && echo "$LINE" ;;
esac
done | mimencode -u | bunzip2
exit $?
else
cat - | uudecode -o /dev/stdout | bunzip2
exit $?
fi

92
Documentation/BUG-HUNTING Normal file
View file

@ -0,0 +1,92 @@
[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]
This is how to track down a bug if you know nothing about kernel hacking.
It's a brute force approach but it works pretty well.
You need:
. A reproducible bug - it has to happen predictably (sorry)
. All the kernel tar files from a revision that worked to the
revision that doesn't
You will then do:
. Rebuild a revision that you believe works, install, and verify that.
. Do a binary search over the kernels to figure out which one
introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but
you know that 1.3.69 does. Pick a kernel in the middle and build
that, like 1.3.50. Build & test; if it works, pick the mid point
between .50 and .69, else the mid point between .28 and .50.
. You'll narrow it down to the kernel that introduced the bug. You
can probably do better than this but it gets tricky.
. Narrow it down to a subdirectory
- Copy kernel that works into "test". Let's say that 3.62 works,
but 3.63 doesn't. So you diff -r those two kernels and come
up with a list of directories that changed. For each of those
directories:
Copy the non-working directory next to the working directory
as "dir.63".
One directory at time, try moving the working directory to
"dir.62" and mv dir.63 dir"time, try
mv dir dir.62
mv dir.63 dir
find dir -name '*.[oa]' -print | xargs rm -f
And then rebuild and retest. Assuming that all related
changes were contained in the sub directory, this should
isolate the change to a directory.
Problems: changes in header files may have occurred; I've
found in my case that they were self explanatory - you may
or may not want to give up when that happens.
. Narrow it down to a file
- You can apply the same technique to each file in the directory,
hoping that the changes in that file are self contained.
. Narrow it down to a routine
- You can take the old file and the new file and manually create
a merged file that has
#ifdef VER62
routine()
{
...
}
#else
routine()
{
...
}
#endif
And then walk through that file, one routine at a time and
prefix it with
#define VER62
/* both routines here */
#undef VER62
Then recompile, retest, move the ifdefs until you find the one
that makes the difference.
Finally, you take all the info that you have, kernel revisions, bug
description, the extent to which you have narrowed it down, and pass
that off to whomever you believe is the maintainer of that section.
A post to linux.dev.kernel isn't such a bad idea if you've done some
work to narrow it down.
If you get it down to a routine, you'll probably get a fix in 24 hours.
My apologies to Linus and the other kernel hackers for describing this
brute force approach, it's hardly what a kernel hacker would do. However,
it does work and it lets non-hackers help fix bugs. And it is cool
because Linux snapshots will let you do this - something that you can't
do with vendor supplied releases.

410
Documentation/Changes Normal file
View file

@ -0,0 +1,410 @@
Intro
=====
This document is designed to provide a list of the minimum levels of
software necessary to run the 2.6 kernels, as well as provide brief
instructions regarding any other "Gotchas" users may encounter when
trying life on the Bleeding Edge. If upgrading from a pre-2.4.x
kernel, please consult the Changes file included with 2.4.x kernels for
additional information; most of that information will not be repeated
here. Basically, this document assumes that your system is already
functional and running at least 2.4.x kernels.
This document is originally based on my "Changes" file for 2.0.x kernels
and therefore owes credit to the same people as that file (Jared Mauch,
Axel Boldt, Alessandro Sigala, and countless other users all over the
'net).
The latest revision of this document, in various formats, can always
be found at <http://cyberbuzz.gatech.edu/kaboom/linux/Changes-2.4/>.
Feel free to translate this document. If you do so, please send me a
URL to your translation for inclusion in future revisions of this
document.
Smotrite file <http://oblom.rnc.ru/linux/kernel/Changes.ru>, yavlyaushisya
russkim perevodom dannogo documenta.
Visite <http://www2.adi.uam.es/~ender/tecnico/> para obtener la traducción
al español de este documento en varios formatos.
Eine deutsche Version dieser Datei finden Sie unter
<http://www.stefan-winter.de/Changes-2.4.0.txt>.
Last updated: October 29th, 2002
Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu).
Current Minimal Requirements
============================
Upgrade to at *least* these software revisions before thinking you've
encountered a bug! If you're unsure what version you're currently
running, the suggested command should tell you.
Again, keep in mind that this list assumes you are already
functionally running a Linux 2.4 kernel. Also, not all tools are
necessary on all systems; obviously, if you don't have any PCMCIA (PC
Card) hardware, for example, you probably needn't concern yourself
with pcmcia-cs.
o Gnu C 2.95.3 # gcc --version
o Gnu make 3.79.1 # make --version
o binutils 2.12 # ld -v
o util-linux 2.10o # fdformat --version
o module-init-tools 0.9.10 # depmod -V
o e2fsprogs 1.29 # tune2fs
o jfsutils 1.1.3 # fsck.jfs -V
o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs
o xfsprogs 2.6.0 # xfs_db -V
o pcmcia-cs 3.1.21 # cardmgr -V
o quota-tools 3.09 # quota -V
o PPP 2.4.0 # pppd --version
o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version
o nfs-utils 1.0.5 # showmount --version
o procps 3.2.0 # ps --version
o oprofile 0.5.3 # oprofiled --version
Kernel compilation
==================
GCC
---
The gcc version requirements may vary depending on the type of CPU in your
computer. The next paragraph applies to users of x86 CPUs, but not
necessarily to users of other CPUs. Users of other CPUs should obtain
information about their gcc version requirements from another source.
The recommended compiler for the kernel is gcc 2.95.x (x >= 3), and it
should be used when you need absolute stability. You may use gcc 3.0.x
instead if you wish, although it may cause problems. Later versions of gcc
have not received much testing for Linux kernel compilation, and there are
almost certainly bugs (mainly, but not exclusively, in the kernel) that
will need to be fixed in order to use these compilers. In any case, using
pgcc instead of plain gcc is just asking for trouble.
The Red Hat gcc 2.96 compiler subtree can also be used to build this tree.
You should ensure you use gcc-2.96-74 or later. gcc-2.96-54 will not build
the kernel correctly.
In addition, please pay attention to compiler optimization. Anything
greater than -O2 may not be wise. Similarly, if you choose to use gcc-2.95.x
or derivatives, be sure not to use -fstrict-aliasing (which, depending on
your version of gcc 2.95.x, may necessitate using -fno-strict-aliasing).
Make
----
You will need Gnu make 3.79.1 or later to build the kernel.
Binutils
--------
Linux on IA-32 has recently switched from using as86 to using gas for
assembling the 16-bit boot code, removing the need for as86 to compile
your kernel. This change does, however, mean that you need a recent
release of binutils.
System utilities
================
Architectural changes
---------------------
DevFS has been obsoleted in favour of udev
(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)
32-bit UID support is now in place. Have fun!
Linux documentation for functions is transitioning to inline
documentation via specially-formatted comments near their
definitions in the source. These comments can be combined with the
SGML templates in the Documentation/DocBook directory to make DocBook
files, which can then be converted by DocBook stylesheets to PostScript,
HTML, PDF files, and several other formats. In order to convert from
DocBook format to a format of your choice, you'll need to install Jade as
well as the desired DocBook stylesheets.
Util-linux
----------
New versions of util-linux provide *fdisk support for larger disks,
support new options to mount, recognize more supported partition
types, have a fdformat which works with 2.4 kernels, and similar goodies.
You'll probably want to upgrade.
Ksymoops
--------
If the unthinkable happens and your kernel oopses, you'll need a 2.4
version of ksymoops to decode the report; see REPORTING-BUGS in the
root of the Linux source for more information.
Module-Init-Tools
-----------------
A new module loader is now in the kernel that requires module-init-tools
to use. It is backward compatible with the 2.4.x series kernels.
Mkinitrd
--------
These changes to the /lib/modules file tree layout also require that
mkinitrd be upgraded.
E2fsprogs
---------
The latest version of e2fsprogs fixes several bugs in fsck and
debugfs. Obviously, it's a good idea to upgrade.
JFSutils
--------
The jfsutils package contains the utilities for the file system.
The following utilities are available:
o fsck.jfs - initiate replay of the transaction log, and check
and repair a JFS formatted partition.
o mkfs.jfs - create a JFS formatted partition.
o other file system utilities are also available in this package.
Reiserfsprogs
-------------
The reiserfsprogs package should be used for reiserfs-3.6.x
(Linux kernels 2.4.x). It is a combined package and contains working
versions of mkreiserfs, resize_reiserfs, debugreiserfs and
reiserfsck. These utils work on both i386 and alpha platforms.
Xfsprogs
--------
The latest version of xfsprogs contains mkfs.xfs, xfs_db, and the
xfs_repair utilities, among others, for the XFS filesystem. It is
architecture independent and any version from 2.0.0 onward should
work correctly with this version of the XFS kernel code (2.6.0 or
later is recommended, due to some significant improvements).
Pcmcia-cs
---------
PCMCIA (PC Card) support is now partially implemented in the main
kernel source. Pay attention when you recompile your kernel ;-).
Also, be sure to upgrade to the latest pcmcia-cs release.
Quota-tools
-----------
Support for 32 bit uid's and gid's is required if you want to use
the newer version 2 quota format. Quota-tools version 3.07 and
newer has this support. Use the recommended version or newer
from the table above.
Intel IA32 microcode
--------------------
A driver has been added to allow updating of Intel IA32 microcode,
accessible as both a devfs regular file and as a normal (misc)
character device. If you are not using devfs you may need to:
mkdir /dev/cpu
mknod /dev/cpu/microcode c 10 184
chmod 0644 /dev/cpu/microcode
as root before you can use this. You'll probably also want to
get the user-space microcode_ctl utility to use with this.
Powertweak
----------
If you are running v0.1.17 or earlier, you should upgrade to
version v0.99.0 or higher. Running old versions may cause problems
with programs using shared memory.
udev
----
udev is a userspace application for populating /dev dynamically with
only entries for devices actually present. udev replaces devfs.
Networking
==========
General changes
---------------
If you have advanced network configuration needs, you should probably
consider using the network tools from ip-route2.
Packet Filter / NAT
-------------------
The packet filtering and NAT code uses the same tools like the previous 2.4.x
kernel series (iptables). It still includes backwards-compatibility modules
for 2.2.x-style ipchains and 2.0.x-style ipfwadm.
PPP
---
The PPP driver has been restructured to support multilink and to
enable it to operate over diverse media layers. If you use PPP,
upgrade pppd to at least 2.4.0.
If you are not using devfs, you must have the device file /dev/ppp
which can be made by:
mknod /dev/ppp c 108 0
as root.
If you use devfsd and build ppp support as modules, you will need
the following in your /etc/devfsd.conf file:
LOOKUP PPP MODLOAD
Isdn4k-utils
------------
Due to changes in the length of the phone number field, isdn4k-utils
needs to be recompiled or (preferably) upgraded.
NFS-utils
---------
In 2.4 and earlier kernels, the nfs server needed to know about any
client that expected to be able to access files via NFS. This
information would be given to the kernel by "mountd" when the client
mounted the filesystem, or by "exportfs" at system startup. exportfs
would take information about active clients from /var/lib/nfs/rmtab.
This approach is quite fragile as it depends on rmtab being correct
which is not always easy, particularly when trying to implement
fail-over. Even when the system is working well, rmtab suffers from
getting lots of old entries that never get removed.
With 2.6 we have the option of having the kernel tell mountd when it
gets a request from an unknown host, and mountd can give appropriate
export information to the kernel. This removes the dependency on
rmtab and means that the kernel only needs to know about currently
active clients.
To enable this new functionality, you need to:
mount -t nfsd nfsd /proc/fs/nfs
before running exportfs or mountd. It is recommended that all NFS
services be protected from the internet-at-large by a firewall where
that is possible.
Getting updated software
========================
Kernel compilation
******************
gcc 2.95.3
----------
o <ftp://ftp.gnu.org/gnu/gcc/gcc-2.95.3.tar.gz>
Make
----
o <ftp://ftp.gnu.org/gnu/make/>
Binutils
--------
o <ftp://ftp.kernel.org/pub/linux/devel/binutils/>
System utilities
****************
Util-linux
----------
o <ftp://ftp.kernel.org/pub/linux/utils/util-linux/>
Ksymoops
--------
o <ftp://ftp.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/>
Module-Init-Tools
-----------------
o <ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/modules/>
Mkinitrd
--------
o <ftp://rawhide.redhat.com/pub/rawhide/SRPMS/SRPMS/>
E2fsprogs
---------
o <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.29.tar.gz>
JFSutils
--------
o <http://jfs.sourceforge.net/>
Reiserfsprogs
-------------
o <http://www.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.6.3.tar.gz>
Xfsprogs
--------
o <ftp://oss.sgi.com/projects/xfs/download/>
Pcmcia-cs
---------
o <ftp://pcmcia-cs.sourceforge.net/pub/pcmcia-cs/pcmcia-cs-3.1.21.tar.gz>
Quota-tools
----------
o <http://sourceforge.net/projects/linuxquota/>
Jade
----
o <ftp://ftp.jclark.com/pub/jade/jade-1.2.1.tar.gz>
DocBook Stylesheets
-------------------
o <http://nwalsh.com/docbook/dsssl/>
Intel P6 microcode
------------------
o <http://www.urbanmyth.org/microcode/>
Powertweak
----------
o <http://powertweak.sourceforge.net/>
udev
----
o <http://www.kernel.org/pub/linux/utils/kernel/hotplug/udev.html>
Networking
**********
PPP
---
o <ftp://ftp.samba.org/pub/ppp/ppp-2.4.0.tar.gz>
Isdn4k-utils
------------
o <ftp://ftp.isdn4linux.de/pub/isdn4linux/utils/isdn4k-utils.v3.1pre1.tar.gz>
NFS-utils
---------
o <http://sourceforge.net/project/showfiles.php?group_id=14>
Iptables
--------
o <http://www.iptables.org/downloads.html>
Ip-route2
---------
o <ftp://ftp.tux.org/pub/net/ip-routing/iproute2-2.2.4-now-ss991023.tar.gz>
OProfile
--------
o <http://oprofile.sf.net/download/>
NFS-Utils
---------
o <http://nfs.sourceforge.net/>

431
Documentation/CodingStyle Normal file
View file

@ -0,0 +1,431 @@
Linux kernel coding style
This is a short document describing the preferred coding style for the
linux kernel. Coding style is very personal, and I won't _force_ my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I'd prefer it for most other things too. Please
at least consider the points made here.
First off, I'd suggest printing out a copy of the GNU coding standards,
and NOT read it. Burn them, it's a great symbolic gesture.
Anyway, here goes:
Chapter 1: Indentation
Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.
Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends. Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.
Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.
In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you're nesting your functions too deep.
Heed that warning.
Don't put multiple statements on a single line unless you have
something to hide:
if (condition) do_this;
do_something_everytime;
Outside of comments, documentation and except in Kconfig, spaces are never
used for indentation, and the above example is deliberately broken.
Get a decent editor and don't leave whitespace at the end of lines.
Chapter 2: Breaking long lines and strings
Coding style is all about readability and maintainability using commonly
available tools.
The limit on the length of lines is 80 columns and this is a hard limit.
Statements longer than 80 columns will be broken into sensible chunks.
Descendants are always substantially shorter than the parent and are placed
substantially to the right. The same applies to function headers with a long
argument list. Long strings are as well broken into shorter strings.
void fun(int a, int b, int c)
{
if (condition)
printk(KERN_WARNING "Warning this is a long printk with "
"3 parameters a: %u b: %u "
"c: %u \n", a, b, c);
else
next_statement;
}
Chapter 3: Placing Braces
The other issue that always comes up in C styling is the placement of
braces. Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:
if (x is true) {
we do y
}
However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:
int function(int x)
{
body of function
}
Heretic people all over the world have claimed that this inconsistency
is ... well ... inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right. Besides, functions are
special anyway (you can't nest them in C).
Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a "while" in a do-statement or an "else" in an if-statement, like
this:
do {
body of do-loop
} while (condition);
and
if (x == y) {
..
} else if (x > y) {
...
} else {
....
}
Rationale: K&R.
Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability. Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.
Chapter 4: Naming
C is a Spartan language, and so should your naming be. Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter. A C programmer would call that
variable "tmp", which is much easier to write, and not the least more
difficult to understand.
HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must. To call a global function "foo" is a
shooting offense.
GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should _not_ call it "cntusr()".
Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged - the compiler knows the types anyway and can
check those, and it only confuses the programmer. No wonder MicroSoft
makes buggy programs.
LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood. Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.
If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See next chapter.
Chapter 5: Functions
Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.
The maximum length of a function is inversely proportional to the
complexity and indentation level of that function. So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.
However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely. Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it's performance-critical, and it will probably do a better job of it
than you would have done).
Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.
Chapter 6: Centralized exiting of functions
Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.
The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done.
The rationale is:
- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making
modifications are prevented
- saves the compiler work to optimize redundant code away ;)
int fun(int )
{
int result = 0;
char *buffer = kmalloc(SIZE);
if (buffer == NULL)
return -ENOMEM;
if (condition1) {
while (loop1) {
...
}
result = 1;
goto out;
}
...
out:
kfree(buffer);
return result;
}
Chapter 7: Commenting
Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the _working_ is obvious, and it's a waste of
time to explain badly written code.
Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 5 for a while. You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess. Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.
Chapter 8: You've made a mess of it
That's OK, we all do. You've probably been told by your long-time Unix
user helper that "GNU emacs" automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).
So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file:
(defun linux-c-mode ()
"C mode with adjusted defaults for use with the Linux kernel."
(interactive)
(c-mode)
(c-set-style "K&R")
(setq tab-width 8)
(setq indent-tabs-mode t)
(setq c-basic-offset 8))
This will define the M-x linux-c-mode command. When hacking on a
module, if you put the string -*- linux-c -*- somewhere on the first
two lines, this mode will be automatically invoked. Also, you may want
to add
(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . linux-c-mode)
auto-mode-alist))
to your .emacs file if you want to have linux-c-mode switched on
automagically when you edit source files under /usr/src/linux.
But even if you fail in getting emacs to do sane formatting, not
everything is lost: use "indent".
Now, again, GNU indent has the same brain-dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that's not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren't evil, they are
just severely misguided in this matter), so you just give indent the
options "-kr -i8" (stands for "K&R, 8 character indents"), or use
"scripts/Lindent", which indents in the latest style.
"indent" has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the man page. But
remember: "indent" is not a fix for bad programming.
Chapter 9: Configuration-files
For configuration options (arch/xxx/Kconfig, and all the Kconfig files),
somewhat different indentation is used.
Help text is indented with 2 spaces.
if CONFIG_EXPERIMENTAL
tristate CONFIG_BOOM
default n
help
Apply nitroglycerine inside the keyboard (DANGEROUS)
bool CONFIG_CHEER
depends on CONFIG_BOOM
default y
help
Output nice messages when you explode
endif
Generally, CONFIG_EXPERIMENTAL should surround all options not considered
stable. All options that are known to trash data (experimental write-
support for file-systems, for instance) should be denoted (DANGEROUS), other
experimental options should be denoted (EXPERIMENTAL).
Chapter 10: Data structures
Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
reference counts. In the kernel, garbage collection doesn't exist (and
outside the kernel garbage collection is slow and inefficient), which
means that you absolutely _have_ to reference count all your uses.
Reference counting means that you can avoid locking, and allows multiple
users to have access to the data structure in parallel - and not having
to worry about the structure suddenly going away from under them just
because they slept or did something else for a while.
Note that locking is _not_ a replacement for reference counting.
Locking is used to keep data structures coherent, while reference
counting is a memory management technique. Usually both are needed, and
they are not to be confused with each other.
Many data structures can indeed have two levels of reference counting,
when there are users of different "classes". The subclass count counts
the number of subclass users, and decrements the global count just once
when the subclass count goes to zero.
Examples of this kind of "multi-level-reference-counting" can be found in
memory management ("struct mm_struct": mm_users and mm_count), and in
filesystem code ("struct super_block": s_count and s_active).
Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.
Chapter 11: Macros, Enums, Inline functions and RTL
Names of macros defining constants and labels in enums are capitalized.
#define CONSTANT 0x12345
Enums are preferred when defining several related constants.
CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.
Generally, inline functions are preferable to macros resembling functions.
Macros with multiple statements should be enclosed in a do - while block:
#define macrofun(a, b, c) \
do { \
if (a == 5) \
do_this(b, c); \
} while (0)
Things to avoid when using macros:
1) macros that affect control flow:
#define FOO(x) \
do { \
if (blah(x) < 0) \
return -EBUGGERED; \
} while(0)
is a _very_ bad idea. It looks like a function call but exits the "calling"
function; don't break the internal parsers of those who will read the code.
2) macros that depend on having a local variable with a magic name:
#define FOO(val) bar(index, val)
might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.
3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.
4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.
#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)
The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.
Chapter 12: Printing kernel messages
Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
words like "dont" and use "do not" or "don't" instead.
Kernel messages do not have to be terminated with a period.
Printing numbers in parentheses (%d) adds no value and should be avoided.
Chapter 13: References
The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
Prentice Hall, Inc., 1988.
ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
URL: http://cm.bell-labs.com/cm/cs/cbook/
The Practice of Programming
by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Inc., 1999.
ISBN 0-201-61586-X.
URL: http://cm.bell-labs.com/cm/cs/tpop/
GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
gcc internals and indent, all available from http://www.gnu.org
WG14 is the international standardization working group for the programming
language C, URL: http://std.dkuug.dk/JTC1/SC22/WG14/
--
Last updated on 16 February 2004 by a community effort on LKML.

526
Documentation/DMA-API.txt Normal file
View file

@ -0,0 +1,526 @@
Dynamic DMA mapping using the generic device
============================================
James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
This document describes the DMA API. For a more gentle introduction
phrased in terms of the pci_ equivalents (and actual examples) see
DMA-mapping.txt
This API is split into two pieces. Part I describes the API and the
corresponding pci_ API. Part II describes the extensions to the API
for supporting non-consistent memory machines. Unless you know that
your driver absolutely has to support non-consistent platforms (this
is usually only legacy platforms) you should only use the API
described in part I.
Part I - pci_ and dma_ Equivalent API
-------------------------------------
To get the pci_ API, you must #include <linux/pci.h>
To get the dma_ API, you must #include <linux/dma-mapping.h>
Part Ia - Using large dma-coherent buffers
------------------------------------------
void *
dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, int flag)
void *
pci_alloc_consistent(struct pci_dev *dev, size_t size,
dma_addr_t *dma_handle)
Consistent memory is memory for which a write by either the device or
the processor can immediately be read by the processor or device
without having to worry about caching effects.
This routine allocates a region of <size> bytes of consistent memory.
it also returns a <dma_handle> which may be cast to an unsigned
integer the same width as the bus and used as the physical address
base of the region.
Returns: a pointer to the allocated region (in the processor's virtual
address space) or NULL if the allocation failed.
Note: consistent memory can be expensive on some platforms, and the
minimum allocation length may be as big as a page, so you should
consolidate your requests for consistent memory as much as possible.
The simplest way to do that is to use the dma_pool calls (see below).
The flag parameter (dma_alloc_coherent only) allows the caller to
specify the GFP_ flags (see kmalloc) for the allocation (the
implementation may chose to ignore flags that affect the location of
the returned memory, like GFP_DMA). For pci_alloc_consistent, you
must assume GFP_ATOMIC behaviour.
void
dma_free_coherent(struct device *dev, size_t size, void *cpu_addr
dma_addr_t dma_handle)
void
pci_free_consistent(struct pci_dev *dev, size_t size, void *cpu_addr
dma_addr_t dma_handle)
Free the region of consistent memory you previously allocated. dev,
size and dma_handle must all be the same as those passed into the
consistent allocate. cpu_addr must be the virtual address returned by
the consistent allocate
Part Ib - Using small dma-coherent buffers
------------------------------------------
To get this part of the dma_ API, you must #include <linux/dmapool.h>
Many drivers need lots of small dma-coherent memory regions for DMA
descriptors or I/O buffers. Rather than allocating in units of a page
or more using dma_alloc_coherent(), you can use DMA pools. These work
much like a kmem_cache_t, except that they use the dma-coherent allocator
not __get_free_pages(). Also, they understand common hardware constraints
for alignment, like queue heads needing to be aligned on N byte boundaries.
struct dma_pool *
dma_pool_create(const char *name, struct device *dev,
size_t size, size_t align, size_t alloc);
struct pci_pool *
pci_pool_create(const char *name, struct pci_device *dev,
size_t size, size_t align, size_t alloc);
The pool create() routines initialize a pool of dma-coherent buffers
for use with a given device. It must be called in a context which
can sleep.
The "name" is for diagnostics (like a kmem_cache_t name); dev and size
are like what you'd pass to dma_alloc_coherent(). The device's hardware
alignment requirement for this type of data is "align" (which is expressed
in bytes, and must be a power of two). If your device has no boundary
crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
from this pool must not cross 4KByte boundaries.
void *dma_pool_alloc(struct dma_pool *pool, int gfp_flags,
dma_addr_t *dma_handle);
void *pci_pool_alloc(struct pci_pool *pool, int gfp_flags,
dma_addr_t *dma_handle);
This allocates memory from the pool; the returned memory will meet the size
and alignment requirements specified at creation time. Pass GFP_ATOMIC to
prevent blocking, or if it's permitted (not in_interrupt, not holding SMP locks)
pass GFP_KERNEL to allow blocking. Like dma_alloc_coherent(), this returns
two values: an address usable by the cpu, and the dma address usable by the
pool's device.
void dma_pool_free(struct dma_pool *pool, void *vaddr,
dma_addr_t addr);
void pci_pool_free(struct pci_pool *pool, void *vaddr,
dma_addr_t addr);
This puts memory back into the pool. The pool is what was passed to
the the pool allocation routine; the cpu and dma addresses are what
were returned when that routine allocated the memory being freed.
void dma_pool_destroy(struct dma_pool *pool);
void pci_pool_destroy(struct pci_pool *pool);
The pool destroy() routines free the resources of the pool. They must be
called in a context which can sleep. Make sure you've freed all allocated
memory back to the pool before you destroy it.
Part Ic - DMA addressing limitations
------------------------------------
int
dma_supported(struct device *dev, u64 mask)
int
pci_dma_supported(struct device *dev, u64 mask)
Checks to see if the device can support DMA to the memory described by
mask.
Returns: 1 if it can and 0 if it can't.
Notes: This routine merely tests to see if the mask is possible. It
won't change the current mask settings. It is more intended as an
internal API for use by the platform than an external API for use by
driver writers.
int
dma_set_mask(struct device *dev, u64 mask)
int
pci_set_dma_mask(struct pci_device *dev, u64 mask)
Checks to see if the mask is possible and updates the device
parameters if it is.
Returns: 0 if successful and a negative error if not.
u64
dma_get_required_mask(struct device *dev)
After setting the mask with dma_set_mask(), this API returns the
actual mask (within that already set) that the platform actually
requires to operate efficiently. Usually this means the returned mask
is the minimum required to cover all of memory. Examining the
required mask gives drivers with variable descriptor sizes the
opportunity to use smaller descriptors as necessary.
Requesting the required mask does not alter the current mask. If you
wish to take advantage of it, you should issue another dma_set_mask()
call to lower the mask again.
Part Id - Streaming DMA mappings
--------------------------------
dma_addr_t
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
enum dma_data_direction direction)
dma_addr_t
pci_map_single(struct device *dev, void *cpu_addr, size_t size,
int direction)
Maps a piece of processor virtual memory so it can be accessed by the
device and returns the physical handle of the memory.
The direction for both api's may be converted freely by casting.
However the dma_ API uses a strongly typed enumerator for its
direction:
DMA_NONE = PCI_DMA_NONE no direction (used for
debugging)
DMA_TO_DEVICE = PCI_DMA_TODEVICE data is going from the
memory to the device
DMA_FROM_DEVICE = PCI_DMA_FROMDEVICE data is coming from
the device to the
memory
DMA_BIDIRECTIONAL = PCI_DMA_BIDIRECTIONAL direction isn't known
Notes: Not all memory regions in a machine can be mapped by this
API. Further, regions that appear to be physically contiguous in
kernel virtual space may not be contiguous as physical memory. Since
this API does not provide any scatter/gather capability, it will fail
if the user tries to map a non physically contiguous piece of memory.
For this reason, it is recommended that memory mapped by this API be
obtained only from sources which guarantee to be physically contiguous
(like kmalloc).
Further, the physical address of the memory must be within the
dma_mask of the device (the dma_mask represents a bit mask of the
addressable region for the device. i.e. if the physical address of
the memory anded with the dma_mask is still equal to the physical
address, then the device can perform DMA to the memory). In order to
ensure that the memory allocated by kmalloc is within the dma_mask,
the driver may specify various platform dependent flags to restrict
the physical memory range of the allocation (e.g. on x86, GFP_DMA
guarantees to be within the first 16Mb of available physical memory,
as required by ISA devices).
Note also that the above constraints on physical contiguity and
dma_mask may not apply if the platform has an IOMMU (a device which
supplies a physical to virtual mapping between the I/O memory bus and
the device). However, to be portable, device driver writers may *not*
assume that such an IOMMU exists.
Warnings: Memory coherency operates at a granularity called the cache
line width. In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache line
boundary and end exactly on one (to prevent two separately mapped
regions from sharing a single cache line). Since the cache line size
may not be known at compile time, the API will not enforce this
requirement. Therefore, it is recommended that driver writers who
don't take special care to determine the cache line size at run time
only map virtual regions that begin and end on page boundaries (which
are guaranteed also to be cache line boundaries).
DMA_TO_DEVICE synchronisation must be done after the last modification
of the memory region by the software and before it is handed off to
the driver. Once this primitive is used. Memory covered by this
primitive should be treated as read only by the device. If the device
may write to it at any point, it should be DMA_BIDIRECTIONAL (see
below).
DMA_FROM_DEVICE synchronisation must be done before the driver
accesses data that may be changed by the device. This memory should
be treated as read only by the driver. If the driver needs to write
to it at any point, it should be DMA_BIDIRECTIONAL (see below).
DMA_BIDIRECTIONAL requires special handling: it means that the driver
isn't sure if the memory was modified before being handed off to the
device and also isn't sure if the device will also modify it. Thus,
you must always sync bidirectional memory twice: once before the
memory is handed off to the device (to make sure all memory changes
are flushed from the processor) and once before the data may be
accessed after being used by the device (to make sure any processor
cache lines are updated with data that the device may have changed.
void
dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
enum dma_data_direction direction)
void
pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
size_t size, int direction)
Unmaps the region previously mapped. All the parameters passed in
must be identical to those passed in (and returned) by the mapping
API.
dma_addr_t
dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size,
enum dma_data_direction direction)
dma_addr_t
pci_map_page(struct pci_dev *hwdev, struct page *page,
unsigned long offset, size_t size, int direction)
void
dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
enum dma_data_direction direction)
void
pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
size_t size, int direction)
API for mapping and unmapping for pages. All the notes and warnings
for the other mapping APIs apply here. Also, although the <offset>
and <size> parameters are provided to do partial page mapping, it is
recommended that you never use these unless you really know what the
cache width is.
int
dma_mapping_error(dma_addr_t dma_addr)
int
pci_dma_mapping_error(dma_addr_t dma_addr)
In some circumstances dma_map_single and dma_map_page will fail to create
a mapping. A driver can check for these errors by testing the returned
dma address with dma_mapping_error(). A non zero return value means the mapping
could not be created and the driver should take appropriate action (eg
reduce current DMA mapping usage or delay and try again later).
int
dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction direction)
int
pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
int nents, int direction)
Maps a scatter gather list from the block layer.
Returns: the number of physical segments mapped (this may be shorted
than <nents> passed in if the block layer determines that some
elements of the scatter/gather list are physically adjacent and thus
may be mapped with a single entry).
Please note that the sg cannot be mapped again if it has been mapped once.
The mapping process is allowed to destroy information in the sg.
As with the other mapping interfaces, dma_map_sg can fail. When it
does, 0 is returned and a driver must take appropriate action. It is
critical that the driver do something, in the case of a block driver
aborting the request or even oopsing is better than doing nothing and
corrupting the filesystem.
void
dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
enum dma_data_direction direction)
void
pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
int nents, int direction)
unmap the previously mapped scatter/gather list. All the parameters
must be the same as those and passed in to the scatter/gather mapping
API.
Note: <nents> must be the number you passed in, *not* the number of
physical entries returned.
void
dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
enum dma_data_direction direction)
void
pci_dma_sync_single(struct pci_dev *hwdev, dma_addr_t dma_handle,
size_t size, int direction)
void
dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems,
enum dma_data_direction direction)
void
pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
int nelems, int direction)
synchronise a single contiguous or scatter/gather mapping. All the
parameters must be the same as those passed into the single mapping
API.
Notes: You must do this:
- Before reading values that have been written by DMA from the device
(use the DMA_FROM_DEVICE direction)
- After writing values that will be written to the device using DMA
(use the DMA_TO_DEVICE) direction
- before *and* after handing memory to the device if the memory is
DMA_BIDIRECTIONAL
See also dma_map_single().
Part II - Advanced dma_ usage
-----------------------------
Warning: These pieces of the DMA API have no PCI equivalent. They
should also not be used in the majority of cases, since they cater for
unlikely corner cases that don't belong in usual drivers.
If you don't understand how cache line coherency works between a
processor and an I/O device, you should not be using this part of the
API at all.
void *
dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, int flag)
Identical to dma_alloc_coherent() except that the platform will
choose to return either consistent or non-consistent memory as it sees
fit. By using this API, you are guaranteeing to the platform that you
have all the correct and necessary sync points for this memory in the
driver should it choose to return non-consistent memory.
Note: where the platform can return consistent memory, it will
guarantee that the sync points become nops.
Warning: Handling non-consistent memory is a real pain. You should
only ever use this API if you positively know your driver will be
required to work on one of the rare (usually non-PCI) architectures
that simply cannot make consistent memory.
void
dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_handle)
free memory allocated by the nonconsistent API. All parameters must
be identical to those passed in (and returned by
dma_alloc_noncoherent()).
int
dma_is_consistent(dma_addr_t dma_handle)
returns true if the memory pointed to by the dma_handle is actually
consistent.
int
dma_get_cache_alignment(void)
returns the processor cache alignment. This is the absolute minimum
alignment *and* width that you must observe when either mapping
memory or doing partial flushes.
Notes: This API may return a number *larger* than the actual cache
line, but it will guarantee that one or more cache lines fit exactly
into the width returned by this call. It will also always be a power
of two for easy alignment
void
dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
unsigned long offset, size_t size,
enum dma_data_direction direction)
does a partial sync. starting at offset and continuing for size. You
must be careful to observe the cache alignment and width when doing
anything like this. You must also be extra careful about accessing
memory you intend to sync partially.
void
dma_cache_sync(void *vaddr, size_t size,
enum dma_data_direction direction)
Do a partial sync of memory that was allocated by
dma_alloc_noncoherent(), starting at virtual address vaddr and
continuing on for size. Again, you *must* observe the cache line
boundaries when doing this.
int
dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
dma_addr_t device_addr, size_t size, int
flags)
Declare region of memory to be handed out by dma_alloc_coherent when
it's asked for coherent memory for this device.
bus_addr is the physical address to which the memory is currently
assigned in the bus responding region (this will be used by the
platform to perform the mapping)
device_addr is the physical address the device needs to be programmed
with actually to address this memory (this will be handed out as the
dma_addr_t in dma_alloc_coherent())
size is the size of the area (must be multiples of PAGE_SIZE).
flags can be or'd together and are
DMA_MEMORY_MAP - request that the memory returned from
dma_alloc_coherent() be directly writeable.
DMA_MEMORY_IO - request that the memory returned from
dma_alloc_coherent() be addressable using read/write/memcpy_toio etc.
One or both of these flags must be present
DMA_MEMORY_INCLUDES_CHILDREN - make the declared memory be allocated by
dma_alloc_coherent of any child devices of this one (for memory residing
on a bridge).
DMA_MEMORY_EXCLUSIVE - only allocate memory from the declared regions.
Do not allow dma_alloc_coherent() to fall back to system memory when
it's out of memory in the declared region.
The return value will be either DMA_MEMORY_MAP or DMA_MEMORY_IO and
must correspond to a passed in flag (i.e. no returning DMA_MEMORY_IO
if only DMA_MEMORY_MAP were passed in) for success or zero for
failure.
Note, for DMA_MEMORY_IO returns, all subsequent memory returned by
dma_alloc_coherent() may no longer be accessed directly, but instead
must be accessed using the correct bus functions. If your driver
isn't prepared to handle this contingency, it should not specify
DMA_MEMORY_IO in the input flags.
As a simplification for the platforms, only *one* such region of
memory may be declared per device.
For reasons of efficiency, most platforms choose to track the declared
region only at the granularity of a page. For smaller allocations,
you should use the dma_pool() API.
void
dma_release_declared_memory(struct device *dev)
Remove the memory region previously declared from the system. This
API performs *no* in-use checking for this region and will return
unconditionally having removed all the required structures. It is the
drivers job to ensure that no parts of this memory region are
currently in use.
void *
dma_mark_declared_memory_occupied(struct device *dev,
dma_addr_t device_addr, size_t size)
This is used to occupy specific regions of the declared space
(dma_alloc_coherent() will hand out the first free region it finds).
device_addr is the *device* address of the region requested
size is the size (and should be a page sized multiple).
The return value will be either a pointer to the processor virtual
address of the memory, or an error (via PTR_ERR()) if any part of the
region is occupied.

View file

@ -0,0 +1,881 @@
Dynamic DMA mapping
===================
David S. Miller <davem@redhat.com>
Richard Henderson <rth@cygnus.com>
Jakub Jelinek <jakub@redhat.com>
This document describes the DMA mapping system in terms of the pci_
API. For a similar API that works for generic devices, see
DMA-API.txt.
Most of the 64bit platforms have special hardware that translates bus
addresses (DMA addresses) into physical addresses. This is similar to
how page tables and/or a TLB translates virtual addresses to physical
addresses on a CPU. This is needed so that e.g. PCI devices can
access with a Single Address Cycle (32bit DMA address) any page in the
64bit physical address space. Previously in Linux those 64bit
platforms had to set artificial limits on the maximum RAM size in the
system, so that the virt_to_bus() static scheme works (the DMA address
translation tables were simply filled on bootup to map each bus
address to the physical page __pa(bus_to_virt())).
So that Linux can use the dynamic DMA mapping, it needs some help from the
drivers, namely it has to take into account that DMA addresses should be
mapped only for the time they are actually used and unmapped after the DMA
transfer.
The following API will work of course even on platforms where no such
hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented on
top of the virt_to_bus interface.
First of all, you should make sure
#include <linux/pci.h>
is in your driver. This file will obtain for you the definition of the
dma_addr_t (which can hold any valid DMA address for the platform)
type which should be used everywhere you hold a DMA (bus) address
returned from the DMA mapping functions.
What memory is DMA'able?
The first piece of information you must know is what kernel memory can
be used with the DMA mapping facilities. There has been an unwritten
set of rules regarding this, and this text is an attempt to finally
write them down.
If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.
This means specifically that you may _not_ use the memory/addresses
returned from vmalloc() for DMA. It is possible to DMA to the
_underlying_ memory mapped into a vmalloc() area, but this requires
walking page tables to get the physical addresses, and then
translating each of those pages back to a kernel address using
something like __va(). [ EDIT: Update this when we integrate
Gerd Knorr's generic code which does this. ]
This rule also means that you may not use kernel image addresses
(ie. items in the kernel's data/text/bss segment, or your driver's)
nor may you use kernel stack addresses for DMA. Both of these items
might be mapped somewhere entirely different than the rest of physical
memory.
Also, this means that you cannot take the return of a kmap()
call and DMA to/from that. This is similar to vmalloc().
What about block I/O and networking buffers? The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.
DMA addressing limitations
Does your device have any DMA addressing limitations? For example, is
your device only capable of driving the low order 24-bits of address
on the PCI bus for SAC DMA transfers? If so, you need to inform the
PCI layer of this fact.
By default, the kernel assumes that your device can address the full
32-bits in a SAC cycle. For a 64-bit DAC capable device, this needs
to be increased. And for a device with limitations, as discussed in
the previous paragraph, it needs to be decreased.
pci_alloc_consistent() by default will return 32-bit DMA addresses.
PCI-X specification requires PCI-X devices to support 64-bit
addressing (DAC) for all transactions. And at least one platform (SGI
SN2) requires 64-bit consistent allocations to operate correctly when
the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(),
it's good practice to call pci_set_consistent_dma_mask() to set the
appropriate mask even if your device only supports 32-bit DMA
(default) and especially if it's a PCI-X device.
For correct operation, you must interrogate the PCI layer in your
device probe routine to see if the PCI controller on the machine can
properly support the DMA addressing limitation your device has. It is
good style to do this even if your device holds the default setting,
because this shows that you did think about these issues wrt. your
device.
The query is performed via a call to pci_set_dma_mask():
int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask);
The query for consistent allocations is performed via a a call to
pci_set_consistent_dma_mask():
int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask);
Here, pdev is a pointer to the PCI device struct of your device, and
device_mask is a bit mask describing which bits of a PCI address your
device supports. It returns zero if your card can perform DMA
properly on the machine given the address mask you provided.
If it returns non-zero, your device can not perform DMA properly on
this platform, and attempting to do so will result in undefined
behavior. You must either use a different mask, or not use DMA.
This means that in the failure case, you have three options:
1) Use another DMA mask, if possible (see below).
2) Use some non-DMA mode for data transfer, if possible.
3) Ignore this device and do not initialize it.
It is recommended that your driver print a kernel KERN_WARNING message
when you end up performing either #2 or #3. In this manner, if a user
of your driver reports that performance is bad or that the device is not
even detected, you can ask them for the kernel messages to find out
exactly why.
The standard 32-bit addressing PCI device would do something like
this:
if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
printk(KERN_WARNING
"mydev: No suitable DMA available.\n");
goto ignore_this_device;
}
Another common scenario is a 64-bit capable device. The approach
here is to try for 64-bit DAC addressing, but back down to a
32-bit mask should that fail. The PCI platform code may fail the
64-bit mask not because the platform is not capable of 64-bit
addressing. Rather, it may fail in this case simply because
32-bit SAC addressing is done more efficiently than DAC addressing.
Sparc64 is one platform which behaves in this way.
Here is how you would handle a 64-bit capable device which can drive
all 64-bits when accessing streaming DMA:
int using_dac;
if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) {
using_dac = 1;
} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
using_dac = 0;
} else {
printk(KERN_WARNING
"mydev: No suitable DMA available.\n");
goto ignore_this_device;
}
If a card is capable of using 64-bit consistent allocations as well,
the case would look like this:
int using_dac, consistent_using_dac;
if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) {
using_dac = 1;
consistent_using_dac = 1;
pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
using_dac = 0;
consistent_using_dac = 0;
pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK);
} else {
printk(KERN_WARNING
"mydev: No suitable DMA available.\n");
goto ignore_this_device;
}
pci_set_consistent_dma_mask() will always be able to set the same or a
smaller mask as pci_set_dma_mask(). However for the rare case that a
device driver only uses consistent allocations, one would have to
check the return value from pci_set_consistent_dma_mask().
If your 64-bit device is going to be an enormous consumer of DMA
mappings, this can be problematic since the DMA mappings are a
finite resource on many platforms. Please see the "DAC Addressing
for Address Space Hungry Devices" section near the end of this
document for how to handle this case.
Finally, if your device can only drive the low 24-bits of
address during PCI bus mastering you might do something like:
if (pci_set_dma_mask(pdev, 0x00ffffff)) {
printk(KERN_WARNING
"mydev: 24-bit DMA addressing not available.\n");
goto ignore_this_device;
}
When pci_set_dma_mask() is successful, and returns zero, the PCI layer
saves away this mask you have provided. The PCI layer will use this
information later when you make DMA mappings.
There is a case which we are aware of at this time, which is worth
mentioning in this documentation. If your device supports multiple
functions (for example a sound card provides playback and record
functions) and the various different functions have _different_
DMA addressing limitations, you may wish to probe each mask and
only provide the functionality which the machine can handle. It
is important that the last call to pci_set_dma_mask() be for the
most specific mask.
Here is pseudo-code showing how this might be done:
#define PLAYBACK_ADDRESS_BITS DMA_32BIT_MASK
#define RECORD_ADDRESS_BITS 0x00ffffff
struct my_sound_card *card;
struct pci_dev *pdev;
...
if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) {
card->playback_enabled = 1;
} else {
card->playback_enabled = 0;
printk(KERN_WARN "%s: Playback disabled due to DMA limitations.\n",
card->name);
}
if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) {
card->record_enabled = 1;
} else {
card->record_enabled = 0;
printk(KERN_WARN "%s: Record disabled due to DMA limitations.\n",
card->name);
}
A sound card was used as an example here because this genre of PCI
devices seems to be littered with ISA chips given a PCI front end,
and thus retaining the 16MB DMA addressing limitations of ISA.
Types of DMA mappings
There are two types of DMA mappings:
- Consistent DMA mappings which are usually mapped at driver
initialization, unmapped at the end and for which the hardware should
guarantee that the device and the CPU can access the data
in parallel and will see updates made by each other without any
explicit software flushing.
Think of "consistent" as "synchronous" or "coherent".
The current default is to return consistent memory in the low 32
bits of the PCI bus space. However, for future compatibility you
should set the consistent mask even if this default is fine for your
driver.
Good examples of what to use consistent mappings for are:
- Network card DMA ring descriptors.
- SCSI adapter mailbox command data structures.
- Device firmware microcode executed out of
main memory.
The invariant these examples all require is that any CPU store
to memory is immediately visible to the device, and vice
versa. Consistent mappings guarantee this.
IMPORTANT: Consistent DMA memory does not preclude the usage of
proper memory barriers. The CPU may reorder stores to
consistent memory just as it may normal memory. Example:
if it is important for the device to see the first word
of a descriptor updated before the second, you must do
something like:
desc->word0 = address;
wmb();
desc->word1 = DESC_VALID;
in order to get correct behavior on all platforms.
- Streaming DMA mappings which are usually mapped for one DMA transfer,
unmapped right after it (unless you use pci_dma_sync_* below) and for which
hardware can optimize for sequential accesses.
This of "streaming" as "asynchronous" or "outside the coherency
domain".
Good examples of what to use streaming mappings for are:
- Networking buffers transmitted/received by a device.
- Filesystem buffers written/read by a SCSI device.
The interfaces for using this type of mapping were designed in
such a way that an implementation can make whatever performance
optimizations the hardware allows. To this end, when using
such mappings you must be explicit about what you want to happen.
Neither type of DMA mapping has alignment restrictions that come
from PCI, although some devices may have such restrictions.
Using Consistent DMA mappings.
To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
you should do:
dma_addr_t dma_handle;
cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);
where dev is a struct pci_dev *. You should pass NULL for PCI like buses
where devices don't have struct pci_dev (like ISA, EISA). This may be
called in interrupt context.
This argument is needed because the DMA translations may be bus
specific (and often is private to the bus which the device is attached
to).
Size is the length of the region you want to allocate, in bytes.
This routine will allocate RAM for that region, so it acts similarly to
__get_free_pages (but takes size instead of a page order). If your
driver needs regions sized smaller than a page, you may prefer using
the pci_pool interface, described below.
The consistent DMA mapping interfaces, for non-NULL dev, will by
default return a DMA address which is SAC (Single Address Cycle)
addressable. Even if the device indicates (via PCI dma mask) that it
may address the upper 32-bits and thus perform DAC cycles, consistent
allocation will only return > 32-bit PCI addresses for DMA if the
consistent dma mask has been explicitly changed via
pci_set_consistent_dma_mask(). This is true of the pci_pool interface
as well.
pci_alloc_consistent returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.
The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size. This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.
To unmap and free such a DMA region, you call:
pci_free_consistent(dev, size, cpu_addr, dma_handle);
where dev, size are the same as in the above call and cpu_addr and
dma_handle are the values pci_alloc_consistent returned to you.
This function may not be called in interrupt context.
If your driver needs lots of smaller memory regions, you can write
custom code to subdivide pages returned by pci_alloc_consistent,
or you can use the pci_pool API to do that. A pci_pool is like
a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages.
Also, it understands common hardware constraints for alignment,
like queue heads needing to be aligned on N byte boundaries.
Create a pci_pool like this:
struct pci_pool *pool;
pool = pci_pool_create(name, dev, size, align, alloc);
The "name" is for diagnostics (like a kmem_cache name); dev and size
are as above. The device's hardware alignment requirement for this
type of data is "align" (which is expressed in bytes, and must be a
power of two). If your device has no boundary crossing restrictions,
pass 0 for alloc; passing 4096 says memory allocated from this pool
must not cross 4KByte boundaries (but at that time it may be better to
go for pci_alloc_consistent directly instead).
Allocate memory from a pci pool like this:
cpu_addr = pci_pool_alloc(pool, flags, &dma_handle);
flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent,
this returns two values, cpu_addr and dma_handle.
Free memory that was allocated from a pci_pool like this:
pci_pool_free(pool, cpu_addr, dma_handle);
where pool is what you passed to pci_pool_alloc, and cpu_addr and
dma_handle are the values pci_pool_alloc returned. This function
may be called in interrupt context.
Destroy a pci_pool by calling:
pci_pool_destroy(pool);
Make sure you've called pci_pool_free for all memory allocated
from a pool before you destroy the pool. This function may not
be called in interrupt context.
DMA Direction
The interfaces described in subsequent portions of this document
take a DMA direction argument, which is an integer and takes on
one of the following values:
PCI_DMA_BIDIRECTIONAL
PCI_DMA_TODEVICE
PCI_DMA_FROMDEVICE
PCI_DMA_NONE
One should provide the exact DMA direction if you know it.
PCI_DMA_TODEVICE means "from main memory to the PCI device"
PCI_DMA_FROMDEVICE means "from the PCI device to main memory"
It is the direction in which the data moves during the DMA
transfer.
You are _strongly_ encouraged to specify this as precisely
as you possibly can.
If you absolutely cannot know the direction of the DMA transfer,
specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in
either direction. The platform guarantees that you may legally
specify this, and that it will work, but this may be at the
cost of performance for example.
The value PCI_DMA_NONE is to be used for debugging. One can
hold this in a data structure before you come to know the
precise direction, and this will help catch cases where your
direction tracking logic has failed to set things up properly.
Another advantage of specifying this value precisely (outside of
potential platform-specific optimizations of such) is for debugging.
Some platforms actually have a write permission boolean which DMA
mappings can be marked with, much like page protections in the user
program address space. Such platforms can and do report errors in the
kernel logs when the PCI controller hardware detects violation of the
permission setting.
Only streaming mappings specify a direction, consistent mappings
implicitly have a direction attribute setting of
PCI_DMA_BIDIRECTIONAL.
The SCSI subsystem provides mechanisms for you to easily obtain
the direction to use, in the SCSI command:
scsi_to_pci_dma_dir(SCSI_DIRECTION)
Where SCSI_DIRECTION is obtained from the 'sc_data_direction'
member of the SCSI command your driver is working on. The
mentioned interface above returns a value suitable for passing
into the streaming DMA mapping interfaces below.
For Networking drivers, it's a rather simple affair. For transmit
packets, map/unmap them with the PCI_DMA_TODEVICE direction
specifier. For receive packets, just the opposite, map/unmap them
with the PCI_DMA_FROMDEVICE direction specifier.
Using Streaming DMA mappings
The streaming DMA mapping routines can be called from interrupt
context. There are two versions of each map/unmap, one which will
map/unmap a single memory region, and one which will map/unmap a
scatterlist.
To map a single region, you do:
struct pci_dev *pdev = mydev->pdev;
dma_addr_t dma_handle;
void *addr = buffer->ptr;
size_t size = buffer->len;
dma_handle = pci_map_single(dev, addr, size, direction);
and to unmap it:
pci_unmap_single(dev, dma_handle, size, direction);
You should call pci_unmap_single when the DMA activity is finished, e.g.
from the interrupt which told you that the DMA transfer is done.
Using cpu pointers like this for single mappings has a disadvantage,
you cannot reference HIGHMEM memory in this way. Thus, there is a
map/unmap interface pair akin to pci_{map,unmap}_single. These
interfaces deal with page/offset pairs instead of cpu pointers.
Specifically:
struct pci_dev *pdev = mydev->pdev;
dma_addr_t dma_handle;
struct page *page = buffer->page;
unsigned long offset = buffer->offset;
size_t size = buffer->len;
dma_handle = pci_map_page(dev, page, offset, size, direction);
...
pci_unmap_page(dev, dma_handle, size, direction);
Here, "offset" means byte offset within the given page.
With scatterlists, you map a region gathered from several regions by:
int i, count = pci_map_sg(dev, sglist, nents, direction);
struct scatterlist *sg;
for (i = 0, sg = sglist; i < count; i++, sg++) {
hw_address[i] = sg_dma_address(sg);
hw_len[i] = sg_dma_len(sg);
}
where nents is the number of entries in the sglist.
The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a huge
advantage for cards which either cannot do scatter-gather or have very
limited number of scatter-gather entries) and returns the actual number
of sg entries it mapped them to. On failure 0 is returned.
Then you should loop count times (note: this can be less than nents times)
and use sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length as shown above.
To unmap a scatterlist, just call:
pci_unmap_sg(dev, sglist, nents, direction);
Again, make sure DMA activity has already finished.
PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be
the _same_ one you passed into the pci_map_sg call,
it should _NOT_ be the 'count' value _returned_ from the
pci_map_sg call.
Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
counterpart, because the bus address space is a shared resource (although
in some ports the mapping is per each BUS so less devices contend for the
same bus address space) and you could render the machine unusable by eating
all bus addresses.
If you need to use the same streaming DMA region multiple times and touch
the data in between the DMA transfers, the buffer needs to be synced
properly in order for the cpu and device to see the most uptodate and
correct copy of the DMA buffer.
So, firstly, just map it with pci_map_{single,sg}, and after each DMA
transfer call either:
pci_dma_sync_single_for_cpu(dev, dma_handle, size, direction);
or:
pci_dma_sync_sg_for_cpu(dev, sglist, nents, direction);
as appropriate.
Then, if you wish to let the device get at the DMA area again,
finish accessing the data with the cpu, and then before actually
giving the buffer to the hardware call either:
pci_dma_sync_single_for_device(dev, dma_handle, size, direction);
or:
pci_dma_sync_sg_for_device(dev, sglist, nents, direction);
as appropriate.
After the last DMA transfer call one of the DMA unmap routines
pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_*
call till pci_unmap_*, then you don't have to call the pci_dma_sync_*
routines at all.
Here is pseudo code which shows a situation in which you would need
to use the pci_dma_sync_*() interfaces.
my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
{
dma_addr_t mapping;
mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE);
cp->rx_buf = buffer;
cp->rx_len = len;
cp->rx_dma = mapping;
give_rx_buf_to_card(cp);
}
...
my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
{
struct my_card *cp = devid;
...
if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
struct my_card_header *hp;
/* Examine the header to see if we wish
* to accept the data. But synchronize
* the DMA transfer with the CPU first
* so that we see updated contents.
*/
pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma,
cp->rx_len,
PCI_DMA_FROMDEVICE);
/* Now it is safe to examine the buffer. */
hp = (struct my_card_header *) cp->rx_buf;
if (header_is_ok(hp)) {
pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len,
PCI_DMA_FROMDEVICE);
pass_to_upper_layers(cp->rx_buf);
make_and_setup_new_rx_buf(cp);
} else {
/* Just sync the buffer and give it back
* to the card.
*/
pci_dma_sync_single_for_device(cp->pdev,
cp->rx_dma,
cp->rx_len,
PCI_DMA_FROMDEVICE);
give_rx_buf_to_card(cp);
}
}
}
Drivers converted fully to this interface should not use virt_to_bus any
longer, nor should they use bus_to_virt. Some drivers have to be changed a
little bit, because there is no longer an equivalent to bus_to_virt in the
dynamic DMA mapping scheme - you have to always store the DMA addresses
returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single
calls (pci_map_sg stores them in the scatterlist itself if the platform
supports dynamic DMA mapping in hardware) in your driver structures and/or
in the card registers.
All PCI drivers should be using these interfaces with no exceptions.
It is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated. Some ports already do not provide these
as it is impossible to correctly support them.
64-bit DMA and DAC cycle support
Do you understand all of the text above? Great, then you already
know how to use 64-bit DMA addressing under Linux. Simply make
the appropriate pci_set_dma_mask() calls based upon your cards
capabilities, then use the mapping APIs above.
It is that simple.
Well, not for some odd devices. See the next section for information
about that.
DAC Addressing for Address Space Hungry Devices
There exists a class of devices which do not mesh well with the PCI
DMA mapping API. By definition these "mappings" are a finite
resource. The number of total available mappings per bus is platform
specific, but there will always be a reasonable amount.
What is "reasonable"? Reasonable means that networking and block I/O
devices need not worry about using too many mappings.
As an example of a problematic device, consider compute cluster cards.
They can potentially need to access gigabytes of memory at once via
DMA. Dynamic mappings are unsuitable for this kind of access pattern.
To this end we've provided a small API by which a device driver
may use DAC cycles to directly address all of physical memory.
Not all platforms support this, but most do. It is easy to determine
whether the platform will work properly at probe time.
First, understand that there may be a SEVERE performance penalty for
using these interfaces on some platforms. Therefore, you MUST only
use these interfaces if it is absolutely required. %99 of devices can
use the normal APIs without any problems.
Note that for streaming type mappings you must either use these
interfaces, or the dynamic mapping interfaces above. You may not mix
usage of both for the same device. Such an act is illegal and is
guaranteed to put a banana in your tailpipe.
However, consistent mappings may in fact be used in conjunction with
these interfaces. Remember that, as defined, consistent mappings are
always going to be SAC addressable.
The first thing your driver needs to do is query the PCI platform
layer with your devices DAC addressing capabilities:
int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);
This routine behaves identically to pci_set_dma_mask. You may not
use the following interfaces if this routine fails.
Next, DMA addresses using this API are kept track of using the
dma64_addr_t type. It is guaranteed to be big enough to hold any
DAC address the platform layer will give to you from the following
routines. If you have consistent mappings as well, you still
use plain dma_addr_t to keep track of those.
All mappings obtained here will be direct. The mappings are not
translated, and this is the purpose of this dialect of the DMA API.
All routines work with page/offset pairs. This is the _ONLY_ way to
portably refer to any piece of memory. If you have a cpu pointer
(which may be validly DMA'd too) you may easily obtain the page
and offset using something like this:
struct page *page = virt_to_page(ptr);
unsigned long offset = offset_in_page(ptr);
Here are the interfaces:
dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
struct page *page,
unsigned long offset,
int direction);
The DAC address for the tuple PAGE/OFFSET are returned. The direction
argument is the same as for pci_{map,unmap}_single(). The same rules
for cpu/device access apply here as for the streaming mapping
interfaces. To reiterate:
The cpu may touch the buffer before pci_dac_page_to_dma.
The device may touch the buffer after pci_dac_page_to_dma
is made, but the cpu may NOT.
When the DMA transfer is complete, invoke:
void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev,
dma64_addr_t dma_addr,
size_t len, int direction);
This must be done before the CPU looks at the buffer again.
This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu().
And likewise, if you wish to let the device get back at the buffer after
the cpu has read/written it, invoke:
void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev,
dma64_addr_t dma_addr,
size_t len, int direction);
before letting the device access the DMA area again.
If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t
the following interfaces are provided:
struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
dma64_addr_t dma_addr);
unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
dma64_addr_t dma_addr);
This is possible with the DAC interfaces purely because they are
not translated in any way.
Optimizing Unmap State Space Consumption
On many platforms, pci_unmap_{single,page}() is simply a nop.
Therefore, keeping track of the mapping address and length is a waste
of space. Instead of filling your drivers up with ifdefs and the like
to "work around" this (which would defeat the whole purpose of a
portable API) the following facilities are provided.
Actually, instead of describing the macros one by one, we'll
transform some example code.
1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures.
Example, before:
struct ring_state {
struct sk_buff *skb;
dma_addr_t mapping;
__u32 len;
};
after:
struct ring_state {
struct sk_buff *skb;
DECLARE_PCI_UNMAP_ADDR(mapping)
DECLARE_PCI_UNMAP_LEN(len)
};
NOTE: DO NOT put a semicolon at the end of the DECLARE_*()
macro.
2) Use pci_unmap_{addr,len}_set to set these values.
Example, before:
ringp->mapping = FOO;
ringp->len = BAR;
after:
pci_unmap_addr_set(ringp, mapping, FOO);
pci_unmap_len_set(ringp, len, BAR);
3) Use pci_unmap_{addr,len} to access these values.
Example, before:
pci_unmap_single(pdev, ringp->mapping, ringp->len,
PCI_DMA_FROMDEVICE);
after:
pci_unmap_single(pdev,
pci_unmap_addr(ringp, mapping),
pci_unmap_len(ringp, len),
PCI_DMA_FROMDEVICE);
It really should be self-explanatory. We treat the ADDR and LEN
separately, because it is possible for an implementation to only
need the address in order to perform the unmap operation.
Platform Issues
If you are just writing drivers for Linux and do not maintain
an architecture port for the kernel, you can safely skip down
to "Closing".
1) Struct scatterlist requirements.
Struct scatterlist must contain, at a minimum, the following
members:
struct page *page;
unsigned int offset;
unsigned int length;
The base address is specified by a "page+offset" pair.
Previous versions of struct scatterlist contained a "void *address"
field that was sometimes used instead of page+offset. As of Linux
2.5., page+offset is always used, and the "address" field has been
deleted.
2) More to come...
Handling Errors
DMA address space is limited on some architectures and an allocation
failure can be determined by:
- checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0
- checking the returned dma_addr_t of pci_map_single and pci_map_page
by using pci_dma_mapping_error():
dma_addr_t dma_handle;
dma_handle = pci_map_single(dev, addr, size, direction);
if (pci_dma_mapping_error(dma_handle)) {
/*
* reduce current DMA mapping usage,
* delay and try again later or
* reset driver.
*/
}
Closing
This document, and the API itself, would not be in it's current
form without the feedback and suggestions from numerous individuals.
We would like to specifically mention, in no particular order, the
following people:
Russell King <rmk@arm.linux.org.uk>
Leo Dagum <dagum@barrel.engr.sgi.com>
Ralf Baechle <ralf@oss.sgi.com>
Grant Grundler <grundler@cup.hp.com>
Jay Estabrook <Jay.Estabrook@compaq.com>
Thomas Sailer <sailer@ife.ee.ethz.ch>
Andrea Arcangeli <andrea@suse.de>
Jens Axboe <axboe@suse.de>
David Mosberger-Tang <davidm@hpl.hp.com>

View file

@ -0,0 +1,195 @@
###
# This makefile is used to generate the kernel documentation,
# primarily based on in-line comments in various source files.
# See Documentation/kernel-doc-nano-HOWTO.txt for instruction in how
# to ducument the SRC - and how to read it.
# To add a new book the only step required is to add the book to the
# list of DOCBOOKS.
DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \
kernel-hacking.xml kernel-locking.xml via-audio.xml \
deviceiobook.xml procfs-guide.xml tulip-user.xml \
writing_usb_driver.xml scsidrivers.xml sis900.xml \
kernel-api.xml journal-api.xml lsm.xml usb.xml \
gadget.xml libata.xml mtdnand.xml librs.xml
###
# The build process is as follows (targets):
# (xmldocs)
# file.tmpl --> file.xml +--> file.ps (psdocs)
# +--> file.pdf (pdfdocs)
# +--> DIR=file (htmldocs)
# +--> man/ (mandocs)
###
# The targets that may be used.
.PHONY: xmldocs sgmldocs psdocs pdfdocs htmldocs mandocs installmandocs
BOOKS := $(addprefix $(obj)/,$(DOCBOOKS))
xmldocs: $(BOOKS)
sgmldocs: xmldocs
PS := $(patsubst %.xml, %.ps, $(BOOKS))
psdocs: $(PS)
PDF := $(patsubst %.xml, %.pdf, $(BOOKS))
pdfdocs: $(PDF)
HTML := $(patsubst %.xml, %.html, $(BOOKS))
htmldocs: $(HTML)
MAN := $(patsubst %.xml, %.9, $(BOOKS))
mandocs: $(MAN)
installmandocs: mandocs
$(MAKEMAN) install Documentation/DocBook/man
###
#External programs used
KERNELDOC = scripts/kernel-doc
DOCPROC = scripts/basic/docproc
SPLITMAN = $(PERL) $(srctree)/scripts/split-man
MAKEMAN = $(PERL) $(srctree)/scripts/makeman
###
# DOCPROC is used for two purposes:
# 1) To generate a dependency list for a .tmpl file
# 2) To preprocess a .tmpl file and call kernel-doc with
# appropriate parameters.
# The following rules are used to generate the .xml documentation
# required to generate the final targets. (ps, pdf, html).
quiet_cmd_docproc = DOCPROC $@
cmd_docproc = SRCTREE=$(srctree)/ $(DOCPROC) doc $< >$@
define rule_docproc
set -e; \
$(if $($(quiet)cmd_$(1)),echo ' $($(quiet)cmd_$(1))';) \
$(cmd_$(1)); \
( \
echo 'cmd_$@ := $(cmd_$(1))'; \
echo $@: `SRCTREE=$(srctree) $(DOCPROC) depend $<`; \
) > $(dir $@).$(notdir $@).cmd
endef
%.xml: %.tmpl FORCE
$(call if_changed_rule,docproc)
###
#Read in all saved dependency files
cmd_files := $(wildcard $(foreach f,$(BOOKS),$(dir $(f)).$(notdir $(f)).cmd))
ifneq ($(cmd_files),)
include $(cmd_files)
endif
###
# Changes in kernel-doc force a rebuild of all documentation
$(BOOKS): $(KERNELDOC)
###
# procfs guide uses a .c file as example code.
# This requires an explicit dependency
C-procfs-example = procfs_example.xml
C-procfs-example2 = $(addprefix $(obj)/,$(C-procfs-example))
$(obj)/procfs-guide.xml: $(C-procfs-example2)
###
# Rules to generate postscript, PDF and HTML
# db2html creates a directory. Generate a html file used for timestamp
quiet_cmd_db2ps = DB2PS $@
cmd_db2ps = db2ps -o $(dir $@) $<
%.ps : %.xml
@(which db2ps > /dev/null 2>&1) || \
(echo "*** You need to install DocBook stylesheets ***"; \
exit 1)
$(call cmd,db2ps)
quiet_cmd_db2pdf = DB2PDF $@
cmd_db2pdf = db2pdf -o $(dir $@) $<
%.pdf : %.xml
@(which db2pdf > /dev/null 2>&1) || \
(echo "*** You need to install DocBook stylesheets ***"; \
exit 1)
$(call cmd,db2pdf)
quiet_cmd_db2html = DB2HTML $@
cmd_db2html = db2html -o $(patsubst %.html,%,$@) $< && \
echo '<a HREF="$(patsubst %.html,%,$(notdir $@))/book1.html"> \
Goto $(patsubst %.html,%,$(notdir $@))</a><p>' > $@
%.html: %.xml
@(which db2html > /dev/null 2>&1) || \
(echo "*** You need to install DocBook stylesheets ***"; \
exit 1)
@rm -rf $@ $(patsubst %.html,%,$@)
$(call cmd,db2html)
@if [ ! -z "$(PNG-$(basename $(notdir $@)))" ]; then \
cp $(PNG-$(basename $(notdir $@))) $(patsubst %.html,%,$@); fi
###
# Rule to generate man files - output is placed in the man subdirectory
%.9: %.xml
ifneq ($(KBUILD_SRC),)
$(Q)mkdir -p $(objtree)/Documentation/DocBook/man
endif
$(SPLITMAN) $< $(objtree)/Documentation/DocBook/man "$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)"
$(MAKEMAN) convert $(objtree)/Documentation/DocBook/man $<
###
# Rules to generate postscripts and PNG imgages from .fig format files
quiet_cmd_fig2eps = FIG2EPS $@
cmd_fig2eps = fig2dev -Leps $< $@
%.eps: %.fig
@(which fig2dev > /dev/null 2>&1) || \
(echo "*** You need to install transfig ***"; \
exit 1)
$(call cmd,fig2eps)
quiet_cmd_fig2png = FIG2PNG $@
cmd_fig2png = fig2dev -Lpng $< $@
%.png: %.fig
@(which fig2dev > /dev/null 2>&1) || \
(echo "*** You need to install transfig ***"; \
exit 1)
$(call cmd,fig2png)
###
# Rule to convert a .c file to inline XML documentation
%.xml: %.c
@echo ' GEN $@'
@( \
echo "<programlisting>"; \
expand --tabs=8 < $< | \
sed -e "s/&/\\&amp;/g" \
-e "s/</\\&lt;/g" \
-e "s/>/\\&gt;/g"; \
echo "</programlisting>") > $@
###
# Help targets as used by the top-level makefile
dochelp:
@echo ' Linux kernel internal documentation in different formats:'
@echo ' xmldocs (XML DocBook), psdocs (Postscript), pdfdocs (PDF)'
@echo ' htmldocs (HTML), mandocs (man pages, use installmandocs to install)'
###
# Temporary files left by various tools
clean-files := $(DOCBOOKS) \
$(patsubst %.xml, %.dvi, $(DOCBOOKS)) \
$(patsubst %.xml, %.aux, $(DOCBOOKS)) \
$(patsubst %.xml, %.tex, $(DOCBOOKS)) \
$(patsubst %.xml, %.log, $(DOCBOOKS)) \
$(patsubst %.xml, %.out, $(DOCBOOKS)) \
$(patsubst %.xml, %.ps, $(DOCBOOKS)) \
$(patsubst %.xml, %.pdf, $(DOCBOOKS)) \
$(patsubst %.xml, %.html, $(DOCBOOKS)) \
$(patsubst %.xml, %.9, $(DOCBOOKS)) \
$(C-procfs-example)
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS))
#man put files in man subdir - traverse down
subdir- := man/

View file

@ -0,0 +1,341 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="DoingIO">
<bookinfo>
<title>Bus-Independent Device Accesses</title>
<authorgroup>
<author>
<firstname>Matthew</firstname>
<surname>Wilcox</surname>
<affiliation>
<address>
<email>matthew@wil.cx</email>
</address>
</affiliation>
</author>
</authorgroup>
<authorgroup>
<author>
<firstname>Alan</firstname>
<surname>Cox</surname>
<affiliation>
<address>
<email>alan@redhat.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2001</year>
<holder>Matthew Wilcox</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
Linux provides an API which abstracts performing IO across all busses
and devices, allowing device drivers to be written independently of
bus type.
</para>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
<para>
None.
</para>
</chapter>
<chapter id="mmio">
<title>Memory Mapped IO</title>
<sect1>
<title>Getting Access to the Device</title>
<para>
The most widely supported form of IO is memory mapped IO.
That is, a part of the CPU's address space is interpreted
not as accesses to memory, but as accesses to a device. Some
architectures define devices to be at a fixed address, but most
have some method of discovering devices. The PCI bus walk is a
good example of such a scheme. This document does not cover how
to receive such an address, but assumes you are starting with one.
Physical addresses are of type unsigned long.
</para>
<para>
This address should not be used directly. Instead, to get an
address suitable for passing to the accessor functions described
below, you should call <function>ioremap</function>.
An address suitable for accessing the device will be returned to you.
</para>
<para>
After you've finished using the device (say, in your module's
exit routine), call <function>iounmap</function> in order to return
the address space to the kernel. Most architectures allocate new
address space each time you call <function>ioremap</function>, and
they can run out unless you call <function>iounmap</function>.
</para>
</sect1>
<sect1>
<title>Accessing the device</title>
<para>
The part of the interface most used by drivers is reading and
writing memory-mapped registers on the device. Linux provides
interfaces to read and write 8-bit, 16-bit, 32-bit and 64-bit
quantities. Due to a historical accident, these are named byte,
word, long and quad accesses. Both read and write accesses are
supported; there is no prefetch support at this time.
</para>
<para>
The functions are named <function>readb</function>,
<function>readw</function>, <function>readl</function>,
<function>readq</function>, <function>readb_relaxed</function>,
<function>readw_relaxed</function>, <function>readl_relaxed</function>,
<function>readq_relaxed</function>, <function>writeb</function>,
<function>writew</function>, <function>writel</function> and
<function>writeq</function>.
</para>
<para>
Some devices (such as framebuffers) would like to use larger
transfers than 8 bytes at a time. For these devices, the
<function>memcpy_toio</function>, <function>memcpy_fromio</function>
and <function>memset_io</function> functions are provided.
Do not use memset or memcpy on IO addresses; they
are not guaranteed to copy data in order.
</para>
<para>
The read and write functions are defined to be ordered. That is the
compiler is not permitted to reorder the I/O sequence. When the
ordering can be compiler optimised, you can use <function>
__readb</function> and friends to indicate the relaxed ordering. Use
this with care.
</para>
<para>
While the basic functions are defined to be synchronous with respect
to each other and ordered with respect to each other the busses the
devices sit on may themselves have asynchronicity. In particular many
authors are burned by the fact that PCI bus writes are posted
asynchronously. A driver author must issue a read from the same
device to ensure that writes have occurred in the specific cases the
author cares. This kind of property cannot be hidden from driver
writers in the API. In some cases, the read used to flush the device
may be expected to fail (if the card is resetting, for example). In
that case, the read should be done from config space, which is
guaranteed to soft-fail if the card doesn't respond.
</para>
<para>
The following is an example of flushing a write to a device when
the driver would like to ensure the write's effects are visible prior
to continuing execution.
</para>
<programlisting>
static inline void
qla1280_disable_intrs(struct scsi_qla_host *ha)
{
struct device_reg *reg;
reg = ha->iobase;
/* disable risc and host interrupts */
WRT_REG_WORD(&amp;reg->ictrl, 0);
/*
* The following read will ensure that the above write
* has been received by the device before we return from this
* function.
*/
RD_REG_WORD(&amp;reg->ictrl);
ha->flags.ints_enabled = 0;
}
</programlisting>
<para>
In addition to write posting, on some large multiprocessing systems
(e.g. SGI Challenge, Origin and Altix machines) posted writes won't
be strongly ordered coming from different CPUs. Thus it's important
to properly protect parts of your driver that do memory-mapped writes
with locks and use the <function>mmiowb</function> to make sure they
arrive in the order intended. Issuing a regular <function>readX
</function> will also ensure write ordering, but should only be used
when the driver has to be sure that the write has actually arrived
at the device (not that it's simply ordered with respect to other
writes), since a full <function>readX</function> is a relatively
expensive operation.
</para>
<para>
Generally, one should use <function>mmiowb</function> prior to
releasing a spinlock that protects regions using <function>writeb
</function> or similar functions that aren't surrounded by <function>
readb</function> calls, which will ensure ordering and flushing. The
following pseudocode illustrates what might occur if write ordering
isn't guaranteed via <function>mmiowb</function> or one of the
<function>readX</function> functions.
</para>
<programlisting>
CPU A: spin_lock_irqsave(&amp;dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&amp;dev_lock, flags)
...
CPU B: spin_lock_irqsave(&amp;dev_lock, flags)
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: spin_unlock_irqrestore(&amp;dev_lock, flags)
</programlisting>
<para>
In the case above, newval2 could be written to ring_ptr before
newval. Fixing it is easy though:
</para>
<programlisting>
CPU A: spin_lock_irqsave(&amp;dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: mmiowb(); /* ensure no other writes beat us to the device */
CPU A: spin_unlock_irqrestore(&amp;dev_lock, flags)
...
CPU B: spin_lock_irqsave(&amp;dev_lock, flags)
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: mmiowb();
CPU B: spin_unlock_irqrestore(&amp;dev_lock, flags)
</programlisting>
<para>
See tg3.c for a real world example of how to use <function>mmiowb
</function>
</para>
<para>
PCI ordering rules also guarantee that PIO read responses arrive
after any outstanding DMA writes from that bus, since for some devices
the result of a <function>readb</function> call may signal to the
driver that a DMA transaction is complete. In many cases, however,
the driver may want to indicate that the next
<function>readb</function> call has no relation to any previous DMA
writes performed by the device. The driver can use
<function>readb_relaxed</function> for these cases, although only
some platforms will honor the relaxed semantics. Using the relaxed
read functions will provide significant performance benefits on
platforms that support it. The qla2xxx driver provides examples
of how to use <function>readX_relaxed</function>. In many cases,
a majority of the driver's <function>readX</function> calls can
safely be converted to <function>readX_relaxed</function> calls, since
only a few will indicate or depend on DMA completion.
</para>
</sect1>
<sect1>
<title>ISA legacy functions</title>
<para>
On older kernels (2.2 and earlier) the ISA bus could be read or
written with these functions and without ioremap being used. This is
no longer true in Linux 2.4. A set of equivalent functions exist for
easy legacy driver porting. The functions available are prefixed
with 'isa_' and are <function>isa_readb</function>,
<function>isa_writeb</function>, <function>isa_readw</function>,
<function>isa_writew</function>, <function>isa_readl</function>,
<function>isa_writel</function>, <function>isa_memcpy_fromio</function>
and <function>isa_memcpy_toio</function>
</para>
<para>
These functions should not be used in new drivers, and will
eventually be going away.
</para>
</sect1>
</chapter>
<chapter>
<title>Port Space Accesses</title>
<sect1>
<title>Port Space Explained</title>
<para>
Another form of IO commonly supported is Port Space. This is a
range of addresses separate to the normal memory address space.
Access to these addresses is generally not as fast as accesses
to the memory mapped addresses, and it also has a potentially
smaller address space.
</para>
<para>
Unlike memory mapped IO, no preparation is required
to access port space.
</para>
</sect1>
<sect1>
<title>Accessing Port Space</title>
<para>
Accesses to this space are provided through a set of functions
which allow 8-bit, 16-bit and 32-bit accesses; also
known as byte, word and long. These functions are
<function>inb</function>, <function>inw</function>,
<function>inl</function>, <function>outb</function>,
<function>outw</function> and <function>outl</function>.
</para>
<para>
Some variants are provided for these functions. Some devices
require that accesses to their ports are slowed down. This
functionality is provided by appending a <function>_p</function>
to the end of the function. There are also equivalents to memcpy.
The <function>ins</function> and <function>outs</function>
functions copy bytes, words or longs to the given port.
</para>
</sect1>
</chapter>
<chapter id="pubfunctions">
<title>Public Functions Provided</title>
!Einclude/asm-i386/io.h
</chapter>
</book>

View file

@ -0,0 +1,752 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="USB-Gadget-API">
<bookinfo>
<title>USB Gadget API for Linux</title>
<date>20 August 2004</date>
<edition>20 August 2004</edition>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
<copyright>
<year>2003-2004</year>
<holder>David Brownell</holder>
</copyright>
<author>
<firstname>David</firstname>
<surname>Brownell</surname>
<affiliation>
<address><email>dbrownell@users.sourceforge.net</email></address>
</affiliation>
</author>
</bookinfo>
<toc></toc>
<chapter><title>Introduction</title>
<para>This document presents a Linux-USB "Gadget"
kernel mode
API, for use within peripherals and other USB devices
that embed Linux.
It provides an overview of the API structure,
and shows how that fits into a system development project.
This is the first such API released on Linux to address
a number of important problems, including: </para>
<itemizedlist>
<listitem><para>Supports USB 2.0, for high speed devices which
can stream data at several dozen megabytes per second.
</para></listitem>
<listitem><para>Handles devices with dozens of endpoints just as
well as ones with just two fixed-function ones. Gadget drivers
can be written so they're easy to port to new hardware.
</para></listitem>
<listitem><para>Flexible enough to expose more complex USB device
capabilities such as multiple configurations, multiple interfaces,
composite devices,
and alternate interface settings.
</para></listitem>
<listitem><para>USB "On-The-Go" (OTG) support, in conjunction
with updates to the Linux-USB host side.
</para></listitem>
<listitem><para>Sharing data structures and API models with the
Linux-USB host side API. This helps the OTG support, and
looks forward to more-symmetric frameworks (where the same
I/O model is used by both host and device side drivers).
</para></listitem>
<listitem><para>Minimalist, so it's easier to support new device
controller hardware. I/O processing doesn't imply large
demands for memory or CPU resources.
</para></listitem>
</itemizedlist>
<para>Most Linux developers will not be able to use this API, since they
have USB "host" hardware in a PC, workstation, or server.
Linux users with embedded systems are more likely to
have USB peripheral hardware.
To distinguish drivers running inside such hardware from the
more familiar Linux "USB device drivers",
which are host side proxies for the real USB devices,
a different term is used:
the drivers inside the peripherals are "USB gadget drivers".
In USB protocol interactions, the device driver is the master
(or "client driver")
and the gadget driver is the slave (or "function driver").
</para>
<para>The gadget API resembles the host side Linux-USB API in that both
use queues of request objects to package I/O buffers, and those requests
may be submitted or canceled.
They share common definitions for the standard USB
<emphasis>Chapter 9</emphasis> messages, structures, and constants.
Also, both APIs bind and unbind drivers to devices.
The APIs differ in detail, since the host side's current
URB framework exposes a number of implementation details
and assumptions that are inappropriate for a gadget API.
While the model for control transfers and configuration
management is necessarily different (one side is a hardware-neutral master,
the other is a hardware-aware slave), the endpoint I/0 API used here
should also be usable for an overhead-reduced host side API.
</para>
</chapter>
<chapter id="structure"><title>Structure of Gadget Drivers</title>
<para>A system running inside a USB peripheral
normally has at least three layers inside the kernel to handle
USB protocol processing, and may have additional layers in
user space code.
The "gadget" API is used by the middle layer to interact
with the lowest level (which directly handles hardware).
</para>
<para>In Linux, from the bottom up, these layers are:
</para>
<variablelist>
<varlistentry>
<term><emphasis>USB Controller Driver</emphasis></term>
<listitem>
<para>This is the lowest software level.
It is the only layer that talks to hardware,
through registers, fifos, dma, irqs, and the like.
The <filename>&lt;linux/usb_gadget.h&gt;</filename> API abstracts
the peripheral controller endpoint hardware.
That hardware is exposed through endpoint objects, which accept
streams of IN/OUT buffers, and through callbacks that interact
with gadget drivers.
Since normal USB devices only have one upstream
port, they only have one of these drivers.
The controller driver can support any number of different
gadget drivers, but only one of them can be used at a time.
</para>
<para>Examples of such controller hardware include
the PCI-based NetChip 2280 USB 2.0 high speed controller,
the SA-11x0 or PXA-25x UDC (found within many PDAs),
and a variety of other products.
</para>
</listitem></varlistentry>
<varlistentry>
<term><emphasis>Gadget Driver</emphasis></term>
<listitem>
<para>The lower boundary of this driver implements hardware-neutral
USB functions, using calls to the controller driver.
Because such hardware varies widely in capabilities and restrictions,
and is used in embedded environments where space is at a premium,
the gadget driver is often configured at compile time
to work with endpoints supported by one particular controller.
Gadget drivers may be portable to several different controllers,
using conditional compilation.
(Recent kernels substantially simplify the work involved in
supporting new hardware, by <emphasis>autoconfiguring</emphasis>
endpoints automatically for many bulk-oriented drivers.)
Gadget driver responsibilities include:
</para>
<itemizedlist>
<listitem><para>handling setup requests (ep0 protocol responses)
possibly including class-specific functionality
</para></listitem>
<listitem><para>returning configuration and string descriptors
</para></listitem>
<listitem><para>(re)setting configurations and interface
altsettings, including enabling and configuring endpoints
</para></listitem>
<listitem><para>handling life cycle events, such as managing
bindings to hardware,
USB suspend/resume, remote wakeup,
and disconnection from the USB host.
</para></listitem>
<listitem><para>managing IN and OUT transfers on all currently
enabled endpoints
</para></listitem>
</itemizedlist>
<para>
Such drivers may be modules of proprietary code, although
that approach is discouraged in the Linux community.
</para>
</listitem></varlistentry>
<varlistentry>
<term><emphasis>Upper Level</emphasis></term>
<listitem>
<para>Most gadget drivers have an upper boundary that connects
to some Linux driver or framework in Linux.
Through that boundary flows the data which the gadget driver
produces and/or consumes through protocol transfers over USB.
Examples include:
</para>
<itemizedlist>
<listitem><para>user mode code, using generic (gadgetfs)
or application specific files in
<filename>/dev</filename>
</para></listitem>
<listitem><para>networking subsystem (for network gadgets,
like the CDC Ethernet Model gadget driver)
</para></listitem>
<listitem><para>data capture drivers, perhaps video4Linux or
a scanner driver; or test and measurement hardware.
</para></listitem>
<listitem><para>input subsystem (for HID gadgets)
</para></listitem>
<listitem><para>sound subsystem (for audio gadgets)
</para></listitem>
<listitem><para>file system (for PTP gadgets)
</para></listitem>
<listitem><para>block i/o subsystem (for usb-storage gadgets)
</para></listitem>
<listitem><para>... and more </para></listitem>
</itemizedlist>
</listitem></varlistentry>
<varlistentry>
<term><emphasis>Additional Layers</emphasis></term>
<listitem>
<para>Other layers may exist.
These could include kernel layers, such as network protocol stacks,
as well as user mode applications building on standard POSIX
system call APIs such as
<emphasis>open()</emphasis>, <emphasis>close()</emphasis>,
<emphasis>read()</emphasis> and <emphasis>write()</emphasis>.
On newer systems, POSIX Async I/O calls may be an option.
Such user mode code will not necessarily be subject to
the GNU General Public License (GPL).
</para>
</listitem></varlistentry>
</variablelist>
<para>OTG-capable systems will also need to include a standard Linux-USB
host side stack,
with <emphasis>usbcore</emphasis>,
one or more <emphasis>Host Controller Drivers</emphasis> (HCDs),
<emphasis>USB Device Drivers</emphasis> to support
the OTG "Targeted Peripheral List",
and so forth.
There will also be an <emphasis>OTG Controller Driver</emphasis>,
which is visible to gadget and device driver developers only indirectly.
That helps the host and device side USB controllers implement the
two new OTG protocols (HNP and SRP).
Roles switch (host to peripheral, or vice versa) using HNP
during USB suspend processing, and SRP can be viewed as a
more battery-friendly kind of device wakeup protocol.
</para>
<para>Over time, reusable utilities are evolving to help make some
gadget driver tasks simpler.
For example, building configuration descriptors from vectors of
descriptors for the configurations interfaces and endpoints is
now automated, and many drivers now use autoconfiguration to
choose hardware endpoints and initialize their descriptors.
A potential example of particular interest
is code implementing standard USB-IF protocols for
HID, networking, storage, or audio classes.
Some developers are interested in KDB or KGDB hooks, to let
target hardware be remotely debugged.
Most such USB protocol code doesn't need to be hardware-specific,
any more than network protocols like X11, HTTP, or NFS are.
Such gadget-side interface drivers should eventually be combined,
to implement composite devices.
</para>
</chapter>
<chapter id="api"><title>Kernel Mode Gadget API</title>
<para>Gadget drivers declare themselves through a
<emphasis>struct usb_gadget_driver</emphasis>, which is responsible for
most parts of enumeration for a <emphasis>struct usb_gadget</emphasis>.
The response to a set_configuration usually involves
enabling one or more of the <emphasis>struct usb_ep</emphasis> objects
exposed by the gadget, and submitting one or more
<emphasis>struct usb_request</emphasis> buffers to transfer data.
Understand those four data types, and their operations, and
you will understand how this API works.
</para>
<note><title>Incomplete Data Type Descriptions</title>
<para>This documentation was prepared using the standard Linux
kernel <filename>docproc</filename> tool, which turns text
and in-code comments into SGML DocBook and then into usable
formats such as HTML or PDF.
Other than the "Chapter 9" data types, most of the significant
data types and functions are described here.
</para>
<para>However, docproc does not understand all the C constructs
that are used, so some relevant information is likely omitted from
what you are reading.
One example of such information is endpoint autoconfiguration.
You'll have to read the header file, and use example source
code (such as that for "Gadget Zero"), to fully understand the API.
</para>
<para>The part of the API implementing some basic
driver capabilities is specific to the version of the
Linux kernel that's in use.
The 2.6 kernel includes a <emphasis>driver model</emphasis>
framework that has no analogue on earlier kernels;
so those parts of the gadget API are not fully portable.
(They are implemented on 2.4 kernels, but in a different way.)
The driver model state is another part of this API that is
ignored by the kerneldoc tools.
</para>
</note>
<para>The core API does not expose
every possible hardware feature, only the most widely available ones.
There are significant hardware features, such as device-to-device DMA
(without temporary storage in a memory buffer)
that would be added using hardware-specific APIs.
</para>
<para>This API allows drivers to use conditional compilation to handle
endpoint capabilities of different hardware, but doesn't require that.
Hardware tends to have arbitrary restrictions, relating to
transfer types, addressing, packet sizes, buffering, and availability.
As a rule, such differences only matter for "endpoint zero" logic
that handles device configuration and management.
The API supports limited run-time
detection of capabilities, through naming conventions for endpoints.
Many drivers will be able to at least partially autoconfigure
themselves.
In particular, driver init sections will often have endpoint
autoconfiguration logic that scans the hardware's list of endpoints
to find ones matching the driver requirements
(relying on those conventions), to eliminate some of the most
common reasons for conditional compilation.
</para>
<para>Like the Linux-USB host side API, this API exposes
the "chunky" nature of USB messages: I/O requests are in terms
of one or more "packets", and packet boundaries are visible to drivers.
Compared to RS-232 serial protocols, USB resembles
synchronous protocols like HDLC
(N bytes per frame, multipoint addressing, host as the primary
station and devices as secondary stations)
more than asynchronous ones
(tty style: 8 data bits per frame, no parity, one stop bit).
So for example the controller drivers won't buffer
two single byte writes into a single two-byte USB IN packet,
although gadget drivers may do so when they implement
protocols where packet boundaries (and "short packets")
are not significant.
</para>
<sect1 id="lifecycle"><title>Driver Life Cycle</title>
<para>Gadget drivers make endpoint I/O requests to hardware without
needing to know many details of the hardware, but driver
setup/configuration code needs to handle some differences.
Use the API like this:
</para>
<orderedlist numeration='arabic'>
<listitem><para>Register a driver for the particular device side
usb controller hardware,
such as the net2280 on PCI (USB 2.0),
sa11x0 or pxa25x as found in Linux PDAs,
and so on.
At this point the device is logically in the USB ch9 initial state
("attached"), drawing no power and not usable
(since it does not yet support enumeration).
Any host should not see the device, since it's not
activated the data line pullup used by the host to
detect a device, even if VBUS power is available.
</para></listitem>
<listitem><para>Register a gadget driver that implements some higher level
device function. That will then bind() to a usb_gadget, which
activates the data line pullup sometime after detecting VBUS.
</para></listitem>
<listitem><para>The hardware driver can now start enumerating.
The steps it handles are to accept USB power and set_address requests.
Other steps are handled by the gadget driver.
If the gadget driver module is unloaded before the host starts to
enumerate, steps before step 7 are skipped.
</para></listitem>
<listitem><para>The gadget driver's setup() call returns usb descriptors,
based both on what the bus interface hardware provides and on the
functionality being implemented.
That can involve alternate settings or configurations,
unless the hardware prevents such operation.
For OTG devices, each configuration descriptor includes
an OTG descriptor.
</para></listitem>
<listitem><para>The gadget driver handles the last step of enumeration,
when the USB host issues a set_configuration call.
It enables all endpoints used in that configuration,
with all interfaces in their default settings.
That involves using a list of the hardware's endpoints, enabling each
endpoint according to its descriptor.
It may also involve using <function>usb_gadget_vbus_draw</function>
to let more power be drawn from VBUS, as allowed by that configuration.
For OTG devices, setting a configuration may also involve reporting
HNP capabilities through a user interface.
</para></listitem>
<listitem><para>Do real work and perform data transfers, possibly involving
changes to interface settings or switching to new configurations, until the
device is disconnect()ed from the host.
Queue any number of transfer requests to each endpoint.
It may be suspended and resumed several times before being disconnected.
On disconnect, the drivers go back to step 3 (above).
</para></listitem>
<listitem><para>When the gadget driver module is being unloaded,
the driver unbind() callback is issued. That lets the controller
driver be unloaded.
</para></listitem>
</orderedlist>
<para>Drivers will normally be arranged so that just loading the
gadget driver module (or statically linking it into a Linux kernel)
allows the peripheral device to be enumerated, but some drivers
will defer enumeration until some higher level component (like
a user mode daemon) enables it.
Note that at this lowest level there are no policies about how
ep0 configuration logic is implemented,
except that it should obey USB specifications.
Such issues are in the domain of gadget drivers,
including knowing about implementation constraints
imposed by some USB controllers
or understanding that composite devices might happen to
be built by integrating reusable components.
</para>
<para>Note that the lifecycle above can be slightly different
for OTG devices.
Other than providing an additional OTG descriptor in each
configuration, only the HNP-related differences are particularly
visible to driver code.
They involve reporting requirements during the SET_CONFIGURATION
request, and the option to invoke HNP during some suspend callbacks.
Also, SRP changes the semantics of
<function>usb_gadget_wakeup</function>
slightly.
</para>
</sect1>
<sect1 id="ch9"><title>USB 2.0 Chapter 9 Types and Constants</title>
<para>Gadget drivers
rely on common USB structures and constants
defined in the
<filename>&lt;linux/usb_ch9.h&gt;</filename>
header file, which is standard in Linux 2.6 kernels.
These are the same types and constants used by host
side drivers (and usbcore).
</para>
!Iinclude/linux/usb_ch9.h
</sect1>
<sect1 id="core"><title>Core Objects and Methods</title>
<para>These are declared in
<filename>&lt;linux/usb_gadget.h&gt;</filename>,
and are used by gadget drivers to interact with
USB peripheral controller drivers.
</para>
<!-- yeech, this is ugly in nsgmls PDF output.
the PDF bookmark and refentry output nesting is wrong,
and the member/argument documentation indents ugly.
plus something (docproc?) adds whitespace before the
descriptive paragraph text, so it can't line up right
unless the explanations are trivial.
-->
!Iinclude/linux/usb_gadget.h
</sect1>
<sect1 id="utils"><title>Optional Utilities</title>
<para>The core API is sufficient for writing a USB Gadget Driver,
but some optional utilities are provided to simplify common tasks.
These utilities include endpoint autoconfiguration.
</para>
!Edrivers/usb/gadget/usbstring.c
!Edrivers/usb/gadget/config.c
<!-- !Edrivers/usb/gadget/epautoconf.c -->
</sect1>
</chapter>
<chapter id="controllers"><title>Peripheral Controller Drivers</title>
<para>The first hardware supporting this API was the NetChip 2280
controller, which supports USB 2.0 high speed and is based on PCI.
This is the <filename>net2280</filename> driver module.
The driver supports Linux kernel versions 2.4 and 2.6;
contact NetChip Technologies for development boards and product
information.
</para>
<para>Other hardware working in the "gadget" framework includes:
Intel's PXA 25x and IXP42x series processors
(<filename>pxa2xx_udc</filename>),
Toshiba TC86c001 "Goku-S" (<filename>goku_udc</filename>),
Renesas SH7705/7727 (<filename>sh_udc</filename>),
MediaQ 11xx (<filename>mq11xx_udc</filename>),
Hynix HMS30C7202 (<filename>h7202_udc</filename>),
National 9303/4 (<filename>n9604_udc</filename>),
Texas Instruments OMAP (<filename>omap_udc</filename>),
Sharp LH7A40x (<filename>lh7a40x_udc</filename>),
and more.
Most of those are full speed controllers.
</para>
<para>At this writing, there are people at work on drivers in
this framework for several other USB device controllers,
with plans to make many of them be widely available.
</para>
<!-- !Edrivers/usb/gadget/net2280.c -->
<para>A partial USB simulator,
the <filename>dummy_hcd</filename> driver, is available.
It can act like a net2280, a pxa25x, or an sa11x0 in terms
of available endpoints and device speeds; and it simulates
control, bulk, and to some extent interrupt transfers.
That lets you develop some parts of a gadget driver on a normal PC,
without any special hardware, and perhaps with the assistance
of tools such as GDB running with User Mode Linux.
At least one person has expressed interest in adapting that
approach, hooking it up to a simulator for a microcontroller.
Such simulators can help debug subsystems where the runtime hardware
is unfriendly to software development, or is not yet available.
</para>
<para>Support for other controllers is expected to be developed
and contributed
over time, as this driver framework evolves.
</para>
</chapter>
<chapter id="gadget"><title>Gadget Drivers</title>
<para>In addition to <emphasis>Gadget Zero</emphasis>
(used primarily for testing and development with drivers
for usb controller hardware), other gadget drivers exist.
</para>
<para>There's an <emphasis>ethernet</emphasis> gadget
driver, which implements one of the most useful
<emphasis>Communications Device Class</emphasis> (CDC) models.
One of the standards for cable modem interoperability even
specifies the use of this ethernet model as one of two
mandatory options.
Gadgets using this code look to a USB host as if they're
an Ethernet adapter.
It provides access to a network where the gadget's CPU is one host,
which could easily be bridging, routing, or firewalling
access to other networks.
Since some hardware can't fully implement the CDC Ethernet
requirements, this driver also implements a "good parts only"
subset of CDC Ethernet.
(That subset doesn't advertise itself as CDC Ethernet,
to avoid creating problems.)
</para>
<para>Support for Microsoft's <emphasis>RNDIS</emphasis>
protocol has been contributed by Pengutronix and Auerswald GmbH.
This is like CDC Ethernet, but it runs on more slightly USB hardware
(but less than the CDC subset).
However, its main claim to fame is being able to connect directly to
recent versions of Windows, using drivers that Microsoft bundles
and supports, making it much simpler to network with Windows.
</para>
<para>There is also support for user mode gadget drivers,
using <emphasis>gadgetfs</emphasis>.
This provides a <emphasis>User Mode API</emphasis> that presents
each endpoint as a single file descriptor. I/O is done using
normal <emphasis>read()</emphasis> and <emphasis>read()</emphasis> calls.
Familiar tools like GDB and pthreads can be used to
develop and debug user mode drivers, so that once a robust
controller driver is available many applications for it
won't require new kernel mode software.
Linux 2.6 <emphasis>Async I/O (AIO)</emphasis>
support is available, so that user mode software
can stream data with only slightly more overhead
than a kernel driver.
</para>
<para>There's a USB Mass Storage class driver, which provides
a different solution for interoperability with systems such
as MS-Windows and MacOS.
That <emphasis>File-backed Storage</emphasis> driver uses a
file or block device as backing store for a drive,
like the <filename>loop</filename> driver.
The USB host uses the BBB, CB, or CBI versions of the mass
storage class specification, using transparent SCSI commands
to access the data from the backing store.
</para>
<para>There's a "serial line" driver, useful for TTY style
operation over USB.
The latest version of that driver supports CDC ACM style
operation, like a USB modem, and so on most hardware it can
interoperate easily with MS-Windows.
One interesting use of that driver is in boot firmware (like a BIOS),
which can sometimes use that model with very small systems without
real serial lines.
</para>
<para>Support for other kinds of gadget is expected to
be developed and contributed
over time, as this driver framework evolves.
</para>
</chapter>
<chapter id="otg"><title>USB On-The-GO (OTG)</title>
<para>USB OTG support on Linux 2.6 was initially developed
by Texas Instruments for
<ulink url="http://www.omap.com">OMAP</ulink> 16xx and 17xx
series processors.
Other OTG systems should work in similar ways, but the
hardware level details could be very different.
</para>
<para>Systems need specialized hardware support to implement OTG,
notably including a special <emphasis>Mini-AB</emphasis> jack
and associated transciever to support <emphasis>Dual-Role</emphasis>
operation:
they can act either as a host, using the standard
Linux-USB host side driver stack,
or as a peripheral, using this "gadget" framework.
To do that, the system software relies on small additions
to those programming interfaces,
and on a new internal component (here called an "OTG Controller")
affecting which driver stack connects to the OTG port.
In each role, the system can re-use the existing pool of
hardware-neutral drivers, layered on top of the controller
driver interfaces (<emphasis>usb_bus</emphasis> or
<emphasis>usb_gadget</emphasis>).
Such drivers need at most minor changes, and most of the calls
added to support OTG can also benefit non-OTG products.
</para>
<itemizedlist>
<listitem><para>Gadget drivers test the <emphasis>is_otg</emphasis>
flag, and use it to determine whether or not to include
an OTG descriptor in each of their configurations.
</para></listitem>
<listitem><para>Gadget drivers may need changes to support the
two new OTG protocols, exposed in new gadget attributes
such as <emphasis>b_hnp_enable</emphasis> flag.
HNP support should be reported through a user interface
(two LEDs could suffice), and is triggered in some cases
when the host suspends the peripheral.
SRP support can be user-initiated just like remote wakeup,
probably by pressing the same button.
</para></listitem>
<listitem><para>On the host side, USB device drivers need
to be taught to trigger HNP at appropriate moments, using
<function>usb_suspend_device()</function>.
That also conserves battery power, which is useful even
for non-OTG configurations.
</para></listitem>
<listitem><para>Also on the host side, a driver must support the
OTG "Targeted Peripheral List". That's just a whitelist,
used to reject peripherals not supported with a given
Linux OTG host.
<emphasis>This whitelist is product-specific;
each product must modify <filename>otg_whitelist.h</filename>
to match its interoperability specification.
</emphasis>
</para>
<para>Non-OTG Linux hosts, like PCs and workstations,
normally have some solution for adding drivers, so that
peripherals that aren't recognized can eventually be supported.
That approach is unreasonable for consumer products that may
never have their firmware upgraded, and where it's usually
unrealistic to expect traditional PC/workstation/server kinds
of support model to work.
For example, it's often impractical to change device firmware
once the product has been distributed, so driver bugs can't
normally be fixed if they're found after shipment.
</para></listitem>
</itemizedlist>
<para>
Additional changes are needed below those hardware-neutral
<emphasis>usb_bus</emphasis> and <emphasis>usb_gadget</emphasis>
driver interfaces; those aren't discussed here in any detail.
Those affect the hardware-specific code for each USB Host or Peripheral
controller, and how the HCD initializes (since OTG can be active only
on a single port).
They also involve what may be called an <emphasis>OTG Controller
Driver</emphasis>, managing the OTG transceiver and the OTG state
machine logic as well as much of the root hub behavior for the
OTG port.
The OTG controller driver needs to activate and deactivate USB
controllers depending on the relevant device role.
Some related changes were needed inside usbcore, so that it
can identify OTG-capable devices and respond appropriately
to HNP or SRP protocols.
</para>
</chapter>
</book>
<!--
vim:syntax=sgml:sw=4
-->

View file

@ -0,0 +1,333 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="LinuxJBDAPI">
<bookinfo>
<title>The Linux Journalling API</title>
<authorgroup>
<author>
<firstname>Roger</firstname>
<surname>Gammans</surname>
<affiliation>
<address>
<email>rgammans@computer-surgery.co.uk</email>
</address>
</affiliation>
</author>
</authorgroup>
<authorgroup>
<author>
<firstname>Stephen</firstname>
<surname>Tweedie</surname>
<affiliation>
<address>
<email>sct@redhat.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2002</year>
<holder>Roger Gammans</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="Overview">
<title>Overview</title>
<sect1>
<title>Details</title>
<para>
The journalling layer is easy to use. You need to
first of all create a journal_t data structure. There are
two calls to do this dependent on how you decide to allocate the physical
media on which the journal resides. The journal_init_inode() call
is for journals stored in filesystem inodes, or the journal_init_dev()
call can be use for journal stored on a raw device (in a continuous range
of blocks). A journal_t is a typedef for a struct pointer, so when
you are finally finished make sure you call journal_destroy() on it
to free up any used kernel memory.
</para>
<para>
Once you have got your journal_t object you need to 'mount' or load the journal
file, unless of course you haven't initialised it yet - in which case you
need to call journal_create().
</para>
<para>
Most of the time however your journal file will already have been created, but
before you load it you must call journal_wipe() to empty the journal file.
Hang on, you say , what if the filesystem wasn't cleanly umount()'d . Well, it is the
job of the client file system to detect this and skip the call to journal_wipe().
</para>
<para>
In either case the next call should be to journal_load() which prepares the
journal file for use. Note that journal_wipe(..,0) calls journal_skip_recovery()
for you if it detects any outstanding transactions in the journal and similarly
journal_load() will call journal_recover() if necessary.
I would advise reading fs/ext3/super.c for examples on this stage.
[RGG: Why is the journal_wipe() call necessary - doesn't this needlessly
complicate the API. Or isn't a good idea for the journal layer to hide
dirty mounts from the client fs]
</para>
<para>
Now you can go ahead and start modifying the underlying
filesystem. Almost.
</para>
<para>
You still need to actually journal your filesystem changes, this
is done by wrapping them into transactions. Additionally you
also need to wrap the modification of each of the the buffers
with calls to the journal layer, so it knows what the modifications
you are actually making are. To do this use journal_start() which
returns a transaction handle.
</para>
<para>
journal_start()
and its counterpart journal_stop(), which indicates the end of a transaction
are nestable calls, so you can reenter a transaction if necessary,
but remember you must call journal_stop() the same number of times as
journal_start() before the transaction is completed (or more accurately
leaves the the update phase). Ext3/VFS makes use of this feature to simplify
quota support.
</para>
<para>
Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call journal_get_{create,write,undo}_access() as appropriate,
this allows the journalling layer to copy the unmodified data if it
needs to. After all the buffer may be part of a previously uncommitted
transaction.
At this point you are at last ready to modify a buffer, and once
you are have done so you need to call journal_dirty_{meta,}data().
Or if you've asked for access to a buffer you now know is now longer
required to be pushed back on the device you can call journal_forget()
in much the same way as you might have used bforget() in the past.
</para>
<para>
A journal_flush() may be called at any time to commit and checkpoint
all your transactions.
</para>
<para>
Then at umount time , in your put_super() (2.4) or write_super() (2.5)
you can then call journal_destroy() to clean up your in-core journal object.
</para>
<para>
Unfortunately there a couple of ways the journal layer can cause a deadlock.
The first thing to note is that each task can only have
a single outstanding transaction at any one time, remember nothing
commits until the outermost journal_stop(). This means
you must complete the transaction at the end of each file/inode/address
etc. operation you perform, so that the journalling system isn't re-entered
on another journal. Since transactions can't be nested/batched
across differing journals, and another filesystem other than
yours (say ext3) may be modified in a later syscall.
</para>
<para>
The second case to bear in mind is that journal_start() can
block if there isn't enough space in the journal for your transaction
(based on the passed nblocks param) - when it blocks it merely(!) needs to
wait for transactions to complete and be committed from other tasks,
so essentially we are waiting for journal_stop(). So to avoid
deadlocks you must treat journal_start/stop() as if they
were semaphores and include them in your semaphore ordering rules to prevent
deadlocks. Note that journal_extend() has similar blocking behaviour to
journal_start() so you can deadlock here just as easily as on journal_start().
</para>
<para>
Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this transaction.
I advise having a look at at least ext3_jbd.h to see the basis on which
ext3 uses to make these decisions.
</para>
<para>
Another wriggle to watch out for is your on-disk block allocation strategy.
why? Because, if you undo a delete, you need to ensure you haven't reused any
of the freed blocks in a later transaction. One simple way of doing this
is make sure any blocks you allocate only have checkpointed transactions
listed against them. Ext3 does this in ext3_test_allocatable().
</para>
<para>
Lock is also providing through journal_{un,}lock_updates(),
ext3 uses this when it wants a window with a clean and stable fs for a moment.
eg.
</para>
<programlisting>
journal_lock_updates() //stop new stuff happening..
journal_flush() // checkpoint everything.
..do stuff on stable fs
journal_unlock_updates() // carry on with filesystem use.
</programlisting>
<para>
The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing these
calls.
</para>
<para>
A new feature of jbd since 2.5.25 is commit callbacks with the new
journal_callback_set() function you can now ask the journalling layer
to call you back when the transaction is finally committed to disk, so that
you can do some of your own management. The key to this is the journal_callback
struct, this maintains the internal callback information but you can
extend it like this:-
</para>
<programlisting>
struct myfs_callback_s {
//Data structure element required by jbd..
struct journal_callback for_jbd;
// Stuff for myfs allocated together.
myfs_inode* i_commited;
}
</programlisting>
<para>
this would be useful if you needed to know when data was committed to a
particular inode.
</para>
</sect1>
<sect1>
<title>Summary</title>
<para>
Using the journal is a matter of wrapping the different context changes,
being each mount, each modification (transaction) and each changed buffer
to tell the journalling layer about them.
</para>
<para>
Here is a some pseudo code to give you an idea of how it works, as
an example.
</para>
<programlisting>
journal_t* my_jnrl = journal_create();
journal_init_{dev,inode}(jnrl,...)
if (clean) journal_wipe();
journal_load();
foreach(transaction) { /*transactions must be
completed before
a syscall returns to
userspace*/
handle_t * xct=journal_start(my_jnrl);
foreach(bh) {
journal_get_{create,write,undo}_access(xact,bh);
if ( myfs_modify(bh) ) { /* returns true
if makes changes */
journal_dirty_{meta,}data(xact,bh);
} else {
journal_forget(bh);
}
}
journal_stop(xct);
}
journal_destroy(my_jrnl);
</programlisting>
</sect1>
</chapter>
<chapter id="adt">
<title>Data Types</title>
<para>
The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD layer you can
just rely on the using the pointer as a magic cookie of some sort.
Obviously the hiding is not enforced as this is 'C'.
</para>
<sect1><title>Structures</title>
!Iinclude/linux/jbd.h
</sect1>
</chapter>
<chapter id="calls">
<title>Functions</title>
<para>
The functions here are split into two groups those that
affect a journal as a whole, and those which are used to
manage transactions
</para>
<sect1><title>Journal Level</title>
!Efs/jbd/journal.c
!Efs/jbd/recovery.c
</sect1>
<sect1><title>Transasction Level</title>
!Efs/jbd/transaction.c
</sect1>
</chapter>
<chapter>
<title>See also</title>
<para>
<citation>
<ulink url="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/journal-design.ps.gz">
Journaling the Linux ext2fs Filesystem,LinuxExpo 98, Stephen Tweedie
</ulink>
</citation>
</para>
<para>
<citation>
<ulink url="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html">
Ext3 Journalling FileSystem , OLS 2000, Dr. Stephen Tweedie
</ulink>
</citation>
</para>
</chapter>
</book>

View file

@ -0,0 +1,342 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="LinuxKernelAPI">
<bookinfo>
<title>The Linux Kernel API</title>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="Basics">
<title>Driver Basics</title>
<sect1><title>Driver Entry and Exit points</title>
!Iinclude/linux/init.h
</sect1>
<sect1><title>Atomic and pointer manipulation</title>
!Iinclude/asm-i386/atomic.h
!Iinclude/asm-i386/unaligned.h
</sect1>
<!-- FIXME:
kernel/sched.c has no docs, which stuffs up the sgml. Comment
out until somebody adds docs. KAO
<sect1><title>Delaying, scheduling, and timer routines</title>
X!Ekernel/sched.c
</sect1>
KAO -->
</chapter>
<chapter id="adt">
<title>Data Types</title>
<sect1><title>Doubly Linked Lists</title>
!Iinclude/linux/list.h
</sect1>
</chapter>
<chapter id="libc">
<title>Basic C Library Functions</title>
<para>
When writing drivers, you cannot in general use routines which are
from the C Library. Some of the functions have been found generally
useful and they are listed below. The behaviour of these functions
may vary slightly from those defined by ANSI, and these deviations
are noted in the text.
</para>
<sect1><title>String Conversions</title>
!Ilib/vsprintf.c
!Elib/vsprintf.c
</sect1>
<sect1><title>String Manipulation</title>
!Ilib/string.c
!Elib/string.c
</sect1>
<sect1><title>Bit Operations</title>
!Iinclude/asm-i386/bitops.h
</sect1>
</chapter>
<chapter id="mm">
<title>Memory Management in Linux</title>
<sect1><title>The Slab Cache</title>
!Emm/slab.c
</sect1>
<sect1><title>User Space Memory Access</title>
!Iinclude/asm-i386/uaccess.h
!Iarch/i386/lib/usercopy.c
</sect1>
</chapter>
<chapter id="kfifo">
<title>FIFO Buffer</title>
<sect1><title>kfifo interface</title>
!Iinclude/linux/kfifo.h
!Ekernel/kfifo.c
</sect1>
</chapter>
<chapter id="proc">
<title>The proc filesystem</title>
<sect1><title>sysctl interface</title>
!Ekernel/sysctl.c
</sect1>
</chapter>
<chapter id="debugfs">
<title>The debugfs filesystem</title>
<sect1><title>debugfs interface</title>
!Efs/debugfs/inode.c
!Efs/debugfs/file.c
</sect1>
</chapter>
<chapter id="vfs">
<title>The Linux VFS</title>
<sect1><title>The Directory Cache</title>
!Efs/dcache.c
!Iinclude/linux/dcache.h
</sect1>
<sect1><title>Inode Handling</title>
!Efs/inode.c
!Efs/bad_inode.c
</sect1>
<sect1><title>Registration and Superblocks</title>
!Efs/super.c
</sect1>
<sect1><title>File Locks</title>
!Efs/locks.c
!Ifs/locks.c
</sect1>
</chapter>
<chapter id="netcore">
<title>Linux Networking</title>
<sect1><title>Socket Buffer Functions</title>
!Iinclude/linux/skbuff.h
!Enet/core/skbuff.c
</sect1>
<sect1><title>Socket Filter</title>
!Enet/core/filter.c
</sect1>
<sect1><title>Generic Network Statistics</title>
!Iinclude/linux/gen_stats.h
!Enet/core/gen_stats.c
!Enet/core/gen_estimator.c
</sect1>
</chapter>
<chapter id="netdev">
<title>Network device support</title>
<sect1><title>Driver Support</title>
!Enet/core/dev.c
</sect1>
<sect1><title>8390 Based Network Cards</title>
!Edrivers/net/8390.c
</sect1>
<sect1><title>Synchronous PPP</title>
!Edrivers/net/wan/syncppp.c
</sect1>
</chapter>
<chapter id="modload">
<title>Module Support</title>
<sect1><title>Module Loading</title>
!Ekernel/kmod.c
</sect1>
<sect1><title>Inter Module support</title>
<para>
Refer to the file kernel/module.c for more information.
</para>
<!-- FIXME: Removed for now since no structured comments in source
X!Ekernel/module.c
-->
</sect1>
</chapter>
<chapter id="hardware">
<title>Hardware Interfaces</title>
<sect1><title>Interrupt Handling</title>
!Iarch/i386/kernel/irq.c
</sect1>
<sect1><title>MTRR Handling</title>
!Earch/i386/kernel/cpu/mtrr/main.c
</sect1>
<sect1><title>PCI Support Library</title>
!Edrivers/pci/pci.c
</sect1>
<sect1><title>PCI Hotplug Support Library</title>
!Edrivers/pci/hotplug/pci_hotplug_core.c
</sect1>
<sect1><title>MCA Architecture</title>
<sect2><title>MCA Device Functions</title>
<para>
Refer to the file arch/i386/kernel/mca.c for more information.
</para>
<!-- FIXME: Removed for now since no structured comments in source
X!Earch/i386/kernel/mca.c
-->
</sect2>
<sect2><title>MCA Bus DMA</title>
!Iinclude/asm-i386/mca_dma.h
</sect2>
</sect1>
</chapter>
<chapter id="devfs">
<title>The Device File System</title>
!Efs/devfs/base.c
</chapter>
<chapter id="security">
<title>Security Framework</title>
!Esecurity/security.c
</chapter>
<chapter id="pmfuncs">
<title>Power Management</title>
!Ekernel/power/pm.c
</chapter>
<chapter id="blkdev">
<title>Block Devices</title>
!Edrivers/block/ll_rw_blk.c
</chapter>
<chapter id="miscdev">
<title>Miscellaneous Devices</title>
!Edrivers/char/misc.c
</chapter>
<chapter id="viddev">
<title>Video4Linux</title>
!Edrivers/media/video/videodev.c
</chapter>
<chapter id="snddev">
<title>Sound Devices</title>
!Esound/sound_core.c
<!-- FIXME: Removed for now since no structured comments in source
X!Isound/sound_firmware.c
-->
</chapter>
<chapter id="uart16x50">
<title>16x50 UART Driver</title>
!Edrivers/serial/serial_core.c
!Edrivers/serial/8250.c
</chapter>
<chapter id="z85230">
<title>Z85230 Support Library</title>
!Edrivers/net/wan/z85230.c
</chapter>
<chapter id="fbdev">
<title>Frame Buffer Library</title>
<para>
The frame buffer drivers depend heavily on four data structures.
These structures are declared in include/linux/fb.h. They are
fb_info, fb_var_screeninfo, fb_fix_screeninfo and fb_monospecs.
The last three can be made available to and from userland.
</para>
<para>
fb_info defines the current state of a particular video card.
Inside fb_info, there exists a fb_ops structure which is a
collection of needed functions to make fbdev and fbcon work.
fb_info is only visible to the kernel.
</para>
<para>
fb_var_screeninfo is used to describe the features of a video card
that are user defined. With fb_var_screeninfo, things such as
depth and the resolution may be defined.
</para>
<para>
The next structure is fb_fix_screeninfo. This defines the
properties of a card that are created when a mode is set and can't
be changed otherwise. A good example of this is the start of the
frame buffer memory. This "locks" the address of the frame buffer
memory, so that it cannot be changed or moved.
</para>
<para>
The last structure is fb_monospecs. In the old API, there was
little importance for fb_monospecs. This allowed for forbidden things
such as setting a mode of 800x600 on a fix frequency monitor. With
the new API, fb_monospecs prevents such things, and if used
correctly, can prevent a monitor from being cooked. fb_monospecs
will not be useful until kernels 2.5.x.
</para>
<sect1><title>Frame Buffer Memory</title>
!Edrivers/video/fbmem.c
</sect1>
<sect1><title>Frame Buffer Console</title>
!Edrivers/video/console/fbcon.c
</sect1>
<sect1><title>Frame Buffer Colormap</title>
!Edrivers/video/fbcmap.c
</sect1>
<!-- FIXME:
drivers/video/fbgen.c has no docs, which stuffs up the sgml. Comment
out until somebody adds docs. KAO
<sect1><title>Frame Buffer Generic Functions</title>
X!Idrivers/video/fbgen.c
</sect1>
KAO -->
<sect1><title>Frame Buffer Video Mode Database</title>
!Idrivers/video/modedb.c
!Edrivers/video/modedb.c
</sect1>
<sect1><title>Frame Buffer Macintosh Video Mode Database</title>
!Idrivers/video/macmodes.c
</sect1>
<sect1><title>Frame Buffer Fonts</title>
<para>
Refer to the file drivers/video/console/fonts.c for more information.
</para>
<!-- FIXME: Removed for now since no structured comments in source
X!Idrivers/video/console/fonts.c
-->
</sect1>
</chapter>
</book>

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,282 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="libataDevGuide">
<bookinfo>
<title>libATA Developer's Guide</title>
<authorgroup>
<author>
<firstname>Jeff</firstname>
<surname>Garzik</surname>
</author>
</authorgroup>
<copyright>
<year>2003</year>
<holder>Jeff Garzik</holder>
</copyright>
<legalnotice>
<para>
The contents of this file are subject to the Open
Software License version 1.1 that can be found at
<ulink url="http://www.opensource.org/licenses/osl-1.1.txt">http://www.opensource.org/licenses/osl-1.1.txt</ulink> and is included herein
by reference.
</para>
<para>
Alternatively, the contents of this file may be used under the terms
of the GNU General Public License version 2 (the "GPL") as distributed
in the kernel source COPYING file, in which case the provisions of
the GPL are applicable instead of the above. If you wish to allow
the use of your version of this file only under the terms of the
GPL and not to allow others to use your version of this file under
the OSL, indicate your decision by deleting the provisions above and
replace them with the notice and other provisions required by the GPL.
If you do not delete the provisions above, a recipient may use your
version of this file under either the OSL or the GPL.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="libataThanks">
<title>Thanks</title>
<para>
The bulk of the ATA knowledge comes thanks to long conversations with
Andre Hedrick (www.linux-ide.org).
</para>
<para>
Thanks to Alan Cox for pointing out similarities
between SATA and SCSI, and in general for motivation to hack on
libata.
</para>
<para>
libata's device detection
method, ata_pio_devchk, and in general all the early probing was
based on extensive study of Hale Landis's probe/reset code in his
ATADRVR driver (www.ata-atapi.com).
</para>
</chapter>
<chapter id="libataDriverApi">
<title>libata Driver API</title>
<sect1>
<title>struct ata_port_operations</title>
<programlisting>
void (*port_disable) (struct ata_port *);
</programlisting>
<para>
Called from ata_bus_probe() and ata_bus_reset() error paths,
as well as when unregistering from the SCSI module (rmmod, hot
unplug).
</para>
<programlisting>
void (*dev_config) (struct ata_port *, struct ata_device *);
</programlisting>
<para>
Called after IDENTIFY [PACKET] DEVICE is issued to each device
found. Typically used to apply device-specific fixups prior to
issue of SET FEATURES - XFER MODE, and prior to operation.
</para>
<programlisting>
void (*set_piomode) (struct ata_port *, struct ata_device *);
void (*set_dmamode) (struct ata_port *, struct ata_device *);
void (*post_set_mode) (struct ata_port *ap);
</programlisting>
<para>
Hooks called prior to the issue of SET FEATURES - XFER MODE
command. dev->pio_mode is guaranteed to be valid when
->set_piomode() is called, and dev->dma_mode is guaranteed to be
valid when ->set_dmamode() is called. ->post_set_mode() is
called unconditionally, after the SET FEATURES - XFER MODE
command completes successfully.
</para>
<para>
->set_piomode() is always called (if present), but
->set_dma_mode() is only called if DMA is possible.
</para>
<programlisting>
void (*tf_load) (struct ata_port *ap, struct ata_taskfile *tf);
void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
</programlisting>
<para>
->tf_load() is called to load the given taskfile into hardware
registers / DMA buffers. ->tf_read() is called to read the
hardware registers / DMA buffers, to obtain the current set of
taskfile register values.
</para>
<programlisting>
void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
</programlisting>
<para>
causes an ATA command, previously loaded with
->tf_load(), to be initiated in hardware.
</para>
<programlisting>
u8 (*check_status)(struct ata_port *ap);
void (*dev_select)(struct ata_port *ap, unsigned int device);
</programlisting>
<para>
Reads the Status ATA shadow register from hardware. On some
hardware, this has the side effect of clearing the interrupt
condition.
</para>
<programlisting>
void (*dev_select)(struct ata_port *ap, unsigned int device);
</programlisting>
<para>
Issues the low-level hardware command(s) that causes one of N
hardware devices to be considered 'selected' (active and
available for use) on the ATA bus.
</para>
<programlisting>
void (*phy_reset) (struct ata_port *ap);
</programlisting>
<para>
The very first step in the probe phase. Actions vary depending
on the bus type, typically. After waking up the device and probing
for device presence (PATA and SATA), typically a soft reset
(SRST) will be performed. Drivers typically use the helper
functions ata_bus_reset() or sata_phy_reset() for this hook.
</para>
<programlisting>
void (*bmdma_setup) (struct ata_queued_cmd *qc);
void (*bmdma_start) (struct ata_queued_cmd *qc);
</programlisting>
<para>
When setting up an IDE BMDMA transaction, these hooks arm
(->bmdma_setup) and fire (->bmdma_start) the hardware's DMA
engine.
</para>
<programlisting>
void (*qc_prep) (struct ata_queued_cmd *qc);
int (*qc_issue) (struct ata_queued_cmd *qc);
</programlisting>
<para>
Higher-level hooks, these two hooks can potentially supercede
several of the above taskfile/DMA engine hooks. ->qc_prep is
called after the buffers have been DMA-mapped, and is typically
used to populate the hardware's DMA scatter-gather table.
Most drivers use the standard ata_qc_prep() helper function, but
more advanced drivers roll their own.
</para>
<para>
->qc_issue is used to make a command active, once the hardware
and S/G tables have been prepared. IDE BMDMA drivers use the
helper function ata_qc_issue_prot() for taskfile protocol-based
dispatch. More advanced drivers roll their own ->qc_issue
implementation, using this as the "issue new ATA command to
hardware" hook.
</para>
<programlisting>
void (*eng_timeout) (struct ata_port *ap);
</programlisting>
<para>
This is a high level error handling function, called from the
error handling thread, when a command times out.
</para>
<programlisting>
irqreturn_t (*irq_handler)(int, void *, struct pt_regs *);
void (*irq_clear) (struct ata_port *);
</programlisting>
<para>
->irq_handler is the interrupt handling routine registered with
the system, by libata. ->irq_clear is called during probe just
before the interrupt handler is registered, to be sure hardware
is quiet.
</para>
<programlisting>
u32 (*scr_read) (struct ata_port *ap, unsigned int sc_reg);
void (*scr_write) (struct ata_port *ap, unsigned int sc_reg,
u32 val);
</programlisting>
<para>
Read and write standard SATA phy registers. Currently only used
if ->phy_reset hook called the sata_phy_reset() helper function.
</para>
<programlisting>
int (*port_start) (struct ata_port *ap);
void (*port_stop) (struct ata_port *ap);
void (*host_stop) (struct ata_host_set *host_set);
</programlisting>
<para>
->port_start() is called just after the data structures for each
port are initialized. Typically this is used to alloc per-port
DMA buffers / tables / rings, enable DMA engines, and similar
tasks.
</para>
<para>
->host_stop() is called when the rmmod or hot unplug process
begins. The hook must stop all hardware interrupts, DMA
engines, etc.
</para>
<para>
->port_stop() is called after ->host_stop(). It's sole function
is to release DMA/memory resources, now that they are no longer
actively being used.
</para>
</sect1>
</chapter>
<chapter id="libataExt">
<title>libata Library</title>
!Edrivers/scsi/libata-core.c
</chapter>
<chapter id="libataInt">
<title>libata Core Internals</title>
!Idrivers/scsi/libata-core.c
</chapter>
<chapter id="libataScsiInt">
<title>libata SCSI translation/emulation</title>
!Edrivers/scsi/libata-scsi.c
!Idrivers/scsi/libata-scsi.c
</chapter>
<chapter id="PiixInt">
<title>ata_piix Internals</title>
!Idrivers/scsi/ata_piix.c
</chapter>
<chapter id="SILInt">
<title>sata_sil Internals</title>
!Idrivers/scsi/sata_sil.c
</chapter>
</book>

View file

@ -0,0 +1,289 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Reed-Solomon-Library-Guide">
<bookinfo>
<title>Reed-Solomon Library Programming Interface</title>
<authorgroup>
<author>
<firstname>Thomas</firstname>
<surname>Gleixner</surname>
<affiliation>
<address>
<email>tglx@linutronix.de</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2004</year>
<holder>Thomas Gleixner</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2 as published by the Free Software Foundation.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
The generic Reed-Solomon Library provides encoding, decoding
and error correction functions.
</para>
<para>
Reed-Solomon codes are used in communication and storage
applications to ensure data integrity.
</para>
<para>
This documentation is provided for developers who want to utilize
the functions provided by the library.
</para>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
<para>
None.
</para>
</chapter>
<chapter id="usage">
<title>Usage</title>
<para>
This chapter provides examples how to use the library.
</para>
<sect1>
<title>Initializing</title>
<para>
The init function init_rs returns a pointer to a
rs decoder structure, which holds the necessary
information for encoding, decoding and error correction
with the given polynomial. It either uses an existing
matching decoder or creates a new one. On creation all
the lookup tables for fast en/decoding are created.
The function may take a while, so make sure not to
call it in critical code paths.
</para>
<programlisting>
/* the Reed Solomon control structure */
static struct rs_control *rs_decoder;
/* Symbolsize is 10 (bits)
* Primitve polynomial is x^10+x^3+1
* first consecutive root is 0
* primitve element to generate roots = 1
* generator polinomial degree (number of roots) = 6
*/
rs_decoder = init_rs (10, 0x409, 0, 1, 6);
</programlisting>
</sect1>
<sect1>
<title>Encoding</title>
<para>
The encoder calculates the Reed-Solomon code over
the given data length and stores the result in
the parity buffer. Note that the parity buffer must
be initialized before calling the encoder.
</para>
<para>
The expanded data can be inverted on the fly by
providing a non zero inversion mask. The expanded data is
XOR'ed with the mask. This is used e.g. for FLASH
ECC, where the all 0xFF is inverted to an all 0x00.
The Reed-Solomon code for all 0x00 is all 0x00. The
code is inverted before storing to FLASH so it is 0xFF
too. This prevent's that reading from an erased FLASH
results in ECC errors.
</para>
<para>
The databytes are expanded to the given symbol size
on the fly. There is no support for encoding continuous
bitstreams with a symbol size != 8 at the moment. If
it is necessary it should be not a big deal to implement
such functionality.
</para>
<programlisting>
/* Parity buffer. Size = number of roots */
uint16_t par[6];
/* Initialize the parity buffer */
memset(par, 0, sizeof(par));
/* Encode 512 byte in data8. Store parity in buffer par */
encode_rs8 (rs_decoder, data8, 512, par, 0);
</programlisting>
</sect1>
<sect1>
<title>Decoding</title>
<para>
The decoder calculates the syndrome over
the given data length and the received parity symbols
and corrects errors in the data.
</para>
<para>
If a syndrome is available from a hardware decoder
then the syndrome calculation is skipped.
</para>
<para>
The correction of the data buffer can be suppressed
by providing a correction pattern buffer and an error
location buffer to the decoder. The decoder stores the
calculated error location and the correction bitmask
in the given buffers. This is useful for hardware
decoders which use a weird bit ordering scheme.
</para>
<para>
The databytes are expanded to the given symbol size
on the fly. There is no support for decoding continuous
bitstreams with a symbolsize != 8 at the moment. If
it is necessary it should be not a big deal to implement
such functionality.
</para>
<sect2>
<title>
Decoding with syndrome calculation, direct data correction
</title>
<programlisting>
/* Parity buffer. Size = number of roots */
uint16_t par[6];
uint8_t data[512];
int numerr;
/* Receive data */
.....
/* Receive parity */
.....
/* Decode 512 byte in data8.*/
numerr = decode_rs8 (rs_decoder, data8, par, 512, NULL, 0, NULL, 0, NULL);
</programlisting>
</sect2>
<sect2>
<title>
Decoding with syndrome given by hardware decoder, direct data correction
</title>
<programlisting>
/* Parity buffer. Size = number of roots */
uint16_t par[6], syn[6];
uint8_t data[512];
int numerr;
/* Receive data */
.....
/* Receive parity */
.....
/* Get syndrome from hardware decoder */
.....
/* Decode 512 byte in data8.*/
numerr = decode_rs8 (rs_decoder, data8, par, 512, syn, 0, NULL, 0, NULL);
</programlisting>
</sect2>
<sect2>
<title>
Decoding with syndrome given by hardware decoder, no direct data correction.
</title>
<para>
Note: It's not necessary to give data and received parity to the decoder.
</para>
<programlisting>
/* Parity buffer. Size = number of roots */
uint16_t par[6], syn[6], corr[8];
uint8_t data[512];
int numerr, errpos[8];
/* Receive data */
.....
/* Receive parity */
.....
/* Get syndrome from hardware decoder */
.....
/* Decode 512 byte in data8.*/
numerr = decode_rs8 (rs_decoder, NULL, NULL, 512, syn, 0, errpos, 0, corr);
for (i = 0; i &lt; numerr; i++) {
do_error_correction_in_your_buffer(errpos[i], corr[i]);
}
</programlisting>
</sect2>
</sect1>
<sect1>
<title>Cleanup</title>
<para>
The function free_rs frees the allocated resources,
if the caller is the last user of the decoder.
</para>
<programlisting>
/* Release resources */
free_rs(rs_decoder);
</programlisting>
</sect1>
</chapter>
<chapter id="structs">
<title>Structures</title>
<para>
This chapter contains the autogenerated documentation of the structures which are
used in the Reed-Solomon Library and are relevant for a developer.
</para>
!Iinclude/linux/rslib.h
</chapter>
<chapter id="pubfunctions">
<title>Public Functions Provided</title>
<para>
This chapter contains the autogenerated documentation of the Reed-Solomon functions
which are exported.
</para>
!Elib/reed_solomon/reed_solomon.c
</chapter>
<chapter id="credits">
<title>Credits</title>
<para>
The library code for encoding and decoding was written by Phil Karn.
</para>
<programlisting>
Copyright 2002, Phil Karn, KA9Q
May be used under the terms of the GNU General Public License (GPL)
</programlisting>
<para>
The wrapper functions and interfaces are written by Thomas Gleixner
</para>
<para>
Many users have provided bugfixes, improvements and helping hands for testing.
Thanks a lot.
</para>
<para>
The following people have contributed to this document:
</para>
<para>
Thomas Gleixner<email>tglx@linutronix.de</email>
</para>
</chapter>
</book>

View file

@ -0,0 +1,265 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<article class="whitepaper" id="LinuxSecurityModule" lang="en">
<articleinfo>
<title>Linux Security Modules: General Security Hooks for Linux</title>
<authorgroup>
<author>
<firstname>Stephen</firstname>
<surname>Smalley</surname>
<affiliation>
<orgname>NAI Labs</orgname>
<address><email>ssmalley@nai.com</email></address>
</affiliation>
</author>
<author>
<firstname>Timothy</firstname>
<surname>Fraser</surname>
<affiliation>
<orgname>NAI Labs</orgname>
<address><email>tfraser@nai.com</email></address>
</affiliation>
</author>
<author>
<firstname>Chris</firstname>
<surname>Vance</surname>
<affiliation>
<orgname>NAI Labs</orgname>
<address><email>cvance@nai.com</email></address>
</affiliation>
</author>
</authorgroup>
</articleinfo>
<sect1><title>Introduction</title>
<para>
In March 2001, the National Security Agency (NSA) gave a presentation
about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel
Summit. SELinux is an implementation of flexible and fine-grained
nondiscretionary access controls in the Linux kernel, originally
implemented as its own particular kernel patch. Several other
security projects (e.g. RSBAC, Medusa) have also developed flexible
access control architectures for the Linux kernel, and various
projects have developed particular access control models for Linux
(e.g. LIDS, DTE, SubDomain). Each project has developed and
maintained its own kernel patch to support its security needs.
</para>
<para>
In response to the NSA presentation, Linus Torvalds made a set of
remarks that described a security framework he would be willing to
consider for inclusion in the mainstream Linux kernel. He described a
general framework that would provide a set of security hooks to
control operations on kernel objects and a set of opaque security
fields in kernel data structures for maintaining security attributes.
This framework could then be used by loadable kernel modules to
implement any desired model of security. Linus also suggested the
possibility of migrating the Linux capabilities code into such a
module.
</para>
<para>
The Linux Security Modules (LSM) project was started by WireX to
develop such a framework. LSM is a joint development effort by
several security projects, including Immunix, SELinux, SGI and Janus,
and several individuals, including Greg Kroah-Hartman and James
Morris, to develop a Linux kernel patch that implements this
framework. The patch is currently tracking the 2.4 series and is
targeted for integration into the 2.5 development series. This
technical report provides an overview of the framework and the example
capabilities security module provided by the LSM kernel patch.
</para>
</sect1>
<sect1 id="framework"><title>LSM Framework</title>
<para>
The LSM kernel patch provides a general kernel framework to support
security modules. In particular, the LSM framework is primarily
focused on supporting access control modules, although future
development is likely to address other security needs such as
auditing. By itself, the framework does not provide any additional
security; it merely provides the infrastructure to support security
modules. The LSM kernel patch also moves most of the capabilities
logic into an optional security module, with the system defaulting
to the traditional superuser logic. This capabilities module
is discussed further in <xref linkend="cap"/>.
</para>
<para>
The LSM kernel patch adds security fields to kernel data structures
and inserts calls to hook functions at critical points in the kernel
code to manage the security fields and to perform access control. It
also adds functions for registering and unregistering security
modules, and adds a general <function>security</function> system call
to support new system calls for security-aware applications.
</para>
<para>
The LSM security fields are simply <type>void*</type> pointers. For
process and program execution security information, security fields
were added to <structname>struct task_struct</structname> and
<structname>struct linux_binprm</structname>. For filesystem security
information, a security field was added to
<structname>struct super_block</structname>. For pipe, file, and socket
security information, security fields were added to
<structname>struct inode</structname> and
<structname>struct file</structname>. For packet and network device security
information, security fields were added to
<structname>struct sk_buff</structname> and
<structname>struct net_device</structname>. For System V IPC security
information, security fields were added to
<structname>struct kern_ipc_perm</structname> and
<structname>struct msg_msg</structname>; additionally, the definitions
for <structname>struct msg_msg</structname>, <structname>struct
msg_queue</structname>, and <structname>struct
shmid_kernel</structname> were moved to header files
(<filename>include/linux/msg.h</filename> and
<filename>include/linux/shm.h</filename> as appropriate) to allow
the security modules to use these definitions.
</para>
<para>
Each LSM hook is a function pointer in a global table,
security_ops. This table is a
<structname>security_operations</structname> structure as defined by
<filename>include/linux/security.h</filename>. Detailed documentation
for each hook is included in this header file. At present, this
structure consists of a collection of substructures that group related
hooks based on the kernel object (e.g. task, inode, file, sk_buff,
etc) as well as some top-level hook function pointers for system
operations. This structure is likely to be flattened in the future
for performance. The placement of the hook calls in the kernel code
is described by the "called:" lines in the per-hook documentation in
the header file. The hook calls can also be easily found in the
kernel code by looking for the string "security_ops->".
</para>
<para>
Linus mentioned per-process security hooks in his original remarks as a
possible alternative to global security hooks. However, if LSM were
to start from the perspective of per-process hooks, then the base
framework would have to deal with how to handle operations that
involve multiple processes (e.g. kill), since each process might have
its own hook for controlling the operation. This would require a
general mechanism for composing hooks in the base framework.
Additionally, LSM would still need global hooks for operations that
have no process context (e.g. network input operations).
Consequently, LSM provides global security hooks, but a security
module is free to implement per-process hooks (where that makes sense)
by storing a security_ops table in each process' security field and
then invoking these per-process hooks from the global hooks.
The problem of composition is thus deferred to the module.
</para>
<para>
The global security_ops table is initialized to a set of hook
functions provided by a dummy security module that provides
traditional superuser logic. A <function>register_security</function>
function (in <filename>security/security.c</filename>) is provided to
allow a security module to set security_ops to refer to its own hook
functions, and an <function>unregister_security</function> function is
provided to revert security_ops to the dummy module hooks. This
mechanism is used to set the primary security module, which is
responsible for making the final decision for each hook.
</para>
<para>
LSM also provides a simple mechanism for stacking additional security
modules with the primary security module. It defines
<function>register_security</function> and
<function>unregister_security</function> hooks in the
<structname>security_operations</structname> structure and provides
<function>mod_reg_security</function> and
<function>mod_unreg_security</function> functions that invoke these
hooks after performing some sanity checking. A security module can
call these functions in order to stack with other modules. However,
the actual details of how this stacking is handled are deferred to the
module, which can implement these hooks in any way it wishes
(including always returning an error if it does not wish to support
stacking). In this manner, LSM again defers the problem of
composition to the module.
</para>
<para>
Although the LSM hooks are organized into substructures based on
kernel object, all of the hooks can be viewed as falling into two
major categories: hooks that are used to manage the security fields
and hooks that are used to perform access control. Examples of the
first category of hooks include the
<function>alloc_security</function> and
<function>free_security</function> hooks defined for each kernel data
structure that has a security field. These hooks are used to allocate
and free security structures for kernel objects. The first category
of hooks also includes hooks that set information in the security
field after allocation, such as the <function>post_lookup</function>
hook in <structname>struct inode_security_ops</structname>. This hook
is used to set security information for inodes after successful lookup
operations. An example of the second category of hooks is the
<function>permission</function> hook in
<structname>struct inode_security_ops</structname>. This hook checks
permission when accessing an inode.
</para>
</sect1>
<sect1 id="cap"><title>LSM Capabilities Module</title>
<para>
The LSM kernel patch moves most of the existing POSIX.1e capabilities
logic into an optional security module stored in the file
<filename>security/capability.c</filename>. This change allows
users who do not want to use capabilities to omit this code entirely
from their kernel, instead using the dummy module for traditional
superuser logic or any other module that they desire. This change
also allows the developers of the capabilities logic to maintain and
enhance their code more freely, without needing to integrate patches
back into the base kernel.
</para>
<para>
In addition to moving the capabilities logic, the LSM kernel patch
could move the capability-related fields from the kernel data
structures into the new security fields managed by the security
modules. However, at present, the LSM kernel patch leaves the
capability fields in the kernel data structures. In his original
remarks, Linus suggested that this might be preferable so that other
security modules can be easily stacked with the capabilities module
without needing to chain multiple security structures on the security field.
It also avoids imposing extra overhead on the capabilities module
to manage the security fields. However, the LSM framework could
certainly support such a move if it is determined to be desirable,
with only a few additional changes described below.
</para>
<para>
At present, the capabilities logic for computing process capabilities
on <function>execve</function> and <function>set*uid</function>,
checking capabilities for a particular process, saving and checking
capabilities for netlink messages, and handling the
<function>capget</function> and <function>capset</function> system
calls have been moved into the capabilities module. There are still a
few locations in the base kernel where capability-related fields are
directly examined or modified, but the current version of the LSM
patch does allow a security module to completely replace the
assignment and testing of capabilities. These few locations would
need to be changed if the capability-related fields were moved into
the security field. The following is a list of known locations that
still perform such direct examination or modification of
capability-related fields:
<itemizedlist>
<listitem><para><filename>fs/open.c</filename>:<function>sys_access</function></para></listitem>
<listitem><para><filename>fs/lockd/host.c</filename>:<function>nlm_bind_host</function></para></listitem>
<listitem><para><filename>fs/nfsd/auth.c</filename>:<function>nfsd_setuser</function></para></listitem>
<listitem><para><filename>fs/proc/array.c</filename>:<function>task_cap</function></para></listitem>
</itemizedlist>
</para>
</sect1>
</article>

View file

@ -0,0 +1,3 @@
# Rules are put in Documentation/DocBook
clean-files := *.9.gz *.sgml manpage.links manpage.refs

View file

@ -0,0 +1,107 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="MCAGuide">
<bookinfo>
<title>MCA Driver Programming Interface</title>
<authorgroup>
<author>
<firstname>Alan</firstname>
<surname>Cox</surname>
<affiliation>
<address>
<email>alan@redhat.com</email>
</address>
</affiliation>
</author>
<author>
<firstname>David</firstname>
<surname>Weinehall</surname>
</author>
<author>
<firstname>Chris</firstname>
<surname>Beauregard</surname>
</author>
</authorgroup>
<copyright>
<year>2000</year>
<holder>Alan Cox</holder>
<holder>David Weinehall</holder>
<holder>Chris Beauregard</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
The MCA bus functions provide a generalised interface to find MCA
bus cards, to claim them for a driver, and to read and manipulate POS
registers without being aware of the motherboard internals or
certain deep magic specific to onboard devices.
</para>
<para>
The basic interface to the MCA bus devices is the slot. Each slot
is numbered and virtual slot numbers are assigned to the internal
devices. Using a pci_dev as other busses do does not really make
sense in the MCA context as the MCA bus resources require card
specific interpretation.
</para>
<para>
Finally the MCA bus functions provide a parallel set of DMA
functions mimicing the ISA bus DMA functions as closely as possible,
although also supporting the additional DMA functionality on the
MCA bus controllers.
</para>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
<para>
None.
</para>
</chapter>
<chapter id="pubfunctions">
<title>Public Functions Provided</title>
!Earch/i386/kernel/mca.c
</chapter>
<chapter id="dmafunctions">
<title>DMA Functions Provided</title>
!Iinclude/asm-i386/mca_dma.h
</chapter>
</book>

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,591 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
<!ENTITY procfsexample SYSTEM "procfs_example.xml">
]>
<book id="LKProcfsGuide">
<bookinfo>
<title>Linux Kernel Procfs Guide</title>
<authorgroup>
<author>
<firstname>Erik</firstname>
<othername>(J.A.K.)</othername>
<surname>Mouw</surname>
<affiliation>
<orgname>Delft University of Technology</orgname>
<orgdiv>Faculty of Information Technology and Systems</orgdiv>
<address>
<email>J.A.K.Mouw@its.tudelft.nl</email>
<pob>PO BOX 5031</pob>
<postcode>2600 GA</postcode>
<city>Delft</city>
<country>The Netherlands</country>
</address>
</affiliation>
</author>
</authorgroup>
<revhistory>
<revision>
<revnumber>1.0&nbsp;</revnumber>
<date>May 30, 2001</date>
<revremark>Initial revision posted to linux-kernel</revremark>
</revision>
<revision>
<revnumber>1.1&nbsp;</revnumber>
<date>June 3, 2001</date>
<revremark>Revised after comments from linux-kernel</revremark>
</revision>
</revhistory>
<copyright>
<year>2001</year>
<holder>Erik Mouw</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This documentation is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc>
</toc>
<preface>
<title>Preface</title>
<para>
This guide describes the use of the procfs file system from
within the Linux kernel. The idea to write this guide came up on
the #kernelnewbies IRC channel (see <ulink
url="http://www.kernelnewbies.org/">http://www.kernelnewbies.org/</ulink>),
when Jeff Garzik explained the use of procfs and forwarded me a
message Alexander Viro wrote to the linux-kernel mailing list. I
agreed to write it up nicely, so here it is.
</para>
<para>
I'd like to thank Jeff Garzik
<email>jgarzik@pobox.com</email> and Alexander Viro
<email>viro@parcelfarce.linux.theplanet.co.uk</email> for their input,
Tim Waugh <email>twaugh@redhat.com</email> for his <ulink
url="http://people.redhat.com/twaugh/docbook/selfdocbook/">Selfdocbook</ulink>,
and Marc Joosen <email>marcj@historia.et.tudelft.nl</email> for
proofreading.
</para>
<para>
This documentation was written while working on the LART
computing board (<ulink
url="http://www.lart.tudelft.nl/">http://www.lart.tudelft.nl/</ulink>),
which is sponsored by the Mobile Multi-media Communications
(<ulink
url="http://www.mmc.tudelft.nl/">http://www.mmc.tudelft.nl/</ulink>)
and Ubiquitous Communications (<ulink
url="http://www.ubicom.tudelft.nl/">http://www.ubicom.tudelft.nl/</ulink>)
projects.
</para>
<para>
Erik
</para>
</preface>
<chapter id="intro">
<title>Introduction</title>
<para>
The <filename class="directory">/proc</filename> file system
(procfs) is a special file system in the linux kernel. It's a
virtual file system: it is not associated with a block device
but exists only in memory. The files in the procfs are there to
allow userland programs access to certain information from the
kernel (like process information in <filename
class="directory">/proc/[0-9]+/</filename>), but also for debug
purposes (like <filename>/proc/ksyms</filename>).
</para>
<para>
This guide describes the use of the procfs file system from
within the Linux kernel. It starts by introducing all relevant
functions to manage the files within the file system. After that
it shows how to communicate with userland, and some tips and
tricks will be pointed out. Finally a complete example will be
shown.
</para>
<para>
Note that the files in <filename
class="directory">/proc/sys</filename> are sysctl files: they
don't belong to procfs and are governed by a completely
different API described in the Kernel API book.
</para>
</chapter>
<chapter id="managing">
<title>Managing procfs entries</title>
<para>
This chapter describes the functions that various kernel
components use to populate the procfs with files, symlinks,
device nodes, and directories.
</para>
<para>
A minor note before we start: if you want to use any of the
procfs functions, be sure to include the correct header file!
This should be one of the first lines in your code:
</para>
<programlisting>
#include &lt;linux/proc_fs.h&gt;
</programlisting>
<sect1 id="regularfile">
<title>Creating a regular file</title>
<funcsynopsis>
<funcprototype>
<funcdef>struct proc_dir_entry* <function>create_proc_entry</function></funcdef>
<paramdef>const char* <parameter>name</parameter></paramdef>
<paramdef>mode_t <parameter>mode</parameter></paramdef>
<paramdef>struct proc_dir_entry* <parameter>parent</parameter></paramdef>
</funcprototype>
</funcsynopsis>
<para>
This function creates a regular file with the name
<parameter>name</parameter>, file mode
<parameter>mode</parameter> in the directory
<parameter>parent</parameter>. To create a file in the root of
the procfs, use <constant>NULL</constant> as
<parameter>parent</parameter> parameter. When successful, the
function will return a pointer to the freshly created
<structname>struct proc_dir_entry</structname>; otherwise it
will return <constant>NULL</constant>. <xref
linkend="userland"/> describes how to do something useful with
regular files.
</para>
<para>
Note that it is specifically supported that you can pass a
path that spans multiple directories. For example
<function>create_proc_entry</function>(<parameter>"drivers/via0/info"</parameter>)
will create the <filename class="directory">via0</filename>
directory if necessary, with standard
<constant>0755</constant> permissions.
</para>
<para>
If you only want to be able to read the file, the function
<function>create_proc_read_entry</function> described in <xref
linkend="convenience"/> may be used to create and initialise
the procfs entry in one single call.
</para>
</sect1>
<sect1>
<title>Creating a symlink</title>
<funcsynopsis>
<funcprototype>
<funcdef>struct proc_dir_entry*
<function>proc_symlink</function></funcdef> <paramdef>const
char* <parameter>name</parameter></paramdef>
<paramdef>struct proc_dir_entry*
<parameter>parent</parameter></paramdef> <paramdef>const
char* <parameter>dest</parameter></paramdef>
</funcprototype>
</funcsynopsis>
<para>
This creates a symlink in the procfs directory
<parameter>parent</parameter> that points from
<parameter>name</parameter> to
<parameter>dest</parameter>. This translates in userland to
<literal>ln -s</literal> <parameter>dest</parameter>
<parameter>name</parameter>.
</para>
</sect1>
<sect1>
<title>Creating a directory</title>
<funcsynopsis>
<funcprototype>
<funcdef>struct proc_dir_entry* <function>proc_mkdir</function></funcdef>
<paramdef>const char* <parameter>name</parameter></paramdef>
<paramdef>struct proc_dir_entry* <parameter>parent</parameter></paramdef>
</funcprototype>
</funcsynopsis>
<para>
Create a directory <parameter>name</parameter> in the procfs
directory <parameter>parent</parameter>.
</para>
</sect1>
<sect1>
<title>Removing an entry</title>
<funcsynopsis>
<funcprototype>
<funcdef>void <function>remove_proc_entry</function></funcdef>
<paramdef>const char* <parameter>name</parameter></paramdef>
<paramdef>struct proc_dir_entry* <parameter>parent</parameter></paramdef>
</funcprototype>
</funcsynopsis>
<para>
Removes the entry <parameter>name</parameter> in the directory
<parameter>parent</parameter> from the procfs. Entries are
removed by their <emphasis>name</emphasis>, not by the
<structname>struct proc_dir_entry</structname> returned by the
various create functions. Note that this function doesn't
recursively remove entries.
</para>
<para>
Be sure to free the <structfield>data</structfield> entry from
the <structname>struct proc_dir_entry</structname> before
<function>remove_proc_entry</function> is called (that is: if
there was some <structfield>data</structfield> allocated, of
course). See <xref linkend="usingdata"/> for more information
on using the <structfield>data</structfield> entry.
</para>
</sect1>
</chapter>
<chapter id="userland">
<title>Communicating with userland</title>
<para>
Instead of reading (or writing) information directly from
kernel memory, procfs works with <emphasis>call back
functions</emphasis> for files: functions that are called when
a specific file is being read or written. Such functions have
to be initialised after the procfs file is created by setting
the <structfield>read_proc</structfield> and/or
<structfield>write_proc</structfield> fields in the
<structname>struct proc_dir_entry*</structname> that the
function <function>create_proc_entry</function> returned:
</para>
<programlisting>
struct proc_dir_entry* entry;
entry->read_proc = read_proc_foo;
entry->write_proc = write_proc_foo;
</programlisting>
<para>
If you only want to use a the
<structfield>read_proc</structfield>, the function
<function>create_proc_read_entry</function> described in <xref
linkend="convenience"/> may be used to create and initialise the
procfs entry in one single call.
</para>
<sect1>
<title>Reading data</title>
<para>
The read function is a call back function that allows userland
processes to read data from the kernel. The read function
should have the following format:
</para>
<funcsynopsis>
<funcprototype>
<funcdef>int <function>read_func</function></funcdef>
<paramdef>char* <parameter>page</parameter></paramdef>
<paramdef>char** <parameter>start</parameter></paramdef>
<paramdef>off_t <parameter>off</parameter></paramdef>
<paramdef>int <parameter>count</parameter></paramdef>
<paramdef>int* <parameter>eof</parameter></paramdef>
<paramdef>void* <parameter>data</parameter></paramdef>
</funcprototype>
</funcsynopsis>
<para>
The read function should write its information into the
<parameter>page</parameter>. For proper use, the function
should start writing at an offset of
<parameter>off</parameter> in <parameter>page</parameter> and
write at most <parameter>count</parameter> bytes, but because
most read functions are quite simple and only return a small
amount of information, these two parameters are usually
ignored (it breaks pagers like <literal>more</literal> and
<literal>less</literal>, but <literal>cat</literal> still
works).
</para>
<para>
If the <parameter>off</parameter> and
<parameter>count</parameter> parameters are properly used,
<parameter>eof</parameter> should be used to signal that the
end of the file has been reached by writing
<literal>1</literal> to the memory location
<parameter>eof</parameter> points to.
</para>
<para>
The parameter <parameter>start</parameter> doesn't seem to be
used anywhere in the kernel. The <parameter>data</parameter>
parameter can be used to create a single call back function for
several files, see <xref linkend="usingdata"/>.
</para>
<para>
The <function>read_func</function> function must return the
number of bytes written into the <parameter>page</parameter>.
</para>
<para>
<xref linkend="example"/> shows how to use a read call back
function.
</para>
</sect1>
<sect1>
<title>Writing data</title>
<para>
The write call back function allows a userland process to write
data to the kernel, so it has some kind of control over the
kernel. The write function should have the following format:
</para>
<funcsynopsis>
<funcprototype>
<funcdef>int <function>write_func</function></funcdef>
<paramdef>struct file* <parameter>file</parameter></paramdef>
<paramdef>const char* <parameter>buffer</parameter></paramdef>
<paramdef>unsigned long <parameter>count</parameter></paramdef>
<paramdef>void* <parameter>data</parameter></paramdef>
</funcprototype>
</funcsynopsis>
<para>
The write function should read <parameter>count</parameter>
bytes at maximum from the <parameter>buffer</parameter>. Note
that the <parameter>buffer</parameter> doesn't live in the
kernel's memory space, so it should first be copied to kernel
space with <function>copy_from_user</function>. The
<parameter>file</parameter> parameter is usually
ignored. <xref linkend="usingdata"/> shows how to use the
<parameter>data</parameter> parameter.
</para>
<para>
Again, <xref linkend="example"/> shows how to use this call back
function.
</para>
</sect1>
<sect1 id="usingdata">
<title>A single call back for many files</title>
<para>
When a large number of almost identical files is used, it's
quite inconvenient to use a separate call back function for
each file. A better approach is to have a single call back
function that distinguishes between the files by using the
<structfield>data</structfield> field in <structname>struct
proc_dir_entry</structname>. First of all, the
<structfield>data</structfield> field has to be initialised:
</para>
<programlisting>
struct proc_dir_entry* entry;
struct my_file_data *file_data;
file_data = kmalloc(sizeof(struct my_file_data), GFP_KERNEL);
entry->data = file_data;
</programlisting>
<para>
The <structfield>data</structfield> field is a <type>void
*</type>, so it can be initialised with anything.
</para>
<para>
Now that the <structfield>data</structfield> field is set, the
<function>read_proc</function> and
<function>write_proc</function> can use it to distinguish
between files because they get it passed into their
<parameter>data</parameter> parameter:
</para>
<programlisting>
int foo_read_func(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
int len;
if(data == file_data) {
/* special case for this file */
} else {
/* normal processing */
}
return len;
}
</programlisting>
<para>
Be sure to free the <structfield>data</structfield> data field
when removing the procfs entry.
</para>
</sect1>
</chapter>
<chapter id="tips">
<title>Tips and tricks</title>
<sect1 id="convenience">
<title>Convenience functions</title>
<funcsynopsis>
<funcprototype>
<funcdef>struct proc_dir_entry* <function>create_proc_read_entry</function></funcdef>
<paramdef>const char* <parameter>name</parameter></paramdef>
<paramdef>mode_t <parameter>mode</parameter></paramdef>
<paramdef>struct proc_dir_entry* <parameter>parent</parameter></paramdef>
<paramdef>read_proc_t* <parameter>read_proc</parameter></paramdef>
<paramdef>void* <parameter>data</parameter></paramdef>
</funcprototype>
</funcsynopsis>
<para>
This function creates a regular file in exactly the same way
as <function>create_proc_entry</function> from <xref
linkend="regularfile"/> does, but also allows to set the read
function <parameter>read_proc</parameter> in one call. This
function can set the <parameter>data</parameter> as well, like
explained in <xref linkend="usingdata"/>.
</para>
</sect1>
<sect1>
<title>Modules</title>
<para>
If procfs is being used from within a module, be sure to set
the <structfield>owner</structfield> field in the
<structname>struct proc_dir_entry</structname> to
<constant>THIS_MODULE</constant>.
</para>
<programlisting>
struct proc_dir_entry* entry;
entry->owner = THIS_MODULE;
</programlisting>
</sect1>
<sect1>
<title>Mode and ownership</title>
<para>
Sometimes it is useful to change the mode and/or ownership of
a procfs entry. Here is an example that shows how to achieve
that:
</para>
<programlisting>
struct proc_dir_entry* entry;
entry->mode = S_IWUSR |S_IRUSR | S_IRGRP | S_IROTH;
entry->uid = 0;
entry->gid = 100;
</programlisting>
</sect1>
</chapter>
<chapter id="example">
<title>Example</title>
<!-- be careful with the example code: it shouldn't be wider than
approx. 60 columns, or otherwise it won't fit properly on a page
-->
&procfsexample;
</chapter>
</book>

View file

@ -0,0 +1,224 @@
/*
* procfs_example.c: an example proc interface
*
* Copyright (C) 2001, Erik Mouw (J.A.K.Mouw@its.tudelft.nl)
*
* This file accompanies the procfs-guide in the Linux kernel
* source. Its main use is to demonstrate the concepts and
* functions described in the guide.
*
* This software has been developed while working on the LART
* computing board (http://www.lart.tudelft.nl/), which is
* sponsored by the Mobile Multi-media Communications
* (http://www.mmc.tudelft.nl/) and Ubiquitous Communications
* (http://www.ubicom.tudelft.nl/) projects.
*
* The author can be reached at:
*
* Erik Mouw
* Information and Communication Theory Group
* Faculty of Information Technology and Systems
* Delft University of Technology
* P.O. Box 5031
* 2600 GA Delft
* The Netherlands
*
*
* This program is free software; you can redistribute
* it and/or modify it under the terms of the GNU General
* Public License as published by the Free Software
* Foundation; either version 2 of the License, or (at your
* option) any later version.
*
* This program is distributed in the hope that it will be
* useful, but WITHOUT ANY WARRANTY; without even the implied
* warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
* PURPOSE. See the GNU General Public License for more
* details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place,
* Suite 330, Boston, MA 02111-1307 USA
*
*/
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <linux/jiffies.h>
#include <asm/uaccess.h>
#define MODULE_VERS "1.0"
#define MODULE_NAME "procfs_example"
#define FOOBAR_LEN 8
struct fb_data_t {
char name[FOOBAR_LEN + 1];
char value[FOOBAR_LEN + 1];
};
static struct proc_dir_entry *example_dir, *foo_file,
*bar_file, *jiffies_file, *symlink;
struct fb_data_t foo_data, bar_data;
static int proc_read_jiffies(char *page, char **start,
off_t off, int count,
int *eof, void *data)
{
int len;
len = sprintf(page, "jiffies = %ld\n",
jiffies);
return len;
}
static int proc_read_foobar(char *page, char **start,
off_t off, int count,
int *eof, void *data)
{
int len;
struct fb_data_t *fb_data = (struct fb_data_t *)data;
/* DON'T DO THAT - buffer overruns are bad */
len = sprintf(page, "%s = '%s'\n",
fb_data->name, fb_data->value);
return len;
}
static int proc_write_foobar(struct file *file,
const char *buffer,
unsigned long count,
void *data)
{
int len;
struct fb_data_t *fb_data = (struct fb_data_t *)data;
if(count > FOOBAR_LEN)
len = FOOBAR_LEN;
else
len = count;
if(copy_from_user(fb_data->value, buffer, len))
return -EFAULT;
fb_data->value[len] = '\0';
return len;
}
static int __init init_procfs_example(void)
{
int rv = 0;
/* create directory */
example_dir = proc_mkdir(MODULE_NAME, NULL);
if(example_dir == NULL) {
rv = -ENOMEM;
goto out;
}
example_dir->owner = THIS_MODULE;
/* create jiffies using convenience function */
jiffies_file = create_proc_read_entry("jiffies",
0444, example_dir,
proc_read_jiffies,
NULL);
if(jiffies_file == NULL) {
rv = -ENOMEM;
goto no_jiffies;
}
jiffies_file->owner = THIS_MODULE;
/* create foo and bar files using same callback
* functions
*/
foo_file = create_proc_entry("foo", 0644, example_dir);
if(foo_file == NULL) {
rv = -ENOMEM;
goto no_foo;
}
strcpy(foo_data.name, "foo");
strcpy(foo_data.value, "foo");
foo_file->data = &foo_data;
foo_file->read_proc = proc_read_foobar;
foo_file->write_proc = proc_write_foobar;
foo_file->owner = THIS_MODULE;
bar_file = create_proc_entry("bar", 0644, example_dir);
if(bar_file == NULL) {
rv = -ENOMEM;
goto no_bar;
}
strcpy(bar_data.name, "bar");
strcpy(bar_data.value, "bar");
bar_file->data = &bar_data;
bar_file->read_proc = proc_read_foobar;
bar_file->write_proc = proc_write_foobar;
bar_file->owner = THIS_MODULE;
/* create symlink */
symlink = proc_symlink("jiffies_too", example_dir,
"jiffies");
if(symlink == NULL) {
rv = -ENOMEM;
goto no_symlink;
}
symlink->owner = THIS_MODULE;
/* everything OK */
printk(KERN_INFO "%s %s initialised\n",
MODULE_NAME, MODULE_VERS);
return 0;
no_symlink:
remove_proc_entry("tty", example_dir);
no_tty:
remove_proc_entry("bar", example_dir);
no_bar:
remove_proc_entry("foo", example_dir);
no_foo:
remove_proc_entry("jiffies", example_dir);
no_jiffies:
remove_proc_entry(MODULE_NAME, NULL);
out:
return rv;
}
static void __exit cleanup_procfs_example(void)
{
remove_proc_entry("jiffies_too", example_dir);
remove_proc_entry("tty", example_dir);
remove_proc_entry("bar", example_dir);
remove_proc_entry("foo", example_dir);
remove_proc_entry("jiffies", example_dir);
remove_proc_entry(MODULE_NAME, NULL);
printk(KERN_INFO "%s %s removed\n",
MODULE_NAME, MODULE_VERS);
}
module_init(init_procfs_example);
module_exit(cleanup_procfs_example);
MODULE_AUTHOR("Erik Mouw");
MODULE_DESCRIPTION("procfs examples");

View file

@ -0,0 +1,193 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="scsidrivers">
<bookinfo>
<title>SCSI Subsystem Interfaces</title>
<authorgroup>
<author>
<firstname>Douglas</firstname>
<surname>Gilbert</surname>
<affiliation>
<address>
<email>dgilbert@interlog.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<pubdate>2003-08-11</pubdate>
<copyright>
<year>2002</year>
<year>2003</year>
<holder>Douglas Gilbert</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
This document outlines the interface between the Linux scsi mid level
and lower level drivers. Lower level drivers are variously called HBA
(host bus adapter) drivers, host drivers (HD) or pseudo adapter drivers.
The latter alludes to the fact that a lower level driver may be a
bridge to another IO subsystem (and the "ide-scsi" driver is an example
of this). There can be many lower level drivers active in a running
system, but only one per hardware type. For example, the aic7xxx driver
controls adaptec controllers based on the 7xxx chip series. Most lower
level drivers can control one or more scsi hosts (a.k.a. scsi initiators).
</para>
<para>
This document can been found in an ASCII text file in the linux kernel
source: <filename>Documentation/scsi/scsi_mid_low_api.txt</filename> .
It currently hold a little more information than this document. The
<filename>drivers/scsi/hosts.h</filename> and <filename>
drivers/scsi/scsi.h</filename> headers contain descriptions of members
of important structures for the scsi subsystem.
</para>
</chapter>
<chapter id="driver-struct">
<title>Driver structure</title>
<para>
Traditionally a lower level driver for the scsi subsystem has been
at least two files in the drivers/scsi directory. For example, a
driver called "xyz" has a header file "xyz.h" and a source file
"xyz.c". [Actually there is no good reason why this couldn't all
be in one file.] Some drivers that have been ported to several operating
systems (e.g. aic7xxx which has separate files for generic and
OS-specific code) have more than two files. Such drivers tend to have
their own directory under the drivers/scsi directory.
</para>
<para>
scsi_module.c is normally included at the end of a lower
level driver. For it to work a declaration like this is needed before
it is included:
<programlisting>
static Scsi_Host_Template driver_template = DRIVER_TEMPLATE;
/* DRIVER_TEMPLATE should contain pointers to supported interface
functions. Scsi_Host_Template is defined hosts.h */
#include "scsi_module.c"
</programlisting>
</para>
<para>
The scsi_module.c assumes the name "driver_template" is appropriately
defined. It contains 2 functions:
<orderedlist>
<listitem><para>
init_this_scsi_driver() called during builtin and module driver
initialization: invokes mid level's scsi_register_host()
</para></listitem>
<listitem><para>
exit_this_scsi_driver() called during closedown: invokes
mid level's scsi_unregister_host()
</para></listitem>
</orderedlist>
</para>
<para>
When a new, lower level driver is being added to Linux, the following
files (all found in the drivers/scsi directory) will need some attention:
Makefile, Config.help and Config.in . It is probably best to look at what
an existing lower level driver does in this regard.
</para>
</chapter>
<chapter id="intfunctions">
<title>Interface Functions</title>
!EDocumentation/scsi/scsi_mid_low_api.txt
</chapter>
<chapter id="locks">
<title>Locks</title>
<para>
Each Scsi_Host instance has a spin_lock called Scsi_Host::default_lock
which is initialized in scsi_register() [found in hosts.c]. Within the
same function the Scsi_Host::host_lock pointer is initialized to point
at default_lock with the scsi_assign_lock() function. Thereafter
lock and unlock operations performed by the mid level use the
Scsi_Host::host_lock pointer.
</para>
<para>
Lower level drivers can override the use of Scsi_Host::default_lock by
using scsi_assign_lock(). The earliest opportunity to do this would
be in the detect() function after it has invoked scsi_register(). It
could be replaced by a coarser grain lock (e.g. per driver) or a
lock of equal granularity (i.e. per host). Using finer grain locks
(e.g. per scsi device) may be possible by juggling locks in
queuecommand().
</para>
</chapter>
<chapter id="changes">
<title>Changes since lk 2.4 series</title>
<para>
io_request_lock has been replaced by several finer grained locks. The lock
relevant to lower level drivers is Scsi_Host::host_lock and there is one
per scsi host.
</para>
<para>
The older error handling mechanism has been removed. This means the
lower level interface functions abort() and reset() have been removed.
</para>
<para>
In the 2.4 series the scsi subsystem configuration descriptions were
aggregated with the configuration descriptions from all other Linux
subsystems in the Documentation/Configure.help file. In the 2.5 series,
the scsi subsystem now has its own (much smaller) drivers/scsi/Config.help
file.
</para>
</chapter>
<chapter id="credits">
<title>Credits</title>
<para>
The following people have contributed to this document:
<orderedlist>
<listitem><para>
Mike Anderson <email>andmike@us.ibm.com</email>
</para></listitem>
<listitem><para>
James Bottomley <email>James.Bottomley@steeleye.com</email>
</para></listitem>
<listitem><para>
Patrick Mansfield <email>patmans@us.ibm.com</email>
</para></listitem>
</orderedlist>
</para>
</chapter>
</book>

View file

@ -0,0 +1,585 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="SiS900Guide">
<bookinfo>
<title>SiS 900/7016 Fast Ethernet Device Driver</title>
<authorgroup>
<author>
<firstname>Ollie</firstname>
<surname>Lho</surname>
</author>
<author>
<firstname>Lei Chun</firstname>
<surname>Chang</surname>
</author>
</authorgroup>
<edition>Document Revision: 0.3 for SiS900 driver v1.06 &amp; v1.07</edition>
<pubdate>November 16, 2000</pubdate>
<copyright>
<year>1999</year>
<holder>Silicon Integrated System Corp.</holder>
</copyright>
<legalnotice>
<para>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
</para>
<para>
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
</para>
</legalnotice>
<abstract>
<para>
This document gives some information on installation and usage of SiS 900/7016
device driver under Linux.
</para>
</abstract>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
This document describes the revision 1.06 and 1.07 of SiS 900/7016 Fast Ethernet
device driver under Linux. The driver is developed by Silicon Integrated
System Corp. and distributed freely under the GNU General Public License (GPL).
The driver can be compiled as a loadable module and used under Linux kernel
version 2.2.x. (rev. 1.06)
With minimal changes, the driver can also be used under 2.3.x and 2.4.x kernel
(rev. 1.07), please see
<xref linkend="install"/>. If you are intended to
use the driver for earlier kernels, you are on your own.
</para>
<para>
The driver is tested with usual TCP/IP applications including
FTP, Telnet, Netscape etc. and is used constantly by the developers.
</para>
<para>
Please send all comments/fixes/questions to
<ulink url="mailto:lcchang@sis.com.tw">Lei-Chun Chang</ulink>.
</para>
</chapter>
<chapter id="changes">
<title>Changes</title>
<para>
Changes made in Revision 1.07
<orderedlist>
<listitem>
<para>
Separation of sis900.c and sis900.h in order to move most
constant definition to sis900.h (many of those constants were
corrected)
</para>
</listitem>
<listitem>
<para>
Clean up PCI detection, the pci-scan from Donald Becker were not used,
just simple pci&lowbar;find&lowbar;*.
</para>
</listitem>
<listitem>
<para>
MII detection is modified to support multiple mii transceiver.
</para>
</listitem>
<listitem>
<para>
Bugs in read&lowbar;eeprom, mdio&lowbar;* were removed.
</para>
</listitem>
<listitem>
<para>
Lot of sis900 irrelevant comments were removed/changed and
more comments were added to reflect the real situation.
</para>
</listitem>
<listitem>
<para>
Clean up of physical/virtual address space mess in buffer
descriptors.
</para>
</listitem>
<listitem>
<para>
Better transmit/receive error handling.
</para>
</listitem>
<listitem>
<para>
The driver now uses zero-copy single buffer management
scheme to improve performance.
</para>
</listitem>
<listitem>
<para>
Names of variables were changed to be more consistent.
</para>
</listitem>
<listitem>
<para>
Clean up of auo-negotiation and timer code.
</para>
</listitem>
<listitem>
<para>
Automatic detection and change of PHY on the fly.
</para>
</listitem>
<listitem>
<para>
Bug in mac probing fixed.
</para>
</listitem>
<listitem>
<para>
Fix 630E equalier problem by modifying the equalizer workaround rule.
</para>
</listitem>
<listitem>
<para>
Support for ICS1893 10/100 Interated PHYceiver.
</para>
</listitem>
<listitem>
<para>
Support for media select by ifconfig.
</para>
</listitem>
<listitem>
<para>
Added kernel-doc extratable documentation.
</para>
</listitem>
</orderedlist>
</para>
</chapter>
<chapter id="tested">
<title>Tested Environment</title>
<para>
This driver is developed on the following hardware
<itemizedlist>
<listitem>
<para>
Intel Celeron 500 with SiS 630 (rev 02) chipset
</para>
</listitem>
<listitem>
<para>
SiS 900 (rev 01) and SiS 7016/7014 Fast Ethernet Card
</para>
</listitem>
</itemizedlist>
and tested with these software environments
<itemizedlist>
<listitem>
<para>
Red Hat Linux version 6.2
</para>
</listitem>
<listitem>
<para>
Linux kernel version 2.4.0
</para>
</listitem>
<listitem>
<para>
Netscape version 4.6
</para>
</listitem>
<listitem>
<para>
NcFTP 3.0.0 beta 18
</para>
</listitem>
<listitem>
<para>
Samba version 2.0.3
</para>
</listitem>
</itemizedlist>
</para>
</chapter>
<chapter id="files">
<title>Files in This Package</title>
<para>
In the package you can find these files:
</para>
<para>
<variablelist>
<varlistentry>
<term>sis900.c</term>
<listitem>
<para>
Driver source file in C
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>sis900.h</term>
<listitem>
<para>
Header file for sis900.c
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>sis900.sgml</term>
<listitem>
<para>
DocBook SGML source of the document
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>sis900.txt</term>
<listitem>
<para>
Driver document in plain text
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</chapter>
<chapter id="install">
<title>Installation</title>
<para>
Silicon Integrated System Corp. is cooperating closely with core Linux Kernel
developers. The revisions of SiS 900 driver are distributed by the usuall channels
for kernel tar files and patches. Those kernel tar files for official kernel and
patches for kernel pre-release can be download at
<ulink url="http://ftp.kernel.org/pub/linux/kernel/">official kernel ftp site</ulink>
and its mirrors.
The 1.06 revision can be found in kernel version later than 2.3.15 and pre-2.2.14,
and 1.07 revision can be found in kernel version 2.4.0.
If you have no prior experience in networking under Linux, please read
<ulink url="http://www.tldp.org/">Ethernet HOWTO</ulink> and
<ulink url="http://www.tldp.org/">Networking HOWTO</ulink> available from
Linux Documentation Project (LDP).
</para>
<para>
The driver is bundled in release later than 2.2.11 and 2.3.15 so this
is the most easy case.
Be sure you have the appropriate packages for compiling kernel source.
Those packages are listed in Document/Changes in kernel source
distribution. If you have to install the driver other than those bundled
in kernel release, you should have your driver file
<filename>sis900.c</filename> and <filename>sis900.h</filename>
copied into <filename class="directory">/usr/src/linux/drivers/net/</filename> first.
There are two alternative ways to install the driver
</para>
<sect1>
<title>Building the driver as loadable module</title>
<para>
To build the driver as a loadable kernel module you have to reconfigure
the kernel to activate network support by
</para>
<para><screen>
make menuconfig
</screen></para>
<para>
Choose <quote>Loadable module support ---></quote>,
then select <quote>Enable loadable module support</quote>.
</para>
<para>
Choose <quote>Network Device Support ---></quote>, select
<quote>Ethernet (10 or 100Mbit)</quote>.
Then select <quote>EISA, VLB, PCI and on board controllers</quote>,
and choose <quote>SiS 900/7016 PCI Fast Ethernet Adapter support</quote>
to <quote>M</quote>.
</para>
<para>
After reconfiguring the kernel, you can make the driver module by
</para>
<para><screen>
make modules
</screen></para>
<para>
The driver should be compiled with no errors. After compiling the driver,
the driver can be installed to proper place by
</para>
<para><screen>
make modules_install
</screen></para>
<para>
Load the driver into kernel by
</para>
<para><screen>
insmod sis900
</screen></para>
<para>
When loading the driver into memory, some information message can be view by
</para>
<para>
<screen>
dmesg
</screen>
or
<screen>
cat /var/log/message
</screen>
</para>
<para>
If the driver is loaded properly you will have messages similar to this:
</para>
<para><screen>
sis900.c: v1.07.06 11/07/2000
eth0: SiS 900 PCI Fast Ethernet at 0xd000, IRQ 10, 00:00:e8:83:7f:a4.
eth0: SiS 900 Internal MII PHY transceiver found at address 1.
eth0: Using SiS 900 Internal MII PHY as default
</screen></para>
<para>
showing the version of the driver and the results of probing routine.
</para>
<para>
Once the driver is loaded, network can be brought up by
</para>
<para><screen>
/sbin/ifconfig eth0 IPADDR broadcast BROADCAST netmask NETMASK media TYPE
</screen></para>
<para>
where IPADDR, BROADCAST, NETMASK are your IP address, broadcast address and
netmask respectively. TYPE is used to set medium type used by the device.
Typical values are "10baseT"(twisted-pair 10Mbps Ethernet) or "100baseT"
(twisted-pair 100Mbps Ethernet). For more information on how to configure
network interface, please refer to
<ulink url="http://www.tldp.org/">Networking HOWTO</ulink>.
</para>
<para>
The link status is also shown by kernel messages. For example, after the
network interface is activated, you may have the message:
</para>
<para><screen>
eth0: Media Link On 100mbps full-duplex
</screen></para>
<para>
If you try to unplug the twist pair (TP) cable you will get
</para>
<para><screen>
eth0: Media Link Off
</screen></para>
<para>
indicating that the link is failed.
</para>
</sect1>
<sect1>
<title>Building the driver into kernel</title>
<para>
If you want to make the driver into kernel, choose <quote>Y</quote>
rather than <quote>M</quote> on
<quote>SiS 900/7016 PCI Fast Ethernet Adapter support</quote>
when configuring the kernel. Build the kernel image in the usual way
</para>
<para><screen>
make clean
make bzlilo
</screen></para>
<para>
Next time the system reboot, you have the driver in memory.
</para>
</sect1>
</chapter>
<chapter id="problems">
<title>Known Problems and Bugs</title>
<para>
There are some known problems and bugs. If you find any other bugs please
mail to <ulink url="mailto:lcchang@sis.com.tw">lcchang@sis.com.tw</ulink>
<orderedlist>
<listitem>
<para>
AM79C901 HomePNA PHY is not thoroughly tested, there may be some
bugs in the <quote>on the fly</quote> change of transceiver.
</para>
</listitem>
<listitem>
<para>
A bug is hidden somewhere in the receive buffer management code,
the bug causes NULL pointer reference in the kernel. This fault is
caught before bad things happen and reported with the message:
<computeroutput>
eth0: NULL pointer encountered in Rx ring, skipping
</computeroutput>
which can be viewed with <literal remap="tt">dmesg</literal> or
<literal remap="tt">cat /var/log/message</literal>.
</para>
</listitem>
<listitem>
<para>
The media type change from 10Mbps to 100Mbps twisted-pair ethernet
by ifconfig causes the media link down.
</para>
</listitem>
</orderedlist>
</para>
</chapter>
<chapter id="RHistory">
<title>Revision History</title>
<para>
<itemizedlist>
<listitem>
<para>
November 13, 2000, Revision 1.07, seventh release, 630E problem fixed
and further clean up.
</para>
</listitem>
<listitem>
<para>
November 4, 1999, Revision 1.06, Second release, lots of clean up
and optimization.
</para>
</listitem>
<listitem>
<para>
August 8, 1999, Revision 1.05, Initial Public Release
</para>
</listitem>
</itemizedlist>
</para>
</chapter>
<chapter id="acknowledgements">
<title>Acknowledgements</title>
<para>
This driver was originally derived form
<ulink url="mailto:becker@cesdis1.gsfc.nasa.gov">Donald Becker</ulink>'s
<ulink url="ftp://cesdis.gsfc.nasa.gov/pub/linux/drivers/kern-2.3/pci-skeleton.c"
>pci-skeleton</ulink> and
<ulink url="ftp://cesdis.gsfc.nasa.gov/pub/linux/drivers/kern-2.3/rtl8139.c"
>rtl8139</ulink> drivers. Donald also provided various suggestion
regarded with improvements made in revision 1.06.
</para>
<para>
The 1.05 revision was created by
<ulink url="mailto:cmhuang@sis.com.tw">Jim Huang</ulink>, AMD 79c901
support was added by <ulink url="mailto:lcs@sis.com.tw">Chin-Shan Li</ulink>.
</para>
</chapter>
<chapter id="functions">
<title>List of Functions</title>
!Idrivers/net/sis900.c
</chapter>
</book>

View file

@ -0,0 +1,327 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="TulipUserGuide">
<bookinfo>
<title>Tulip Driver User's Guide</title>
<authorgroup>
<author>
<firstname>Jeff</firstname>
<surname>Garzik</surname>
<affiliation>
<address>
<email>jgarzik@pobox.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2001</year>
<holder>Jeff Garzik</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
The Tulip Ethernet Card Driver
is maintained by Jeff Garzik (<email>jgarzik@pobox.com</email>).
</para>
<para>
The Tulip driver was developed by Donald Becker and changed by
Jeff Garzik, Takashi Manabe and a cast of thousands.
</para>
<para>
For 2.4.x and later kernels, the Linux Tulip driver is available at
<ulink url="http://sourceforge.net/projects/tulip/">http://sourceforge.net/projects/tulip/</ulink>
</para>
<para>
This driver is for the Digital "Tulip" Ethernet adapter interface.
It should work with most DEC 21*4*-based chips/ethercards, as well as
with work-alike chips from Lite-On (PNIC) and Macronix (MXIC) and ASIX.
</para>
<para>
The original author may be reached as becker@scyld.com, or C/O
Scyld Computing Corporation,
410 Severn Ave., Suite 210,
Annapolis MD 21403
</para>
<para>
Additional information on Donald Becker's tulip.c
is available at <ulink url="http://www.scyld.com/network/tulip.html">http://www.scyld.com/network/tulip.html</ulink>
</para>
</chapter>
<chapter id="drvr-compat">
<title>Driver Compatibility</title>
<para>
This device driver is designed for the DECchip "Tulip", Digital's
single-chip ethernet controllers for PCI (now owned by Intel).
Supported members of the family
are the 21040, 21041, 21140, 21140A, 21142, and 21143. Similar work-alike
chips from Lite-On, Macronics, ASIX, Compex and other listed below are also
supported.
</para>
<para>
These chips are used on at least 140 unique PCI board designs. The great
number of chips and board designs supported is the reason for the
driver size and complexity. Almost of the increasing complexity is in the
board configuration and media selection code. There is very little
increasing in the operational critical path length.
</para>
</chapter>
<chapter id="board-settings">
<title>Board-specific Settings</title>
<para>
PCI bus devices are configured by the system at boot time, so no jumpers
need to be set on the board. The system BIOS preferably should assign the
PCI INTA signal to an otherwise unused system IRQ line.
</para>
<para>
Some boards have EEPROMs tables with default media entry. The factory default
is usually "autoselect". This should only be overridden when using
transceiver connections without link beat e.g. 10base2 or AUI, or (rarely!)
for forcing full-duplex when used with old link partners that do not do
autonegotiation.
</para>
</chapter>
<chapter id="driver-operation">
<title>Driver Operation</title>
<sect1><title>Ring buffers</title>
<para>
The Tulip can use either ring buffers or lists of Tx and Rx descriptors.
This driver uses statically allocated rings of Rx and Tx descriptors, set at
compile time by RX/TX_RING_SIZE. This version of the driver allocates skbuffs
for the Rx ring buffers at open() time and passes the skb->data field to the
Tulip as receive data buffers. When an incoming frame is less than
RX_COPYBREAK bytes long, a fresh skbuff is allocated and the frame is
copied to the new skbuff. When the incoming frame is larger, the skbuff is
passed directly up the protocol stack and replaced by a newly allocated
skbuff.
</para>
<para>
The RX_COPYBREAK value is chosen to trade-off the memory wasted by
using a full-sized skbuff for small frames vs. the copying costs of larger
frames. For small frames the copying cost is negligible (esp. considering
that we are pre-loading the cache with immediately useful header
information). For large frames the copying cost is non-trivial, and the
larger copy might flush the cache of useful data. A subtle aspect of this
choice is that the Tulip only receives into longword aligned buffers, thus
the IP header at offset 14 isn't longword aligned for further processing.
Copied frames are put into the new skbuff at an offset of "+2", thus copying
has the beneficial effect of aligning the IP header and preloading the
cache.
</para>
</sect1>
<sect1><title>Synchronization</title>
<para>
The driver runs as two independent, single-threaded flows of control. One
is the send-packet routine, which enforces single-threaded use by the
dev->tbusy flag. The other thread is the interrupt handler, which is single
threaded by the hardware and other software.
</para>
<para>
The send packet thread has partial control over the Tx ring and 'dev->tbusy'
flag. It sets the tbusy flag whenever it's queuing a Tx packet. If the next
queue slot is empty, it clears the tbusy flag when finished otherwise it sets
the 'tp->tx_full' flag.
</para>
<para>
The interrupt handler has exclusive control over the Rx ring and records stats
from the Tx ring. (The Tx-done interrupt can't be selectively turned off, so
we can't avoid the interrupt overhead by having the Tx routine reap the Tx
stats.) After reaping the stats, it marks the queue entry as empty by setting
the 'base' to zero. Iff the 'tp->tx_full' flag is set, it clears both the
tx_full and tbusy flags.
</para>
</sect1>
</chapter>
<chapter id="errata">
<title>Errata</title>
<para>
The old DEC databooks were light on details.
The 21040 databook claims that CSR13, CSR14, and CSR15 should each be the last
register of the set CSR12-15 written. Hmmm, now how is that possible?
</para>
<para>
The DEC SROM format is very badly designed not precisely defined, leading to
part of the media selection junkheap below. Some boards do not have EEPROM
media tables and need to be patched up. Worse, other boards use the DEC
design kit media table when it isn't correct for their board.
</para>
<para>
We cannot use MII interrupts because there is no defined GPIO pin to attach
them. The MII transceiver status is polled using an kernel timer.
</para>
</chapter>
<chapter id="changelog">
<title>Driver Change History</title>
<sect1><title>Version 0.9.14 (February 20, 2001)</title>
<itemizedlist>
<listitem><para>Fix PNIC problems (Manfred Spraul)</para></listitem>
<listitem><para>Add new PCI id for Accton comet</para></listitem>
<listitem><para>Support Davicom tulips</para></listitem>
<listitem><para>Fix oops in eeprom parsing</para></listitem>
<listitem><para>Enable workarounds for early PCI chipsets</para></listitem>
<listitem><para>IA64, hppa csr0 support</para></listitem>
<listitem><para>Support media types 5, 6</para></listitem>
<listitem><para>Interpret a bit more of the 21142 SROM extended media type 3</para></listitem>
<listitem><para>Add missing delay in eeprom reading</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.11 (November 3, 2000)</title>
<itemizedlist>
<listitem><para>Eliminate extra bus accesses when sharing interrupts (prumpf)</para></listitem>
<listitem><para>Barrier following ownership descriptor bit flip (prumpf)</para></listitem>
<listitem><para>Endianness fixes for >14 addresses in setup frames (prumpf)</para></listitem>
<listitem><para>Report link beat to kernel/userspace via netif_carrier_*. (kuznet)</para></listitem>
<listitem><para>Better spinlocking in set_rx_mode.</para></listitem>
<listitem><para>Fix I/O resource request failure error messages (DaveM catch)</para></listitem>
<listitem><para>Handle DMA allocation failure.</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.10 (September 6, 2000)</title>
<itemizedlist>
<listitem><para>Simple interrupt mitigation (via jamal)</para></listitem>
<listitem><para>More PCI ids</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.9 (August 11, 2000)</title>
<itemizedlist>
<listitem><para>More PCI ids</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.8 (July 13, 2000)</title>
<itemizedlist>
<listitem><para>Correct signed/unsigned comparison for dummy frame index</para></listitem>
<listitem><para>Remove outdated references to struct enet_statistics</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.7 (June 17, 2000)</title>
<itemizedlist>
<listitem><para>Timer cleanups (Andrew Morton)</para></listitem>
<listitem><para>Alpha compile fix (somebody?)</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.6 (May 31, 2000)</title>
<itemizedlist>
<listitem><para>Revert 21143-related support flag patch</para></listitem>
<listitem><para>Add HPPA/media-table debugging printk</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.5 (May 30, 2000)</title>
<itemizedlist>
<listitem><para>HPPA support (willy@puffingroup)</para></listitem>
<listitem><para>CSR6 bits and tulip.h cleanup (Chris Smith)</para></listitem>
<listitem><para>Improve debugging messages a bit</para></listitem>
<listitem><para>Add delay after CSR13 write in t21142_start_nway</para></listitem>
<listitem><para>Remove unused ETHER_STATS code</para></listitem>
<listitem><para>Convert 'extern inline' to 'static inline' in tulip.h (Chris Smith)</para></listitem>
<listitem><para>Update DS21143 support flags in tulip_chip_info[]</para></listitem>
<listitem><para>Use spin_lock_irq, not _irqsave/restore, in tulip_start_xmit()</para></listitem>
<listitem><para>Add locking to set_rx_mode()</para></listitem>
<listitem><para>Fix race with chip setting DescOwned bit (Hal Murray)</para></listitem>
<listitem><para>Request 100% of PIO and MMIO resource space assigned to card</para></listitem>
<listitem><para>Remove error message from pci_enable_device failure</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.4.3 (April 14, 2000)</title>
<itemizedlist>
<listitem><para>mod_timer fix (Hal Murray)</para></listitem>
<listitem><para>PNIC2 resuscitation (Chris Smith)</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.4.2 (March 21, 2000)</title>
<itemizedlist>
<listitem><para>Fix 21041 CSR7, CSR13/14/15 handling</para></listitem>
<listitem><para>Merge some PCI ids from tulip 0.91x</para></listitem>
<listitem><para>Merge some HAS_xxx flags and flag settings from tulip 0.91x</para></listitem>
<listitem><para>asm/io.h fix (submitted by many) and cleanup</para></listitem>
<listitem><para>s/HAS_NWAY143/HAS_NWAY/</para></listitem>
<listitem><para>Cleanup 21041 mode reporting</para></listitem>
<listitem><para>Small code cleanups</para></listitem>
</itemizedlist>
</sect1>
<sect1><title>Version 0.9.4.1 (March 18, 2000)</title>
<itemizedlist>
<listitem><para>Finish PCI DMA conversion (davem)</para></listitem>
<listitem><para>Do not netif_start_queue() at end of tulip_tx_timeout() (kuznet)</para></listitem>
<listitem><para>PCI DMA fix (kuznet)</para></listitem>
<listitem><para>eeprom.c code cleanup</para></listitem>
<listitem><para>Remove Xircom Tulip crud</para></listitem>
</itemizedlist>
</sect1>
</chapter>
</book>

View file

@ -0,0 +1,979 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Linux-USB-API">
<bookinfo>
<title>The Linux-USB Host Side API</title>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction to USB on Linux</title>
<para>A Universal Serial Bus (USB) is used to connect a host,
such as a PC or workstation, to a number of peripheral
devices. USB uses a tree structure, with the host at the
root (the system's master), hubs as interior nodes, and
peripheral devices as leaves (and slaves).
Modern PCs support several such trees of USB devices, usually
one USB 2.0 tree (480 Mbit/sec each) with
a few USB 1.1 trees (12 Mbit/sec each) that are used when you
connect a USB 1.1 device directly to the machine's "root hub".
</para>
<para>That master/slave asymmetry was designed in part for
ease of use. It is not physically possible to assemble
(legal) USB cables incorrectly: all upstream "to-the-host"
connectors are the rectangular type, matching the sockets on
root hubs, and the downstream type are the squarish type
(or they are built in to the peripheral).
Software doesn't need to deal with distributed autoconfiguration
since the pre-designated master node manages all that.
At the electrical level, bus protocol overhead is reduced by
eliminating arbitration and moving scheduling into host software.
</para>
<para>USB 1.0 was announced in January 1996, and was revised
as USB 1.1 (with improvements in hub specification and
support for interrupt-out transfers) in September 1998.
USB 2.0 was released in April 2000, including high speed
transfers and transaction translating hubs (used for USB 1.1
and 1.0 backward compatibility).
</para>
<para>USB support was added to Linux early in the 2.2 kernel series
shortly before the 2.3 development forked off. Updates
from 2.3 were regularly folded back into 2.2 releases, bringing
new features such as <filename>/sbin/hotplug</filename> support,
more drivers, and more robustness.
The 2.5 kernel series continued such improvements, and also
worked on USB 2.0 support,
higher performance,
better consistency between host controller drivers,
API simplification (to make bugs less likely),
and providing internal "kerneldoc" documentation.
</para>
<para>Linux can run inside USB devices as well as on
the hosts that control the devices.
Because the Linux 2.x USB support evolved to support mass market
platforms such as Apple Macintosh or PC-compatible systems,
it didn't address design concerns for those types of USB systems.
So it can't be used inside mass-market PDAs, or other peripherals.
USB device drivers running inside those Linux peripherals
don't do the same things as the ones running inside hosts,
and so they've been given a different name:
they're called <emphasis>gadget drivers</emphasis>.
This document does not present gadget drivers.
</para>
</chapter>
<chapter id="host">
<title>USB Host-Side API Model</title>
<para>Within the kernel,
host-side drivers for USB devices talk to the "usbcore" APIs.
There are two types of public "usbcore" APIs, targetted at two different
layers of USB driver. Those are
<emphasis>general purpose</emphasis> drivers, exposed through
driver frameworks such as block, character, or network devices;
and drivers that are <emphasis>part of the core</emphasis>,
which are involved in managing a USB bus.
Such core drivers include the <emphasis>hub</emphasis> driver,
which manages trees of USB devices, and several different kinds
of <emphasis>host controller driver (HCD)</emphasis>,
which control individual busses.
</para>
<para>The device model seen by USB drivers is relatively complex.
</para>
<itemizedlist>
<listitem><para>USB supports four kinds of data transfer
(control, bulk, interrupt, and isochronous). Two transfer
types use bandwidth as it's available (control and bulk),
while the other two types of transfer (interrupt and isochronous)
are scheduled to provide guaranteed bandwidth.
</para></listitem>
<listitem><para>The device description model includes one or more
"configurations" per device, only one of which is active at a time.
Devices that are capable of high speed operation must also support
full speed configurations, along with a way to ask about the
"other speed" configurations that might be used.
</para></listitem>
<listitem><para>Configurations have one or more "interface", each
of which may have "alternate settings". Interfaces may be
standardized by USB "Class" specifications, or may be specific to
a vendor or device.</para>
<para>USB device drivers actually bind to interfaces, not devices.
Think of them as "interface drivers", though you
may not see many devices where the distinction is important.
<emphasis>Most USB devices are simple, with only one configuration,
one interface, and one alternate setting.</emphasis>
</para></listitem>
<listitem><para>Interfaces have one or more "endpoints", each of
which supports one type and direction of data transfer such as
"bulk out" or "interrupt in". The entire configuration may have
up to sixteen endpoints in each direction, allocated as needed
among all the interfaces.
</para></listitem>
<listitem><para>Data transfer on USB is packetized; each endpoint
has a maximum packet size.
Drivers must often be aware of conventions such as flagging the end
of bulk transfers using "short" (including zero length) packets.
</para></listitem>
<listitem><para>The Linux USB API supports synchronous calls for
control and bulk messaging.
It also supports asynchnous calls for all kinds of data transfer,
using request structures called "URBs" (USB Request Blocks).
</para></listitem>
</itemizedlist>
<para>Accordingly, the USB Core API exposed to device drivers
covers quite a lot of territory. You'll probably need to consult
the USB 2.0 specification, available online from www.usb.org at
no cost, as well as class or device specifications.
</para>
<para>The only host-side drivers that actually touch hardware
(reading/writing registers, handling IRQs, and so on) are the HCDs.
In theory, all HCDs provide the same functionality through the same
API. In practice, that's becoming more true on the 2.5 kernels,
but there are still differences that crop up especially with
fault handling. Different controllers don't necessarily report
the same aspects of failures, and recovery from faults (including
software-induced ones like unlinking an URB) isn't yet fully
consistent.
Device driver authors should make a point of doing disconnect
testing (while the device is active) with each different host
controller driver, to make sure drivers don't have bugs of
their own as well as to make sure they aren't relying on some
HCD-specific behavior.
(You will need external USB 1.1 and/or
USB 2.0 hubs to perform all those tests.)
</para>
</chapter>
<chapter><title>USB-Standard Types</title>
<para>In <filename>&lt;linux/usb_ch9.h&gt;</filename> you will find
the USB data types defined in chapter 9 of the USB specification.
These data types are used throughout USB, and in APIs including
this host side API, gadget APIs, and usbfs.
</para>
!Iinclude/linux/usb_ch9.h
</chapter>
<chapter><title>Host-Side Data Types and Macros</title>
<para>The host side API exposes several layers to drivers, some of
which are more necessary than others.
These support lifecycle models for host side drivers
and devices, and support passing buffers through usbcore to
some HCD that performs the I/O for the device driver.
</para>
!Iinclude/linux/usb.h
</chapter>
<chapter><title>USB Core APIs</title>
<para>There are two basic I/O models in the USB API.
The most elemental one is asynchronous: drivers submit requests
in the form of an URB, and the URB's completion callback
handle the next step.
All USB transfer types support that model, although there
are special cases for control URBs (which always have setup
and status stages, but may not have a data stage) and
isochronous URBs (which allow large packets and include
per-packet fault reports).
Built on top of that is synchronous API support, where a
driver calls a routine that allocates one or more URBs,
submits them, and waits until they complete.
There are synchronous wrappers for single-buffer control
and bulk transfers (which are awkward to use in some
driver disconnect scenarios), and for scatterlist based
streaming i/o (bulk or interrupt).
</para>
<para>USB drivers need to provide buffers that can be
used for DMA, although they don't necessarily need to
provide the DMA mapping themselves.
There are APIs to use used when allocating DMA buffers,
which can prevent use of bounce buffers on some systems.
In some cases, drivers may be able to rely on 64bit DMA
to eliminate another kind of bounce buffer.
</para>
!Edrivers/usb/core/urb.c
!Edrivers/usb/core/message.c
!Edrivers/usb/core/file.c
!Edrivers/usb/core/usb.c
!Edrivers/usb/core/hub.c
</chapter>
<chapter><title>Host Controller APIs</title>
<para>These APIs are only for use by host controller drivers,
most of which implement standard register interfaces such as
EHCI, OHCI, or UHCI.
UHCI was one of the first interfaces, designed by Intel and
also used by VIA; it doesn't do much in hardware.
OHCI was designed later, to have the hardware do more work
(bigger transfers, tracking protocol state, and so on).
EHCI was designed with USB 2.0; its design has features that
resemble OHCI (hardware does much more work) as well as
UHCI (some parts of ISO support, TD list processing).
</para>
<para>There are host controllers other than the "big three",
although most PCI based controllers (and a few non-PCI based
ones) use one of those interfaces.
Not all host controllers use DMA; some use PIO, and there
is also a simulator.
</para>
<para>The same basic APIs are available to drivers for all
those controllers.
For historical reasons they are in two layers:
<structname>struct usb_bus</structname> is a rather thin
layer that became available in the 2.2 kernels, while
<structname>struct usb_hcd</structname> is a more featureful
layer (available in later 2.4 kernels and in 2.5) that
lets HCDs share common code, to shrink driver size
and significantly reduce hcd-specific behaviors.
</para>
!Edrivers/usb/core/hcd.c
!Edrivers/usb/core/hcd-pci.c
!Edrivers/usb/core/buffer.c
</chapter>
<chapter>
<title>The USB Filesystem (usbfs)</title>
<para>This chapter presents the Linux <emphasis>usbfs</emphasis>.
You may prefer to avoid writing new kernel code for your
USB driver; that's the problem that usbfs set out to solve.
User mode device drivers are usually packaged as applications
or libraries, and may use usbfs through some programming library
that wraps it. Such libraries include
<ulink url="http://libusb.sourceforge.net">libusb</ulink>
for C/C++, and
<ulink url="http://jUSB.sourceforge.net">jUSB</ulink> for Java.
</para>
<note><title>Unfinished</title>
<para>This particular documentation is incomplete,
especially with respect to the asynchronous mode.
As of kernel 2.5.66 the code and this (new) documentation
need to be cross-reviewed.
</para>
</note>
<para>Configure usbfs into Linux kernels by enabling the
<emphasis>USB filesystem</emphasis> option (CONFIG_USB_DEVICEFS),
and you get basic support for user mode USB device drivers.
Until relatively recently it was often (confusingly) called
<emphasis>usbdevfs</emphasis> although it wasn't solving what
<emphasis>devfs</emphasis> was.
Every USB device will appear in usbfs, regardless of whether or
not it has a kernel driver; but only devices with kernel drivers
show up in devfs.
</para>
<sect1>
<title>What files are in "usbfs"?</title>
<para>Conventionally mounted at
<filename>/proc/bus/usb</filename>, usbfs
features include:
<itemizedlist>
<listitem><para><filename>/proc/bus/usb/devices</filename>
... a text file
showing each of the USB devices on known to the kernel,
and their configuration descriptors.
You can also poll() this to learn about new devices.
</para></listitem>
<listitem><para><filename>/proc/bus/usb/BBB/DDD</filename>
... magic files
exposing the each device's configuration descriptors, and
supporting a series of ioctls for making device requests,
including I/O to devices. (Purely for access by programs.)
</para></listitem>
</itemizedlist>
</para>
<para> Each bus is given a number (BBB) based on when it was
enumerated; within each bus, each device is given a similar
number (DDD).
Those BBB/DDD paths are not "stable" identifiers;
expect them to change even if you always leave the devices
plugged in to the same hub port.
<emphasis>Don't even think of saving these in application
configuration files.</emphasis>
Stable identifiers are available, for user mode applications
that want to use them. HID and networking devices expose
these stable IDs, so that for example you can be sure that
you told the right UPS to power down its second server.
"usbfs" doesn't (yet) expose those IDs.
</para>
</sect1>
<sect1>
<title>Mounting and Access Control</title>
<para>There are a number of mount options for usbfs, which will
be of most interest to you if you need to override the default
access control policy.
That policy is that only root may read or write device files
(<filename>/proc/bus/BBB/DDD</filename>) although anyone may read
the <filename>devices</filename>
or <filename>drivers</filename> files.
I/O requests to the device also need the CAP_SYS_RAWIO capability,
</para>
<para>The significance of that is that by default, all user mode
device drivers need super-user privileges.
You can change modes or ownership in a driver setup
when the device hotplugs, or maye just start the
driver right then, as a privileged server (or some activity
within one).
That's the most secure approach for multi-user systems,
but for single user systems ("trusted" by that user)
it's more convenient just to grant everyone all access
(using the <emphasis>devmode=0666</emphasis> option)
so the driver can start whenever it's needed.
</para>
<para>The mount options for usbfs, usable in /etc/fstab or
in command line invocations of <emphasis>mount</emphasis>, are:
<variablelist>
<varlistentry>
<term><emphasis>busgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/BBB
directories. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>busmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/BBB
directories. (Default: 0555)
</para></listitem></varlistentry>
<varlistentry><term><emphasis>busuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/BBB
directories. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0644)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/devices and drivers files.
(Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/devices and drivers files.
(Default: 0444)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/devices and drivers files.
(Default: 0)</para></listitem></varlistentry>
</variablelist>
</para>
<para>Note that many Linux distributions hard-wire the mount options
for usbfs in their init scripts, such as
<filename>/etc/rc.d/rc.sysinit</filename>,
rather than making it easy to set this per-system
policy in <filename>/etc/fstab</filename>.
</para>
</sect1>
<sect1>
<title>/proc/bus/usb/devices</title>
<para>This file is handy for status viewing tools in user
mode, which can scan the text format and ignore most of it.
More detailed device status (including class and vendor
status) is available from device-specific files.
For information about the current format of this file,
see the
<filename>Documentation/usb/proc_usb_info.txt</filename>
file in your Linux kernel sources.
</para>
<para>Otherwise the main use for this file from programs
is to poll() it to get notifications of usb devices
as they're plugged or unplugged.
To see what changed, you'd need to read the file and
compare "before" and "after" contents, scan the filesystem,
or see its hotplug event.
</para>
</sect1>
<sect1>
<title>/proc/bus/usb/BBB/DDD</title>
<para>Use these files in one of these basic ways:
</para>
<para><emphasis>They can be read,</emphasis>
producing first the device descriptor
(18 bytes) and then the descriptors for the current configuration.
See the USB 2.0 spec for details about those binary data formats.
You'll need to convert most multibyte values from little endian
format to your native host byte order, although a few of the
fields in the device descriptor (both of the BCD-encoded fields,
and the vendor and product IDs) will be byteswapped for you.
Note that configuration descriptors include descriptors for
interfaces, altsettings, endpoints, and maybe additional
class descriptors.
</para>
<para><emphasis>Perform USB operations</emphasis> using
<emphasis>ioctl()</emphasis> requests to make endpoint I/O
requests (synchronously or asynchronously) or manage
the device.
These requests need the CAP_SYS_RAWIO capability,
as well as filesystem access permissions.
Only one ioctl request can be made on one of these
device files at a time.
This means that if you are synchronously reading an endpoint
from one thread, you won't be able to write to a different
endpoint from another thread until the read completes.
This works for <emphasis>half duplex</emphasis> protocols,
but otherwise you'd use asynchronous i/o requests.
</para>
</sect1>
<sect1>
<title>Life Cycle of User Mode Drivers</title>
<para>Such a driver first needs to find a device file
for a device it knows how to handle.
Maybe it was told about it because a
<filename>/sbin/hotplug</filename> event handling agent
chose that driver to handle the new device.
Or maybe it's an application that scans all the
/proc/bus/usb device files, and ignores most devices.
In either case, it should <function>read()</function> all
the descriptors from the device file,
and check them against what it knows how to handle.
It might just reject everything except a particular
vendor and product ID, or need a more complex policy.
</para>
<para>Never assume there will only be one such device
on the system at a time!
If your code can't handle more than one device at
a time, at least detect when there's more than one, and
have your users choose which device to use.
</para>
<para>Once your user mode driver knows what device to use,
it interacts with it in either of two styles.
The simple style is to make only control requests; some
devices don't need more complex interactions than those.
(An example might be software using vendor-specific control
requests for some initialization or configuration tasks,
with a kernel driver for the rest.)
</para>
<para>More likely, you need a more complex style driver:
one using non-control endpoints, reading or writing data
and claiming exclusive use of an interface.
<emphasis>Bulk</emphasis> transfers are easiest to use,
but only their sibling <emphasis>interrupt</emphasis> transfers
work with low speed devices.
Both interrupt and <emphasis>isochronous</emphasis> transfers
offer service guarantees because their bandwidth is reserved.
Such "periodic" transfers are awkward to use through usbfs,
unless you're using the asynchronous calls. However, interrupt
transfers can also be used in a synchronous "one shot" style.
</para>
<para>Your user-mode driver should never need to worry
about cleaning up request state when the device is
disconnected, although it should close its open file
descriptors as soon as it starts seeing the ENODEV
errors.
</para>
</sect1>
<sect1><title>The ioctl() Requests</title>
<para>To use these ioctls, you need to include the following
headers in your userspace program:
<programlisting>#include &lt;linux/usb.h&gt;
#include &lt;linux/usbdevice_fs.h&gt;
#include &lt;asm/byteorder.h&gt;</programlisting>
The standard USB device model requests, from "Chapter 9&#