aha/crypto/Kconfig

620 lines
17 KiB
Text
Raw Normal View History

#
# Generic algorithms support
#
config XOR_BLOCKS
tristate
#
async_tx: add the async_tx api The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources. I imagine that any piece of ADMA hardware would register with the 'async_*' subsystem, and a call to async_X would be routed as appropriate, or be run in-line. - Neil Brown async_tx exploits the capabilities of struct dma_async_tx_descriptor to provide an api of the following general format: struct dma_async_tx_descriptor * async_<operation>(..., struct dma_async_tx_descriptor *depend_tx, dma_async_tx_callback cb_fn, void *cb_param) { struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>); struct dma_device *device = chan ? chan->device : NULL; int int_en = cb_fn ? 1 : 0; struct dma_async_tx_descriptor *tx = device ? device->device_prep_dma_<operation>(chan, len, int_en) : NULL; if (tx) { /* run <operation> asynchronously */ ... tx->tx_set_dest(addr, tx, index); ... tx->tx_set_src(addr, tx, index); ... async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param); } else { /* run <operation> synchronously */ ... <operation> ... async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param); } return tx; } async_tx_find_channel() returns a capable channel from its pool. The channel pool is organized as a per-cpu array of channel pointers. The async_tx_rebalance() routine is tasked with managing these arrays. In the uniprocessor case async_tx_rebalance() tries to spread responsibility evenly over channels of similar capabilities. For example if there are two copy+xor channels, one will handle copy operations and the other will handle xor. In the SMP case async_tx_rebalance() attempts to spread the operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor channel0 while cpu1 gets copy channel 1 and xor channel 1. When a dependency is specified async_tx_find_channel defaults to keeping the operation on the same channel. A xor->copy->xor chain will stay on one channel if it supports both operation types, otherwise the transaction will transition between a copy and a xor resource. Currently the raid5 implementation in the MD raid456 driver has been converted to the async_tx api. A driver for the offload engines on the Intel Xscale series of I/O processors, iop-adma, is provided in a later commit. With the iop-adma driver and async_tx, raid456 is able to offload copy, xor, and xor-zero-sum operations to hardware engines. On iop342 tiobench showed higher throughput for sequential writes (20 - 30% improvement) and sequential reads to a degraded array (40 - 55% improvement). For the other cases performance was roughly equal, +/- a few percentage points. On a x86-smp platform the performance of the async_tx implementation (in synchronous mode) was also +/- a few percentage points of the original implementation. According to 'top' on iop342 CPU utilization drops from ~50% to ~15% during a 'resync' while the speed according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s. The tiobench command line used for testing was: tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5 * iop342 had 1GB of memory available Details: * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making async_tx_find_channel a static inline routine that always returns NULL * when a callback is specified for a given transaction an interrupt will fire at operation completion time and the callback will occur in a tasklet. if the the channel does not support interrupts then a live polling wait will be performed * the api is written as a dmaengine client that requests all available channels * In support of dependencies the api implicitly schedules channel-switch interrupts. The interrupt triggers the cleanup tasklet which causes pending operations to be scheduled on the next channel * Xor engines treat an xor destination address differently than a software xor routine. To the software routine the destination address is an implied source, whereas engines treat it as a write-only destination. This patch modifies the xor_blocks routine to take a an explicit destination address to mirror the hardware. Changelog: * fixed a leftover debug print * don't allow callbacks in async_interrupt_cond * fixed xor_block changes * fixed usage of ASYNC_TX_XOR_DROP_DEST * drop dma mapping methods, suggested by Chris Leech * printk warning fixups from Andrew Morton * don't use inline in C files, Adrian Bunk * select the API when MD is enabled * BUG_ON xor source counts <= 1 * implicitly handle hardware concerns like channel switching and interrupts, Neil Brown * remove the per operation type list, and distribute operation capabilities evenly amongst the available channels * simplify async_tx_find_channel to optimize the fast path * introduce the channel_table_initialized flag to prevent early calls to the api * reorganize the code to mimic crypto * include mm.h as not all archs include it in dma-mapping.h * make the Kconfig options non-user visible, Adrian Bunk * move async_tx under crypto since it is meant as 'core' functionality, and the two may share algorithms in the future * move large inline functions into c files * checkpatch.pl fixes * gpl v2 only correction Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
2007-01-02 18:10:44 +00:00
# async_tx api: hardware offloaded memory transfer/transform support
#
async_tx: add the async_tx api The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources. I imagine that any piece of ADMA hardware would register with the 'async_*' subsystem, and a call to async_X would be routed as appropriate, or be run in-line. - Neil Brown async_tx exploits the capabilities of struct dma_async_tx_descriptor to provide an api of the following general format: struct dma_async_tx_descriptor * async_<operation>(..., struct dma_async_tx_descriptor *depend_tx, dma_async_tx_callback cb_fn, void *cb_param) { struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>); struct dma_device *device = chan ? chan->device : NULL; int int_en = cb_fn ? 1 : 0; struct dma_async_tx_descriptor *tx = device ? device->device_prep_dma_<operation>(chan, len, int_en) : NULL; if (tx) { /* run <operation> asynchronously */ ... tx->tx_set_dest(addr, tx, index); ... tx->tx_set_src(addr, tx, index); ... async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param); } else { /* run <operation> synchronously */ ... <operation> ... async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param); } return tx; } async_tx_find_channel() returns a capable channel from its pool. The channel pool is organized as a per-cpu array of channel pointers. The async_tx_rebalance() routine is tasked with managing these arrays. In the uniprocessor case async_tx_rebalance() tries to spread responsibility evenly over channels of similar capabilities. For example if there are two copy+xor channels, one will handle copy operations and the other will handle xor. In the SMP case async_tx_rebalance() attempts to spread the operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor channel0 while cpu1 gets copy channel 1 and xor channel 1. When a dependency is specified async_tx_find_channel defaults to keeping the operation on the same channel. A xor->copy->xor chain will stay on one channel if it supports both operation types, otherwise the transaction will transition between a copy and a xor resource. Currently the raid5 implementation in the MD raid456 driver has been converted to the async_tx api. A driver for the offload engines on the Intel Xscale series of I/O processors, iop-adma, is provided in a later commit. With the iop-adma driver and async_tx, raid456 is able to offload copy, xor, and xor-zero-sum operations to hardware engines. On iop342 tiobench showed higher throughput for sequential writes (20 - 30% improvement) and sequential reads to a degraded array (40 - 55% improvement). For the other cases performance was roughly equal, +/- a few percentage points. On a x86-smp platform the performance of the async_tx implementation (in synchronous mode) was also +/- a few percentage points of the original implementation. According to 'top' on iop342 CPU utilization drops from ~50% to ~15% during a 'resync' while the speed according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s. The tiobench command line used for testing was: tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5 * iop342 had 1GB of memory available Details: * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making async_tx_find_channel a static inline routine that always returns NULL * when a callback is specified for a given transaction an interrupt will fire at operation completion time and the callback will occur in a tasklet. if the the channel does not support interrupts then a live polling wait will be performed * the api is written as a dmaengine client that requests all available channels * In support of dependencies the api implicitly schedules channel-switch interrupts. The interrupt triggers the cleanup tasklet which causes pending operations to be scheduled on the next channel * Xor engines treat an xor destination address differently than a software xor routine. To the software routine the destination address is an implied source, whereas engines treat it as a write-only destination. This patch modifies the xor_blocks routine to take a an explicit destination address to mirror the hardware. Changelog: * fixed a leftover debug print * don't allow callbacks in async_interrupt_cond * fixed xor_block changes * fixed usage of ASYNC_TX_XOR_DROP_DEST * drop dma mapping methods, suggested by Chris Leech * printk warning fixups from Andrew Morton * don't use inline in C files, Adrian Bunk * select the API when MD is enabled * BUG_ON xor source counts <= 1 * implicitly handle hardware concerns like channel switching and interrupts, Neil Brown * remove the per operation type list, and distribute operation capabilities evenly amongst the available channels * simplify async_tx_find_channel to optimize the fast path * introduce the channel_table_initialized flag to prevent early calls to the api * reorganize the code to mimic crypto * include mm.h as not all archs include it in dma-mapping.h * make the Kconfig options non-user visible, Adrian Bunk * move async_tx under crypto since it is meant as 'core' functionality, and the two may share algorithms in the future * move large inline functions into c files * checkpatch.pl fixes * gpl v2 only correction Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
2007-01-02 18:10:44 +00:00
source "crypto/async_tx/Kconfig"
async_tx: add the async_tx api The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources. I imagine that any piece of ADMA hardware would register with the 'async_*' subsystem, and a call to async_X would be routed as appropriate, or be run in-line. - Neil Brown async_tx exploits the capabilities of struct dma_async_tx_descriptor to provide an api of the following general format: struct dma_async_tx_descriptor * async_<operation>(..., struct dma_async_tx_descriptor *depend_tx, dma_async_tx_callback cb_fn, void *cb_param) { struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>); struct dma_device *device = chan ? chan->device : NULL; int int_en = cb_fn ? 1 : 0; struct dma_async_tx_descriptor *tx = device ? device->device_prep_dma_<operation>(chan, len, int_en) : NULL; if (tx) { /* run <operation> asynchronously */ ... tx->tx_set_dest(addr, tx, index); ... tx->tx_set_src(addr, tx, index); ... async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param); } else { /* run <operation> synchronously */ ... <operation> ... async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param); } return tx; } async_tx_find_channel() returns a capable channel from its pool. The channel pool is organized as a per-cpu array of channel pointers. The async_tx_rebalance() routine is tasked with managing these arrays. In the uniprocessor case async_tx_rebalance() tries to spread responsibility evenly over channels of similar capabilities. For example if there are two copy+xor channels, one will handle copy operations and the other will handle xor. In the SMP case async_tx_rebalance() attempts to spread the operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor channel0 while cpu1 gets copy channel 1 and xor channel 1. When a dependency is specified async_tx_find_channel defaults to keeping the operation on the same channel. A xor->copy->xor chain will stay on one channel if it supports both operation types, otherwise the transaction will transition between a copy and a xor resource. Currently the raid5 implementation in the MD raid456 driver has been converted to the async_tx api. A driver for the offload engines on the Intel Xscale series of I/O processors, iop-adma, is provided in a later commit. With the iop-adma driver and async_tx, raid456 is able to offload copy, xor, and xor-zero-sum operations to hardware engines. On iop342 tiobench showed higher throughput for sequential writes (20 - 30% improvement) and sequential reads to a degraded array (40 - 55% improvement). For the other cases performance was roughly equal, +/- a few percentage points. On a x86-smp platform the performance of the async_tx implementation (in synchronous mode) was also +/- a few percentage points of the original implementation. According to 'top' on iop342 CPU utilization drops from ~50% to ~15% during a 'resync' while the speed according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s. The tiobench command line used for testing was: tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5 * iop342 had 1GB of memory available Details: * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making async_tx_find_channel a static inline routine that always returns NULL * when a callback is specified for a given transaction an interrupt will fire at operation completion time and the callback will occur in a tasklet. if the the channel does not support interrupts then a live polling wait will be performed * the api is written as a dmaengine client that requests all available channels * In support of dependencies the api implicitly schedules channel-switch interrupts. The interrupt triggers the cleanup tasklet which causes pending operations to be scheduled on the next channel * Xor engines treat an xor destination address differently than a software xor routine. To the software routine the destination address is an implied source, whereas engines treat it as a write-only destination. This patch modifies the xor_blocks routine to take a an explicit destination address to mirror the hardware. Changelog: * fixed a leftover debug print * don't allow callbacks in async_interrupt_cond * fixed xor_block changes * fixed usage of ASYNC_TX_XOR_DROP_DEST * drop dma mapping methods, suggested by Chris Leech * printk warning fixups from Andrew Morton * don't use inline in C files, Adrian Bunk * select the API when MD is enabled * BUG_ON xor source counts <= 1 * implicitly handle hardware concerns like channel switching and interrupts, Neil Brown * remove the per operation type list, and distribute operation capabilities evenly amongst the available channels * simplify async_tx_find_channel to optimize the fast path * introduce the channel_table_initialized flag to prevent early calls to the api * reorganize the code to mimic crypto * include mm.h as not all archs include it in dma-mapping.h * make the Kconfig options non-user visible, Adrian Bunk * move async_tx under crypto since it is meant as 'core' functionality, and the two may share algorithms in the future * move large inline functions into c files * checkpatch.pl fixes * gpl v2 only correction Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
2007-01-02 18:10:44 +00:00
#
# Cryptographic API Configuration
#
menuconfig CRYPTO
tristate "Cryptographic API"
help
This option provides the core Cryptographic API.
if CRYPTO
comment "Crypto core or helper"
config CRYPTO_ALGAPI
tristate
help
This option provides the API for cryptographic algorithms.
config CRYPTO_AEAD
tristate
select CRYPTO_ALGAPI
config CRYPTO_BLKCIPHER
tristate
select CRYPTO_ALGAPI
config CRYPTO_HASH
tristate
select CRYPTO_ALGAPI
config CRYPTO_MANAGER
tristate "Cryptographic algorithm manager"
select CRYPTO_ALGAPI
help
Create default cryptographic template instantiations such as
cbc(aes).
config CRYPTO_GF128MUL
tristate "GF(2^128) multiplication functions (EXPERIMENTAL)"
depends on EXPERIMENTAL
help
Efficient table driven implementation of multiplications in the
field GF(2^128). This is needed by some cypher modes. This
option will be selected automatically if you select such a
cipher mode. Only select this option by hand if you expect to load
an external module that requires these functions.
config CRYPTO_NULL
tristate "Null algorithms"
select CRYPTO_ALGAPI
select CRYPTO_BLKCIPHER
help
These are 'Null' algorithms, used by IPsec, which do nothing.
config CRYPTO_CRYPTD
tristate "Software async crypto daemon"
select CRYPTO_BLKCIPHER
select CRYPTO_MANAGER
help
This is a generic software asynchronous crypto daemon that
converts an arbitrary synchronous software crypto algorithm
into an asynchronous algorithm that executes in a kernel thread.
config CRYPTO_AUTHENC
tristate "Authenc support"
select CRYPTO_AEAD
select CRYPTO_BLKCIPHER
select CRYPTO_MANAGER
select CRYPTO_HASH
help
Authenc: Combined mode wrapper for IPsec.
This is required for IPSec.
config CRYPTO_TEST
tristate "Testing module"
depends on m
select CRYPTO_ALGAPI
select CRYPTO_AEAD
select CRYPTO_BLKCIPHER
help
Quick & dirty crypto test module.
comment "Authenticated Encryption with Associated Data"
config CRYPTO_CCM
tristate "CCM support"
select CRYPTO_CTR
select CRYPTO_AEAD
help
Support for Counter with CBC MAC. Required for IPsec.
config CRYPTO_GCM
tristate "GCM/GMAC support"
select CRYPTO_CTR
select CRYPTO_AEAD
select CRYPTO_GF128MUL
help
Support for Galois/Counter Mode (GCM) and Galois Message
Authentication Code (GMAC). Required for IPSec.
config CRYPTO_SEQIV
tristate "Sequence Number IV Generator"
select CRYPTO_AEAD
select CRYPTO_BLKCIPHER
help
This IV generator generates an IV based on a sequence number by
xoring it with a salt. This algorithm is mainly useful for CTR
comment "Block modes"
config CRYPTO_CBC
tristate "CBC support"
select CRYPTO_BLKCIPHER
select CRYPTO_MANAGER
help
CBC: Cipher Block Chaining mode
This block cipher algorithm is required for IPSec.
config CRYPTO_CTR
tristate "CTR support"
select CRYPTO_BLKCIPHER
select CRYPTO_SEQIV
select CRYPTO_MANAGER
help
CTR: Counter mode
This block cipher algorithm is required for IPSec.
config CRYPTO_CTS
tristate "CTS support"
select CRYPTO_BLKCIPHER
help
CTS: Cipher Text Stealing
This is the Cipher Text Stealing mode as described by
Section 8 of rfc2040 and referenced by rfc3962.
(rfc3962 includes errata information in its Appendix A)
This mode is required for Kerberos gss mechanism support
for AES encryption.
config CRYPTO_ECB
tristate "ECB support"
select CRYPTO_BLKCIPHER
select CRYPTO_MANAGER
help
ECB: Electronic CodeBook mode
This is the simplest block cipher algorithm. It simply encrypts
the input block by block.
config CRYPTO_LRW
tristate "LRW support (EXPERIMENTAL)"
depends on EXPERIMENTAL
select CRYPTO_BLKCIPHER
select CRYPTO_MANAGER
select CRYPTO_GF128MUL
help
LRW: Liskov Rivest Wagner, a tweakable, non malleable, non movable
narrow block cipher mode for dm-crypt. Use it with cipher
specification string aes-lrw-benbi, the key must be 256, 320 or 384.
The first 128, 192 or 256 bits in the key are used for AES and the
rest is used to tie each cipher block to its logical position.
config CRYPTO_PCBC
tristate "PCBC support"
select CRYPTO_BLKCIPHER
select CRYPTO_MANAGER
help
PCBC: Propagating Cipher Block Chaining mode
This block cipher algorithm is required for RxRPC.
config CRYPTO_XTS
tristate "XTS support (EXPERIMENTAL)"
depends on EXPERIMENTAL
select CRYPTO_BLKCIPHER
select CRYPTO_MANAGER
select CRYPTO_GF128MUL
help
XTS: IEEE1619/D16 narrow block cipher use with aes-xts-plain,
key size 256, 384 or 512 bits. This implementation currently
can't handle a sectorsize which is not a multiple of 16 bytes.
comment "Hash modes"
config CRYPTO_HMAC
tristate "HMAC support"
select CRYPTO_HASH
select CRYPTO_MANAGER
help
HMAC: Keyed-Hashing for Message Authentication (RFC2104).
This is required for IPSec.
config CRYPTO_XCBC
tristate "XCBC support"
depends on EXPERIMENTAL
select CRYPTO_HASH
select CRYPTO_MANAGER
help
XCBC: Keyed-Hashing with encryption algorithm
http://www.ietf.org/rfc/rfc3566.txt
http://csrc.nist.gov/encryption/modes/proposedmodes/
xcbc-mac/xcbc-mac-spec.pdf
comment "Digest"
config CRYPTO_CRC32C
tristate "CRC32c CRC algorithm"
select CRYPTO_ALGAPI
select LIBCRC32C
help
Castagnoli, et al Cyclic Redundancy-Check Algorithm. Used
by iSCSI for header and data digests and by others.
See Castagnoli93. This implementation uses lib/libcrc32c.
Module will be crc32c.
config CRYPTO_MD4
tristate "MD4 digest algorithm"
select CRYPTO_ALGAPI
help
MD4 message digest algorithm (RFC1320).
config CRYPTO_MD5
tristate "MD5 digest algorithm"
select CRYPTO_ALGAPI
help
MD5 message digest algorithm (RFC1321).
config CRYPTO_MICHAEL_MIC
tristate "Michael MIC keyed digest algorithm"
select CRYPTO_ALGAPI
help
Michael MIC is used for message integrity protection in TKIP
(IEEE 802.11i). This algorithm is required for TKIP, but it
should not be used for other purposes because of the weakness
of the algorithm.
config CRYPTO_SHA1
tristate "SHA1 digest algorithm"
select CRYPTO_ALGAPI
help
SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2).
config CRYPTO_SHA256
tristate "SHA224 and SHA256 digest algorithm"
select CRYPTO_ALGAPI
help
SHA256 secure hash standard (DFIPS 180-2).
This version of SHA implements a 256 bit hash with 128 bits of
security against collision attacks.
This code also includes SHA-224, a 224 bit hash with 112 bits
of security against collision attacks.
config CRYPTO_SHA512
tristate "SHA384 and SHA512 digest algorithms"
select CRYPTO_ALGAPI
help
SHA512 secure hash standard (DFIPS 180-2).
This version of SHA implements a 512 bit hash with 256 bits of
security against collision attacks.
This code also includes SHA-384, a 384 bit hash with 192 bits
of security against collision attacks.
config CRYPTO_TGR192
tristate "Tiger digest algorithms"
select CRYPTO_ALGAPI
help
Tiger hash algorithm 192, 160 and 128-bit hashes
Tiger is a hash function optimized for 64-bit processors while
still having decent performance on 32-bit processors.
Tiger was developed by Ross Anderson and Eli Biham.
See also:
<http://www.cs.technion.ac.il/~biham/Reports/Tiger/>.
config CRYPTO_WP512
tristate "Whirlpool digest algorithms"
select CRYPTO_ALGAPI
help
Whirlpool hash algorithm 512, 384 and 256-bit hashes
Whirlpool-512 is part of the NESSIE cryptographic primitives.
Whirlpool will be part of the ISO/IEC 10118-3:2003(E) standard
See also:
<http://planeta.terra.com.br/informatica/paulobarreto/WhirlpoolPage.html>
comment "Ciphers"
config CRYPTO_AES
tristate "AES cipher algorithms"
select CRYPTO_ALGAPI
help
AES cipher algorithms (FIPS-197). AES uses the Rijndael
algorithm.
Rijndael appears to be consistently a very good performer in
both hardware and software across a wide range of computing
environments regardless of its use in feedback or non-feedback
modes. Its key setup time is excellent, and its key agility is
good. Rijndael's very low memory requirements make it very well
suited for restricted-space environments, in which it also
demonstrates excellent performance. Rijndael's operations are
among the easiest to defend against power and timing attacks.
The AES specifies three key sizes: 128, 192 and 256 bits
See <http://csrc.nist.gov/CryptoToolkit/aes/> for more information.
config CRYPTO_AES_586
tristate "AES cipher algorithms (i586)"
depends on (X86 || UML_X86) && !64BIT
select CRYPTO_ALGAPI
select CRYPTO_AES
help
AES cipher algorithms (FIPS-197). AES uses the Rijndael
algorithm.
Rijndael appears to be consistently a very good performer in
both hardware and software across a wide range of computing
environments regardless of its use in feedback or non-feedback
modes. Its key setup time is excellent, and its key agility is
good. Rijndael's very low memory requirements make it very well
suited for restricted-space environments, in which it also
demonstrates excellent performance. Rijndael's operations are
among the easiest to defend against power and timing attacks.
The AES specifies three key sizes: 128, 192 and 256 bits
[CRYPTO] Add x86_64 asm AES Implementation: =============== The encrypt/decrypt code is based on an x86 implementation I did a while ago which I never published. This unpublished implementation does include an assembler based key schedule and precomputed tables. For simplicity and best acceptance, however, I took Gladman's in-kernel code for table generation and key schedule for the kernel port of my assembler code and modified this code to produce the key schedule as required by my assembler implementation. File locations and Kconfig are kept similar to the i586 AES assembler implementation. It may seem a little bit strange to use 32 bit I/O and registers in the assembler implementation but this gives the best code size. My implementation takes one instruction more per round compared to Gladman's x86 assembler but it doesn't require any stack for local variables or saved registers and it is less serialized than Gladman's code. Note that all comparisons to Gladman's code were done after my code was implemented. I did only use FIPS PUB 197 for the implementation so my implementation is independent work. If anybody has a better assembler solution for x86_64 I'll be pleased to have my code replaced with the better solution. Testing: ======== The implementation passes the in-kernel crypto testing module and I'm running it without any problems on my laptop where it is mainly used for dm-crypt. Microbenchmark: =============== The microbenchmark was done in userspace with similar compile flags as used during kernel compile. Encrypt/decrypt is about 35% faster than the generic C implementation. As the generic C as well as my assembler implementation are both table I don't really expect that there is much room for further improvements though I'll be glad to be corrected here. The key schedule is about 5% slower than the generic C implementation. This is due to the fact that some more work has to be done in the key schedule routine to fit the schedule to the assembler implementation. Code Size: ========== Encrypt and decrypt are together about 2.1 Kbytes smaller than the generic C implementation which is important with regard to L1 cache usage. The key schedule routine is about 100 bytes larger than the generic C implementation. Data Size: ========== There's no difference in data size requirements between the assembler implementation and the generic C implementation. License: ======== Gladmans's code is dual BSD/GPL whereas my assembler code is GPLv2 only (I'm not going to change the license for my code). So I had to change the module license for the x86_64 aes module from 'Dual BSD/GPL' to 'GPL' to reflect the most restrictive license within the module. Signed-off-by: Andreas Steinmetz <ast@domdv.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06 20:55:00 +00:00
See <http://csrc.nist.gov/encryption/aes/> for more information.
config CRYPTO_AES_X86_64
tristate "AES cipher algorithms (x86_64)"
depends on (X86 || UML_X86) && 64BIT
select CRYPTO_ALGAPI
select CRYPTO_AES
[CRYPTO] Add x86_64 asm AES Implementation: =============== The encrypt/decrypt code is based on an x86 implementation I did a while ago which I never published. This unpublished implementation does include an assembler based key schedule and precomputed tables. For simplicity and best acceptance, however, I took Gladman's in-kernel code for table generation and key schedule for the kernel port of my assembler code and modified this code to produce the key schedule as required by my assembler implementation. File locations and Kconfig are kept similar to the i586 AES assembler implementation. It may seem a little bit strange to use 32 bit I/O and registers in the assembler implementation but this gives the best code size. My implementation takes one instruction more per round compared to Gladman's x86 assembler but it doesn't require any stack for local variables or saved registers and it is less serialized than Gladman's code. Note that all comparisons to Gladman's code were done after my code was implemented. I did only use FIPS PUB 197 for the implementation so my implementation is independent work. If anybody has a better assembler solution for x86_64 I'll be pleased to have my code replaced with the better solution. Testing: ======== The implementation passes the in-kernel crypto testing module and I'm running it without any problems on my laptop where it is mainly used for dm-crypt. Microbenchmark: =============== The microbenchmark was done in userspace with similar compile flags as used during kernel compile. Encrypt/decrypt is about 35% faster than the generic C implementation. As the generic C as well as my assembler implementation are both table I don't really expect that there is much room for further improvements though I'll be glad to be corrected here. The key schedule is about 5% slower than the generic C implementation. This is due to the fact that some more work has to be done in the key schedule routine to fit the schedule to the assembler implementation. Code Size: ========== Encrypt and decrypt are together about 2.1 Kbytes smaller than the generic C implementation which is important with regard to L1 cache usage. The key schedule routine is about 100 bytes larger than the generic C implementation. Data Size: ========== There's no difference in data size requirements between the assembler implementation and the generic C implementation. License: ======== Gladmans's code is dual BSD/GPL whereas my assembler code is GPLv2 only (I'm not going to change the license for my code). So I had to change the module license for the x86_64 aes module from 'Dual BSD/GPL' to 'GPL' to reflect the most restrictive license within the module. Signed-off-by: Andreas Steinmetz <ast@domdv.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06 20:55:00 +00:00
help
AES cipher algorithms (FIPS-197). AES uses the Rijndael
[CRYPTO] Add x86_64 asm AES Implementation: =============== The encrypt/decrypt code is based on an x86 implementation I did a while ago which I never published. This unpublished implementation does include an assembler based key schedule and precomputed tables. For simplicity and best acceptance, however, I took Gladman's in-kernel code for table generation and key schedule for the kernel port of my assembler code and modified this code to produce the key schedule as required by my assembler implementation. File locations and Kconfig are kept similar to the i586 AES assembler implementation. It may seem a little bit strange to use 32 bit I/O and registers in the assembler implementation but this gives the best code size. My implementation takes one instruction more per round compared to Gladman's x86 assembler but it doesn't require any stack for local variables or saved registers and it is less serialized than Gladman's code. Note that all comparisons to Gladman's code were done after my code was implemented. I did only use FIPS PUB 197 for the implementation so my implementation is independent work. If anybody has a better assembler solution for x86_64 I'll be pleased to have my code replaced with the better solution. Testing: ======== The implementation passes the in-kernel crypto testing module and I'm running it without any problems on my laptop where it is mainly used for dm-crypt. Microbenchmark: =============== The microbenchmark was done in userspace with similar compile flags as used during kernel compile. Encrypt/decrypt is about 35% faster than the generic C implementation. As the generic C as well as my assembler implementation are both table I don't really expect that there is much room for further improvements though I'll be glad to be corrected here. The key schedule is about 5% slower than the generic C implementation. This is due to the fact that some more work has to be done in the key schedule routine to fit the schedule to the assembler implementation. Code Size: ========== Encrypt and decrypt are together about 2.1 Kbytes smaller than the generic C implementation which is important with regard to L1 cache usage. The key schedule routine is about 100 bytes larger than the generic C implementation. Data Size: ========== There's no difference in data size requirements between the assembler implementation and the generic C implementation. License: ======== Gladmans's code is dual BSD/GPL whereas my assembler code is GPLv2 only (I'm not going to change the license for my code). So I had to change the module license for the x86_64 aes module from 'Dual BSD/GPL' to 'GPL' to reflect the most restrictive license within the module. Signed-off-by: Andreas Steinmetz <ast@domdv.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06 20:55:00 +00:00
algorithm.
Rijndael appears to be consistently a very good performer in
both hardware and software across a wide range of computing
environments regardless of its use in feedback or non-feedback
modes. Its key setup time is excellent, and its key agility is
good. Rijndael's very low memory requirements make it very well
suited for restricted-space environments, in which it also
demonstrates excellent performance. Rijndael's operations are
among the easiest to defend against power and timing attacks.
[CRYPTO] Add x86_64 asm AES Implementation: =============== The encrypt/decrypt code is based on an x86 implementation I did a while ago which I never published. This unpublished implementation does include an assembler based key schedule and precomputed tables. For simplicity and best acceptance, however, I took Gladman's in-kernel code for table generation and key schedule for the kernel port of my assembler code and modified this code to produce the key schedule as required by my assembler implementation. File locations and Kconfig are kept similar to the i586 AES assembler implementation. It may seem a little bit strange to use 32 bit I/O and registers in the assembler implementation but this gives the best code size. My implementation takes one instruction more per round compared to Gladman's x86 assembler but it doesn't require any stack for local variables or saved registers and it is less serialized than Gladman's code. Note that all comparisons to Gladman's code were done after my code was implemented. I did only use FIPS PUB 197 for the implementation so my implementation is independent work. If anybody has a better assembler solution for x86_64 I'll be pleased to have my code replaced with the better solution. Testing: ======== The implementation passes the in-kernel crypto testing module and I'm running it without any problems on my laptop where it is mainly used for dm-crypt. Microbenchmark: =============== The microbenchmark was done in userspace with similar compile flags as used during kernel compile. Encrypt/decrypt is about 35% faster than the generic C implementation. As the generic C as well as my assembler implementation are both table I don't really expect that there is much room for further improvements though I'll be glad to be corrected here. The key schedule is about 5% slower than the generic C implementation. This is due to the fact that some more work has to be done in the key schedule routine to fit the schedule to the assembler implementation. Code Size: ========== Encrypt and decrypt are together about 2.1 Kbytes smaller than the generic C implementation which is important with regard to L1 cache usage. The key schedule routine is about 100 bytes larger than the generic C implementation. Data Size: ========== There's no difference in data size requirements between the assembler implementation and the generic C implementation. License: ======== Gladmans's code is dual BSD/GPL whereas my assembler code is GPLv2 only (I'm not going to change the license for my code). So I had to change the module license for the x86_64 aes module from 'Dual BSD/GPL' to 'GPL' to reflect the most restrictive license within the module. Signed-off-by: Andreas Steinmetz <ast@domdv.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-06 20:55:00 +00:00
The AES specifies three key sizes: 128, 192 and 256 bits
See <http://csrc.nist.gov/encryption/aes/> for more information.
config CRYPTO_ANUBIS
tristate "Anubis cipher algorithm"
select CRYPTO_ALGAPI
help
Anubis cipher algorithm.
Anubis is a variable key length cipher which can use keys from
128 bits to 320 bits in length. It was evaluated as a entrant
in the NESSIE competition.
See also:
<https://www.cosic.esat.kuleuven.ac.be/nessie/reports/>
<http://planeta.terra.com.br/informatica/paulobarreto/AnubisPage.html>
config CRYPTO_ARC4
tristate "ARC4 cipher algorithm"
select CRYPTO_ALGAPI
help
ARC4 cipher algorithm.
ARC4 is a stream cipher using keys ranging from 8 bits to 2048
bits in length. This algorithm is required for driver-based
WEP, but it should not be for other purposes because of the
weakness of the algorithm.
config CRYPTO_BLOWFISH
tristate "Blowfish cipher algorithm"
select CRYPTO_ALGAPI
help
Blowfish cipher algorithm, by Bruce Schneier.
This is a variable key length cipher which can use keys from 32
bits to 448 bits in length. It's fast, simple and specifically
designed for use on "large microprocessors".
See also:
<http://www.schneier.com/blowfish.html>
config CRYPTO_CAMELLIA
tristate "Camellia cipher algorithms"
depends on CRYPTO
select CRYPTO_ALGAPI
help
Camellia cipher algorithms module.
Camellia is a symmetric key block cipher developed jointly
at NTT and Mitsubishi Electric Corporation.
The Camellia specifies three key sizes: 128, 192 and 256 bits.
See also:
<https://info.isl.ntt.co.jp/crypt/eng/camellia/index_s.html>
config CRYPTO_CAST5
tristate "CAST5 (CAST-128) cipher algorithm"
select CRYPTO_ALGAPI
help
The CAST5 encryption algorithm (synonymous with CAST-128) is
described in RFC2144.
config CRYPTO_CAST6
tristate "CAST6 (CAST-256) cipher algorithm"
select CRYPTO_ALGAPI
help
The CAST6 encryption algorithm (synonymous with CAST-256) is
described in RFC2612.
config CRYPTO_DES
tristate "DES and Triple DES EDE cipher algorithms"
select CRYPTO_ALGAPI
help
DES cipher algorithm (FIPS 46-2), and Triple DES EDE (FIPS 46-3).
config CRYPTO_FCRYPT
tristate "FCrypt cipher algorithm"
select CRYPTO_ALGAPI
select CRYPTO_BLKCIPHER
help
FCrypt algorithm used by RxRPC.
config CRYPTO_KHAZAD
tristate "Khazad cipher algorithm"
select CRYPTO_ALGAPI
help
Khazad cipher algorithm.
Khazad was a finalist in the initial NESSIE competition. It is
an algorithm optimized for 64-bit processors with good performance
on 32-bit processors. Khazad uses an 128 bit key size.
See also:
<http://planeta.terra.com.br/informatica/paulobarreto/KhazadPage.html>
config CRYPTO_SALSA20
tristate "Salsa20 stream cipher algorithm (EXPERIMENTAL)"
depends on EXPERIMENTAL
select CRYPTO_BLKCIPHER
help
Salsa20 stream cipher algorithm.
Salsa20 is a stream cipher submitted to eSTREAM, the ECRYPT
Stream Cipher Project. See <http://www.ecrypt.eu.org/stream/>
The Salsa20 stream cipher algorithm is designed by Daniel J.
Bernstein <djb@cr.yp.to>. See <http://cr.yp.to/snuffle.html>
config CRYPTO_SALSA20_586
tristate "Salsa20 stream cipher algorithm (i586) (EXPERIMENTAL)"
depends on (X86 || UML_X86) && !64BIT
depends on EXPERIMENTAL
select CRYPTO_BLKCIPHER
help
Salsa20 stream cipher algorithm.
Salsa20 is a stream cipher submitted to eSTREAM, the ECRYPT
Stream Cipher Project. See <http://www.ecrypt.eu.org/stream/>
The Salsa20 stream cipher algorithm is designed by Daniel J.
Bernstein <djb@cr.yp.to>. See <http://cr.yp.to/snuffle.html>
config CRYPTO_SALSA20_X86_64
tristate "Salsa20 stream cipher algorithm (x86_64) (EXPERIMENTAL)"
depends on (X86 || UML_X86) && 64BIT
depends on EXPERIMENTAL
select CRYPTO_BLKCIPHER
help
Salsa20 stream cipher algorithm.
Salsa20 is a stream cipher submitted to eSTREAM, the ECRYPT
Stream Cipher Project. See <http://www.ecrypt.eu.org/stream/>
The Salsa20 stream cipher algorithm is designed by Daniel J.
Bernstein <djb@cr.yp.to>. See <http://cr.yp.to/snuffle.html>
config CRYPTO_SEED
tristate "SEED cipher algorithm"
select CRYPTO_ALGAPI
help
SEED cipher algorithm (RFC4269).
SEED is a 128-bit symmetric key block cipher that has been
developed by KISA (Korea Information Security Agency) as a
national standard encryption algorithm of the Republic of Korea.
It is a 16 round block cipher with the key size of 128 bit.
See also:
<http://www.kisa.or.kr/kisa/seed/jsp/seed_eng.jsp>
config CRYPTO_SERPENT
tristate "Serpent cipher algorithm"
select CRYPTO_ALGAPI
help
Serpent cipher algorithm, by Anderson, Biham & Knudsen.
Keys are allowed to be from 0 to 256 bits in length, in steps
of 8 bits. Also includes the 'Tnepres' algorithm, a reversed
variant of Serpent for compatibility with old kerneli.org code.
See also:
<http://www.cl.cam.ac.uk/~rja14/serpent.html>
config CRYPTO_TEA
tristate "TEA, XTEA and XETA cipher algorithms"
select CRYPTO_ALGAPI
help
TEA cipher algorithm.
Tiny Encryption Algorithm is a simple cipher that uses
many rounds for security. It is very fast and uses
little memory.
Xtendend Tiny Encryption Algorithm is a modification to
the TEA algorithm to address a potential key weakness
in the TEA algorithm.
Xtendend Encryption Tiny Algorithm is a mis-implementation
of the XTEA algorithm for compatibility purposes.
config CRYPTO_TWOFISH
tristate "Twofish cipher algorithm"
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
help
Twofish cipher algorithm.
Twofish was submitted as an AES (Advanced Encryption Standard)
candidate cipher by researchers at CounterPane Systems. It is a
16 round block cipher supporting key sizes of 128, 192, and 256
bits.
See also:
<http://www.schneier.com/twofish.html>
config CRYPTO_TWOFISH_COMMON
tristate
help
Common parts of the Twofish cipher algorithm shared by the
generic c and the assembler implementations.
config CRYPTO_TWOFISH_586
tristate "Twofish cipher algorithms (i586)"
depends on (X86 || UML_X86) && !64BIT
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
help
Twofish cipher algorithm.
Twofish was submitted as an AES (Advanced Encryption Standard)
candidate cipher by researchers at CounterPane Systems. It is a
16 round block cipher supporting key sizes of 128, 192, and 256
bits.
See also:
<http://www.schneier.com/twofish.html>
config CRYPTO_TWOFISH_X86_64
tristate "Twofish cipher algorithm (x86_64)"
depends on (X86 || UML_X86) && 64BIT
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
help
Twofish cipher algorithm (x86_64).
Twofish was submitted as an AES (Advanced Encryption Standard)
candidate cipher by researchers at CounterPane Systems. It is a
16 round block cipher supporting key sizes of 128, 192, and 256
bits.
See also:
<http://www.schneier.com/twofish.html>
comment "Compression"
config CRYPTO_DEFLATE
tristate "Deflate compression algorithm"
select CRYPTO_ALGAPI
select ZLIB_INFLATE
select ZLIB_DEFLATE
help
This is the Deflate algorithm (RFC1951), specified for use in
IPSec with the IPCOMP protocol (RFC3173, RFC2394).
You will most probably want this if using IPSec.
config CRYPTO_LZO
tristate "LZO compression algorithm"
select CRYPTO_ALGAPI
select LZO_COMPRESS
select LZO_DECOMPRESS
help
This is the LZO algorithm.
source "drivers/crypto/Kconfig"
endif # if CRYPTO