aha/net
Eric Dumazet 271b72c7fa udp: RCU handling for Unicast packets.
Goals are :

1) Optimizing handling of incoming Unicast UDP frames, so that no memory
 writes should happen in the fast path.

 Note: Multicasts and broadcasts will still need to take a lock,
 because doing a full lockless lookup in this case is difficult.

2) No expensive operations in the socket bind/unhash phases :
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in the socket structure. It would increase memory
  needs and, more importantly, force us to use call_rcu(), which has
  the bad property of making socket structures cold.
  (The RCU grace period between freeing a socket and its potential
   reuse leaves the socket cold in the CPU cache.)
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.
  Quoting Christoph Lameter :
   "Right. That results in cacheline cooldown. You'd want to recycle
    the object as they are cache hot on a per cpu basis. That is screwed
    up by the delayed regular rcu processing. We have seen multiple
    regressions due to cacheline cooldown.
    The only choice in cacheline hot sensitive areas is to deal with the
    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because UDP sockets are allocated from a dedicated kmem_cache,
  use of SLAB_DESTROY_BY_RCU can help here.

Theory of operation :
---------------------

As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
special care must be taken by both readers and writers.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, and inserted in a different chain (or, in the worst case, in the
same chain) while readers are doing lookups at the same time.
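
Because an object can be recycled under a reader, a candidate found by a
lockless lookup has to be pinned and then re-checked before it is used.
The following is a minimal userspace sketch of that idea using C11
atomics. It is not the code of this patch: struct demo_sock,
demo_get_unless_zero(), demo_put() and demo_validate() are hypothetical
names, and a reference count of 0 is assumed to mean "free".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; NOT the kernel's struct sock. */
struct demo_sock {
	atomic_int refcnt;	/* assumption: 0 means free or being freed */
	unsigned short port;	/* lookup key; changes if the object is reused */
};

/* Take a reference only if the object is still live (refcnt != 0). */
static bool demo_get_unless_zero(struct demo_sock *sk)
{
	int old = atomic_load(&sk->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&sk->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken. */
static void demo_put(struct demo_sock *sk)
{
	int prev = atomic_fetch_sub(&sk->refcnt, 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Validate a candidate returned by a lockless chain walk:
 * 1) pin it, so it cannot be freed and reused from now on;
 * 2) re-check the key, since it may have been reused before step 1.
 * Returns sk with a reference held, or NULL if the caller must retry.
 */
static struct demo_sock *demo_validate(struct demo_sock *sk,
				       unsigned short port)
{
	if (!demo_get_unless_zero(sk))
		return NULL;
	if (sk->port != port) {
		demo_put(sk);
		return NULL;
	}
	return sk;
}
```

The order matters in this sketch: checking the key before taking the
reference would leave a window in which the object is recycled between
the check and the pin.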

In order to avoid loops, a reader must check that each socket found in a
chain really belongs to the chain the reader was traversing. If it finds a
mismatch, the lookup must start again at the beginning. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we don't want to send the same message several times to the same socket.
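
The chain-membership check and restart can be sketched in plain userspace
C as follows. This is an illustration under stated assumptions, not the
kernel implementation: every demo_* name is hypothetical, each node is
assumed to record the slot it is currently hashed into, and the
"concurrent" writer is simulated single-threaded through an optional hook
that runs while the reader holds a pointer to a node.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 4u

/* Hypothetical hashed object; NOT the kernel's struct sock. */
struct demo_node {
	struct demo_node *next;
	unsigned int slot;	/* chain this node currently belongs to */
	unsigned short port;	/* lookup key */
};

struct demo_table {
	struct demo_node *heads[DEMO_SLOTS];
	unsigned int restarts;	/* how often a lookup had to start over */
	int writer_done;	/* the simulated writer runs only once */
};

static unsigned int demo_hash(unsigned short port)
{
	return port % DEMO_SLOTS;
}

/* Insert n at the head of the chain selected by port. */
static void demo_insert(struct demo_table *t, struct demo_node *n,
			unsigned short port)
{
	n->port = port;
	n->slot = demo_hash(port);
	n->next = t->heads[n->slot];
	t->heads[n->slot] = n;
}

/* Remove n from the chain it is currently hashed into, if present. */
static void demo_unlink(struct demo_table *t, struct demo_node *n)
{
	struct demo_node **pp = &t->heads[n->slot];

	while (*pp != NULL && *pp != n)
		pp = &(*pp)->next;
	if (*pp == n)
		*pp = n->next;
}

typedef void (*demo_hook)(struct demo_table *t, struct demo_node *n);

/* Walk one chain; start over if a node turns out to live elsewhere. */
static struct demo_node *demo_lookup(struct demo_table *t,
				     unsigned short port, demo_hook hook)
{
	unsigned int slot = demo_hash(port);
	struct demo_node *n;

	assert(t != NULL);
restart:
	for (n = t->heads[slot]; n != NULL; n = n->next) {
		if (hook != NULL)
			hook(t, n);	/* a writer may recycle n here */
		if (n->slot != slot) {
			/* n moved: n->next now leads into a foreign chain. */
			t->restarts++;
			goto restart;
		}
		if (n->port == port)
			return n;
	}
	return NULL;
}

/* Simulated writer: once, free n and reuse it for port 9 (slot 1). */
static void demo_recycle_once(struct demo_table *t, struct demo_node *n)
{
	if (t->writer_done)
		return;
	t->writer_done = 1;
	demo_unlink(t, n);
	demo_insert(t, n, 9);
}
```

Without the slot check, a reader that was holding the recycled node would
follow its next pointer into the other chain and could miss sockets that
are still present in the chain it started on.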

We use RCU only for fast path.
Thus, /proc/net/udp still takes spinlocks.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 02:11:14 -07:00
..
9p 9p: fix sparse warnings 2008-10-22 18:54:47 -05:00
802 net: convert print_mac to %pM 2008-10-27 17:06:18 -07:00
8021q vlan: propogate ethtool speed values 2008-10-28 23:02:34 -07:00
appletalk net: convert print_mac to %pM 2008-10-27 17:06:18 -07:00
atm net: convert print_mac to %pM 2008-10-27 17:06:18 -07:00
ax25 ax25: Quick fix for making sure unaccepted sockets get destroyed. 2008-10-06 12:53:50 -07:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-10-17 08:58:52 -07:00
bridge net: remove NIP6(), NIP6_FMT, NIP6_SEQFMT and final users 2008-10-28 23:02:38 -07:00
can net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) 2008-10-16 15:24:51 -07:00
core udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
dccp dccp: Port redirection support for DCCP 2008-10-19 23:36:47 -07:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-10-17 08:58:52 -07:00
dsa dsa: fix compile bug on s390 2008-10-13 18:58:48 -07:00
econet
ethernet dsa: add support for Trailer tagging format 2008-10-08 17:24:16 -07:00
ieee80211 net: convert print_mac to %pM 2008-10-27 17:06:18 -07:00
ipv4 udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
ipv6 udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
ipx
irda net: convert print_mac to %pM 2008-10-27 17:06:18 -07:00
iucv iucv: Fix mismerge again. 2008-09-30 03:03:35 -07:00
key af_key: fix SADB_X_SPDDELETE response 2008-10-10 14:07:03 -07:00
lapb
llc net: convert print_mac to %pM 2008-10-27 17:06:18 -07:00
mac80211 mac80211: convert to %pM away from print_mac 2008-10-27 17:06:16 -07:00
netfilter netfilter: replace uses of NIP6_FMT with %p6 2008-10-28 16:08:13 -07:00
netlabel net, misc: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:32 -07:00
netlink netlink: constify struct nlattr * arg to parsing functions 2008-10-28 11:59:11 -07:00
netrom netrom: Fix sock_orphan() use in nr_release 2008-10-06 12:54:57 -07:00
packet
phonet Phonet: do not reply to indication reset packets 2008-10-26 23:07:25 -07:00
rfkill net/rfkill/rfkill-input.c needs <linux/sched.h> 2008-10-14 10:23:27 -07:00
rose
rxrpc net/rxrpc: Use an IS_ERR test rather than a NULL test 2008-08-13 02:40:48 -07:00
sched Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 2008-10-22 09:48:06 +02:00
sctp net, misc: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:32 -07:00
sunrpc net: remove NIP6(), NIP6_FMT, NIP6_SEQFMT and final users 2008-10-28 23:02:38 -07:00
tipc net: convert print_mac to %pM 2008-10-27 17:06:18 -07:00
unix [PATCH] assorted path_lookup() -> kern_path() conversions 2008-10-23 05:12:52 -04:00
wanrouter
wireless wireless: fix regression caused by regulatory config option 2008-10-26 10:38:52 -07:00
x25
xfrm net, misc: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:32 -07:00
compat.c
Kconfig netns: Coexist with the sysfs limitations v2 2008-10-27 17:51:47 -07:00
Makefile net: Distributed Switch Architecture protocol support 2008-10-08 17:15:19 -07:00
nonet.c
socket.c net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) 2008-10-16 15:24:51 -07:00
sysctl_net.c
TUNABLE