aha/net
Herbert Xu 9a279bcbe3 net: Partially allow skb destructors to be used on receive path
As it currently stands, skb destructors are forbidden on the
receive path because the protocol end-points will overwrite
any existing destructor with their own.

This is the reason why we have to call skb_orphan in the loopback
driver before we reinject the packet back into the stack, thus
creating a period during which loopback traffic isn't charged
to any socket.

With virtualisation, we have a similar problem in that traffic
is reinjected into the stack without being associated with any
socket entity, thus providing no natural congestion push-back
for those poor folks still stuck with UDP.

Now had we been consistent in telling them that UDP simply has
no congestion feedback, I could just fob them off.  Unfortunately,
we appear to have gone to some length in catering for this on
the standard UDP path, with skb/socket accounting so that has
created a very unhealthy dependency.

Alas habits are difficult to break out of, so we may just have
to allow skb destructors on the receive path.

It turns out that making skb destructors useable on the receive path
isn't as easy as it seems.  For instance, simply adding skb_orphan
to skb_set_owner_r isn't enough.  This is because we assume all
over the IP stack that skb->sk is an IP socket if present.

The new transparent proxy code goes one step further and assumes
that skb->sk is the receiving socket if present.

Now all of this can be dealt with by adding simple checks such
as only treating skb->sk as an IP socket if skb->sk->sk_family
matches.  However, it turns out that for bridging at least we
don't need to do all of this work.

This is of interest because most virtualisation setups use bridging
so we don't actually go through the IP stack on the host (with
the exception of our old nemesis the bridge netfilter, but that's
easily taken care of).

So this patch simply adds skb_orphan to the point just before we
enter the IP stack, but after we've gone through the bridge on the
receive path.  It also adds an skb_orphan to the one place in
netfilter that touches skb->sk/skb->destructor, that is, tproxy.

One word of caution, because of the internal code structure, anyone
wishing to deploy this must use skb_set_owner_w as opposed to
skb_set_owner_r since many functions that create a new skb from
an existing one will invoke skb_set_owner_w on the new skb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:55:27 -08:00
..
9p net/9p: fid->fid is used uninitialized 2009-01-19 16:20:15 -08:00
802 net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
8021q net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
appletalk net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
atm lec: convert to net_device_ops 2009-01-21 14:02:00 -08:00
ax25 net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
bluetooth bluetooth: driver API update 2009-01-07 17:23:17 -08:00
bridge net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
can net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
core net: Partially allow skb destructors to be used on receive path 2009-02-04 16:55:27 -08:00
dcb DCB: fix kfree(skb) 2009-01-04 17:29:21 -08:00
dccp dccp: Debugging functions for feature negotiation 2009-01-21 14:34:05 -08:00
decnet net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
dsa net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
econet net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
ethernet eth: Declare an optimized compare_ether_addr_64bits() function 2008-11-23 23:24:32 -08:00
ipv4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-02-03 00:15:35 -08:00
ipv6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-02-03 00:15:35 -08:00
ipx net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
irda net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
iucv s390: remove s390_root_dev_*() 2009-01-06 10:44:34 -08:00
key af_key: initialize xfrm encap_oa 2009-01-25 20:49:14 -08:00
lapb
llc net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
mac80211 mac80211: Cancel the dynamic ps timer in ioctl_siwpower. 2009-01-30 13:38:28 -05:00
netfilter net: Partially allow skb destructors to be used on receive path 2009-02-04 16:55:27 -08:00
netlabel netlabel: Update kernel configuration API 2008-12-31 12:54:11 -05:00
netlink genetlink: export genl_unregister_mc_group() 2009-01-07 10:00:17 -08:00
netrom netrom: convert to net_device_ops 2009-01-21 14:02:02 -08:00
packet net: packet socket packet_lookup_frame fix 2009-02-01 01:53:29 -08:00
phonet net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
rfkill net/rfkill/rfkill.c: fix unused rfkill_led_trigger() warning 2009-01-04 17:11:24 -08:00
rose rose: convert to network_device_ops 2009-01-21 14:02:04 -08:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2008-12-28 12:49:40 -08:00
sched pkt_sched: sch_htb: Use workqueue to schedule after too many events. 2009-02-01 01:13:22 -08:00
sctp net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
sunrpc sunrpc: fix rdma dependencies 2009-02-03 15:20:13 -08:00
tipc net/tipc/bcast.h: use ARRAY_SIZE 2009-01-11 00:06:33 -08:00
unix introduce new LSM hooks where vfsmount is available. 2008-12-31 18:07:37 -05:00
wanrouter netdevice wanrouter: Convert directly reference of netdev->priv 2008-11-20 04:26:21 -08:00
wimax wimax: fix build issue when debugfs is disabled 2009-01-29 17:18:31 -08:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-02-03 12:41:58 -08:00
x25 net: replace uses of __constant_{endian} 2009-02-01 00:45:17 -08:00
xfrm Revert "xfrm: For 32/64 compatability wrt. xfrm_usersa_info" 2009-01-20 09:49:51 -08:00
compat.c reintroduce accept4 2008-11-19 18:49:57 -08:00
Kconfig Phonet: move to Networking options like other protocol stacks 2009-01-26 21:03:33 -08:00
Makefile wimax: Makefile, Kconfig and docbook linkage for the stack 2009-01-07 10:00:17 -08:00
nonet.c
socket.c [CVE-2009-0029] System call wrappers part 22 2009-01-14 14:15:27 +01:00
sysctl_net.c
TUNABLE