Commit graph

6507 commits

Author SHA1 Message Date
Linus Torvalds
dbcc2ec60f Revert "mac80211: warn when receiving frames with unaligned data"
This reverts commit 81100eb80a for the
release, to avoid the unnecessary warning noise that is only really
relevant to wireless driver developers.

The warning will probably go right back in after I cut the release, but
at least we won't unnecessarily worry users.

Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-24 13:35:10 -08:00
Herbert Xu
f945fa7ad9 [INET]: Fix truesize setting in ip_append_data
As it is ip_append_data only counts page fragments to the skb that
allocated it.  As such it means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it while
all subsequent skb's that use the same page gets away with no charge
at all.

This bug was exposed by the UDP accounting patch.

[ The wmem_alloc bumping needs to be moved with the truesize,
  noticed by Takahiro Yasui.  -DaveM ]

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-23 03:11:43 -08:00
Denis V. Lunev
ff4b950277 [NETNS]: Re-export init_net via EXPORT_SYMBOL.
init_net is used added as a parameter to a lot of old API calls, f.e.
ip_dev_find. These calls were exported as EXPORT_SYMBOL. So, export init_net
as EXPORT_SYMBOL to keep networking API consistent.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-23 03:11:42 -08:00
David S. Miller
1e34a11d55 [IPV4]: Add missing skb->truesize increment in ip_append_page().
And as noted by Takahiro Yasui, we thus need to bump the
sk->sk_wmem_alloc at this spot as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-23 03:11:40 -08:00
Dave Young
acea6852f3 [BLUETOOTH]: Move children of connection device to NULL before connection down.
The rfcomm tty device will possibly retain even when conn is down, and
sysfs doesn't support zombie device moving, so this patch move the tty
device before conn device is destroyed.

For the bug refered please see :
http://lkml.org/lkml/2007/12/28/87

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-23 03:11:39 -08:00
Wang Chen
5b4d383a1a [ICMP]: ICMP_MIB_OUTMSGS increment duplicated
Commit "96793b482540f3a26e2188eaf75cb56b7829d3e3" (Add ICMPMsgStats
MIB (RFC 4293)) made a mistake.

In that patch, David L added a icmp_out_count() in
ip_push_pending_frames(), remove icmp_out_count() from
icmp_reply(). But he forgot to remove icmp_out_count() from
icmp_send() too.  Since icmp_send and icmp_reply will call
icmp_push_reply, which will call ip_push_pending_frames, a duplicated
increment happened in icmp_send.

This patch remove the icmp_out_count from icmp_send too.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-21 03:39:45 -08:00
Wang Chen
fa95c28322 [IPV6]: RFC 2011 compatibility broken
The snmp6 entry name was changed, and it broke compatibility
to RFC 2011.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-21 03:05:43 -08:00
Wang Chen
c964ff4ffb [IPV6]: ICMP6_MIB_OUTMSGS increment duplicated
icmpv6_send() calls ip6_push_pending_frames() indirectly.
Both ip6_push_pending_frames() and icmpv6_send() increment
counter ICMP6_MIB_OUTMSGS.

This patch remove the increment from icmpv6_send.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-21 03:05:20 -08:00
Patrick McHardy
68365458a4 [NET]: rtnl_link: fix use-after-free
When unregistering the rtnl_link_ops, all existing devices using
the ops are destroyed. With nested devices this may lead to a
use-after-free despite the use of for_each_netdev_safe() in case
the upper device is next in the device list and is destroyed
by the NETDEV_UNREGISTER notifier.

The easy fix is to restart scanning the device list after removing
a device. Alternatively we could add new devices to the front of
the list to avoid having dependant devices follow the device they
depend on. A third option would be to only restart scanning if
dev->iflink of the next device matches dev->ifindex of the current
one. For now this seems like the safest solution.

With this patch, the veth rtnl_link_ops unregistration can use
rtnl_link_unregister() directly since it now also handles destruction
of multiple devices at once.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:45 -08:00
Patrick McHardy
d4782c323d [AF_KEY]: Fix skb leak on pfkey_send_migrate() error
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:45 -08:00
Jesper Juhl
61e44b4815 [IrDA]: af_irda memory leak fixes
Here goes an IrDA patch against your latest net-2.6 tree.

This patch fixes some af_irda memory leaks.  It also checks for
irias_new_obect() return value.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:42 -08:00
David S. Miller
cecbb63967 [NEIGH]: Revert 'Fix race between neigh_parms_release and neightbl_fill_parms'
Commit 9cd4002942 (Fix race between
neigh_parms_release and neightbl_fill_parms) introduced device
reference counting regressions for several people, see:

	http://bugzilla.kernel.org/show_bug.cgi?id=9778

for example.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:42 -08:00
Patrick McHardy
2dc2f207fb [NETFILTER]: bridge-netfilter: fix net_device refcnt leaks
When packets are flood-forwarded to multiple output devices, the
bridge-netfilter code reuses skb->nf_bridge for each clone to store
the bridge port. When queueing packets using NFQUEUE netfilter takes
a reference to skb->nf_bridge->physoutdev, which is overwritten
when the packet is forwarded to the second port. This causes
refcount unterflows for the first device and refcount leaks for all
others. Additionally this provides incorrect data to the iptables
physdev match.

Unshare skb->nf_bridge by copying it if it is shared before assigning
the physoutdev device.

Reported, tested and based on initial patch by
Jan Christoph Nordholz <hesso@pool.math.tu-berlin.de>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:41 -08:00
YOSHIFUJI Hideaki
398bcbebb6 [IPV6] ROUTE: Make sending algorithm more friendly with RFC 4861.
We omit (or delay) sending NSes for known-to-unreachable routers (in
NUD_FAILED state) according to RFC 4191 (Default Router Preferences
and More-Specific Routes).  But this is not fully compatible with RFC
4861 (Neighbor Discovery Protocol for IPv6), which does not remember
unreachability of neighbors.

So, let's avoid mixing sending algorithm of RFC 4191 and that of RFC
4861, and make the algorithm more friendly with RFC 4861 if RFC 4191
is disabled.

Issue was found by IPv6 Ready Logo Core Self_Test 1.5.0b2 (by TAHI
Project), and has been tracked down by Mitsuru Chinen
<mitch@linux.vnet.ibm.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:40 -08:00
Eric Dumazet
8d3f099abe [IPV4] FIB_HASH : Avoid unecessary loop in fn_hash_dump_zone()
I noticed "ip route list" was slower than "cat /proc/net/route" on a
machine with a full Internet routing table (214392 entries : Special
thanks to Robert ;) )

This is similar to problem reported in commit
d8c9283089 ("[IPV4] ROUTE: ip_rt_dump()
is unecessary slow")

Fix is to avoid scanning the begining of fz_hash table, but directly
seek to the right offset.

Before patch :

time ip route >/tmp/ROUTE

real    0m1.285s
user    0m0.712s
sys     0m0.436s

After patch

# time ip route >/tmp/ROUTE

real    0m0.835s
user    0m0.692s
sys     0m0.124s

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:39 -08:00
Joonwoo Park
6725033fa2 [IPV4] fib_trie: fix duplicated route issue
http://bugzilla.kernel.org/show_bug.cgi?id=9493

The fib allows making identical routes with 'ip route replace'.
This patch makes the fib return -EEXIST if replacement would cause duplication.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:38 -08:00
Joonwoo Park
bd566e7525 [IPV4] fib_hash: fix duplicated route issue
http://bugzilla.kernel.org/show_bug.cgi?id=9493

The fib allows making identical routes with 'ip route replace'.
This patch makes the fib return -EEXIST if replacement would cause duplication.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:37 -08:00
Pavel Emelyanov
b3652b2dc5 [IPV6]: Mischecked tw match in __inet6_check_established.
When looking for a conflicting connection the !sk->sk_bound_dev_if
check is performed only for live sockets, but not for timewait-ed.

This is not the case for ipv4, for __inet6_lookup_established in
both ipv4 and ipv6 and for other places that check for tw-s.

Was this missed accidentally? If so, then this patch fixes it and
besides makes use if the dif variable declared in the function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:36 -08:00
Eric Paris
632041f306 rfkill: call rfkill_led_trigger_unregister() on error
Code inspection turned up that error cases in rfkill_register() do not
call rfkill_led_trigger_unregister() even though we have already
registered.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-20 20:31:36 -08:00
Eric Dumazet
1b310fca30 [TOKENRING]: rif_timer not initialized properly
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-13 22:32:49 -08:00
Patrick McHardy
2948d2ebbb [NETFILTER]: bridge: fix double POST_ROUTING invocation
The bridge code incorrectly causes two POST_ROUTING hook invocations
for DNATed packets that end up on the same bridge device. This
happens because packets with a changed destination address are passed
to dst_output() to make them go through the neighbour output function
again to build a new destination MAC address, before they will continue
through the IP hooks simulated by bridge netfilter.

The resulting hook order is:
 PREROUTING	(bridge netfilter)
 POSTROUTING	(dst_output -> ip_output)
 FORWARD	(bridge netfilter)
 POSTROUTING	(bridge netfilter)

The deferred hooks used to abort the first POST_ROUTING invocation,
but since the only thing bridge netfilter actually really wants is
a new MAC address, we can avoid going through the IP stack completely
by simply calling the neighbour output function directly.

Tested, reported and lots of data provided by: Damien Thebault <damien.thebault@gmail.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-11 18:02:18 -08:00
Jan Engelhardt
0ff4d77bd9 [NETFILTER]: xt_helper: Do not bypass RCU
Use the @helper variable that was just obtained.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 22:41:28 -08:00
Yasuyuki Kozakai
8f41f01786 [NETFILTER]: ip6t_eui64: Fixes calculation of Universal/Local bit
RFC2464 says that the next to lowerst order bit of the first octet
of the Interface Identifier is formed by complementing
the Universal/Local bit of the EUI-64. But ip6t_eui64 uses OR not XOR.

Thanks Peter Ivancik for reporing this bug and posting a patch
for it.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 22:40:39 -08:00
Jarek Poplawski
0fe1e567d0 [VLAN]: nested VLAN: fix lockdep's recursive locking warning
Allow vlans nesting other vlans without lockdep's warnings (max. 2 levels
i.e. parent + child). Thanks to Patrick McHardy for pointing a bug in the
first version of this patch.

Reported-by: Benny Amorsen

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 22:38:31 -08:00
Eric Dumazet
0d89d7944f [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache
In dn_rt_cache_get_next(), no need to guard seq->private by a
rcu_dereference() since seq is private to the thread running this
function. Reading seq.private once (as guaranted bu rcu_dereference())
or several time if compiler really is dumb enough wont change the
result.
 
But we miss real spots where rcu_dereference() are needed, both in
dn_rt_cache_get_first() and dn_rt_cache_get_next()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 22:35:21 -08:00
Dave Young
f951375d47 [BLUETOOTH]: rfcomm tty BUG_ON() code fix
1) In tty.c the BUG_ON at line 115 will never be called, because the the
   before list_del_init in this same function.
	115          BUG_ON(!list_empty(&dev->list));
   So move the list_del_init to rfcomm_dev_del 

2) The rfcomm_dev_del could be called from diffrent path
   (rfcomm_tty_hangup/rfcomm_dev_state_change/rfcomm_release_dev),

   So add another BUG_ON when the rfcomm_dev_del is called more than
   one time.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 22:22:52 -08:00
Jarek Poplawski
ecd2ebdea3 [AX25] af_ax25: Possible circular locking.
Bernard Pidoux F6BVP reported:
> When I killall kissattach I can see the following message.
>
> This happens on kernel 2.6.24-rc5 already patched with the 6 previously
> patches I sent recently.
>
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.23.9 #1
> -------------------------------------------------------
> kissattach/2906 is trying to acquire lock:
>  (linkfail_lock){-+..}, at: [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
>
> but task is already holding lock:
>  (ax25_list_lock){-+..}, at: [<d8bd7c7c>] ax25_device_event+0x38/0x84
> [ax25]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
...

lockdep is worried about the different order here:

#1 (rose_neigh_list_lock){-+..}:
#3 (ax25_list_lock){-+..}:

#0 (linkfail_lock){-+..}:
#1 (rose_neigh_list_lock){-+..}:

#3 (ax25_list_lock){-+..}:
#0 (linkfail_lock){-+..}:

So, ax25_list_lock could be taken before and after linkfail_lock. 
I don't know if this three-thread clutch is very probable (or
possible at all), but it seems another bug reported by Bernard
("[...] system impossible to reboot with linux-2.6.24-rc5")
could have similar source - namely ax25_list_lock held by
ax25_kill_by_device() during ax25_disconnect(). It looks like the
only place which calls ax25_disconnect() this way, so I guess, it
isn't necessary.

This patch is breaking the lock for ax25_disconnect().

Reported-and-tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 21:21:20 -08:00
maximilian attems
27d1cba21f [AX25]: Kill user triggable printks.
sfuzz can easily trigger any of those.

move the printk message to the corresponding comment: makes the
intention of the code clear and easy to pick up on an scheduled
removal.  as bonus simplify the braces placement.

Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 03:57:29 -08:00
Eric Dumazet
0bcceadceb [IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
In rt_cache_get_next(), no need to guard seq->private by a
rcu_dereference() since seq is private to the thread running this
function. Reading seq.private once (as guaranted bu rcu_dereference())
or several time if compiler really is dumb enough wont change the
result.

But we miss real spots where rcu_dereference() are needed, both in
rt_cache_get_first() and rt_cache_get_next()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 03:55:57 -08:00
Pavel Emelyanov
9cd4002942 [NEIGH]: Fix race between neigh_parms_release and neightbl_fill_parms
The neightbl_fill_parms() is called under the write-locked tbl->lock
and accesses the parms->dev. The negh_parm_release() calls the
dev_put(parms->dev) without this lock. This creates a tiny race window
on which the parms contains potentially stale dev pointer.

To fix this race it's enough to move the dev_put() upper under the
tbl->lock, but note, that the parms are held by neighbors and thus can
live after the neigh_parms_release() is called, so we still can have a
parm with bad dev pointer.

I didn't find where the neigh->parms->dev is accessed, but still think
that putting the dev is to be done in a place, where the parms are
really freed. Am I right with that?

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 03:48:38 -08:00
Herbert Xu
1c9b7aa1eb [ATM]: Check IP header validity in mpc_send_packet
Al went through the ip_fast_csum callers and found this piece of code
that did not validate the IP header.  While root crashing the machine
by sending bogus packets through raw or AF_PACKET sockets isn't that
serious, it is still nice to react gracefully.

This patch ensures that the skb has enough data for an IP header and
that the header length field is valid.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-09 03:51:59 -08:00
Brian Haley
1ac4f00885 [IPV6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:52:21 -08:00
Eric Dumazet
0f99be0d11 [XFRM]: xfrm_algo_clone() allocates too much memory
alg_key_len is the length in bits of the key, not in bytes.

Best way to fix this is to move alg_len() function from net/xfrm/xfrm_user.c 
to include/net/xfrm.h, and to use it in xfrm_algo_clone()

alg_len() is renamed to xfrm_alg_len() because of its global exposition.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:39:06 -08:00
Brice Goglin
877364e60e [LRO] Fix lro_mgr->features checks
lro_mgr->features contains a bitmask of LRO_F_* values which are
defined as power of two, not as bit indexes.
They must be checked with x&LRO_F_FOO, not with test_bit(LRO_F_FOO,&x).

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Acked-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:18 -08:00
Paul Moore
02f1c89d6e [NET]: Clone the sk_buff 'iif' field in __skb_clone()
Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets.  Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack.  This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.

Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:17 -08:00
Eric Dumazet
d8c9283089 [IPV4] ROUTE: ip_rt_dump() is unecessary slow
I noticed "ip route list cache x.y.z.t" can be *very* slow.

While strace-ing -T it I also noticed that first part of route cache
is fetched quite fast :

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>

while the part at the end of the table is more expensive:

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>

The following patch corrects this performance/latency problem,
removing quadratic behavior.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:16 -08:00
David S. Miller
fed17f3094 [NET]: Stop polling when napi_disable() is pending.
This finally adds the code in net_rx_action() to break out of the
->poll()'ing loop when a napi_disable() is found to be pending.

Now, even if a device is being flooded with packets it can be cleanly
brought down.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:13 -08:00
Andrew Lutomirski
5cdfed54e7 mac80211: return an error when SIWRATE doesn't match any rate
Currently mac80211 fails silently when trying to set a nonexistent
rate.  Return an error instead.

Signed-Off-By: Andy Lutomirski <luto@myrealbox.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-08 23:30:10 -08:00
maximilian attems
9e8d6f8959 [IRDA]: irda_create() nuke user triggable printk
easy to trigger as user with sfuzz.

irda_create() is quiet on unknown sock->type,
match this behaviour for SOCK_DGRAM unknown protocol

Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:05 -08:00
Vlad Yasevich
036b579b11 [SCTP]: Add back the code that accounted for FORWARD_TSN parameter in INIT.
Some recent changes completely removed accounting for the FORWARD_TSN
parameter length in the INIT and INIT-ACK chunk.  This is wrong and
should be restored.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:04 -08:00
Vlad Yasevich
6df9cfc1ad [SCTP]: Correctly handle AUTH parameters in unexpected INIT
When processing an unexpected INIT chunk, we do not need to
do any preservation of the old AUTH parameters.  In fact,
doing such preservations will nullify AUTH and allow connection
stealing.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:03 -08:00
Vlad Yasevich
f691724c4d [SCTP]: Fix the name of the authentication event.
The even should be called SCTP_AUTHENTICATION_INDICATION.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:02 -08:00
Amos Waterland
92ffb85dd3 [IPV4] ipconfig: Fix regression in ip command line processing
The recent changes for ip command line processing fixed some problems
but unfortunately broke some common usage scenarios.  In current
2.6.24-rc6 the following command line results in no IP address
assignment, which is surely a regression:

 ip=10.0.2.15::10.0.2.2:255.255.255.0::eth0:off

Please find below a patch that works for all cases I can find.

Signed-off-by: Amos Waterland <apw@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:29:58 -08:00
Herbert Xu
f844c74fe0 [IPV4] raw: Strengthen check on validity of iph->ihl
We currently check that iph->ihl is bounded by the real length and that
the real length is greater than the minimum IP header length.  However,
we did not check the caes where iph->ihl is less than the minimum IP
header length.

This breaks because some ip_fast_csum implementations assume that which
is quite reasonable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:29:57 -08:00
Mark McLoughlin
44344b2a85 [INET]: Fix netdev renaming and inet address labels
When re-naming an interface, the previous secondary address
labels get lost e.g.

  $> brctl addbr foo
  $> ip addr add 192.168.0.1 dev foo
  $> ip addr add 192.168.0.2 dev foo label foo:00
  $> ip addr show dev foo | grep inet
    inet 192.168.0.1/32 scope global foo
    inet 192.168.0.2/32 scope global foo:00
  $> ip link set foo name bar
  $> ip addr show dev bar | grep inet
    inet 192.168.0.1/32 scope global bar
    inet 192.168.0.2/32 scope global bar:2

Turns out to be a simple thinko in inetdev_changename() - clearly we
want to look at the address label, rather than the device name, for
a suffix to retain.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-04 03:55:34 -08:00
Eric Dumazet
2d60abc2a9 [XFRM]: Do not define km_migrate() if !CONFIG_XFRM_MIGRATE
In include/net/xfrm.h we find :

#ifdef CONFIG_XFRM_MIGRATE
extern int km_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
                      struct xfrm_migrate *m, int num_bundles);
...
#endif

We can also guard the function body itself in net/xfrm/xfrm_state.c
with same condition.

(Problem spoted by sparse checker)
make C=2 net/xfrm/xfrm_state.o
...
net/xfrm/xfrm_state.c:1765:5: warning: symbol 'km_migrate' was not declared. Should it be static?
...

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-04 00:47:03 -08:00
Julia Lawall
76975f8a31 [X25]: Add missing x25_neigh_put
The function x25_get_neigh increments a reference count.  At the point of
the second goto out, the result of calling x25_get_neigh is only stored in
a local variable, and thus no one outside the function will be able to
decrease the reference count.  Thus, x25_neigh_put should be called before
the return in this case.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>

@@
type T,T1,T2;
identifier E;
statement S;
expression x1,x2,x3;
int ret;
@@

  T E;
  ...
* if ((E = x25_get_neigh(...)) == NULL)
  S
  ... when != x25_neigh_put(...,(T1)E,...)
      when != if (E != NULL) { ... x25_neigh_put(...,(T1)E,...); ...}
      when != x1 = (T1)E
      when != E = x3;
      when any
  if (...) {
    ... when != x25_neigh_put(...,(T2)E,...)
        when != if (E != NULL) { ... x25_neigh_put(...,(T2)E,...); ...}
        when != x2 = (T2)E
(
*   return;
|
*   return ret;
)
  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-04 00:47:02 -08:00
James Morris
3392c34922 NFS: add newline to kernel warning message in auth_gss code
Add newline to kernel warning message in gss_create().

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:16 -05:00
Dave Young
38b7da09cf [BLUETOOTH]: put_device before device_del fix
Because of workqueue delay, the put_device could be called before
device_del, so move it to del_conn.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com> 
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-29 19:17:47 -08:00
Gavin McCullagh
2072c228c9 [TCP]: use non-delayed ACK for congestion control RTT
When a delayed ACK representing two packets arrives, there are two RTT
samples available, one for each packet.  The first (in order of seq
number) will be artificially long due to the delay waiting for the
second packet, the second will trigger the ACK and so will not itself
be delayed.

According to rfc1323, the SRTT used for RTO calculation should use the
first rtt, so receivers echo the timestamp from the first packet in
the delayed ack.  For congestion control however, it seems measuring
delayed ack delay is not desirable as it varies independently of
congestion.

The patch below causes seq_rtt and last_ackt to be updated with any
available later packet rtts which should have less (and hopefully
zero) delack delay.  The rtt value then gets passed to
ca_ops->pkts_acked().

Where TCP_CONG_RTT_STAMP was set, effort was made to supress RTTs from
within a TSO chunk (!fully_acked), using only the final ACK (which
includes any TSO delay) to generate RTTs.  This patch removes these
checks so RTTs are passed for each ACK to ca_ops->pkts_acked().

For non-delay based congestion control (cubic, h-tcp), rtt is
sometimes used for rtt-scaling.  In shortening the RTT, this may make
them a little less aggressive.  Delay-based schemes (eg vegas, veno,
illinois) should get a cleaner, more accurate congestion signal,
particularly for small cwnds.  The congestion control module can
potentially also filter out bad RTTs due to the delayed ack alarm by
looking at the associated cnt which (where delayed acking is in use)
should probably be 1 if the alarm went off or greater if the ACK was
triggered by a packet.

Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-29 19:11:21 -08:00