aha/net/ipv4
Jiri Olsa a57de0b433 net: adding memory barrier to the poll and receive callbacks
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp->rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp->rcv_nxt
  ...                        ...
  tp->rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                        wake_up_interruptible(sk->sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.

Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp->rcv_nxt check and sleeps, or will get the new value for
tp->rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.

Calls to poll_wait in following modules were ommited:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09 17:06:57 -07:00
..
netfilter netfilter: tcp conntrack: fix unacknowledged data detection with NAT 2009-06-29 14:07:56 +02:00
af_inet.c ipv4: remove ip_mc_drop_socket() declaration from af_inet.c. 2009-06-03 21:43:26 -07:00
ah4.c netns xfrm: AH/ESP in netns! 2008-11-25 17:59:27 -08:00
arp.c Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification" 2009-06-30 19:47:08 -07:00
cipso_ipv4.c netlabel: Label incoming TCP connections correctly in SELinux 2009-03-28 15:01:36 +11:00
datagram.c
devinet.c net: Fix devinet_sysctl_forward 2009-05-18 22:15:58 -07:00
esp4.c netns xfrm: AH/ESP in netns! 2008-11-25 17:59:27 -08:00
fib_frontend.c ipv4: cleanup: remove unnecessary include. 2009-05-18 15:16:38 -07:00
fib_hash.c ipv4: cleanup - remove two unused parameters from fib_semantic_match(). 2009-05-18 15:16:37 -07:00
fib_lookup.h ipv4: cleanup - remove two unused parameters from fib_semantic_match(). 2009-05-18 15:16:37 -07:00
fib_rules.c net: Remove unused parameter from fill method in fib_rules_ops. 2009-05-20 17:26:23 -07:00
fib_semantics.c ipv4: cleanup - remove two unused parameters from fib_semantic_match(). 2009-05-18 15:16:37 -07:00
fib_trie.c ipv4: Fix fib_trie rebalancing, part 4 (root thresholds) 2009-07-08 10:46:45 -07:00
icmp.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
igmp.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
inet_connection_sock.c net: move bsockets outside of read only beginning of struct inet_hashinfo 2009-02-01 12:31:33 -08:00
inet_diag.c net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
inet_fragment.c inet fragments: fix sparse warning: context imbalance 2009-02-26 23:13:35 -08:00
inet_hashtables.c net: move bsockets outside of read only beginning of struct inet_hashinfo 2009-02-01 12:31:33 -08:00
inet_lro.c include/net net/ - csum_partial - remove unnecessary casts 2008-11-19 15:44:53 -08:00
inet_timewait_sock.c Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck 2009-06-16 13:09:51 -07:00
inetpeer.c net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c 2008-11-03 00:23:42 -08:00
ip_forward.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
ip_fragment.c ipv4: Use frag list abstraction interfaces. 2009-06-09 00:19:37 -07:00
ip_gre.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
ip_input.c inet: Call skb_orphan before tproxy activates 2009-06-26 19:22:37 -07:00
ip_options.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
ip_output.c net: No more expensive sock_hold()/sock_put() on each tx 2009-06-11 02:55:43 -07:00
ip_sockglue.c net: skb->rtable accessor 2009-06-03 02:51:02 -07:00
ipcomp.c netns xfrm: state lookup in netns 2008-11-25 17:30:50 -08:00
ipconfig.c ipv4: teach ipconfig about the MTU option in DHCP 2009-05-19 15:36:17 -07:00
ipip.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
ipmr.c PIM-SM: namespace changes 2009-06-14 03:16:13 -07:00
Kconfig ipv4: update ARPD help text 2009-06-13 23:36:32 -07:00
Makefile IPVS: Move IPVS to net/netfilter/ipvs 2008-10-07 08:38:24 +11:00
netfilter.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
proc.c snmp: add missing counters for RFC 4293 2009-04-27 02:45:02 -07:00
protocol.c
raw.c net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
route.c ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off 2009-06-23 16:36:26 -07:00
syncookies.c syncookies: remove last_synq_overflow from struct tcp_sock 2009-04-20 02:25:26 -07:00
sysctl_net_ipv4.c net: '&' redux 2008-11-03 18:21:05 -08:00
tcp.c net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
tcp_bic.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_cong.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_cubic.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_diag.c net: inet_diag_handler structs can be const 2008-11-19 15:43:27 -08:00
tcp_highspeed.c
tcp_htcp.c htcp: merge icsk_ca_state compare 2009-03-02 03:00:14 -08:00
tcp_hybla.c tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd. 2008-10-07 15:58:17 -07:00
tcp_illinois.c
tcp_input.c tcp: fix loop in ofo handling code and reduce its complexity 2009-05-29 15:02:29 -07:00
tcp_ipv4.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
tcp_lp.c
tcp_minisocks.c tcp: missing check ACK flag of received segment in FIN-WAIT-2 state 2009-06-25 20:03:15 -07:00
tcp_output.c tcp: Stop non-TSO packets morphing into TSO 2009-06-29 19:41:39 -07:00
tcp_probe.c tcp: '< 0' test on unsigned 2009-03-13 16:05:14 -07:00
tcp_scalable.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_timer.c tcp: cleanup ca_state mess in tcp_timer 2009-03-02 03:00:13 -08:00
tcp_vegas.c tcp: tcp_vegas ssthresh bugfix 2009-05-25 22:44:59 -07:00
tcp_vegas.h
tcp_veno.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_westwood.c
tcp_yeah.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tunnel4.c
udp.c net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
udp_impl.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
udplite.c udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
xfrm4_input.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
xfrm4_mode_beet.c ipsec: Interfamily IPSec BEET 2008-08-06 02:39:30 -07:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
xfrm4_output.c net: skb->dst accessors 2009-06-03 02:51:04 -07:00
xfrm4_policy.c xfrm4: fix the ports decode of sctp protocol 2009-07-03 19:10:06 -07:00
xfrm4_state.c xfrm: remove useless forward declarations 2008-11-25 01:05:54 -08:00
xfrm4_tunnel.c