author | David S. Miller <davem@davemloft.net> | 2015-10-03 04:32:52 -0700 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2015-10-03 04:32:52 -0700 |
commit | c3fc7ac9a0b978ee8538058743d21feef25f7b33 (patch) | |
tree | 0caf05649d27830ba0f9548704abbb1ec4b5bb91 /net/ipv4/inet_hashtables.c | |
parent | f6d3125fa3c2f55ddf7cf69365c41089de6cfae6 (diff) | |
parent | e994b2f0fb9229aeff5eea9541320bd7b2ca8714 (diff) | |
Merge branch 'tcp-lockless-listener'
Eric Dumazet says:
====================
tcp/dccp: lockless listener
TCP listener refactoring: this is becoming interesting!
This patch series takes the steps to use normal TCP/DCCP ehash
table to store SYN_RECV requests, instead of the private per-listener
hash table we had until now.
SYNACK skbs are now attached to their syn_recv request socket,
so that we no longer heavily modify the listener's sk_wmem_alloc.
The listener lock is no longer held in the fast path, including
in SYNCOOKIE mode.
During my tests, my server was able to process 3,500,000
SYN packets per second on one listener and still had available
cpu cycles.
That is about 2 to 3 orders of magnitude more than what we had with older kernels.
This effort started two years ago and I am pleased to reach expectations.
We'll probably extend SO_REUSEPORT to add proper cpu/numa affinities,
so that heavy duty TCP servers can get proper siloing thanks to multi-queue
NICs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
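The central idea of the series, storing SYN_RECV requests in the same hash table as established connections instead of a per-listener table, can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: every name below (`toy_entry`, `toy_ehash_add`, `toy_ehash_lookup`, the hash function, the table size) is hypothetical and is not the kernel's code, and locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection states for the toy model. */
enum toy_state { TOY_SYN_RECV, TOY_ESTABLISHED };

/* The 4-tuple that identifies a flow. */
struct toy_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct toy_entry {
	struct toy_flow flow;
	enum toy_state state;
	struct toy_entry *next;   /* chain within one bucket */
};

#define TOY_EHASH_SIZE 16u    /* power of two, arbitrary for the sketch */

/* One shared table for pending requests and established flows. */
static struct toy_entry *toy_ehash[TOY_EHASH_SIZE];

/* Illustrative mixing function, not the kernel's hash. */
static uint32_t toy_flow_hash(const struct toy_flow *f)
{
	uint32_t h = f->saddr ^ f->daddr ^
		     (((uint32_t)f->sport << 16) | f->dport);

	h ^= h >> 16;
	h *= 0x45d9f3bu;
	h ^= h >> 16;
	return h & (TOY_EHASH_SIZE - 1);
}

/* Link an entry at the head of its bucket, whatever its state. */
static void toy_ehash_add(struct toy_entry *e)
{
	uint32_t b;

	assert(e != NULL);
	b = toy_flow_hash(&e->flow);
	e->next = toy_ehash[b];
	toy_ehash[b] = e;
}

/* A single lookup finds pending requests and established flows alike. */
static struct toy_entry *toy_ehash_lookup(const struct toy_flow *f)
{
	struct toy_entry *e;

	for (e = toy_ehash[toy_flow_hash(f)]; e != NULL; e = e->next) {
		if (e->flow.saddr == f->saddr && e->flow.daddr == f->daddr &&
		    e->flow.sport == f->sport && e->flow.dport == f->dport)
			return e;
	}
	return NULL;
}
```

In this model an incoming packet is matched against one table keyed by the flow, so finding a pending request does not require visiting any per-listener structure.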
Diffstat (limited to 'net/ipv4/inet_hashtables.c')
-rw-r--r-- | net/ipv4/inet_hashtables.c | 14 |
1 files changed, 12 insertions, 2 deletions
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 56742e995dd3..bed8886a4b6c 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -398,14 +398,18 @@ static u32 inet_sk_port_offset(const struct sock *sk)
 					  inet->inet_dport);
 }
 
-void __inet_hash_nolisten(struct sock *sk, struct sock *osk)
+/* insert a socket into ehash, and eventually remove another one
+ * (The another one can be a SYN_RECV or TIMEWAIT
+ */
+int inet_ehash_insert(struct sock *sk, struct sock *osk)
 {
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
 	struct hlist_nulls_head *list;
 	struct inet_ehash_bucket *head;
 	spinlock_t *lock;
+	int ret = 0;
 
-	WARN_ON(!sk_unhashed(sk));
+	WARN_ON_ONCE(!sk_unhashed(sk));
 
 	sk->sk_hash = sk_ehashfn(sk);
 	head = inet_ehash_bucket(hashinfo, sk->sk_hash);
@@ -419,6 +423,12 @@ void __inet_hash_nolisten(struct sock *sk, struct sock *osk)
 		sk_nulls_del_node_init_rcu(osk);
 	}
 	spin_unlock(lock);
+	return ret;
+}
+
+void __inet_hash_nolisten(struct sock *sk, struct sock *osk)
+{
+	inet_ehash_insert(sk, osk);
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
 }
 EXPORT_SYMBOL_GPL(__inet_hash_nolisten);
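
The refactoring in this diff splits the old function in two: an insert helper that links the new socket into its bucket and optionally unlinks an old one, and a thin wrapper that calls the helper and then bumps the in-use counter. A minimal userspace sketch of that shape follows. All `toy_*` names are hypothetical, the per-bucket spinlock and RCU list handling of the kernel code are deliberately omitted, and the singly linked chain stands in for the kernel's nulls list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 8u            /* power of two, arbitrary for the sketch */

struct toy_sock {
	uint32_t hash;            /* precomputed hash of the flow */
	struct toy_sock *next;    /* chain within one bucket */
	bool hashed;              /* true while linked in the table */
};

struct toy_table {
	struct toy_sock *bucket[TOY_BUCKETS];
	int inuse;                /* stand-in for the protocol in-use counter */
};

/* Remove sk from its bucket chain if it is present. */
static void toy_unlink(struct toy_table *t, struct toy_sock *sk)
{
	struct toy_sock **pp = &t->bucket[sk->hash & (TOY_BUCKETS - 1)];

	while (*pp != NULL && *pp != sk)
		pp = &(*pp)->next;
	if (*pp != NULL) {
		*pp = sk->next;
		sk->next = NULL;
		sk->hashed = false;
	}
}

/* Toy analogue of the insert helper: link sk, then drop osk if given. */
static int toy_ehash_insert(struct toy_table *t, struct toy_sock *sk,
			    struct toy_sock *osk)
{
	struct toy_sock **head = &t->bucket[sk->hash & (TOY_BUCKETS - 1)];

	/* The kernel only warns here (WARN_ON_ONCE); this toy aborts. */
	assert(!sk->hashed);

	/* The kernel code holds the bucket lock around these steps. */
	sk->next = *head;
	*head = sk;
	sk->hashed = true;
	if (osk != NULL)
		toy_unlink(t, osk);
	return 0;
}

/* Toy analogue of the wrapper: insert, then account for the socket. */
static void toy_hash_nolisten(struct toy_table *t, struct toy_sock *sk,
			      struct toy_sock *osk)
{
	toy_ehash_insert(t, sk, osk);
	t->inuse++;
}
```

Keeping the accounting in the wrapper leaves the helper free to be called by code that wants only the insert-and-replace step, which appears to be the reason the diff gives the helper its own name and return value.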