svn commit: r222488 - in head/sys: contrib/pf/net netinet
netinet/ipfw netinet6
Kristof Provost
kristof at sigsegv.be
Sat Jun 4 14:30:48 UTC 2011
Hi,
I'm seeing a panic when I start natd, which I suspect is related to this
commit:
panic: Lock pcbinfohash not exclusively locked @
/usr/src/sys/netinet/in_pcb.c:323
Backtrace:
panic() ...
_rw_assert() at _rw_assert+0x18d
in_pcbbind() at in_pcbbind+0xf5
div_bind() at div_bind+0xb9
sobind() at sobind()+0x79
kern_bind() at kern_bind+0xde
bind() at bind()+0x41
syscallenter() at syscallenter+0x1cb
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd
div_bind probably also needs to surround the call to in_pcbbind with
INP_HASHW(UN)LOCK(...)
I'm currently running 222680. I've only now seen the issue, but I've also
just now activated INVARIANTS.
Regards,
Kristof
On 2011-05-30 09:43:55 (+0000), Robert Watson <rwatson at FreeBSD.org> wrote:
> Author: rwatson
> Date: Mon May 30 09:43:55 2011
> New Revision: 222488
> URL: http://svn.freebsd.org/changeset/base/222488
>
> Log:
> Decompose the current single inpcbinfo lock into two locks:
>
> - The existing ipi_lock continues to protect the global inpcb list and
> inpcb counter. This lock is now relegated to a small number of
> allocation and free operations, and occasional operations that walk
> all connections (including, awkwardly, certain UDP multicast receive
> operations -- something to revisit).
>
> - A new ipi_hash_lock protects the two inpcbinfo hash tables for
> looking up connections and bound sockets, manipulated using new
> INP_HASH_*() macros. This lock, combined with inpcb locks, protects
> the 4-tuple address space.
>
> Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
> connection locks, so may be acquired while manipulating a connection on
> which a lock is already held, avoiding the need to acquire the inpcbinfo
> lock preemptively when a binding change might later be required. As a
> result, however, lookup operations necessarily go through a reference
> acquire while holding the lookup lock, later acquiring an inpcb lock --
> if required.
>
> A new function in_pcblookup() looks up connections, and accepts flags
> indicating how to return the inpcb. Due to lock order changes, callers
> no longer need acquire locks before performing a lookup: the lookup
> routine will acquire the ipi_hash_lock as needed. In the future, it will
> also be able to use alternative lookup and locking strategies
> transparently to callers, such as pcbgroup lookup. New lookup flags are,
> supplementing the existing INPLOOKUP_WILDCARD flag:
>
> INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
> INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
>
> Callers must pass exactly one of these flags (for the time being).
>
> Some notes:
>
> - All protocols are updated to work within the new regime; especially,
> TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
> eliminated, and global hash lock hold times are dramatically reduced
> compared to previous locking.
> - The TCP syncache still relies on the pcbinfo lock, something that we
> may want to revisit.
> - Support for reverting to the FreeBSD 7.x locking strategy in TCP input
> is no longer available -- hash lookup locks are now held only very
> briefly during inpcb lookup, rather than for potentially extended
> periods. However, the pcbinfo ipi_lock will still be acquired if a
> connection state might change such that a connection is added or
> removed.
> - Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
> due to maintaining their own hash tables.
> - The interface in6_pcblookup_hash_locked() is maintained, which allows
> callers to acquire hash locks and perform one or more lookups atomically
> with 4-tuple allocation: this is required only for TCPv6, as there is no
> in6_pcbconnect_setup(), which there should be.
> - UDPv6 locking remains significantly more conservative than UDPv4
> locking, which relates to source address selection. This needs
> attention, as it likely significantly reduces parallelism in this code
> for multithreaded socket use (such as in BIND).
> - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
> somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
> is no longer sufficient. A second check once the inpcb lock is held
> should do the trick, keeping the general case from requiring the inpcb
> lock for every inpcb visited.
> - This work reminds us that we need to revisit locking of the v4/v6 flags,
> which may be accessed lock-free both before and after this change.
> - Right now, a single lock name is used for the pcbhash lock -- this is
> undesirable, and probably another argument is required to take care of
> this (or a char array name field in the pcbinfo?).
>
> This is not an MFC candidate for 8.x due to its impact on lookup and
> locking semantics. It's possible some of these issues could be worked
> around with compatibility wrappers, if necessary.
>
> Reviewed by: bz
> Sponsored by: Juniper Networks, Inc.
>
> Modified:
> head/sys/contrib/pf/net/pf.c
> head/sys/netinet/in_pcb.c
> head/sys/netinet/in_pcb.h
> head/sys/netinet/ip_divert.c
> head/sys/netinet/ipfw/ip_fw2.c
> head/sys/netinet/raw_ip.c
> head/sys/netinet/siftr.c
> head/sys/netinet/tcp_input.c
> head/sys/netinet/tcp_subr.c
> head/sys/netinet/tcp_syncache.c
> head/sys/netinet/tcp_timer.c
> head/sys/netinet/tcp_usrreq.c
> head/sys/netinet/udp_usrreq.c
> head/sys/netinet6/in6_pcb.c
> head/sys/netinet6/in6_pcb.h
> head/sys/netinet6/in6_src.c
> head/sys/netinet6/udp6_usrreq.c
>
> Modified: head/sys/contrib/pf/net/pf.c
> ==============================================================================
> --- head/sys/contrib/pf/net/pf.c Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/contrib/pf/net/pf.c Mon May 30 09:43:55 2011 (r222488)
> @@ -3034,16 +3034,14 @@ pf_socket_lookup(int direction, struct p
> #ifdef INET
> case AF_INET:
> #ifdef __FreeBSD__
> - INP_INFO_RLOCK(pi); /* XXX LOR */
> - inp = in_pcblookup_hash(pi, saddr->v4, sport, daddr->v4,
> - dport, 0, NULL);
> + inp = in_pcblookup(pi, saddr->v4, sport, daddr->v4,
> + dport, INPLOOKUP_RLOCKPCB, NULL);
> if (inp == NULL) {
> - inp = in_pcblookup_hash(pi, saddr->v4, sport,
> - daddr->v4, dport, INPLOOKUP_WILDCARD, NULL);
> - if(inp == NULL) {
> - INP_INFO_RUNLOCK(pi);
> + inp = in_pcblookup(pi, saddr->v4, sport,
> + daddr->v4, dport, INPLOOKUP_WILDCARD |
> + INPLOOKUP_RLOCKPCB, NULL);
> + if (inp == NULL)
> return (-1);
> - }
> }
> #else
> inp = in_pcbhashlookup(tb, saddr->v4, sport, daddr->v4, dport);
> @@ -3058,16 +3056,14 @@ pf_socket_lookup(int direction, struct p
> #ifdef INET6
> case AF_INET6:
> #ifdef __FreeBSD__
> - INP_INFO_RLOCK(pi);
> - inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
> - &daddr->v6, dport, 0, NULL);
> + inp = in6_pcblookup(pi, &saddr->v6, sport,
> + &daddr->v6, dport, INPLOOKUP_RLOCKPCB, NULL);
> if (inp == NULL) {
> - inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
> - &daddr->v6, dport, INPLOOKUP_WILDCARD, NULL);
> - if (inp == NULL) {
> - INP_INFO_RUNLOCK(pi);
> + inp = in6_pcblookup(pi, &saddr->v6, sport,
> + &daddr->v6, dport, INPLOOKUP_WILDCARD |
> + INPLOOKUP_RLOCKPCB, NULL);
> + if (inp == NULL)
> return (-1);
> - }
> }
> #else
> inp = in6_pcbhashlookup(tb, &saddr->v6, sport, &daddr->v6,
> @@ -3085,9 +3081,10 @@ pf_socket_lookup(int direction, struct p
> return (-1);
> }
> #ifdef __FreeBSD__
> + INP_RLOCK_ASSERT(inp);
> pd->lookup.uid = inp->inp_cred->cr_uid;
> pd->lookup.gid = inp->inp_cred->cr_groups[0];
> - INP_INFO_RUNLOCK(pi);
> + INP_RUNLOCK(inp);
> #else
> pd->lookup.uid = inp->inp_socket->so_euid;
> pd->lookup.gid = inp->inp_socket->so_egid;
>
> Modified: head/sys/netinet/in_pcb.c
> ==============================================================================
> --- head/sys/netinet/in_pcb.c Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/netinet/in_pcb.c Mon May 30 09:43:55 2011 (r222488)
> @@ -127,6 +127,10 @@ static VNET_DEFINE(int, ipport_tcplastco
>
> #define V_ipport_tcplastcount VNET(ipport_tcplastcount)
>
> +static struct inpcb *in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo,
> + struct in_addr faddr, u_int fport_arg,
> + struct in_addr laddr, u_int lport_arg,
> + int lookupflags, struct ifnet *ifp);
> static void in_pcbremlists(struct inpcb *inp);
>
> #ifdef INET
> @@ -212,11 +216,13 @@ in_pcbinfo_init(struct inpcbinfo *pcbinf
> {
>
> INP_INFO_LOCK_INIT(pcbinfo, name);
> + INP_HASH_LOCK_INIT(pcbinfo, "pcbinfohash"); /* XXXRW: argument? */
> #ifdef VIMAGE
> pcbinfo->ipi_vnet = curvnet;
> #endif
> pcbinfo->ipi_listhead = listhead;
> LIST_INIT(pcbinfo->ipi_listhead);
> + pcbinfo->ipi_count = 0;
> pcbinfo->ipi_hashbase = hashinit(hash_nelements, M_PCB,
> &pcbinfo->ipi_hashmask);
> pcbinfo->ipi_porthashbase = hashinit(porthash_nelements, M_PCB,
> @@ -234,10 +240,14 @@ void
> in_pcbinfo_destroy(struct inpcbinfo *pcbinfo)
> {
>
> + KASSERT(pcbinfo->ipi_count == 0,
> + ("%s: ipi_count = %u", __func__, pcbinfo->ipi_count));
> +
> hashdestroy(pcbinfo->ipi_hashbase, M_PCB, pcbinfo->ipi_hashmask);
> hashdestroy(pcbinfo->ipi_porthashbase, M_PCB,
> pcbinfo->ipi_porthashmask);
> uma_zdestroy(pcbinfo->ipi_zone);
> + INP_HASH_LOCK_DESTROY(pcbinfo);
> INP_INFO_LOCK_DESTROY(pcbinfo);
> }
>
> @@ -309,8 +319,8 @@ in_pcbbind(struct inpcb *inp, struct soc
> {
> int anonport, error;
>
> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
> INP_WLOCK_ASSERT(inp);
> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>
> if (inp->inp_lport != 0 || inp->inp_laddr.s_addr != INADDR_ANY)
> return (EINVAL);
> @@ -351,8 +361,8 @@ in_pcb_lport(struct inpcb *inp, struct i
> * Because no actual state changes occur here, a global write lock on
> * the pcbinfo isn't required.
> */
> - INP_INFO_LOCK_ASSERT(pcbinfo);
> INP_LOCK_ASSERT(inp);
> + INP_HASH_LOCK_ASSERT(pcbinfo);
>
> if (inp->inp_flags & INP_HIGHPORT) {
> first = V_ipport_hifirstauto; /* sysctl */
> @@ -473,11 +483,10 @@ in_pcbbind_setup(struct inpcb *inp, stru
> int error;
>
> /*
> - * Because no actual state changes occur here, a global write lock on
> - * the pcbinfo isn't required.
> + * No state changes, so read locks are sufficient here.
> */
> - INP_INFO_LOCK_ASSERT(pcbinfo);
> INP_LOCK_ASSERT(inp);
> + INP_HASH_LOCK_ASSERT(pcbinfo);
>
> if (TAILQ_EMPTY(&V_in_ifaddrhead)) /* XXX broken! */
> return (EADDRNOTAVAIL);
> @@ -618,8 +627,8 @@ in_pcbconnect(struct inpcb *inp, struct
> in_addr_t laddr, faddr;
> int anonport, error;
>
> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
> INP_WLOCK_ASSERT(inp);
> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>
> lport = inp->inp_lport;
> laddr = inp->inp_laddr.s_addr;
> @@ -907,8 +916,8 @@ in_pcbconnect_setup(struct inpcb *inp, s
> * Because a global state change doesn't actually occur here, a read
> * lock is sufficient.
> */
> - INP_INFO_LOCK_ASSERT(inp->inp_pcbinfo);
> INP_LOCK_ASSERT(inp);
> + INP_HASH_LOCK_ASSERT(inp->inp_pcbinfo);
>
> if (oinpp != NULL)
> *oinpp = NULL;
> @@ -983,8 +992,8 @@ in_pcbconnect_setup(struct inpcb *inp, s
> if (error)
> return (error);
> }
> - oinp = in_pcblookup_hash(inp->inp_pcbinfo, faddr, fport, laddr, lport,
> - 0, NULL);
> + oinp = in_pcblookup_hash_locked(inp->inp_pcbinfo, faddr, fport,
> + laddr, lport, 0, NULL);
> if (oinp != NULL) {
> if (oinpp != NULL)
> *oinpp = oinp;
> @@ -1007,8 +1016,8 @@ void
> in_pcbdisconnect(struct inpcb *inp)
> {
>
> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
> INP_WLOCK_ASSERT(inp);
> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>
> inp->inp_faddr.s_addr = INADDR_ANY;
> inp->inp_fport = 0;
> @@ -1187,19 +1196,24 @@ void
> in_pcbdrop(struct inpcb *inp)
> {
>
> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
> INP_WLOCK_ASSERT(inp);
>
> + /*
> + * XXXRW: Possibly we should protect the setting of INP_DROPPED with
> + * the hash lock...?
> + */
> inp->inp_flags |= INP_DROPPED;
> if (inp->inp_flags & INP_INHASHLIST) {
> struct inpcbport *phd = inp->inp_phd;
>
> + INP_HASH_WLOCK(inp->inp_pcbinfo);
> LIST_REMOVE(inp, inp_hash);
> LIST_REMOVE(inp, inp_portlist);
> if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
> LIST_REMOVE(phd, phd_hash);
> free(phd, M_PCB);
> }
> + INP_HASH_WUNLOCK(inp->inp_pcbinfo);
> inp->inp_flags &= ~INP_INHASHLIST;
> }
> }
> @@ -1328,7 +1342,8 @@ in_pcbpurgeif0(struct inpcbinfo *pcbinfo
> }
>
> /*
> - * Lookup a PCB based on the local address and port.
> + * Lookup a PCB based on the local address and port. Caller must hold the
> + * hash lock. No inpcb locks or references are acquired.
> */
> #define INP_LOOKUP_MAPPED_PCB_COST 3
> struct inpcb *
> @@ -1346,7 +1361,7 @@ in_pcblookup_local(struct inpcbinfo *pcb
> KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0,
> ("%s: invalid lookup flags %d", __func__, lookupflags));
>
> - INP_INFO_LOCK_ASSERT(pcbinfo);
> + INP_HASH_LOCK_ASSERT(pcbinfo);
>
> if ((lookupflags & INPLOOKUP_WILDCARD) == 0) {
> struct inpcbhead *head;
> @@ -1450,10 +1465,12 @@ in_pcblookup_local(struct inpcbinfo *pcb
> #undef INP_LOOKUP_MAPPED_PCB_COST
>
> /*
> - * Lookup PCB in hash list.
> + * Lookup PCB in hash list, using pcbinfo tables. This variation assumes
> + * that the caller has locked the hash list, and will not perform any further
> + * locking or reference operations on either the hash list or the connection.
> */
> -struct inpcb *
> -in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,
> +static struct inpcb *
> +in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, struct in_addr faddr,
> u_int fport_arg, struct in_addr laddr, u_int lport_arg, int lookupflags,
> struct ifnet *ifp)
> {
> @@ -1464,7 +1481,7 @@ in_pcblookup_hash(struct inpcbinfo *pcbi
> KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0,
> ("%s: invalid lookup flags %d", __func__, lookupflags));
>
> - INP_INFO_LOCK_ASSERT(pcbinfo);
> + INP_HASH_LOCK_ASSERT(pcbinfo);
>
> /*
> * First look for an exact match.
> @@ -1574,6 +1591,56 @@ in_pcblookup_hash(struct inpcbinfo *pcbi
>
> return (NULL);
> }
> +
> +/*
> + * Lookup PCB in hash list, using pcbinfo tables. This variation locks the
> + * hash list lock, and will return the inpcb locked (i.e., requires
> + * INPLOOKUP_LOCKPCB).
> + */
> +static struct inpcb *
> +in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,
> + u_int fport, struct in_addr laddr, u_int lport, int lookupflags,
> + struct ifnet *ifp)
> +{
> + struct inpcb *inp;
> +
> + INP_HASH_RLOCK(pcbinfo);
> + inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport,
> + (lookupflags & ~(INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)), ifp);
> + if (inp != NULL) {
> + in_pcbref(inp);
> + INP_HASH_RUNLOCK(pcbinfo);
> + if (lookupflags & INPLOOKUP_WLOCKPCB) {
> + INP_WLOCK(inp);
> + if (in_pcbrele_wlocked(inp))
> + return (NULL);
> + } else if (lookupflags & INPLOOKUP_RLOCKPCB) {
> + INP_RLOCK(inp);
> + if (in_pcbrele_rlocked(inp))
> + return (NULL);
> + } else
> + panic("%s: locking bug", __func__);
> + } else
> + INP_HASH_RUNLOCK(pcbinfo);
> + return (inp);
> +}
> +
> +/*
> + * Public inpcb lookup routines, accepting a 4-tuple.
> + */
> +struct inpcb *
> +in_pcblookup(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport,
> + struct in_addr laddr, u_int lport, int lookupflags, struct ifnet *ifp)
> +{
> +
> + KASSERT((lookupflags & ~INPLOOKUP_MASK) == 0,
> + ("%s: invalid lookup flags %d", __func__, lookupflags));
> + KASSERT((lookupflags & (INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)) != 0,
> + ("%s: LOCKPCB not set", __func__));
> +
> + return (in_pcblookup_hash(pcbinfo, faddr, fport, laddr, lport,
> + lookupflags, ifp));
> +}
> #endif /* INET */
>
> /*
> @@ -1588,8 +1655,9 @@ in_pcbinshash(struct inpcb *inp)
> struct inpcbport *phd;
> u_int32_t hashkey_faddr;
>
> - INP_INFO_WLOCK_ASSERT(pcbinfo);
> INP_WLOCK_ASSERT(inp);
> + INP_HASH_WLOCK_ASSERT(pcbinfo);
> +
> KASSERT((inp->inp_flags & INP_INHASHLIST) == 0,
> ("in_pcbinshash: INP_INHASHLIST"));
>
> @@ -1645,8 +1713,9 @@ in_pcbrehash(struct inpcb *inp)
> struct inpcbhead *head;
> u_int32_t hashkey_faddr;
>
> - INP_INFO_WLOCK_ASSERT(pcbinfo);
> INP_WLOCK_ASSERT(inp);
> + INP_HASH_WLOCK_ASSERT(pcbinfo);
> +
> KASSERT(inp->inp_flags & INP_INHASHLIST,
> ("in_pcbrehash: !INP_INHASHLIST"));
>
> @@ -1679,12 +1748,14 @@ in_pcbremlists(struct inpcb *inp)
> if (inp->inp_flags & INP_INHASHLIST) {
> struct inpcbport *phd = inp->inp_phd;
>
> + INP_HASH_WLOCK(pcbinfo);
> LIST_REMOVE(inp, inp_hash);
> LIST_REMOVE(inp, inp_portlist);
> if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
> LIST_REMOVE(phd, phd_hash);
> free(phd, M_PCB);
> }
> + INP_HASH_WUNLOCK(pcbinfo);
> inp->inp_flags &= ~INP_INHASHLIST;
> }
> LIST_REMOVE(inp, inp_list);
>
> Modified: head/sys/netinet/in_pcb.h
> ==============================================================================
> --- head/sys/netinet/in_pcb.h Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/netinet/in_pcb.h Mon May 30 09:43:55 2011 (r222488)
> @@ -268,22 +268,22 @@ struct inpcbport {
> * Global data structure for each high-level protocol (UDP, TCP, ...) in both
> * IPv4 and IPv6. Holds inpcb lists and information for managing them.
> *
> - * Each pcbinfo is protected by ipi_lock, covering mutable global fields (such
> - * as the global pcb list) and hashed lookup tables. The lock order is:
> + * Each pcbinfo is protected by two locks: ipi_lock and ipi_hash_lock,
> + * the former covering mutable global fields (such as the global pcb list),
> + * and the latter covering the hashed lookup tables. The lock order is:
> *
> - * ipi_lock (before) inpcb locks
> + * ipi_lock (before) inpcb locks (before) ipi_hash_lock
> *
> * Locking key:
> *
> * (c) Constant or nearly constant after initialisation
> * (g) Locked by ipi_lock
> - * (h) Read using either ipi_lock or inpcb lock; write requires both.
> + * (h) Read using either ipi_hash_lock or inpcb lock; write requires both.
> * (x) Synchronisation properties poorly defined
> */
> struct inpcbinfo {
> /*
> - * Global lock protecting global inpcb list, inpcb count, hash tables,
> - * etc.
> + * Global lock protecting global inpcb list, inpcb count, etc.
> */
> struct rwlock ipi_lock;
>
> @@ -312,17 +312,22 @@ struct inpcbinfo {
> struct uma_zone *ipi_zone; /* (c) */
>
> /*
> + * Global lock protecting hash lookup tables.
> + */
> + struct rwlock ipi_hash_lock;
> +
> + /*
> * Global hash of inpcbs, hashed by local and foreign addresses and
> * port numbers.
> */
> - struct inpcbhead *ipi_hashbase; /* (g) */
> - u_long ipi_hashmask; /* (g) */
> + struct inpcbhead *ipi_hashbase; /* (h) */
> + u_long ipi_hashmask; /* (h) */
>
> /*
> * Global hash of inpcbs, hashed by only local port number.
> */
> - struct inpcbporthead *ipi_porthashbase; /* (g) */
> - u_long ipi_porthashmask; /* (g) */
> + struct inpcbporthead *ipi_porthashbase; /* (h) */
> + u_long ipi_porthashmask; /* (h) */
>
> /*
> * Pointer to network stack instance
> @@ -406,6 +411,18 @@ void inp_4tuple_get(struct inpcb *inp,
> #define INP_INFO_WLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_WLOCKED)
> #define INP_INFO_UNLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_UNLOCKED)
>
> +#define INP_HASH_LOCK_INIT(ipi, d) \
> + rw_init_flags(&(ipi)->ipi_hash_lock, (d), 0)
> +#define INP_HASH_LOCK_DESTROY(ipi) rw_destroy(&(ipi)->ipi_hash_lock)
> +#define INP_HASH_RLOCK(ipi) rw_rlock(&(ipi)->ipi_hash_lock)
> +#define INP_HASH_WLOCK(ipi) rw_wlock(&(ipi)->ipi_hash_lock)
> +#define INP_HASH_RUNLOCK(ipi) rw_runlock(&(ipi)->ipi_hash_lock)
> +#define INP_HASH_WUNLOCK(ipi) rw_wunlock(&(ipi)->ipi_hash_lock)
> +#define INP_HASH_LOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_hash_lock, \
> + RA_LOCKED)
> +#define INP_HASH_WLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_hash_lock, \
> + RA_WLOCKED)
> +
> #define INP_PCBHASH(faddr, lport, fport, mask) \
> (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask))
> #define INP_PCBPORTHASH(lport, mask) \
> @@ -466,7 +483,16 @@ void inp_4tuple_get(struct inpcb *inp,
> #define INP_LLE_VALID 0x00000001 /* cached lle is valid */
> #define INP_RT_VALID 0x00000002 /* cached rtentry is valid */
>
> -#define INPLOOKUP_WILDCARD 1
> +/*
> + * Flags passed to in_pcblookup*() functions.
> + */
> +#define INPLOOKUP_WILDCARD 0x00000001 /* Allow wildcard sockets. */
> +#define INPLOOKUP_RLOCKPCB 0x00000002 /* Return inpcb read-locked. */
> +#define INPLOOKUP_WLOCKPCB 0x00000004 /* Return inpcb write-locked. */
> +
> +#define INPLOOKUP_MASK (INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB | \
> + INPLOOKUP_WLOCKPCB)
> +
> #define sotoinpcb(so) ((struct inpcb *)(so)->so_pcb)
> #define sotoin6pcb(so) sotoinpcb(so) /* for KAME src sync over BSD*'s */
>
> @@ -527,7 +553,7 @@ struct inpcb *
> in_pcblookup_local(struct inpcbinfo *,
> struct in_addr, u_short, int, struct ucred *);
> struct inpcb *
> - in_pcblookup_hash(struct inpcbinfo *, struct in_addr, u_int,
> + in_pcblookup(struct inpcbinfo *, struct in_addr, u_int,
> struct in_addr, u_int, int, struct ifnet *);
> void in_pcbnotifyall(struct inpcbinfo *pcbinfo, struct in_addr,
> int, struct inpcb *(*)(struct inpcb *, int));
>
> Modified: head/sys/netinet/ip_divert.c
> ==============================================================================
> --- head/sys/netinet/ip_divert.c Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/netinet/ip_divert.c Mon May 30 09:43:55 2011 (r222488)
> @@ -659,9 +659,9 @@ div_pcblist(SYSCTL_HANDLER_ARGS)
> INP_INFO_WLOCK(&V_divcbinfo);
> for (i = 0; i < n; i++) {
> inp = inp_list[i];
> - INP_WLOCK(inp);
> - if (!in_pcbrele(inp))
> - INP_WUNLOCK(inp);
> + INP_RLOCK(inp);
> + if (!in_pcbrele_rlocked(inp))
> + INP_RUNLOCK(inp);
> }
> INP_INFO_WUNLOCK(&V_divcbinfo);
>
>
> Modified: head/sys/netinet/ipfw/ip_fw2.c
> ==============================================================================
> --- head/sys/netinet/ipfw/ip_fw2.c Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/netinet/ipfw/ip_fw2.c Mon May 30 09:43:55 2011 (r222488)
> @@ -657,7 +657,7 @@ check_uidgid(ipfw_insn_u32 *insn, int pr
> (struct bsd_ucred *)uc, ugid_lookupp, ((struct mbuf *)inp)->m_skb);
> #else /* FreeBSD */
> struct inpcbinfo *pi;
> - int wildcard;
> + int lookupflags;
> struct inpcb *pcb;
> int match;
>
> @@ -682,30 +682,31 @@ check_uidgid(ipfw_insn_u32 *insn, int pr
> if (*ugid_lookupp == -1)
> return (0);
> if (proto == IPPROTO_TCP) {
> - wildcard = 0;
> + lookupflags = 0;
> pi = &V_tcbinfo;
> } else if (proto == IPPROTO_UDP) {
> - wildcard = INPLOOKUP_WILDCARD;
> + lookupflags = INPLOOKUP_WILDCARD;
> pi = &V_udbinfo;
> } else
> return 0;
> + lookupflags |= INPLOOKUP_RLOCKPCB;
> match = 0;
> if (*ugid_lookupp == 0) {
> - INP_INFO_RLOCK(pi);
> pcb = (oif) ?
> - in_pcblookup_hash(pi,
> + in_pcblookup(pi,
> dst_ip, htons(dst_port),
> src_ip, htons(src_port),
> - wildcard, oif) :
> - in_pcblookup_hash(pi,
> + lookupflags, oif) :
> + in_pcblookup(pi,
> src_ip, htons(src_port),
> dst_ip, htons(dst_port),
> - wildcard, NULL);
> + lookupflags, NULL);
> if (pcb != NULL) {
> + INP_RLOCK_ASSERT(pcb);
> *uc = crhold(pcb->inp_cred);
> *ugid_lookupp = 1;
> + INP_RUNLOCK(pcb);
> }
> - INP_INFO_RUNLOCK(pi);
> if (*ugid_lookupp == 0) {
> /*
> * We tried and failed, set the variable to -1
> @@ -1827,21 +1828,32 @@ do { \
> else
> break;
>
> + /*
> + * XXXRW: so_user_cookie should almost
> + * certainly be inp_user_cookie?
> + */
> +
> /* For incomming packet, lookup up the
> inpcb using the src/dest ip/port tuple */
> if (inp == NULL) {
> - INP_INFO_RLOCK(pi);
> - inp = in_pcblookup_hash(pi,
> + inp = in_pcblookup(pi,
> src_ip, htons(src_port),
> dst_ip, htons(dst_port),
> - 0, NULL);
> - INP_INFO_RUNLOCK(pi);
> - }
> -
> - if (inp && inp->inp_socket) {
> - tablearg = inp->inp_socket->so_user_cookie;
> - if (tablearg)
> - match = 1;
> + INPLOOKUP_RLOCKPCB, NULL);
> + if (inp != NULL) {
> + tablearg =
> + inp->inp_socket->so_user_cookie;
> + if (tablearg)
> + match = 1;
> + INP_RUNLOCK(inp);
> + }
> + } else {
> + if (inp->inp_socket) {
> + tablearg =
> + inp->inp_socket->so_user_cookie;
> + if (tablearg)
> + match = 1;
> + }
> }
> break;
> }
>
> Modified: head/sys/netinet/raw_ip.c
> ==============================================================================
> --- head/sys/netinet/raw_ip.c Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/netinet/raw_ip.c Mon May 30 09:43:55 2011 (r222488)
> @@ -226,7 +226,7 @@ rip_append(struct inpcb *last, struct ip
> {
> int policyfail = 0;
>
> - INP_RLOCK_ASSERT(last);
> + INP_LOCK_ASSERT(last);
>
> #ifdef IPSEC
> /* check AH/ESP integrity. */
> @@ -834,16 +834,19 @@ rip_detach(struct socket *so)
> static void
> rip_dodisconnect(struct socket *so, struct inpcb *inp)
> {
> + struct inpcbinfo *pcbinfo;
>
> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
> - INP_WLOCK_ASSERT(inp);
> -
> + pcbinfo = inp->inp_pcbinfo;
> + INP_INFO_WLOCK(pcbinfo);
> + INP_WLOCK(inp);
> rip_delhash(inp);
> inp->inp_faddr.s_addr = INADDR_ANY;
> rip_inshash(inp);
> SOCK_LOCK(so);
> so->so_state &= ~SS_ISCONNECTED;
> SOCK_UNLOCK(so);
> + INP_WUNLOCK(inp);
> + INP_INFO_WUNLOCK(pcbinfo);
> }
>
> static void
> @@ -854,11 +857,7 @@ rip_abort(struct socket *so)
> inp = sotoinpcb(so);
> KASSERT(inp != NULL, ("rip_abort: inp == NULL"));
>
> - INP_INFO_WLOCK(&V_ripcbinfo);
> - INP_WLOCK(inp);
> rip_dodisconnect(so, inp);
> - INP_WUNLOCK(inp);
> - INP_INFO_WUNLOCK(&V_ripcbinfo);
> }
>
> static void
> @@ -869,11 +868,7 @@ rip_close(struct socket *so)
> inp = sotoinpcb(so);
> KASSERT(inp != NULL, ("rip_close: inp == NULL"));
>
> - INP_INFO_WLOCK(&V_ripcbinfo);
> - INP_WLOCK(inp);
> rip_dodisconnect(so, inp);
> - INP_WUNLOCK(inp);
> - INP_INFO_WUNLOCK(&V_ripcbinfo);
> }
>
> static int
> @@ -887,11 +882,7 @@ rip_disconnect(struct socket *so)
> inp = sotoinpcb(so);
> KASSERT(inp != NULL, ("rip_disconnect: inp == NULL"));
>
> - INP_INFO_WLOCK(&V_ripcbinfo);
> - INP_WLOCK(inp);
> rip_dodisconnect(so, inp);
> - INP_WUNLOCK(inp);
> - INP_INFO_WUNLOCK(&V_ripcbinfo);
> return (0);
> }
>
> @@ -1077,9 +1068,9 @@ rip_pcblist(SYSCTL_HANDLER_ARGS)
> INP_INFO_WLOCK(&V_ripcbinfo);
> for (i = 0; i < n; i++) {
> inp = inp_list[i];
> - INP_WLOCK(inp);
> - if (!in_pcbrele(inp))
> - INP_WUNLOCK(inp);
> + INP_RLOCK(inp);
> + if (!in_pcbrele_rlocked(inp))
> + INP_RUNLOCK(inp);
> }
> INP_INFO_WUNLOCK(&V_ripcbinfo);
>
>
> Modified: head/sys/netinet/siftr.c
> ==============================================================================
> --- head/sys/netinet/siftr.c Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/netinet/siftr.c Mon May 30 09:43:55 2011 (r222488)
> @@ -696,17 +696,16 @@ siftr_findinpcb(int ipver, struct ip *ip
>
> /* We need the tcbinfo lock. */
> INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> - INP_INFO_RLOCK(&V_tcbinfo);
>
> if (dir == PFIL_IN)
> inp = (ipver == INP_IPV4 ?
> - in_pcblookup_hash(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst,
> - dport, 0, m->m_pkthdr.rcvif)
> + in_pcblookup(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst,
> + dport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif)
> :
> #ifdef SIFTR_IPV6
> - in6_pcblookup_hash(&V_tcbinfo,
> + in6_pcblookup(&V_tcbinfo,
> &((struct ip6_hdr *)ip)->ip6_src, sport,
> - &((struct ip6_hdr *)ip)->ip6_dst, dport, 0,
> + &((struct ip6_hdr *)ip)->ip6_dst, dport, INPLOOKUP_RLOCKPCB,
> m->m_pkthdr.rcvif)
> #else
> NULL
> @@ -715,13 +714,13 @@ siftr_findinpcb(int ipver, struct ip *ip
>
> else
> inp = (ipver == INP_IPV4 ?
> - in_pcblookup_hash(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src,
> - sport, 0, m->m_pkthdr.rcvif)
> + in_pcblookup(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src,
> + sport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif)
> :
> #ifdef SIFTR_IPV6
> - in6_pcblookup_hash(&V_tcbinfo,
> + in6_pcblookup(&V_tcbinfo,
> &((struct ip6_hdr *)ip)->ip6_dst, dport,
> - &((struct ip6_hdr *)ip)->ip6_src, sport, 0,
> + &((struct ip6_hdr *)ip)->ip6_src, sport, INPLOOKUP_RLOCKPCB,
> m->m_pkthdr.rcvif)
> #else
> NULL
> @@ -734,12 +733,7 @@ siftr_findinpcb(int ipver, struct ip *ip
> ss->nskip_in_inpcb++;
> else
> ss->nskip_out_inpcb++;
> - } else {
> - /* Acquire the inpcb lock. */
> - INP_UNLOCK_ASSERT(inp);
> - INP_RLOCK(inp);
> }
> - INP_INFO_RUNLOCK(&V_tcbinfo);
>
> return (inp);
> }
>
> Modified: head/sys/netinet/tcp_input.c
> ==============================================================================
> --- head/sys/netinet/tcp_input.c Mon May 30 09:41:38 2011 (r222487)
> +++ head/sys/netinet/tcp_input.c Mon May 30 09:43:55 2011 (r222488)
> @@ -5,6 +5,7 @@
> * Swinburne University of Technology, Melbourne, Australia.
> * Copyright (c) 2009-2010 Lawrence Stewart <lstewart at freebsd.org>
> * Copyright (c) 2010 The FreeBSD Foundation
> + * Copyright (c) 2010-2011 Juniper Networks, Inc.
> * All rights reserved.
> *
> * Portions of this software were developed at the Centre for Advanced Internet
> @@ -16,6 +17,9 @@
> * Internet Architectures, Swinburne University of Technology, Melbourne,
> * Australia by David Hayes under sponsorship from the FreeBSD Foundation.
> *
> + * Portions of this software were developed by Robert N. M. Watson under
> + * contract to Juniper Networks, Inc.
> + *
> * Redistribution and use in source and binary forms, with or without
> * modification, are permitted provided that the following conditions
> * are met:
> @@ -197,10 +201,6 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
> &VNET_NAME(tcp_autorcvbuf_max), 0,
> "Max size of automatic receive buffer");
>
> -int tcp_read_locking = 1;
> -SYSCTL_INT(_net_inet_tcp, OID_AUTO, read_locking, CTLFLAG_RW,
> - &tcp_read_locking, 0, "Enable read locking strategy");
> -
> VNET_DEFINE(struct inpcbhead, tcb);
> #define tcb6 tcb /* for KAME src sync over BSD*'s */
> VNET_DEFINE(struct inpcbinfo, tcbinfo);
> @@ -591,8 +591,7 @@ tcp_input(struct mbuf *m, int off0)
> char *s = NULL; /* address and port logging */
> int ti_locked;
> #define TI_UNLOCKED 1
> -#define TI_RLOCKED 2
> -#define TI_WLOCKED 3
> +#define TI_WLOCKED 2
>
> #ifdef TCPDEBUG
> /*
> @@ -756,30 +755,25 @@ tcp_input(struct mbuf *m, int off0)
> drop_hdrlen = off0 + off;
>
> /*
> - * Locate pcb for segment, which requires a lock on tcbinfo.
> - * Optimisticaly acquire a global read lock rather than a write lock
> - * unless header flags necessarily imply a state change. There are
> - * two cases where we might discover later we need a write lock
> - * despite the flags: ACKs moving a connection out of the syncache,
> - * and ACKs for a connection in TIMEWAIT.
> + * Locate pcb for segment; if we're likely to add or remove a
> + * connection then first acquire pcbinfo lock. There are two cases
> + * where we might discover later we need a write lock despite the
> + * flags: ACKs moving a connection out of the syncache, and ACKs for
> + * a connection in TIMEWAIT.
> */
> - if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 ||
> - tcp_read_locking == 0) {
> + if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0) {
> INP_INFO_WLOCK(&V_tcbinfo);
> ti_locked = TI_WLOCKED;
> - } else {
> - INP_INFO_RLOCK(&V_tcbinfo);
> - ti_locked = TI_RLOCKED;
> - }
> + } else
> + ti_locked = TI_UNLOCKED;
>
> findpcb:
> #ifdef INVARIANTS
> - if (ti_locked == TI_RLOCKED)
> - INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
> - else if (ti_locked == TI_WLOCKED)
> + if (ti_locked == TI_WLOCKED) {
> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
> - else
> - panic("%s: findpcb ti_locked %d\n", __func__, ti_locked);
> + } else {
> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> + }
> #endif
>
> #ifdef INET
> @@ -797,20 +791,18 @@ findpcb:
> * Transparently forwarded. Pretend to be the destination.
> * already got one like this?
> */
> - inp = in_pcblookup_hash(&V_tcbinfo,
> - ip->ip_src, th->th_sport,
> - ip->ip_dst, th->th_dport,
> - 0, m->m_pkthdr.rcvif);
> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src, th->th_sport,
> + ip->ip_dst, th->th_dport, INPLOOKUP_WLOCKPCB,
> + m->m_pkthdr.rcvif);
> if (!inp) {
> - /* It's new. Try to find the ambushing socket. */
> - inp = in_pcblookup_hash(&V_tcbinfo,
> - ip->ip_src, th->th_sport,
> - next_hop->sin_addr,
> - next_hop->sin_port ?
> - ntohs(next_hop->sin_port) :
> - th->th_dport,
> - INPLOOKUP_WILDCARD,
> - m->m_pkthdr.rcvif);
> + /*
> + * It's new. Try to find the ambushing socket.
> + */
> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src,
> + th->th_sport, next_hop->sin_addr,
> + next_hop->sin_port ? ntohs(next_hop->sin_port) :
> + th->th_dport, INPLOOKUP_WILDCARD |
> + INPLOOKUP_WLOCKPCB, m->m_pkthdr.rcvif);
> }
> /* Remove the tag from the packet. We don't need it anymore. */
> m_tag_delete(m, fwd_tag);
> @@ -820,21 +812,19 @@ findpcb:
> {
> #ifdef INET6
> if (isipv6)
> - inp = in6_pcblookup_hash(&V_tcbinfo,
> - &ip6->ip6_src, th->th_sport,
> - &ip6->ip6_dst, th->th_dport,
> - INPLOOKUP_WILDCARD,
> - m->m_pkthdr.rcvif);
> + inp = in6_pcblookup(&V_tcbinfo, &ip6->ip6_src,
> + th->th_sport, &ip6->ip6_dst, th->th_dport,
> + INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB,
> + m->m_pkthdr.rcvif);
> #endif
> #if defined(INET) && defined(INET6)
> else
> #endif
> #ifdef INET
> - inp = in_pcblookup_hash(&V_tcbinfo,
> - ip->ip_src, th->th_sport,
> - ip->ip_dst, th->th_dport,
> - INPLOOKUP_WILDCARD,
> - m->m_pkthdr.rcvif);
> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src,
> + th->th_sport, ip->ip_dst, th->th_dport,
> + INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB,
> + m->m_pkthdr.rcvif);
> #endif
> }
>
> @@ -865,7 +855,7 @@ findpcb:
> rstreason = BANDLIM_RST_CLOSEDPORT;
> goto dropwithreset;
> }
> - INP_WLOCK(inp);
> + INP_WLOCK_ASSERT(inp);
> if (!(inp->inp_flags & INP_HW_FLOWID)
> && (m->m_flags & M_FLOWID)
> && ((inp->inp_socket == NULL)
> @@ -906,28 +896,26 @@ findpcb:
> * legitimate new connection attempt the old INPCB gets removed and
> * we can try again to find a listening socket.
> *
> - * At this point, due to earlier optimism, we may hold a read lock on
> - * the inpcbinfo, rather than a write lock. If so, we need to
> - * upgrade, or if that fails, acquire a reference on the inpcb, drop
> - * all locks, acquire a global write lock, and then re-acquire the
> - * inpcb lock. We may at that point discover that another thread has
> - * tried to free the inpcb, in which case we need to loop back and
> - * try to find a new inpcb to deliver to.
> + * At this point, due to earlier optimism, we may hold only an inpcb
> + * lock, and not the inpcbinfo write lock. If so, we need to try to
> + * acquire it, or if that fails, acquire a reference on the inpcb,
> + * drop all locks, acquire a global write lock, and then re-acquire
> + * the inpcb lock. We may at that point discover that another thread
> + * has tried to free the inpcb, in which case we need to loop back
> + * and try to find a new inpcb to deliver to.
> + *
> + * XXXRW: It may be time to rethink timewait locking.
> */
> relocked:
> if (inp->inp_flags & INP_TIMEWAIT) {
> - KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED,
> - ("%s: INP_TIMEWAIT ti_locked %d", __func__, ti_locked));
> -
> - if (ti_locked == TI_RLOCKED) {
> - if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) {
> + if (ti_locked == TI_UNLOCKED) {
> + if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
> in_pcbref(inp);
> INP_WUNLOCK(inp);
> - INP_INFO_RUNLOCK(&V_tcbinfo);
> INP_INFO_WLOCK(&V_tcbinfo);
> ti_locked = TI_WLOCKED;
> INP_WLOCK(inp);
> - if (in_pcbrele(inp)) {
> + if (in_pcbrele_wlocked(inp)) {
> inp = NULL;
> goto findpcb;
> }
> @@ -975,26 +963,24 @@ relocked:
>
> /*
> * We've identified a valid inpcb, but it could be that we need an
> - * inpcbinfo write lock and have only a read lock. In this case,
> - * attempt to upgrade/relock using the same strategy as the TIMEWAIT
> - * case above. If we relock, we have to jump back to 'relocked' as
> - * the connection might now be in TIMEWAIT.
> - */
> - if (tp->t_state != TCPS_ESTABLISHED ||
> - (thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 ||
> - tcp_read_locking == 0) {
> - KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED,
> - ("%s: upgrade check ti_locked %d", __func__, ti_locked));
> -
> - if (ti_locked == TI_RLOCKED) {
> - if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) {
> + * inpcbinfo write lock but don't hold it. In this case, attempt to
> + * acquire using the same strategy as the TIMEWAIT case above. If we
> + * relock, we have to jump back to 'relocked' as the connection might
> + * now be in TIMEWAIT.
> + */
> +#ifdef INVARIANTS
> + if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0)
> + INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
> +#endif
> + if (tp->t_state != TCPS_ESTABLISHED) {
> + if (ti_locked == TI_UNLOCKED) {
> + if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
> in_pcbref(inp);
> INP_WUNLOCK(inp);
> - INP_INFO_RUNLOCK(&V_tcbinfo);
> INP_INFO_WLOCK(&V_tcbinfo);
> ti_locked = TI_WLOCKED;
> INP_WLOCK(inp);
> - if (in_pcbrele(inp)) {
> + if (in_pcbrele_wlocked(inp)) {
> inp = NULL;
> goto findpcb;
> }
> @@ -1027,13 +1013,16 @@ relocked:
> /*
> * When the socket is accepting connections (the INPCB is in LISTEN
> * state) we look into the SYN cache if this is a new connection
> - * attempt or the completion of a previous one.
> + * attempt or the completion of a previous one. Because listen
> + * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock will be
> + * held in this case.
> */
> if (so->so_options & SO_ACCEPTCONN) {
> struct in_conninfo inc;
>
> KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting but "
> "tp not listening", __func__));
> + INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>
> bzero(&inc, sizeof(inc));
> #ifdef INET6
> @@ -1371,13 +1360,17 @@ relocked:
> return;
>
> dropwithreset:
> - if (ti_locked == TI_RLOCKED)
> - INP_INFO_RUNLOCK(&V_tcbinfo);
> - else if (ti_locked == TI_WLOCKED)
> + if (ti_locked == TI_WLOCKED) {
> INP_INFO_WUNLOCK(&V_tcbinfo);
> - else
> - panic("%s: dropwithreset ti_locked %d", __func__, ti_locked);
> - ti_locked = TI_UNLOCKED;
> + ti_locked = TI_UNLOCKED;
> + }
> +#ifdef INVARIANTS
> + else {
> + KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropwithreset "
> + "ti_locked: %d", __func__, ti_locked));
> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> + }
> +#endif
>
> if (inp != NULL) {
> tcp_dropwithreset(m, th, tp, tlen, rstreason);
> @@ -1388,13 +1381,17 @@ dropwithreset:
> goto drop;
>
> dropunlock:
> - if (ti_locked == TI_RLOCKED)
> - INP_INFO_RUNLOCK(&V_tcbinfo);
> - else if (ti_locked == TI_WLOCKED)
> + if (ti_locked == TI_WLOCKED) {
> INP_INFO_WUNLOCK(&V_tcbinfo);
> - else
> - panic("%s: dropunlock ti_locked %d", __func__, ti_locked);
> - ti_locked = TI_UNLOCKED;
> + ti_locked = TI_UNLOCKED;
> + }
> +#ifdef INVARIANTS
> + else {
> + KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropunlock "
> + "ti_locked: %d", __func__, ti_locked));
> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> + }
> +#endif
>
> if (inp != NULL)
> INP_WUNLOCK(inp);
> @@ -1449,13 +1446,13 @@ tcp_do_segment(struct mbuf *m, struct tc
> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
> } else {
> #ifdef INVARIANTS
> - if (ti_locked == TI_RLOCKED)
> - INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
>
> *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
> _______________________________________________
> svn-src-head at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/svn-src-head
> To unsubscribe, send any mail to "svn-src-head-unsubscribe at freebsd.org"
>
More information about the svn-src-all
mailing list