svn commit: r222488 - in head/sys: contrib/pf/net netinet
netinet/ipfw netinet6
Robert Watson
rwatson at FreeBSD.org
Sat Jun 4 16:27:57 UTC 2011
On Sat, 4 Jun 2011, Kristof Provost wrote:
> I'm seeing a panic when I start natd, which I suspect is related to this
> commit:
And, I believe now fixed in the just-committed r222690. My suggestion is that
people running -CURRENT should generally keep INVARIANTS and WITNESS on --
they introduce some overhead, but are invaluable in tracking down problems.
Thanks!
Robert
>
> panic: Lock pcbinfohash not exclusively locked @
> /usr/src/sys/netinet/in_pcb.c:323
>
> Backtrace:
> panic() ...
> _rw_assert() at _rw_assert+0x18d
> in_pcbbind() at in_pcbbind+0xf5
> div_bind() at div_bind+0xb9
> sobind() at sobind()+0x79
> kern_bind() at kern_bind+0xde
> bind() at bind()+0x41
> syscallenter() at syscallenter+0x1cb
> syscall() at syscall+0x4c
> Xfast_syscall() at Xfast_syscall+0xdd
>
> div_bind probably also needs to surround the call to in_pcbbind with
> INP_HASHW(UN)LOCK(...)
>
> I'm currently running 222680. I've only now seen the issue, but I've also
> just now activated INVARIANTS.
>
> Regards,
> Kristof
>
> On 2011-05-30 09:43:55 (+0000), Robert Watson <rwatson at FreeBSD.org> wrote:
>> Author: rwatson
>> Date: Mon May 30 09:43:55 2011
>> New Revision: 222488
>> URL: http://svn.freebsd.org/changeset/base/222488
>>
>> Log:
>> Decompose the current single inpcbinfo lock into two locks:
>>
>> - The existing ipi_lock continues to protect the global inpcb list and
>> inpcb counter. This lock is now relegated to a small number of
>> allocation and free operations, and occasional operations that walk
>> all connections (including, awkwardly, certain UDP multicast receive
>> operations -- something to revisit).
>>
>> - A new ipi_hash_lock protects the two inpcbinfo hash tables for
>> looking up connections and bound sockets, manipulated using new
>> INP_HASH_*() macros. This lock, combined with inpcb locks, protects
>> the 4-tuple address space.
>>
>> Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
>> connection locks, so may be acquired while manipulating a connection on
>> which a lock is already held, avoiding the need to acquire the inpcbinfo
>> lock preemptively when a binding change might later be required. As a
>> result, however, lookup operations necessarily go through a reference
>> acquire while holding the lookup lock, later acquiring an inpcb lock --
>> if required.
>>
>> A new function in_pcblookup() looks up connections, and accepts flags
>> indicating how to return the inpcb. Due to lock order changes, callers
>> no longer need acquire locks before performing a lookup: the lookup
>> routine will acquire the ipi_hash_lock as needed. In the future, it will
>> also be able to use alternative lookup and locking strategies
>> transparently to callers, such as pcbgroup lookup. New lookup flags are,
>> supplementing the existing INPLOOKUP_WILDCARD flag:
>>
>> INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
>> INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
>>
>> Callers must pass exactly one of these flags (for the time being).
>>
>> Some notes:
>>
>> - All protocols are updated to work within the new regime; especially,
>> TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
>> eliminated, and global hash lock hold times are dramatically reduced
>> compared to previous locking.
>> - The TCP syncache still relies on the pcbinfo lock, something that we
>> may want to revisit.
>> - Support for reverting to the FreeBSD 7.x locking strategy in TCP input
>> is no longer available -- hash lookup locks are now held only very
>> briefly during inpcb lookup, rather than for potentially extended
>> periods. However, the pcbinfo ipi_lock will still be acquired if a
>> connection state might change such that a connection is added or
>> removed.
>> - Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
>> due to maintaining their own hash tables.
>> - The interface in6_pcblookup_hash_locked() is maintained, which allows
>> callers to acquire hash locks and perform one or more lookups atomically
>> with 4-tuple allocation: this is required only for TCPv6, as there is no
>> in6_pcbconnect_setup(), which there should be.
>> - UDPv6 locking remains significantly more conservative than UDPv4
>> locking, which relates to source address selection. This needs
>> attention, as it likely significantly reduces parallelism in this code
>> for multithreaded socket use (such as in BIND).
>> - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
>> somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
>> is no longer sufficient. A second check once the inpcb lock is held
>> should do the trick, keeping the general case from requiring the inpcb
>> lock for every inpcb visited.
>> - This work reminds us that we need to revisit locking of the v4/v6 flags,
>> which may be accessed lock-free both before and after this change.
>> - Right now, a single lock name is used for the pcbhash lock -- this is
>> undesirable, and probably another argument is required to take care of
>> this (or a char array name field in the pcbinfo?).
>>
>> This is not an MFC candidate for 8.x due to its impact on lookup and
>> locking semantics. It's possible some of these issues could be worked
>> around with compatibility wrappers, if necessary.
>>
>> Reviewed by: bz
>> Sponsored by: Juniper Networks, Inc.
>>
>> Modified:
>> head/sys/contrib/pf/net/pf.c
>> head/sys/netinet/in_pcb.c
>> head/sys/netinet/in_pcb.h
>> head/sys/netinet/ip_divert.c
>> head/sys/netinet/ipfw/ip_fw2.c
>> head/sys/netinet/raw_ip.c
>> head/sys/netinet/siftr.c
>> head/sys/netinet/tcp_input.c
>> head/sys/netinet/tcp_subr.c
>> head/sys/netinet/tcp_syncache.c
>> head/sys/netinet/tcp_timer.c
>> head/sys/netinet/tcp_usrreq.c
>> head/sys/netinet/udp_usrreq.c
>> head/sys/netinet6/in6_pcb.c
>> head/sys/netinet6/in6_pcb.h
>> head/sys/netinet6/in6_src.c
>> head/sys/netinet6/udp6_usrreq.c
>>
>> Modified: head/sys/contrib/pf/net/pf.c
>> ==============================================================================
>> --- head/sys/contrib/pf/net/pf.c Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/contrib/pf/net/pf.c Mon May 30 09:43:55 2011 (r222488)
>> @@ -3034,16 +3034,14 @@ pf_socket_lookup(int direction, struct p
>> #ifdef INET
>> case AF_INET:
>> #ifdef __FreeBSD__
>> - INP_INFO_RLOCK(pi); /* XXX LOR */
>> - inp = in_pcblookup_hash(pi, saddr->v4, sport, daddr->v4,
>> - dport, 0, NULL);
>> + inp = in_pcblookup(pi, saddr->v4, sport, daddr->v4,
>> + dport, INPLOOKUP_RLOCKPCB, NULL);
>> if (inp == NULL) {
>> - inp = in_pcblookup_hash(pi, saddr->v4, sport,
>> - daddr->v4, dport, INPLOOKUP_WILDCARD, NULL);
>> - if(inp == NULL) {
>> - INP_INFO_RUNLOCK(pi);
>> + inp = in_pcblookup(pi, saddr->v4, sport,
>> + daddr->v4, dport, INPLOOKUP_WILDCARD |
>> + INPLOOKUP_RLOCKPCB, NULL);
>> + if (inp == NULL)
>> return (-1);
>> - }
>> }
>> #else
>> inp = in_pcbhashlookup(tb, saddr->v4, sport, daddr->v4, dport);
>> @@ -3058,16 +3056,14 @@ pf_socket_lookup(int direction, struct p
>> #ifdef INET6
>> case AF_INET6:
>> #ifdef __FreeBSD__
>> - INP_INFO_RLOCK(pi);
>> - inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
>> - &daddr->v6, dport, 0, NULL);
>> + inp = in6_pcblookup(pi, &saddr->v6, sport,
>> + &daddr->v6, dport, INPLOOKUP_RLOCKPCB, NULL);
>> if (inp == NULL) {
>> - inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
>> - &daddr->v6, dport, INPLOOKUP_WILDCARD, NULL);
>> - if (inp == NULL) {
>> - INP_INFO_RUNLOCK(pi);
>> + inp = in6_pcblookup(pi, &saddr->v6, sport,
>> + &daddr->v6, dport, INPLOOKUP_WILDCARD |
>> + INPLOOKUP_RLOCKPCB, NULL);
>> + if (inp == NULL)
>> return (-1);
>> - }
>> }
>> #else
>> inp = in6_pcbhashlookup(tb, &saddr->v6, sport, &daddr->v6,
>> @@ -3085,9 +3081,10 @@ pf_socket_lookup(int direction, struct p
>> return (-1);
>> }
>> #ifdef __FreeBSD__
>> + INP_RLOCK_ASSERT(inp);
>> pd->lookup.uid = inp->inp_cred->cr_uid;
>> pd->lookup.gid = inp->inp_cred->cr_groups[0];
>> - INP_INFO_RUNLOCK(pi);
>> + INP_RUNLOCK(inp);
>> #else
>> pd->lookup.uid = inp->inp_socket->so_euid;
>> pd->lookup.gid = inp->inp_socket->so_egid;
>>
>> Modified: head/sys/netinet/in_pcb.c
>> ==============================================================================
>> --- head/sys/netinet/in_pcb.c Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/netinet/in_pcb.c Mon May 30 09:43:55 2011 (r222488)
>> @@ -127,6 +127,10 @@ static VNET_DEFINE(int, ipport_tcplastco
>>
>> #define V_ipport_tcplastcount VNET(ipport_tcplastcount)
>>
>> +static struct inpcb *in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo,
>> + struct in_addr faddr, u_int fport_arg,
>> + struct in_addr laddr, u_int lport_arg,
>> + int lookupflags, struct ifnet *ifp);
>> static void in_pcbremlists(struct inpcb *inp);
>>
>> #ifdef INET
>> @@ -212,11 +216,13 @@ in_pcbinfo_init(struct inpcbinfo *pcbinf
>> {
>>
>> INP_INFO_LOCK_INIT(pcbinfo, name);
>> + INP_HASH_LOCK_INIT(pcbinfo, "pcbinfohash"); /* XXXRW: argument? */
>> #ifdef VIMAGE
>> pcbinfo->ipi_vnet = curvnet;
>> #endif
>> pcbinfo->ipi_listhead = listhead;
>> LIST_INIT(pcbinfo->ipi_listhead);
>> + pcbinfo->ipi_count = 0;
>> pcbinfo->ipi_hashbase = hashinit(hash_nelements, M_PCB,
>> &pcbinfo->ipi_hashmask);
>> pcbinfo->ipi_porthashbase = hashinit(porthash_nelements, M_PCB,
>> @@ -234,10 +240,14 @@ void
>> in_pcbinfo_destroy(struct inpcbinfo *pcbinfo)
>> {
>>
>> + KASSERT(pcbinfo->ipi_count == 0,
>> + ("%s: ipi_count = %u", __func__, pcbinfo->ipi_count));
>> +
>> hashdestroy(pcbinfo->ipi_hashbase, M_PCB, pcbinfo->ipi_hashmask);
>> hashdestroy(pcbinfo->ipi_porthashbase, M_PCB,
>> pcbinfo->ipi_porthashmask);
>> uma_zdestroy(pcbinfo->ipi_zone);
>> + INP_HASH_LOCK_DESTROY(pcbinfo);
>> INP_INFO_LOCK_DESTROY(pcbinfo);
>> }
>>
>> @@ -309,8 +319,8 @@ in_pcbbind(struct inpcb *inp, struct soc
>> {
>> int anonport, error;
>>
>> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>> INP_WLOCK_ASSERT(inp);
>> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>>
>> if (inp->inp_lport != 0 || inp->inp_laddr.s_addr != INADDR_ANY)
>> return (EINVAL);
>> @@ -351,8 +361,8 @@ in_pcb_lport(struct inpcb *inp, struct i
>> * Because no actual state changes occur here, a global write lock on
>> * the pcbinfo isn't required.
>> */
>> - INP_INFO_LOCK_ASSERT(pcbinfo);
>> INP_LOCK_ASSERT(inp);
>> + INP_HASH_LOCK_ASSERT(pcbinfo);
>>
>> if (inp->inp_flags & INP_HIGHPORT) {
>> first = V_ipport_hifirstauto; /* sysctl */
>> @@ -473,11 +483,10 @@ in_pcbbind_setup(struct inpcb *inp, stru
>> int error;
>>
>> /*
>> - * Because no actual state changes occur here, a global write lock on
>> - * the pcbinfo isn't required.
>> + * No state changes, so read locks are sufficient here.
>> */
>> - INP_INFO_LOCK_ASSERT(pcbinfo);
>> INP_LOCK_ASSERT(inp);
>> + INP_HASH_LOCK_ASSERT(pcbinfo);
>>
>> if (TAILQ_EMPTY(&V_in_ifaddrhead)) /* XXX broken! */
>> return (EADDRNOTAVAIL);
>> @@ -618,8 +627,8 @@ in_pcbconnect(struct inpcb *inp, struct
>> in_addr_t laddr, faddr;
>> int anonport, error;
>>
>> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>> INP_WLOCK_ASSERT(inp);
>> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>>
>> lport = inp->inp_lport;
>> laddr = inp->inp_laddr.s_addr;
>> @@ -907,8 +916,8 @@ in_pcbconnect_setup(struct inpcb *inp, s
>> * Because a global state change doesn't actually occur here, a read
>> * lock is sufficient.
>> */
>> - INP_INFO_LOCK_ASSERT(inp->inp_pcbinfo);
>> INP_LOCK_ASSERT(inp);
>> + INP_HASH_LOCK_ASSERT(inp->inp_pcbinfo);
>>
>> if (oinpp != NULL)
>> *oinpp = NULL;
>> @@ -983,8 +992,8 @@ in_pcbconnect_setup(struct inpcb *inp, s
>> if (error)
>> return (error);
>> }
>> - oinp = in_pcblookup_hash(inp->inp_pcbinfo, faddr, fport, laddr, lport,
>> - 0, NULL);
>> + oinp = in_pcblookup_hash_locked(inp->inp_pcbinfo, faddr, fport,
>> + laddr, lport, 0, NULL);
>> if (oinp != NULL) {
>> if (oinpp != NULL)
>> *oinpp = oinp;
>> @@ -1007,8 +1016,8 @@ void
>> in_pcbdisconnect(struct inpcb *inp)
>> {
>>
>> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>> INP_WLOCK_ASSERT(inp);
>> + INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>>
>> inp->inp_faddr.s_addr = INADDR_ANY;
>> inp->inp_fport = 0;
>> @@ -1187,19 +1196,24 @@ void
>> in_pcbdrop(struct inpcb *inp)
>> {
>>
>> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>> INP_WLOCK_ASSERT(inp);
>>
>> + /*
>> + * XXXRW: Possibly we should protect the setting of INP_DROPPED with
>> + * the hash lock...?
>> + */
>> inp->inp_flags |= INP_DROPPED;
>> if (inp->inp_flags & INP_INHASHLIST) {
>> struct inpcbport *phd = inp->inp_phd;
>>
>> + INP_HASH_WLOCK(inp->inp_pcbinfo);
>> LIST_REMOVE(inp, inp_hash);
>> LIST_REMOVE(inp, inp_portlist);
>> if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
>> LIST_REMOVE(phd, phd_hash);
>> free(phd, M_PCB);
>> }
>> + INP_HASH_WUNLOCK(inp->inp_pcbinfo);
>> inp->inp_flags &= ~INP_INHASHLIST;
>> }
>> }
>> @@ -1328,7 +1342,8 @@ in_pcbpurgeif0(struct inpcbinfo *pcbinfo
>> }
>>
>> /*
>> - * Lookup a PCB based on the local address and port.
>> + * Lookup a PCB based on the local address and port. Caller must hold the
>> + * hash lock. No inpcb locks or references are acquired.
>> */
>> #define INP_LOOKUP_MAPPED_PCB_COST 3
>> struct inpcb *
>> @@ -1346,7 +1361,7 @@ in_pcblookup_local(struct inpcbinfo *pcb
>> KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0,
>> ("%s: invalid lookup flags %d", __func__, lookupflags));
>>
>> - INP_INFO_LOCK_ASSERT(pcbinfo);
>> + INP_HASH_LOCK_ASSERT(pcbinfo);
>>
>> if ((lookupflags & INPLOOKUP_WILDCARD) == 0) {
>> struct inpcbhead *head;
>> @@ -1450,10 +1465,12 @@ in_pcblookup_local(struct inpcbinfo *pcb
>> #undef INP_LOOKUP_MAPPED_PCB_COST
>>
>> /*
>> - * Lookup PCB in hash list.
>> + * Lookup PCB in hash list, using pcbinfo tables. This variation assumes
>> + * that the caller has locked the hash list, and will not perform any further
>> + * locking or reference operations on either the hash list or the connection.
>> */
>> -struct inpcb *
>> -in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,
>> +static struct inpcb *
>> +in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, struct in_addr faddr,
>> u_int fport_arg, struct in_addr laddr, u_int lport_arg, int lookupflags,
>> struct ifnet *ifp)
>> {
>> @@ -1464,7 +1481,7 @@ in_pcblookup_hash(struct inpcbinfo *pcbi
>> KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0,
>> ("%s: invalid lookup flags %d", __func__, lookupflags));
>>
>> - INP_INFO_LOCK_ASSERT(pcbinfo);
>> + INP_HASH_LOCK_ASSERT(pcbinfo);
>>
>> /*
>> * First look for an exact match.
>> @@ -1574,6 +1591,56 @@ in_pcblookup_hash(struct inpcbinfo *pcbi
>>
>> return (NULL);
>> }
>> +
>> +/*
>> + * Lookup PCB in hash list, using pcbinfo tables. This variation locks the
>> + * hash list lock, and will return the inpcb locked (i.e., requires
>> + * INPLOOKUP_LOCKPCB).
>> + */
>> +static struct inpcb *
>> +in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,
>> + u_int fport, struct in_addr laddr, u_int lport, int lookupflags,
>> + struct ifnet *ifp)
>> +{
>> + struct inpcb *inp;
>> +
>> + INP_HASH_RLOCK(pcbinfo);
>> + inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport,
>> + (lookupflags & ~(INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)), ifp);
>> + if (inp != NULL) {
>> + in_pcbref(inp);
>> + INP_HASH_RUNLOCK(pcbinfo);
>> + if (lookupflags & INPLOOKUP_WLOCKPCB) {
>> + INP_WLOCK(inp);
>> + if (in_pcbrele_wlocked(inp))
>> + return (NULL);
>> + } else if (lookupflags & INPLOOKUP_RLOCKPCB) {
>> + INP_RLOCK(inp);
>> + if (in_pcbrele_rlocked(inp))
>> + return (NULL);
>> + } else
>> + panic("%s: locking bug", __func__);
>> + } else
>> + INP_HASH_RUNLOCK(pcbinfo);
>> + return (inp);
>> +}
>> +
>> +/*
>> + * Public inpcb lookup routines, accepting a 4-tuple.
>> + */
>> +struct inpcb *
>> +in_pcblookup(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport,
>> + struct in_addr laddr, u_int lport, int lookupflags, struct ifnet *ifp)
>> +{
>> +
>> + KASSERT((lookupflags & ~INPLOOKUP_MASK) == 0,
>> + ("%s: invalid lookup flags %d", __func__, lookupflags));
>> + KASSERT((lookupflags & (INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)) != 0,
>> + ("%s: LOCKPCB not set", __func__));
>> +
>> + return (in_pcblookup_hash(pcbinfo, faddr, fport, laddr, lport,
>> + lookupflags, ifp));
>> +}
>> #endif /* INET */
>>
>> /*
>> @@ -1588,8 +1655,9 @@ in_pcbinshash(struct inpcb *inp)
>> struct inpcbport *phd;
>> u_int32_t hashkey_faddr;
>>
>> - INP_INFO_WLOCK_ASSERT(pcbinfo);
>> INP_WLOCK_ASSERT(inp);
>> + INP_HASH_WLOCK_ASSERT(pcbinfo);
>> +
>> KASSERT((inp->inp_flags & INP_INHASHLIST) == 0,
>> ("in_pcbinshash: INP_INHASHLIST"));
>>
>> @@ -1645,8 +1713,9 @@ in_pcbrehash(struct inpcb *inp)
>> struct inpcbhead *head;
>> u_int32_t hashkey_faddr;
>>
>> - INP_INFO_WLOCK_ASSERT(pcbinfo);
>> INP_WLOCK_ASSERT(inp);
>> + INP_HASH_WLOCK_ASSERT(pcbinfo);
>> +
>> KASSERT(inp->inp_flags & INP_INHASHLIST,
>> ("in_pcbrehash: !INP_INHASHLIST"));
>>
>> @@ -1679,12 +1748,14 @@ in_pcbremlists(struct inpcb *inp)
>> if (inp->inp_flags & INP_INHASHLIST) {
>> struct inpcbport *phd = inp->inp_phd;
>>
>> + INP_HASH_WLOCK(pcbinfo);
>> LIST_REMOVE(inp, inp_hash);
>> LIST_REMOVE(inp, inp_portlist);
>> if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
>> LIST_REMOVE(phd, phd_hash);
>> free(phd, M_PCB);
>> }
>> + INP_HASH_WUNLOCK(pcbinfo);
>> inp->inp_flags &= ~INP_INHASHLIST;
>> }
>> LIST_REMOVE(inp, inp_list);
>>
>> Modified: head/sys/netinet/in_pcb.h
>> ==============================================================================
>> --- head/sys/netinet/in_pcb.h Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/netinet/in_pcb.h Mon May 30 09:43:55 2011 (r222488)
>> @@ -268,22 +268,22 @@ struct inpcbport {
>> * Global data structure for each high-level protocol (UDP, TCP, ...) in both
>> * IPv4 and IPv6. Holds inpcb lists and information for managing them.
>> *
>> - * Each pcbinfo is protected by ipi_lock, covering mutable global fields (such
>> - * as the global pcb list) and hashed lookup tables. The lock order is:
>> + * Each pcbinfo is protected by two locks: ipi_lock and ipi_hash_lock,
>> + * the former covering mutable global fields (such as the global pcb list),
>> + * and the latter covering the hashed lookup tables. The lock order is:
>> *
>> - * ipi_lock (before) inpcb locks
>> + * ipi_lock (before) inpcb locks (before) ipi_hash_lock
>> *
>> * Locking key:
>> *
>> * (c) Constant or nearly constant after initialisation
>> * (g) Locked by ipi_lock
>> - * (h) Read using either ipi_lock or inpcb lock; write requires both.
>> + * (h) Read using either ipi_hash_lock or inpcb lock; write requires both.
>> * (x) Synchronisation properties poorly defined
>> */
>> struct inpcbinfo {
>> /*
>> - * Global lock protecting global inpcb list, inpcb count, hash tables,
>> - * etc.
>> + * Global lock protecting global inpcb list, inpcb count, etc.
>> */
>> struct rwlock ipi_lock;
>>
>> @@ -312,17 +312,22 @@ struct inpcbinfo {
>> struct uma_zone *ipi_zone; /* (c) */
>>
>> /*
>> + * Global lock protecting hash lookup tables.
>> + */
>> + struct rwlock ipi_hash_lock;
>> +
>> + /*
>> * Global hash of inpcbs, hashed by local and foreign addresses and
>> * port numbers.
>> */
>> - struct inpcbhead *ipi_hashbase; /* (g) */
>> - u_long ipi_hashmask; /* (g) */
>> + struct inpcbhead *ipi_hashbase; /* (h) */
>> + u_long ipi_hashmask; /* (h) */
>>
>> /*
>> * Global hash of inpcbs, hashed by only local port number.
>> */
>> - struct inpcbporthead *ipi_porthashbase; /* (g) */
>> - u_long ipi_porthashmask; /* (g) */
>> + struct inpcbporthead *ipi_porthashbase; /* (h) */
>> + u_long ipi_porthashmask; /* (h) */
>>
>> /*
>> * Pointer to network stack instance
>> @@ -406,6 +411,18 @@ void inp_4tuple_get(struct inpcb *inp,
>> #define INP_INFO_WLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_WLOCKED)
>> #define INP_INFO_UNLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_lock, RA_UNLOCKED)
>>
>> +#define INP_HASH_LOCK_INIT(ipi, d) \
>> + rw_init_flags(&(ipi)->ipi_hash_lock, (d), 0)
>> +#define INP_HASH_LOCK_DESTROY(ipi) rw_destroy(&(ipi)->ipi_hash_lock)
>> +#define INP_HASH_RLOCK(ipi) rw_rlock(&(ipi)->ipi_hash_lock)
>> +#define INP_HASH_WLOCK(ipi) rw_wlock(&(ipi)->ipi_hash_lock)
>> +#define INP_HASH_RUNLOCK(ipi) rw_runlock(&(ipi)->ipi_hash_lock)
>> +#define INP_HASH_WUNLOCK(ipi) rw_wunlock(&(ipi)->ipi_hash_lock)
>> +#define INP_HASH_LOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_hash_lock, \
>> + RA_LOCKED)
>> +#define INP_HASH_WLOCK_ASSERT(ipi) rw_assert(&(ipi)->ipi_hash_lock, \
>> + RA_WLOCKED)
>> +
>> #define INP_PCBHASH(faddr, lport, fport, mask) \
>> (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask))
>> #define INP_PCBPORTHASH(lport, mask) \
>> @@ -466,7 +483,16 @@ void inp_4tuple_get(struct inpcb *inp,
>> #define INP_LLE_VALID 0x00000001 /* cached lle is valid */
>> #define INP_RT_VALID 0x00000002 /* cached rtentry is valid */
>>
>> -#define INPLOOKUP_WILDCARD 1
>> +/*
>> + * Flags passed to in_pcblookup*() functions.
>> + */
>> +#define INPLOOKUP_WILDCARD 0x00000001 /* Allow wildcard sockets. */
>> +#define INPLOOKUP_RLOCKPCB 0x00000002 /* Return inpcb read-locked. */
>> +#define INPLOOKUP_WLOCKPCB 0x00000004 /* Return inpcb write-locked. */
>> +
>> +#define INPLOOKUP_MASK (INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB | \
>> + INPLOOKUP_WLOCKPCB)
>> +
>> #define sotoinpcb(so) ((struct inpcb *)(so)->so_pcb)
>> #define sotoin6pcb(so) sotoinpcb(so) /* for KAME src sync over BSD*'s */
>>
>> @@ -527,7 +553,7 @@ struct inpcb *
>> in_pcblookup_local(struct inpcbinfo *,
>> struct in_addr, u_short, int, struct ucred *);
>> struct inpcb *
>> - in_pcblookup_hash(struct inpcbinfo *, struct in_addr, u_int,
>> + in_pcblookup(struct inpcbinfo *, struct in_addr, u_int,
>> struct in_addr, u_int, int, struct ifnet *);
>> void in_pcbnotifyall(struct inpcbinfo *pcbinfo, struct in_addr,
>> int, struct inpcb *(*)(struct inpcb *, int));
>>
>> Modified: head/sys/netinet/ip_divert.c
>> ==============================================================================
>> --- head/sys/netinet/ip_divert.c Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/netinet/ip_divert.c Mon May 30 09:43:55 2011 (r222488)
>> @@ -659,9 +659,9 @@ div_pcblist(SYSCTL_HANDLER_ARGS)
>> INP_INFO_WLOCK(&V_divcbinfo);
>> for (i = 0; i < n; i++) {
>> inp = inp_list[i];
>> - INP_WLOCK(inp);
>> - if (!in_pcbrele(inp))
>> - INP_WUNLOCK(inp);
>> + INP_RLOCK(inp);
>> + if (!in_pcbrele_rlocked(inp))
>> + INP_RUNLOCK(inp);
>> }
>> INP_INFO_WUNLOCK(&V_divcbinfo);
>>
>>
>> Modified: head/sys/netinet/ipfw/ip_fw2.c
>> ==============================================================================
>> --- head/sys/netinet/ipfw/ip_fw2.c Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/netinet/ipfw/ip_fw2.c Mon May 30 09:43:55 2011 (r222488)
>> @@ -657,7 +657,7 @@ check_uidgid(ipfw_insn_u32 *insn, int pr
>> (struct bsd_ucred *)uc, ugid_lookupp, ((struct mbuf *)inp)->m_skb);
>> #else /* FreeBSD */
>> struct inpcbinfo *pi;
>> - int wildcard;
>> + int lookupflags;
>> struct inpcb *pcb;
>> int match;
>>
>> @@ -682,30 +682,31 @@ check_uidgid(ipfw_insn_u32 *insn, int pr
>> if (*ugid_lookupp == -1)
>> return (0);
>> if (proto == IPPROTO_TCP) {
>> - wildcard = 0;
>> + lookupflags = 0;
>> pi = &V_tcbinfo;
>> } else if (proto == IPPROTO_UDP) {
>> - wildcard = INPLOOKUP_WILDCARD;
>> + lookupflags = INPLOOKUP_WILDCARD;
>> pi = &V_udbinfo;
>> } else
>> return 0;
>> + lookupflags |= INPLOOKUP_RLOCKPCB;
>> match = 0;
>> if (*ugid_lookupp == 0) {
>> - INP_INFO_RLOCK(pi);
>> pcb = (oif) ?
>> - in_pcblookup_hash(pi,
>> + in_pcblookup(pi,
>> dst_ip, htons(dst_port),
>> src_ip, htons(src_port),
>> - wildcard, oif) :
>> - in_pcblookup_hash(pi,
>> + lookupflags, oif) :
>> + in_pcblookup(pi,
>> src_ip, htons(src_port),
>> dst_ip, htons(dst_port),
>> - wildcard, NULL);
>> + lookupflags, NULL);
>> if (pcb != NULL) {
>> + INP_RLOCK_ASSERT(pcb);
>> *uc = crhold(pcb->inp_cred);
>> *ugid_lookupp = 1;
>> + INP_RUNLOCK(pcb);
>> }
>> - INP_INFO_RUNLOCK(pi);
>> if (*ugid_lookupp == 0) {
>> /*
>> * We tried and failed, set the variable to -1
>> @@ -1827,21 +1828,32 @@ do { \
>> else
>> break;
>>
>> + /*
>> + * XXXRW: so_user_cookie should almost
>> + * certainly be inp_user_cookie?
>> + */
>> +
>> /* For incomming packet, lookup up the
>> inpcb using the src/dest ip/port tuple */
>> if (inp == NULL) {
>> - INP_INFO_RLOCK(pi);
>> - inp = in_pcblookup_hash(pi,
>> + inp = in_pcblookup(pi,
>> src_ip, htons(src_port),
>> dst_ip, htons(dst_port),
>> - 0, NULL);
>> - INP_INFO_RUNLOCK(pi);
>> - }
>> -
>> - if (inp && inp->inp_socket) {
>> - tablearg = inp->inp_socket->so_user_cookie;
>> - if (tablearg)
>> - match = 1;
>> + INPLOOKUP_RLOCKPCB, NULL);
>> + if (inp != NULL) {
>> + tablearg =
>> + inp->inp_socket->so_user_cookie;
>> + if (tablearg)
>> + match = 1;
>> + INP_RUNLOCK(inp);
>> + }
>> + } else {
>> + if (inp->inp_socket) {
>> + tablearg =
>> + inp->inp_socket->so_user_cookie;
>> + if (tablearg)
>> + match = 1;
>> + }
>> }
>> break;
>> }
>>
>> Modified: head/sys/netinet/raw_ip.c
>> ==============================================================================
>> --- head/sys/netinet/raw_ip.c Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/netinet/raw_ip.c Mon May 30 09:43:55 2011 (r222488)
>> @@ -226,7 +226,7 @@ rip_append(struct inpcb *last, struct ip
>> {
>> int policyfail = 0;
>>
>> - INP_RLOCK_ASSERT(last);
>> + INP_LOCK_ASSERT(last);
>>
>> #ifdef IPSEC
>> /* check AH/ESP integrity. */
>> @@ -834,16 +834,19 @@ rip_detach(struct socket *so)
>> static void
>> rip_dodisconnect(struct socket *so, struct inpcb *inp)
>> {
>> + struct inpcbinfo *pcbinfo;
>>
>> - INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>> - INP_WLOCK_ASSERT(inp);
>> -
>> + pcbinfo = inp->inp_pcbinfo;
>> + INP_INFO_WLOCK(pcbinfo);
>> + INP_WLOCK(inp);
>> rip_delhash(inp);
>> inp->inp_faddr.s_addr = INADDR_ANY;
>> rip_inshash(inp);
>> SOCK_LOCK(so);
>> so->so_state &= ~SS_ISCONNECTED;
>> SOCK_UNLOCK(so);
>> + INP_WUNLOCK(inp);
>> + INP_INFO_WUNLOCK(pcbinfo);
>> }
>>
>> static void
>> @@ -854,11 +857,7 @@ rip_abort(struct socket *so)
>> inp = sotoinpcb(so);
>> KASSERT(inp != NULL, ("rip_abort: inp == NULL"));
>>
>> - INP_INFO_WLOCK(&V_ripcbinfo);
>> - INP_WLOCK(inp);
>> rip_dodisconnect(so, inp);
>> - INP_WUNLOCK(inp);
>> - INP_INFO_WUNLOCK(&V_ripcbinfo);
>> }
>>
>> static void
>> @@ -869,11 +868,7 @@ rip_close(struct socket *so)
>> inp = sotoinpcb(so);
>> KASSERT(inp != NULL, ("rip_close: inp == NULL"));
>>
>> - INP_INFO_WLOCK(&V_ripcbinfo);
>> - INP_WLOCK(inp);
>> rip_dodisconnect(so, inp);
>> - INP_WUNLOCK(inp);
>> - INP_INFO_WUNLOCK(&V_ripcbinfo);
>> }
>>
>> static int
>> @@ -887,11 +882,7 @@ rip_disconnect(struct socket *so)
>> inp = sotoinpcb(so);
>> KASSERT(inp != NULL, ("rip_disconnect: inp == NULL"));
>>
>> - INP_INFO_WLOCK(&V_ripcbinfo);
>> - INP_WLOCK(inp);
>> rip_dodisconnect(so, inp);
>> - INP_WUNLOCK(inp);
>> - INP_INFO_WUNLOCK(&V_ripcbinfo);
>> return (0);
>> }
>>
>> @@ -1077,9 +1068,9 @@ rip_pcblist(SYSCTL_HANDLER_ARGS)
>> INP_INFO_WLOCK(&V_ripcbinfo);
>> for (i = 0; i < n; i++) {
>> inp = inp_list[i];
>> - INP_WLOCK(inp);
>> - if (!in_pcbrele(inp))
>> - INP_WUNLOCK(inp);
>> + INP_RLOCK(inp);
>> + if (!in_pcbrele_rlocked(inp))
>> + INP_RUNLOCK(inp);
>> }
>> INP_INFO_WUNLOCK(&V_ripcbinfo);
>>
>>
>> Modified: head/sys/netinet/siftr.c
>> ==============================================================================
>> --- head/sys/netinet/siftr.c Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/netinet/siftr.c Mon May 30 09:43:55 2011 (r222488)
>> @@ -696,17 +696,16 @@ siftr_findinpcb(int ipver, struct ip *ip
>>
>> /* We need the tcbinfo lock. */
>> INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
>> - INP_INFO_RLOCK(&V_tcbinfo);
>>
>> if (dir == PFIL_IN)
>> inp = (ipver == INP_IPV4 ?
>> - in_pcblookup_hash(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst,
>> - dport, 0, m->m_pkthdr.rcvif)
>> + in_pcblookup(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst,
>> + dport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif)
>> :
>> #ifdef SIFTR_IPV6
>> - in6_pcblookup_hash(&V_tcbinfo,
>> + in6_pcblookup(&V_tcbinfo,
>> &((struct ip6_hdr *)ip)->ip6_src, sport,
>> - &((struct ip6_hdr *)ip)->ip6_dst, dport, 0,
>> + &((struct ip6_hdr *)ip)->ip6_dst, dport, INPLOOKUP_RLOCKPCB,
>> m->m_pkthdr.rcvif)
>> #else
>> NULL
>> @@ -715,13 +714,13 @@ siftr_findinpcb(int ipver, struct ip *ip
>>
>> else
>> inp = (ipver == INP_IPV4 ?
>> - in_pcblookup_hash(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src,
>> - sport, 0, m->m_pkthdr.rcvif)
>> + in_pcblookup(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src,
>> + sport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif)
>> :
>> #ifdef SIFTR_IPV6
>> - in6_pcblookup_hash(&V_tcbinfo,
>> + in6_pcblookup(&V_tcbinfo,
>> &((struct ip6_hdr *)ip)->ip6_dst, dport,
>> - &((struct ip6_hdr *)ip)->ip6_src, sport, 0,
>> + &((struct ip6_hdr *)ip)->ip6_src, sport, INPLOOKUP_RLOCKPCB,
>> m->m_pkthdr.rcvif)
>> #else
>> NULL
>> @@ -734,12 +733,7 @@ siftr_findinpcb(int ipver, struct ip *ip
>> ss->nskip_in_inpcb++;
>> else
>> ss->nskip_out_inpcb++;
>> - } else {
>> - /* Acquire the inpcb lock. */
>> - INP_UNLOCK_ASSERT(inp);
>> - INP_RLOCK(inp);
>> }
>> - INP_INFO_RUNLOCK(&V_tcbinfo);
>>
>> return (inp);
>> }
>>
>> Modified: head/sys/netinet/tcp_input.c
>> ==============================================================================
>> --- head/sys/netinet/tcp_input.c Mon May 30 09:41:38 2011 (r222487)
>> +++ head/sys/netinet/tcp_input.c Mon May 30 09:43:55 2011 (r222488)
>> @@ -5,6 +5,7 @@
>> * Swinburne University of Technology, Melbourne, Australia.
>> * Copyright (c) 2009-2010 Lawrence Stewart <lstewart at freebsd.org>
>> * Copyright (c) 2010 The FreeBSD Foundation
>> + * Copyright (c) 2010-2011 Juniper Networks, Inc.
>> * All rights reserved.
>> *
>> * Portions of this software were developed at the Centre for Advanced Internet
>> @@ -16,6 +17,9 @@
>> * Internet Architectures, Swinburne University of Technology, Melbourne,
>> * Australia by David Hayes under sponsorship from the FreeBSD Foundation.
>> *
>> + * Portions of this software were developed by Robert N. M. Watson under
>> + * contract to Juniper Networks, Inc.
>> + *
>> * Redistribution and use in source and binary forms, with or without
>> * modification, are permitted provided that the following conditions
>> * are met:
>> @@ -197,10 +201,6 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
>> &VNET_NAME(tcp_autorcvbuf_max), 0,
>> "Max size of automatic receive buffer");
>>
>> -int tcp_read_locking = 1;
>> -SYSCTL_INT(_net_inet_tcp, OID_AUTO, read_locking, CTLFLAG_RW,
>> - &tcp_read_locking, 0, "Enable read locking strategy");
>> -
>> VNET_DEFINE(struct inpcbhead, tcb);
>> #define tcb6 tcb /* for KAME src sync over BSD*'s */
>> VNET_DEFINE(struct inpcbinfo, tcbinfo);
>> @@ -591,8 +591,7 @@ tcp_input(struct mbuf *m, int off0)
>> char *s = NULL; /* address and port logging */
>> int ti_locked;
>> #define TI_UNLOCKED 1
>> -#define TI_RLOCKED 2
>> -#define TI_WLOCKED 3
>> +#define TI_WLOCKED 2
>>
>> #ifdef TCPDEBUG
>> /*
>> @@ -756,30 +755,25 @@ tcp_input(struct mbuf *m, int off0)
>> drop_hdrlen = off0 + off;
>>
>> /*
>> - * Locate pcb for segment, which requires a lock on tcbinfo.
>> - * Optimisticaly acquire a global read lock rather than a write lock
>> - * unless header flags necessarily imply a state change. There are
>> - * two cases where we might discover later we need a write lock
>> - * despite the flags: ACKs moving a connection out of the syncache,
>> - * and ACKs for a connection in TIMEWAIT.
>> + * Locate pcb for segment; if we're likely to add or remove a
>> + * connection then first acquire pcbinfo lock. There are two cases
>> + * where we might discover later we need a write lock despite the
>> + * flags: ACKs moving a connection out of the syncache, and ACKs for
>> + * a connection in TIMEWAIT.
>> */
>> - if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 ||
>> - tcp_read_locking == 0) {
>> + if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0) {
>> INP_INFO_WLOCK(&V_tcbinfo);
>> ti_locked = TI_WLOCKED;
>> - } else {
>> - INP_INFO_RLOCK(&V_tcbinfo);
>> - ti_locked = TI_RLOCKED;
>> - }
>> + } else
>> + ti_locked = TI_UNLOCKED;
>>
>> findpcb:
>> #ifdef INVARIANTS
>> - if (ti_locked == TI_RLOCKED)
>> - INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
>> - else if (ti_locked == TI_WLOCKED)
>> + if (ti_locked == TI_WLOCKED) {
>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>> - else
>> - panic("%s: findpcb ti_locked %d\n", __func__, ti_locked);
>> + } else {
>> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
>> + }
>> #endif
>>
>> #ifdef INET
>> @@ -797,20 +791,18 @@ findpcb:
>> * Transparently forwarded. Pretend to be the destination.
>> * already got one like this?
>> */
>> - inp = in_pcblookup_hash(&V_tcbinfo,
>> - ip->ip_src, th->th_sport,
>> - ip->ip_dst, th->th_dport,
>> - 0, m->m_pkthdr.rcvif);
>> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src, th->th_sport,
>> + ip->ip_dst, th->th_dport, INPLOOKUP_WLOCKPCB,
>> + m->m_pkthdr.rcvif);
>> if (!inp) {
>> - /* It's new. Try to find the ambushing socket. */
>> - inp = in_pcblookup_hash(&V_tcbinfo,
>> - ip->ip_src, th->th_sport,
>> - next_hop->sin_addr,
>> - next_hop->sin_port ?
>> - ntohs(next_hop->sin_port) :
>> - th->th_dport,
>> - INPLOOKUP_WILDCARD,
>> - m->m_pkthdr.rcvif);
>> + /*
>> + * It's new. Try to find the ambushing socket.
>> + */
>> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src,
>> + th->th_sport, next_hop->sin_addr,
>> + next_hop->sin_port ? ntohs(next_hop->sin_port) :
>> + th->th_dport, INPLOOKUP_WILDCARD |
>> + INPLOOKUP_WLOCKPCB, m->m_pkthdr.rcvif);
>> }
>> /* Remove the tag from the packet. We don't need it anymore. */
>> m_tag_delete(m, fwd_tag);
>> @@ -820,21 +812,19 @@ findpcb:
>> {
>> #ifdef INET6
>> if (isipv6)
>> - inp = in6_pcblookup_hash(&V_tcbinfo,
>> - &ip6->ip6_src, th->th_sport,
>> - &ip6->ip6_dst, th->th_dport,
>> - INPLOOKUP_WILDCARD,
>> - m->m_pkthdr.rcvif);
>> + inp = in6_pcblookup(&V_tcbinfo, &ip6->ip6_src,
>> + th->th_sport, &ip6->ip6_dst, th->th_dport,
>> + INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB,
>> + m->m_pkthdr.rcvif);
>> #endif
>> #if defined(INET) && defined(INET6)
>> else
>> #endif
>> #ifdef INET
>> - inp = in_pcblookup_hash(&V_tcbinfo,
>> - ip->ip_src, th->th_sport,
>> - ip->ip_dst, th->th_dport,
>> - INPLOOKUP_WILDCARD,
>> - m->m_pkthdr.rcvif);
>> + inp = in_pcblookup(&V_tcbinfo, ip->ip_src,
>> + th->th_sport, ip->ip_dst, th->th_dport,
>> + INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB,
>> + m->m_pkthdr.rcvif);
>> #endif
>> }
>>
>> @@ -865,7 +855,7 @@ findpcb:
>> rstreason = BANDLIM_RST_CLOSEDPORT;
>> goto dropwithreset;
>> }
>> - INP_WLOCK(inp);
>> + INP_WLOCK_ASSERT(inp);
>> if (!(inp->inp_flags & INP_HW_FLOWID)
>> && (m->m_flags & M_FLOWID)
>> && ((inp->inp_socket == NULL)
>> @@ -906,28 +896,26 @@ findpcb:
>> * legitimate new connection attempt the old INPCB gets removed and
>> * we can try again to find a listening socket.
>> *
>> - * At this point, due to earlier optimism, we may hold a read lock on
>> - * the inpcbinfo, rather than a write lock. If so, we need to
>> - * upgrade, or if that fails, acquire a reference on the inpcb, drop
>> - * all locks, acquire a global write lock, and then re-acquire the
>> - * inpcb lock. We may at that point discover that another thread has
>> - * tried to free the inpcb, in which case we need to loop back and
>> - * try to find a new inpcb to deliver to.
>> + * At this point, due to earlier optimism, we may hold only an inpcb
>> + * lock, and not the inpcbinfo write lock. If so, we need to try to
>> + * acquire it, or if that fails, acquire a reference on the inpcb,
>> + * drop all locks, acquire a global write lock, and then re-acquire
>> + * the inpcb lock. We may at that point discover that another thread
>> + * has tried to free the inpcb, in which case we need to loop back
>> + * and try to find a new inpcb to deliver to.
>> + *
>> + * XXXRW: It may be time to rethink timewait locking.
>> */
>> relocked:
>> if (inp->inp_flags & INP_TIMEWAIT) {
>> - KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED,
>> - ("%s: INP_TIMEWAIT ti_locked %d", __func__, ti_locked));
>> -
>> - if (ti_locked == TI_RLOCKED) {
>> - if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) {
>> + if (ti_locked == TI_UNLOCKED) {
>> + if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
>> in_pcbref(inp);
>> INP_WUNLOCK(inp);
>> - INP_INFO_RUNLOCK(&V_tcbinfo);
>> INP_INFO_WLOCK(&V_tcbinfo);
>> ti_locked = TI_WLOCKED;
>> INP_WLOCK(inp);
>> - if (in_pcbrele(inp)) {
>> + if (in_pcbrele_wlocked(inp)) {
>> inp = NULL;
>> goto findpcb;
>> }
>> @@ -975,26 +963,24 @@ relocked:
>>
>> /*
>> * We've identified a valid inpcb, but it could be that we need an
>> - * inpcbinfo write lock and have only a read lock. In this case,
>> - * attempt to upgrade/relock using the same strategy as the TIMEWAIT
>> - * case above. If we relock, we have to jump back to 'relocked' as
>> - * the connection might now be in TIMEWAIT.
>> - */
>> - if (tp->t_state != TCPS_ESTABLISHED ||
>> - (thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 ||
>> - tcp_read_locking == 0) {
>> - KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED,
>> - ("%s: upgrade check ti_locked %d", __func__, ti_locked));
>> -
>> - if (ti_locked == TI_RLOCKED) {
>> - if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) {
>> + * inpcbinfo write lock but don't hold it. In this case, attempt to
>> + * acquire using the same strategy as the TIMEWAIT case above. If we
>> + * relock, we have to jump back to 'relocked' as the connection might
>> + * now be in TIMEWAIT.
>> + */
>> +#ifdef INVARIANTS
>> + if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0)
>> + INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>> +#endif
>> + if (tp->t_state != TCPS_ESTABLISHED) {
>> + if (ti_locked == TI_UNLOCKED) {
>> + if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
>> in_pcbref(inp);
>> INP_WUNLOCK(inp);
>> - INP_INFO_RUNLOCK(&V_tcbinfo);
>> INP_INFO_WLOCK(&V_tcbinfo);
>> ti_locked = TI_WLOCKED;
>> INP_WLOCK(inp);
>> - if (in_pcbrele(inp)) {
>> + if (in_pcbrele_wlocked(inp)) {
>> inp = NULL;
>> goto findpcb;
>> }
>> @@ -1027,13 +1013,16 @@ relocked:
>> /*
>> * When the socket is accepting connections (the INPCB is in LISTEN
>> * state) we look into the SYN cache if this is a new connection
>> - * attempt or the completion of a previous one.
>> + * attempt or the completion of a previous one. Because listen
>> + * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock will be
>> + * held in this case.
>> */
>> if (so->so_options & SO_ACCEPTCONN) {
>> struct in_conninfo inc;
>>
>> KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting but "
>> "tp not listening", __func__));
>> + INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>>
>> bzero(&inc, sizeof(inc));
>> #ifdef INET6
>> @@ -1371,13 +1360,17 @@ relocked:
>> return;
>>
>> dropwithreset:
>> - if (ti_locked == TI_RLOCKED)
>> - INP_INFO_RUNLOCK(&V_tcbinfo);
>> - else if (ti_locked == TI_WLOCKED)
>> + if (ti_locked == TI_WLOCKED) {
>> INP_INFO_WUNLOCK(&V_tcbinfo);
>> - else
>> - panic("%s: dropwithreset ti_locked %d", __func__, ti_locked);
>> - ti_locked = TI_UNLOCKED;
>> + ti_locked = TI_UNLOCKED;
>> + }
>> +#ifdef INVARIANTS
>> + else {
>> + KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropwithreset "
>> + "ti_locked: %d", __func__, ti_locked));
>> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
>> + }
>> +#endif
>>
>> if (inp != NULL) {
>> tcp_dropwithreset(m, th, tp, tlen, rstreason);
>> @@ -1388,13 +1381,17 @@ dropwithreset:
>> goto drop;
>>
>> dropunlock:
>> - if (ti_locked == TI_RLOCKED)
>> - INP_INFO_RUNLOCK(&V_tcbinfo);
>> - else if (ti_locked == TI_WLOCKED)
>> + if (ti_locked == TI_WLOCKED) {
>> INP_INFO_WUNLOCK(&V_tcbinfo);
>> - else
>> - panic("%s: dropunlock ti_locked %d", __func__, ti_locked);
>> - ti_locked = TI_UNLOCKED;
>> + ti_locked = TI_UNLOCKED;
>> + }
>> +#ifdef INVARIANTS
>> + else {
>> + KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropunlock "
>> + "ti_locked: %d", __func__, ti_locked));
>> + INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
>> + }
>> +#endif
>>
>> if (inp != NULL)
>> INP_WUNLOCK(inp);
>> @@ -1449,13 +1446,13 @@ tcp_do_segment(struct mbuf *m, struct tc
>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>> } else {
>> #ifdef INVARIANTS
>> - if (ti_locked == TI_RLOCKED)
>> - INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
>>
>> *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
>> _______________________________________________
>> svn-src-head at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/svn-src-head
>> To unsubscribe, send any mail to "svn-src-head-unsubscribe at freebsd.org"
>>
>
More information about the svn-src-all
mailing list