[CFT/review] new sendfile(2)

Gleb Smirnoff glebius at FreeBSD.org
Sun Aug 31 16:48:29 UTC 2014


  Hi!

  Just a follow-up with a fresh version of the patch. For details,
see below.

On Thu, May 29, 2014 at 02:20:54PM +0400, Gleb Smirnoff wrote:
T>   Hello!
T> 
T>   At Netflix and Nginx we are experimenting with improving FreeBSD
T> wrt sending large amounts of static data via HTTP.
T> 
T>   One of the approaches we are experimenting with is a new sendfile(2)
T> implementation that doesn't block on the I/O done from the file
T> descriptor.
T> 
T>   The problem with classic sendfile(2) is that if the request
T> length is large enough and the file data is not cached in VM, then
T> the sendfile(2) syscall does not return until it has filled the
T> socket buffer with data. On the modern Internet, socket buffers can
T> be up to 1 MB, so the time taken by the syscall rises by an order
T> of magnitude. All that time, the nginx worker is blocked in the
T> syscall and doesn't process data from other clients. The best
T> current practice to mitigate this is known as "sendfile(2) +
T> aio_read(2)". This is a special mode of nginx operation on FreeBSD.
T> The sendfile(2) call is issued with the SF_NODISKIO flag, which
T> forbids the syscall from performing disk I/O, so it sends only data
T> that is already cached by VM. If sendfile(2) reports that I/O needs
T> to be done (but is forbidden), nginx does an aio_read() of a chunk
T> of the file. As a side effect, the data read is cached by VM. Then
T> sendfile() is called again.
T> 
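The "sendfile(2) + aio_read(2)" loop described above can be sketched as a
small decision helper. This is only an illustration, not nginx's actual
code: the enum and the function name sf_next_action() are hypothetical.
The only fact taken from FreeBSD is that sendfile(2) called with
SF_NODISKIO fails with EBUSY when the data is not cached and disk I/O
would be needed.

```c
#include <assert.h>
#include <errno.h>

/* What the event loop should do next (hypothetical names). */
enum sf_action {
	SF_DONE,	/* request was queued to the socket */
	SF_READ_AHEAD,	/* data not cached: aio_read() a chunk, then retry */
	SF_WAIT_SOCKET,	/* socket buffer is full: wait until writable */
	SF_RETRY,	/* interrupted: just call sendfile() again */
	SF_FAIL		/* any other error */
};

/*
 * Classify the outcome of one
 *	sendfile(fd, s, off, len, NULL, &sbytes, SF_NODISKIO)
 * call, where rc is its return value and err is the errno it left.
 */
enum sf_action
sf_next_action(int rc, int err)
{
	/* sendfile(2) returns either 0 or -1. */
	assert(rc == 0 || rc == -1);

	if (rc == 0)
		return (SF_DONE);
	switch (err) {
	case EBUSY:	/* SF_NODISKIO: disk I/O would have been required */
		return (SF_READ_AHEAD);
	case EAGAIN:	/* non-blocking socket, no buffer space left */
		return (SF_WAIT_SOCKET);
	case EINTR:
		return (SF_RETRY);
	default:
		return (SF_FAIL);
	}
}
```

In such a loop, SF_READ_AHEAD is the expensive branch: the worker has to
issue aio_read(), wait for its completion and then call sendfile() again,
which is the round trip the new implementation is meant to avoid.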
T>   Now for the new sendfile. The core idea is that sendfile()
T> schedules the I/O but doesn't wait for it to complete. It
T> returns immediately to the process, and I/O completion is
T> processed in kernel context. Unlike aio(4), no additional
T> threads are created in the kernel. The new sendfile is a drop-in
T> replacement for the old one. Applications (like nginx) don't
T> need to be recompiled or reconfigured. The SF_NODISKIO flag is
T> ignored.
T> 
T>   The patch for review is available at:
T> 
T> https://phabric.freebsd.org/D102
T> 
T> And for those who prefer email attachments, it is also attached.
T> The patch contains 3 logically separate changes:
T> 
T> 1) Split of the socket buffer sb_cc field into sb_acc and sb_ccc, where
T> sb_acc stands for "available character count" and sb_ccc for "claimed
T> character count". This allows us to write data to a socket before that
T> data is ready. The data sits in the socket buffer, consumes its space,
T> and keeps its place in order relative to earlier and later writes to
T> the socket, but it can be sent only after it is marked as ready. This
T> change is spread across many files.
T> 
T> 2) A new vnode operation: VOP_GETPAGES_ASYNC(). This one lives in sys/vm.
T> 
T> 3) The actual implementation of the new sendfile(2). This one lives in
T> kern/uipc_syscalls.c
T> 
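The sb_acc / sb_ccc split from item 1 can be illustrated with a small
user-space model. This is a simplified sketch and not the kernel code:
all names prefixed with model_ are hypothetical, a fixed array stands in
for the mbuf chain, and the available count is recomputed by scanning
instead of being maintained incrementally with M_NOTREADY / M_BLOCKED
flags as the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MODEL_MAXCHUNKS	16

struct model_chunk {
	unsigned	len;	/* bytes in this chunk */
	bool		ready;	/* false while its I/O is still pending */
};

struct model_sockbuf {
	struct model_chunk chunk[MODEL_MAXCHUNKS];
	size_t		nchunks;
	unsigned	acc;	/* available: ready bytes before the first
				   not-ready chunk */
	unsigned	ccc;	/* claimed: all bytes queued, ready or not */
};

/* Only the ready prefix of the queue may be sent. */
static void
model_update_acc(struct model_sockbuf *sb)
{
	sb->acc = 0;
	for (size_t i = 0; i < sb->nchunks && sb->chunk[i].ready; i++)
		sb->acc += sb->chunk[i].len;
}

/* Queue a chunk and return its index. Space is claimed immediately. */
size_t
model_append(struct model_sockbuf *sb, unsigned len, bool ready)
{
	size_t idx;

	assert(sb->nchunks < MODEL_MAXCHUNKS);
	idx = sb->nchunks++;
	sb->chunk[idx].len = len;
	sb->chunk[idx].ready = ready;
	sb->ccc += len;
	model_update_acc(sb);
	return (idx);
}

/* I/O completion: the chunk and anything it was blocking become available. */
void
model_mark_ready(struct model_sockbuf *sb, size_t idx)
{
	assert(idx < sb->nchunks);
	sb->chunk[idx].ready = true;
	model_update_acc(sb);
}
```

In this model, appending 100 ready bytes, then 4096 not-ready bytes, then
50 ready bytes leaves ccc at 4246 while acc stays at 100: the last chunk
is ready but blocked behind the pending one, so ordering is preserved.
Marking the 4096-byte chunk ready raises acc to 4246.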
T> 
T> 
T>   At Netflix, we already see improvements with the new sendfile(2).
T> We can send more data using the same amount of CPU, and we can
T> push closer to 0% idle without experiencing short lags.
T> 
T> However, we have a somewhat modified VM subsystem that behaves
T> optimally for our task but suboptimally for an average FreeBSD system.
T> I'd like someone from the community to try the new sendfile(2) on
T> another setup and see how it works for you.
T> 
T>   To be an early tester you need to check out the projects/sendfile
T> branch and build a kernel from it. The world from head/ would
T> run fine with it.
T> 
T>   svn co http://svn.freebsd.org/base/projects/sendfile
T>   cd sendfile
T>   ... build kernel ...
T> 
T> Limitations:
T> - No testing was done on serving files from NFS.
T> - No testing was done on serving files from ZFS.
T> 
T> -- 
T> Totus tuus, Glebius.

T> Index: sys/dev/ti/if_ti.c
T> ===================================================================
T> --- sys/dev/ti/if_ti.c	(.../head)	(revision 266804)
T> +++ sys/dev/ti/if_ti.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1629,7 +1629,7 @@ ti_newbuf_jumbo(struct ti_softc *sc, int idx, stru
T>  			m[i]->m_data = (void *)sf_buf_kva(sf[i]);
T>  			m[i]->m_len = PAGE_SIZE;
T>  			MEXTADD(m[i], sf_buf_kva(sf[i]), PAGE_SIZE,
T> -			    sf_buf_mext, (void*)sf_buf_kva(sf[i]), sf[i],
T> +			    sf_mext_free, (void*)sf_buf_kva(sf[i]), sf[i],
T>  			    0, EXT_DISPOSABLE);
T>  			m[i]->m_next = m[i+1];
T>  		}
T> @@ -1694,7 +1694,7 @@ nobufs:
T>  		if (m[i])
T>  			m_freem(m[i]);
T>  		if (sf[i])
T> -			sf_buf_mext((void *)sf_buf_kva(sf[i]), sf[i]);
T> +			sf_mext_free((void *)sf_buf_kva(sf[i]), sf[i]);
T>  	}
T>  	return (ENOBUFS);
T>  }
T> Index: sys/dev/cxgbe/tom/t4_cpl_io.c
T> ===================================================================
T> --- sys/dev/cxgbe/tom/t4_cpl_io.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgbe/tom/t4_cpl_io.c	(.../projects/sendfile)	(revision 266807)
T> @@ -338,11 +338,11 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp)
T>  	INP_WLOCK_ASSERT(inp);
T>  
T>  	SOCKBUF_LOCK(sb);
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> -	toep->sb_cc = sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T> +	toep->sb_cc = sbused(sb);
T>  	credits = toep->rx_credits;
T>  	SOCKBUF_UNLOCK(sb);
T>  
T> @@ -863,15 +863,15 @@ do_peer_close(struct sge_iq *iq, const struct rss_
T>  		tp->rcv_nxt = be32toh(cpl->rcv_nxt);
T>  		toep->ddp_flags &= ~(DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE);
T>  
T> -		KASSERT(toep->sb_cc >= sb->sb_cc,
T> +		KASSERT(toep->sb_cc >= sbused(sb),
T>  		    ("%s: sb %p has more data (%d) than last time (%d).",
T> -		    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -		toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> +		    __func__, sb, sbused(sb), toep->sb_cc));
T> +		toep->rx_credits += toep->sb_cc - sbused(sb);
T>  #ifdef USE_DDP_RX_FLOW_CONTROL
T>  		toep->rx_credits -= m->m_len;	/* adjust for F_RX_FC_DDP */
T>  #endif
T> -		sbappendstream_locked(sb, m);
T> -		toep->sb_cc = sb->sb_cc;
T> +		sbappendstream_locked(sb, m, 0);
T> +		toep->sb_cc = sbused(sb);
T>  	}
T>  	socantrcvmore_locked(so);	/* unlocks the sockbuf */
T>  
T> @@ -1281,12 +1281,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_hea
T>  		}
T>  	}
T>  
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> -	sbappendstream_locked(sb, m);
T> -	toep->sb_cc = sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T> +	sbappendstream_locked(sb, m, 0);
T> +	toep->sb_cc = sbused(sb);
T>  	sorwakeup_locked(so);
T>  	SOCKBUF_UNLOCK_ASSERT(sb);
T>  
T> Index: sys/dev/cxgbe/tom/t4_ddp.c
T> ===================================================================
T> --- sys/dev/cxgbe/tom/t4_ddp.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgbe/tom/t4_ddp.c	(.../projects/sendfile)	(revision 266807)
T> @@ -224,15 +224,15 @@ insert_ddp_data(struct toepcb *toep, uint32_t n)
T>  	tp->rcv_wnd -= n;
T>  #endif
T>  
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T>  #ifdef USE_DDP_RX_FLOW_CONTROL
T>  	toep->rx_credits -= n;	/* adjust for F_RX_FC_DDP */
T>  #endif
T> -	sbappendstream_locked(sb, m);
T> -	toep->sb_cc = sb->sb_cc;
T> +	sbappendstream_locked(sb, m, 0);
T> +	toep->sb_cc = sbused(sb);
T>  }
T>  
T>  /* SET_TCB_FIELD sent as a ULP command looks like this */
T> @@ -459,15 +459,15 @@ handle_ddp_data(struct toepcb *toep, __be32 ddp_re
T>  	else
T>  		discourage_ddp(toep);
T>  
T> -	KASSERT(toep->sb_cc >= sb->sb_cc,
T> +	KASSERT(toep->sb_cc >= sbused(sb),
T>  	    ("%s: sb %p has more data (%d) than last time (%d).",
T> -	    __func__, sb, sb->sb_cc, toep->sb_cc));
T> -	toep->rx_credits += toep->sb_cc - sb->sb_cc;
T> +	    __func__, sb, sbused(sb), toep->sb_cc));
T> +	toep->rx_credits += toep->sb_cc - sbused(sb);
T>  #ifdef USE_DDP_RX_FLOW_CONTROL
T>  	toep->rx_credits -= len;	/* adjust for F_RX_FC_DDP */
T>  #endif
T> -	sbappendstream_locked(sb, m);
T> -	toep->sb_cc = sb->sb_cc;
T> +	sbappendstream_locked(sb, m, 0);
T> +	toep->sb_cc = sbused(sb);
T>  wakeup:
T>  	KASSERT(toep->ddp_flags & db_flag,
T>  	    ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x, report 0x%x",
T> @@ -897,7 +897,7 @@ handle_ddp(struct socket *so, struct uio *uio, int
T>  #endif
T>  
T>  	/* XXX: too eager to disable DDP, could handle NBIO better than this. */
T> -	if (sb->sb_cc >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres ||
T> +	if (sbused(sb) >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres ||
T>  	    uio->uio_resid > MAX_DDP_BUFFER_SIZE || uio->uio_iovcnt > 1 ||
T>  	    so->so_state & SS_NBIO || flags & (MSG_DONTWAIT | MSG_NBIO) ||
T>  	    error || so->so_error || sb->sb_state & SBS_CANTRCVMORE)
T> @@ -935,7 +935,7 @@ handle_ddp(struct socket *so, struct uio *uio, int
T>  	 * payload.
T>  	 */
T>  	ddp_flags = select_ddp_flags(so, flags, db_idx);
T> -	wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sb->sb_cc, ddp_flags);
T> +	wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sbused(sb), ddp_flags);
T>  	if (wr == NULL) {
T>  		/*
T>  		 * Just unhold the pages.  The DDP buffer's software state is
T> @@ -960,8 +960,9 @@ handle_ddp(struct socket *so, struct uio *uio, int
T>  	 */
T>  	rc = sbwait(sb);
T>  	while (toep->ddp_flags & buf_flag) {
T> +		/* XXXGL: shouldn't here be sbwait() call? */
T>  		sb->sb_flags |= SB_WAIT;
T> -		msleep(&sb->sb_cc, &sb->sb_mtx, PSOCK , "sbwait", 0);
T> +		msleep(&sb->sb_acc, &sb->sb_mtx, PSOCK , "sbwait", 0);
T>  	}
T>  	unwire_ddp_buffer(db);
T>  	return (rc);
T> @@ -1123,8 +1124,8 @@ restart:
T>  
T>  		/* uio should be just as it was at entry */
T>  		KASSERT(oresid == uio->uio_resid,
T> -		    ("%s: oresid = %d, uio_resid = %zd, sb_cc = %d",
T> -		    __func__, oresid, uio->uio_resid, sb->sb_cc));
T> +		    ("%s: oresid = %d, uio_resid = %zd, sbused = %d",
T> +		    __func__, oresid, uio->uio_resid, sbused(sb)));
T>  
T>  		error = handle_ddp(so, uio, flags, 0);
T>  		ddp_handled = 1;
T> @@ -1134,7 +1135,7 @@ restart:
T>  
T>  	/* Abort if socket has reported problems. */
T>  	if (so->so_error) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbused(sb))
T>  			goto deliver;
T>  		if (oresid > uio->uio_resid)
T>  			goto out;
T> @@ -1146,7 +1147,7 @@ restart:
T>  
T>  	/* Door is closed.  Deliver what is left, if any. */
T>  	if (sb->sb_state & SBS_CANTRCVMORE) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbused(sb))
T>  			goto deliver;
T>  		else
T>  			goto out;
T> @@ -1153,7 +1154,7 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer is empty and we shall not block. */
T> -	if (sb->sb_cc == 0 &&
T> +	if (sbused(sb) == 0 &&
T>  	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T>  		error = EAGAIN;
T>  		goto out;
T> @@ -1160,18 +1161,18 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer got some data that we shall deliver now. */
T> -	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> +	if (sbused(sb) && !(flags & MSG_WAITALL) &&
T>  	    ((sb->sb_flags & SS_NBIO) ||
T>  	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> -	     sb->sb_cc >= sb->sb_lowat ||
T> -	     sb->sb_cc >= uio->uio_resid ||
T> -	     sb->sb_cc >= sb->sb_hiwat) ) {
T> +	     sbused(sb) >= sb->sb_lowat ||
T> +	     sbused(sb) >= uio->uio_resid ||
T> +	     sbused(sb) >= sb->sb_hiwat) ) {
T>  		goto deliver;
T>  	}
T>  
T>  	/* On MSG_WAITALL we must wait until all data or error arrives. */
T>  	if ((flags & MSG_WAITALL) &&
T> -	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat))
T> +	    (sbused(sb) >= uio->uio_resid || sbused(sb) >= sb->sb_lowat))
T>  		goto deliver;
T>  
T>  	/*
T> @@ -1190,7 +1191,7 @@ restart:
T>  
T>  deliver:
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> -	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> +	KASSERT(sbused(sb) > 0, ("%s: sockbuf empty", __func__));
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>  
T>  	if (sb->sb_flags & SB_DDP_INDICATE && !ddp_handled)
T> @@ -1201,7 +1202,7 @@ deliver:
T>  		uio->uio_td->td_ru.ru_msgrcv++;
T>  
T>  	/* Fill uio until full or current end of socket buffer is reached. */
T> -	len = min(uio->uio_resid, sb->sb_cc);
T> +	len = min(uio->uio_resid, sbused(sb));
T>  	if (mp0 != NULL) {
T>  		/* Dequeue as many mbufs as possible. */
T>  		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
T> Index: sys/dev/cxgbe/iw_cxgbe/cm.c
T> ===================================================================
T> --- sys/dev/cxgbe/iw_cxgbe/cm.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgbe/iw_cxgbe/cm.c	(.../projects/sendfile)	(revision 266807)
T> @@ -585,8 +585,8 @@ process_data(struct c4iw_ep *ep)
T>  {
T>  	struct sockaddr_in *local, *remote;
T>  
T> -	CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sb_cc %d", __func__,
T> -	    ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc);
T> +	CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__,
T> +	    ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv));
T>  
T>  	switch (state_read(&ep->com)) {
T>  	case MPA_REQ_SENT:
T> @@ -602,11 +602,11 @@ process_data(struct c4iw_ep *ep)
T>  		process_mpa_request(ep);
T>  		break;
T>  	default:
T> -		if (ep->com.so->so_rcv.sb_cc)
T> -			log(LOG_ERR, "%s: Unexpected streaming data.  "
T> -			    "ep %p, state %d, so %p, so_state 0x%x, sb_cc %u\n",
T> +		if (sbused(&ep->com.so->so_rcv))
T> +			log(LOG_ERR, "%s: Unexpected streaming data. ep %p, "
T> +			    "state %d, so %p, so_state 0x%x, sbused %u\n",
T>  			    __func__, ep, state_read(&ep->com), ep->com.so,
T> -			    ep->com.so->so_state, ep->com.so->so_rcv.sb_cc);
T> +			    ep->com.so->so_state, sbused(&ep->com.so->so_rcv));
T>  		break;
T>  	}
T>  }
T> Index: sys/dev/iscsi/icl.c
T> ===================================================================
T> --- sys/dev/iscsi/icl.c	(.../head)	(revision 266804)
T> +++ sys/dev/iscsi/icl.c	(.../projects/sendfile)	(revision 266807)
T> @@ -758,7 +758,7 @@ icl_receive_thread(void *arg)
T>  		 * is enough data received to read the PDU.
T>  		 */
T>  		SOCKBUF_LOCK(&so->so_rcv);
T> -		available = so->so_rcv.sb_cc;
T> +		available = sbavail(&so->so_rcv);
T>  		if (available < ic->ic_receive_len) {
T>  			so->so_rcv.sb_lowat = ic->ic_receive_len;
T>  			cv_wait(&ic->ic_receive_cv, &so->so_rcv.sb_mtx);
T> Index: sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c
T> ===================================================================
T> --- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c	(.../projects/sendfile)	(revision 266807)
T> @@ -445,8 +445,8 @@ t3_push_frames(struct socket *so, int req_completi
T>  	 * Autosize the send buffer.
T>  	 */
T>  	if (snd->sb_flags & SB_AUTOSIZE && VNET(tcp_do_autosndbuf)) {
T> -		if (snd->sb_cc >= (snd->sb_hiwat / 8 * 7) &&
T> -		    snd->sb_cc < VNET(tcp_autosndbuf_max)) {
T> +		if (sbused(snd) >= (snd->sb_hiwat / 8 * 7) &&
T> +		    sbused(snd) < VNET(tcp_autosndbuf_max)) {
T>  			if (!sbreserve_locked(snd, min(snd->sb_hiwat +
T>  			    VNET(tcp_autosndbuf_inc), VNET(tcp_autosndbuf_max)),
T>  			    so, curthread))
T> @@ -597,10 +597,10 @@ t3_rcvd(struct toedev *tod, struct tcpcb *tp)
T>  	INP_WLOCK_ASSERT(inp);
T>  
T>  	SOCKBUF_LOCK(so_rcv);
T> -	KASSERT(toep->tp_enqueued >= so_rcv->sb_cc,
T> -	    ("%s: so_rcv->sb_cc > enqueued", __func__));
T> -	toep->tp_rx_credits += toep->tp_enqueued - so_rcv->sb_cc;
T> -	toep->tp_enqueued = so_rcv->sb_cc;
T> +	KASSERT(toep->tp_enqueued >= sbused(so_rcv),
T> +	    ("%s: sbused(so_rcv) > enqueued", __func__));
T> +	toep->tp_rx_credits += toep->tp_enqueued - sbused(so_rcv);
T> +	toep->tp_enqueued = sbused(so_rcv);
T>  	SOCKBUF_UNLOCK(so_rcv);
T>  
T>  	must_send = toep->tp_rx_credits + 16384 >= tp->rcv_wnd;
T> @@ -1199,7 +1199,7 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *r
T>  	}
T>  
T>  	toep->tp_enqueued += m->m_pkthdr.len;
T> -	sbappendstream_locked(so_rcv, m);
T> +	sbappendstream_locked(so_rcv, m, 0);
T>  	sorwakeup_locked(so);
T>  	SOCKBUF_UNLOCK_ASSERT(so_rcv);
T>  
T> @@ -1768,7 +1768,7 @@ wr_ack(struct toepcb *toep, struct mbuf *m)
T>  		so_sowwakeup_locked(so);
T>  	}
T>  
T> -	if (snd->sb_sndptroff < snd->sb_cc)
T> +	if (snd->sb_sndptroff < sbused(snd))
T>  		t3_push_frames(so, 0);
T>  
T>  out_free:
T> Index: sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c
T> ===================================================================
T> --- sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c	(.../head)	(revision 266804)
T> +++ sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1515,11 +1515,11 @@ process_data(struct iwch_ep *ep)
T>  		process_mpa_request(ep);
T>  		break;
T>  	default:
T> -		if (ep->com.so->so_rcv.sb_cc) 
T> +		if (sbavail(&ep->com.so->so_rcv)) 
T>  			printf("%s Unexpected streaming data."
T>  			       " ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n",
T>  			       __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state,
T> -			       ep->com.so->so_rcv.sb_cc, ep->com.so->so_rcv.sb_mb);
T> +			       sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb);
T>  		break;
T>  	}
T>  	return;
T> Index: sys/kern/uipc_debug.c
T> ===================================================================
T> --- sys/kern/uipc_debug.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_debug.c	(.../projects/sendfile)	(revision 266807)
T> @@ -403,7 +403,8 @@ db_print_sockbuf(struct sockbuf *sb, const char *s
T>  	db_printf("sb_sndptroff: %u\n", sb->sb_sndptroff);
T>  
T>  	db_print_indent(indent);
T> -	db_printf("sb_cc: %u   ", sb->sb_cc);
T> +	db_printf("sb_acc: %u   ", sb->sb_acc);
T> +	db_printf("sb_ccc: %u   ", sb->sb_ccc);
T>  	db_printf("sb_hiwat: %u   ", sb->sb_hiwat);
T>  	db_printf("sb_mbcnt: %u   ", sb->sb_mbcnt);
T>  	db_printf("sb_mbmax: %u\n", sb->sb_mbmax);
T> Index: sys/kern/uipc_mbuf.c
T> ===================================================================
T> --- sys/kern/uipc_mbuf.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_mbuf.c	(.../projects/sendfile)	(revision 266807)
T> @@ -389,7 +389,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m)
T>   * cleaned too.
T>   */
T>  void
T> -m_demote(struct mbuf *m0, int all)
T> +m_demote(struct mbuf *m0, int all, int flags)
T>  {
T>  	struct mbuf *m;
T>  
T> @@ -405,7 +405,7 @@ void
T>  			m_freem(m->m_nextpkt);
T>  			m->m_nextpkt = NULL;
T>  		}
T> -		m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE);
T> +		m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags);
T>  	}
T>  }
T>  
T> Index: sys/kern/sys_socket.c
T> ===================================================================
T> --- sys/kern/sys_socket.c	(.../head)	(revision 266804)
T> +++ sys/kern/sys_socket.c	(.../projects/sendfile)	(revision 266807)
T> @@ -167,20 +167,17 @@ soo_ioctl(struct file *fp, u_long cmd, void *data,
T>  
T>  	case FIONREAD:
T>  		/* Unlocked read. */
T> -		*(int *)data = so->so_rcv.sb_cc;
T> +		*(int *)data = sbavail(&so->so_rcv);
T>  		break;
T>  
T>  	case FIONWRITE:
T>  		/* Unlocked read. */
T> -		*(int *)data = so->so_snd.sb_cc;
T> +		*(int *)data = sbavail(&so->so_snd);
T>  		break;
T>  
T>  	case FIONSPACE:
T> -		if ((so->so_snd.sb_hiwat < so->so_snd.sb_cc) ||
T> -		    (so->so_snd.sb_mbmax < so->so_snd.sb_mbcnt))
T> -			*(int *)data = 0;
T> -		else
T> -			*(int *)data = sbspace(&so->so_snd);
T> +		/* Unlocked read. */
T> +		*(int *)data = sbspace(&so->so_snd);
T>  		break;
T>  
T>  	case FIOSETOWN:
T> @@ -246,6 +243,7 @@ soo_stat(struct file *fp, struct stat *ub, struct
T>      struct thread *td)
T>  {
T>  	struct socket *so = fp->f_data;
T> +	struct sockbuf *sb;
T>  #ifdef MAC
T>  	int error;
T>  #endif
T> @@ -261,15 +259,18 @@ soo_stat(struct file *fp, struct stat *ub, struct
T>  	 * If SBS_CANTRCVMORE is set, but there's still data left in the
T>  	 * receive buffer, the socket is still readable.
T>  	 */
T> -	SOCKBUF_LOCK(&so->so_rcv);
T> -	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 ||
T> -	    so->so_rcv.sb_cc != 0)
T> +	sb = &so->so_rcv;
T> +	SOCKBUF_LOCK(sb);
T> +	if ((sb->sb_state & SBS_CANTRCVMORE) == 0 || sbavail(sb))
T>  		ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH;
T> -	ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
T> -	SOCKBUF_UNLOCK(&so->so_rcv);
T> -	/* Unlocked read. */
T> -	if ((so->so_snd.sb_state & SBS_CANTSENDMORE) == 0)
T> +	ub->st_size = sbavail(sb) - sb->sb_ctl;
T> +	SOCKBUF_UNLOCK(sb);
T> +
T> +	sb = &so->so_snd;
T> +	SOCKBUF_LOCK(sb);
T> +	if ((sb->sb_state & SBS_CANTSENDMORE) == 0)
T>  		ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH;
T> +	SOCKBUF_UNLOCK(sb);
T>  	ub->st_uid = so->so_cred->cr_uid;
T>  	ub->st_gid = so->so_cred->cr_gid;
T>  	return (*so->so_proto->pr_usrreqs->pru_sense)(so, ub);
T> Index: sys/kern/uipc_usrreq.c
T> ===================================================================
T> --- sys/kern/uipc_usrreq.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_usrreq.c	(.../projects/sendfile)	(revision 266807)
T> @@ -790,11 +790,10 @@ uipc_rcvd(struct socket *so, int flags)
T>  	u_int mbcnt, sbcc;
T>  
T>  	unp = sotounpcb(so);
T> -	KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL"));
T> +	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
T> +	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_SEQPACKET,
T> +	    ("%s: socktype %d", __func__, so->so_type));
T>  
T> -	if (so->so_type != SOCK_STREAM && so->so_type != SOCK_SEQPACKET)
T> -		panic("uipc_rcvd socktype %d", so->so_type);
T> -
T>  	/*
T>  	 * Adjust backpressure on sender and wakeup any waiting to write.
T>  	 *
T> @@ -807,7 +806,7 @@ uipc_rcvd(struct socket *so, int flags)
T>  	 */
T>  	SOCKBUF_LOCK(&so->so_rcv);
T>  	mbcnt = so->so_rcv.sb_mbcnt;
T> -	sbcc = so->so_rcv.sb_cc;
T> +	sbcc = sbavail(&so->so_rcv);
T>  	SOCKBUF_UNLOCK(&so->so_rcv);
T>  	/*
T>  	 * There is a benign race condition at this point.  If we're planning to
T> @@ -843,7 +842,10 @@ uipc_send(struct socket *so, int flags, struct mbu
T>  	int error = 0;
T>  
T>  	unp = sotounpcb(so);
T> -	KASSERT(unp != NULL, ("uipc_send: unp == NULL"));
T> +	KASSERT(unp != NULL, ("%s: unp == NULL", __func__));
T> +	KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_DGRAM ||
T> +	    so->so_type == SOCK_SEQPACKET,
T> +	    ("%s: socktype %d", __func__, so->so_type));
T>  
T>  	if (flags & PRUS_OOB) {
T>  		error = EOPNOTSUPP;
T> @@ -994,7 +996,7 @@ uipc_send(struct socket *so, int flags, struct mbu
T>  		}
T>  
T>  		mbcnt = so2->so_rcv.sb_mbcnt;
T> -		sbcc = so2->so_rcv.sb_cc;
T> +		sbcc = sbavail(&so2->so_rcv);
T>  		sorwakeup_locked(so2);
T>  
T>  		/*
T> @@ -1011,9 +1013,6 @@ uipc_send(struct socket *so, int flags, struct mbu
T>  		UNP_PCB_UNLOCK(unp2);
T>  		m = NULL;
T>  		break;
T> -
T> -	default:
T> -		panic("uipc_send unknown socktype");
T>  	}
T>  
T>  	/*
T> Index: sys/kern/vfs_default.c
T> ===================================================================
T> --- sys/kern/vfs_default.c	(.../head)	(revision 266804)
T> +++ sys/kern/vfs_default.c	(.../projects/sendfile)	(revision 266807)
T> @@ -111,6 +111,7 @@ struct vop_vector default_vnodeops = {
T>  	.vop_close =		VOP_NULL,
T>  	.vop_fsync =		VOP_NULL,
T>  	.vop_getpages =		vop_stdgetpages,
T> +	.vop_getpages_async =	vop_stdgetpages_async,
T>  	.vop_getwritemount = 	vop_stdgetwritemount,
T>  	.vop_inactive =		VOP_NULL,
T>  	.vop_ioctl =		VOP_ENOTTY,
T> @@ -726,10 +727,19 @@ vop_stdgetpages(ap)
T>  {
T>  
T>  	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
T> -	    ap->a_count, ap->a_reqpage);
T> +	    ap->a_count, ap->a_reqpage, NULL, NULL);
T>  }
T>  
T> +/* XXX Needs good comment and a manpage. */
T>  int
T> +vop_stdgetpages_async(struct vop_getpages_async_args *ap)
T> +{
T> +
T> +	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
T> +	    ap->a_count, ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg);
T> +}
T> +
T> +int
T>  vop_stdkqfilter(struct vop_kqfilter_args *ap)
T>  {
T>  	return vfs_kqfilter(ap);
T> Index: sys/kern/uipc_socket.c
T> ===================================================================
T> --- sys/kern/uipc_socket.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_socket.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1459,12 +1459,12 @@ restart:
T>  	 *   2. MSG_DONTWAIT is not set
T>  	 */
T>  	if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
T> -	    so->so_rcv.sb_cc < uio->uio_resid) &&
T> -	    so->so_rcv.sb_cc < so->so_rcv.sb_lowat &&
T> +	    sbavail(&so->so_rcv) < uio->uio_resid) &&
T> +	    sbavail(&so->so_rcv) < so->so_rcv.sb_lowat &&
T>  	    m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
T> -		KASSERT(m != NULL || !so->so_rcv.sb_cc,
T> -		    ("receive: m == %p so->so_rcv.sb_cc == %u",
T> -		    m, so->so_rcv.sb_cc));
T> +		KASSERT(m != NULL || !sbavail(&so->so_rcv),
T> +		    ("receive: m == %p sbavail == %u",
T> +		    m, sbavail(&so->so_rcv)));
T>  		if (so->so_error) {
T>  			if (m != NULL)
T>  				goto dontblock;
T> @@ -1746,9 +1746,7 @@ dontblock:
T>  						SOCKBUF_LOCK(&so->so_rcv);
T>  					}
T>  				}
T> -				m->m_data += len;
T> -				m->m_len -= len;
T> -				so->so_rcv.sb_cc -= len;
T> +				sbmtrim(&so->so_rcv, m, len);
T>  			}
T>  		}
T>  		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> @@ -1913,7 +1911,7 @@ restart:
T>  
T>  	/* Abort if socket has reported problems. */
T>  	if (so->so_error) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb) > 0)
T>  			goto deliver;
T>  		if (oresid > uio->uio_resid)
T>  			goto out;
T> @@ -1925,7 +1923,7 @@ restart:
T>  
T>  	/* Door is closed.  Deliver what is left, if any. */
T>  	if (sb->sb_state & SBS_CANTRCVMORE) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb) > 0)
T>  			goto deliver;
T>  		else
T>  			goto out;
T> @@ -1932,7 +1930,7 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer is empty and we shall not block. */
T> -	if (sb->sb_cc == 0 &&
T> +	if (sbavail(sb) == 0 &&
T>  	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T>  		error = EAGAIN;
T>  		goto out;
T> @@ -1939,18 +1937,18 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer got some data that we shall deliver now. */
T> -	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> +	if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) &&
T>  	    ((sb->sb_flags & SS_NBIO) ||
T>  	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> -	     sb->sb_cc >= sb->sb_lowat ||
T> -	     sb->sb_cc >= uio->uio_resid ||
T> -	     sb->sb_cc >= sb->sb_hiwat) ) {
T> +	     sbavail(sb) >= sb->sb_lowat ||
T> +	     sbavail(sb) >= uio->uio_resid ||
T> +	     sbavail(sb) >= sb->sb_hiwat) ) {
T>  		goto deliver;
T>  	}
T>  
T>  	/* On MSG_WAITALL we must wait until all data or error arrives. */
T>  	if ((flags & MSG_WAITALL) &&
T> -	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_hiwat))
T> +	    (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat))
T>  		goto deliver;
T>  
T>  	/*
T> @@ -1964,7 +1962,7 @@ restart:
T>  
T>  deliver:
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> -	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> +	KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__));
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>  
T>  	/* Statistics. */
T> @@ -1972,7 +1970,7 @@ deliver:
T>  		uio->uio_td->td_ru.ru_msgrcv++;
T>  
T>  	/* Fill uio until full or current end of socket buffer is reached. */
T> -	len = min(uio->uio_resid, sb->sb_cc);
T> +	len = min(uio->uio_resid, sbavail(sb));
T>  	if (mp0 != NULL) {
T>  		/* Dequeue as many mbufs as possible. */
T>  		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
T> @@ -1983,6 +1981,8 @@ deliver:
T>  			for (m = sb->sb_mb;
T>  			     m != NULL && m->m_len <= len;
T>  			     m = m->m_next) {
T> +				KASSERT(!(m->m_flags & M_NOTAVAIL),
T> +				    ("%s: m %p not available", __func__, m));
T>  				len -= m->m_len;
T>  				uio->uio_resid -= m->m_len;
T>  				sbfree(sb, m);
T> @@ -2107,9 +2107,9 @@ soreceive_dgram(struct socket *so, struct sockaddr
T>  	 */
T>  	SOCKBUF_LOCK(&so->so_rcv);
T>  	while ((m = so->so_rcv.sb_mb) == NULL) {
T> -		KASSERT(so->so_rcv.sb_cc == 0,
T> -		    ("soreceive_dgram: sb_mb NULL but sb_cc %u",
T> -		    so->so_rcv.sb_cc));
T> +		KASSERT(sbavail(&so->so_rcv) == 0,
T> +		    ("soreceive_dgram: sb_mb NULL but sbavail %u",
T> +		    sbavail(&so->so_rcv)));
T>  		if (so->so_error) {
T>  			error = so->so_error;
T>  			so->so_error = 0;
T> @@ -3157,7 +3157,7 @@ filt_soread(struct knote *kn, long hint)
T>  	so = kn->kn_fp->f_data;
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T>  
T> -	kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
T> +	kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl;
T>  	if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
T>  		kn->kn_flags |= EV_EOF;
T>  		kn->kn_fflags = so->so_error;
T> @@ -3167,7 +3167,7 @@ filt_soread(struct knote *kn, long hint)
T>  	else if (kn->kn_sfflags & NOTE_LOWAT)
T>  		return (kn->kn_data >= kn->kn_sdata);
T>  	else
T> -		return (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat);
T> +		return (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat);
T>  }
T>  
T>  static void
T> @@ -3350,7 +3350,7 @@ soisdisconnected(struct socket *so)
T>  	sorwakeup_locked(so);
T>  	SOCKBUF_LOCK(&so->so_snd);
T>  	so->so_snd.sb_state |= SBS_CANTSENDMORE;
T> -	sbdrop_locked(&so->so_snd, so->so_snd.sb_cc);
T> +	sbdrop_locked(&so->so_snd, sbused(&so->so_snd));
T>  	sowwakeup_locked(so);
T>  	wakeup(&so->so_timeo);
T>  }
T> Index: sys/kern/vnode_if.src
T> ===================================================================
T> --- sys/kern/vnode_if.src	(.../head)	(revision 266804)
T> +++ sys/kern/vnode_if.src	(.../projects/sendfile)	(revision 266807)
T> @@ -477,6 +477,19 @@ vop_getpages {
T>  };
T>  
T>  
T> +%% getpages_async	vp	L L L
T> +
T> +vop_getpages_async {
T> +	IN struct vnode *vp;
T> +	IN vm_page_t *m;
T> +	IN int count;
T> +	IN int reqpage;
T> +	IN vm_ooffset_t offset;
T> +	IN void (*vop_getpages_iodone)(void *);
T> +	IN void *arg;
T> +};
T> +
T> +
T>  %% putpages	vp	L L L
T>  
T>  vop_putpages {
T> Index: sys/kern/uipc_sockbuf.c
T> ===================================================================
T> --- sys/kern/uipc_sockbuf.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_sockbuf.c	(.../projects/sendfile)	(revision 266807)
T> @@ -68,7 +68,152 @@ static	u_long sb_efficiency = 8;	/* parameter for
T>  static struct mbuf	*sbcut_internal(struct sockbuf *sb, int len);
T>  static void	sbflush_internal(struct sockbuf *sb);
T>  
T> +static void
T> +sb_shift_nrdy(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +	KASSERT(m->m_flags & M_NOTREADY, ("%s: m %p !M_NOTREADY", __func__, m));
T> +
T> +	m = m->m_next;
T> +	while (m != NULL && !(m->m_flags & M_NOTREADY)) {
T> +		m->m_flags &= ~M_BLOCKED;
T> +		sb->sb_acc += m->m_len;
T> +		m = m->m_next;
T> +	}
T> +
T> +	sb->sb_fnrdy = m;
T> +}
T> +
T> +int
T> +sbready(struct sockbuf *sb, struct mbuf *m, int count)
T> +{
T> +	u_int blocker;
T> +
T> +	SOCKBUF_LOCK(sb);
T> +
T> +	if (sb->sb_state & SBS_CANTSENDMORE) {
T> +		SOCKBUF_UNLOCK(sb);
T> +		return (ENOTCONN);
T> +	}
T> +
T> +	KASSERT(sb->sb_fnrdy != NULL, ("%s: sb %p NULL fnrdy", __func__, sb));
T> +
T> +	blocker = (sb->sb_fnrdy == m) ? M_BLOCKED : 0;
T> +
T> +	for (int i = 0; i < count; i++, m = m->m_next) {
T> +		KASSERT(m->m_flags & M_NOTREADY,
T> +		    ("%s: m %p !M_NOTREADY", __func__, m));
T> +		m->m_flags &= ~(M_NOTREADY | blocker);
T> +		if (blocker)
T> +			sb->sb_acc += m->m_len;
T> +	}
T> +
T> +	if (!blocker) {
T> +		SOCKBUF_UNLOCK(sb);
T> +		return (EWOULDBLOCK);
T> +	}
T> +
T> +	/* This one was blocking all the queue. */
T> +	for (; m && (m->m_flags & M_NOTREADY) == 0; m = m->m_next) {
T> +		KASSERT(m->m_flags & M_BLOCKED,
T> +		    ("%s: m %p !M_BLOCKED", __func__, m));
T> +		m->m_flags &= ~M_BLOCKED;
T> +		sb->sb_acc += m->m_len;
T> +	}
T> +
T> +	sb->sb_fnrdy = m;
T> +
T> +	SOCKBUF_UNLOCK(sb);
T> +
T> +	return (0);
T> +}
T> +
T>  /*
T> + * Adjust sockbuf state reflecting allocation of m.
T> + */
T> +void
T> +sballoc(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +
T> +	sb->sb_ccc += m->m_len;
T> +
T> +	if (sb->sb_fnrdy == NULL) {
T> +		if (m->m_flags & M_NOTREADY)
T> +			sb->sb_fnrdy = m;
T> +		else
T> +			sb->sb_acc += m->m_len;
T> +	} else
T> +		m->m_flags |= M_BLOCKED;
T> +
T> +	if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> +		sb->sb_ctl += m->m_len;
T> +
T> +	sb->sb_mbcnt += MSIZE;
T> +	sb->sb_mcnt += 1;
T> +
T> +	if (m->m_flags & M_EXT) {
T> +		sb->sb_mbcnt += m->m_ext.ext_size;
T> +		sb->sb_ccnt += 1;
T> +	}
T> +}
T> +
T> +/*
T> + * Adjust sockbuf state reflecting freeing of m.
T> + */
T> +void
T> +sbfree(struct sockbuf *sb, struct mbuf *m)
T> +{
T> +
T> +#if 0	/* XXX: not yet: soclose() call path comes here w/o lock. */
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +
T> +	sb->sb_ccc -= m->m_len;
T> +
T> +	if (!(m->m_flags & M_NOTAVAIL))
T> +		sb->sb_acc -= m->m_len;
T> +
T> +	if (sb->sb_fnrdy == m)
T> +		sb_shift_nrdy(sb, m);
T> +
T> +	if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> +		sb->sb_ctl -= m->m_len;
T> +
T> +	sb->sb_mbcnt -= MSIZE;
T> +	sb->sb_mcnt -= 1;
T> +	if (m->m_flags & M_EXT) {
T> +		sb->sb_mbcnt -= m->m_ext.ext_size;
T> +		sb->sb_ccnt -= 1;
T> +	}
T> +
T> +	if (sb->sb_sndptr == m) {
T> +		sb->sb_sndptr = NULL;
T> +		sb->sb_sndptroff = 0;
T> +	}
T> +	if (sb->sb_sndptroff != 0)
T> +		sb->sb_sndptroff -= m->m_len;
T> +}
T> +
T> +/*
T> + * Trim some amount of data from (first?) mbuf in buffer.
T> + */
T> +void
T> +sbmtrim(struct sockbuf *sb, struct mbuf *m, int len)
T> +{
T> +
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +	KASSERT(len < m->m_len, ("%s: m %p len %d", __func__, m, len));
T> +
T> +	m->m_data += len;
T> +	m->m_len -= len;
T> +	sb->sb_acc -= len;
T> +	sb->sb_ccc -= len;
T> +}
T> +
T> +/*
T>   * Socantsendmore indicates that no more data will be sent on the socket; it
T>   * would normally be applied to a socket when the user informs the system
T>   * that no more data is to be sent, by the protocol code (in case
T> @@ -127,7 +272,7 @@ sbwait(struct sockbuf *sb)
T>  	SOCKBUF_LOCK_ASSERT(sb);
T>  
T>  	sb->sb_flags |= SB_WAIT;
T> -	return (msleep_sbt(&sb->sb_cc, &sb->sb_mtx,
T> +	return (msleep_sbt(&sb->sb_acc, &sb->sb_mtx,
T>  	    (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait",
T>  	    sb->sb_timeo, 0, 0));
T>  }
T> @@ -184,7 +329,7 @@ sowakeup(struct socket *so, struct sockbuf *sb)
T>  		sb->sb_flags &= ~SB_SEL;
T>  	if (sb->sb_flags & SB_WAIT) {
T>  		sb->sb_flags &= ~SB_WAIT;
T> -		wakeup(&sb->sb_cc);
T> +		wakeup(&sb->sb_acc);
T>  	}
T>  	KNOTE_LOCKED(&sb->sb_sel.si_note, 0);
T>  	if (sb->sb_upcall != NULL) {
T> @@ -519,7 +664,7 @@ sbappend(struct sockbuf *sb, struct mbuf *m)
T>   * that is, a stream protocol (such as TCP).
T>   */
T>  void
T> -sbappendstream_locked(struct sockbuf *sb, struct mbuf *m)
T> +sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags)
T>  {
T>  	SOCKBUF_LOCK_ASSERT(sb);
T>  
T> @@ -529,8 +674,8 @@ void
T>  	SBLASTMBUFCHK(sb);
T>  
T>  	/* Remove all packet headers and mbuf tags to get a pure data chain. */
T> -	m_demote(m, 1);
T> -	
T> +	m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0);
T> +
T>  	sbcompress(sb, m, sb->sb_mbtail);
T>  
T>  	sb->sb_lastrecord = sb->sb_mb;
T> @@ -543,38 +688,59 @@ void
T>   * that is, a stream protocol (such as TCP).
T>   */
T>  void
T> -sbappendstream(struct sockbuf *sb, struct mbuf *m)
T> +sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags)
T>  {
T>  
T>  	SOCKBUF_LOCK(sb);
T> -	sbappendstream_locked(sb, m);
T> +	sbappendstream_locked(sb, m, flags);
T>  	SOCKBUF_UNLOCK(sb);
T>  }
T>  
T>  #ifdef SOCKBUF_DEBUG
T>  void
T> -sbcheck(struct sockbuf *sb)
T> +sbcheck(struct sockbuf *sb, const char *file, int line)
T>  {
T> -	struct mbuf *m;
T> -	struct mbuf *n = 0;
T> -	u_long len = 0, mbcnt = 0;
T> +	struct mbuf *m, *n, *fnrdy;
T> +	u_long acc, ccc, mbcnt;
T>  
T>  	SOCKBUF_LOCK_ASSERT(sb);
T>  
T> +	acc = ccc = mbcnt = 0;
T> +	fnrdy = NULL;
T> +
T>  	for (m = sb->sb_mb; m; m = n) {
T>  	    n = m->m_nextpkt;
T>  	    for (; m; m = m->m_next) {
T> -		len += m->m_len;
T> +		if ((m->m_flags & M_NOTREADY) && fnrdy == NULL) {
T> +			if (m != sb->sb_fnrdy) {
T> +				printf("sb %p: fnrdy %p != m %p\n",
T> +				    sb, sb->sb_fnrdy, m);
T> +				goto fail;
T> +			}
T> +			fnrdy = m;
T> +		}
T> +		if (fnrdy) {
T> +			if (!(m->m_flags & M_NOTAVAIL)) {
T> +				printf("sb %p: fnrdy %p, m %p is avail\n",
T> +				    sb, sb->sb_fnrdy, m);
T> +				goto fail;
T> +			}
T> +		} else
T> +			acc += m->m_len;
T> +		ccc += m->m_len;
T>  		mbcnt += MSIZE;
T>  		if (m->m_flags & M_EXT) /*XXX*/ /* pretty sure this is bogus */
T>  			mbcnt += m->m_ext.ext_size;
T>  	    }
T>  	}
T> -	if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) {
T> -		printf("cc %ld != %u || mbcnt %ld != %u\n", len, sb->sb_cc,
T> -		    mbcnt, sb->sb_mbcnt);
T> -		panic("sbcheck");
T> +	if (acc != sb->sb_acc || ccc != sb->sb_ccc || mbcnt != sb->sb_mbcnt) {
T> +		printf("acc %ld/%u ccc %ld/%u mbcnt %ld/%u\n",
T> +		    acc, sb->sb_acc, ccc, sb->sb_ccc, mbcnt, sb->sb_mbcnt);
T> +		goto fail;
T>  	}
T> +	return;
T> +fail:
T> +	panic("%s from %s:%d", __func__, file, line);
T>  }
T>  #endif
T>  
T> @@ -800,6 +966,7 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
T>  		if (n && (n->m_flags & M_EOR) == 0 &&
T>  		    M_WRITABLE(n) &&
T>  		    ((sb->sb_flags & SB_NOCOALESCE) == 0) &&
T> +		    !(m->m_flags & M_NOTREADY) &&
T>  		    m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */
T>  		    m->m_len <= M_TRAILINGSPACE(n) &&
T>  		    n->m_type == m->m_type) {
T> @@ -806,7 +973,9 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str
T>  			bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len,
T>  			    (unsigned)m->m_len);
T>  			n->m_len += m->m_len;
T> -			sb->sb_cc += m->m_len;
T> +			sb->sb_ccc += m->m_len;
T> +			if (sb->sb_fnrdy == NULL)
T> +				sb->sb_acc += m->m_len;
T>  			if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T>  				/* XXX: Probably don't need.*/
T>  				sb->sb_ctl += m->m_len;
T> @@ -843,13 +1012,13 @@ sbflush_internal(struct sockbuf *sb)
T>  		 * Don't call sbcut(sb, 0) if the leading mbuf is non-empty:
T>  		 * we would loop forever. Panic instead.
T>  		 */
T> -		if (!sb->sb_cc && (sb->sb_mb == NULL || sb->sb_mb->m_len))
T> +		if (sb->sb_ccc == 0 && (sb->sb_mb == NULL || sb->sb_mb->m_len))
T>  			break;
T> -		m_freem(sbcut_internal(sb, (int)sb->sb_cc));
T> +		m_freem(sbcut_internal(sb, (int)sb->sb_ccc));
T>  	}
T> -	if (sb->sb_cc || sb->sb_mb || sb->sb_mbcnt)
T> -		panic("sbflush_internal: cc %u || mb %p || mbcnt %u",
T> -		    sb->sb_cc, (void *)sb->sb_mb, sb->sb_mbcnt);
T> +	KASSERT(sb->sb_ccc == 0 && sb->sb_mb == 0 && sb->sb_mbcnt == 0,
T> +	    ("%s: ccc %u mb %p mbcnt %u", __func__,
T> +	    sb->sb_ccc, (void *)sb->sb_mb, sb->sb_mbcnt));
T>  }
T>  
T>  void
T> @@ -891,7 +1060,9 @@ sbcut_internal(struct sockbuf *sb, int len)
T>  		if (m->m_len > len) {
T>  			m->m_len -= len;
T>  			m->m_data += len;
T> -			sb->sb_cc -= len;
T> +			sb->sb_ccc -= len;
T> +			if (!(m->m_flags & M_NOTAVAIL))
T> +				sb->sb_acc -= len;
T>  			if (sb->sb_sndptroff != 0)
T>  				sb->sb_sndptroff -= len;
T>  			if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA)
T> @@ -977,8 +1148,8 @@ sbsndptr(struct sockbuf *sb, u_int off, u_int len,
T>  	struct mbuf *m, *ret;
T>  
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__));
T> -	KASSERT(off + len <= sb->sb_cc, ("%s: beyond sb", __func__));
T> -	KASSERT(sb->sb_sndptroff <= sb->sb_cc, ("%s: sndptroff broken", __func__));
T> +	KASSERT(off + len <= sb->sb_acc, ("%s: beyond sb", __func__));
T> +	KASSERT(sb->sb_sndptroff <= sb->sb_acc, ("%s: sndptroff broken", __func__));
T>  
T>  	/*
T>  	 * Is off below stored offset? Happens on retransmits.
T> @@ -1091,7 +1262,7 @@ void
T>  sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb)
T>  {
T>  
T> -	xsb->sb_cc = sb->sb_cc;
T> +	xsb->sb_cc = sb->sb_ccc;
T>  	xsb->sb_hiwat = sb->sb_hiwat;
T>  	xsb->sb_mbcnt = sb->sb_mbcnt;
T>  	xsb->sb_mcnt = sb->sb_mcnt;	
T> Index: sys/kern/uipc_syscalls.c
T> ===================================================================
T> --- sys/kern/uipc_syscalls.c	(.../head)	(revision 266804)
T> +++ sys/kern/uipc_syscalls.c	(.../projects/sendfile)	(revision 266807)
T> @@ -132,9 +132,10 @@ static int	filt_sfsync(struct knote *kn, long hint
T>   */
T>  static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0,
T>      "sendfile(2) tunables");
T> -static int sfreadahead = 1;
T> +
T> +static int sfreadahead = 0;
T>  SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW,
T> -    &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks");
T> +    &sfreadahead, 0, "Pages to read ahead beyond socket buffer space");
T>  
T>  #ifdef	SFSYNC_DEBUG
T>  static int sf_sync_debug = 0;
T> @@ -1988,7 +1989,7 @@ filt_sfsync(struct knote *kn, long hint)
T>   * Detach mapped page and release resources back to the system.
T>   */
T>  int
T> -sf_buf_mext(struct mbuf *mb, void *addr, void *args)
T> +sf_mext_free(struct mbuf *mb, void *addr, void *args)
T>  {
T>  	vm_page_t m;
T>  	struct sendfile_sync *sfs;
T> @@ -2009,13 +2010,42 @@ int
T>  		sfs = addr;
T>  		sf_sync_deref(sfs);
T>  	}
T> -	/*
T> -	 * sfs may be invalid at this point, don't use it!
T> -	 */
T>  	return (EXT_FREE_OK);
T>  }
T>  
T>  /*
T> + * Same as above, but forces the page to be detached from the object
T> + * and go into free pool.
T> + */
T> +static int
T> +sf_mext_free_nocache(struct mbuf *mb, void *addr, void *args)
T> +{
T> +	vm_page_t m;
T> +	struct sendfile_sync *sfs;
T> +
T> +	m = sf_buf_page(args);
T> +	sf_buf_free(args);
T> +	vm_page_lock(m);
T> +	vm_page_unwire(m, 0);
T> +	if (m->wire_count == 0) {
T> +		vm_object_t obj;
T> +
T> +		if ((obj = m->object) == NULL)
T> +			vm_page_free(m);
T> +		else if (!vm_page_xbusied(m) && VM_OBJECT_TRYWLOCK(obj)) {
T> +			vm_page_free(m);
T> +			VM_OBJECT_WUNLOCK(obj);
T> +		}
T> +	}
T> +	vm_page_unlock(m);
T> +	if (addr != NULL) {
T> +		sfs = addr;
T> +		sf_sync_deref(sfs);
T> +	}
T> +	return (EXT_FREE_OK);
T> +}
T> +
T> +/*
T>   * Called to remove a reference to a sf_sync object.
T>   *
T>   * This is generally done during the mbuf free path to signify
T> @@ -2608,106 +2638,181 @@ freebsd4_sendfile(struct thread *td, struct freebs
T>  }
T>  #endif /* COMPAT_FREEBSD4 */
T>  
T> +/*
T> + * How much data to put into page i of n.
T> + * Only first and last pages are special.
T> + */
T> +static inline off_t
T> +xfsize(int i, int n, off_t off, off_t len)
T> +{
T> +
T> +	if (i == 0)
T> +		return (omin(PAGE_SIZE - (off & PAGE_MASK), len));
T> +
T> +	if (i == n - 1 && ((off + len) & PAGE_MASK) > 0)
T> +		return ((off + len) & PAGE_MASK);
T> +
T> +	return (PAGE_SIZE);
T> +}
T> +
T> +/*
T> + * Offset within object for page i.
T> + */
T> +static inline vm_offset_t
T> +vmoff(int i, off_t off)
T> +{
T> +
T> +	if (i == 0)
T> +		return ((vm_offset_t)off);
T> +
T> +	return (trunc_page(off + i * PAGE_SIZE));
T> +}
T> +
T> +/*
T> + * Pretend as if we don't have enough space, subtract xfsize() of
T> + * all pages that failed.
T> + */
T> +static inline void
T> +fixspace(int old, int new, off_t off, int *space)
T> +{
T> +
T> +	KASSERT(old > new, ("%s: old %d new %d", __func__, old, new));
T> +
T> +	/* Subtract last one. */
T> +	*space -= xfsize(old - 1, old, off, *space);
T> +	old--;
T> +
T> +	if (new == old)
T> +		/* There was only one page. */
T> +		return;
T> +
T> +	/* Subtract first one. */
T> +	if (new == 0) {
T> +		*space -= xfsize(0, old, off, *space);
T> +		new++;
T> +	}
T> +
T> +	/* Rest of pages are full sized. */
T> +	*space -= (old - new) * PAGE_SIZE;
T> +
T> +	KASSERT(*space >= 0, ("%s: space went backwards", __func__));
T> +}
T> +
T> +struct sf_io {
T> +	u_int		nios;
T> +	int		npages;
T> +	struct file	*sock_fp;
T> +	struct mbuf	*m;
T> +	vm_page_t	pa[];
T> +};
T> +
T> +static void
T> +sf_io_done(void *arg)
T> +{
T> +	struct sf_io *sfio = arg;
T> +	struct socket *so;
T> +
T> +	if (!refcount_release(&sfio->nios))
T> +		return;
T> +
T> +	so  = sfio->sock_fp->f_data;
T> +
T> +	if (sbready(&so->so_snd, sfio->m, sfio->npages) == 0) {
T> +		struct mbuf *m;
T> +
T> +		m = m_get(M_NOWAIT, MT_DATA);
T> +		if (m == NULL) {
T> +			panic("XXXGL");
T> +		}
T> +		m->m_len = 0;
T> +		CURVNET_SET(so->so_vnet);
T> +		/* XXXGL: curthread */
T> +		(void )(so->so_proto->pr_usrreqs->pru_send)
T> +		    (so, 0, m, NULL, NULL, curthread);
T> +		CURVNET_RESTORE();
T> +	}
T> +
T> +	/* XXXGL: curthread */
T> +	fdrop(sfio->sock_fp, curthread);
T> +	free(sfio, M_TEMP);
T> +}
T> +
T>  static int
T> -sendfile_readpage(vm_object_t obj, struct vnode *vp, int nd,
T> -    off_t off, int xfsize, int bsize, struct thread *td, vm_page_t *res)
T> +sendfile_swapin(vm_object_t obj, struct sf_io *sfio, off_t off, off_t len,
T> +    int npages, int rhpages)
T>  {
T> -	vm_page_t m;
T> -	vm_pindex_t pindex;
T> -	ssize_t resid;
T> -	int error, readahead, rv;
T> +	vm_page_t *pa = sfio->pa;
T> +	int nios;
T>  
T> -	pindex = OFF_TO_IDX(off);
T> +	nios = 0;
T>  	VM_OBJECT_WLOCK(obj);
T> -	m = vm_page_grab(obj, pindex, (vp != NULL ? VM_ALLOC_NOBUSY |
T> -	    VM_ALLOC_IGN_SBUSY : 0) | VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
T> +	for (int i = 0; i < npages; i++)
T> +		pa[i] = vm_page_grab(obj, OFF_TO_IDX(vmoff(i, off)),
T> +		    VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
T>  
T> -	/*
T> -	 * Check if page is valid for what we need, otherwise initiate I/O.
T> -	 *
T> -	 * The non-zero nd argument prevents disk I/O, instead we
T> -	 * return the caller what he specified in nd.  In particular,
T> -	 * if we already turned some pages into mbufs, nd == EAGAIN
T> -	 * and the main function send them the pages before we come
T> -	 * here again and block.
T> -	 */
T> -	if (m->valid != 0 && vm_page_is_valid(m, off & PAGE_MASK, xfsize)) {
T> -		if (vp == NULL)
T> -			vm_page_xunbusy(m);
T> -		VM_OBJECT_WUNLOCK(obj);
T> -		*res = m;
T> -		return (0);
T> -	} else if (nd != 0) {
T> -		if (vp == NULL)
T> -			vm_page_xunbusy(m);
T> -		error = nd;
T> -		goto free_page;
T> -	}
T> +	for (int i = 0; i < npages;) {
T> +		int j, a, count, rv;
T>  
T> -	/*
T> -	 * Get the page from backing store.
T> -	 */
T> -	error = 0;
T> -	if (vp != NULL) {
T> -		VM_OBJECT_WUNLOCK(obj);
T> -		readahead = sfreadahead * MAXBSIZE;
T> +		if (vm_page_is_valid(pa[i], vmoff(i, off) & PAGE_MASK,
T> +		    xfsize(i, npages, off, len))) {
T> +			vm_page_xunbusy(pa[i]);
T> +			i++;
T> +			continue;
T> +		}
T>  
T> -		/*
T> -		 * Use vn_rdwr() instead of the pager interface for
T> -		 * the vnode, to allow the read-ahead.
T> -		 *
T> -		 * XXXMAC: Because we don't have fp->f_cred here, we
T> -		 * pass in NOCRED.  This is probably wrong, but is
T> -		 * consistent with our original implementation.
T> -		 */
T> -		error = vn_rdwr(UIO_READ, vp, NULL, readahead, trunc_page(off),
T> -		    UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((readahead /
T> -		    bsize) << IO_SEQSHIFT), td->td_ucred, NOCRED, &resid, td);
T> -		SFSTAT_INC(sf_iocnt);
T> -		VM_OBJECT_WLOCK(obj);
T> -	} else {
T> -		if (vm_pager_has_page(obj, pindex, NULL, NULL)) {
T> -			rv = vm_pager_get_pages(obj, &m, 1, 0);
T> -			SFSTAT_INC(sf_iocnt);
T> -			m = vm_page_lookup(obj, pindex);
T> -			if (m == NULL)
T> -				error = EIO;
T> -			else if (rv != VM_PAGER_OK) {
T> -				vm_page_lock(m);
T> -				vm_page_free(m);
T> -				vm_page_unlock(m);
T> -				m = NULL;
T> -				error = EIO;
T> +		for (j = i + 1; j < npages; j++)
T> +			if (vm_page_is_valid(pa[j], vmoff(j, off) & PAGE_MASK,
T> +			    xfsize(j, npages, off, len)))
T> +				break;
T> +
T> +		while (!vm_pager_has_page(obj, OFF_TO_IDX(vmoff(i, off)),
T> +		    NULL, &a) && i < j) {
T> +			pmap_zero_page(pa[i]);
T> +			pa[i]->valid = VM_PAGE_BITS_ALL;
T> +			pa[i]->dirty = 0;
T> +			vm_page_xunbusy(pa[i]);
T> +			i++;
T> +		}
T> +		if (i == j)
T> +			continue;
T> +
T> +		count = min(a + 1, npages + rhpages - i);
T> +		for (j = npages; j < i + count; j++) {
T> +			pa[j] = vm_page_grab(obj, OFF_TO_IDX(vmoff(j, off)),
T> +			    VM_ALLOC_NORMAL | VM_ALLOC_NOWAIT);
T> +			if (pa[j] == NULL) {
T> +				count = j - i;
T> +				break;
T>  			}
T> -		} else {
T> -			pmap_zero_page(m);
T> -			m->valid = VM_PAGE_BITS_ALL;
T> -			m->dirty = 0;
T> +			if (pa[j]->valid) {
T> +				vm_page_xunbusy(pa[j]);
T> +				count = j - i;
T> +				break;
T> +			}
T>  		}
T> -		if (m != NULL)
T> -			vm_page_xunbusy(m);
T> +
T> +		refcount_acquire(&sfio->nios);
T> +		rv = vm_pager_get_pages_async(obj, pa + i, count, 0,
T> +		    &sf_io_done, sfio);
T> +
T> +		KASSERT(rv == VM_PAGER_OK, ("%s: pager fail obj %p page %p",
T> +		    __func__, obj, pa[i]));
T> +
T> +		SFSTAT_INC(sf_iocnt);
T> +		nios++;
T> +
T> +		for (j = i; j < i + count && j < npages; j++)
T> +			KASSERT(pa[j] == vm_page_lookup(obj,
T> +			    OFF_TO_IDX(vmoff(j, off))),
T> +			    ("pa[j] %p lookup %p\n", pa[j],
T> +			    vm_page_lookup(obj, OFF_TO_IDX(vmoff(j, off)))));
T> +
T> +		i += count;
T>  	}
T> -	if (error == 0) {
T> -		*res = m;
T> -	} else if (m != NULL) {
T> -free_page:
T> -		vm_page_lock(m);
T> -		vm_page_unwire(m, 0);
T>  
T> -		/*
T> -		 * See if anyone else might know about this page.  If
T> -		 * not and it is not valid, then free it.
T> -		 */
T> -		if (m->wire_count == 0 && m->valid == 0 && !vm_page_busied(m))
T> -			vm_page_free(m);
T> -		vm_page_unlock(m);
T> -	}
T> -	KASSERT(error != 0 || (m->wire_count > 0 &&
T> -	    vm_page_is_valid(m, off & PAGE_MASK, xfsize)),
T> -	    ("wrong page state m %p off %#jx xfsize %d", m, (uintmax_t)off,
T> -	    xfsize));
T>  	VM_OBJECT_WUNLOCK(obj);
T> -	return (error);
T> +
T> +	return (nios);
T>  }
T>  
T>  static int
T> @@ -2814,41 +2919,26 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T>  	struct vnode *vp;
T>  	struct vm_object *obj;
T>  	struct socket *so;
T> -	struct mbuf *m;
T> +	struct mbuf *m, *mh, *mhtail;
T>  	struct sf_buf *sf;
T> -	struct vm_page *pg;
T>  	struct shmfd *shmfd;
T>  	struct vattr va;
T> -	off_t off, xfsize, fsbytes, sbytes, rem, obj_size;
T> -	int error, bsize, nd, hdrlen, mnw;
T> +	off_t off, sbytes, rem, obj_size;
T> +	int error, serror, bsize, hdrlen;
T>  
T> -	pg = NULL;
T>  	obj = NULL;
T>  	so = NULL;
T> -	m = NULL;
T> -	fsbytes = sbytes = 0;
T> -	hdrlen = mnw = 0;
T> -	rem = nbytes;
T> -	obj_size = 0;
T> +	m = mh = NULL;
T> +	sbytes = 0;
T>  
T>  	error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize);
T>  	if (error != 0)
T>  		return (error);
T> -	if (rem == 0)
T> -		rem = obj_size;
T>  
T>  	error = kern_sendfile_getsock(td, sockfd, &sock_fp, &so);
T>  	if (error != 0)
T>  		goto out;
T>  
T> -	/*
T> -	 * Do not wait on memory allocations but return ENOMEM for
T> -	 * caller to retry later.
T> -	 * XXX: Experimental.
T> -	 */
T> -	if (flags & SF_MNOWAIT)
T> -		mnw = 1;
T> -
T>  #ifdef MAC
T>  	error = mac_socket_check_send(td->td_ucred, so);
T>  	if (error != 0)
T> @@ -2856,31 +2946,27 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T>  #endif
T>  
T>  	/* If headers are specified copy them into mbufs. */
T> -	if (hdr_uio != NULL) {
T> +	if (hdr_uio != NULL && hdr_uio->uio_resid > 0) {
T>  		hdr_uio->uio_td = td;
T>  		hdr_uio->uio_rw = UIO_WRITE;
T> -		if (hdr_uio->uio_resid > 0) {
T> -			/*
T> -			 * In FBSD < 5.0 the nbytes to send also included
T> -			 * the header.  If compat is specified subtract the
T> -			 * header size from nbytes.
T> -			 */
T> -			if (kflags & SFK_COMPAT) {
T> -				if (nbytes > hdr_uio->uio_resid)
T> -					nbytes -= hdr_uio->uio_resid;
T> -				else
T> -					nbytes = 0;
T> -			}
T> -			m = m_uiotombuf(hdr_uio, (mnw ? M_NOWAIT : M_WAITOK),
T> -			    0, 0, 0);
T> -			if (m == NULL) {
T> -				error = mnw ? EAGAIN : ENOBUFS;
T> -				goto out;
T> -			}
T> -			hdrlen = m_length(m, NULL);
T> +		/*
T> +		 * In FBSD < 5.0 the nbytes to send also included
T> +		 * the header.  If compat is specified subtract the
T> +		 * header size from nbytes.
T> +		 */
T> +		if (kflags & SFK_COMPAT) {
T> +			if (nbytes > hdr_uio->uio_resid)
T> +				nbytes -= hdr_uio->uio_resid;
T> +			else
T> +				nbytes = 0;
T>  		}
T> -	}
T> +		mh = m_uiotombuf(hdr_uio, M_WAITOK, 0, 0, 0);
T> +		hdrlen = m_length(mh, &mhtail);
T> +	} else
T> +		hdrlen = 0;
T>  
T> +	rem = nbytes ? omin(nbytes, obj_size - offset) : obj_size - offset;
T> +
T>  	/*
T>  	 * Protect against multiple writers to the socket.
T>  	 *
T> @@ -2900,21 +2986,13 @@ vn_sendfile(struct file *fp, int sockfd, struct ui
T>  	 * The outer loop checks the state and available space of the socket
T>  	 * and takes care of the overall progress.
T>  	 */
T> -	for (off = offset; ; ) {
T> +	for (off = offset; rem > 0; ) {
T> +		struct sf_io *sfio;
T> +		vm_page_t *pa;
T>  		struct mbuf *mtail;
T> -		int loopbytes;
T> -		int space;
T> -		int done;
T> +		int nios, space, npages, rhpages;
T>  
T> -		if ((nbytes != 0 && nbytes == fsbytes) ||
T> -		    (nbytes == 0 && obj_size == fsbytes))
T> -			break;
T> -
T>  		mtail = NULL;
T> -		loopbytes = 0;
T> -		space = 0;
T> -		done = 0;
T> -
T>  		/*
T>  		 * Check the socket state for ongoing connection,
T>  		 * no errors and space in socket buffer.
T> @@ -2990,53 +3068,44 @@ retry_space:
T>  				VOP_UNLOCK(vp, 0);
T>  				goto done;
T>  			}
T> -			obj_size = va.va_size;
T> +			if (va.va_size != obj_size) {
T> +				if (nbytes == 0)
T> +					rem += va.va_size - obj_size;
T> +				else if (offset + nbytes > va.va_size)
T> +					rem -= (offset + nbytes - va.va_size);
T> +				obj_size = va.va_size;
T> +			}
T>  		}
T>  
T> +		if (space > rem)
T> +			space = rem;
T> +
T> +		if (off & PAGE_MASK)
T> +			npages = 1 + howmany(space -
T> +			    (PAGE_SIZE - (off & PAGE_MASK)), PAGE_SIZE);
T> +		else
T> +			npages = howmany(space, PAGE_SIZE);
T> +
T> +		rhpages = SF_READAHEAD(flags) ?
T> +		    SF_READAHEAD(flags) : sfreadahead;
T> +		rhpages = min(howmany(obj_size - (off & ~PAGE_MASK) -
T> +		    (npages * PAGE_SIZE), PAGE_SIZE), rhpages);
T> +
T> +		sfio = malloc(sizeof(struct sf_io) +
T> +		    (rhpages + npages) * sizeof(vm_page_t), M_TEMP, M_WAITOK);
T> +		refcount_init(&sfio->nios, 1);
T> +
T> +		nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages);
T> +
T>  		/*
T>  		 * Loop and construct maximum sized mbuf chain to be bulk
T>  		 * dumped into socket buffer.
T>  		 */
T> -		while (space > loopbytes) {
T> -			vm_offset_t pgoff;
T> +		pa = sfio->pa;
T> +		for (int i = 0; i < npages; i++) {
T>  			struct mbuf *m0;
T>  
T>  			/*
T> -			 * Calculate the amount to transfer.
T> -			 * Not to exceed a page, the EOF,
T> -			 * or the passed in nbytes.
T> -			 */
T> -			pgoff = (vm_offset_t)(off & PAGE_MASK);
T> -			rem = obj_size - offset;
T> -			if (nbytes != 0)
T> -				rem = omin(rem, nbytes);
T> -			rem -= fsbytes + loopbytes;
T> -			xfsize = omin(PAGE_SIZE - pgoff, rem);
T> -			xfsize = omin(space - loopbytes, xfsize);
T> -			if (xfsize <= 0) {
T> -				done = 1;		/* all data sent */
T> -				break;
T> -			}
T> -
T> -			/*
T> -			 * Attempt to look up the page.  Allocate
T> -			 * if not found or wait and loop if busy.
T> -			 */
T> -			if (m != NULL)
T> -				nd = EAGAIN; /* send what we already got */
T> -			else if ((flags & SF_NODISKIO) != 0)
T> -				nd = EBUSY;
T> -			else
T> -				nd = 0;
T> -			error = sendfile_readpage(obj, vp, nd, off,
T> -			    xfsize, bsize, td, &pg);
T> -			if (error != 0) {
T> -				if (error == EAGAIN)
T> -					error = 0;	/* not a real error */
T> -				break;
T> -			}
T> -
T> -			/*
T>  			 * Get a sendfile buf.  When allocating the
T>  			 * first buffer for mbuf chain, we usually
T>  			 * wait as long as necessary, but this wait
T> @@ -3045,17 +3114,18 @@ retry_space:
T>  			 * threads might exhaust the buffers and then
T>  			 * deadlock.
T>  			 */
T> -			sf = sf_buf_alloc(pg, (mnw || m != NULL) ? SFB_NOWAIT :
T> -			    SFB_CATCH);
T> +			sf = sf_buf_alloc(pa[i],
T> +			    m != NULL ? SFB_NOWAIT : SFB_CATCH);
T>  			if (sf == NULL) {
T>  				SFSTAT_INC(sf_allocfail);
T> -				vm_page_lock(pg);
T> -				vm_page_unwire(pg, 0);
T> -				KASSERT(pg->object != NULL,
T> -				    ("%s: object disappeared", __func__));
T> -				vm_page_unlock(pg);
T> +				for (int j = i; j < npages; j++) {
T> +					vm_page_lock(pa[j]);
T> +					vm_page_unwire(pa[j], 0);
T> +					vm_page_unlock(pa[j]);
T> +				}
T>  				if (m == NULL)
T> -					error = (mnw ? EAGAIN : EINTR);
T> +					error = ENOBUFS;
T> +				fixspace(npages, i, off, &space);
T>  				break;
T>  			}
T>  
T> @@ -3063,36 +3133,26 @@ retry_space:
T>  			 * Get an mbuf and set it up as having
T>  			 * external storage.
T>  			 */
T> -			m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA);
T> -			if (m0 == NULL) {
T> -				error = (mnw ? EAGAIN : ENOBUFS);
T> -				(void)sf_buf_mext(NULL, NULL, sf);
T> -				break;
T> -			}
T> -			if (m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE,
T> -			    sf_buf_mext, sfs, sf, M_RDONLY, EXT_SFBUF,
T> -			    (mnw ? M_NOWAIT : M_WAITOK)) != 0) {
T> -				error = (mnw ? EAGAIN : ENOBUFS);
T> -				(void)sf_buf_mext(NULL, NULL, sf);
T> -				m_freem(m0);
T> -				break;
T> -			}
T> -			m0->m_data = (char *)sf_buf_kva(sf) + pgoff;
T> -			m0->m_len = xfsize;
T> +			m0 = m_get(M_WAITOK, MT_DATA);
T> +			(void )m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE,
T> +			    (flags & SF_NOCACHE) ? sf_mext_free_nocache :
T> +			    sf_mext_free, sfs, sf, M_RDONLY, EXT_SFBUF,
T> +			    M_WAITOK);
T> +			m0->m_data = (char *)sf_buf_kva(sf) +
T> +			    (vmoff(i, off) & PAGE_MASK);
T> +			m0->m_len = xfsize(i, npages, off, space);
T> +			m0->m_flags |= M_NOTREADY;
T>  
T> +			if (i == 0)
T> +				sfio->m = m0;
T> +
T>  			/* Append to mbuf chain. */
T>  			if (mtail != NULL)
T>  				mtail->m_next = m0;
T> -			else if (m != NULL)
T> -				m_last(m)->m_next = m0;
T>  			else
T>  				m = m0;
T>  			mtail = m0;
T>  
T> -			/* Keep track of bits processed. */
T> -			loopbytes += xfsize;
T> -			off += xfsize;
T> -
T>  			/*
T>  			 * XXX eventually this should be a sfsync
T>  			 * method call!
T> @@ -3104,47 +3164,51 @@ retry_space:
T>  		if (vp != NULL)
T>  			VOP_UNLOCK(vp, 0);
T>  
T> +		/* Keep track of bytes processed. */
T> +		off += space;
T> +		rem -= space;
T> +
T> +		/* Prepend header, if any. */
T> +		if (hdrlen) {
T> +			mhtail->m_next = m;
T> +			m = mh;
T> +			mh = NULL;
T> +		}
T> +
T> +		if (error) {
T> +			free(sfio, M_TEMP);
T> +			goto done;
T> +		}
T> +
T>  		/* Add the buffer chain to the socket buffer. */
T> -		if (m != NULL) {
T> -			int mlen, err;
T> +		KASSERT(m_length(m, NULL) == space + hdrlen,
T> +		    ("%s: mlen %u space %d hdrlen %d",
T> +		    __func__, m_length(m, NULL), space, hdrlen));
T>  
T> -			mlen = m_length(m, NULL);
T> -			SOCKBUF_LOCK(&so->so_snd);
T> -			if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
T> -				error = EPIPE;
T> -				SOCKBUF_UNLOCK(&so->so_snd);
T> -				goto done;
T> -			}
T> -			SOCKBUF_UNLOCK(&so->so_snd);
T> -			CURVNET_SET(so->so_vnet);
T> -			/* Avoid error aliasing. */
T> -			err = (*so->so_proto->pr_usrreqs->pru_send)
T> -				    (so, 0, m, NULL, NULL, td);
T> -			CURVNET_RESTORE();
T> -			if (err == 0) {
T> -				/*
T> -				 * We need two counters to get the
T> -				 * file offset and nbytes to send
T> -				 * right:
T> -				 * - sbytes contains the total amount
T> -				 *   of bytes sent, including headers.
T> -				 * - fsbytes contains the total amount
T> -				 *   of bytes sent from the file.
T> -				 */
T> -				sbytes += mlen;
T> -				fsbytes += mlen;
T> -				if (hdrlen) {
T> -					fsbytes -= hdrlen;
T> -					hdrlen = 0;
T> -				}
T> -			} else if (error == 0)
T> -				error = err;
T> -			m = NULL;	/* pru_send always consumes */
T> +		CURVNET_SET(so->so_vnet);
T> +		if (nios == 0) {
T> +			free(sfio, M_TEMP);
T> +			serror = (*so->so_proto->pr_usrreqs->pru_send)
T> +			    (so, 0, m, NULL, NULL, td);
T> +		} else {
T> +			sfio->sock_fp = sock_fp;
T> +			sfio->npages = npages;
T> +			fhold(sock_fp);
T> +			serror = (*so->so_proto->pr_usrreqs->pru_send)
T> +			    (so, PRUS_NOTREADY, m, NULL, NULL, td);
T> +			sf_io_done(sfio);
T>  		}
T> +		CURVNET_RESTORE();
T>  
T> -		/* Quit outer loop on error or when we're done. */
T> -		if (done)
T> -			break;
T> +		if (serror == 0) {
T> +			sbytes += space + hdrlen;
T> +			if (hdrlen)
T> +				hdrlen = 0;
T> +		} else if (error == 0)
T> +			error = serror;
T> +		m = NULL;	/* pru_send always consumes */
T> +
T> +		/* Quit outer loop on error. */
T>  		if (error != 0)
T>  			goto done;
T>  	}
T> @@ -3179,6 +3243,8 @@ out:
T>  		fdrop(sock_fp, td);
T>  	if (m)
T>  		m_freem(m);
T> +	if (mh)
T> +		m_freem(mh);
T>  
T>  	if (error == ERESTART)
T>  		error = EINTR;
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c	(.../head)	(revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1127,9 +1127,8 @@ ng_btsocket_l2cap_process_l2ca_write_rsp(struct ng
T>  	/*
T>   	 * Check if we have more data to send
T>   	 */
T> -
T>  	sbdroprecord(&pcb->so->so_snd);
T> -	if (pcb->so->so_snd.sb_cc > 0) {
T> +	if (sbavail(&pcb->so->so_snd) > 0) {
T>  		if (ng_btsocket_l2cap_send2(pcb) == 0)
T>  			ng_btsocket_l2cap_timeout(pcb);
T>  		else
T> @@ -2510,7 +2509,7 @@ ng_btsocket_l2cap_send2(ng_btsocket_l2cap_pcb_p pc
T>  	
T>  	mtx_assert(&pcb->pcb_mtx, MA_OWNED);
T>  
T> -	if (pcb->so->so_snd.sb_cc == 0)
T> +	if (sbavail(&pcb->so->so_snd) == 0)
T>  		return (EINVAL); /* XXX */
T>  
T>  	m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	(.../head)	(revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	(.../projects/sendfile)	(revision 266807)
T> @@ -3274,7 +3274,7 @@ ng_btsocket_rfcomm_pcb_send(ng_btsocket_rfcomm_pcb
T>  	}
T>  
T>  	for (error = 0, sent = 0; sent < limit; sent ++) { 
T> -		length = min(pcb->mtu, pcb->so->so_snd.sb_cc);
T> +		length = min(pcb->mtu, sbavail(&pcb->so->so_snd));
T>  		if (length == 0)
T>  			break;
T>  
T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_sco.c
T> ===================================================================
T> --- sys/netgraph/bluetooth/socket/ng_btsocket_sco.c	(.../head)	(revision 266804)
T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_sco.c	(.../projects/sendfile)	(revision 266807)
T> @@ -906,7 +906,7 @@ ng_btsocket_sco_default_msg_input(struct ng_mesg *
T>  				sbdroprecord(&pcb->so->so_snd);
T>  
T>  			/* Send more if we have any */
T> -			if (pcb->so->so_snd.sb_cc > 0)
T> +			if (sbavail(&pcb->so->so_snd) > 0)
T>  				if (ng_btsocket_sco_send2(pcb) == 0)
T>  					ng_btsocket_sco_timeout(pcb);
T>  
T> @@ -1744,7 +1744,7 @@ ng_btsocket_sco_send2(ng_btsocket_sco_pcb_p pcb)
T>  	mtx_assert(&pcb->pcb_mtx, MA_OWNED);
T>  
T>  	while (pcb->rt->pending < pcb->rt->num_pkts &&
T> -	       pcb->so->so_snd.sb_cc > 0) {
T> +	       sbavail(&pcb->so->so_snd) > 0) {
T>  		/* Get a copy of the first packet on send queue */
T>  		m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT);
T>  		if (m == NULL) {
T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c
T> ===================================================================
T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c	(.../head)	(revision 266804)
T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c	(.../projects/sendfile)	(revision 266807)
T> @@ -746,7 +746,7 @@ sdp_start_disconnect(struct sdp_sock *ssk)
T>  		    ("sdp_start_disconnect: sdp_drop() returned NULL"));
T>  	} else {
T>  		soisdisconnecting(so);
T> -		unread = so->so_rcv.sb_cc;
T> +		unread = sbused(&so->so_rcv);
T>  		sbflush(&so->so_rcv);
T>  		sdp_usrclosed(ssk);
T>  		if (!(ssk->flags & SDP_DROPPED)) {
T> @@ -888,7 +888,7 @@ sdp_append(struct sdp_sock *ssk, struct sockbuf *s
T>  		m_adj(mb, SDP_HEAD_SIZE);
T>  		n->m_pkthdr.len += mb->m_pkthdr.len;
T>  		n->m_flags |= mb->m_flags & (M_PUSH | M_URG);
T> -		m_demote(mb, 1);
T> +		m_demote(mb, 1, 0);
T>  		sbcompress(sb, mb, sb->sb_mbtail);
T>  		return;
T>  	}
T> @@ -1258,7 +1258,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
T>  	/* We will never ever get anything unless we are connected. */
T>  	if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
T>  		/* When disconnecting there may be still some data left. */
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb))
T>  			goto deliver;
T>  		if (!(so->so_state & SS_ISDISCONNECTED))
T>  			error = ENOTCONN;
T> @@ -1266,7 +1266,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps
T>  	}
T>  
T>  	/* Socket buffer is empty and we shall not block. */
T> -	if (sb->sb_cc == 0 &&
T> +	if (sbavail(sb) == 0 &&
T>  	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
T>  		error = EAGAIN;
T>  		goto out;
T> @@ -1277,7 +1277,7 @@ restart:
T>  
T>  	/* Abort if socket has reported problems. */
T>  	if (so->so_error) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb))
T>  			goto deliver;
T>  		if (oresid > uio->uio_resid)
T>  			goto out;
T> @@ -1289,7 +1289,7 @@ restart:
T>  
T>  	/* Door is closed.  Deliver what is left, if any. */
T>  	if (sb->sb_state & SBS_CANTRCVMORE) {
T> -		if (sb->sb_cc > 0)
T> +		if (sbavail(sb))
T>  			goto deliver;
T>  		else
T>  			goto out;
T> @@ -1296,18 +1296,18 @@ restart:
T>  	}
T>  
T>  	/* Socket buffer got some data that we shall deliver now. */
T> -	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
T> +	if (sbavail(sb) && !(flags & MSG_WAITALL) &&
T>  	    ((so->so_state & SS_NBIO) ||
T>  	     (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
T> -	     sb->sb_cc >= sb->sb_lowat ||
T> -	     sb->sb_cc >= uio->uio_resid ||
T> -	     sb->sb_cc >= sb->sb_hiwat) ) {
T> +	     sbavail(sb) >= sb->sb_lowat ||
T> +	     sbavail(sb) >= uio->uio_resid ||
T> +	     sbavail(sb) >= sb->sb_hiwat) ) {
T>  		goto deliver;
T>  	}
T>  
T>  	/* On MSG_WAITALL we must wait until all data or error arrives. */
T>  	if ((flags & MSG_WAITALL) &&
T> -	    (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat))
T> +	    (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_lowat))
T>  		goto deliver;
T>  
T>  	/*
T> @@ -1321,7 +1321,7 @@ restart:
T>  
T>  deliver:
T>  	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
T> -	KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__));
T> +	KASSERT(sbavail(sb), ("%s: sockbuf empty", __func__));
T>  	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
T>  
T>  	/* Statistics. */
T> @@ -1329,7 +1329,7 @@ deliver:
T>  		uio->uio_td->td_ru.ru_msgrcv++;
T>  
T>  	/* Fill uio until full or current end of socket buffer is reached. */
T> -	len = min(uio->uio_resid, sb->sb_cc);
T> +	len = min(uio->uio_resid, sbavail(sb));
T>  	if (mp0 != NULL) {
T>  		/* Dequeue as many mbufs as possible. */
T>  		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
T> @@ -1509,7 +1509,7 @@ sdp_urg(struct sdp_sock *ssk, struct mbuf *mb)
T>  	if (so == NULL)
T>  		return;
T>  
T> -	so->so_oobmark = so->so_rcv.sb_cc + mb->m_pkthdr.len - 1;
T> +	so->so_oobmark = sbused(&so->so_rcv) + mb->m_pkthdr.len - 1;
T>  	sohasoutofband(so);
T>  	ssk->oobflags &= ~(SDP_HAVEOOB | SDP_HADOOB);
T>  	if (!(so->so_options & SO_OOBINLINE)) {
T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c
T> ===================================================================
T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c	(.../head)	(revision 266804)
T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c	(.../projects/sendfile)	(revision 266807)
T> @@ -183,7 +183,7 @@ sdp_post_recvs_needed(struct sdp_sock *ssk)
T>  	 * Compute bytes in the receive queue and socket buffer.
T>  	 */
T>  	bytes_in_process = (posted - SDP_MIN_TX_CREDITS) * buffer_size;
T> -	bytes_in_process += ssk->socket->so_rcv.sb_cc;
T> +	bytes_in_process += sbused(&ssk->socket->so_rcv);
T>  
T>  	return bytes_in_process < max_bytes;
T>  }
T> Index: sys/sys/socket.h
T> ===================================================================
T> --- sys/sys/socket.h	(.../head)	(revision 266804)
T> +++ sys/sys/socket.h	(.../projects/sendfile)	(revision 266807)
T> @@ -602,12 +602,15 @@ struct sf_hdtr_all {
T>   * Sendfile-specific flag(s)
T>   */
T>  #define	SF_NODISKIO     0x00000001
T> -#define	SF_MNOWAIT	0x00000002
T> +#define	SF_MNOWAIT	0x00000002	/* unused since 11.0 */
T>  #define	SF_SYNC		0x00000004
T>  #define	SF_KQUEUE	0x00000008
T> +#define	SF_NOCACHE	0x00000010
T> +#define	SF_FLAGS(rh, flags)	(((rh) << 16) | (flags))
T>  
T>  #ifdef _KERNEL
T>  #define	SFK_COMPAT	0x00000001
T> +#define	SF_READAHEAD(flags)	((flags) >> 16)
T>  #endif /* _KERNEL */
T>  #endif /* __BSD_VISIBLE */
T>  
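
A note on the two macros in the socket.h hunk: SF_FLAGS() packs a readahead value into the upper 16 bits of the sendfile(2) flags argument, and the kernel-only SF_READAHEAD() extracts it again. Below is a minimal userland sketch of that round trip. The macro and flag values are copied from the hunk; the helper name toy_roundtrip() and the value 16 are illustrative assumptions only, and the hunk does not say what unit the readahead value is in.

```c
#include <assert.h>

/* Definitions copied from the sys/sys/socket.h hunk. */
#define	SF_NODISKIO	0x00000001
#define	SF_NOCACHE	0x00000010
#define	SF_FLAGS(rh, flags)	(((rh) << 16) | (flags))
#define	SF_READAHEAD(flags)	((flags) >> 16)

/*
 * Hypothetical helper, not part of the patch: pack a readahead value
 * and a flag set, then check that both halves can be recovered.
 */
static int
toy_roundtrip(int readahead, int flags)
{
	int packed = SF_FLAGS(readahead, flags);

	return (SF_READAHEAD(packed) == readahead &&
	    (packed & 0xffff) == flags);
}
```

The sketch only holds while the flags fit in the low 16 bits and the readahead value is small enough not to overflow the shifted int.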
T> Index: sys/sys/sockbuf.h
T> ===================================================================
T> --- sys/sys/sockbuf.h	(.../head)	(revision 266804)
T> +++ sys/sys/sockbuf.h	(.../projects/sendfile)	(revision 266807)
T> @@ -89,8 +89,13 @@ struct	sockbuf {
T>  	struct	mbuf *sb_lastrecord;	/* (c/d) first mbuf of last
T>  					 * record in socket buffer */
T>  	struct	mbuf *sb_sndptr; /* (c/d) pointer into mbuf chain */
T> +	struct	mbuf *sb_fnrdy;	/* (c/d) pointer to first not ready buffer */
T> +#if 0
T> +	struct	mbuf *sb_lnrdy;	/* (c/d) pointer to last not ready buffer */
T> +#endif
T>  	u_int	sb_sndptroff;	/* (c/d) byte offset of ptr into chain */
T> -	u_int	sb_cc;		/* (c/d) actual chars in buffer */
T> +	u_int	sb_acc;		/* (c/d) available chars in buffer */
T> +	u_int	sb_ccc;		/* (c/d) claimed chars in buffer */
T>  	u_int	sb_hiwat;	/* (c/d) max actual char count */
T>  	u_int	sb_mbcnt;	/* (c/d) chars of mbufs used */
T>  	u_int   sb_mcnt;        /* (c/d) number of mbufs in buffer */
T> @@ -120,10 +125,17 @@ struct	sockbuf {
T>  #define	SOCKBUF_LOCK_ASSERT(_sb)	mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED)
T>  #define	SOCKBUF_UNLOCK_ASSERT(_sb)	mtx_assert(SOCKBUF_MTX(_sb), MA_NOTOWNED)
T>  
T> +/*
T> + * Socket buffer private mbuf(9) flags.
T> + */
T> +#define	M_NOTREADY	M_PROTO1	/* m_data not populated yet */
T> +#define	M_BLOCKED	M_PROTO2	/* M_NOTREADY in front of m */
T> +#define	M_NOTAVAIL	(M_NOTREADY | M_BLOCKED)
T> +
T>  void	sbappend(struct sockbuf *sb, struct mbuf *m);
T>  void	sbappend_locked(struct sockbuf *sb, struct mbuf *m);
T> -void	sbappendstream(struct sockbuf *sb, struct mbuf *m);
T> -void	sbappendstream_locked(struct sockbuf *sb, struct mbuf *m);
T> +void	sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags);
T> +void	sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags);
T>  int	sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa,
T>  	    struct mbuf *m0, struct mbuf *control);
T>  int	sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa,
T> @@ -136,7 +148,6 @@ int	sbappendcontrol_locked(struct sockbuf *sb, str
T>  	    struct mbuf *control);
T>  void	sbappendrecord(struct sockbuf *sb, struct mbuf *m0);
T>  void	sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0);
T> -void	sbcheck(struct sockbuf *sb);
T>  void	sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n);
T>  struct mbuf *
T>  	sbcreatecontrol(caddr_t p, int size, int type, int level);
T> @@ -162,59 +173,54 @@ void	sbtoxsockbuf(struct sockbuf *sb, struct xsock
T>  int	sbwait(struct sockbuf *sb);
T>  int	sblock(struct sockbuf *sb, int flags);
T>  void	sbunlock(struct sockbuf *sb);
T> +void	sballoc(struct sockbuf *, struct mbuf *);
T> +void	sbfree(struct sockbuf *, struct mbuf *);
T> +void	sbmtrim(struct sockbuf *, struct mbuf *, int);
T> +int	sbready(struct sockbuf *, struct mbuf *, int);
T>  
T> +static inline u_int
T> +sbavail(struct sockbuf *sb)
T> +{
T> +
T> +#if 0
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +	return (sb->sb_acc);
T> +}
T> +
T> +static inline u_int
T> +sbused(struct sockbuf *sb)
T> +{
T> +
T> +#if 0
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +	return (sb->sb_ccc);
T> +}
T> +
T>  /*
T>   * How much space is there in a socket buffer (so->so_snd or so->so_rcv)?
T>   * This is problematical if the fields are unsigned, as the space might
T> - * still be negative (cc > hiwat or mbcnt > mbmax).  Should detect
T> - * overflow and return 0.  Should use "lmin" but it doesn't exist now.
T> + * still be negative (ccc > hiwat or mbcnt > mbmax).
T>   */
T> -static __inline
T> -long
T> +static inline long
T>  sbspace(struct sockbuf *sb)
T>  {
T> -	long bleft;
T> -	long mleft;
T> +	long bleft, mleft;
T>  
T> +#if 0
T> +	SOCKBUF_LOCK_ASSERT(sb);
T> +#endif
T> +
T>  	if (sb->sb_flags & SB_STOP)
T>  		return(0);
T> -	bleft = sb->sb_hiwat - sb->sb_cc;
T> +
T> +	bleft = sb->sb_hiwat - sb->sb_ccc;
T>  	mleft = sb->sb_mbmax - sb->sb_mbcnt;
T> -	return((bleft < mleft) ? bleft : mleft);
T> -}
T>  
T> -/* adjust counters in sb reflecting allocation of m */
T> -#define	sballoc(sb, m) { \
T> -	(sb)->sb_cc += (m)->m_len; \
T> -	if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \
T> -		(sb)->sb_ctl += (m)->m_len; \
T> -	(sb)->sb_mbcnt += MSIZE; \
T> -	(sb)->sb_mcnt += 1; \
T> -	if ((m)->m_flags & M_EXT) { \
T> -		(sb)->sb_mbcnt += (m)->m_ext.ext_size; \
T> -		(sb)->sb_ccnt += 1; \
T> -	} \
T> +	return ((bleft < mleft) ? bleft : mleft);
T>  }
T>  
T> -/* adjust counters in sb reflecting freeing of m */
T> -#define	sbfree(sb, m) { \
T> -	(sb)->sb_cc -= (m)->m_len; \
T> -	if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \
T> -		(sb)->sb_ctl -= (m)->m_len; \
T> -	(sb)->sb_mbcnt -= MSIZE; \
T> -	(sb)->sb_mcnt -= 1; \
T> -	if ((m)->m_flags & M_EXT) { \
T> -		(sb)->sb_mbcnt -= (m)->m_ext.ext_size; \
T> -		(sb)->sb_ccnt -= 1; \
T> -	} \
T> -	if ((sb)->sb_sndptr == (m)) { \
T> -		(sb)->sb_sndptr = NULL; \
T> -		(sb)->sb_sndptroff = 0; \
T> -	} \
T> -	if ((sb)->sb_sndptroff != 0) \
T> -		(sb)->sb_sndptroff -= (m)->m_len; \
T> -}
T> -
T>  #define SB_EMPTY_FIXUP(sb) do {						\
T>  	if ((sb)->sb_mb == NULL) {					\
T>  		(sb)->sb_mbtail = NULL;					\
T> @@ -224,13 +230,15 @@ sbspace(struct sockbuf *sb)
T>  
T>  #ifdef SOCKBUF_DEBUG
T>  void	sblastrecordchk(struct sockbuf *, const char *, int);
T> +void	sblastmbufchk(struct sockbuf *, const char *, int);
T> +void	sbcheck(struct sockbuf *, const char *, int);
T>  #define	SBLASTRECORDCHK(sb)	sblastrecordchk((sb), __FILE__, __LINE__)
T> -
T> -void	sblastmbufchk(struct sockbuf *, const char *, int);
T>  #define	SBLASTMBUFCHK(sb)	sblastmbufchk((sb), __FILE__, __LINE__)
T> +#define	SBCHECK(sb)		sbcheck((sb), __FILE__, __LINE__)
T>  #else
T> -#define	SBLASTRECORDCHK(sb)      /* nothing */
T> -#define	SBLASTMBUFCHK(sb)        /* nothing */
T> +#define	SBLASTRECORDCHK(sb)	do {} while (0)
T> +#define	SBLASTMBUFCHK(sb)	do {} while (0)
T> +#define	SBCHECK(sb)		do {} while (0)
T>  #endif /* SOCKBUF_DEBUG */
T>  
T>  #endif /* _KERNEL */
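
To make the sb_acc / sb_ccc split in the sockbuf.h hunk easier to follow, here is a toy userland model. It is a sketch under assumptions, not the kernel code: struct toy_mbuf, struct toy_sockbuf, toy_append() and toy_ready() are made-up names, and the accounting rules are inferred from the field comments ("available" versus "claimed" chars) and from the M_NOTREADY / M_BLOCKED descriptions in the hunk.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_NOTREADY	0x1	/* data not populated yet */
#define	TOY_BLOCKED	0x2	/* a not-ready buffer sits in front */

struct toy_mbuf {
	unsigned		len;
	int			flags;
	struct toy_mbuf		*next;
};

struct toy_sockbuf {
	struct toy_mbuf	*head, *tail;
	struct toy_mbuf	*fnrdy;	/* first not-ready buffer */
	unsigned	acc;	/* available chars (what sbavail() reports) */
	unsigned	ccc;	/* claimed chars (what sbused() reports) */
};

/* Append m: it always claims space, but is available only if unblocked. */
static void
toy_append(struct toy_sockbuf *sb, struct toy_mbuf *m)
{
	m->next = NULL;
	if (sb->tail != NULL)
		sb->tail->next = m;
	else
		sb->head = m;
	sb->tail = m;

	sb->ccc += m->len;
	if (sb->fnrdy == NULL) {
		if (m->flags & TOY_NOTREADY)
			sb->fnrdy = m;
		else
			sb->acc += m->len;
	} else
		m->flags |= TOY_BLOCKED;
}

/* Mark the first not-ready buffer ready and release what it blocked. */
static void
toy_ready(struct toy_sockbuf *sb)
{
	struct toy_mbuf *m = sb->fnrdy;

	if (m == NULL)
		return;
	m->flags &= ~TOY_NOTREADY;
	while (m != NULL && (m->flags & TOY_NOTREADY) == 0) {
		m->flags &= ~TOY_BLOCKED;
		sb->acc += m->len;
		m = m->next;
	}
	if (m != NULL)
		m->flags &= ~TOY_BLOCKED;	/* now first in line */
	sb->fnrdy = m;
}
```

In this model the space calculation would use ccc, as sbspace() does in the hunk, while readers and the protocol would look at acc.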
T> Index: sys/sys/protosw.h
T> ===================================================================
T> --- sys/sys/protosw.h	(.../head)	(revision 266804)
T> +++ sys/sys/protosw.h	(.../projects/sendfile)	(revision 266807)
T> @@ -209,6 +209,7 @@ struct pr_usrreqs {
T>  #define	PRUS_OOB	0x1
T>  #define	PRUS_EOF	0x2
T>  #define	PRUS_MORETOCOME	0x4
T> +#define	PRUS_NOTREADY	0x8
T>  	int	(*pru_sense)(struct socket *so, struct stat *sb);
T>  	int	(*pru_shutdown)(struct socket *so);
T>  	int	(*pru_flush)(struct socket *so, int direction);
T> Index: sys/sys/sf_buf.h
T> ===================================================================
T> --- sys/sys/sf_buf.h	(.../head)	(revision 266804)
T> +++ sys/sys/sf_buf.h	(.../projects/sendfile)	(revision 266807)
T> @@ -52,7 +52,7 @@ struct sfstat {				/* sendfile statistics */
T>  #include <machine/sf_buf.h>
T>  #include <sys/systm.h>
T>  #include <sys/counter.h>
T> -struct mbuf;	/* for sf_buf_mext() */
T> +struct mbuf;	/* for sf_mext_free() */
T>  
T>  extern counter_u64_t sfstat[sizeof(struct sfstat) / sizeof(uint64_t)];
T>  #define	SFSTAT_ADD(name, val)	\
T> @@ -61,6 +61,6 @@ extern counter_u64_t sfstat[sizeof(struct sfstat)
T>  #define	SFSTAT_INC(name)	SFSTAT_ADD(name, 1)
T>  #endif /* _KERNEL */
T>  
T> -int	sf_buf_mext(struct mbuf *mb, void *addr, void *args);
T> +int	sf_mext_free(struct mbuf *mb, void *addr, void *args);
T>  
T>  #endif /* !_SYS_SF_BUF_H_ */
T> Index: sys/sys/vnode.h
T> ===================================================================
T> --- sys/sys/vnode.h	(.../head)	(revision 266804)
T> +++ sys/sys/vnode.h	(.../projects/sendfile)	(revision 266807)
T> @@ -719,6 +719,7 @@ int	vop_stdbmap(struct vop_bmap_args *);
T>  int	vop_stdfsync(struct vop_fsync_args *);
T>  int	vop_stdgetwritemount(struct vop_getwritemount_args *);
T>  int	vop_stdgetpages(struct vop_getpages_args *);
T> +int	vop_stdgetpages_async(struct vop_getpages_async_args *);
T>  int	vop_stdinactive(struct vop_inactive_args *);
T>  int	vop_stdislocked(struct vop_islocked_args *);
T>  int	vop_stdkqfilter(struct vop_kqfilter_args *);
T> Index: sys/sys/socketvar.h
T> ===================================================================
T> --- sys/sys/socketvar.h	(.../head)	(revision 266804)
T> +++ sys/sys/socketvar.h	(.../projects/sendfile)	(revision 266807)
T> @@ -205,7 +205,7 @@ struct xsocket {
T>  
T>  /* can we read something from so? */
T>  #define	soreadabledata(so) \
T> -    ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \
T> +    (sbavail(&(so)->so_rcv) >= (so)->so_rcv.sb_lowat || \
T>  	!TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error)
T>  #define	soreadable(so) \
T>  	(soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE))
T> Index: sys/sys/mbuf.h
T> ===================================================================
T> --- sys/sys/mbuf.h	(.../head)	(revision 266804)
T> +++ sys/sys/mbuf.h	(.../projects/sendfile)	(revision 266807)
T> @@ -922,7 +922,7 @@ struct mbuf	*m_copypacket(struct mbuf *, int);
T>  void		 m_copy_pkthdr(struct mbuf *, struct mbuf *);
T>  struct mbuf	*m_copyup(struct mbuf *, int, int);
T>  struct mbuf	*m_defrag(struct mbuf *, int);
T> -void		 m_demote(struct mbuf *, int);
T> +void		 m_demote(struct mbuf *, int, int);
T>  struct mbuf	*m_devget(char *, int, int, struct ifnet *,
T>  		    void (*)(char *, caddr_t, u_int));
T>  struct mbuf	*m_dup(struct mbuf *, int);
T> Index: sys/vm/vnode_pager.h
T> ===================================================================
T> --- sys/vm/vnode_pager.h	(.../head)	(revision 266804)
T> +++ sys/vm/vnode_pager.h	(.../projects/sendfile)	(revision 266807)
T> @@ -41,7 +41,7 @@
T>  #ifdef _KERNEL
T>  
T>  int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m,
T> -					  int count, int reqpage);
T> +    int count, int reqpage, void (*iodone)(void *), void *arg);
T>  int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m,
T>  					  int count, boolean_t sync,
T>  					  int *rtvals);
T> Index: sys/vm/vm_pager.h
T> ===================================================================
T> --- sys/vm/vm_pager.h	(.../head)	(revision 266804)
T> +++ sys/vm/vm_pager.h	(.../projects/sendfile)	(revision 266807)
T> @@ -51,18 +51,21 @@ typedef vm_object_t pgo_alloc_t(void *, vm_ooffset
T>      struct ucred *);
T>  typedef void pgo_dealloc_t(vm_object_t);
T>  typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int);
T> +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int,
T> +    void(*)(void *), void *);
T>  typedef void pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *);
T>  typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *);
T>  typedef void pgo_pageunswapped_t(vm_page_t);
T>  
T>  struct pagerops {
T> -	pgo_init_t	*pgo_init;		/* Initialize pager. */
T> -	pgo_alloc_t	*pgo_alloc;		/* Allocate pager. */
T> -	pgo_dealloc_t	*pgo_dealloc;		/* Disassociate. */
T> -	pgo_getpages_t	*pgo_getpages;		/* Get (read) page. */
T> -	pgo_putpages_t	*pgo_putpages;		/* Put (write) page. */
T> -	pgo_haspage_t	*pgo_haspage;		/* Does pager have page? */
T> -	pgo_pageunswapped_t *pgo_pageunswapped;
T> +	pgo_init_t		*pgo_init;		/* Initialize pager. */
T> +	pgo_alloc_t		*pgo_alloc;		/* Allocate pager. */
T> +	pgo_dealloc_t		*pgo_dealloc;		/* Disassociate. */
T> +	pgo_getpages_t		*pgo_getpages;		/* Get (read) page. */
T> +	pgo_getpages_async_t	*pgo_getpages_async;	/* Get page (async). */
T> +	pgo_putpages_t		*pgo_putpages;		/* Put (write) page. */
T> +	pgo_haspage_t		*pgo_haspage;		/* Query page. */
T> +	pgo_pageunswapped_t	*pgo_pageunswapped;
T>  };
T>  
T>  extern struct pagerops defaultpagerops;
T> @@ -103,6 +106,8 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v
T>  void vm_pager_bufferinit(void);
T>  void vm_pager_deallocate(vm_object_t);
T>  static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int);
T> +static __inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int,
T> +    int, void(*)(void *), void *);
T>  static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *);
T>  void vm_pager_init(void);
T>  vm_object_t vm_pager_object_lookup(struct pagerlst *, void *);
T> @@ -131,6 +136,27 @@ vm_pager_get_pages(
T>  	return (r);
T>  }
T>  
T> +static __inline int
T> +vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count,
T> +    int reqpage, void (*iodone)(void *), void *arg)
T> +{
T> +	int r;
T> +
T> +	VM_OBJECT_ASSERT_WLOCKED(object);
T> +
T> +	if (*pagertab[object->type]->pgo_getpages_async == NULL) {
T> +		/* Emulate async operation. */
T> +		r = vm_pager_get_pages(object, m, count, reqpage);
T> +		VM_OBJECT_WUNLOCK(object);
T> +		(iodone)(arg);
T> +		VM_OBJECT_WLOCK(object);
T> +	} else
T> +		r = (*pagertab[object->type]->pgo_getpages_async)(object, m,
T> +		    count, reqpage, iodone, arg);
T> +
T> +	return (r);
T> +}
T> +
T>  static __inline void
T>  vm_pager_put_pages(
T>  	vm_object_t object,
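
The fallback in vm_pager_get_pages_async() in the vm_pager.h hunk can be reduced to the following pattern: if a pager provides no asynchronous method, do the synchronous read and invoke the completion callback right away. This is a simplified sketch with hypothetical names (toy_pagerops, toy_get_pages_async); the object locking and the page array from the real function are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef void toy_iodone_t(void *);

/* Hypothetical stand-in for struct pagerops with only the two methods. */
struct toy_pagerops {
	int	(*getpages)(int count);
	int	(*getpages_async)(int count, toy_iodone_t *iodone, void *arg);
};

static int
toy_get_pages_async(const struct toy_pagerops *ops, int count,
    toy_iodone_t *iodone, void *arg)
{
	int r;

	if (ops->getpages_async == NULL) {
		/* Emulate async operation: read now, then report done. */
		r = ops->getpages(count);
		iodone(arg);
	} else
		r = ops->getpages_async(count, iodone, arg);

	return (r);
}

/* Toy synchronous backend and completion callback for demonstration. */
static int
toy_sync_getpages(int count)
{
	return (count > 0 ? 0 : -1);
}

static void
toy_count_done(void *arg)
{
	(*(int *)arg)++;
}
```

With this shape the caller always gets exactly one callback, whether or not the underlying pager is truly asynchronous.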
T> Index: sys/vm/vm_page.c
T> ===================================================================
T> --- sys/vm/vm_page.c	(.../head)	(revision 266804)
T> +++ sys/vm/vm_page.c	(.../projects/sendfile)	(revision 266807)
T> @@ -2689,6 +2689,8 @@ retrylookup:
T>  		sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ?
T>  		    vm_page_xbusied(m) : vm_page_busied(m);
T>  		if (sleep) {
T> +			if (allocflags & VM_ALLOC_NOWAIT)
T> +				return (NULL);
T>  			/*
T>  			 * Reference the page before unlocking and
T>  			 * sleeping so that the page daemon is less
T> @@ -2716,6 +2718,8 @@ retrylookup:
T>  	}
T>  	m = vm_page_alloc(object, pindex, allocflags & ~VM_ALLOC_IGN_SBUSY);
T>  	if (m == NULL) {
T> +		if (allocflags & VM_ALLOC_NOWAIT)
T> +			return (NULL);
T>  		VM_OBJECT_WUNLOCK(object);
T>  		VM_WAIT;
T>  		VM_OBJECT_WLOCK(object);
T> Index: sys/vm/vm_page.h
T> ===================================================================
T> --- sys/vm/vm_page.h	(.../head)	(revision 266804)
T> +++ sys/vm/vm_page.h	(.../projects/sendfile)	(revision 266807)
T> @@ -390,6 +390,7 @@ vm_page_t PHYS_TO_VM_PAGE(vm_paddr_t pa);
T>  #define	VM_ALLOC_IGN_SBUSY	0x1000	/* vm_page_grab() only */
T>  #define	VM_ALLOC_NODUMP		0x2000	/* don't include in dump */
T>  #define	VM_ALLOC_SBUSY		0x4000	/* Shared busy the page */
T> +#define	VM_ALLOC_NOWAIT		0x8000	/* Return NULL instead of sleeping */
T>  
T>  #define	VM_ALLOC_COUNT_SHIFT	16
T>  #define	VM_ALLOC_COUNT(count)	((count) << VM_ALLOC_COUNT_SHIFT)
T> Index: sys/vm/vnode_pager.c
T> ===================================================================
T> --- sys/vm/vnode_pager.c	(.../head)	(revision 266804)
T> +++ sys/vm/vnode_pager.c	(.../projects/sendfile)	(revision 266807)
T> @@ -83,6 +83,8 @@ static int vnode_pager_input_smlfs(vm_object_t obj
T>  static int vnode_pager_input_old(vm_object_t object, vm_page_t m);
T>  static void vnode_pager_dealloc(vm_object_t);
T>  static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int);
T> +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int,
T> +    void(*)(void  *), void *);
T>  static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *);
T>  static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *);
T>  static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t,
T> @@ -92,6 +94,7 @@ struct pagerops vnodepagerops = {
T>  	.pgo_alloc =	vnode_pager_alloc,
T>  	.pgo_dealloc =	vnode_pager_dealloc,
T>  	.pgo_getpages =	vnode_pager_getpages,
T> +	.pgo_getpages_async = vnode_pager_getpages_async,
T>  	.pgo_putpages =	vnode_pager_putpages,
T>  	.pgo_haspage =	vnode_pager_haspage,
T>  };
T> @@ -664,6 +667,40 @@ vnode_pager_getpages(vm_object_t object, vm_page_t
T>  	return rtval;
T>  }
T>  
T> +static int
T> +vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count,
T> +    int reqpage, void (*iodone)(void *), void *arg)
T> +{
T> +	int rtval;
T> +	struct vnode *vp;
T> +	int bytes = count * PAGE_SIZE;
T> +
T> +	vp = object->handle;
T> +	VM_OBJECT_WUNLOCK(object);
T> +	rtval = VOP_GETPAGES_ASYNC(vp, m, bytes, reqpage, 0, iodone, arg);
T> +	KASSERT(rtval != EOPNOTSUPP,
T> +	    ("vnode_pager: FS getpages_async not implemented\n"));
T> +	VM_OBJECT_WLOCK(object);
T> +	return rtval;
T> +}
T> +
T> +struct getpages_softc {
T> +	vm_page_t *m;
T> +	struct buf *bp;
T> +	vm_object_t object;
T> +	vm_offset_t kva;
T> +	off_t foff;
T> +	int size;
T> +	int count;
T> +	int unmapped;
T> +	int reqpage;
T> +	void (*iodone)(void *);
T> +	void *arg;
T> +};
T> +
T> +int	vnode_pager_generic_getpages_done(struct getpages_softc *);
T> +void	vnode_pager_generic_getpages_done_async(struct buf *);
T> +
T>  /*
T>   * This is now called from local media FS's to operate against their
T>   * own vnodes if they fail to implement VOP_GETPAGES.
T> @@ -670,11 +707,11 @@ vnode_pager_getpages(vm_object_t object, vm_page_t
T>   */
T>  int
T>  vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount,
T> -    int reqpage)
T> +    int reqpage, void (*iodone)(void *), void *arg)
T>  {
T>  	vm_object_t object;
T>  	vm_offset_t kva;
T> -	off_t foff, tfoff, nextoff;
T> +	off_t foff;
T>  	int i, j, size, bsize, first;
T>  	daddr_t firstaddr, reqblock;
T>  	struct bufobj *bo;
T> @@ -684,6 +721,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	struct mount *mp;
T>  	int count;
T>  	int error;
T> +	int unmapped;
T>  
T>  	object = vp->v_object;
T>  	count = bytecount / PAGE_SIZE;
T> @@ -891,8 +929,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	 * requires mapped buffers.
T>  	 */
T>  	mp = vp->v_mount;
T> -	if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 &&
T> -	    unmapped_buf_allowed) {
T> +	unmapped = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS));
T> +	if (unmapped && unmapped_buf_allowed) {
T>  		bp->b_data = unmapped_buf;
T>  		bp->b_kvabase = unmapped_buf;
T>  		bp->b_offset = 0;
T> @@ -905,7 +943,6 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  
T>  	/* build a minimal buffer header */
T>  	bp->b_iocmd = BIO_READ;
T> -	bp->b_iodone = bdone;
T>  	KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred"));
T>  	KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred"));
T>  	bp->b_rcred = crhold(curthread->td_ucred);
T> @@ -923,10 +960,88 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  
T>  	/* do the input */
T>  	bp->b_iooffset = dbtob(bp->b_blkno);
T> -	bstrategy(bp);
T>  
T> -	bwait(bp, PVM, "vnread");
T> +	if (iodone) { /* async */
T> +		struct getpages_softc *sc;
T>  
T> +		sc = malloc(sizeof(*sc), M_TEMP, M_WAITOK);
T> +
T> +		sc->m = m;
T> +		sc->bp = bp;
T> +		sc->object = object;
T> +		sc->foff = foff;
T> +		sc->size = size;
T> +		sc->count = count;
T> +		sc->unmapped = unmapped;
T> +		sc->reqpage = reqpage;
T> +		sc->kva = kva;
T> +
T> +		sc->iodone = iodone;
T> +		sc->arg = arg;
T> +
T> +		bp->b_iodone = vnode_pager_generic_getpages_done_async;
T> +		bp->b_caller1 = sc;
T> +		BUF_KERNPROC(bp);
T> +		bstrategy(bp);
T> +		/* Good bye! */
T> +	} else {
T> +		struct getpages_softc sc;
T> +
T> +		sc.m = m;
T> +		sc.bp = bp;
T> +		sc.object = object;
T> +		sc.foff = foff;
T> +		sc.size = size;
T> +		sc.count = count;
T> +		sc.unmapped = unmapped;
T> +		sc.reqpage = reqpage;
T> +		sc.kva = kva;
T> +
T> +		bp->b_iodone = bdone;
T> +		bstrategy(bp);
T> +		bwait(bp, PVM, "vnread");
T> +		error = vnode_pager_generic_getpages_done(&sc);
T> +	}
T> +
T> +	return (error ? VM_PAGER_ERROR : VM_PAGER_OK);
T> +}
T> +
T> +void
T> +vnode_pager_generic_getpages_done_async(struct buf *bp)
T> +{
T> +	struct getpages_softc *sc = bp->b_caller1;
T> +	int error;
T> +
T> +	error = vnode_pager_generic_getpages_done(sc);
T> +
T> +	vm_page_xunbusy(sc->m[sc->reqpage]);
T> +
T> +	sc->iodone(sc->arg);
T> +
T> +	free(sc, M_TEMP);
T> +}
T> +
T> +int
T> +vnode_pager_generic_getpages_done(struct getpages_softc *sc)
T> +{
T> +	vm_object_t object;
T> +	vm_offset_t kva;
T> +	vm_page_t *m;
T> +	struct buf *bp;
T> +	off_t foff, tfoff, nextoff;
T> +	int i, size, count, unmapped, reqpage;
T> +	int error = 0;
T> +
T> +	m = sc->m;
T> +	bp = sc->bp;
T> +	object = sc->object;
T> +	foff = sc->foff;
T> +	size = sc->size;
T> +	count = sc->count;
T> +	unmapped = sc->unmapped;
T> +	reqpage = sc->reqpage;
T> +	kva = sc->kva;
T> +
T>  	if ((bp->b_ioflags & BIO_ERROR) != 0)
T>  		error = EIO;
T>  
T> @@ -939,7 +1054,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	}
T>  	if ((bp->b_flags & B_UNMAPPED) == 0)
T>  		pmap_qremove(kva, count);
T> -	if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) {
T> +	if (unmapped) {
T>  		bp->b_data = (caddr_t)kva;
T>  		bp->b_kvabase = (caddr_t)kva;
T>  		bp->b_flags &= ~B_UNMAPPED;
T> @@ -995,7 +1110,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_
T>  	if (error) {
T>  		printf("vnode_pager_getpages: I/O read error\n");
T>  	}
T> -	return (error ? VM_PAGER_ERROR : VM_PAGER_OK);
T> +
T> +	return (error);
T>  }
T>  
T>  /*
T> Index: sys/rpc/clnt_vc.c
T> ===================================================================
T> --- sys/rpc/clnt_vc.c	(.../head)	(revision 266804)
T> +++ sys/rpc/clnt_vc.c	(.../projects/sendfile)	(revision 266807)
T> @@ -860,7 +860,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int
T>  			 * error condition
T>  			 */
T>  			do_read = FALSE;
T> -			if (so->so_rcv.sb_cc >= sizeof(uint32_t)
T> +			if (sbavail(&so->so_rcv) >= sizeof(uint32_t)
T>  			    || (so->so_rcv.sb_state & SBS_CANTRCVMORE)
T>  			    || so->so_error)
T>  				do_read = TRUE;
T> @@ -913,7 +913,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int
T>  			 * buffered.
T>  			 */
T>  			do_read = FALSE;
T> -			if (so->so_rcv.sb_cc >= ct->ct_record_resid
T> +			if (sbavail(&so->so_rcv) >= ct->ct_record_resid
T>  			    || (so->so_rcv.sb_state & SBS_CANTRCVMORE)
T>  			    || so->so_error)
T>  				do_read = TRUE;
T> Index: sys/rpc/svc_vc.c
T> ===================================================================
T> --- sys/rpc/svc_vc.c	(.../head)	(revision 266804)
T> +++ sys/rpc/svc_vc.c	(.../projects/sendfile)	(revision 266807)
T> @@ -546,7 +546,7 @@ svc_vc_ack(SVCXPRT *xprt, uint32_t *ack)
T>  {
T>  
T>  	*ack = atomic_load_acq_32(&xprt->xp_snt_cnt);
T> -	*ack -= xprt->xp_socket->so_snd.sb_cc;
T> +	*ack -= sbused(&xprt->xp_socket->so_snd);
T>  	return (TRUE);
T>  }
T>  
T> Index: sys/ufs/ffs/ffs_vnops.c
T> ===================================================================
T> --- sys/ufs/ffs/ffs_vnops.c	(.../head)	(revision 266804)
T> +++ sys/ufs/ffs/ffs_vnops.c	(.../projects/sendfile)	(revision 266807)
T> @@ -105,6 +105,7 @@ extern int	ffs_rawread(struct vnode *vp, struct ui
T>  static vop_fsync_t	ffs_fsync;
T>  static vop_lock1_t	ffs_lock;
T>  static vop_getpages_t	ffs_getpages;
T> +static vop_getpages_async_t ffs_getpages_async;
T>  static vop_read_t	ffs_read;
T>  static vop_write_t	ffs_write;
T>  static int	ffs_extread(struct vnode *vp, struct uio *uio, int ioflag);
T> @@ -125,6 +126,7 @@ struct vop_vector ffs_vnodeops1 = {
T>  	.vop_default =		&ufs_vnodeops,
T>  	.vop_fsync =		ffs_fsync,
T>  	.vop_getpages =		ffs_getpages,
T> +	.vop_getpages_async =	ffs_getpages_async,
T>  	.vop_lock1 =		ffs_lock,
T>  	.vop_read =		ffs_read,
T>  	.vop_reallocblks =	ffs_reallocblks,
T> @@ -847,18 +849,16 @@ ffs_write(ap)
T>  }
T>  
T>  /*
T> - * get page routine
T> + * Get page routines.
T>   */
T>  static int
T> -ffs_getpages(ap)
T> -	struct vop_getpages_args *ap;
T> +ffs_getpages_checkvalid(vm_page_t *m, int count, int reqpage)
T>  {
T> -	int i;
T>  	vm_page_t mreq;
T>  	int pcount;
T>  
T> -	pcount = round_page(ap->a_count) / PAGE_SIZE;
T> -	mreq = ap->a_m[ap->a_reqpage];
T> +	pcount = round_page(count) / PAGE_SIZE;
T> +	mreq = m[reqpage];
T>  
T>  	/*
T>  	 * if ANY DEV_BSIZE blocks are valid on a large filesystem block,
T> @@ -870,24 +870,48 @@ static int
T>  	if (mreq->valid) {
T>  		if (mreq->valid != VM_PAGE_BITS_ALL)
T>  			vm_page_zero_invalid(mreq, TRUE);
T> -		for (i = 0; i < pcount; i++) {
T> -			if (i != ap->a_reqpage) {
T> -				vm_page_lock(ap->a_m[i]);
T> -				vm_page_free(ap->a_m[i]);
T> -				vm_page_unlock(ap->a_m[i]);
T> +		for (int i = 0; i < pcount; i++) {
T> +			if (i != reqpage) {
T> +				vm_page_lock(m[i]);
T> +				vm_page_free(m[i]);
T> +				vm_page_unlock(m[i]);
T>  			}
T>  		}
T>  		VM_OBJECT_WUNLOCK(mreq->object);
T> -		return VM_PAGER_OK;
T> +		return (VM_PAGER_OK);
T>  	}
T>  	VM_OBJECT_WUNLOCK(mreq->object);
T>  
T> -	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
T> -					    ap->a_count,
T> -					    ap->a_reqpage);
T> +	return (-1);
T>  }
T>  
T> +static int
T> +ffs_getpages(struct vop_getpages_args *ap)
T> +{
T> +	int rv;
T>  
T> +	rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage);
T> +	if (rv == VM_PAGER_OK)
T> +		return (rv);
T> +
T> +	return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count,
T> +	    ap->a_reqpage, NULL, NULL));
T> +}
T> +
T> +static int
T> +ffs_getpages_async(struct vop_getpages_async_args *ap)
T> +{
T> +	int rv;
T> +
T> +	rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage);
T> +	if (rv == VM_PAGER_OK) {
T> +		(ap->a_vop_getpages_iodone)(ap->a_arg);
T> +		return (rv);
T> +	}
T> +	return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count,
T> +	    ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg));
T> +}
T> +
T>  /*
T>   * Extended attribute area reading.
T>   */
T> Index: sys/tools/vnode_if.awk
T> ===================================================================
T> --- sys/tools/vnode_if.awk	(.../head)	(revision 266804)
T> +++ sys/tools/vnode_if.awk	(.../projects/sendfile)	(revision 266807)
T> @@ -254,16 +254,26 @@ while ((getline < srcfile) > 0) {
T>  		if (sub(/;$/, "") < 1)
T>  			die("Missing end-of-line ; in \"%s\".", $0);
T>  
T> -		# pick off variable name
T> -		if ((argp = match($0, /[A-Za-z0-9_]+$/)) < 1)
T> -			die("Missing var name \"a_foo\" in \"%s\".", $0);
T> -		args[numargs] = substr($0, argp);
T> -		$0 = substr($0, 1, argp - 1);
T> -
T> -		# what is left must be type
T> -		# remove trailing space (if any)
T> -		sub(/ $/, "");
T> -		types[numargs] = $0;
T> +		# pick off argument name
T> +		if ((argp = match($0, /[A-Za-z0-9_]+$/)) > 0) {
T> +			args[numargs] = substr($0, argp);
T> +			$0 = substr($0, 1, argp - 1);
T> +			sub(/ $/, "");
T> +			delete fargs[numargs];
T> +			types[numargs] = $0;
T> +		} else {	# try to parse a function pointer argument
T> +			if ((argp = match($0,
T> +			    /\(\*[A-Za-z0-9_]+\)\([A-Za-z0-9_*, ]+\)$/)) < 1)
T> +				die("Missing var name \"a_foo\" in \"%s\".",
T> +				    $0);
T> +			args[numargs] = substr($0, argp + 2);
T> +			sub(/\).+/, "", args[numargs]);
T> +			fargs[numargs] = substr($0, argp);
T> +			sub(/^\([^)]+\)/, "", fargs[numargs]);
T> +			$0 = substr($0, 1, argp - 1);
T> +			sub(/ $/, "");
T> +			types[numargs] = $0;
T> +		}
T>  	}
T>  	if (numargs > 4)
T>  		ctrargs = 4;
T> @@ -286,8 +296,13 @@ while ((getline < srcfile) > 0) {
T>  	if (hfile) {
T>  		# Print out the vop_F_args structure.
T>  		printh("struct "name"_args {\n\tstruct vop_generic_args a_gen;");
T> -		for (i = 0; i < numargs; ++i)
T> -			printh("\t" t_spc(types[i]) "a_" args[i] ";");
T> +		for (i = 0; i < numargs; ++i) {
T> +			if (fargs[i]) {
T> +				printh("\t" t_spc(types[i]) "(*a_" args[i] \
T> +				    ")" fargs[i] ";");
T> +			} else
T> +				printh("\t" t_spc(types[i]) "a_" args[i] ";");
T> +		}
T>  		printh("};");
T>  		printh("");
T>  
T> @@ -301,8 +316,14 @@ while ((getline < srcfile) > 0) {
T>  		printh("");
T>  		printh("static __inline int " uname "(");
T>  		for (i = 0; i < numargs; ++i) {
T> -			printh("\t" t_spc(types[i]) args[i] \
T> -			    (i < numargs - 1 ? "," : ")"));
T> +			if (fargs[i]) {
T> +				printh("\t" t_spc(types[i]) "(*" args[i] \
T> +				    ")" fargs[i] \
T> +				    (i < numargs - 1 ? "," : ")"));
T> +			} else {
T> +				printh("\t" t_spc(types[i]) args[i] \
T> +				    (i < numargs - 1 ? "," : ")"));
T> +			}
T>  		}
T>  		printh("{");
T>  		printh("\tstruct " name "_args a;");
T> Index: sys/netinet/tcp_reass.c
T> ===================================================================
T> --- sys/netinet/tcp_reass.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_reass.c	(.../projects/sendfile)	(revision 266807)
T> @@ -248,7 +248,7 @@ present:
T>  			m_freem(mq);
T>  		else {
T>  			mq->m_nextpkt = NULL;
T> -			sbappendstream_locked(&so->so_rcv, mq);
T> +			sbappendstream_locked(&so->so_rcv, mq, 0);
T>  			wakeup = 1;
T>  		}
T>  	}
T> Index: sys/netinet/accf_http.c
T> ===================================================================
T> --- sys/netinet/accf_http.c	(.../head)	(revision 266804)
T> +++ sys/netinet/accf_http.c	(.../projects/sendfile)	(revision 266807)
T> @@ -92,7 +92,7 @@ sbfull(struct sockbuf *sb)
T>  	    "mbcnt(%ld) >= mbmax(%ld): %d",
T>  	    sb->sb_cc, sb->sb_hiwat, sb->sb_cc >= sb->sb_hiwat,
T>  	    sb->sb_mbcnt, sb->sb_mbmax, sb->sb_mbcnt >= sb->sb_mbmax);
T> -	return (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
T> +	return (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax);
T>  }
T>  
T>  /*
T> @@ -162,13 +162,14 @@ static int
T>  sohashttpget(struct socket *so, void *arg, int waitflag)
T>  {
T>  
T> -	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && !sbfull(&so->so_rcv)) {
T> +	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 &&
T> +	    !sbfull(&so->so_rcv)) {
T>  		struct mbuf *m;
T>  		char *cmp;
T>  		int	cmplen, cc;
T>  
T>  		m = so->so_rcv.sb_mb;
T> -		cc = so->so_rcv.sb_cc - 1;
T> +		cc = sbavail(&so->so_rcv) - 1;
T>  		if (cc < 1)
T>  			return (SU_OK);
T>  		switch (*mtod(m, char *)) {
T> @@ -215,7 +216,7 @@ soparsehttpvers(struct socket *so, void *arg, int
T>  		goto fallout;
T>  
T>  	m = so->so_rcv.sb_mb;
T> -	cc = so->so_rcv.sb_cc;
T> +	cc = sbavail(&so->so_rcv);
T>  	inspaces = spaces = 0;
T>  	for (m = so->so_rcv.sb_mb; m; m = n) {
T>  		n = m->m_nextpkt;
T> @@ -304,7 +305,7 @@ soishttpconnected(struct socket *so, void *arg, in
T>  	 * have NCHRS left
T>  	 */
T>  	copied = 0;
T> -	ccleft = so->so_rcv.sb_cc;
T> +	ccleft = sbavail(&so->so_rcv);
T>  	if (ccleft < NCHRS)
T>  		goto readmore;
T>  	a = b = c = '\0';
T> Index: sys/netinet/sctp_os_bsd.h
T> ===================================================================
T> --- sys/netinet/sctp_os_bsd.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_os_bsd.h	(.../projects/sendfile)	(revision 266807)
T> @@ -405,7 +405,7 @@ typedef struct callout sctp_os_timer_t;
T>  #define SCTP_SOWAKEUP(so)	wakeup(&(so)->so_timeo)
T>  /* clear the socket buffer state */
T>  #define SCTP_SB_CLEAR(sb)	\
T> -	(sb).sb_cc = 0;		\
T> +	(sb).sb_ccc = 0;		\
T>  	(sb).sb_mb = NULL;	\
T>  	(sb).sb_mbcnt = 0;
T>  
T> Index: sys/netinet/tcp_output.c
T> ===================================================================
T> --- sys/netinet/tcp_output.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_output.c	(.../projects/sendfile)	(revision 266807)
T> @@ -322,7 +322,7 @@ after_sack_rexmit:
T>  			 * to send then the probe will be the FIN
T>  			 * itself.
T>  			 */
T> -			if (off < so->so_snd.sb_cc)
T> +			if (off < sbavail(&so->so_snd))
T>  				flags &= ~TH_FIN;
T>  			sendwin = 1;
T>  		} else {
T> @@ -348,7 +348,8 @@ after_sack_rexmit:
T>  	 */
T>  	if (sack_rxmit == 0) {
T>  		if (sack_bytes_rxmt == 0)
T> -			len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off);
T> +			len = ((long)ulmin(sbavail(&so->so_snd), sendwin) -
T> +			    off);
T>  		else {
T>  			long cwin;
T>  
T> @@ -357,8 +358,8 @@ after_sack_rexmit:
T>  			 * sending new data, having retransmitted all the
T>  			 * data possible in the scoreboard.
T>  			 */
T> -			len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd) 
T> -			       - off);
T> +			len = ((long)ulmin(sbavail(&so->so_snd), tp->snd_wnd) -
T> +			    off);
T>  			/*
T>  			 * Don't remove this (len > 0) check !
T>  			 * We explicitly check for len > 0 here (although it 
T> @@ -457,12 +458,15 @@ after_sack_rexmit:
T>  	 * TODO: Shrink send buffer during idle periods together
T>  	 * with congestion window.  Requires another timer.  Has to
T>  	 * wait for upcoming tcp timer rewrite.
T> +	 *
T> +	 * XXXGL: should sbused() or sbavail() be used here?
T>  	 */
T>  	if (V_tcp_do_autosndbuf && so->so_snd.sb_flags & SB_AUTOSIZE) {
T>  		if ((tp->snd_wnd / 4 * 5) >= so->so_snd.sb_hiwat &&
T> -		    so->so_snd.sb_cc >= (so->so_snd.sb_hiwat / 8 * 7) &&
T> -		    so->so_snd.sb_cc < V_tcp_autosndbuf_max &&
T> -		    sendwin >= (so->so_snd.sb_cc - (tp->snd_nxt - tp->snd_una))) {
T> +		    sbused(&so->so_snd) >= (so->so_snd.sb_hiwat / 8 * 7) &&
T> +		    sbused(&so->so_snd) < V_tcp_autosndbuf_max &&
T> +		    sendwin >= (sbused(&so->so_snd) -
T> +		    (tp->snd_nxt - tp->snd_una))) {
T>  			if (!sbreserve_locked(&so->so_snd,
T>  			    min(so->so_snd.sb_hiwat + V_tcp_autosndbuf_inc,
T>  			     V_tcp_autosndbuf_max), so, curthread))
T> @@ -499,10 +503,11 @@ after_sack_rexmit:
T>  		tso = 1;
T>  
T>  	if (sack_rxmit) {
T> -		if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
T> +		if (SEQ_LT(p->rxmit + len, tp->snd_una + sbavail(&so->so_snd)))
T>  			flags &= ~TH_FIN;
T>  	} else {
T> -		if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc))
T> +		if (SEQ_LT(tp->snd_nxt + len, tp->snd_una +
T> +		    sbavail(&so->so_snd)))
T>  			flags &= ~TH_FIN;
T>  	}
T>  
T> @@ -532,7 +537,7 @@ after_sack_rexmit:
T>  		 */
T>  		if (!(tp->t_flags & TF_MORETOCOME) &&	/* normal case */
T>  		    (idle || (tp->t_flags & TF_NODELAY)) &&
T> -		    len + off >= so->so_snd.sb_cc &&
T> +		    len + off >= sbavail(&so->so_snd) &&
T>  		    (tp->t_flags & TF_NOPUSH) == 0) {
T>  			goto send;
T>  		}
T> @@ -660,7 +665,7 @@ dontupdate:
T>  	 * if window is nonzero, transmit what we can,
T>  	 * otherwise force out a byte.
T>  	 */
T> -	if (so->so_snd.sb_cc && !tcp_timer_active(tp, TT_REXMT) &&
T> +	if (sbavail(&so->so_snd) && !tcp_timer_active(tp, TT_REXMT) &&
T>  	    !tcp_timer_active(tp, TT_PERSIST)) {
T>  		tp->t_rxtshift = 0;
T>  		tcp_setpersist(tp);
T> @@ -786,7 +791,7 @@ send:
T>  			 * fractional unless the send sockbuf can
T>  			 * be emptied.
T>  			 */
T> -			if (sendalot && off + len < so->so_snd.sb_cc) {
T> +			if (sendalot && off + len < sbavail(&so->so_snd)) {
T>  				len -= len % (tp->t_maxopd - optlen);
T>  				sendalot = 1;
T>  			}
T> @@ -889,7 +894,7 @@ send:
T>  		 * give data to the user when a buffer fills or
T>  		 * a PUSH comes in.)
T>  		 */
T> -		if (off + len == so->so_snd.sb_cc)
T> +		if (off + len == sbavail(&so->so_snd))
T>  			flags |= TH_PUSH;
T>  		SOCKBUF_UNLOCK(&so->so_snd);
T>  	} else {
T> Index: sys/netinet/siftr.c
T> ===================================================================
T> --- sys/netinet/siftr.c	(.../head)	(revision 266804)
T> +++ sys/netinet/siftr.c	(.../projects/sendfile)	(revision 266807)
T> @@ -781,9 +781,9 @@ siftr_siftdata(struct pkt_node *pn, struct inpcb *
T>  	pn->flags = tp->t_flags;
T>  	pn->rxt_length = tp->t_rxtcur;
T>  	pn->snd_buf_hiwater = inp->inp_socket->so_snd.sb_hiwat;
T> -	pn->snd_buf_cc = inp->inp_socket->so_snd.sb_cc;
T> +	pn->snd_buf_cc = sbused(&inp->inp_socket->so_snd);
T>  	pn->rcv_buf_hiwater = inp->inp_socket->so_rcv.sb_hiwat;
T> -	pn->rcv_buf_cc = inp->inp_socket->so_rcv.sb_cc;
T> +	pn->rcv_buf_cc = sbused(&inp->inp_socket->so_rcv);
T>  	pn->sent_inflight_bytes = tp->snd_max - tp->snd_una;
T>  	pn->t_segqlen = tp->t_segqlen;
T>  
T> Index: sys/netinet/sctp_indata.c
T> ===================================================================
T> --- sys/netinet/sctp_indata.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_indata.c	(.../projects/sendfile)	(revision 266807)
T> @@ -70,7 +70,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
T>  
T>  	/*
T>  	 * This is really set wrong with respect to a 1-2-m socket. Since
T> -	 * the sb_cc is the count that everyone as put up. When we re-write
T> +	 * the sb_ccc is the count that everyone as put up. When we re-write
T>  	 * sctp_soreceive then we will fix this so that ONLY this
T>  	 * associations data is taken into account.
T>  	 */
T> @@ -77,7 +77,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_
T>  	if (stcb->sctp_socket == NULL)
T>  		return (calc);
T>  
T> -	if (stcb->asoc.sb_cc == 0 &&
T> +	if (stcb->asoc.sb_ccc == 0 &&
T>  	    asoc->size_on_reasm_queue == 0 &&
T>  	    asoc->size_on_all_streams == 0) {
T>  		/* Full rwnd granted */
T> @@ -1358,7 +1358,7 @@ sctp_process_a_data_chunk(struct sctp_tcb *stcb, s
T>  		 * When we have NO room in the rwnd we check to make sure
T>  		 * the reader is doing its job...
T>  		 */
T> -		if (stcb->sctp_socket->so_rcv.sb_cc) {
T> +		if (stcb->sctp_socket->so_rcv.sb_ccc) {
T>  			/* some to read, wake-up */
T>  #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
T>  			struct socket *so;
T> Index: sys/netinet/sctp_pcb.c
T> ===================================================================
T> --- sys/netinet/sctp_pcb.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_pcb.c	(.../projects/sendfile)	(revision 266807)
T> @@ -3328,7 +3328,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi
T>  			if ((asoc->asoc.size_on_reasm_queue > 0) ||
T>  			    (asoc->asoc.control_pdapi) ||
T>  			    (asoc->asoc.size_on_all_streams > 0) ||
T> -			    (so && (so->so_rcv.sb_cc > 0))) {
T> +			    (so && (so->so_rcv.sb_ccc > 0))) {
T>  				/* Left with Data unread */
T>  				struct mbuf *op_err;
T>  
T> @@ -3556,7 +3556,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi
T>  		TAILQ_REMOVE(&inp->read_queue, sq, next);
T>  		sctp_free_remote_addr(sq->whoFrom);
T>  		if (so)
T> -			so->so_rcv.sb_cc -= sq->length;
T> +			so->so_rcv.sb_ccc -= sq->length;
T>  		if (sq->data) {
T>  			sctp_m_freem(sq->data);
T>  			sq->data = NULL;
T> @@ -4775,7 +4775,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, struct sct
T>  			inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED;
T>  			if (so) {
T>  				SOCK_LOCK(so);
T> -				if (so->so_rcv.sb_cc == 0) {
T> +				if (so->so_rcv.sb_ccc == 0) {
T>  					so->so_state &= ~(SS_ISCONNECTING |
T>  					    SS_ISDISCONNECTING |
T>  					    SS_ISCONFIRMING |
T> Index: sys/netinet/sctp_pcb.h
T> ===================================================================
T> --- sys/netinet/sctp_pcb.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_pcb.h	(.../projects/sendfile)	(revision 266807)
T> @@ -369,7 +369,7 @@ struct sctp_inpcb {
T>  	}     ip_inp;
T>  
T>  
T> -	/* Socket buffer lock protects read_queue and of course sb_cc */
T> +	/* Socket buffer lock protects read_queue and of course sb_ccc */
T>  	struct sctp_readhead read_queue;
T>  
T>  	              LIST_ENTRY(sctp_inpcb) sctp_list;	/* lists all endpoints */
T> Index: sys/netinet/sctp_usrreq.c
T> ===================================================================
T> --- sys/netinet/sctp_usrreq.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_usrreq.c	(.../projects/sendfile)	(revision 266807)
T> @@ -586,7 +586,7 @@ sctp_must_try_again:
T>  	if (((flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) &&
T>  	    (atomic_cmpset_int(&inp->sctp_flags, flags, (flags | SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_CLOSE_IP)))) {
T>  		if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) ||
T> -		    (so->so_rcv.sb_cc > 0)) {
T> +		    (so->so_rcv.sb_ccc > 0)) {
T>  #ifdef SCTP_LOG_CLOSING
T>  			sctp_log_closing(inp, NULL, 13);
T>  #endif
T> @@ -751,7 +751,7 @@ sctp_disconnect(struct socket *so)
T>  			}
T>  			if (((so->so_options & SO_LINGER) &&
T>  			    (so->so_linger == 0)) ||
T> -			    (so->so_rcv.sb_cc > 0)) {
T> +			    (so->so_rcv.sb_ccc > 0)) {
T>  				if (SCTP_GET_STATE(asoc) !=
T>  				    SCTP_STATE_COOKIE_WAIT) {
T>  					/* Left with Data unread */
T> @@ -916,7 +916,7 @@ sctp_flush(struct socket *so, int how)
T>  		inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_CANT_READ;
T>  		SCTP_INP_READ_UNLOCK(inp);
T>  		SCTP_INP_WUNLOCK(inp);
T> -		so->so_rcv.sb_cc = 0;
T> +		so->so_rcv.sb_ccc = 0;
T>  		so->so_rcv.sb_mbcnt = 0;
T>  		so->so_rcv.sb_mb = NULL;
T>  	}
T> @@ -925,7 +925,7 @@ sctp_flush(struct socket *so, int how)
T>  		 * First make sure the sb will be happy, we don't use these
T>  		 * except maybe the count
T>  		 */
T> -		so->so_snd.sb_cc = 0;
T> +		so->so_snd.sb_ccc = 0;
T>  		so->so_snd.sb_mbcnt = 0;
T>  		so->so_snd.sb_mb = NULL;
T>  
T> Index: sys/netinet/sctp_structs.h
T> ===================================================================
T> --- sys/netinet/sctp_structs.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_structs.h	(.../projects/sendfile)	(revision 266807)
T> @@ -982,7 +982,7 @@ struct sctp_association {
T>  
T>  	uint32_t total_output_queue_size;
T>  
T> -	uint32_t sb_cc;		/* shadow of sb_cc */
T> +	uint32_t sb_ccc;		/* shadow of sb_ccc */
T>  	uint32_t sb_send_resv;	/* amount reserved on a send */
T>  	uint32_t my_rwnd_control_len;	/* shadow of sb_mbcnt used for rwnd
T>  					 * control */
T> Index: sys/netinet/tcp_input.c
T> ===================================================================
T> --- sys/netinet/tcp_input.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_input.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1729,7 +1729,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
T>  					tcp_timer_activate(tp, TT_REXMT,
T>  						      tp->t_rxtcur);
T>  				sowwakeup(so);
T> -				if (so->so_snd.sb_cc)
T> +				if (sbavail(&so->so_snd))
T>  					(void) tcp_output(tp);
T>  				goto check_delack;
T>  			}
T> @@ -1837,7 +1837,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
T>  					    newsize, so, NULL))
T>  						so->so_rcv.sb_flags &= ~SB_AUTOSIZE;
T>  				m_adj(m, drop_hdrlen);	/* delayed header drop */
T> -				sbappendstream_locked(&so->so_rcv, m);
T> +				sbappendstream_locked(&so->so_rcv, m, 0);
T>  			}
T>  			/* NB: sorwakeup_locked() does an implicit unlock. */
T>  			sorwakeup_locked(so);
T> @@ -2541,7 +2541,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
T>  					 * Otherwise we would send pure ACKs.
T>  					 */
T>  					SOCKBUF_LOCK(&so->so_snd);
T> -					avail = so->so_snd.sb_cc -
T> +					avail = sbavail(&so->so_snd) -
T>  					    (tp->snd_nxt - tp->snd_una);
T>  					SOCKBUF_UNLOCK(&so->so_snd);
T>  					if (avail > 0)
T> @@ -2676,10 +2676,10 @@ process_ACK:
T>  		cc_ack_received(tp, th, CC_ACK);
T>  
T>  		SOCKBUF_LOCK(&so->so_snd);
T> -		if (acked > so->so_snd.sb_cc) {
T> -			tp->snd_wnd -= so->so_snd.sb_cc;
T> +		if (acked > sbavail(&so->so_snd)) {
T> +			tp->snd_wnd -= sbavail(&so->so_snd);
T>  			mfree = sbcut_locked(&so->so_snd,
T> -			    (int)so->so_snd.sb_cc);
T> +			    (int)sbavail(&so->so_snd));
T>  			ourfinisacked = 1;
T>  		} else {
T>  			mfree = sbcut_locked(&so->so_snd, acked);
T> @@ -2805,7 +2805,7 @@ step6:
T>  		 * actually wanting to send this much urgent data.
T>  		 */
T>  		SOCKBUF_LOCK(&so->so_rcv);
T> -		if (th->th_urp + so->so_rcv.sb_cc > sb_max) {
T> +		if (th->th_urp + sbavail(&so->so_rcv) > sb_max) {
T>  			th->th_urp = 0;			/* XXX */
T>  			thflags &= ~TH_URG;		/* XXX */
T>  			SOCKBUF_UNLOCK(&so->so_rcv);	/* XXX */
T> @@ -2827,7 +2827,7 @@ step6:
T>  		 */
T>  		if (SEQ_GT(th->th_seq+th->th_urp, tp->rcv_up)) {
T>  			tp->rcv_up = th->th_seq + th->th_urp;
T> -			so->so_oobmark = so->so_rcv.sb_cc +
T> +			so->so_oobmark = sbavail(&so->so_rcv) +
T>  			    (tp->rcv_up - tp->rcv_nxt) - 1;
T>  			if (so->so_oobmark == 0)
T>  				so->so_rcv.sb_state |= SBS_RCVATMARK;
T> @@ -2897,7 +2897,7 @@ dodata:							/* XXX */
T>  			if (so->so_rcv.sb_state & SBS_CANTRCVMORE)
T>  				m_freem(m);
T>  			else
T> -				sbappendstream_locked(&so->so_rcv, m);
T> +				sbappendstream_locked(&so->so_rcv, m, 0);
T>  			/* NB: sorwakeup_locked() does an implicit unlock. */
T>  			sorwakeup_locked(so);
T>  		} else {
T> Index: sys/netinet/sctp_input.c
T> ===================================================================
T> --- sys/netinet/sctp_input.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_input.c	(.../projects/sendfile)	(revision 266807)
T> @@ -1042,7 +1042,7 @@ sctp_handle_shutdown_ack(struct sctp_shutdown_ack_
T>  	if (stcb->sctp_socket) {
T>  		if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
T>  		    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) {
T> -			stcb->sctp_socket->so_snd.sb_cc = 0;
T> +			stcb->sctp_socket->so_snd.sb_ccc = 0;
T>  		}
T>  		sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
T>  	}
T> Index: sys/netinet/sctp_var.h
T> ===================================================================
T> --- sys/netinet/sctp_var.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_var.h	(.../projects/sendfile)	(revision 266807)
T> @@ -82,9 +82,9 @@ extern struct pr_usrreqs sctp_usrreqs;
T>  
T>  #define sctp_maxspace(sb) (max((sb)->sb_hiwat,SCTP_MINIMAL_RWND))
T>  
T> -#define	sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_cc) ? (sctp_maxspace(sb) - (asoc)->sb_cc) : 0))
T> +#define	sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_ccc) ? (sctp_maxspace(sb) - (asoc)->sb_ccc) : 0))
T>  
T> -#define	sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_cc) ? (sctp_maxspace(sb) - (sb)->sb_cc) : 0))
T> +#define	sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_ccc) ? (sctp_maxspace(sb) - (sb)->sb_ccc) : 0))
T>  
T>  #define sctp_sbspace_sub(a,b) ((a > b) ? (a - b) : 0)
T>  
T> @@ -195,10 +195,10 @@ extern struct pr_usrreqs sctp_usrreqs;
T>  }
T>  
T>  #define sctp_sbfree(ctl, stcb, sb, m) { \
T> -	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_cc, SCTP_BUF_LEN((m))); \
T> +	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_ccc, SCTP_BUF_LEN((m))); \
T>  	SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_mbcnt, MSIZE); \
T>  	if (((ctl)->do_not_ref_stcb == 0) && stcb) {\
T> -		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_cc, SCTP_BUF_LEN((m))); \
T> +		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_ccc, SCTP_BUF_LEN((m))); \
T>  		SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
T>  	} \
T>  	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
T> @@ -207,10 +207,10 @@ extern struct pr_usrreqs sctp_usrreqs;
T>  }
T>  
T>  #define sctp_sballoc(stcb, sb, m) { \
T> -	atomic_add_int(&(sb)->sb_cc,SCTP_BUF_LEN((m))); \
T> +	atomic_add_int(&(sb)->sb_ccc,SCTP_BUF_LEN((m))); \
T>  	atomic_add_int(&(sb)->sb_mbcnt, MSIZE); \
T>  	if (stcb) { \
T> -		atomic_add_int(&(stcb)->asoc.sb_cc,SCTP_BUF_LEN((m))); \
T> +		atomic_add_int(&(stcb)->asoc.sb_ccc,SCTP_BUF_LEN((m))); \
T>  		atomic_add_int(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \
T>  	} \
T>  	if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \
T> Index: sys/netinet/sctp_output.c
T> ===================================================================
T> --- sys/netinet/sctp_output.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctp_output.c	(.../projects/sendfile)	(revision 266807)
T> @@ -7104,7 +7104,7 @@ one_more_time:
T>  			if ((stcb->sctp_socket != NULL) && \
T>  			    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
T>  			    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) {
T> -				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc, sp->length);
T> +				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc, sp->length);
T>  			}
T>  			if (sp->data) {
T>  				sctp_m_freem(sp->data);
T> @@ -11382,7 +11382,7 @@ jump_out:
T>  		drp->current_onq = htonl(asoc->size_on_reasm_queue +
T>  		    asoc->size_on_all_streams +
T>  		    asoc->my_rwnd_control_len +
T> -		    stcb->sctp_socket->so_rcv.sb_cc);
T> +		    stcb->sctp_socket->so_rcv.sb_ccc);
T>  	} else {
T>  		/*-
T>  		 * If my rwnd is 0, possibly from mbuf depletion as well as
T> Index: sys/netinet/tcp_usrreq.c
T> ===================================================================
T> --- sys/netinet/tcp_usrreq.c	(.../head)	(revision 266804)
T> +++ sys/netinet/tcp_usrreq.c	(.../projects/sendfile)	(revision 266807)
T> @@ -826,7 +826,7 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  		m_freem(control);	/* empty control, just free it */
T>  	}
T>  	if (!(flags & PRUS_OOB)) {
T> -		sbappendstream(&so->so_snd, m);
T> +		sbappendstream(&so->so_snd, m, flags);
T>  		if (nam && tp->t_state < TCPS_SYN_SENT) {
T>  			/*
T>  			 * Do implied connect if not yet connected,
T> @@ -858,7 +858,8 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  			socantsendmore(so);
T>  			tcp_usrclosed(tp);
T>  		}
T> -		if (!(inp->inp_flags & INP_DROPPED)) {
T> +		if (!(inp->inp_flags & INP_DROPPED) &&
T> +		    !(flags & PRUS_NOTREADY)) {
T>  			if (flags & PRUS_MORETOCOME)
T>  				tp->t_flags |= TF_MORETOCOME;
T>  			error = tcp_output(tp);
T> @@ -884,7 +885,7 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  		 * of data past the urgent section.
T>  		 * Otherwise, snd_up should be one lower.
T>  		 */
T> -		sbappendstream_locked(&so->so_snd, m);
T> +		sbappendstream_locked(&so->so_snd, m, flags);
T>  		SOCKBUF_UNLOCK(&so->so_snd);
T>  		if (nam && tp->t_state < TCPS_SYN_SENT) {
T>  			/*
T> @@ -908,10 +909,12 @@ tcp_usr_send(struct socket *so, int flags, struct
T>  			tp->snd_wnd = TTCP_CLIENT_SND_WND;
T>  			tcp_mss(tp, -1);
T>  		}
T> -		tp->snd_up = tp->snd_una + so->so_snd.sb_cc;
T> -		tp->t_flags |= TF_FORCEDATA;
T> -		error = tcp_output(tp);
T> -		tp->t_flags &= ~TF_FORCEDATA;
T> +		tp->snd_up = tp->snd_una + sbavail(&so->so_snd);
T> +		if (!(flags & PRUS_NOTREADY)) {
T> +			tp->t_flags |= TF_FORCEDATA;
T> +			error = tcp_output(tp);
T> +			tp->t_flags &= ~TF_FORCEDATA;
T> +		}
T>  	}
T>  out:
T>  	TCPDEBUG2((flags & PRUS_OOB) ? PRU_SENDOOB :
T> Index: sys/netinet/accf_dns.c
T> ===================================================================
T> --- sys/netinet/accf_dns.c	(.../head)	(revision 266804)
T> +++ sys/netinet/accf_dns.c	(.../projects/sendfile)	(revision 266807)
T> @@ -75,7 +75,7 @@ sohasdns(struct socket *so, void *arg, int waitfla
T>  	struct sockbuf *sb = &so->so_rcv;
T>  
T>  	/* If the socket is full, we're ready. */
T> -	if (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
T> +	if (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax)
T>  		goto ready;
T>  
T>  	/* Check to see if we have a request. */
T> @@ -115,7 +115,7 @@ skippacket(struct sockbuf *sb) {
T>  	unsigned long packlen;
T>  	struct packet q, *p = &q;
T>  
T> -	if (sb->sb_cc < 2)
T> +	if (sbavail(sb) < 2)
T>  		return DNS_WAIT;
T>  
T>  	q.m = sb->sb_mb;
T> @@ -122,7 +122,7 @@ skippacket(struct sockbuf *sb) {
T>  	q.n = q.m->m_nextpkt;
T>  	q.moff = 0;
T>  	q.offset = 0;
T> -	q.len = sb->sb_cc;
T> +	q.len = sbavail(sb);
T>  
T>  	GET16(p, packlen);
T>  	if (packlen + 2 > q.len)
T> Index: sys/netinet/sctputil.c
T> ===================================================================
T> --- sys/netinet/sctputil.c	(.../head)	(revision 266804)
T> +++ sys/netinet/sctputil.c	(.../projects/sendfile)	(revision 266807)
T> @@ -67,9 +67,9 @@ sctp_sblog(struct sockbuf *sb, struct sctp_tcb *st
T>  	struct sctp_cwnd_log sctp_clog;
T>  
T>  	sctp_clog.x.sb.stcb = stcb;
T> -	sctp_clog.x.sb.so_sbcc = sb->sb_cc;
T> +	sctp_clog.x.sb.so_sbcc = sb->sb_ccc;
T>  	if (stcb)
T> -		sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_cc;
T> +		sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_ccc;
T>  	else
T>  		sctp_clog.x.sb.stcb_sbcc = 0;
T>  	sctp_clog.x.sb.incr = incr;
T> @@ -4356,7 +4356,7 @@ sctp_add_to_readq(struct sctp_inpcb *inp,
T>  {
T>  	/*
T>  	 * Here we must place the control on the end of the socket read
T> -	 * queue AND increment sb_cc so that select will work properly on
T> +	 * queue AND increment sb_ccc so that select will work properly on
T>  	 * read.
T>  	 */
T>  	struct mbuf *m, *prev = NULL;
T> @@ -4482,7 +4482,7 @@ sctp_append_to_readq(struct sctp_inpcb *inp,
T>  	 * the reassembly queue.
T>  	 * 
T>  	 * If PDAPI this means we need to add m to the end of the data.
T> -	 * Increase the length in the control AND increment the sb_cc.
T> +	 * Increase the length in the control AND increment the sb_ccc.
T>  	 * Otherwise sb is NULL and all we need to do is put it at the end
T>  	 * of the mbuf chain.
T>  	 */
T> @@ -4694,10 +4694,10 @@ sctp_free_bufspace(struct sctp_tcb *stcb, struct s
T>  
T>  	if (stcb->sctp_socket && (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) ||
T>  	    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)))) {
T> -		if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) {
T> -			stcb->sctp_socket->so_snd.sb_cc -= tp1->book_size;
T> +		if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) {
T> +			stcb->sctp_socket->so_snd.sb_ccc -= tp1->book_size;
T>  		} else {
T> -			stcb->sctp_socket->so_snd.sb_cc = 0;
T> +			stcb->sctp_socket->so_snd.sb_ccc = 0;
T>  
T>  		}
T>  	}
T> @@ -5232,11 +5232,11 @@ sctp_sorecvmsg(struct socket *so,
T>  	in_eeor_mode = sctp_is_feature_on(inp, SCTP_PCB_FLAGS_EXPLICIT_EOR);
T>  	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) {
T>  		sctp_misc_ints(SCTP_SORECV_ENTER,
T> -		    rwnd_req, in_eeor_mode, so->so_rcv.sb_cc, uio->uio_resid);
T> +		    rwnd_req, in_eeor_mode, so->so_rcv.sb_ccc, uio->uio_resid);
T>  	}
T>  	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) {
T>  		sctp_misc_ints(SCTP_SORECV_ENTERPL,
T> -		    rwnd_req, block_allowed, so->so_rcv.sb_cc, uio->uio_resid);
T> +		    rwnd_req, block_allowed, so->so_rcv.sb_ccc, uio->uio_resid);
T>  	}
T>  	error = sblock(&so->so_rcv, (block_allowed ? SBL_WAIT : 0));
T>  	if (error) {
T> @@ -5255,7 +5255,7 @@ restart_nosblocks:
T>  	    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) {
T>  		goto out;
T>  	}
T> -	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_cc == 0)) {
T> +	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_ccc == 0)) {
T>  		if (so->so_error) {
T>  			error = so->so_error;
T>  			if ((in_flags & MSG_PEEK) == 0)
T> @@ -5262,7 +5262,7 @@ restart_nosblocks:
T>  				so->so_error = 0;
T>  			goto out;
T>  		} else {
T> -			if (so->so_rcv.sb_cc == 0) {
T> +			if (so->so_rcv.sb_ccc == 0) {
T>  				/* indicate EOF */
T>  				error = 0;
T>  				goto out;
T> @@ -5269,9 +5269,9 @@ restart_nosblocks:
T>  			}
T>  		}
T>  	}
T> -	if ((so->so_rcv.sb_cc <= held_length) && block_allowed) {
T> +	if ((so->so_rcv.sb_ccc <= held_length) && block_allowed) {
T>  		/* we need to wait for data */
T> -		if ((so->so_rcv.sb_cc == 0) &&
T> +		if ((so->so_rcv.sb_ccc == 0) &&
T>  		    ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
T>  		    (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) {
T>  			if ((inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0) {
T> @@ -5307,7 +5307,7 @@ restart_nosblocks:
T>  		}
T>  		held_length = 0;
T>  		goto restart_nosblocks;
T> -	} else if (so->so_rcv.sb_cc == 0) {
T> +	} else if (so->so_rcv.sb_ccc == 0) {
T>  		if (so->so_error) {
T>  			error = so->so_error;
T>  			if ((in_flags & MSG_PEEK) == 0)
T> @@ -5364,11 +5364,11 @@ restart_nosblocks:
T>  			SCTP_INP_READ_LOCK(inp);
T>  		}
T>  		control = TAILQ_FIRST(&inp->read_queue);
T> -		if ((control == NULL) && (so->so_rcv.sb_cc != 0)) {
T> +		if ((control == NULL) && (so->so_rcv.sb_ccc != 0)) {
T>  #ifdef INVARIANTS
T>  			panic("Huh, its non zero and nothing on control?");
T>  #endif
T> -			so->so_rcv.sb_cc = 0;
T> +			so->so_rcv.sb_ccc = 0;
T>  		}
T>  		SCTP_INP_READ_UNLOCK(inp);
T>  		hold_rlock = 0;
T> @@ -5489,11 +5489,11 @@ restart_nosblocks:
T>  		}
T>  		/*
T>  		 * if we reach here, not suitable replacement is available
T> -		 * <or> fragment interleave is NOT on. So stuff the sb_cc
T> +		 * <or> fragment interleave is NOT on. So stuff the sb_ccc
T>  		 * into the our held count, and its time to sleep again.
T>  		 */
T> -		held_length = so->so_rcv.sb_cc;
T> -		control->held_length = so->so_rcv.sb_cc;
T> +		held_length = so->so_rcv.sb_ccc;
T> +		control->held_length = so->so_rcv.sb_ccc;
T>  		goto restart;
T>  	}
T>  	/* Clear the held length since there is something to read */
T> @@ -5790,10 +5790,10 @@ get_more_data:
T>  					if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_SB_LOGGING_ENABLE) {
T>  						sctp_sblog(&so->so_rcv, control->do_not_ref_stcb ? NULL : stcb, SCTP_LOG_SBFREE, cp_len);
T>  					}
T> -					atomic_subtract_int(&so->so_rcv.sb_cc, cp_len);
T> +					atomic_subtract_int(&so->so_rcv.sb_ccc, cp_len);
T>  					if ((control->do_not_ref_stcb == 0) &&
T>  					    stcb) {
T> -						atomic_subtract_int(&stcb->asoc.sb_cc, cp_len);
T> +						atomic_subtract_int(&stcb->asoc.sb_ccc, cp_len);
T>  					}
T>  					copied_so_far += cp_len;
T>  					freed_so_far += cp_len;
T> @@ -5938,7 +5938,7 @@ wait_some_more:
T>  		    (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE))) {
T>  			goto release;
T>  		}
T> -		if (so->so_rcv.sb_cc <= control->held_length) {
T> +		if (so->so_rcv.sb_ccc <= control->held_length) {
T>  			error = sbwait(&so->so_rcv);
T>  			if (error) {
T>  				goto release;
T> @@ -5965,8 +5965,8 @@ wait_some_more:
T>  				}
T>  				goto done_with_control;
T>  			}
T> -			if (so->so_rcv.sb_cc > held_length) {
T> -				control->held_length = so->so_rcv.sb_cc;
T> +			if (so->so_rcv.sb_ccc > held_length) {
T> +				control->held_length = so->so_rcv.sb_ccc;
T>  				held_length = 0;
T>  			}
T>  			goto wait_some_more;
T> @@ -6113,13 +6113,13 @@ out:
T>  			    freed_so_far,
T>  			    ((uio) ? (slen - uio->uio_resid) : slen),
T>  			    stcb->asoc.my_rwnd,
T> -			    so->so_rcv.sb_cc);
T> +			    so->so_rcv.sb_ccc);
T>  		} else {
T>  			sctp_misc_ints(SCTP_SORECV_DONE,
T>  			    freed_so_far,
T>  			    ((uio) ? (slen - uio->uio_resid) : slen),
T>  			    0,
T> -			    so->so_rcv.sb_cc);
T> +			    so->so_rcv.sb_ccc);
T>  		}
T>  	}
T>  stage_left:
T> Index: sys/netinet/sctputil.h
T> ===================================================================
T> --- sys/netinet/sctputil.h	(.../head)	(revision 266804)
T> +++ sys/netinet/sctputil.h	(.../projects/sendfile)	(revision 266807)
T> @@ -284,10 +284,10 @@ do { \
T>  		} \
T>     	        if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
T>  	            (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
T> -			if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { \
T> -				atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_cc), tp1->book_size); \
T> +			if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { \
T> +				atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_ccc), tp1->book_size); \
T>  			} else { \
T> -				stcb->sctp_socket->so_snd.sb_cc = 0; \
T> +				stcb->sctp_socket->so_snd.sb_ccc = 0; \
T>  			} \
T>  		} \
T>          } \
T> @@ -305,10 +305,10 @@ do { \
T>  		} \
T>     	        if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
T>  	            (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
T> -			if (stcb->sctp_socket->so_snd.sb_cc >= sp->length) { \
T> -				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc,sp->length); \
T> +			if (stcb->sctp_socket->so_snd.sb_ccc >= sp->length) { \
T> +				atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc,sp->length); \
T>  			} else { \
T> -				stcb->sctp_socket->so_snd.sb_cc = 0; \
T> +				stcb->sctp_socket->so_snd.sb_ccc = 0; \
T>  			} \
T>  		} \
T>          } \
T> @@ -320,7 +320,7 @@ do { \
T>  	if ((stcb->sctp_socket != NULL) && \
T>  	    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \
T>  	     (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \
T> -		atomic_add_int(&stcb->sctp_socket->so_snd.sb_cc,sz); \
T> +		atomic_add_int(&stcb->sctp_socket->so_snd.sb_ccc,sz); \
T>  	} \
T>  } while (0)
T>  
T> Index: usr.bin/bluetooth/btsockstat/btsockstat.c
T> ===================================================================
T> --- usr.bin/bluetooth/btsockstat/btsockstat.c	(.../head)	(revision 266804)
T> +++ usr.bin/bluetooth/btsockstat/btsockstat.c	(.../projects/sendfile)	(revision 266807)
T> @@ -255,8 +255,8 @@ hcirawpr(kvm_t *kvmd, u_long addr)
T>  			(unsigned long) pcb.so,
T>  			(unsigned long) this,
T>  			pcb.flags,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			pcb.addr.hci_node);
T>  	}
T>  } /* hcirawpr */
T> @@ -303,8 +303,8 @@ l2caprawpr(kvm_t *kvmd, u_long addr)
T>  "%-8lx %-8lx %6d %6d %-17.17s\n",
T>  			(unsigned long) pcb.so,
T>  			(unsigned long) this,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			bdaddrpr(&pcb.src, NULL, 0));
T>  	}
T>  } /* l2caprawpr */
T> @@ -361,8 +361,8 @@ l2cappr(kvm_t *kvmd, u_long addr)
T>  		fprintf(stdout,
T>  "%-8lx %6d %6d %-17.17s/%-5d %-17.17s %-5d %s\n",
T>  			(unsigned long) this,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			bdaddrpr(&pcb.src, local, sizeof(local)),
T>  			pcb.psm,
T>  			bdaddrpr(&pcb.dst, remote, sizeof(remote)),
T> @@ -467,8 +467,8 @@ rfcommpr(kvm_t *kvmd, u_long addr)
T>  		fprintf(stdout,
T>  "%-8lx %6d %6d %-17.17s %-17.17s %-4d %-4d %s\n",
T>  			(unsigned long) this,
T> -			so.so_rcv.sb_cc,
T> -			so.so_snd.sb_cc,
T> +			so.so_rcv.sb_ccc,
T> +			so.so_snd.sb_ccc,
T>  			bdaddrpr(&pcb.src, local, sizeof(local)),
T>  			bdaddrpr(&pcb.dst, remote, sizeof(remote)),
T>  			pcb.channel,
T> Index: usr.bin/systat/netstat.c
T> ===================================================================
T> --- usr.bin/systat/netstat.c	(.../head)	(revision 266804)
T> +++ usr.bin/systat/netstat.c	(.../projects/sendfile)	(revision 266807)
T> @@ -333,8 +333,8 @@ enter_kvm(struct inpcb *inp, struct socket *so, in
T>  	struct netinfo *p;
T>  
T>  	if ((p = enter(inp, state, proto)) != NULL) {
T> -		p->ni_rcvcc = so->so_rcv.sb_cc;
T> -		p->ni_sndcc = so->so_snd.sb_cc;
T> +		p->ni_rcvcc = so->so_rcv.sb_ccc;
T> +		p->ni_sndcc = so->so_snd.sb_ccc;
T>  	}
T>  }
T>  
T> Index: usr.bin/netstat/netgraph.c
T> ===================================================================
T> --- usr.bin/netstat/netgraph.c	(.../head)	(revision 266804)
T> +++ usr.bin/netstat/netgraph.c	(.../projects/sendfile)	(revision 266807)
T> @@ -119,7 +119,7 @@ netgraphprotopr(u_long off, const char *name, int
T>  		if (Aflag)
T>  			printf("%8lx ", (u_long) this);
T>  		printf("%-5.5s %6u %6u ",
T> -		    name, sockb.so_rcv.sb_cc, sockb.so_snd.sb_cc);
T> +		    name, sockb.so_rcv.sb_ccc, sockb.so_snd.sb_ccc);
T>  
T>  		/* Get info on associated node */
T>  		if (ngpcb.node_id == 0 || csock == -1)
T> Index: usr.bin/netstat/unix.c
T> ===================================================================
T> --- usr.bin/netstat/unix.c	(.../head)	(revision 266804)
T> +++ usr.bin/netstat/unix.c	(.../projects/sendfile)	(revision 266807)
T> @@ -287,7 +287,8 @@ unixdomainpr(struct xunpcb *xunp, struct xsocket *
T>  	} else {
T>  		printf("%8lx %-6.6s %6u %6u %8lx %8lx %8lx %8lx",
T>  		    (long)so->so_pcb, socktype[so->so_type], so->so_rcv.sb_cc,
T> -		    so->so_snd.sb_cc, (long)unp->unp_vnode, (long)unp->unp_conn,
T> +		    so->so_snd.sb_cc, (long)unp->unp_vnode,
T> +		    (long)unp->unp_conn,
T>  		    (long)LIST_FIRST(&unp->unp_refs),
T>  		    (long)LIST_NEXT(unp, unp_reflink));
T>  	}
T> Index: usr.bin/netstat/inet.c
T> ===================================================================
T> --- usr.bin/netstat/inet.c	(.../head)	(revision 266804)
T> +++ usr.bin/netstat/inet.c	(.../projects/sendfile)	(revision 266807)
T> @@ -137,7 +137,7 @@ pcblist_sysctl(int proto, const char *name, char *
T>  static void
T>  sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb)
T>  {
T> -	xsb->sb_cc = sb->sb_cc;
T> +	xsb->sb_cc = sb->sb_ccc;
T>  	xsb->sb_hiwat = sb->sb_hiwat;
T>  	xsb->sb_mbcnt = sb->sb_mbcnt;
T>  	xsb->sb_mcnt = sb->sb_mcnt;
T> @@ -479,7 +479,8 @@ protopr(u_long off, const char *name, int af1, int
T>  				printf("%6u %6u %6u ", tp->t_sndrexmitpack,
T>  				       tp->t_rcvoopack, tp->t_sndzerowin);
T>  		} else {
T> -			printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc);
T> +			printf("%6u %6u ",
T> +			    so->so_rcv.sb_cc, so->so_snd.sb_cc);
T>  		}
T>  		if (numeric_port) {
T>  			if (inp->inp_vflag & INP_IPV4) {



-- 
Totus tuus, Glebius.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sendfile.diff
Type: text/x-diff
Size: 123147 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20140831/1c7fba68/attachment-0001.diff>

