Re: compressed TIME-WAIT to be decommissioned
- Reply: Gleb Smirnoff : "Re: compressed TIME-WAIT to be decommissioned"
- In reply to: Gleb Smirnoff : "compressed TIME-WAIT to be decommissioned"
Date: Wed, 12 Jan 2022 22:01:51 UTC
Removed current@ given your comment below.

On 12 Jan 2022, at 13:48, Gleb Smirnoff wrote:

> Hi!
>
> [crossposted to current@, but let's keep discussion at net@]
>
> I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and
> Igor Sysoev (author of nginx). Now posting for wider discussion.
>
> TLDR: struct tcptw shall be decommissioned
>
> Longer version covers three topics: why does tcptw exist? why is it no
> longer necessary? what would we get removing it?
>
> Why does struct tcptw exist?
>
> When a TCP connection goes to TIME-WAIT state, it can only retransmit
> the very last ACK, thus doesn't need all of the control data in the kernel.
> However, we are required to keep it in memory for a certain amount of time
> (2*MSL). So, let's save memory: free the socket, free the tcpcb and
> leave only the inpcb, which will point at a small tcptw (much smaller than
> tcpcb) that holds enough info to retransmit the last ACK. This was done in
> early 2003, see 340c35de6a2.
>
> What was different in 2003 compared to 2022?
>
> * First of all, internet servers were running i386 with only 2 GB of KVA
>   space. Unlike today, they were memory constrained in the first place, not
>   CPU bound like they are today.
>
> * Many HTTP connections were made by older browsers, which were not able
>   to use persistent HTTP connections. Those browsers that could would
>   recycle connections more often than today. Default timeouts in Apache
>   for persistent connections were short. So, the ratio of connections
>   in TIME-WAIT compared to live connections was much bigger than today.
>   Here is sample data from 2008 provided to me by Igor Sysoev:
>
>   ITEM     SIZE   LIMIT    USED   FREE   REQUESTS  FAILURES
>   tcpcb:    728, 163840,  22938, 72722, 13029632,        0
>   tcptw:     88, 163842,  10253, 72949,  2447928,        0
>
>   We see that TIME-WAITs are ~ 50% of live connections.
>
>   Today I see that TIME-WAITs are ~ 1% of connections. My data is biased
>   here, since I'm looking at servers that do mostly video streaming.
>   I'd be grateful if anybody replies to this email with some other modern
>   data on the ratio between tcpcb and tcptw allocations.
>
> * Internet bandwidth was lower and thus the average size of an HTTP object
>   was much smaller. That made the average send socket buffer size much
>   smaller than today. Note that TCP socket buffer autosizing came only in
>   2009. This means that today the most significant portion of kernel memory
>   consumed by an average TCP connection is the send socket buffer, and
>   socket+inpcb+tcpcb is just a fraction of that. Thus, by swapping tcpcb
>   for tcptw we are saving a fraction of a fraction of the memory consumed
>   by an average connection.
>
> * Who said that 2*MSL (60 seconds) is an adequate time to keep TIME-WAIT?
>   In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a
>   bit with lowering net.inet.tcp.msl. It turned out that lowering it by a
>   factor of three doesn't have a statistically significant effect on
>   TIME-WAIT use stats. This means that the already minuscule number of
>   TIME-WAIT connections on a modern HTTP server can be lowered 3 times
>   more. Feel free to lower net.inet.tcp.msl and do your own measurements
>   with 'netstat -sp tcp | grep TIME-WAIT'. I'd be glad to see your results.

The 2*MSL value is pretty old and comes from a different type of network,
but my understanding is that your proposal does not change this value
anyway, is that correct? The removal of tcptw is a separate issue, if I
understand you correctly.

> Ok, now what would removal give us?
>
> * One less alloc/free during socket lifetime (immediately).
> * Reduced code complexity. inp->inp_ppcb can always be dereferenced as
>   tcpcb. Lots of checking for inp->inp_flags & INP_TIMEWAIT goes away
>   (eventually).
> * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS.
>   The reason for that is obvious - compressed TIME-WAIT. An HPTS-driven
>   connection may transition to TIME-WAIT, so we can't use tcpcb. Now we
>   would be able to.
>   So, for non-TCP connections the memory footprint shrinks (with following
>   changes).
> * Embedding inpcb into the protocol's cb. An inpcb becomes one piece of
>   memory with tcpcb. One more alloc/free saved during socket lifetime.
>   Reduced code complexity, since now inpcb == tcpcb (following changes).
>
> How much memory are we going to lose?
>
> (kgdb) p tcpcb_zone->uz_keg->uk_rsize
> $5 = 1064
> (kgdb) p tcptw_zone->uz_keg->uk_rsize
> $6 = 72
> (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize
> $8 = 424
>
> After the change a connection in TIME-WAIT would consume 424+1064 bytes
> instead of 424+72. Multiply that by the expected number of connections in
> TIME-WAIT on your machine.
>
> Comments welcome.

This all seems fine and I'm interested to see the proposed patch. Even the
smallest embedded machines that FreeBSD runs on without modification (i.e.
just install/run) have plenty of memory at this point. If someone really
wants to create a very small FreeBSD-based web server, then they'll care,
but they can probably come up with another way to handle their memory needs.

Best,
George
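
The memory estimate in the quoted message can be worked through for a
concrete load. The sketch below is only an illustration, not part of any
proposed patch: it assumes the item sizes from the kgdb output (424, 1064 and
72 bytes) and uses the 2008 tcptw USED count (10253) as a sample TIME-WAIT
population. The function name timewait_memory is hypothetical, and socket
buffers and allocator overhead are ignored.

```python
# Item sizes in bytes, taken from the kgdb uk_rsize output in the quoted message.
INPCB_SIZE = 424
TCPCB_SIZE = 1064
TCPTW_SIZE = 72


def timewait_memory(n_timewait):
    """Return (bytes with tcptw, bytes without tcptw) for n TIME-WAIT connections.

    Hypothetical helper for illustration only: it multiplies the per-connection
    sizes by the connection count and ignores everything else.
    """
    with_tcptw = n_timewait * (INPCB_SIZE + TCPTW_SIZE)
    without_tcptw = n_timewait * (INPCB_SIZE + TCPCB_SIZE)
    return with_tcptw, without_tcptw


if __name__ == "__main__":
    # 10253 is the tcptw USED count from the 2008 sample in the quoted message.
    before, after = timewait_memory(10253)
    extra = after - before
    print(f"with tcptw:    {before} bytes")
    print(f"without tcptw: {after} bytes")
    print(f"extra cost:    {extra} bytes ({extra / 2**20:.1f} MiB)")
```

With the 2008 connection count this comes to roughly 9.7 MiB of extra memory;
with TIME-WAITs at ~1% of connections the figure would be correspondingly
smaller.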