Tor on FreeBSD Performance issues
Julian Wissmann
julianwissmann at gmail.com
Sun Feb 12 15:18:34 UTC 2012
Hi
>
> On 11 Feb 2012, at 00:06, Steven Murdoch wrote:
>
>> On 10 Feb 2012, at 22:22, Robert N. M. Watson wrote:
>>> I wonder if we're looking at some sort of difference in socket buffer tuning between Linux and FreeBSD that is leading to better link utilisation under this workload. Both FreeBSD and Linux auto-tune socket buffer sizes, but I'm not sure if their policies for enabling/etc auto-tuning differ. Do we know if Tor fixes socket buffer sizes in such a way that it might lead to FreeBSD disabling auto-tuning?
>>
>> If ConstrainedSockets is set to 1 (it defaults to 0), then Tor will "setsockopt(sock, SOL_SOCKET, SO_SNDBUF" and "setsockopt(sock, SOL_SOCKET, SO_RCVBUF" to ConstrainedSockSize (defaults to 8192). Otherwise I don't see any fiddling with buffer size. So I'd first confirm that ConstrainedSockets is set to zero, and perhaps try experimenting with it on for different values of ConstrainedSockSize.
> In FreeBSD, I believe the current policy is that any TCP socket that doesn't have a socket option specifically set will be auto-tuning. So it's likely that, as long as ConstrainedSockSize isn't set, auto-tuning is enabled.
This is set to zero in Tor.
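For reference, the calls in question look roughly like this; a minimal compilable sketch based on the description above, not Tor's actual code, and constrain_socket_buffers is just an illustrative name:

#include <sys/types.h>
#include <sys/socket.h>

/* Minimal sketch (not Tor's actual code): pinning a socket's buffer
 * sizes the way the ConstrainedSockets option is described to do.
 * Explicitly setting SO_SNDBUF/SO_RCVBUF is what would opt the socket
 * out of the kernel's buffer auto-tuning. */
static int
constrain_socket_buffers(int sock, int size)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0)
        return -1;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) < 0)
        return -1;
    return 0;
}

int
main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);

    /* 8192 is the ConstrainedSockSize default quoted above. */
    return (s >= 0 && constrain_socket_buffers(s, 8192) == 0) ? 0 : 1;
}

Since ConstrainedSockets is zero here, none of this should be happening, so auto-tuning ought to remain in effect.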
>
>>> I'm a bit surprised by the out-of-order packet count -- is that typical of a Tor workload, and can we compare similar statistics on other nodes there? This could also be a symptom of TCP reassembly queue issues. Lawrence: did we get the fixes in place there to do with the bounded reassembly queue length, and/or are there any workarounds for that issue? Is it easy to tell if we're hitting it in practice?
>>
>> I can't think of any inherent reason for excessive out-of-order packets, as the host TCP stack is used by all Tor nodes currently. It could be that some network connections from users are bad (we have plenty of dial-up users).
>
> I guess what I'm wondering about is relative percentages. Out-of-order packets can also arise as a result of network stack bugs, and might explain a lower aggregate bandwidth. The netstat -Q options I saw in the forwarded e-mail suggest that the scenarios that could lead to this aren't present, but since it stands out, it would be worth trying to explain just to convince ourselves it's not a stack bug.
As we have two boxes with identical configuration in the same datacenter, I can give some Linux output, too:
# netstat -s
Ip:
1099780169 total packets received
0 forwarded
0 incoming packets discarded
2062308427 incoming packets delivered
2800933295 requests sent out
694 outgoing packets dropped
798042 fragments dropped after timeout
143378847 reassemblies required
45697700 packets reassembled ok
18522117 packet reassembles failed
1070 fragments received ok
761 fragments failed
28174 fragments created
Icmp:
92792968 ICMP messages received
18458681 input ICMP message failed.
ICMP input histogram:
destination unreachable: 73204262
timeout in transit: 6996342
source quenches: 813143
redirects: 9100882
echo requests: 1646656
echo replies: 5
2005869 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 359208
echo request: 5
echo replies: 1646656
IcmpMsg:
InType0: 5
InType3: 73204262
InType4: 813143
InType5: 9100882
InType8: 1646656
InType11: 6996342
OutType0: 1646656
OutType3: 359208
OutType8: 5
Tcp:
4134119965 active connections openings
275823710 passive connection openings
2002550589 failed connection attempts
199749970 connection resets received
31931 connections established
1839369825 segments received
3631158795 segments send out
3353305069 segments retransmited
2152248 bad segments received.
237858281 resets sent
Udp:
129942286 packets received
203329 packets to unknown port received.
0 packet receive errors
109523321 packets sent
UdpLite:
TcpExt:
7088 SYN cookies sent
15275 SYN cookies received
3196797 invalid SYN cookies received
1093456 resets received for embryonic SYN_RECV sockets
36073572 packets pruned from receive queue because of socket buffer overrun
77060 packets pruned from receive queue
232 packets dropped from out-of-order queue because of socket buffer overrun
362884 ICMP packets dropped because they were out-of-window
85 ICMP packets dropped because socket was locked
673831896 TCP sockets finished time wait in fast timer
48600 time wait sockets recycled by time stamp
2013223394 delayed acks sent
3477567 delayed acks further delayed because of locked socket
Quick ack mode was activated 440274027 times
35711291 times the listen queue of a socket overflowed
35711291 SYNs to LISTEN sockets dropped
457 packets directly queued to recvmsg prequeue.
1460 bytes directly in process context from backlog
48211 bytes directly received in process context from prequeue
1494466591 packet headers predicted
33 packets header predicted and directly queued to user
4257229715 acknowledgments not containing data payload received
740819251 predicted acknowledgments
442309 times recovered from packet loss due to fast retransmit
197193098 times recovered from packet loss by selective acknowledgements
494378 bad SACK blocks received
Detected reordering 221053 times using FACK
Detected reordering 1053064 times using SACK
Detected reordering 72059 times using reno fast retransmit
Detected reordering 4265 times using time stamp
336672 congestion windows fully recovered without slow start
356482 congestion windows partially recovered using Hoe heuristic
41059770 congestion windows recovered without slow start by DSACK
54306977 congestion windows recovered without slow start after partial ack
245685510 TCP data loss events
TCPLostRetransmit: 7881258
421631 timeouts after reno fast retransmit
70726251 timeouts after SACK recovery
26797894 timeouts in loss state
349218987 fast retransmits
19632788 forward retransmits
224201891 retransmits in slow start
2441482671 other TCP timeouts
220051 classic Reno fast retransmits failed
22663942 SACK retransmits failed
160105897 packets collapsed in receive queue due to low socket buffer
568326755 DSACKs sent for old packets
12316261 DSACKs sent for out of order packets
157800118 DSACKs received
1008695 DSACKs for out of order packets received
2043 connections reset due to unexpected SYN
48512275 connections reset due to unexpected data
15085625 connections reset due to early user close
1702109944 connections aborted due to timeout
TCPSACKDiscard: 231850
TCPDSACKIgnoredOld: 99417376
TCPDSACKIgnoredNoUndo: 33053947
TCPSpuriousRTOs: 5163955
TCPMD5Unexpected: 8
TCPSackShifted: 290984575
TCPSackMerged: 613203726
TCPSackShiftFallback: 747049207
IpExt:
InBcastPkts: 12617896
OutBcastPkts: 1456356
InOctets: -1096131435
OutOctets: -1263483369
InBcastOctets: -2144923256
OutBcastOctets: 187483424
>
>>> On the other hand, I think Steven had mentioned that Tor has changed how it does exit node load distribution to better take into account realised rather than advertised bandwidth. If that's the case, you might get larger systemic effects causing feedback: if you offer slightly less throughput then you get proportionally less traffic. This is something I can ask Steven about on Monday.
>>
>> There is active probing of capacity, which then is used to adjust the weighting factors that clients use.
>
> So there is a chance that the effect we're seeing has to do with clients not being directed to the host, perhaps due to larger systemic issues, or the FreeBSD box responding less well to probing and therefore being assigned less work by Tor as a whole. Are there any tools for diagnosing these sorts of interactions in Tor, or fixing elements of the algorithm to allow experiments with capacity to be done more easily? We can treat this as a FreeBSD stack problem in isolation, but in as much as we can control for effects like that, it would be useful.
>
> There's a non-trivial possibility that we're simply missing a workaround for known-bad Broadcom hardware, as well, so it would be worth our taking a glance at the pciconf -lv output describing the card so we can compare Linux driver workarounds with FreeBSD driver workarounds, and make sure we have them all. If I recall correctly, that silicon is not known for its correctness, so failing to disable some hardware feature could have significant effect.
# pciconf -lv
bge0@pci0:32:0:0:   class=0x020000 card=0x705d103c chip=0x165b14e4 rev=0x10 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme BCM5723 Gigabit Ethernet PCIe'
    class      = network
    subclass   = ethernet
bge1@pci0:34:0:0:   class=0x020000 card=0x705d103c chip=0x165b14e4 rev=0x10 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme BCM5723 Gigabit Ethernet PCIe'
    class      = network
    subclass   = ethernet
>
>>> Could someone remind me if Tor is multi-threaded these days, and if so, how socket I/O is distributed over threads?
>>
>> I believe that Tor is single-threaded for the purposes of I/O. Some server operators with fat pipes have had good experiences of running several Tor instances in parallel on different ports to increase bandwidth utilisation.
>
> It would be good to confirm the configuration in this particular case to make sure we understand it. It would also be good to know if the main I/O thread in Tor is saturating the core it's running on -- if so, we might be looking at some poor behaviour relating to, for example, frequent timestamp checking, which is currently more expensive on FreeBSD than Linux.
We have two Tor processes running. Tor still only uses multi-threading for crypto work, and not even for all of that (only onionskins). With polling enabled I actually got both Tor processes to nearly saturate the cores they were running on, but now that I have disabled polling and gone back to 1000 HZ I don't get there. Currently one process is at 60% WCPU, the other at about 50%.
As has been asked: yes, it is a FreeBSD 9 box, and no, there is no net.inet.tcp.inflight.enable.
Also, libevent is using kqueue, and I've tried patching both Tor and libevent to use CLOCK_MONOTONIC_FAST and CLOCK_REALTIME_FAST, as suggested by Alexander.
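For anyone following along, the change amounts to passing a different clock id to clock_gettime(); a minimal standalone illustration (not the actual patch), assuming the FreeBSD headers define CLOCK_MONOTONIC_FAST:

#include <time.h>
#include <stdio.h>

/* Illustration only: on FreeBSD, CLOCK_MONOTONIC_FAST returns a
 * coarser timestamp than CLOCK_MONOTONIC but avoids the more
 * expensive timecounter read, which matters when timestamps are
 * taken very frequently. */
int
main(void)
{
    struct timespec ts;

#ifdef CLOCK_MONOTONIC_FAST
    clock_gettime(CLOCK_MONOTONIC_FAST, &ts);
#else
    clock_gettime(CLOCK_MONOTONIC, &ts);
#endif
    printf("%lld.%09ld\n", (long long)ts.tv_sec, (long)ts.tv_nsec);
    return 0;
}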
If by flow cache you mean net.inet.flowtable: I believe those sysctls only show up when IP forwarding is enabled, which it is not here, and indeed net.inet.flowtable is not available on this box.
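In case it is useful, here is a quick standalone way to check for that node programmatically rather than via sysctl(8); the OID name net.inet.flowtable.enable is an assumption on my part and may differ:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Probe whether the flowtable sysctl node exists at all.  If the
 * kernel does not expose it, sysctlbyname() fails with ENOENT. */
int
main(void)
{
    int enabled;
    size_t len = sizeof(enabled);

    /* "net.inet.flowtable.enable" is assumed here; adjust to
     * whatever OID the running kernel actually exposes. */
    if (sysctlbyname("net.inet.flowtable.enable", &enabled, &len,
        NULL, 0) == -1) {
        printf("flowtable not available: %s\n", strerror(errno));
        return 1;
    }
    printf("flowtable enable = %d\n", enabled);
    return 0;
}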
Also some sysctls as requested:
kern.ipc.somaxconn=16384
kern.ipc.maxsockets=204800
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=200000
net.inet.tcp.recvbuf_max=10485760
net.inet.tcp.recvbuf_inc=65535
net.inet.tcp.sendbuf_max=10485760
net.inet.tcp.sendbuf_inc=65535
net.inet.tcp.sendspace=10485760
net.inet.tcp.recvspace=10485760
net.inet.tcp.delayed_ack=0
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=1024
net.inet.tcp.rfc1323=0
net.inet.tcp.maxtcptw=200000
net.inet.ip.intr_queue_maxlen=4096 (net.inet.ip.intr_queue_drops is zero)
net.inet.tcp.ecn.enable=1
net.inet.ip.portrange.reservedlow=0
net.inet.ip.portrange.reservedhigh=0
net.inet.ip.portrange.hifirst=1024
security.mac.portacl.enabled=1
security.mac.portacl.suser_exempt=1
security.mac.portacl.port_high=1023
security.mac.portacl.rules=uid:80:tcp:80
security.mac.portacl.rules=uid:256:tcp:443
Thanks for the replies and all of this information.
Julian