Re: epair and vnet jail loose connection.
Date: Thu, 10 Mar 2022 14:31:33 UTC
On 10/03/2022 13:37, Wolfgang Zenker wrote:
> Hi Kristof,
>
> On Thu, Mar 10, 2022 at 12:44:00PM +0100, Kristof Provost wrote:
>> On 10 Mar 2022, at 10:13, Johan Hendriks wrote:
>>> On 10/03/2022 08:54, Patrick M. Hausen wrote:
>>>> Hi Johan,
>>>>
>>>> we are seeing the same on 13.1-PRERELEASE. We are currently trying to collect some evidence
>>>> (dtrace) to send to Kristof Provost, who was kind enough to assist. The problem hits us
>>>> in production at 12-24 hour intervals. We have not done any artificial load tests yet.
>>>>
>>>> May I ask you to run this dtrace script while at least one jail is disconnected and while
>>>> traffic is trying to reach the jail? If you can afford to do that in production,
>>>> that would be great. Please forward the output to Kristof (kp@).
>>>>
>>>> Thanks and kind regards
>>>> Patrick
>>>> ----------
>>>> #!/usr/sbin/dtrace -s
>>>>
>>>> BEGIN
>>>> {
>>>>     self->in_menq = 0;
>>>> }
>>>>
>>>> fbt:if_epair:epair_menq:entry
>>>> {
>>>>     self->in_menq = 1;
>>>>     printf("In epair_menq");
>>>> }
>>>>
>>>> fbt:if_epair:epair_menq:return
>>>> / self->in_menq == 1 /
>>>> {
>>>>     self->in_menq = 0;
>>>>     printf("Leave epair_menq");
>>>> }
>>>>
>>>> fbt:kernel:taskqueue_enqueue:entry
>>>> / self->in_menq == 1 /
>>>> {
>>>>     printf("Enqueue task");
>>>> }
>>>>
>>>> fbt:if_epair:epair_tx_start_deferred:entry
>>>> {
>>>>     printf("epair_tx_start_deferred");
>>>> }
>>>> ----------
>>>>
>>> I was asked for the above, so here is the output of that command.
>>> I ran hey -h2 -n 10 -c 10 -z 60s https://wp.test.nl against that machine, and within
>>> those 60 seconds the jail became unresponsive. Then I ran the dtrace.sh script above
>>> like so: /root/bin/dtrace.sh > /root/dtrace_output
>>>
>>> I hope this helps; if you need anything, please let me know. Root access is also
>>> possible if you want it, so that you do not have to create a test environment.
>> Were there other epair interfaces running at this time, with active traffic?
>> The dtrace output appears to show that the appropriate callouts (to epair_tx_start_deferred()) are getting through, so I’d expect traffic to be flowing.
> There is a second jail using epair on that system, attached to the same
> bridge as well. This second jail is a low-traffic system; it is unlikely
> but possible that there was some traffic during that time.
> In all previous cases this second jail remained reachable
> the whole time.
>
> Wolfgang
>
I am running 13-STABLE from 01-02-2022 and cannot replicate this. I will step ahead a week, rebuild, and try again.
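Kristof's reading of the trace (that epair_tx_start_deferred() is still being called) can be checked mechanically by counting how often each of the script's printf() messages appears in /root/dtrace_output. Below is a minimal Python sketch of that; the helper names and script name are hypothetical, and it assumes dtrace's default output layout, where the printf() text ends each probe line after the CPU, ID and FUNCTION:NAME columns. The "Enqueue task" and "epair_tx_start_deferred" counts need not match one-to-one, because taskqueue(9) does not queue a task a second time while it is still pending; the interesting case is enqueues with no deferred calls at all.

```python
import sys
from collections import Counter

# Messages printed by the quoted dtrace script, one per probe.
MESSAGES = (
    "In epair_menq",
    "Leave epair_menq",
    "Enqueue task",
    "epair_tx_start_deferred",
)


def count_probe_messages(lines):
    """Count how often each message ends a line of dtrace output.

    Assumption: dtrace's default format, where the printf() text is the
    last thing on each probe line.
    """
    counts = Counter({msg: 0 for msg in MESSAGES})
    for line in lines:
        text = line.rstrip()
        for msg in MESSAGES:
            if text.endswith(msg):
                counts[msg] += 1
                break
    return counts


def deferred_handler_stalled(counts):
    """Hypothetical heuristic: tasks were enqueued but the deferred
    transmit handler never ran during the capture."""
    return counts["Enqueue task"] > 0 and counts["epair_tx_start_deferred"] == 0


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        result = count_probe_messages(f)
    for msg in MESSAGES:
        print(f"{msg}: {result[msg]}")
    if deferred_handler_stalled(result):
        print("tasks were enqueued, but epair_tx_start_deferred never ran")
```

Usage would be something like python3 count_epair_probes.py /root/dtrace_output, run against the capture taken while the jail was unreachable.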