rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Andreas Longwitz
longwitz at incore.de
Sun Nov 18 13:32:55 UTC 2018
Thank you all for explaining in detail how counter(9) works.
> A single CPU instruction is atomic by definition, with regards to the CPU.
> A preemption can not happen in a middle of instruction. What the "lock"
> prefix does is memory locking to avoid unlocked parallel access to the
> same address by different CPUs.
OK, my view of "atomic" in this context was wrong.
> No, it does not look correct. The only atomicity guarantee that is required
> from the counter.h inc and zero methods are atomicity WRT context switches.
> The instructions are always executed on the CPU which owns the PCPU element
> in the counter array, and since the update is executed as single instruction,
> it does not require more expensive cache line lock AKA LOCK prefix. This
> is the main feature of the counters on x86.
>
> It might read bogus value when fetching the counter but counter.h KPI only
> guarantee is that the readouts are mostly correct. If you have systematically
> wrong value always read, there is probably something different going on.
On one of my two failing servers I have eliminated all "rdr pass" rules,
so counter(9) is not used at the moment for pf_default_rule.states_cur.
Using DTrace I can see the negative value -49 for this counter:
CPU ID FUNCTION:NAME
3 1 :BEGIN
feature=bfebfbff, ncpus=4
pf_default_rule.states_cur=0xc82cb3c8
0xc82cb3c8: counter0=0x00000000007bd25b
0xc82cb7c8: counter1=0xffffffffffd32262
0xc82cbbc8: counter2=0xffffffffffd87de1
0xc82cbfc8: counter3=0xffffffffffd88d31
counter =0xffffffffffffffcf
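As a cross-check of the output above, the four per-CPU values can be added in userland with wrapping 64-bit arithmetic. This is only a minimal sketch: sum_pcpu() and print_sum() are hypothetical helper names, not kernel functions, and the array simply repeats the values printed by DTrace.

```c
#include <stdint.h>
#include <stdio.h>

/* Per-CPU values copied from the DTrace output (counter0..counter3). */
static const uint64_t pcpu_values[4] = {
	0x00000000007bd25bULL,
	0xffffffffffd32262ULL,
	0xffffffffffd87de1ULL,
	0xffffffffffd88d31ULL,
};

/*
 * Hypothetical userland stand-in for a counter fetch: add up all
 * per-CPU slots with ordinary unsigned (wrapping) 64-bit arithmetic.
 */
static uint64_t
sum_pcpu(const uint64_t *slots, int ncpus)
{
	uint64_t total = 0;

	for (int i = 0; i < ncpus; i++)
		total += slots[i];
	return (total);
}

/* Print the sum as raw hex and as a signed 64-bit value. */
static void
print_sum(void)
{
	uint64_t total = sum_pcpu(pcpu_values, 4);

	/* prints "counter = 0xffffffffffffffcf (-49)" */
	printf("counter = 0x%016llx (%lld)\n",
	    (unsigned long long)total, (long long)(int64_t)total);
}
```

The individual slots may legitimately be "negative" (a state can be created on one CPU and removed on another); only the sum over all CPUs is expected to stay non-negative.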
On my other affected server I have introduced a panic call that fires as
soon as the counter value returned by counter_u64_fetch() in
pf_state_expires() becomes negative. So I will wait for the panic and hope
for more information from the kernel dump.
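The test for "negative" needs some care, because counter_u64_fetch() returns an unsigned 64-bit value, so a plain "value < 0" comparison can never be true. A minimal userland sketch of such a test follows; counter_value_is_negative() is a hypothetical helper name and not part of pf or counter(9).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of pf: report whether a fetched counter
 * value would be negative if read as a signed 64-bit integer, i.e.
 * whether its top bit is set.
 */
static bool
counter_value_is_negative(uint64_t value)
{
	return (value > (uint64_t)INT64_MAX);
}
```

In the kernel, a true result at this point would be the condition under which panic(9) is called; where exactly the check sits in pf_state_expires() is not shown here.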
There is one unusual configuration on the two servers: they use pf and
ipfw/ipdivert at the same time. The reason for this is my use of natd for
incoming ftp requests to my ftp server; pf handles all the other
traffic. This configuration is a little bit tricky but has worked
correctly for many years.
One recent exception was a problem with a buggy remote ftp client that I
had to debug. During this period I had to restart/reload ipfw and natd a
couple of times. Because pf also has a reference to ipdivert, perhaps
there is a hidden interaction with the expire problem of pf.
Note: the buggy ftp client revealed a problem in natd (PR 230755).
Regards,
Andreas