ipfw named objejcts, table values and syntax change
Alexander V. Chernikov
melifaro at FreeBSD.org
Fri Aug 15 13:25:13 UTC 2014
On 08.08.2014 16:11, Dmitry Selivanov wrote:
> 04.08.2014 23:51, Alexander V. Chernikov пишет:
>> On 04.08.2014 15:58, Luigi Rizzo wrote:
>>> On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote:
>>>> On 02.08.2014 12:33, Alexander V. Chernikov wrote:
>>>>> On 02.08.2014 10:33, Luigi Rizzo wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov
>>>>>> <melifaro at freebsd.org <mailto:melifaro at freebsd.org>> wrote:
>>>>>>
>>>>>> Hello all.
>>>>>>
>>>>>> I'm currently working on to enhance ipfw in some areas.
>>>>>> The most notable (and user-visible) change is named table
>>>>>> support.
>>>>>> The other one is support for different lookup algorithms
>>>>>> for different
>>>>>> key types.
>>>>>>
>>>>>> For example, new ipfw permits writing this:
>>>>>>
>>>>>> ipfw table tb1 create type cidr
>>>>>> ipfw add allow ip from table(tl1) to any
>>>>>> ipfw add allow ip from any lookup dst-ip tb1
>>>>>>
>>>>>> ipfw table if1 create type iface
>>>>>> ipfw add skipto tablearg ip from any to any via table(if1)
>>>>>>
>>>>>> or even this:
>>>>>> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port
>>>>>> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444
>>>>>> ipfw add allow ip from any to any flow table(fl1)
>>>>>>
>>>>>> all these changes fully preserve backward compatibility.
>>>>>> (actually tables needs now to be created before use and
>>>>>> their type needs
>>>>>> to match with opcode used, but new ipfw(8) performs
>>>>>> auto-creation
>>>>>> for cidr tables).
>>>>>>
>>>>>> There is another thing I'm going to change and I'm not sure
>>>>>> I can keep
>>>>>> the same compatibility level.
>>>>>>
>>>>>> Table values, from one point of view, can be classified to
>>>>>> the following
>>>>>> types:
>>>>>>
>>>>>> - skipto argument
>>>>>> - fwd argument (*)
>>>>>> - link to another object (nat, pipe, queue)
>>>>>> - plain u32 (not bound to any object)
>>>>>> (divert/tee,netgraph,tag/utag,limit)
>>>>>>
>>>>>> There are the following reasons why I think it is necessary
>>>>>> to implement
>>>>>> explicit table values typing (like tables):
>>>>>> - Implementing fwd tablearg for IPv6 hosts requires
>>>>>> indirection table
>>>>>> - Converting nat/pipe instance ids to names renders values
>>>>>> unusable
>>>>>> - retiring old hack with storing saved pointer of found
>>>>>> object/rule
>>>>>> inside rule w/o proper locking
>>>>>> - making faster skipto
>>>>>>
>>>>>>
>>>>>> ??????i don't buy the idea that you need typed arguments
>>>>>> for all the cases above. Maybe the case that
>>>>>> may make sense is the fwd argument (and in the future
>>>>>> something else).
>>>>>> We already discussed, i think, the fact that now it
>>>>>> is legal to have references to non existing things
>>>>>> (skipto, pipes etc.) implemented as u32.
>>>>>> Removing that would break configurations.
>>>>> It depends on actual implementation. This can be preserved by
>>>>> auto-creating necessary objects in kernel and/or in userspace, so
>>>>> we can (and should) avoid breaking in this particular way.
>>>> Can you please explain your vision on values another time?
>>>> As far as I understand, you're not against it in general, but the
>>>> details matter:
>>>> * IP address can be one of the types (it won't break much, and we can
>>>> simply skip that one for MFC)
>>>> * what about typing for nat/pipes ? we're not going to convert
>>>> their ids
>>>> to names? (or maybe you can suggest other non-disruptive way?)
>>>> * everything else is type "u32"
>>>
>>> Correct, I am mostly concerned about the details, not on the general
>>> concept.
>>>
>>> To summarize the discussion Alexander and I had about converting
>>> identifiers from numbers to arbitrary strings (this is partly related
>>> to the values stored in tables, but I think we should have a coherent
>>> behaviour)
>>>
>>> 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits
>>> or less)
>>> for rules, pipes, queues, tables, probably nat instances.
>>>
>>> 2. CURRENTLY, in all the above contexts, it is legal to reference a
>>> non existing object (rule, pipe, table names, etc.),
>>> and the kernel will do something reasonable, namely jump to the
>>> next rule, drop traffic for non existing pipes, and so on.
>>>
>>> 3. of course we want to preserve backward compatibility both for
>>> the ioctl interface, and for user configurations.
>>>
>>> 4. The in-kernel representation of identifiers is not visible to users,
>>> so we can use a numeric representation in the kernel for
>>> identifiers.
>>> Strings like "12345" are converted with atoi() or the like,
>>> whereas for other identifiers or numbers outside of the 2^16 range
>>> the kernel manages a translation table, allocating new numeric
>>> identifiers if a new string appears.
>>> This permits backward compatibility for old rulesets, and does not
>>> impact performance because the translation table is only
>>> used during rules additions or deletion.
>> Yes. However this requires either holding either (1) 2 pointers (old&new
>> arrays), or (2) 65k+ index array, or (3) chained hash table.
>> (1) would require additional pointers for each subsystem (and some
>> additional management),
>> (2) will definitely upset embedded guys and
>> (3) is worse in terms of performance
>>>
>>> With this in mind, i think we should follow a similar approach for
>>> objects stored in tables, hence
>>>
>>> if an u32 value was available in the past, it must be
>>> available also in the new implementation.
>>>
>>> The issue with tables is that some convoluted configuration could
>>> use the same table to reference pipes _and_ rules _and_ perhaps
>>> other things represented as numbers (the former is not too strange,
>>> if i have a large configuration i might place sections at rules
>>> 12000, 13000, 14000... and associate pipes with the same numberic
>>> identifier to each block of rules).
>>>
>>> Typed table values would clearly disturb backward compatibility
>>> in the above configurations. However it should not be difficult
>>> to accept arbitrary strings as the values stored in tables, and
>>> then store multiple representations as appropriate, including:
>> Well, I've thought about thas one. It may be an option, but the details
>> are not so promising (below)
>>> - the string representation, unconditionally
>>> - for names that can be resolved by DNS, the ipv6 and ipv4 address(es)
>>> associated with them. ipfw already translates hostnames in rules
>>> so this is POLA
>> I'm not happy what ipfw(8) is doing instead of translation. The proper
>> way would be not simply using first AF_INET answer but saving ALL
>> IPv4+IPv6 records inside rule (and some more tracking should be done
>> afterwards, but that's totally different story). Additionally, I'm
>> unsure if we really need next-hop value expressed as hostname (how can
>> we deal with multiple addresses and diffrent AFs?). We may store strings
>> (and I think we should do it) but I'm unsure about this particular
>> option of interpreting them.
>>> - for other strings, a u32 from the translation table as previously
>>> indicated
>>> - and for numeric values, the u32 representation (truncated if needed,
>>> according to whatever is the existing behaviour)
>>> - <add other representations if needed>
>>> If we cannot generate an u32 we will put some value (e.g. 0)
>>> that hopefully will not cause confusion.
>> As far as I understand, we accept some string "s" as table value inside
>> the kernel, than, we have some logic that says:
>> oh, dummynet pipe has the same name "s"s, oh, nat entity with name "s"
>> has just been created, let's save indices.
>>
>> That would require additional indirection table like:
>>
>> index | [ skipto idx | nat idx | pipe idx | queue idx | fwd index ]
>> ( so we will have 2-level indirection table for fwd if we do IPv6)
>>
>> We can optimize this if we use "same name -> same kidx" approach
>> regardless of kernel object we're refering to. That might require some
>> more memory, but that's OK from my point of view.
>>
>> So we end up with
>> int [ skipto idx | fwd idx | obj idx ]
>>
>> idx "0" is special value which means the same as 2.CURRENT
>>
>> That looks better, but still way to complex.
>> I do care about compatibility, but it's hard to improve things without
>> changing.
>>
>> I'd like to propose the following:
>> * Split values into 3 types ("ip|nexthop", "number", "object")
>> * Do not insist on object existence, use value "0" to mimic 2.CURRENT
>> behavior.
>> * Retain full compatibility by introducing special value type "legacy"
>> which matches any type and is backed by given indirection table.
>> * Issue warning in ipfw(8) binary on all auto-created tables that
>> auto-creation is legacy and this behavior will be dropped in next major
>> release (e.g. 11.0)
>> * Save this behavior in MFC but drop "legacy" tables in head after a
>> month after actual MFC.
>>
>> That do you think?
>>>
>>> If we do it this way, we should be able to preserve backward
>>> compatibility _and_ add features that people may need.
>>>
>>> cheers
>>> luigi
>>>
> Here is my idea: tablearg should contain more than one value. I think
> getting several values from one table lookup is faster than several
> table lookups with one value.
> Let tablearg be not just uint32, but array with different value types
> inside it.
There are some use cases where we might need 2-level value lookup (e.g.
algo returning index for index table where actual data reside) and each
data item can
really be up to 64-bytes long. The problem is in actual partitioning and
compatibility.
>
> For example I have many such rules:
> allow src-ip 1.2.3.4 MAC any 11:22:33:44:55:66 recv vlan1234 dst-ip
> 1.1.1.1
Sorry, what task are you solving by using given rules?
>
> These rules can be replaced with such construction:
> allow src-ip table(1) MAC any tablearg[1] recv tablearg[2] dst-ip
> tablearg[3]
>
> But I don't think indexing by value is a good idea. I think
> index==starting byte is a better way:
> allow src-ip table(1) MAC any tablearg:0 recv tablearg:6 dst-ip
> tablearg:32
> where MAC's 6 bytes are from 0 to 5 in tablearg; iface string is from
> 6 and till \0, but less than 26 bytes; and IPv4's 4 bytes are from 32
> to 35.
> So we need to create table for it:
> table 1 set MAC:0 string:6:26 ip:32
> table 1 add 1.2.3.4 11:22:33:44:55:66 vlan1234 1.1.1.1
>
> String can be used both for iface and comment.
> Other possible value types:
> uint16 for nat, pipe, skipto and other 2-bytes actions
> IPv4 4 bytes
> CIDRv4 5 bytes
> IPv6 16 bytes
> CIDRv6 17 bytes
> table_id 2 bytes - link to another table
Well, it seems we have enough space to store most of these, however,
problems seem to remain the same: typing and compatibility.
When you're creating new table (or it is auto-created) which values
types should be assumed ? All of them?
What should `ipfw table X list` show as "value" field ?
How should ipfw(8) treat "add 1.1.1.1 0" input?
What will happen if we want to add another type field to this list? (MAC
address of Infiniband MAC address, for example).
>
> Table value length can be set for example with loader tunable like
> net.inet.ip.fw.table_value_length.
> Even with default uint32 value length we can get 2 uint16 values or 4
> uint8 values, this can help in some configurations.
>
> This way is more complex, but much more flexible. It's like netgraph
> subsystem.
> I think it suites both Alexander and Luigi requests.
>
>
More information about the freebsd-net
mailing list