ipv6 connection hash function wanted ...

Thu Nov 16 20:01:10 UTC 2006

JINMEI Tatuya / ???? wrote:
>>>>>> On Tue, 14 Nov 2006 20:20:47 +0100, 
>>>>>> Max Laier <max at love2party.net> said:
>>>>>>             
>
>   
>>>> Any ideas?  Any papers that deal with this problem?
>>>>         
>>> Assuming you don't want to use one of the standard cryptographic
>>> ones (which I can imagine being a bit slow for something done
>>> per-packet), then one option might be to use a simpler hash that
>>> is keyed. Choose the key at boot/module load time and make it hard
>>> to produce collisions unless you know the key.
>>>       
>
>   
>> That's exactly what I am looking for ... now I need someone[tm] - with 
>> better Math-Knowledge than mine - to write such a thing down in a simple 
>> formula :-) i.e. take those bits from there and there and XOR them with 
>> your canary yada-yada-yada ...
>>     
>
> If you want something whose behavior is mathematically guaranteed, I'd
> recommend universal hashing as already suggested in this thread.
>
> One example implementation is given below, assuming the hash key is
> source and destination IPv6 addresses and transport layer ports(*).
> As shown in the implementation, it's pretty simple and should run
> reasonably fast.  Also, as long as the random values are reasonably
> unpredictable, the expected probability of producing the same hash
> value for arbitrarily-chosen two different keys is 1/65537.  In
> general, for any prime number p as the value of LARGE_PRIME, this
> probability is 1/p (so if 65537 is too large, we can use a smaller
> number, e.g., 1009, depending on the desired balance between collision
> risk and memory footprint for hash buckets).
>
> (*)Technically, using in6_addr to represent an IPv6 address may not be
> enough; we may want to take into account zone indices (sin6_scope_id)
> as part of hash keys.
>
> 					JINMEI, Tatuya
> 					Communication Platform Lab.
> 					Corporate R&D Center, Toshiba Corp.
> 					jinmei at isl.rdc.toshiba.co.jp
>
> #define HASHKEYLEN 38	 /* sizeof(in6_addr) * 2 + sizeof(in_port_t) * 2 */
> #define LARGE_PRIME 65537
>
> /*
>  * This should be filled with unpredictable random values between 0
>  * and LARGE_PRIME-1 at startup time.
>  */
> u_int32_t random_parameters[HASHKEYLEN];
>
> u_int32_t
> hash(struct in6_addr *saddr, struct in6_addr *daddr,
>     in_port_t sport, in_port_t dport)
> {
> 	int i, j = 0;
> 	u_int32_t accumulated = 0;
> 	u_char *cp;
>
> 	for (i = 0; i < sizeof(*saddr); i++)
> 		accumulated += saddr->s6_addr[i] * random_parameters[j++];
> 	for (i = 0; i < sizeof(*saddr); i++)
> 		accumulated += daddr->s6_addr[i] * random_parameters[j++];
> 	cp = (u_char *)&sport;
> 	for (i = 0; i < sizeof(sport); i++)
> 		accumulated += cp[i] * random_parameters[j++];
> 	cp = (u_char *)&dport;
> 	for (i = 0; i < sizeof(dport); i++)
> 		accumulated += cp[i] * random_parameters[j++];
>
> 	return (accumulated % LARGE_PRIME);
> }
Jinmei,
Wouldn't you get some overflow if (pardon my set notation) "INT_MAX < 
{saddr,daddr}->s6_addr[i] * random_parameters[j++]", which would 
implicitly introduce collision into your algorithm. Or is the overall 
set size sufficiently large not to worry about this particular issue?
-Garrett