Linux tc multi-level massive hashing
It is little known that the Linux tc traffic-shaping framework supports multi-level filter hashing, which can dramatically reduce CPU load on installations with thousands of filters. Here is how to configure it.
Say we have an installation with several thousand hosts in the 10.1.C.D and 10.2.C.D ranges. First, we create a hash table for 10.1.0.0/16:
tc filter add dev eth3 parent 1:0 prio 1 handle 100: protocol ip u32 divisor 256
tc filter add dev eth3 protocol ip parent 1:0 prio 1 u32 ht 800:: match ip dst 10.1.0.0/16 hashkey mask 0x0000ff00 at 16 link 100:
The first command instructs the kernel to create hash table 100 (hex!) with 256 buckets. The second command adds a filter that makes all traffic with a destination IP in the 10.1.0.0/16 range be looked up in this hash table ("link 100:"), hashed by the third (C) octet of the destination address ("hashkey mask 0x0000ff00 at 16": the destination address starts at offset 16 in the IP header, and the mask selects its third octet).
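If you want to double-check that the table and the linking filter are in place, tc can list them (the exact output format varies between iproute2 versions):
tc filter show dev eth3 parent 1:0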
The next pair of commands does the same for 10.2.0.0/16, linking it to hash table 101:
tc filter add dev eth3 parent 1:0 prio 1 handle 101: protocol ip u32 divisor 256
tc filter add dev eth3 protocol ip parent 1:0 prio 1 u32 ht 800:: match ip dst 10.2.0.0/16 hashkey mask 0x0000ff00 at 16 link 101:
Now, we create a hash table for every /24 subnet used inside these /16 ranges. Say, for 10.1.1.0/24:
tc filter add dev eth3 parent 1:0 prio 1 handle 201: protocol ip u32 divisor 256
tc filter add dev eth3 protocol ip parent 1:0 prio 1 u32 ht 100:1: match ip dst 10.1.1.0/24 hashkey mask 0x000000ff at 16 link 201:
The first line creates hash table 201 with 256 buckets. The second line is more complex: "ht 100:1:" means that this filter is placed into hash table 100, bucket 1 (hex). Given the filter for hash table 100 above, this rule will be evaluated when the third (C) octet of the destination address is 1, i.e. for 10.1.1.X, and will trigger a further lookup in hash table 201 ("link 201:"). "hashkey mask 0x000000ff at 16" means that the lookup in table 201 is keyed by the fourth (D) octet of the destination address: 10.1.1.1 goes into table 201 bucket 1, 10.1.1.2 into bucket 2, 10.1.1.3 into bucket 3, and so on.
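Typing such pairs by hand quickly gets tedious. A minimal shell sketch that generates them for the 10.1.C.0/24 subnets (assumptions: device eth3 as above, per-/24 table handles 0x200 + C, third-octet values C from 1 to 10; the 10.2.C.0/24 tables would need their own handle range and would be placed into table 101 instead):
for C in $(seq 1 10); do
    HT=$(printf '%x' $((0x200 + C)))    # per-/24 table handle in hex, e.g. 201 for C=1
    BUCKET=$(printf '%x' "$C")          # bucket in table 100 = third octet, in hex
    tc filter add dev eth3 parent 1:0 prio 1 handle ${HT}: protocol ip u32 divisor 256
    tc filter add dev eth3 protocol ip parent 1:0 prio 1 u32 ht 100:${BUCKET}: \
        match ip dst 10.1.${C}.0/24 hashkey mask 0x000000ff at 16 link ${HT}:
done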
We go on with a similar configuration for 10.1.2.0/24, 10.2.1.0/24 and 10.2.2.0/24, assigning a unique hash table number to each subnet:
# 10.1.2.0/24
tc filter add dev eth3 parent 1:0 prio 1 handle 202: protocol ip u32 divisor 256
tc filter add dev eth3 protocol ip parent 1:0 prio 1 u32 ht 100:2: match ip dst 10.1.2.0/24 hashkey mask 0x000000ff at 16 link 202:
# 10.2.1.0/24
tc filter add dev eth3 parent 1:0 prio 1 handle 203: protocol ip u32 divisor 256
tc filter add dev eth3 protocol ip parent 1:0 prio 1 u32 ht 101:1: match ip dst 10.2.1.0/24 hashkey mask 0x000000ff at 16 link 203:
# 10.2.2.0/24
tc filter add dev eth3 parent 1:0 prio 1 handle 204: protocol ip u32 divisor 256
tc filter add dev eth3 protocol ip parent 1:0 prio 1 u32 ht 101:2: match ip dst 10.2.2.0/24 hashkey mask 0x000000ff at 16 link 204:
Note that the bucket part of the ht value is the same for 10.1.1.0 and 10.2.1.0 (":1:"): the bucket is selected by the third octet, which is 1 in both cases. The table part differs (100: versus 101:) because each /16 filter links to its own table. For the same reason both 10.1.2.0 and 10.2.2.0 land in bucket ":2:" of their respective tables.
The last step is to populate the per-/24 hash tables, keyed by the fourth (D) octet, e.g. for 10.1.1.D:
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 201:1: match ip dst 10.1.1.1/32 flowid 1:10
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 201:2: match ip dst 10.1.1.2/32 flowid 1:20
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 201:3: match ip dst 10.1.1.3/32 flowid 1:10
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 201:a: match ip dst 10.1.1.10/32 flowid 1:10
For example, the fourth octet of 10.1.1.1 is 1, so the kernel will look for a rule in hash table 201, bucket 1; that is why the first line contains "ht 201:1:". Similarly, for 10.1.1.2 we use "ht 201:2:". Remember, all ht values are in hex, which is why 10.1.1.10 uses "ht 201:a:". "flowid 1:10" indicates which class the filter sends traffic to: most likely you are using HTB for shaping, and this would be one of its classes (say, gold or bronze).
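For completeness, here is a minimal sketch of the class setup these filters assume: an HTB root qdisc with handle 1: and two classes 1:10 and 1:20 (the rates below are made-up placeholders):
tc qdisc add dev eth3 root handle 1: htb default 20
tc class add dev eth3 parent 1: classid 1:1 htb rate 100mbit
tc class add dev eth3 parent 1:1 classid 1:10 htb rate 50mbit ceil 100mbit   # "gold"
tc class add dev eth3 parent 1:1 classid 1:20 htb rate 10mbit ceil 50mbit    # "bronze"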
Apply the same approach to hosts in other subnets:
# Hosts in 10.1.2.0/24
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 202:1: match ip dst 10.1.2.1/32 flowid 1:20
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 202:2: match ip dst 10.1.2.2/32 flowid 1:10
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 202:3: match ip dst 10.1.2.3/32 flowid 1:20
# Hosts in 10.2.1.0/24
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 203:1: match ip dst 10.2.1.1/32 flowid 1:20
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 203:2: match ip dst 10.2.1.2/32 flowid 1:10
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 203:3: match ip dst 10.2.1.3/32 flowid 1:20
# Hosts in 10.2.2.0/24
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 204:1: match ip dst 10.2.2.1/32 flowid 1:10
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 204:2: match ip dst 10.2.2.2/32 flowid 1:20
tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 204:3: match ip dst 10.2.2.3/32 flowid 1:10
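With several thousand hosts you will want to generate these lines rather than type them. A minimal sketch for one subnet (assumption: every host in 10.1.1.0/24 goes to class 1:10):
for D in $(seq 1 254); do
    BUCKET=$(printf '%x' "$D")   # bucket in table 201 = fourth octet, in hex
    tc filter add dev eth3 parent 1:0 protocol ip prio 1 u32 ht 201:${BUCKET}: \
        match ip dst 10.1.1.${D}/32 flowid 1:10
done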
Done!
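To confirm that traffic is actually being classified through the hash tables rather than falling through, you can look at per-filter hit counters (shown only if the kernel has u32 performance counters, CONFIG_CLS_U32_PERF, enabled):
tc -s filter show dev eth3 parent 1:0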