Netgraph performance with ng_ipfw

Fri Jan 22 21:48:22 UTC 2010

Hi, 
I have several routers under heavy load, running FreeBSD 7.2.
These routers use Netgraph to implement traffic shaping and accounting
(using ng_car and ng_netflow nodes).
Packets are passed from the firewall to netgraph using the following rules:
accounting:
netgraph 100 ip from any to any in
shaping:
netgraph tablearg ip from any to table(118) out
netgraph tablearg ip from table(118) to any in

Table 118 contains users' IP addresses, with tablearg referencing the individual ng_car node configured for each user.
At peak, there are 1500-2000 entries in the table, and as many configured nodes.
The problem is that at peak load the router loses packets. After studying the sources and doing some debugging,
it became clear that packets are being dropped at the netgraph queue, in the ng_alloc_item function:

static __inline item_p
ng_alloc_item(int type, int flags)
{
        item_p item;

        KASSERT(((type & ~NGQF_TYPE) == 0),
            ("%s: incorrect item type: %d", __func__, type));

        item = uma_zalloc((type == NGQF_DATA)?ng_qdzone:ng_qzone,
            ((flags & NG_WAITOK) ? M_WAITOK : M_NOWAIT) | M_ZERO);

        if (item) {
                item->el_flags = type;
#ifdef  NETGRAPH_DEBUG
                mtx_lock(&ngq_mtx);
                TAILQ_INSERT_TAIL(&ng_itemlist, item, all);
                allocated++;
                mtx_unlock(&ngq_mtx);
#endif
        }

        return (item);
}

It returns NULL if it is unable to allocate an entry in ng_qdzone.
When it is called from ng_package_data, this causes the packet to be dropped:
item_p
ng_package_data(struct mbuf *m, int flags)
{
        item_p item;

        if ((item = ng_alloc_item(NGQF_DATA, flags)) == NULL) {
                NG_FREE_M(m);
                return (NULL);
        }
        ITEM_DEBUG_CHECKS;
        item->el_flags |= NGQF_READER;
        NGI_M(item) = m;
        return (item);
}

After tuning the maxdata parameter, I was able to decrease the losses (at the cost of increased delays), but the question is: why
does the system not have some kind of counter for packets dropped at the Netgraph queue? It seems to be
a trivial task to add, for example, a sysctl variable that would reflect the number of dropped packets, and it would
really simplify troubleshooting.
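
A minimal sketch of such a counter, written as a userland model rather than a kernel patch. All names here (ngq_data_drops, fake_zone, fake_alloc_item, fake_package_data, simulate_drops) are hypothetical and do not exist in netgraph; in the kernel the counter would presumably be exported through a read-only sysctl instead of being read directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter: data items dropped because allocation failed. */
static atomic_ulong ngq_data_drops;

/* Stand-in for a size-limited zone such as ng_qdzone. */
struct fake_zone {
	unsigned limit;		/* maximum number of outstanding items */
	unsigned in_use;	/* items currently allocated */
};

/* Models ng_alloc_item(): fails once the zone limit is reached. */
static bool
fake_alloc_item(struct fake_zone *z)
{
	if (z->in_use >= z->limit)
		return (false);
	z->in_use++;
	return (true);
}

/* Models ng_package_data(): count the drop before giving up. */
static bool
fake_package_data(struct fake_zone *z)
{
	if (!fake_alloc_item(z)) {
		atomic_fetch_add(&ngq_data_drops, 1);
		return (false);
	}
	return (true);
}

/*
 * Push npackets through a zone holding at most limit items and return
 * how many were dropped. Items are never freed here, which models the
 * worst case of a queue that is not being drained at all.
 */
static unsigned long
simulate_drops(unsigned limit, unsigned npackets)
{
	struct fake_zone z = { limit, 0 };
	unsigned long before = atomic_load(&ngq_data_drops);
	unsigned i;

	for (i = 0; i < npackets; i++)
		(void)fake_package_data(&z);
	assert(z.in_use <= z.limit);
	return (atomic_load(&ngq_data_drops) - before);
}
```

The only change on the drop path is one atomic increment, so the cost in the normal (non-dropping) case would be nil.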

The second question is about the effectiveness of Netgraph queueing and the ng_ipfw node with an SMP kernel.
In the ng_ipfw_connect function, when the node is connected to some other node,
the hook is set to queueing mode to avoid recursion:
/*
 * Set hooks into queueing mode, to avoid recursion between
 * netgraph layer and ip_{input,output}.
 */
static int
ng_ipfw_connect(hook_p hook)
{
        NG_HOOK_FORCE_QUEUE(hook);
        return (0);
}

This causes packets to be queued when they are passed back to the ng_ipfw node.
On SMP kernels, several kernel processes are created to process
the queues (they show up as ng_queue* processes in ps).
Here is the code of ngthread, which processes the queue:

static void
ngthread(void *arg)
{
        for (;;) {
                node_p  node;

                /* Get node from the worklist. */
                NG_WORKLIST_LOCK();
                while ((node = TAILQ_FIRST(&ng_worklist)) == NULL)
                        NG_WORKLIST_SLEEP();
                TAILQ_REMOVE(&ng_worklist, node, nd_work);
                NG_WORKLIST_UNLOCK();
                CTR3(KTR_NET, "%20s: node [%x] (%p) taken off worklist",
                    __func__, node->nd_ID, node);
                /*
                 * We have the node. We also take over the reference
                 * that the list had on it.
                 * Now process as much as you can, until it won't
                 * let you have another item off the queue.
                 * All this time, keep the reference
                 * that lets us be sure that the node still exists.
                 * Let the reference go at the last minute.
                 */
                for (;;) {
                        item_p item;
                        int rw;

                        NG_QUEUE_LOCK(&node->nd_input_queue);
                        item = ng_dequeue(&node->nd_input_queue, &rw);
                        if (item == NULL) {
                                atomic_clear_int(&node->nd_flags, NGF_WORKQ);
                                NG_QUEUE_UNLOCK(&node->nd_input_queue);
                                break; /* go look for another node */
                        } else {
                                NG_QUEUE_UNLOCK(&node->nd_input_queue);
                                NGI_GET_NODE(item, node); /* zaps stored node */
                                ng_apply_item(node, item, rw);
                                NG_NODE_UNREF(node);
                        }
                }
                NG_NODE_UNREF(node);
        }
}

It takes a node from ng_worklist and processes as many items
from its queue as possible, until ng_dequeue returns NULL (no more items).
Note that ng_worklist usually contains only one node, ng_ipfw (provided other nodes
have not configured queueing for themselves, which is the case with ng_car and ng_netflow).
If a large number of packets is passed back to the ng_ipfw node
from other nodes, one kernel process (ng_queue*) will take that node, and
if packets arrive faster than ng_ipfw can process them (that is, send them on to
ip_input or ip_output), that single ng_queue* process will consume 100% of one CPU core while the others
process nothing.
I have seen this behavior on my routers: at peak load, one of the ng_queue* processes takes 100% of one core,
while the others show 0% CPU in top.
This seems to be a problem with ng_ipfw: it does not appear to work well with SMP.
My question is, can it somehow be fixed?

The third question is about the algorithm for finding hooks in ng_ipfw.
When a packet is passed from the firewall, ng_ipfw_input is called, which in turn
calls ng_ipfw_findhook1 to find the hook matching the cookie from
struct ip_fw_args *fwa:

        if (fw_node == NULL ||
           (hook = ng_ipfw_findhook1(fw_node, fwa->cookie)) == NULL) {
                if (tee == 0)
                        m_freem(*m0);
                return (ESRCH);         /* no hook associated with this rule */
        }

The ng_ipfw_findhook function converts a hook name to its numeric representation
and calls the same ng_ipfw_findhook1:

/* Look up hook by name */
hook_p
ng_ipfw_findhook(node_p node, const char *name)
{
        u_int16_t n;    /* numeric representation of hook */
        char *endptr;

        n = (u_int16_t)strtol(name, &endptr, 10);
        if (*endptr != '\0')
                return NULL;
        return ng_ipfw_findhook1(node, n);
}

and ng_ipfw_findhook1 simply walks the whole list of hooks to find the one matching
the given cookie:

/* Look up hook by rule number */
static hook_p
ng_ipfw_findhook1(node_p node, u_int16_t rulenum)
{
        hook_p  hook;
        hpriv_p hpriv;

        LIST_FOREACH(hook, &node->nd_hooks, hk_hooks) {
                hpriv = NG_HOOK_PRIVATE(hook);
                if (NG_HOOK_IS_VALID(hook) && (hpriv->rulenum == rulenum))
                        return (hook);
        }

        return (NULL);
}

When a large number of hooks is present, as in the configuration given at the beginning of this message,
this causes an obvious decrease in performance: for each packet passed from ipfw to netgraph,
anywhere from 1 to 1500-2000 iterations are needed to find the matching hook. And again, it seems to be a trivial task to rewrite
this code to find the hook by hash or even by array.
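
A minimal sketch of the array variant, again as a userland model and not the actual ng_ipfw code. It assumes only what the source above shows, namely that the lookup key is a 16-bit rule number. struct fake_hook, hook_table and the hook_table_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NRULES 65536	/* every value a 16-bit rule number can take */

/* Hypothetical stand-in for a netgraph hook and its private data. */
struct fake_hook {
	uint16_t rulenum;	/* the cookie the hook is matched on */
	int valid;		/* models NG_HOOK_IS_VALID() */
};

/* One slot per possible rule number; NULL means no hook is attached. */
static struct fake_hook *hook_table[NRULES];

/* Register a hook under its rule number (would go where hooks are created). */
static void
hook_table_insert(struct fake_hook *h)
{
	assert(h != NULL);
	hook_table[h->rulenum] = h;
}

/* Forget a hook (would go where hooks are disconnected). */
static void
hook_table_remove(struct fake_hook *h)
{
	assert(h != NULL);
	if (hook_table[h->rulenum] == h)
		hook_table[h->rulenum] = NULL;
}

/* Constant-time replacement for the list walk in ng_ipfw_findhook1(). */
static struct fake_hook *
hook_table_lookup(uint16_t rulenum)
{
	struct fake_hook *h = hook_table[rulenum];

	return ((h != NULL && h->valid) ? h : NULL);
}
```

The table costs 65536 pointers (512 KB with 8-byte pointers) for the single ng_ipfw node, in exchange for one array access per packet regardless of how many hooks exist. A hash keyed on the rule number would use less memory at the price of a slightly longer lookup.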