[ixpmanager] SFLOW Under Reporting?

Sat Jun 24 09:13:14 IST 2023

Hi,

On 2023-06-22 16:37, Nick Hilliard (INEX) wrote:
> yes, this would make a difference. For reference, if your switch config
> is automated with L2 ACLs, then we recommend using configured macs for
> sflow collection.  If you have any rejected flows then, that means
> there's a straightforward misconfig, either in IXP Manager or else on
> the participant port.

Yep - problem is, we were running an old version of the script. When we
did some work to update/re-deploy it in the past, the graphs coming from
the new deployment were not consistent with the old ones, so it ended up
going on the back burner. That old script predates the ability to switch
between configured/discovered MACs.

What I've done in the past day or so is to deploy a new box, with the
scripts from the latest IXP Manager release. I've switched to using
configured MACs and put a temporary fix in place for subinterfaces. This
has fixed most of the dropped/rejected lines and means I can run in
debug mode without accumulating huge logs.

Curiously, I'm not seeing exactly the same results between the old and
new boxes, even though they are receiving the same data (fanned out with
sflowtool), but they are in the same ballpark.

The strange thing is that all of the sflow stats seem to follow the
correct trend, i.e. the shape of the graph, but they are ~50% too low at
any particular time.

I've tried different sflow sample rates: 8,192, 16,384, 32,768 and
65,536. The script seems to do the right thing and this makes no
difference, which is good at least. This, and the fact that the result
is consistently 50% of what it should be, would indicate that it's not
struggling with the number of flows, CPU, I/O etc.
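
For reference, a minimal sketch of why the sample rate should not matter
if the scaling is applied correctly. This is the generic sFlow estimate
(sampled bytes multiplied by the sampling rate), not code taken from the
IXP Manager script; the function name and the traffic figures are made up
for illustration.

```python
def estimate_bps(sampled_frame_lengths, sampling_rate, interval_seconds):
    """Estimate average bits/sec from sampled frame lengths (in bytes).

    Each sample is assumed to stand for `sampling_rate` frames of the
    same size, which is the usual sFlow scaling assumption.
    """
    sampled_bytes = sum(sampled_frame_lengths)
    return sampled_bytes * sampling_rate * 8 / interval_seconds


# Hypothetical traffic: identical 1000-byte frames over a 60s interval.
# Doubling the sampling rate halves the number of samples seen, so the
# scaled estimate stays the same.
at_8192 = estimate_bps([1000] * 120, 8192, 60)
at_16384 = estimate_bps([1000] * 60, 16384, 60)
```

With ideal sampling both estimates agree, which matches the rate changes
making no difference here. A steady 50% shortfall would then look more
like a constant factor somewhere than a load problem.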

One thing I have noticed is the default periodic/flush interval in the
code is 60s, but it can sometimes take longer than that to run (possibly
when it's reloading the mac table?)

It took 93s here:
Jun 24 07:53:27 sflow sflow-to-rrd-handler[74757]: DEBUG: starting rrd flush at time interval: 60.001857, time: 1687589607
Jun 24 07:55:00 sflow sflow-to-rrd-handler[74757]: DEBUG: flush completed at 1687589700

Not sure if that has any effect on the resulting RRD files.

The standard run time is between 30s and 40s (I've modified the script
to show it):

Jun 24 08:56:02 sflow sflow-to-rrd-handler[76458]: DEBUG: flush completed at 1687593362 (33s)
Jun 24 08:56:29 sflow sflow-to-rrd-handler[76458]: DEBUG: starting rrd flush at time interval: 60.001306, time: 1687593389
Jun 24 08:57:05 sflow sflow-to-rrd-handler[76458]: DEBUG: flush completed at 1687593425 (36s)
Jun 24 08:57:29 sflow sflow-to-rrd-handler[76458]: DEBUG: starting rrd flush at time interval: 59.99794, time: 1687593449
Jun 24 08:58:02 sflow sflow-to-rrd-handler[76458]: DEBUG: flush completed at 1687593482 (33s)
Jun 24 08:58:29 sflow sflow-to-rrd-handler[76458]: DEBUG: starting rrd flush at time interval: 59.999949, time: 1687593509
Jun 24 08:58:58 sflow sflow-to-rrd-handler[76458]: DEBUG: flush completed at 1687593538 (29s)
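
The flush durations can also be pulled out of the debug log after the
fact, rather than by patching the script. A rough sketch, assuming the
two message formats shown above (each rejoined onto one line); the
function name and the 60s threshold parameter are illustrative only:

```python
import re

# Patterns for the two debug messages seen in the log excerpts.
START_RE = re.compile(r"starting rrd flush at time interval: [\d.]+, time: (\d+)")
DONE_RE = re.compile(r"flush completed at (\d+)")


def flush_durations(lines, flush_interval=60):
    """Pair each 'starting rrd flush' with the next 'flush completed'.

    Returns (start_epoch, elapsed_seconds, overran) tuples, where
    overran is True when the flush took longer than flush_interval.
    """
    durations = []
    start = None
    for line in lines:
        m = START_RE.search(line)
        if m:
            start = int(m.group(1))
            continue
        m = DONE_RE.search(line)
        if m and start is not None:
            elapsed = int(m.group(1)) - start
            durations.append((start, elapsed, elapsed > flush_interval))
            start = None
    return durations


sample = [
    "Jun 24 07:53:27 sflow sflow-to-rrd-handler[74757]: DEBUG: starting rrd flush at time interval: 60.001857, time: 1687589607",
    "Jun 24 07:55:00 sflow sflow-to-rrd-handler[74757]: DEBUG: flush completed at 1687589700",
    "Jun 24 08:57:29 sflow sflow-to-rrd-handler[76458]: DEBUG: starting rrd flush at time interval: 59.99794, time: 1687593449",
    "Jun 24 08:58:02 sflow sflow-to-rrd-handler[76458]: DEBUG: flush completed at 1687593482 (33s)",
]
results = flush_durations(sample)
```

Run over a day of logs, this would show how often a flush overruns the
60s interval.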

Ian