[ixpmanager] SFLOW Under Reporting?
Ian Chilton
ian at lonap.net
Tue Jun 27 10:18:05 IST 2023
Hi,
So... since my earlier message (below), I whipped up a little test
script to try out threads in perl.
On the back of that, I tried a little hack:
Added `use threads;` and did this 1 line fix:
- process_rrd($interval, $matrix, $rrdcached);
+ threads->create('process_rrd', $interval, $matrix, $rrdcached);
Bingo!
We're now doing 736G according to MRTG. The live sflow graphs are
reporting 332G, while my new instance is giving 703G in a VM and 705G
on bare metal.
I still think it's worth re-writing this in Go for performance, which I
can look at in the future, but for now that appears to have resolved
things.
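
For anyone thinking about the Go version, here is a rough sketch of the same idea as the one-line hack above: the parsing loop swaps in a fresh matrix each interval and hands the old one to a goroutine, so the slow flush never blocks parsing. The names (Matrix, Collector, Rotate) and the string key are made up for illustration and are not taken from the existing handler.

```go
package main

import (
	"fmt"
	"sync"
)

// Matrix is a hypothetical stand-in for the per-interval traffic counters.
type Matrix map[string]uint64

// Collector accumulates samples in the parsing loop and flushes each
// completed interval in the background.
type Collector struct {
	current Matrix
	flush   func(Matrix) // slow work, e.g. pushing updates to rrdcached
	wg      sync.WaitGroup
}

func NewCollector(flush func(Matrix)) *Collector {
	return &Collector{current: Matrix{}, flush: flush}
}

// Add records bytes for a key. It must only be called from the parsing loop.
func (c *Collector) Add(key string, bytes uint64) {
	c.current[key] += bytes
}

// Rotate swaps in an empty matrix and flushes the old one in a goroutine.
// After the swap only the flush goroutine touches the old map, so no lock
// is needed around it.
func (c *Collector) Rotate() {
	old := c.current
	c.current = Matrix{}
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		c.flush(old)
	}()
}

// Wait blocks until all background flushes have finished.
func (c *Collector) Wait() { c.wg.Wait() }

func main() {
	c := NewCollector(func(m Matrix) {
		fmt.Println("flushed", len(m), "keys")
	})
	c.Add("example-key", 1500)
	c.Rotate()                 // flush runs concurrently from here
	c.Add("example-key", 9000) // parsing carries on into the new matrix
	c.Wait()
}
```

Swapping the map rather than sharing it keeps the parsing path lock-free, which seems to be the point of the exercise.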
I'll do a pull-request on GitHub, as long as this seems to continue to
work ok.
Ian
On 2023-06-27 08:01, Ian Chilton wrote:
> Hi,
>
>> On 2023-06-27 00:31, Nick Hilliard (INEX) wrote:
>> My initial suspicion would be tail drops due to a buffer overflow on
>> the pipe between the sflowtool process and sflow-to-rrd-handler.
>
> You beat me to posting, but I also came to this conclusion and I
> believe I've proved this to be the case.
>
> I wrote this to simply parse the sflow data and sum it. Every
> 'interval', it takes the total and zeros it. In a thread (which is
> irrelevant here but I was testing for the bigger workload), it prints
> it out.
>
> https://gist.github.com/ichilton/b53cde596bb02289fca88fb61480c58f
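
A minimal sketch of that kind of summing test, written in Go rather than copied from the gist. It assumes the input is sflowtool's line-by-line CSV output (`sflowtool -l`), with packet_size in field 17 and sampling_rate in field 19 of each FLOW line; check those positions against your sflowtool version. Unlike the description above, it prints from the main loop rather than from a thread.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// sampleBytes returns the estimated bytes represented by one FLOW line:
// packet_size * sampling_rate. Field positions are an assumption based on
// sflowtool's documented CSV layout.
func sampleBytes(line string) (uint64, bool) {
	f := strings.Split(strings.TrimSpace(line), ",")
	if len(f) < 20 || f[0] != "FLOW" {
		return 0, false
	}
	size, err1 := strconv.ParseUint(f[17], 10, 64)
	rate, err2 := strconv.ParseUint(f[19], 10, 64)
	if err1 != nil || err2 != nil {
		return 0, false
	}
	return size * rate, true
}

func main() {
	const interval = 60 * time.Second
	var total uint64
	next := time.Now().Add(interval)

	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if b, ok := sampleBytes(sc.Text()); ok {
			total += b
		}
		// The interval is only checked when a line arrives, which is
		// good enough on a busy sflow stream.
		if now := time.Now(); !now.Before(next) {
			bps := float64(total) * 8 / interval.Seconds()
			fmt.Printf("%s %.2f Gbit/s\n", now.Format(time.RFC3339), bps/1e9)
			total = 0
			next = next.Add(interval)
		}
	}
}
```

It would be run as something like `sflowtool -l | ./sflowsum`, where `sflowsum` is just a placeholder name for the compiled binary.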
>
> It was 1AM when I did this and it's now only 07:30, so I've not tested
> it at peak traffic, but at lower traffic at those times I'm seeing within
> 5-8% of the total traffic reported by MRTG (obviously that is over a 5
> minute interval and this is at a 1 minute interval, but it's in the
> ballpark).
>
> So I believe, assuming it can even keep up normally, what is happening
> is while it's busy executing the periodic flush / mac table code, the
> buffer is overflowing and it's missing samples.
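
A toy model of that failure mode, as a sketch only: a fixed-size buffer in front of a reader that has stopped reading. It does not try to model which buffer actually fills on a real system (the pipe, or the UDP socket buffer behind sflowtool), and the function name and sizes are invented for the example.

```go
package main

import "fmt"

// offer tries to queue n samples into a buffer of the given capacity while
// nothing is reading from it, as when the handler is busy flushing. It
// returns how many samples were queued and how many were dropped.
func offer(n, capacity int) (queued, dropped int) {
	buf := make(chan int, capacity)
	for i := 0; i < n; i++ {
		select {
		case buf <- i:
			queued++
		default:
			// Buffer full: the newest sample is lost (tail drop).
			dropped++
		}
	}
	return queued, dropped
}

func main() {
	q, d := offer(1000, 256)
	fmt.Printf("queued=%d dropped=%d\n", q, d) // queued=256 dropped=744
}
```

Everything that arrives after the buffer fills is lost until the reader resumes. Dropping a slice of the samples at each flush would lower the totals while leaving the shape of the graph intact.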
>
> That explains why the shape/trend of the graphs is correct but the
> numbers are lower than they should be.
>
> A quick workaround would be to hack the current script to do the
> periodic flush/reload in a thread so it happens concurrently to the
> flow parsing.
>
> Ultimately, we'll keep hitting this in the future as interfaces and
> traffic increase.
>
> I plan to do the following:
>
> - Re-write this in Go for better performance.
>
> - I have an idea that we could have a thread per switch: each switch
> sends sflow to a different port and the script manages an sflowtool per
> switch, so each thread only parses a subset of the overall data (we
> have ~20 switches), which would make it more scalable.
>
> - Nicolaas posted to the list about goflow2, which is worth comparing
> to sflowtool.
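
The per-switch idea in the list above could be sketched in Go roughly as follows: one goroutine per switch, each draining its own stream and keeping its own running total. The Sample type, the channel-per-switch layout and the switch names are hypothetical. In a real version each channel would presumably be fed by the sflowtool instance for that switch's port.

```go
package main

import (
	"fmt"
	"sync"
)

// Sample is a hypothetical parsed flow sample, already scaled by its
// sampling rate.
type Sample struct{ Bytes uint64 }

// sumPerSwitch starts one goroutine per switch stream and returns the
// per-switch byte totals once every stream has been closed.
func sumPerSwitch(streams map[string]<-chan Sample) map[string]uint64 {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		totals = make(map[string]uint64, len(streams))
	)
	for name, ch := range streams {
		wg.Add(1)
		go func(name string, ch <-chan Sample) {
			defer wg.Done()
			var sum uint64 // local to this worker, so no locking per sample
			for s := range ch {
				sum += s.Bytes
			}
			mu.Lock()
			totals[name] = sum
			mu.Unlock()
		}(name, ch)
	}
	wg.Wait()
	return totals
}

func main() {
	a, b := make(chan Sample, 2), make(chan Sample, 1)
	a <- Sample{Bytes: 1500}
	a <- Sample{Bytes: 9000}
	b <- Sample{Bytes: 64}
	close(a)
	close(b)
	fmt.Println(sumPerSwitch(map[string]<-chan Sample{"sw1": a, "sw2": b}))
}
```

Each worker only ever sees its own switch's samples, so a slow flush or a burst on one switch should not starve the others.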
>
> I'm busy for the rest of this week with travel/DCs/event and prep for
> that, but I plan to work on this more in the coming weeks.
>
> In the meantime, any thoughts/ideas/suggestions are welcome.
>
> Ian