[ixpmanager] SFLOW Under Reporting?
Jeroen Van Bemmel (Nokia)
jeroen.van_bemmel at nokia.com
Tue Jun 27 08:56:25 IST 2023
Hi Ian,
At https://github.com/netsampler/goflow2/blob/main/cmd/enricher/main.go you can find an example "plugin" (pipeline program) for GoFlow2.
It processes incoming protobuf records and adds GeoIP/ASN information (in only ~200 lines of Go code).
Perhaps we could build something similar for IXP Manager. Besides the performance benefits and more accurate reporting, this would also enable support for IPFIX and NetFlow inputs.
It would look something like this:
$ ./goflow2 -transport.file.sep= -format=pb -format.protobuf.fixedlen=true | ./ixp_manager_flow_processor -mysql "flowuser:<MySQLdatabasepassword>@tcp(127.0.0.1:3306)/ixp_manager"
Regards,
Jeroen
-----Original Message-----
From: ixpmanager <ixpmanager-bounces at inex.ie> On Behalf Of Ian Chilton via ixpmanager
Sent: Tuesday, June 27, 2023 2:02 AM
To: Nick Hilliard (INEX) <nick at inex.ie>
Cc: Ian Chilton <ian at lonap.net>; INEX IXP Manager Users Mailing List <ixpmanager at inex.ie>
Subject: Re: [ixpmanager] SFLOW Under Reporting?
Hi,
> On 2023-06-27 00:31, Nick Hilliard (INEX) wrote:
> My initial suspicion would be tail drops due to a buffer overflow on
> the pipe between the sflowtool process and sflow-to-rrd-handler.
You beat me to posting, but I also came to this conclusion, and I believe I've proved this to be the case.
I wrote this to simply parse the sFlow data and sum it. Every 'interval', it takes the total and zeros it. In a thread (which is irrelevant here, but I was testing for the bigger workload), it prints it out.
https://gist.github.com/ichilton/b53cde596bb02289fca88fb61480c58f
It was 1AM when I did this and it's only 07:30 now, so I've not tested it at peak traffic, but at the lower traffic levels at those times I'm seeing within 5-8% of the total traffic reported by MRTG (obviously that is over a 5 minute interval and this is at a 1 minute interval, but it's in the ballpark).
So I believe (assuming it can even keep up normally) that while it's busy executing the periodic flush / MAC table code, the buffer overflows and it misses samples.
That explains why the shape/trend of the graphs is correct, but the numbers are lower than they should be.
A quick workaround would be to hack the current script to do the periodic flush/reload in a thread, so it happens concurrently with the flow parsing.
Ultimately, though, we'll keep hitting this as the number of interfaces and the traffic increase.
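The quick workaround can be sketched in Go. The parser takes only a short lock per sample. Each interval, the accumulated counts are swapped out for a fresh map and flushed in a separate goroutine. The MAC table is replaced wholesale via `atomic.Pointer` (Go 1.19+), so neither a slow flush nor a reload stalls ingestion. `Collector`, `macTable` and the MAC-to-member mapping are hypothetical. The flush callback stands in for whatever the real periodic flush does (e.g. RRD updates).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// macTable maps a MAC address to a member ID (hypothetical schema).
type macTable map[string]int

// Collector accumulates bytes per member. Add only holds a short mutex;
// slow work (flushing, reloading the MAC table) happens outside that path.
type Collector struct {
	mu     sync.Mutex
	counts map[int]uint64
	macs   atomic.Pointer[macTable] // swapped wholesale on reload
}

func NewCollector(t macTable) *Collector {
	c := &Collector{counts: make(map[int]uint64)}
	c.macs.Store(&t)
	return c
}

// Add attributes n bytes to the member owning srcMAC; unknown MACs are dropped.
func (c *Collector) Add(srcMAC string, n uint64) {
	id, ok := (*c.macs.Load())[srcMAC]
	if !ok {
		return
	}
	c.mu.Lock()
	c.counts[id] += n
	c.mu.Unlock()
}

// Snapshot returns the accumulated counts and starts a fresh map, so the
// caller can flush them at leisure without blocking Add.
func (c *Collector) Snapshot() map[int]uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := c.counts
	c.counts = make(map[int]uint64)
	return out
}

// ReloadMACs atomically replaces the lookup table; concurrent Adds see
// either the old table or the new one, never a half-built one.
func (c *Collector) ReloadMACs(t macTable) { c.macs.Store(&t) }

// runFlusher hands each interval's snapshot to flush in its own goroutine,
// so a slow flush never stalls the parsing loop.
func runFlusher(c *Collector, interval time.Duration, flush func(map[int]uint64)) {
	go func() {
		for range time.Tick(interval) {
			flush(c.Snapshot())
		}
	}()
}

func main() {
	c := NewCollector(macTable{"aabbccddeeff": 1})
	runFlusher(c, time.Second, func(m map[int]uint64) { fmt.Println("flush:", m) })
	// The parsing loop would call c.Add for every sample; simulate a few.
	for i := 0; i < 3; i++ {
		c.Add("aabbccddeeff", 1500)
		time.Sleep(400 * time.Millisecond)
	}
}
```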
I plan to do the following:
- Re-write this in Go for better performance.
- I have an idea that we could have a thread per switch: each switch sends sFlow to a different port and the script manages one sflowtool per switch, so each thread only parses a subset of the overall data (we have ~20 switches), which would make it more scalable.
- Nicolaas posted to the list about goflow2, which is worth comparing to sflowtool.
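The per-switch idea could be sketched in Go with one goroutine per switch. Each goroutine runs its own `sflowtool -p <port> -l` (`-p` selects the UDP listen port, `-l` the line-by-line output) and parses only that switch's samples. The switch names and ports are hypothetical, and `Switch`, `collectorCmd` and `runSwitch` are illustrative names, not part of IXP Manager.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"sync"
)

// Switch is one sFlow agent and the UDP port it exports to (hypothetical).
type Switch struct {
	Name string
	Port int
}

// collectorCmd builds "sflowtool -p <port> -l" for one switch.
func collectorCmd(sw Switch) *exec.Cmd {
	return exec.Command("sflowtool", "-p", strconv.Itoa(sw.Port), "-l")
}

// runSwitch starts one sflowtool and parses its output in the calling
// goroutine, so each worker only handles one switch's share of the samples.
func runSwitch(sw Switch, handle func(name, line string)) error {
	cmd := collectorCmd(sw)
	out, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	sc := bufio.NewScanner(out)
	for sc.Scan() {
		handle(sw.Name, sc.Text())
	}
	scanErr := sc.Err()
	if scanErr != nil {
		cmd.Process.Kill() // stop reading, so don't leave sflowtool blocked
	}
	// os/exec requires all reads from StdoutPipe to finish before Wait.
	if err := cmd.Wait(); err != nil {
		return err
	}
	return scanErr
}

func main() {
	switches := []Switch{{"swi1-example", 6343}, {"swi2-example", 6344}}
	var wg sync.WaitGroup
	for _, sw := range switches {
		wg.Add(1)
		go func(sw Switch) {
			defer wg.Done()
			lines := 0 // per-goroutine counter, so no locking is needed
			err := runSwitch(sw, func(name, line string) {
				lines++ // a real worker would parse and aggregate here
			})
			fmt.Fprintf(os.Stderr, "%s: %d lines, exit: %v\n", sw.Name, lines, err)
		}(sw)
	}
	wg.Wait()
}
```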
I'm busy for the rest of this week with travel/DCs/event and prep for that, but I plan to work on this more in the coming weeks.
In the meantime, any thoughts/ideas/suggestions are welcome.
Ian
_______________________________________________
INEX IXP Manager mailing list
ixpmanager at inex.ie
Unsubscribe or change options here: https://www.inex.ie/mailman/listinfo/ixpmanager