In this section, we use nfdump, installed in the previous section, to query the collected data and answer some basic questions about the network activity we generate. This practice is fundamental to understanding what we can do with NetFlow, so don’t rush.
Let’s practice querying the recorded data with nfdump. Similar tools should be able to produce much the same results, although their user interfaces and ease of use for particular tasks can differ considerably. As you do each item in the list, take a screenshot.
First some very basic usage. Look at the manual page for nfdump to determine what these options mean.
nfdump -R /var/cache/nfdump/ -c 5
This should print the first five flows stored in /var/cache/nfdump/. Take a screenshot showing the output of that command. Have a look in the directory that is used. Briefly describe how the data is stored; you will need this for the assessment.
Have a look at the summary line. What useful information is listed here?
Often we are concerned with a particular time slice. Let’s repeat the previous example for a particular time slice; you’ll need to choose one for which you have recorded traffic.
nfdump -R /var/cache/nfdump/ -c 5 -t… first five flows of a two day period …
nfdump -R /var/cache/nfdump/ -c 5 -t… first five flows of December 2009 …
nfdump -R /var/cache/nfdump/ -c 5 -t… first five flows of 4th December 2009, 12:00-13:59 …
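As a hedged sketch of how a time window can be built: the nfdump manual page gives the -t format as YYYY/MM/dd.hh:mm:ss[-YYYY/MM/dd.hh:mm:ss]. The helper names nf_time and nf_window below are hypothetical, not part of nfdump. They convert timestamps in the form nfdump prints in its "Time window:" summary line into that format. The example timestamps are the ones from the report output later in this section; substitute values that match your own recorded data.

```shell
# Convert a timestamp as printed by nfdump ("YYYY-MM-DD hh:mm:ss")
# into the form expected by -t ("YYYY/MM/dd.hh:mm:ss").
# nf_time is a hypothetical helper, not an nfdump command.
nf_time() {
    printf '%s\n' "$1" | sed -e 's|-|/|g' -e 's| |.|'
}

# Join a start and an end timestamp into a single -t time window.
nf_window() {
    printf '%s-%s\n' "$(nf_time "$1")" "$(nf_time "$2")"
}

# Example window; replace with a period covered by your own data.
nf_window '2009-12-03 16:09:07' '2009-12-04 21:26:55'
```

The result could then be passed on the command line, for example: nfdump -R /var/cache/nfdump/ -c 5 -t "$(nf_window '2009-12-03 16:09:07' '2009-12-04 21:26:55')". The manual page also describes shorter forms in which parts of the time specification are omitted.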
We can select a different output format to suit casual use, closer inspection, or use with other programs, such as for generating billing data. We can even specify a custom output format with nfdump.
nfdump -R /var/cache/nfdump/ -o long
Have a look at the manual page for nfdump and play with different arguments to the -o option.
Notice that in the previous example, the rightmost column of the ‘long’ output mode shows the number of flows that are aggregated in that line, and that they are all 1.
Aggregation is an important concept when generating reports, as it combines the data from individual flows into the groups we are interested in.
Some connections, such as SSH connections, can be long-lived, so a probe may export them because a timer expired rather than because the flow completed. For this reason, one very simple form of aggregation is to aggregate by ‘connection’: individual flows that were exported separately only because the connection spanned a long time are combined, which is a rather more ‘natural’ way to look at the data.
nfdump -R /var/cache/nfdump/ -a -o long
Now you should notice that many lines aggregate more than one flow.
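To make the idea concrete, here is a minimal sketch of connection-level aggregation done by hand with awk. The nfdump manual page describes -a as aggregating on the 5-tuple of protocol, source address, destination address, source port and destination port. The flow records below are invented for illustration (they are not from the collected data), and the column layout and the aggregate_flows name are assumptions of this sketch, not nfdump output.

```shell
# Hypothetical flow records, one per line:
#   proto src_ip src_port dst_ip dst_port packets bytes
flows='tcp 192.168.1.51 40022 203.0.113.5 22 120 9000
tcp 192.168.1.51 40022 203.0.113.5 22 80 6000
udp 192.168.1.52 5353 224.0.0.251 5353 2 300
tcp 192.168.1.51 40022 203.0.113.5 22 40 3000'

# Group records by their 5-tuple, summing packets and bytes and
# counting how many flow records were folded into each group.
aggregate_flows() {
    awk '{
        key = $1 " " $2 " " $3 " " $4 " " $5
        flows[key]++
        packets[key] += $6
        bytes[key] += $7
    }
    END {
        for (k in flows) print k, packets[k], bytes[k], flows[k]
    }' | sort
}

printf '%s\n' "$flows" | aggregate_flows
```

The three records of the long-lived SSH connection collapse into one line whose last column, the flow count, is 3, while the single UDP record keeps a count of 1. This mirrors the rightmost column in the nfdump output.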
What are the top ten hosts with regard to bandwidth use? This is another form of aggregation, this time based on a particular NetFlow attribute.
nfdump -R /var/cache/nfdump/ -s srcip/bytes -n 10
The size of the report can be controlled with -n, which sets the number of entries shown, and/or with -L +40M, which limits the results to those over 40 MB.
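A small worked example of what the M suffix amounts to. The report output later in this section prints "Byte limit: > 10485760 bytes" for -L +10M, which suggests that M is treated as 1024 × 1024 bytes. Assuming the same holds for other values, the sketch below expands a limit to bytes; limit_bytes is a hypothetical helper, not an nfdump option.

```shell
# Expand a limit given in "M" units to bytes,
# assuming 1 M = 1024 * 1024 bytes (inferred from the report output).
limit_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

limit_bytes 10   # prints 10485760, matching the "Byte limit" line
limit_bytes 40   # prints 41943040
```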
We can use two queries, one with srcip and the other with dstip, to look at both incoming and outgoing data. This starts to look like an accounting report.
However, for accounting purposes, a lot of this traffic is duplicated. If we only want traffic between the internet and our private network, we only want flows that come in one interface and go out another. Typically, we would select on the incoming and outgoing interface IDs, but our probe doesn’t supply that information.
This gives us a useful opportunity to look at the filter capability of nfdump, which offers great ad-hoc flexibility. The syntax is similar to the filter syntax used by libpcap (i.e. tcpdump and friends), but different enough to be a little annoying.
… This is our upload report …
nfdump -R /var/cache/nfdump/ -s srcip/bytes -L +10M 'src net 192.168.1.0/24'
Byte limit: > 10485760 bytes
Top 10 Src IP Addr ordered by bytes:
Date first seen           Duration Proto   Src IP Addr   Flows  Packets   Bytes …
2009-12-03 16:09:07.812  96171.835 any    192.168.1.51   10074   446939  34.6 M …
2009-12-03 16:21:44.166 104711.381 any    192.168.1.52    6561    57491  16.4 M …
Summary: total flows: 23357, total bytes: 57.6 M, total packets: 544964, avg bps: 4578, avg pps: 5, avg bpp: 110
Time window: 2009-12-03 16:09:07 - 2009-12-04 21:26:55
Total flows processed: 40795, Records skipped: 0, Bytes read: 2125540
Sys: 0.016s flows/second: 2549687.5 Wall: 0.011s flows/second: 3677875.9
… This is our download report …
nfdump -R /var/cache/nfdump/ -s dstip/bytes -L +10M 'dst net 192.168.1.0/24'
Byte limit: > 10485760 bytes
Top 10 Dst IP Addr ordered by bytes:
Date first seen           Duration Proto   Dst IP Addr    Flows  Packets    Bytes …
2009-12-03 16:09:07.812  96170.221 any    192.168.1.51     9787   549401  710.0 M …
2009-12-03 16:21:44.166 104711.381 any    192.168.1.52     6420    73105   52.0 M …
2009-12-03 16:19:20.083 104837.615 any    192.168.1.145    2373    24907   19.7 M …
Summary: total flows: 23035, total bytes: 789.8 M, total packets: 662711, avg bps: 62818, avg pps: 6, avg bpp: 1249
Time window: 2009-12-03 16:09:07 - 2009-12-04 21:26:55
Total flows processed: 40795, Records skipped: 0, Bytes read: 2125540
Sys: 0.016s flows/second: 2549687.5 Wall: 0.010s flows/second: 3717084.3
Have a think about what you see in this report and in particular the filtering expression used. What might be missed?
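When reading the summary line, it can help to cross-check the averages by hand. The sketch below recomputes avg bps and avg bpp for the download report above from its total bytes, total packets and time window. It assumes that M means 1024 × 1024 bytes and that the averages are taken over the whole time window and truncated to whole numbers; both are inferences from the output shown, not documented behaviour. summary_rates is a hypothetical helper.

```shell
# Figures copied from the download report's summary lines.
total_mbytes=789.8       # "total bytes: 789.8 M"
total_packets=662711     # "total packets: 662711"
# Time window 2009-12-03 16:09:07 to 2009-12-04 21:26:55
# is 29 h 17 min 48 s.
window_secs=$(( 29 * 3600 + 17 * 60 + 48 ))

# Print average bits per second and average bytes per packet,
# truncated to whole numbers.
summary_rates() {
    awk -v mb="$1" -v pkts="$2" -v secs="$3" 'BEGIN {
        bytes = mb * 1024 * 1024
        printf "avg bps: %d, avg bpp: %d\n", bytes * 8 / secs, bytes / pkts
    }'
}

summary_rates "$total_mbytes" "$total_packets" "$window_secs"
# prints: avg bps: 62818, avg bpp: 1249
```

Both values agree with the summary line of the download report. Running the same calculation on the upload report gives a slightly different avg bps from the one printed, most likely because the total bytes figure is only shown to one decimal place.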