Telemetry Now  |  Season 1 - Episode 4  |  December 27, 2022

Underutilized and underrated: why flow data is still an engineer's best friend


In this episode of Telemetry Now, Justin Ryburn, the VP of Global Solutions Engineering at Kentik, joins us to discuss why flow data like NetFlow, IPFIX, and sFlow is still one of the best visibility tools in a network engineer's toolbox.


Key Takeaways

  • [00:52 - 01:24] Meet Justin Ryburn
  • [01:45 - 03:08] Losing airline and hotel points during COVID
  • [03:11 - 05:08] Flows and SNMP, passé?
  • [05:09 - 08:04] A push model vs pull model with streaming telemetry
  • [09:03 - 11:52] Flow data on a macro level and packets
  • [11:51 - 13:36] The networking industry focusing on new data and telemetry
  • [15:05 - 17:26] Real time analysis and troubleshooting, a tool in the toolbox



Transcript

This is Telemetry Now, and I'm your host, Phil Gervasi. Joining me today is Justin Ryburn, the vice president of Global Solutions Engineering at Kentik and an avid traveler. And from what I can tell, a very opinionated person when it comes to how flow data is used by the networking industry right now. In fact, that's what today's episode is all about. So in spite of all the great new telemetry that we have now, cloud logs, streaming, and so on, flow data still matters. And of course, to Justin, it is probably still one of the best and most valuable sources of data that we have. The problem, then, isn't the data. It's how it's being used by most engineers.

Provocative opinion for sure, so let's get started.

So Justin, it's great to have you here today. I really appreciate that you took some time out of your busy schedule to nerd out with us.

Oh, happy to be here. Thanks for having me, Phil.

Awesome. So before we get started, can you explain just a little bit about your background and what you do now?

Sure.

Yeah. So I've been in technology for about twenty-five years. I started my career building and operating large router networks during the internet heyday, as I like to call it. And then I discovered a love for presales, and I spent about ten years on the vendor side at Juniper Networks.

As you said, I currently run the solutions engineering team globally here at Kentik. I've been doing that for about the last five to five and a half years.

Yeah. That's pretty cool. I had kind of a similar journey. I went from being a traditional in-the-field engineer for many years and then into solutions engineering.

My title is solutions architect, but I really love that. Being both still close to the tech and then getting to work directly with customers, you know, discussing design and how to solve problems. I really enjoy that. But I do wanna ask you now. You mentioned that you love travel. So how angry were you when you lost all of your airline and hotel points during COVID?

Well, I think I was fortunate enough that I had status with my preferred airline and hotel, so I didn't really lose a lot of my status or my miles for the most part. What I will say though is the hiatus was tough. You know, one of the things I love about my job is that I get to travel all over the world. I get to meet with customers as part of my day job. And then, you know, my wife and my family, we enjoy traveling in our personal time as well. It was a really hard adjustment when that all stopped overnight.

You know, being stuck in the house in quarantine, my wife and I both work from home, not going out on the road, not seeing new places, not being out in front of people and talking to customers.

It was definitely a big change.

I will say, on the positive side though, the forced time being at home with the family, being able to slow down a little bit and spend some nice quality family time together, was a nice change, and I really enjoyed that aspect of it.

Yeah. Yeah. I remember that too. You know, I was traveling a lot for work. I was in presales at the time and went from traveling almost every week or every other week to zero.

You know? And I really feel like we went from, you know, zero to a hundred again recently. Right?

It sure feels like it. Yeah. Being back out on the road quite a bit these days.

So we've both been network engineers in one way, shape, or form. Presales, you mentioned, post-sales.

I've been out there in data centers, you know, schlepping routers in my life, working for VARs and vendors. And I think we both have some experience in internal corporate IT as well. So I know that we share some experience using a lot of those network visibility tools that have been out for years. Right?

And a lot of them were focused pretty much on flows and SNMP. You know, one or the other, maybe both, some combination of both. But recently, it feels like that's almost old school. You know what I mean? Like, passé.

Have you sensed something like that in the industry as well?

Yeah. I have. And honestly, that's one of the reasons that I joined Kentik. Obviously, we've had both SNMP and flow collectors for many years, like you said.

And on the surface, these technologies don't really seem all that exciting. It seems like it's a solved problem. But the challenge that we have had as an industry historically, I think, is that this telemetry data was collected on these appliances. And these appliances were limited by CPU, memory, and disk.

I mean, you can only cram so much of that into a single, you know, server appliance.

Right. And so then the approach that the vendors who were manufacturing these appliances took was that, you know, they had to roll up the data. They had to aggregate it. They had to somehow make some engineering trade-off to be able to fit within that sheet metal.

And what we're seeing from vendors today is they're taking a much more modern approach. If I could use a buzzword, I call it big data. And what I really mean by that is they're ingesting and storing large volumes of data and clustering these systems across multiple sheet metal servers. So it gives you a lot larger volume of data, but it also allows you to do a lot more interesting things with that data.

And I think that's what we're seeing from the industry, from a lot of the vendors, is that we're able to leverage a lot of this newer technology that exists in clustered big data systems to solve this telemetry problem.

So it sounds like, in recent years, one of the reasons that flow data kind of became passé is specifically because we were limited by the compute and the storage that we were able to apply to collecting that kind of stream and doing something with it. And then I guess maybe that just became the culture. Right? It just became, oh well, it's not as useful, so let's go look at streaming now.

Yeah. Yeah. And I mean, I think, you know, those technologies, streaming telemetry, are useful, but I think they solve a very different problem. Right?

I think of streaming telemetry, and I think, you know, a lot of the leaders in our industry have made some great forays into being able to do more of a push model with streaming telemetry than the pull model we have in SNMP. It's more scalable. It solves a lot of problems. But it's still, for the most part, looking at the same data set, and that's things like interface utilization, queue depths, drops, retransmits of data on a particular interface, CPU, memory. All that stuff is very valuable and very interesting and is necessary for a network operator to feel like they know what's going on in their network.

But what's really interesting about flow data is it really gives you more information about the makeup of what that traffic is.

One story I like to tell is when I worked early in my career for a service provider, we started off just collecting interface metrics in a platform called MRTG, which I'm assuming a lot of the listeners have probably played with, or something similar to that.

Yep.

And that was awesome. I mean, before that, you know, we really didn't have a good way to see graphs of our traffic. We had to log into the devices and look at the interfaces themselves to see how much traffic was on them.

So it was definitely a step forward from what we had before that, but I can remember spending long hours troubleshooting denial of service attacks when that first started happening to our service provider, trying to use MRTG graphs to trace back to the source where that traffic was entering the network, and then applying ACLs on the inbound interface to accept that traffic, log it, dig through the logs manually, try and figure out what the makeup of the traffic was, and then ultimately being able to actually block the attack. But it took hours. And then we put a flow-based product into our network, started collecting flow, and did the same thing, you know, using flow data. And I just remember thinking, man, this is what I've been missing. The ability to literally see the attack come in, see what made up the attack, what IP addresses it came in from, what IP addresses it was going to, ports and protocols, you know, that real deep level of knowledge of what made up the attack made it so much easier to do that troubleshooting and that mitigation of the attack. It was a night and day difference.

Yeah. So you're able to see what the traffic is made up of as opposed to just, like, interface statistics like you said. So you're getting a different dimension.
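
To make the contrast with interface counters concrete, here is a minimal Python sketch of the kind of question flow records can answer during an attack: which sources are sending the most bytes to a given destination. The `FlowRecord` fields, the `top_sources` helper, and the sample addresses are all hypothetical and simplified for illustration; this is not a real NetFlow or IPFIX schema, and it is not how any particular product works.

```python
from collections import Counter
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow record (not a real NetFlow/IPFIX schema)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int    # IANA protocol number, e.g. 6 = TCP, 17 = UDP
    byte_count: int


def top_sources(flows, dst_ip, n=3):
    """Return the n source IPs sending the most bytes to dst_ip."""
    totals = Counter()
    for flow in flows:
        if flow.dst_ip == dst_ip:
            totals[flow.src_ip] += flow.byte_count
    return totals.most_common(n)


# Made-up sample data using documentation address ranges.
flows = [
    FlowRecord("198.51.100.7", "192.0.2.10", 40000, 53, 17, 900_000),
    FlowRecord("198.51.100.9", "192.0.2.10", 40001, 53, 17, 700_000),
    FlowRecord("203.0.113.5", "192.0.2.10", 51515, 443, 6, 20_000),
    FlowRecord("198.51.100.7", "192.0.2.10", 40002, 53, 17, 300_000),
    FlowRecord("203.0.113.5", "192.0.2.20", 51516, 443, 6, 50_000),
]

# Which sources account for most of the traffic hitting 192.0.2.10?
print(top_sources(flows, "192.0.2.10"))
```

An interface graph would only show the total volume arriving at the victim. The per-source breakdown above is the extra dimension being described here.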

Absolutely. Yeah.

You know, I've been calling that, these past couple of years working in visibility and observability, the diversity of visibility data, or diversity of data, whatever you wanna call it. It's really kind of like that classic picture of the dog. I don't know if you've seen it in presentations, where it's a picture of a dog with a regular camera, and you see the dog. A Dalmatian, whatever.

Mhmm. And then the next picture is an x-ray, and you can see that the dog swallowed some keys. And then the next picture, it's an MRI.

And so you see, like, the musculoskeletal system. You know, so each type of data provides you a different view as to what's going on, and flow is unique in that way.

Yeah. I love that analogy, by the way.

What's that?

I love that analogy, by the way. I think, you know, because if I think about that, you're starting off with a high-level view of the dog, right? You see it from the outside, and then you kind of peel back the layers of the onion, if you will, as you get deeper into it. I really think that's what flow does for us. Right? We can get a high-level view of what's going through our interfaces by looking at SNMP. But by looking at flow data, you really figure out what makes up that traffic that's on that interface.

Right?

I think I would still say that flow data is still kind of macro because it's not like it's packets. You're not doing deep packet inspection and looking at, you know, payloads and stuff. So it's still kind of macro in that sense. In fact, that does make me wonder. I mean, if we have that storage capacity now, right, and we have the access to compute, I'm gonna say virtually, in air quotes, virtually unlimited, because we can just access it in the cloud. Right? Why don't we just do everything with packets?

Well, I think there's a couple of things that I see with packets when talking to, you know, customers that we have here. It becomes expensive to instrument. Right? I mean, to do proper deep packet inspection in, you know, large enterprises, and definitely when you get into service providers and the volume of data that they're dealing with on the network, putting in a deep packet inspection appliance that can actually capture that becomes very expensive. It becomes cost prohibitive, quite honestly.

Yep. So that's one thing. And then the second thing is, how often do you really need every single packet? I mean, flow gives you enough data, typically, to know the makeup of your traffic without having to know every single packet and the whole payload, because mostly what you care about is the stuff that's in the headers anyway, which is really what flow is analyzing: the header information.

You don't really need to know the payload in most cases. And then actually, if I think that through a little more, one of the comments we hear a lot is that not having the payload is a benefit, because we no longer have to worry about a lot of security things. Right? Like, if I do DPI and I'm storing packets, and that packet has financial transactions in it, or it has medical record information in it, then I have HIPAA or PCI, I'm sorry, the other way around.

I have PCI or HIPAA compliance things to worry about. Whereas if I'm collecting flow, all I have is the header information.

I don't have to worry about those security concerns because I don't have the payload. Nobody could reverse engineer that and, you know, put the packets back together.

Although I do remember, back in the day, doing a lab when I did the CCNP Voice. I don't know what they call it now. I think they call it collab, not voice, whatever. Whatever it's called.

I'm not sure.

But I remember doing that. And, you know, the old Wireshark packet capture of a voice conversation and then replaying it back as a WAV file, that is pretty cool.

Oh, yeah.

But I gotta say, that's pretty much the extent of my experience actually needing or requiring packet-level visibility. Well, I take it back. It wasn't even visibility. That was just a fun thing to do.

So, yeah.

I I hear what you're saying.

That was more of a troubleshooting thing, right? And that's where I really think packets and DPI come into play, you know, at scale: when you're doing that deep troubleshooting. You really do need to see the packets. You need to see the entire transaction. You need to be able to put it back together and, like you said, be able to play the call back.

I think that's still valuable.

Why do you think that so many network visibility vendors over the past few years, or more than a few years, have focused so much on other types of telemetry? You know, I kind of mean in more recent days. I mean, for a little while, in all the presentations that I saw online and at the different events, everything was streaming telemetry. Right?

You know, I remember doing screen scraping to troubleshoot issues. I remember doing Wireshark captures and all this stuff, but everything was just about streaming for a while. Why do you think visibility vendors have focused on that and shied away from flow?

Yeah. It's an interesting question.

You know, I think, again, streaming telemetry is definitely an improvement over SNMP for those types of data. Right? Being able to get it at much larger volume, being able to get it much faster so that my data points are more accurate because I'm capturing them faster. So I think that's where a lot of the industry has focused: on the similar types of data they were getting from SNMP, just getting it in a different way so that it was much more scalable.

You know, I think a lot of vendors that have added flow have not really focused on it. They haven't really focused their R&D on it. And so they've just kind of bolted it onto the side of an existing solution that they have. Yeah. And therefore, the customers of those vendors, the users of those products, have been led to believe that flow doesn't really provide that much additional value. And what is unfortunate about that is that there is a lot of really interesting information in flow data. It's just that it's never been focused on by some of these vendors. They've never really unlocked the power of it by building a scalable solution and really focusing on what they could do with that data.

Okay. So then what do you think? Not what do you think, but what do you know?

What kind of visibility can we get from flow that's unique to flow data, that we really can't get from other types of telemetry?

Yeah. Like I said, it's really the makeup of the traffic. I mean, if you were to go and really get nerdy and look at the specs for, like, IPFIX or sFlow or one of the flow protocols, you'll see there's a lot of really interesting information there. So you've got what I call the five-tuple: you've got the source IP, dest IP, source port, dest port, and protocol. But there's a lot of other things, like the incoming interface and the outgoing interface.

A lot of times it'll have source and destination AS numbers in there that it gets from the routing tables on a router, for example.

Oh, yeah.

That's pretty cool. Yeah. Yeah. And over the last, at least in the five years that I've been at Kentik, there's definitely been a lot of vendors who have added their own fields into IPFIX.

So IPFIX gives you the ability to sort of add on to it. Right? If you think of it as kind of directionally similar to an enterprise-specific MIB for SNMP, it's similar to that in IPFIX, where, for example, a lot of the SD-WAN vendors that enterprises are using are now exporting data in their flow records that talks about the tunnel, the application, the users. I mean, there's a lot of additional rich information that's coming directly in those flow records that tells you about that traffic: what traffic it is, who's consuming it.

There's a lot of great information that you just can't get.
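
As a rough illustration of the fields just described, the following Python sketch models a flow record with the five-tuple, the interface and AS number fields, and a free-form dictionary standing in for vendor-added fields. Every name here (`FlowRecord`, `input_if`, `vendor_fields`, the sample values) is an assumption made for the example. Real IPFIX records are defined by templates and information elements, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    """Hypothetical flow record; field names are illustrative only."""
    # The five-tuple
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int            # IANA protocol number, e.g. 6 = TCP
    # Context a router can add from its own state
    input_if: int            # index of the incoming interface
    output_if: int           # index of the outgoing interface
    src_as: int              # source AS number from the routing table
    dst_as: int              # destination AS number from the routing table
    # Stand-in for vendor-specific extensions (e.g. SD-WAN tunnel, application)
    vendor_fields: dict = field(default_factory=dict)

    @property
    def five_tuple(self):
        return (self.src_ip, self.dst_ip, self.src_port,
                self.dst_port, self.protocol)


# Made-up record using documentation address and AS number ranges.
record = FlowRecord(
    src_ip="192.0.2.10", dst_ip="198.51.100.20",
    src_port=49152, dst_port=443, protocol=6,
    input_if=3, output_if=7,
    src_as=64496, dst_as=64511,
    vendor_fields={"tunnel": "branch-12", "application": "video-conferencing"},
)

print(record.five_tuple)
print(record.vendor_fields["application"])
```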

It's really extensible. Yeah. So then, in twenty twenty-two, it's still a very useful method for both ongoing monitoring and also for, you know, real-time analysis and troubleshooting. Mhmm. Would you say?

Yep. Absolutely. Yeah.

Yeah. So still a valuable tool in a, I guess, traditional network engineer's toolbox. Right?

Oh, absolutely. For sure. Awesome. I feel like we've still, to some degree, just scratched the surface of what you can do with it. And, you know, there's a lot of great things in the flow records themselves, but where it really becomes powerful is when you figure out what can I correlate it with? What can I enrich it with? Right?

You know, just some examples that we do here at Kentik, and, you know, a lot of other people could do this as well: adding geo tagging to it. Like, what geo does the source IP belong to, what geo does the destination IP belong to?

Threat feeds. You know, there are a lot of security companies out there that publish IP reputation data. So that allows you to then say, okay, I have traffic coming from this IP or traffic going to that IP. Are those IP addresses known bad actors? Are they known compromised IP addresses? And so from a security angle, it gives you the ability to say, okay, I can now proactively get alerted, get notified, when I have traffic that starts talking to some known compromised host.

You know, these are just a few examples I can think of off the top of my head that I've seen customers do, but I think we've probably only scratched the surface of things you could do when you take the data that's in flow and then, like I said, enrich it or correlate it with some of these other pieces of data to solve other, more interesting use cases.
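
The enrichment idea can be sketched in a few lines of Python. This is only an assumption-laden illustration, not Kentik's implementation: `GEO_PREFIXES` and `THREAT_FEED` are tiny made-up lookup tables standing in for a real geolocation database and a real IP reputation feed, and `enrich` is a hypothetical helper name.

```python
import ipaddress

# Made-up stand-ins for a geolocation database and an IP reputation feed.
GEO_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}
THREAT_FEED = {ipaddress.ip_address("203.0.113.66")}


def lookup_geo(ip):
    """Return the country tag for ip, or 'unknown' if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_PREFIXES.items():
        if addr in network:
            return country
    return "unknown"


def enrich(flow):
    """Return a copy of the flow dict with geo and threat tags added."""
    enriched = dict(flow)
    enriched["src_geo"] = lookup_geo(flow["src_ip"])
    enriched["dst_geo"] = lookup_geo(flow["dst_ip"])
    enriched["threat_match"] = any(
        ipaddress.ip_address(flow[key]) in THREAT_FEED
        for key in ("src_ip", "dst_ip")
    )
    return enriched


# An internal host talking to an address that appears in the threat feed.
flow = {"src_ip": "192.0.2.10", "dst_ip": "203.0.113.66", "bytes": 1200}
result = enrich(flow)

if result["threat_match"]:
    print("alert: traffic involving a known bad address:", result)
```

At real traffic volumes the linear prefix scan would normally be replaced by a longest-prefix-match structure, but the shape of the operation is the same: join each flow record against outside data, then alert or report on the joined result.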

Yeah. And that was my original thought, you know, starting this episode off.

You know, what is it about flow data that we're underutilizing? Why is it that we're stopping when there's so much more? And you highlighted a couple of things. First of all, it's very extensible. So it's more than just looking at, you know, twenty-seven percent of my network is HTTP, and then stopping. It's a lot more than that.

But also that we have the storage and compute available to us today to do so much more than we ever could. And therefore, it's still very, very useful. I mean, I think the first flow protocols were in, like, the mid-nineties or so. Oh, yeah. Yeah. So, you know, I get it.

It's been around for a long time.

Yeah. But then so has TCP/IP. Do we throw that out too? Exactly. Kind of silly.

Anyway, Justin, this has been a really great episode talking about how important flow data still really is today. And I really do appreciate your perspective on the industry too. That's valuable to me. So thank you for joining me today.

And before we go, where can folks find you online to ask questions, maybe get in touch with you in general?

Sure.

So you can find me on Twitter at justinryburn, all one word.

I'm much more active these days on LinkedIn. If you wanna search for my name there and follow me there, it's probably the easiest.

Great. You can find me on Twitter at network_phil. You can search my name, Philip Gervasi, on LinkedIn.

My blog is networkphil.com. And if you're interested in hearing more Telemetry Now episodes, check out our website, kentik.com/telemetrynow. And please feel free to let us know if there are any topics that you'd like to hear about, or if you'd like to be a guest on the show. So until next time, thanks. Bye bye.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS? Well, you're in the right place! Telemetry Now is the podcast for you! Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.