Kentik - Network Observability
Telemetry Now  |  Season 2 - Episode 8  |  July 18, 2024

A Deep Dive into BGP Flowspec

Flowspec can be thought of as a way of distributing basic firewall functionality into BGP. In the last decade, it’s become a popular way to mitigate the effects of a DDoS attack by filtering incoming BGP Update Messages. In this episode, Justin Ryburn, Field CTO at Kentik, explains how Flowspec works, how it’s implemented, and why it hasn’t been adopted quite as much as we expected it to be.

Transcript

BGP Flowspec was first introduced back in 2009. And since then, it's been a very popular way to mitigate the effects of a DDoS attack on organizations with a pretty decent Internet presence. So you could sort of think about Flowspec as a way of distributing a type of firewall functionality into BGP.

So in this episode, I'm joined by Justin Ryburn, field CTO at Kentik, who's written extensively about Flowspec and given talks on it at various conferences, and we're gonna get into what it is. What is Flowspec? How does it work? How is it implemented? And we'll even talk about why it hasn't been adopted quite as much as we expected it to be. My name is Philip Gervasi, and this is Telemetry Now.

Justin, it's great to have you back on the podcast. This is not the first time that you're on, so it's great to have you on as a returning guest. Now I know you personally pretty well, having worked with you for a few years and hung out with you at different events as well. But before we get started with today's episode, where we're gonna get eyeball deep into BGP Flowspec:

Can you give us a little bit about your background, maybe how you came to your deep understanding of Flowspec specifically, and what you're doing now?

Yeah. Sure. First of all, thanks for having me on the podcast again, Phil. Happy to be here.

Like you said, I have a pretty strong background in service provider. I spent ten years at Juniper Networks. And one of the projects that I worked on, actually, about ten years ago at Juniper was on BGP Flowspec. My customer at the time, I was working in presales, and they were having to try and figure out a solution around DDoS attacks.

They were having quite a few of them that were having a pretty big impact on their customer base, which we'll probably talk about some trends in that at some point here throughout the podcast. But, you know, we're trying to figure out, like, what's a good solution. They were asking me some questions about BGP Flowspec. I had heard of it, but didn't know a whole lot about it.

So I did a lot of research, reading, did some lab testing, and became, you know, I guess, somewhat of an expert. I don't typically like to use the word expert for myself, but, you know, for lack of a better term, I'll use that one. And someone pointed out to me, they're like, there's not much documentation on this. Why don't you document this?

So I wrote a Day One guide, is what Juniper called it. The idea behind those, if folks aren't familiar with them, is it's basically a book where, as the name implies, in one day you can kinda get up to speed on the topic. Right?

So it's not a full-on book, but it's definitely more than a white paper. It's somewhere in the middle. So I wrote a Day One guide on BGP Flowspec. Did a few public speaking engagements at NANOG and a couple other conferences.

We could put some links to those in the show notes if people are interested in researching that. But it's been a lot of time reading, researching, coming up to speed on Flowspec, and I've been kinda following it ever since.

Okay. Great. Yeah. And then for the audience's sake, this episode today sort of spawned from a recent blog post that Justin wrote.

Forgot the name of it now. BGP Flowspec doesn't suck. We're just using it wrong. Is that correct?

Yeah. Something like that. Right.

Something like that. But, anyway, it's about BGP Flowspec, going into the background, what it is from a technical perspective, but also the use case and maybe why it hasn't been adopted as much as we think it should have been by now.

But, ultimately, I do encourage you to go read that because it's a nice way to just sit down and get a a good overview, if you're not familiar with the technology. So let's start there. What is BGP Flowspec? And, from a technical perspective, of course, like, what does it actually do, and, and what do we use it for?

Sure. Yeah. So, essentially it's an extension to the BGP protocol itself. So when the RFC was ratified by the IETF, they expanded BGP by adding a new what's called NLRI, which stands for network layer reachability information.

And in that, we think of NLRIs as like prefixes, IP address prefixes, right, to advertise routes that tell a router how to get from well, how to get to a destination.

And in this specific case, what the IETF expanded BGP to do for Flowspec in the NLRI was to describe the type of traffic that we want to deny, that we want to filter. That we wanna take an action on, I guess. So if you think of it as like the matching criteria you need in a firewall filter or an access list, the matching criteria is advertised in BGP in the NLRI. So you can do things like destination prefix, source prefix, ports, protocols, TCP flags, packet lengths. There's a lot of different types of matching criteria that exist in that NLRI.
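To make the matching criteria concrete, here is a minimal Python sketch. The component type numbers follow the list in RFC 8955, but the `FlowRule` class, its field names, and the packet dictionaries are hypothetical simplifications for illustration. A real Flowspec NLRI encodes each component with numeric or bitmask operators rather than the single values used here.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple

# Flowspec component types as listed in RFC 8955.
COMPONENT_TYPES = {
    1: "Destination Prefix",
    2: "Source Prefix",
    3: "IP Protocol",
    4: "Port",
    5: "Destination Port",
    6: "Source Port",
    7: "ICMP Type",
    8: "ICMP Code",
    9: "TCP Flags",
    10: "Packet Length",
    11: "DSCP",
    12: "Fragment",
}


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical, simplified stand-in for a Flowspec NLRI.

    A field left as None means "match anything" for that component.
    """
    dst_prefix: Optional[str] = None
    src_prefix: Optional[str] = None
    protocol: Optional[int] = None
    dst_port: Optional[int] = None
    length_range: Optional[Tuple[int, int]] = None

    def matches(self, pkt: dict) -> bool:
        # Every component that is set must match (logical AND).
        if self.dst_prefix is not None and ip_address(pkt["dst"]) not in ip_network(self.dst_prefix):
            return False
        if self.src_prefix is not None and ip_address(pkt["src"]) not in ip_network(self.src_prefix):
            return False
        if self.protocol is not None and pkt["proto"] != self.protocol:
            return False
        if self.dst_port is not None and pkt.get("dst_port") != self.dst_port:
            return False
        if self.length_range is not None:
            low, high = self.length_range
            if not low <= pkt["length"] <= high:
                return False
        return True


# Example: match UDP traffic to port 53 on a single host (documentation addresses).
rule = FlowRule(dst_prefix="192.0.2.10/32", protocol=17, dst_port=53)
dns_flood = {"src": "198.51.100.9", "dst": "192.0.2.10", "proto": 17, "dst_port": 53, "length": 512}
web = {"src": "198.51.100.9", "dst": "192.0.2.10", "proto": 6, "dst_port": 443, "length": 1400}
print(rule.matches(dns_flood), rule.matches(web))
```

The point of the sketch is only that a rule is a set of header fields combined with AND, which is why it behaves like a firewall filter term rather than like a route.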

Okay.

Yeah. And then the next thing that they did was, obviously once you match traffic, you need to be able to say, well, what action do you want to take on that? Right? And the way that they came up with to be able to do that is you attach a community to the route, similar to the way you would attach a community to a route when you're advertising it for prefixes to tell it, you know, where did this prefix come from, who advertised it. There's a lot of different reasons or a lot of different ways that people design their communities when they're doing BGP network design.

But in this particular case, the communities tell the router what action to take on that traffic. So the most obvious is just to drop it. Technically, what that means is I'm setting a rate of zero on it, but I can do other things where I can redirect the traffic into a specific VRF. So maybe I'm gonna build myself a scrubbing center where I'm gonna redirect my traffic to some sort of scrubbing appliance, and I wanna put the dirty traffic into one VRF and the clean traffic exists in my global table. So I can redirect it into a VRF.

I can change the DSCP markings, the class of service, quality of service markings on it to make the traffic, like, best effort, for example, so that if I get to a point where there's congestion, then I'll drop that traffic before any of my more important traffic. So there's a few different things that you can do with the traffic that matches the advertised criteria, based on the community that's attached to the route.
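The three actions described here (rate-limit or drop, redirect to a VRF, re-mark DSCP) are carried as 8-octet BGP extended communities. The sketch below packs them with Python's `struct` module, using the type values and layouts I understand RFC 8955 to define (0x8006 traffic-rate in bytes per second, 0x8008 redirect to a route target, 0x8009 traffic-marking). The function names and the example AS number and route target value are assumptions made for illustration, and this is not a full BGP encoder.

```python
import struct

# Extended community type/subtype values for Flowspec actions (per RFC 8955).
TRAFFIC_RATE = 0x8006     # rate limit in bytes per second; a rate of 0 means drop
REDIRECT = 0x8008         # redirect into the VRF that imports this route target
TRAFFIC_MARKING = 0x8009  # rewrite the DSCP value


def traffic_rate(asn: int, bytes_per_second: float) -> bytes:
    """2-octet type, 2-octet AS number, 4-octet IEEE float rate."""
    return struct.pack("!HHf", TRAFFIC_RATE, asn, bytes_per_second)


def redirect_to_vrf(asn: int, value: int) -> bytes:
    """2-octet type followed by a route target in 2-octet-AS:4-octet-value form."""
    return struct.pack("!HHI", REDIRECT, asn, value)


def traffic_marking(dscp: int) -> bytes:
    """2-octet type, five zero octets, then the DSCP in the low 6 bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return struct.pack("!H5xB", TRAFFIC_MARKING, dscp)


# "Drop" is simply a traffic-rate of zero.
drop = traffic_rate(0, 0.0)
# Hypothetical dirty VRF identified by route target 65000:100.
to_dirty_vrf = redirect_to_vrf(65000, 100)
# Re-mark matching traffic to best effort (DSCP 0).
best_effort = traffic_marking(0)

for name, community in (("drop", drop), ("redirect", to_dirty_vrf), ("mark", best_effort)):
    print(name, community.hex())
```

Seeing drop expressed as a rate of zero is a useful reminder that the same community can also rate-limit traffic to a non-zero value instead of discarding all of it.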

Ultimately, though, it is looking at that information, five-tuple information. It's not doing any kinda, like, DPI or anything that invasive on traffic. Right? So it is a five-tuple.

Yep. We're matching on traffic source, destination, port, protocol, all that kind of stuff. That's fine. But it's specifically in the BGP update message that it's looking at.

Right? Yep. Okay. Correct. Alright. So we are focused specifically on that BGP communication between peers Mhmm.

Or neighbors, whatever you wanna call them Mhmm.

In an adjacency.

So, ultimately, though, when I match some sort of traffic, it's to do something. Right? You mentioned firewall, though. Is that the idea behind Flowspec, that I wanna block traffic, necessarily? Because otherwise, it's basically kinda like an advanced form of policy based routing, but specifically for BGP. Right?

Yeah. I'm sure there are probably some creative things you could do, you know, to do policy based routing using Flowspec. That's not really what it was designed for. That's not really the use case that folks had in mind when they invented the protocol.

So the use case that folks had in mind when they designed the protocol was really around DDoS attacks and mitigating those, blocking traffic. So we'll just take the concept we were talking about earlier a step further: when a router receives a BGP Flowspec advertisement, it looks at the NLRI and says, okay, this is the traffic I wanna match, and then it looks at the community and says, this is the action I wanna take. That results in essentially the same concept as an ACL or a firewall filter on the interface.

A little bit of a nuanced detail there: it actually is applied to the forwarding table, not to the interface. So it applies globally on the device, unless you exclude certain interfaces.

So anyway, point being that for traffic that matches that criteria, the action is taken on it. So it basically allows you to kind of dynamically advertise firewall policies over BGP instead of having to log in and configure them manually on the device. So, I mean, back to your original question, I guess, you know, if you really wanna get cute and wanted to get creative, you probably could use it for policy based routing, but I've not seen a design like that. And I think it would be something you'd really want to test well to make sure there weren't any unintended consequences there.

Right. Okay. Sure. Sure. Well, which to be fair is what we should do with any kind of traffic engineering. Right?

For sure.

But, basically, what you're saying is that the primary use case for Flowspec, though it can be used for other things, is that it's primarily a DDoS mitigation technology. Right? So we're gonna match on information that we learn in a BGP update message that is indicative of a DDoS attack of some sort, and then we can use Flowspec to mitigate that attack. So can you explain how that works? And by explaining how it works, you know, you're gonna have to get in a little bit about how DDoS works for our audience as well, I think.

I think that you're gonna have to have some sort of way to know what the parameters of the attack are. Right? And so whether you're going to do a manual mitigation like logging in and configuring an ACL or firewall filter on your device, or you're gonna do what's called a remote triggered black hole, which is also a BGP advertisement. But all a remote triggered black hole does is it just completely takes that host offline.

So if I have a slash thirty-two, a given host that's under attack, and I do remote triggered black hole, I basically just say to my router, any packets that come in that are destined to that IP address, just drop them on the floor. Right? Just route them to null zero, just drop them on the floor. There's a term for that called completing the attack, meaning that if that host was under attack, it was already performing poorly. But in theory, it was kind of up.

Some packets were, you know, some legitimate packets were getting through. Some attack packets were getting through. If I use remote triggered black hole, it's completely offline now. Right?

So Flowspec at least helps make that better in that I can get more surgical. I can get more granular in what it is that I'm trying to filter without having to manually log in and configure firewall filters on the interface. But, again, like I said at the top of that comment, you have to have some way to know what the parameters of the attack are and what you're trying to mitigate. So presumably, you're gonna have some sort of monitoring system that tells you, hey, you're under attack. Here are the parameters of the attack. Here's what your source and dest IPs and ports and protocols and all that kind of five-tuple type of stuff looks like, so that you can then in turn create the Flowspec rule.
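The difference between "completing the attack" with a remote triggered black hole and filtering surgically with Flowspec can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the victim address is a documentation address, the attack is imagined as an NTP reflection flood (UDP, source port 123), and the two helper functions are toy models of what a router would do, not router code.

```python
from ipaddress import ip_address, ip_network

VICTIM_PREFIX = "203.0.113.7/32"  # hypothetical host under attack


def rtbh_drops(pkt: dict, blackholed_prefix: str) -> bool:
    """RTBH: anything destined to the prefix is discarded, good or bad."""
    return ip_address(pkt["dst"]) in ip_network(blackholed_prefix)


def flowspec_drops(pkt: dict, rule: dict) -> bool:
    """Flowspec-style filter: drop only when every criterion matches."""
    return (
        ip_address(pkt["dst"]) in ip_network(rule["dst_prefix"])
        and pkt["proto"] == rule["proto"]
        and pkt["src_port"] == rule["src_port"]
    )


# Parameters a monitoring system might report for the assumed attack.
attack_rule = {"dst_prefix": VICTIM_PREFIX, "proto": 17, "src_port": 123}

packets = [
    {"label": "legit https", "dst": "203.0.113.7", "proto": 6, "src_port": 51000},
    {"label": "legit dns reply", "dst": "203.0.113.7", "proto": 17, "src_port": 53},
    {"label": "ntp reflection", "dst": "203.0.113.7", "proto": 17, "src_port": 123},
    {"label": "ntp reflection", "dst": "203.0.113.7", "proto": 17, "src_port": 123},
]

survive_rtbh = [p["label"] for p in packets if not rtbh_drops(p, VICTIM_PREFIX)]
survive_flowspec = [p["label"] for p in packets if not flowspec_drops(p, attack_rule)]

print("delivered with RTBH:    ", survive_rtbh)
print("delivered with Flowspec:", survive_flowspec)
```

With the black hole nothing reaches the host, while the Flowspec-style rule lets the legitimate packets through and removes only the traffic that fits the reported attack parameters.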

So since Flowspec is integrated into BGP and BGP is inherently very scalable, right, that's one of the benefits of BGP, Flowspec is therefore also very scalable. And that's as opposed to, like, manually configuring ACL rules on interfaces or deploying firewalls or IDS, IPS appliances everywhere. I mean, Flowspec, again, because it operates within the context of BGP, is therefore inherently a very scalable solution to mitigate against DDoS attacks, which also can be very broad in their scope. Right?

Mhmm. Yeah. And that's, I mean, that's one of the, I guess, you know, the pros and cons. Well, the main reason that people really pushed for the protocol to begin with is that doing a manual firewall filter on an interface works really well, but it requires that somebody either manually logs in and configures that filter on the interface or you develop some sort of automation script that takes care of it for you. You know, it requires a lot of work on the operator side. The idea behind Flowspec is we can take those same pros that we get out of being able to do very granular filters and advertise it out in a very dynamic way. And keep in mind, we're kinda talking here in the context of advertising to one device.

But if I am a global service provider and I have hundreds of edge nodes, I can create one BGP route on something like a route server, and it'll advertise it out to all of my edge nodes, right, as opposed to having to log in to each one of them manually or, again, push out a script that makes that configuration and so forth.

Yeah. And that's where I was going with the whole scalability thing. So that makes sense. So does that mean that in your experience, and maybe in your opinion as well, it really is more of a service provider technology than it is a large enterprise technology? Because if I'm running a large enterprise with a lot of peerings, wouldn't I be putting that Flowspec information in my BGP community information that I'm then advertising back upstream to my service provider? So that's a whole other world of trust relationships and how that works.

Yeah. There's a lot to unpack there. So, I mean, I guess the first thing I'll say is that, you know, we, I guess I would just say as an industry, we talk about service providers and enterprises like they're two completely different beasts. Right? And the reality is it's sort of all a bit of a grayscale. Right?

Yes. It's true that enterprises build networks for different purposes most times than service providers do. Right? Service providers build a network to sell as a service.

Enterprises typically build a network to service their end users, their applications. Right? But I work here at Kentik with a lot of enterprises that are building, I'll call them service provider type networks. Right?

These could be gaming networks. It could be high frequency trading networks. They're every bit as critical to the mission of that organization as the service provider's network is to them. Right?

And at very similar scales. Right? I mean, Google, who probably runs one of the biggest networks in the world, is technically an enterprise. If you think about it, they're not a service provider.

They don't sell their network per se. Right? But they're operating a huge scale network. So, anyway, the lines are a little blurry between service provider and enterprise when you think of it from that perspective.

But if you're operating a large scale Internet connected network, regardless of what your business model is, then having something like this is super critical to your network. Right? And to being able to keep your network online, being able to block these types of attacks and keep the network running.

Yeah. And then, you know, blocking these kind of, like, large scale attacks is gonna be something that has to be, like, vendor agnostic, and we are talking about an industry standard. I did look at the RFC. It's RFC 5575 from 2009-ish, which is relatively new. I mean, some people might consider it old.

There's actually, you know, it's interesting, in researching for that latest blog that I did, it turns out they actually just ratified a new one. It's now RFC 8955.

Okay. Yeah. Interesting.

Which replaced the previous one that was about twelve or thirteen years old. So the protocol itself has actually been around quite a while, but there's been some advancements to it over the years and some, you know, lessons learned and some testing that's been done. And so there's a new RFC that's been ratified that addresses some of those things.

Okay. And so being an industry standard then, it is not a vendor specific technology. It is not proprietary to Cisco or to Juniper or to Arista or anyone, for that matter.

But does that mean that the implementation is pretty similar among those vendors? Like, if I were a service provider or a larger enterprise looking to mitigate DDoS attacks, and the ramifications of those DDoS attacks, I should say. Right?

And say I'm running a multi-vendor network. Can I look at Flowspec as a solution?

Yeah. It is multi-vendor. It's not proprietary to any one vendor. I would say that, you know, the thing to keep in mind though is that the resulting filter that we talked about earlier is implemented in hardware.

Right? It's in the chip. So the one thing you would need to make sure of is that whatever chipset the line cards from whatever vendor you're running have is capable of that. Now, you know, in 2024, even your commodity ASICs from places like Broadcom and so forth, even they have support for BGP Flowspec.

It's been around long enough that they're starting to add in support in a lot of those. So odds are pretty good if you're running a fairly large scale network to the point where you're having these types of problems that Flowspec would help you with, large attacks and a lot of distributed edge, Internet edge type of connectivity.

Odds are pretty good whatever you're buying has the support for it built right in.

Okay. So let's say that we have some rules in our Flowspec implementation, and the rules match and it fires off some sort of action like you said. What are the kinds of actions that I would take on traffic that triggers one of those rules? I mean, obviously, you mentioned one is just black hole the traffic. That's a note. That's one. And that's obvious, I think. But what are some other actions that we would take specifically for a DDoS attack?

Yeah. The only two I've really seen implemented in practice are, you know, dropping the traffic and then redirecting it to, like I said, a scrubbing service. And so what some companies will do is they'll, you know, have, like, I don't know, let's call it two data centers. One on the East Coast and one on the West Coast of the United States for a US operator or, you know, something similar globally.

And they will have, you know, scrubbing appliances from a DDoS vendor, or they'll have routers that are purpose-built to do scrubbing of attack traffic or whatever. And so on their edge devices, they'll basically just put any traffic that's matching the criteria into a dirty VRF, is what they call it. You know? I mean, for lack of a better term.

That traffic will then get sent over to the scrubbing service, the scrubbing center that they've built in those data centers, which will scrub the traffic out and then reinject it back into their global table. Okay. So one of the things you can do in one of those communities is to tell it: any traffic that matches this, basically, stick it into this VRF.

Basically, back to your point, it's kind of like a policy-based routing type of action that you're taking on that traffic. Yeah. Yeah. So the use case for that is still for mitigating traffic.

It's just you're not mitigating it right there on the device. You're redirecting it somewhere else to be inspected a second time, to go deeper into the packet potentially or find more creative attacks. Keep in mind, this entire conversation has been sort of predicated on the concept of volumetric attacks.

Right? Yeah. If the type of attack you have is like a TCP SYN that's filling up the session tables on a server, sometimes those can be what they call slow and low attacks where they're not high volume. They're just sending in, like, really small SYN packets.

So they're kinda hard to catch from a, you know, from a volumetric perspective. Redirecting that to a scrubbing service, they might have a DPI appliance in it. They can look farther into the packet and figure out that, oh, yeah.

This is in fact, you know, a slow and low attack.

You know, that might allow you to mitigate something that just a router and Flowspec are not gonna catch.

Okay.

So we're looking at at a lot of stuff here. We are looking at source and destination IP. We're looking at, protocols and ports. We are looking at TCP flags as well. Right? So we we can actually get pretty granular in our filtering for whatever reason, whether whether it's for filtering and traffic engineering or for mitigating DDoS attacks or any other kind of, nefarious activity, that you can get pretty granular, again, within the context of the BGP updates.

So let's talk about the, adoption rate.

You know, that was nine years ago that you gave that talk at NANOG. And in that talk, which I just watched, I think, yesterday and then a little bit earlier today, you were talking about the adoption rate and how it's kinda slow and not going, you know, quickly. It's not going as much as we thought it would be considering the rise in DDoS attacks. Why do you think that is, and where are we at today in 2024?

Yeah. It's really hard to get a good handle on that. There's not been a lot of great research done on that, at least not that I've come across. If any of the listeners have come across something, please send it my way. I'd love to see some data on this.

I've, you know, I've googled around, not been able to find much as far as research on that. I'll just say somewhat anecdotally from talking to our customers, you know, here at Kentik about it: definitely a lot more questions about it with every passing year. There seems to be a lot more people testing it in their labs, starting to roll it out inside of their own networks.

So there definitely seems to be, you know, some traction coming with it.

You know, I just think it's one of those things, a lot like IPv6, where it's been around for a long time, but adoption is very slow. Right? Change takes time, especially when we're talking about the Internet. We're talking about networks that aren't under any single one person's control.

Mhmm. You know, as quick as things change in this industry, adoption of things actually happens very slowly in this industry, is kinda what I have found. Yeah. I get that.

Flowspec is no different than anything else when it comes to that.

I mean, the only thing I would say to that is I I wonder if the adoption is slow for a particular technology in tech because the urgency isn't there. Otherwise, the adoption will be much quicker.

Like, for example That's fair.

And I'm gonna get in big trouble for saying this, but most organizations don't have a burning need, an urgency that things are going to go down if they don't migrate to an entirely IPv6 network, and so they run dual stack or just IPv4. And we have NAT and things to accommodate.

I wonder if it's the same for Flowspec. Are there alternatives that can mitigate DDoS attacks and other kinds of security threats as effectively?

Yeah. Yeah. I mean, like I said, I alluded to one earlier, which is, you know, you can build scripts. So, you know, it takes a pretty decent programmer.

I know you and I both participate in some automation forums where, you know, specifically the Network Automation Forum, where they're really trying to get the practice of network automation up and to the right and more people adopting it. That's a use case for that type of thing. Right? Building out an automation framework and maybe even a web portal where people can go in, put in the parameters of the attack, and it would then in turn, you know, build some Python script or some Ansible scripts or something that would then push the firewall filter out and would log in to the device and configure the firewall filter through automation.

So, you know, there are some other ways to accomplish this. There are some other ways to get around it. I'm sure there are organizations that have figured that out.

You know, for a lot of enterprises, they can actually buy a DDoS service. Right? There are companies, some of which are, you know, partners of Kentik's, that provide DDoS scrubbing as a service. And, typically, the way those work is you'll either have an always on service where you route your traffic through them, and they're constantly keeping an eye on your traffic and mitigating.

And if they see an attack, they're basically in line. It's basically like a DPI service. Right? Increases your latency a little bit, like anything.

There's engineering trade offs for that.

Or you could do what's called on demand, where basically once you're under attack, you swing the traffic over to them. They advertise your BGP routes on the Internet on your behalf, swing the traffic into their scrubbing service, scrub the traffic out, and then return the clean traffic to you. So there are, again, you know, a number of different ways to try and solve this problem. As with anything, each one of them kinda has their own pros and cons. And so I think, you know, to your point, that's probably part of the reason that, you know, not a hundred percent of organizations are doing BGP Flowspec, because they may have different, you know, trade offs that they've made and different ways to approach the problem.

I get it. And it's like a tool in an engineer's toolbox, like we say about so many technologies. Maybe it's the familiarity of the staff with Flowspec or, you know, it's already in existence, and so the organization continues with it. It's just another tool. But let's talk about a couple of the problems that have happened due to a Flowspec implementation that maybe led to a reluctance among some engineers to deploy it. Like, for example, I see here in our notes that there were some outages at CenturyLink and Cloudflare as a result of Flowspec.

Yeah. Yeah. And I think, you know, that definitely can't be ignored as far as a reason that there's been some real reluctance to adopt Flowspec. In fact, I've had conversations with people where that's the number one reason why they have some reluctance to. Right?

Especially when Flowspec was new and not a lot of folks had adopted it and not a lot was known about, you know, the pros and cons and best practices. It hadn't had nearly as much lab testing. There were some pretty visible outages as a result of it. You know, I'm not gonna pick on Cloudflare, but that's the one I think that's top of mind for most people when this comes up because it was a very visible outage.

For that particular outage, they have an entire blog on their website about it. As Cloudflare normally does, they used it as an opportunity to share what they learned with the industry and did a really nice kind of blameless post mortem on it that folks can go and read.

But, you know, they had some automation in place, and they had advertised a Flowspec rule that said to block any packets from 99,971 to 99,985 bytes long.

Now for those who are listening who are network engineers, the words I just used make no sense. Like, you can't have a packet that size. That's above the maximum, you know, MTU on pretty much any link in any network. Right?

So that doesn't actually make any sense. And they happened to have Juniper devices in their network, and the way that the Juniper line cards reacted to getting a Flowspec rule that told them to do that was they rebooted. Right? Right?

They didn't know what to do, so they had a panic in the software, and they rebooted, which caused the outage. Now I would argue Juniper's devices shouldn't have reacted that way to an invalid packet size. But on the flip side, you know, it's one of those things as a network engineer, as a network automation person, you wanna check your fields to make sure that you're not advertising a value that doesn't make any sense. It's no different than trying to, you know, put an IP address that's more than thirty-two bits into a field.

Right? I mean, those aren't just numbers. They have some meaning. Right? Same type of thing on the packets.
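The "check your fields" advice can be sketched as a small pre-flight validator that automation runs before it advertises a rule. This is an illustrative assumption, not a description of what any operator or vendor actually runs: the rule is a plain dictionary with hypothetical keys, and the only facts relied on are that the IPv4 total length field is 16 bits (so no packet can exceed 65,535 bytes), that ports are 16 bits, and that the IP protocol number is 8 bits.

```python
from ipaddress import ip_network

MAX_IP_PACKET_LENGTH = 65535  # IPv4 total length is a 16-bit field
MAX_PORT = 65535              # TCP/UDP ports are 16 bits
MAX_PROTOCOL = 255            # IP protocol number is 8 bits


def validate_rule(rule: dict) -> list:
    """Return a list of human-readable problems; an empty list means the rule looks sane."""
    errors = []

    prefix = rule.get("dst_prefix")
    if prefix is not None:
        try:
            ip_network(prefix)  # rejects malformed addresses and set host bits
        except ValueError as exc:
            errors.append(f"bad destination prefix: {exc}")

    for field, limit in (("protocol", MAX_PROTOCOL), ("dst_port", MAX_PORT)):
        value = rule.get(field)
        if value is not None and not 0 <= value <= limit:
            errors.append(f"{field} {value} is outside 0-{limit}")

    length_range = rule.get("length_range")
    if length_range is not None:
        low, high = length_range
        if low > high:
            errors.append(f"packet length range {low}-{high} is reversed")
        if low < 0 or high > MAX_IP_PACKET_LENGTH:
            errors.append(
                f"packet length range {low}-{high} exceeds the maximum of {MAX_IP_PACKET_LENGTH} bytes"
            )

    return errors


# A rule shaped like the one described above: a length range no real packet can have.
impossible = {"dst_prefix": "192.0.2.0/24", "length_range": (99971, 99985)}
sane = {"dst_prefix": "192.0.2.0/24", "protocol": 17, "dst_port": 53, "length_range": (0, 1500)}

print(validate_rule(impossible))
print(validate_rule(sane))
```

A check like this costs almost nothing and keeps a nonsensical value from ever reaching the routers, regardless of how any particular line card would react to it.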

So, you know, yeah, that was, obviously, very unfortunate for Cloudflare, and a lot of people saw that and said, well, you know, this is a big hammer. I wanna be careful using this tool because it's a big hammer. So I think it definitely caused the protocol itself to kinda get a bit of a black eye for sure.

Yeah. Well, I mean, when you look at the configuration of Flowspec, and I'm most familiar with Cisco. I know you're former Juniper, but I'm gonna stick with Cisco because that's what I know. You are creating class maps.

You're matching on addresses and prefixes. You're writing policy maps. This is very reminiscent of policy based routing. Right?

Mhmm.

You know, there is the community that you're writing and incorporating that into. So that's all kind of manual, with pieces that can easily be messed up by a human being engineer and then propagated to your provider or to peers and things like that. So I understand how there is a potential for a problem and a blast radius.

But that is a human problem. It's not a problem with the technology, is it? I mean, it's not like the technology is ineffective and we shouldn't use it. Like, it's lacking in some way, like RIP versus, you know, OSPF or something like that.

It's a problem with human beings implementing it.

Yeah. And that that's long been sort of my argument.

Like, two two people who bring that to my attention when, you know, when I'm involved in the conversation is I mean, that that's the same with with most technologies that we adopt. Right? It needs to be tested well. You need to lab test it, understand what the, you know, what the pros and cons are, where the rough edges are, and so forth.

But a lot of that is, like you said, like a a human problem, human error problem, or things that we can very easily build best practices or build scripts or build policies and config best practices and so forth to avoid, you know, having those negative impacts and the benefits to me. If, you know, if you're the kind of person who, like, at one point in my career, I was dealing with the loss of tax on a daily basis. You know, if that's the world you're living in and it provides you with some positive benefit to make this much quicker and much easier to mitigate, then, you know, it's probably worth going through the the the, you know, the hassle or the trouble to figure out what, you know, what best practices you wanna build around that and what policies and so forth you wanna build to make sure that you can do it in a in a way that doesn't cause outages in your network.

And I don't know if I'd even use the term hassle or trouble, because it's part of how we do networking. I mean, everything that we do in networking is very impactful. It's not like migrating some back end server where you have, you know, other servers that can take up the workload and things like that.

You know, you wiggle the wrong wire, you could take down an entire location. You know? Hopefully, you have some kind of path redundancy.

I think we all have stories. We've done that on accident. Right? If you've been doing this long enough.

Yeah. Yeah. I mean, obviously, if you are wiggling a wire and take down an entire site, there's probably a design issue that needs to be addressed. Like, why is that entire site based on... anyway, whatever we do in networking is typically very impactful right down to the end user, and that's really what I mean by impactful.

So that's why I like to say, you know, best practices aren't a trouble or a hassle, but they're just part of our workflow. You know?

They're part of good engineering and good network design.

Engineering. Exactly. Exactly. So let's get into that a little bit more. So, you know, as I described, at least on Cisco devices, and I assume on other vendors as well, you're writing class maps and matching addresses and policy maps, and there's probably ACLs in there that you're referring to. I'm not exactly sure.

So there's a manual process to writing out your Flowspec implementation. What are some of the best practices, then, that an organization, that engineers, can follow when implementing Flowspec?

Yeah. This is a conversation I have quite frequently with customers, and the thing that I, you know, tend to advise people on is make sure you know what the forwarding performance impacts are to your line cards. Right? It's typically not zero. Right? What is resulting from that Flowspec advertisement, like you said, is some sort of firewall rule or some sort of ACL. Each chipset's gonna implement it slightly differently.

But you're taking up a resource on that chip. Right? And the chip's primary function is to forward traffic, as we all know. Right?

These are routers. Routers route. That's what they're designed to do. So, you know, I know the Juniper architecture the best.

I spent a lot of years there. They advertised sixteen thousand Flowspec routes in the generation of chips that they were making when I was there. Now that's been, okay, seven years ago, so it may have gone up from that.

But the catch there is that's unidirectional. That's if that card is doing nothing else. As you get close to that number, you will see a forwarding performance impact. So I advise people, if you're gonna try and push the limits of what the vendor states is the limit on the number of Flowspec rules that they can support, make sure you do it with some sort of test set running packets through that line card and figure out what the impact is.

You know, if it can forward at line rate, ten gig, with no Flowspec rules and you start, you know, ramping up the number of Flowspec rules, at what point can you only get eight gig through, or can you only get seven gig? You know, what is your forwarding performance, so that you can stay under that limit?
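The ramp test described above can be summarized with a small script. This is only a sketch: the measurement values, the `measurements` list, and the `max_safe_rule_count` helper are all hypothetical, and real numbers would have to come from your own test set and line cards.

```python
from typing import Optional

# Hypothetical lab results: (Flowspec rules installed, measured throughput in Gbps).
# These values are invented for illustration only.
measurements = [
    (0, 10.0),
    (2000, 10.0),
    (4000, 9.6),
    (8000, 8.0),
    (12000, 7.0),
]


def max_safe_rule_count(samples: list[tuple[int, float]], min_gbps: float) -> Optional[int]:
    """Return the largest tested rule count before throughput first drops below min_gbps."""
    safe = None
    for rules, gbps in sorted(samples):
        if gbps < min_gbps:
            break  # throughput fell below the floor; keep the last good point
        safe = rules
    return safe


# With a 9.5 Gbps floor, the last tested point that still meets it is 4000 rules.
print(max_safe_rule_count(measurements, min_gbps=9.5))
```

The result is only as good as the test points: it reports the last tested rule count that met the floor, not an interpolated limit.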

Also, not all Flowspec rules are created equal, and that may seem obvious when I explain this, but if you have a really simple rule that only matches on a slash thirty two and doesn't have any ports or protocols or TCP flags or anything in it, that's gonna take up a lot fewer resources on your line card than a more complex one that does have all those types of things. Right? So when you're thinking about how you're gonna do this in your network design, the simpler that you can make your Flowspec rules and still accomplish what you're trying to filter, the better. Right?

Because it'll take up fewer resources and it'll scale a lot better. Mhmm. Right. So those are a couple of things I always recommend that people test and check.
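To make the simple-versus-complex distinction concrete, here is a rough sketch that models a rule as a set of match components, using the component type numbers from RFC 8955. The `FlowspecRule` class, the rule names, the match-value strings, and the idea of counting components as a complexity score are assumptions for illustration; actual hardware cost is chipset-specific and has to be measured.

```python
from dataclasses import dataclass, field

# Match component types as numbered in RFC 8955 (IPv4 Flowspec).
COMPONENT_TYPES = {
    1: "destination prefix",
    2: "source prefix",
    3: "IP protocol",
    4: "port",
    5: "destination port",
    6: "source port",
    7: "ICMP type",
    8: "ICMP code",
    9: "TCP flags",
    10: "packet length",
    11: "DSCP",
    12: "fragment",
}


@dataclass
class FlowspecRule:
    """Hypothetical in-memory model of a rule: component type -> match expression."""
    name: str
    components: dict[int, str] = field(default_factory=dict)

    def complexity(self) -> int:
        # Crude proxy only: counts match components, not real hardware cost.
        return len(self.components)


# A simple rule: match only on a /32 destination.
simple = FlowspecRule("drop-to-victim", {1: "192.0.2.10/32"})

# A more complex rule: destination, protocol, source port, and packet length.
complex_rule = FlowspecRule(
    "drop-ntp-reflection",
    {1: "192.0.2.10/32", 3: "=17", 6: "=123", 10: ">=400"},
)

# List rules from simplest to most complex.
for rule in sorted([complex_rule, simple], key=FlowspecRule.complexity):
    print(rule.name, rule.complexity(), [COMPONENT_TYPES[t] for t in rule.components])
```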

The, you know, the other thing that you mentioned a couple times that I wanted to clarify: most people I've seen in this industry will advertise BGP Flowspec via iBGP throughout their own ASN, the network that they have under their own command.

Yeah. There are some carriers out there that will accept Flowspec rules from their downstream customers, from their enterprise customers. It's very rare.

That seems to be where people kinda draw the line as an industry as far as, you know, risk and concerns about the protocol and so forth: inter-domain routing. It's like, I'm happy to do it inside of my own network where I can control everything. I can control the devices. But allowing my customers to advertise them to me, that's where people tend to get a little nervous about things. So if you're an enterprise and that's what you're thinking about doing, you might wanna call your carrier and make sure that that's something that they accept, and see if there's any upgrade to your, you know, contracts for your circuits that you have to do to be able to send them Flowspec rules, because it is kinda rare that they allow that.
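One guardrail a carrier might put around customer-originated rules is checking that the rule's destination falls inside address space assigned to that customer. The sketch below shows only that single check; the `destination_allowed` function and the prefixes are hypothetical, and this is a simplification, not the full validation procedure defined in RFC 8955.

```python
import ipaddress


def destination_allowed(rule_destination: str, customer_prefixes: list[str]) -> bool:
    """Return True if the rule's destination prefix sits inside one of the customer's prefixes."""
    dest = ipaddress.ip_network(rule_destination)
    for prefix in customer_prefixes:
        net = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if dest.version == net.version and dest.subnet_of(net):
            return True
    return False


# Hypothetical customer with one assigned prefix (documentation address space).
customer = ["203.0.113.0/24"]

print(destination_allowed("203.0.113.7/32", customer))   # inside the customer's block
print(destination_allowed("198.51.100.0/24", customer))  # someone else's address space
```

A rule that fails this check would be rejected rather than installed, which keeps a customer's mistake from filtering traffic to prefixes they do not own.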

Okay. That makes sense. So does that mean that the incoming traffic that the Flowspec implementation is looking at, is that running in CPU like old school policy based routing, or is it running in hardware, off the ASIC?

Every router I'm aware of is doing this in hardware. Right? And I think that, you know, couple things. One, that's why, you know, back almost ten years ago when I did that talk at NANOG, there was somewhat limited vendor support for it.

It was basically just Juniper, Cisco, and I think Nokia at the time. Right? Because it was done in silicon. It was done in custom silicon.

Your Broadcom commodity ASICs didn't support it. Like I mentioned, most of those newer Broadcom chips are supporting this. So people like Arista and some other vendors who are using commodity ASICs are now supporting it. So it's got a lot broader support than it did years ago.

But, yeah, it is done in hardware, so it does scale much better. I would think you probably wouldn't wanna do this if it were done in software.

It just would be a nonstarter because it would have too much Yeah.

Yeah.

Impact on your devices.

I mean, in theory. And I say that because I know that is the idea. Right? That we don't wanna pump that sort of filtering, that processing, to the CPU to look at and to do. Mhmm.

But I have done that, reluctantly, in deployments in the past, and the impact on the CPU is generally minimal unless you're seeing a very high volume of traffic. However, we are talking about service providers and aggregation routers and things like that.

So, it's too.

Exactly, and volumetric attacks. So it is different. So you wanna spare any kind of processing by the CPU on unnecessary tasks as much as you can.

So, where are we today with the development around Flowspec? Maybe what does the adoption look like today in twenty twenty four? What can folks do to be active in the community?

Maybe even in the community looking at how we can mitigate DDoS attacks in general. So where can folks go? What can they look at? What can they read?

Yeah. Sure. So, like I mentioned earlier, the IETF has a more recent ratification of the draft. That's RFC 8955.

That falls under the Inter-Domain Routing working group, which maintains all things BGP, including BGP Flowspec. So folks can go and sign up for that mailing alias and read about some of the conversations that are taking place there as folks are starting to roll this out and test it and so forth. That's where they go to ask questions or bring up challenges that they've run into that might require a change to the protocol itself.

You know, a couple things that I know are recurring themes, just from having snooped that alias a little bit, is, like, how do we do all this with IPv6? Right?

That requires some slightly different communication in the protocol and so forth.

A lot of the vendors, I alluded to it earlier, but a lot of the vendors have added in the ability to exclude certain interfaces. So for example, I know Cisco allows you to go in and say, okay, when I configure it to allow Flowspec, I have an interface group that I wanna except. That's probably, like, my management interface, my out-of-band management interface. I don't wanna apply firewall, you know, BGP Flowspec rules to that.

If I have interfaces that, I don't know, go to my voice network, or go to certain networks that are internal only, where it wouldn't make sense to ever apply a filter, I can exclude those so that they'll never be included in a filter no matter what the matching criteria is on the filter. So there's a few things like that that are implementation details that people have learned over the years, and they're contributing those learnings back to the community by being a part of the IETF working group and the mailing alias there.
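The exclusion idea can be pictured as a set difference between all interfaces and a protected group. This is a vendor-neutral sketch, not any vendor's configuration syntax; the interface names and the `interfaces_to_filter` helper are hypothetical.

```python
def interfaces_to_filter(all_interfaces: list[str], excluded: set[str]) -> list[str]:
    """Return the interfaces where Flowspec-derived filters may be applied, preserving order."""
    return [name for name in all_interfaces if name not in excluded]


# Hypothetical interface inventory for one router.
all_interfaces = ["mgmt0", "xe-0/0/0", "xe-0/0/1", "voice-uplink"]

# Interfaces that should never receive a Flowspec filter, whatever the rule matches on.
excluded = {"mgmt0", "voice-uplink"}

print(interfaces_to_filter(all_interfaces, excluded))
```

The point of keeping the excluded group separate from any individual rule is that the protection holds for every rule received, including ones nobody anticipated.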

You know, I would say go to industry events. I know, Phil, you and I participate in the NUGs, the network user groups that the USNUA puts on. Those are a good place to meet peers if you're running enterprise networks. If you're running large service providers, there's your NANOGs, your RIPEs, your APRICOTs. There's a lot of different industry events those folks put on.

Those are great places to get out and talk to other people and ask them, hey. Are you doing Flowspec? What have you seen? You know, what have you run into?

Good, bad, and otherwise.

You know, that's where I've learned a lot over the years. It's just talking to people at those events and kinda picking their brain on what they've seen.

So, Justin, for the sake of time, I think we're gonna end it here. And it's interesting, though, the timing of this particular episode, because not long ago I recorded a podcast with Andrew Sullivan, the president of the Internet Society. And his organization, actually, in their analysis, found that there's an uptick right now in DDoS attacks. I don't know what scale he was referring to, whether that was North America or the Internet as a whole, a global scale.

But nevertheless, I'm sure how to mitigate the effects of a DDoS attack is top of mind for a lot of people right now. So really interesting timing. Thanks so much for joining today. Now if folks have a question for you, if they have a comment and they'd like to reach out to you, how can they find you online?

Yeah. I check LinkedIn more so than any other social media. So I'm just Justin Ryburn. That's spelled r y b u r n on LinkedIn. I am still on Twitter, X, whatever we're calling it these days, but I don't check that nearly as often.

Or people are welcome to email me. I'm just jryburn at kentik.com.

Great. Thanks very much, Justin. And I am still active on Twitter at network underscore Phil. On LinkedIn, you can just search my name, Philip Gervasi. My personal blog is networkphil.com, but certainly also check out both Justin's blog posts and my blog posts on the Kentik blog.

Now if you have an idea for an episode or if you'd like to be a guest on Telemetry Now, I'd love to hear from you. You can email us at telemetrynow at kentik.com. So for now, thanks very much for listening. Bye bye.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS? Well, you're in the right place! Telemetry Now is the podcast for you! Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.