
DIY: The Hidden Risks of Open Source Network Flow Analyzers

Ken Osowski, Independent Industry Consultant

Network Engineering

Summary

Advances in open source software packages for big data have made Do-It-Yourself (DIY) approaches to Network Flow Analyzers attractive. However, careful analysis of all the pros and cons needs to be completed before jumping in. In this post, we look at the hidden pitfalls and costs of the DIY approach.


Analyzing the DIY Approach to Network Traffic Analyzers


The “do it yourself” (or DIY) mentality isn’t new to our industry. From hardware to software, people have been grappling with the buy vs. build dilemma for years. However, as enterprises and service providers put their 2018 tech budgets into action, we’re here to point out one DIY networking trend where the fine print is worth reading: open source network flow analyzers.

Traditional network monitoring appliances and network flow analyzer software were built with one primary purpose: to diagnose network issues. The legacy vendors behind those products put less attention on developing proactive, actionable insight. To deliver the latter, analyzer software would need to process network data in real time from many different sources across the network, including sensors, routers, switches, and hosts, and complement that with BGP, SNMP, GeoIP, and threat data. Offering that at scale isn’t easy or cheap.

With the limitations and expense of legacy solutions, DIY is tempting. Some network teams have decided to develop their own custom analyzer software for network monitoring. And why wouldn’t they? With open source building blocks readily available, it’s more doable now than ever.

DIY Requirements

The biggest challenge in DIY tech is typically choosing a starting point from the myriad of options available on the open source market. For DIY NetFlow analyzer projects, that boils down to identifying an open source big data backend for NetFlow data analysis that meets the most critical big data requirements:

  • High-volume NetFlow collector ingest scalability
  • Easy-to-use, extensible UI frontend
  • NetFlow data retention scalability
  • Real-time query response
  • High availability
  • Open API access

Hadoop, ELK, and Google’s BigQuery are on the short list of options that meet some of those requirements for DIY projects. But a closer look at each reveals gaps:

  • Hadoop has one main shortcoming: it provides no tools for the data modeling needed to analyze NetFlow records. Some have implemented data cubes to fill this gap, but cubes fall short for very large volumes of data and cannot adapt the data model in real time.
  • The ELK stack is a set of open source analytics tools. Yet, when evaluated against the key NetFlow analysis requirements, it falls short in several areas. For one, it cannot store binary data, so binary data such as NetFlow records must be re-formatted as JSON, which causes massive storage bloat at scale as well as performance bottlenecks (a rough size comparison follows this list). It also lacks adequate multi-tenancy fair usage: the ELK stack can use tags to segment data access, but there is no way to enforce fairness.
  • BigQuery is a RESTful web service for interactive analysis of large datasets, working in conjunction with Google Storage. Evaluated against the key NetFlow analysis requirements, its data throughput limit of 100K records/second falls short for large networks that generate tens of millions of NetFlow records per second.
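
To make the ELK storage-bloat point concrete, here is a minimal Python sketch comparing the size of a single binary NetFlow v5 record with the same record re-expressed as a JSON document of the kind an ELK pipeline would index. The field values and JSON field names are illustrative assumptions, not any particular collector's schema.

import json
import struct

# A NetFlow v5 flow record is a fixed 48-byte binary structure on the wire.
# Pack a representative record following the v5 field layout:
binary_record = struct.pack(
    "!IIIHHIIIIHHxBBBHHBBH",
    0x0A000001, 0x0A000002, 0x0A000003,  # srcaddr, dstaddr, nexthop
    1, 2,                                # input/output interface indexes
    100, 64000,                          # packets, bytes
    1000, 2000,                          # first/last sysUptime (ms)
    443, 55321,                          # src/dst port
    0x10, 6, 0,                          # TCP flags, protocol, ToS
    65001, 65002,                        # src/dst AS
    24, 24,                              # src/dst prefix mask
    0,                                   # trailing pad
)

# The same record as a JSON document (illustrative field names, roughly
# what an ELK ingest pipeline would index per flow):
json_record = json.dumps({
    "src_addr": "10.0.0.1", "dst_addr": "10.0.0.2", "next_hop": "10.0.0.3",
    "in_if": 1, "out_if": 2, "packets": 100, "bytes": 64000,
    "first_switched": 1000, "last_switched": 2000,
    "src_port": 443, "dst_port": 55321, "tcp_flags": 16, "protocol": 6,
    "tos": 0, "src_as": 65001, "dst_as": 65002, "src_mask": 24, "dst_mask": 24,
})

print(len(binary_record), "bytes as binary NetFlow v5")               # 48
print(len(json_record), "bytes as JSON, before index/replica overhead")

At tens of millions of records per second, that per-record multiplier is what turns into the storage bloat and I/O bottlenecks described above.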

The Cost of DIY

The DIY tools approach promises to address network analyzer functions at a lower cost than commercial vendor offerings. However, staffing an in-house deployment requires up-front investment and ongoing, long-term resource allocations that can push total cost of ownership (TCO) higher. When estimating a DIY project in the feasibility stage, the costs most often underestimated come from:

  • Training all teams on the involved network protocols and their usage
  • Maintaining resilience and reliability at scale
  • Implementing geo-distributed flow data ingest
  • Creating and maintaining a flow-friendly data store

While building an in-house, custom, DIY network analyzer may seem like the right approach, careful consideration needs to be given to its pros and cons. It can carry hidden costs that are not obvious at the outset.

A Clear DIY Alternative

Based on the big data requirements and cost alone, if you’re planning to tackle a DIY project in the year ahead, you and your network team need to consider another model: one that delivers on the original drivers for DIY projects, lower costs and faster time-to-use. The best DIY alternative is a cloud-based SaaS model for implementing network analytics. SaaS-based network analytics has many benefits; the approach can:

  • Ensure “day 0” time-to-value by eliminating the need for hardware
  • Achieve real-time network visibility for real-time problem resolution
  • Provide instant compatibility and integration with various NetFlow-enabled devices
  • Eliminate capital and related operations investments such as space, power, and cooling
  • Reduce staffing required to maintain hardware and software
  • Improve speed to deployment for new software features and bug fixes

Kentik’s SaaS Approach

Kentik is the first to take on ultra-high-volume NetFlow monitoring with a cost-effective SaaS approach that operates at massive scale and delivers near real-time results. Kentik’s SaaS customers see immediate results at lower cost as soon as they start using the service, and Kentik’s operations team is always there to ensure the health and success of each managed environment. To meet the NetFlow big data backend requirements above, the Kentik Detect® platform leverages the following key elements, all of which are critical to a successful SaaS implementation:

  • A clustered ingest layer to receive and process flow data in real-time at massive scale.
  • A front end/API that uses an industry-standard language: all queries issued by the UI portal or a client API are based on PostgreSQL (a hypothetical query sketch follows this list).
  • Caching of query results by one-minute and one-hour time periods to support sub-second response.
  • Full support of compression for file storage to provide both storage and I/O read efficiency.
  • Rate-limiting of ad-hoc, un-cached queries to provide fairness across all queries.
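
As a concrete illustration of the PostgreSQL-based front end mentioned above, here is a minimal sketch of how a client script might run an aggregate query. The host, credentials, table, and column names are hypothetical placeholders, not Kentik Detect’s actual endpoints or schema.

import psycopg2  # standard PostgreSQL client library

# Hypothetical connection details; not Kentik's actual endpoint or schema.
conn = psycopg2.connect(
    host="sql.example.invalid", port=5432,
    dbname="flows", user="api_user", password="api_token",
)

# Top 10 destination ASNs by bytes over the last hour, the kind of
# aggregation a PostgreSQL-compatible front end makes straightforward.
query = """
    SELECT dst_as, SUM(bytes) AS total_bytes
    FROM flow_records
    WHERE flow_time > NOW() - INTERVAL '1 hour'
    GROUP BY dst_as
    ORDER BY total_bytes DESC
    LIMIT 10;
"""

with conn, conn.cursor() as cur:
    cur.execute(query)
    for dst_as, total_bytes in cur.fetchall():
        print(f"AS{dst_as}: {total_bytes} bytes")

Because the dialect is standard, the same query could be issued from psql, a BI tool, or any PostgreSQL driver.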

This environment delivers on planned (but often not met) DIY objectives by including a robust set of open REST and SQL APIs. This enables internal tool development teams to integrate with the Kentik SaaS environment to address their specific operational needs. These use cases include:

  • Customizing flow enrichment from sensors, routers, switches, hosts
  • Unifying ops and business data with BGP, SNMP, Geo, threat data
  • Creating a custom UI for network + business needs
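
As one hedged example of the REST side, the sketch below shows how an internal tool might pull top-talker results over HTTP. The endpoint path, header names, and payload fields are placeholders meant to illustrate the integration pattern, not the documented Kentik API.

import requests

API_BASE = "https://api.example.invalid/v1"   # placeholder base URL
HEADERS = {
    "X-Auth-Email": "ops@example.com",        # placeholder auth headers
    "X-Auth-Token": "REPLACE_ME",
    "Content-Type": "application/json",
}

# Ask for total traffic grouped by destination ASN over the last hour,
# the kind of query an internal dashboard might embed.
payload = {
    "metric": "bytes",
    "dimensions": ["dst_as"],
    "lookback_seconds": 3600,
    "limit": 10,
}

resp = requests.post(f"{API_BASE}/query/top-talkers",
                     headers=HEADERS, json=payload, timeout=30)
resp.raise_for_status()
for row in resp.json().get("results", []):
    print(row)

The same pattern extends to the other use cases above, such as feeding results into a custom UI built for combined network and business needs.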

Using any of the standard open source big data distributions can lead to partial success for a DIY project, but only with a large investment of time and money. Either way, network analytics is no longer optional for network operators, since the insight gained from network intelligence translates directly into operational and business value. To move past DIY and learn how to harness the power of Kentik Detect as a truly effective real-time NetFlow analyzer, request a demo or sign up for a free trial today.
