Evaluating Cloud Gateways for Cost and Performance


Summary
Cloud networking costs can escalate due to inefficient routing and limited visibility. Kentik’s cloud visibility and analytics solution helps engineers optimize transit, reduce costs, and improve performance by analyzing AWS Transit Gateways and exploring alternatives like direct peering, storage endpoints, and AWS Cloud WAN.
As many enterprises are painfully aware, cloud networking costs can quickly spiral out of control. It can be a struggle to see how data moves across cloud environments, leading to unexpected expenses and inefficient resource usage. Understanding where traffic is flowing, how much data is moving, and whether there are more cost-effective routing options is crucial for maintaining both network performance and financial efficiency.
Kentik provides a powerful suite of data analysis, exploration, and reporting tools designed to give engineers deeper insights into their cloud networking activity. By using Kentik’s capabilities, engineers can identify inefficiencies and control costs more effectively by optimizing cloud transit.

Managing cloud transit efficiently
Cloud service providers offer Transit Gateways to simplify connectivity between VPCs and on-premises networks. While these gateways consolidate connectivity, they also introduce significant costs, especially when processing large amounts of data. Understanding how and where these gateways are used is therefore the first step toward optimizing our cloud architecture.
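To make the cost point concrete, here is a rough back-of-the-envelope sketch in Python. Transit Gateway pricing generally combines an hourly charge per attachment with a per-GB data processing charge. Both rates below are placeholder assumptions rather than quoted prices, and the function name is our own; substitute the figures for your region.

```python
# Rough Transit Gateway cost estimate. Both rates are illustrative
# assumptions, not quoted prices; check the provider's pricing page.
ASSUMED_RATE_PER_GB = 0.02               # USD per GB processed (assumption)
ASSUMED_RATE_PER_ATTACHMENT_HOUR = 0.05  # USD per attachment-hour (assumption)
HOURS_PER_MONTH = 730


def monthly_tgw_cost(gb_processed: float, attachments: int) -> float:
    """Return an estimated monthly cost in USD for one Transit Gateway."""
    processing = gb_processed * ASSUMED_RATE_PER_GB
    attachment = attachments * HOURS_PER_MONTH * ASSUMED_RATE_PER_ATTACHMENT_HOUR
    return round(processing + attachment, 2)


if __name__ == "__main__":
    # 50 TB processed in a month across 4 attachments.
    print(monthly_tgw_cost(gb_processed=50_000, attachments=4))
```

Under these assumed rates, the data processing term dominates once volume reaches tens of terabytes, which is why traffic volume per gateway is the number worth watching.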
Kentik ingests and processes a variety of network telemetry, including cloud flow logs and metrics, which we can then use to analyze heavily used transit gateways, identify high-traffic patterns, and explore alternative connectivity options.
Analyzing Transit Gateway traffic in Kentik
Kentik’s Data Explorer lets us explore AWS and Azure traffic patterns in detail. To evaluate heavily used Transit Gateways, you can run queries that group and sort traffic by key dimensions.
For example, we can build a query to identify unnecessarily high-cost data transfers. In the image below, notice that we can filter our data to help pinpoint traffic traversing our Transit Gateways.

Here, we can see something concerning. There is significant intra-VPC traffic, in other words internal traffic, using a Transit Gateway. Traffic that never leaves its VPC has no need to cross a Transit Gateway, so this path adds cost and can hurt performance without any benefit.
Notice that the Transit Gateway is listed under Source Interface Type and identified by name under Source ENI Entity Name. Exploring our cloud data in this way can help us make informed decisions to optimize cloud network activity.
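The same check can be sketched as a small filter over exported flow records. This is only an illustration of the logic: the record keys (`src_vpc_id`, `dst_vpc_id`, `src_interface_type`, `bytes`) and the sample values are hypothetical and do not reflect Kentik's actual dimension names or export format.

```python
from collections import defaultdict

# Hypothetical flow records; the keys are illustrative and do not
# reflect Kentik's actual dimension names or export format.
FLOWS = [
    {"src_vpc_id": "vpc-a", "dst_vpc_id": "vpc-a",
     "src_interface_type": "transit_gateway", "bytes": 4_000_000},
    {"src_vpc_id": "vpc-a", "dst_vpc_id": "vpc-b",
     "src_interface_type": "transit_gateway", "bytes": 1_500_000},
    {"src_vpc_id": "vpc-b", "dst_vpc_id": "vpc-b",
     "src_interface_type": "network_interface", "bytes": 900_000},
    {"src_vpc_id": "vpc-a", "dst_vpc_id": "vpc-a",
     "src_interface_type": "transit_gateway", "bytes": 1_000_000},
]


def intra_vpc_bytes_via_tgw(flows):
    """Sum bytes per VPC for flows that stay inside one VPC yet cross a TGW."""
    totals = defaultdict(int)
    for flow in flows:
        same_vpc = flow["src_vpc_id"] == flow["dst_vpc_id"]
        via_tgw = flow["src_interface_type"] == "transit_gateway"
        if same_vpc and via_tgw:
            totals[flow["src_vpc_id"]] += flow["bytes"]
    return dict(totals)


if __name__ == "__main__":
    print(intra_vpc_bytes_via_tgw(FLOWS))
```

Any VPC that appears in the result is a candidate for a routing review, since its internal traffic is paying for a gateway hop it does not need.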
Adjusting our filters allows us to gain more insight. The following image provides a visual breakdown of our applied dimensions and filters, helping us understand which traffic is ingressing or egressing, where it’s headed, and which applications are generating traffic.

In this way, we can use these filters and dimensions to pinpoint which Transit Gateways are handling the most traffic, flagging potential areas where cost optimization strategies could be applied.
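Ranking gateways by volume is a simple aggregation. The sketch below assumes we already have (gateway name, byte count) samples from some export; the gateway names and numbers are made up for illustration.

```python
from collections import Counter

# Hypothetical (gateway name, bytes) samples; the names are made up.
SAMPLES = [
    ("tgw-prod-east", 7_200),
    ("tgw-shared", 1_100),
    ("tgw-prod-east", 2_800),
    ("tgw-dev", 400),
    ("tgw-shared", 900),
]


def top_gateways(samples, limit=3):
    """Return the busiest gateways as (name, total_bytes), largest first."""
    totals = Counter()
    for name, byte_count in samples:
        totals[name] += byte_count
    return totals.most_common(limit)


if __name__ == "__main__":
    for name, total in top_gateways(SAMPLES):
        print(f"{name}: {total} bytes")
```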
Transit Gateway flow logs
In some cases, VPC flow logs may not be the right choice for assessing traffic across a Transit Gateway. Transit Gateway flow logs contain additional information specific to Transit Gateways, but they may lack information specific to VPCs. They therefore complement VPC flow logs, rather than replace them, when monitoring overall network health.
Transit Gateway flow logs also report packet loss for each flow, which helps troubleshoot difficult problems such as packets dropped for exceeding the MTU, expired TTLs, missing routes, and blackhole routes. These issues can be challenging to track down, and Transit Gateway flow logs can help quickly isolate the problem and the associated workloads.
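As a minimal sketch of how those loss counters might be summarized, the Python below totals them per gateway. The field names follow the AWS Transit Gateway flow log record format as we understand it, so verify them against the current documentation. The sample records are hypothetical, and treating non-numeric placeholders as zero is our own assumption.

```python
# Packet-loss counters as named in AWS Transit Gateway flow log records
# (verify against the current documentation before relying on them).
LOSS_FIELDS = (
    "packets-lost-no-route",
    "packets-lost-blackhole",
    "packets-lost-mtu-exceeded",
    "packets-lost-ttl-expired",
)

# Hypothetical, already-parsed records; real logs carry many more fields.
RECORDS = [
    {"tgw-id": "tgw-0example1", "packets-lost-no-route": "0",
     "packets-lost-blackhole": "12", "packets-lost-mtu-exceeded": "0",
     "packets-lost-ttl-expired": "0"},
    {"tgw-id": "tgw-0example1", "packets-lost-no-route": "3",
     "packets-lost-blackhole": "0", "packets-lost-mtu-exceeded": "7",
     "packets-lost-ttl-expired": "0"},
    {"tgw-id": "tgw-0example2", "packets-lost-no-route": "-",
     "packets-lost-blackhole": "0", "packets-lost-mtu-exceeded": "0",
     "packets-lost-ttl-expired": "0"},
]


def loss_by_gateway(records):
    """Total each loss counter per gateway, keeping only nonzero counters."""
    totals = {}
    for record in records:
        per_gw = totals.setdefault(record["tgw-id"], dict.fromkeys(LOSS_FIELDS, 0))
        for field in LOSS_FIELDS:
            value = record.get(field, "0")
            if value.isdigit():  # assumption: non-numeric placeholders count as 0
                per_gw[field] += int(value)
    return {
        gw: {field: count for field, count in counters.items() if count}
        for gw, counters in totals.items()
    }


if __name__ == "__main__":
    for gateway, losses in loss_by_gateway(RECORDS).items():
        print(gateway, losses or "no loss recorded")
```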
Alternatives to Transit Gateways
Once we identify high-traffic transit gateways, we can take steps to optimize cloud connectivity and reduce costs.
First, we can consider direct peering. If large volumes of data are moving between VPCs frequently, peering may be a more cost-effective option than routing traffic through a Transit Gateway.
Next, we can use storage endpoints. For workloads that frequently access cloud storage services, using dedicated storage endpoints instead of routing through a Transit Gateway (TGW) avoids the gateway's data processing charges for that traffic.
Another alternative is AWS Cloud WAN. While AWS Cloud WAN is a Transit Gateway under the hood, the significant difference is that it is owned and managed by AWS. Unfortunately, that means Transit Gateway flow logs are unavailable to Cloud WAN customers. However, Kentik uses the AWS Network Manager API to collect Cloud WAN metadata and enrich standard VPC flow logs to provide the same actionable insights you get when using Transit Gateways.
Beyond these alternatives, we should monitor data processing trends. Kentik’s historical traffic analysis allows us to track usage trends over time and adjust cloud architecture accordingly.
Lastly, we need to be alerted when utilization is high. In Kentik, we can set up automated alerts for when TGW usage spikes unexpectedly or, as in the cases above, when traffic takes paths it shouldn’t. This lets us investigate and mitigate problems proactively.
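To illustrate what "spikes unexpectedly" can mean in practice, here is a minimal baseline check in Python. It is not how Kentik's alerting works internally. It is a generic sketch that flags a sample sitting more than a chosen number of standard deviations above recent history; the function name, the threshold, and the sample values are our own assumptions.

```python
from statistics import mean, pstdev


def is_spike(history, latest, sigmas=3.0):
    """Flag `latest` if it exceeds the history mean by `sigmas` std devs."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = pstdev(history)
    return latest > baseline + sigmas * spread


if __name__ == "__main__":
    # Hypothetical hourly TGW throughput samples, in GB.
    recent = [100, 110, 95, 105, 90]
    print(is_spike(recent, 180))
    print(is_spike(recent, 115))
```

A fixed threshold is simpler, but a baseline-relative check like this adapts as normal traffic grows, which matters when usage trends upward over time.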
With rising cloud networking costs, engineers need powerful tools to understand and optimize traffic patterns. Kentik’s advanced data analysis and reporting capabilities give us the visibility we need to manage AWS Transit Gateways more effectively. By analyzing traffic with Data Explorer, we can identify high-traffic gateways, explore alternative routing strategies, and ultimately reduce unnecessary cloud spending.