Optimizing AWS NAT Gateway Usage


Summary
AWS NAT Gateways are essential for giving resources in private subnets outbound access, but they can quickly become a costly burden, even when idle. With Kentik, cloud and network engineers gain deep visibility into NAT Gateway traffic, allowing them to identify underutilized gateways, analyze high-cost usage, and explore cost-saving alternatives like VPC Endpoints, Internet Gateways, or direct peering. By optimizing traffic routing and eliminating unnecessary NAT Gateway expenses, teams can significantly reduce cloud networking costs while improving efficiency.
NAT Gateways are an essential component of cloud networking, giving resources in private subnets outbound connectivity in AWS. However, they carry a cost that can quickly get out of control, even when there’s no active traffic, because each gateway is billed for every hour it is provisioned in addition to the data it processes. As infrastructure scales, these expenses can become a significant problem. Identifying both heavily used and underutilized NAT Gateways reveals opportunities for optimization, which can lead to substantial cost savings.
With Kentik, cloud and network engineers can gain deep visibility into NAT Gateway traffic activity. This allows them to assess whether specific NAT Gateways should be replaced with more cost-effective alternatives, such as VPC Endpoints for private connectivity or Internet Gateways for services with a public IP.

Identifying expensive NAT Gateway usage
We can build a query in Data Explorer to evaluate NAT Gateway usage. Because our goal is visibility into NAT Gateway traffic specifically, we filter on the relevant Dimensions and sort by average Kbits/sec to surface high-traffic flows. The results can be used to assess cloud costs or to evaluate whether certain traffic should be routed through an Internet Gateway instead.
In the image below, we’re using several Dimensions, including Traffic Path, Source AWS Interface Type, and Source ENI Entity Name. Data Explorer can be much more granular, though. In this case, we narrow the results by adding a filter on AWS Interface Type that includes only NAT Gateways.

The Traffic Path shows how data moves through the network, while the Source AWS Interface Type and the associated filter isolate traffic originating from NAT Gateways. Lastly, we identify the specific NAT Gateway instance responsible for that traffic.
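The filter-and-sort logic of this query can be illustrated outside the portal. The following is a minimal Python sketch, not Kentik's API: the `FlowRow` record, its field names, the `"nat_gateway"` interface-type value, and the sample rows are all hypothetical stand-ins for the Dimensions named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRow:
    """One result row; field names are hypothetical stand-ins for the Dimensions."""
    traffic_path: str
    src_interface_type: str
    src_eni_name: str
    avg_kbps: float


def top_nat_gateway_rows(rows, limit=10):
    """Keep rows sourced from NAT Gateways, highest average Kbits/sec first."""
    nat_rows = [r for r in rows if r.src_interface_type == "nat_gateway"]
    return sorted(nat_rows, key=lambda r: r.avg_kbps, reverse=True)[:limit]


# Made-up sample data, for illustration only.
rows = [
    FlowRow("Subnet > NAT Gateway > Internet", "nat_gateway", "nat-prod-a", 8200.0),
    FlowRow("Subnet > Internet Gateway", "interface", "web-eni-1", 15000.0),
    FlowRow("Subnet > NAT Gateway > Internet", "nat_gateway", "nat-dev-b", 12.5),
    FlowRow("Subnet > NAT Gateway > Internet", "nat_gateway", "nat-prod-c", 43100.0),
]

for row in top_nat_gateway_rows(rows):
    print(f"{row.src_eni_name}: {row.avg_kbps:,.1f} Kbits/sec via {row.traffic_path}")
```

The non-NAT row is dropped even though it carries the second-highest rate, which is the point of filtering on interface type before sorting.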
Going deeper
We can expand on this initial query by adding more Dimensions and adjusting our filters. For example, if we want to determine where that traffic is going, we can add Destination AWS Gateway Type, which shows whether the destination is another NAT Gateway, an IGW, or an external endpoint. Once we build a query precisely as needed, we can save it to use again and share it with our team.
In the following image, we’ve added the AWS Gateway Type and the AWS Flow Direction, which shows whether a flow is ingress or egress with respect to a particular ENI. We’ve also added the source VPC Name to associate a specific VPC with each NAT Gateway, which lets us correlate gateway costs with specific workloads. Lastly, we included the application to make it easier to identify the traffic and determine whether it’s unnecessary.

Analyze usage trends and cost impact
Our Data Explorer query results can help us understand a few essential aspects of our NAT Gateway environment.
First, we can identify underutilized NAT Gateways. If a NAT Gateway has little or no traffic, it is still accruing hourly charges and is a candidate for decommissioning.
Next, we can pinpoint high-traffic NAT Gateways. If a NAT Gateway handles a large volume of data, it may be a major contributor to cloud networking costs, and we can reevaluate how to route traffic.
Third, we can assess traffic patterns. This way, we can determine if specific applications or workloads could use direct peering, VPC Endpoints, or Internet Gateways to bypass expensive NAT charges.
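To put rough numbers on the first two points, the sketch below estimates a gateway's monthly charge from its traffic volume and labels it as underutilized or high-traffic. The rates are assumptions (an hourly charge and a per-gigabyte processing charge of 0.045 USD each, roughly the published us-east-1 list prices; check current pricing for your region), and the thresholds and gateway names are arbitrary illustrations.

```python
# Assumed rates, not authoritative: verify against current AWS pricing.
NAT_HOURLY_USD = 0.045   # charge per gateway per provisioned hour
NAT_PER_GB_USD = 0.045   # data processing charge per gigabyte
HOURS_PER_MONTH = 730


def monthly_cost_usd(gb_processed: float) -> float:
    """Fixed hourly charge plus the per-gigabyte data processing charge."""
    return NAT_HOURLY_USD * HOURS_PER_MONTH + NAT_PER_GB_USD * gb_processed


def classify(gb_processed: float,
             idle_below_gb: float = 1.0,
             heavy_above_gb: float = 10_000.0) -> str:
    """Label a gateway by monthly volume; both thresholds are arbitrary examples."""
    if gb_processed < idle_below_gb:
        return "underutilized"
    if gb_processed > heavy_above_gb:
        return "high-traffic"
    return "normal"


# Hypothetical monthly volumes per gateway, in gigabytes.
monthly_gb = {"nat-dev-b": 0.2, "nat-prod-a": 2_600.0, "nat-prod-c": 14_000.0}

for name, gb in monthly_gb.items():
    print(f"{name}: {classify(gb)}, about ${monthly_cost_usd(gb):,.2f}/month")
```

Under these assumed rates, an idle gateway still costs about 33 USD per month from the hourly charge alone, which is why low-traffic gateways are worth finding.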
The following image shows an abrupt drop and then a spike in traffic just after noon on February 4, behavior that is worth investigating. We can use the “Compare over previous period” function to see precisely which traffic, through which NAT Gateway, is involved in this drop and spike.

We can also zoom into this time period to gain a more precise understanding of the nature of this traffic and drill down into the flows themselves.
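The idea behind a period-over-period comparison can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Kentik implements the feature; the `Change` record, the gateway names, and the traffic figures are hypothetical.

```python
from typing import NamedTuple, Optional


class Change(NamedTuple):
    gateway: str
    before_kbps: float
    after_kbps: float
    delta_kbps: float
    pct: Optional[float]  # None when there was no traffic in the previous period


def compare_periods(previous: dict, current: dict) -> list:
    """Per-gateway change in average Kbits/sec, largest absolute change first."""
    changes = []
    for gateway in sorted(set(previous) | set(current)):
        before = previous.get(gateway, 0.0)
        after = current.get(gateway, 0.0)
        delta = after - before
        pct = delta / before * 100.0 if before else None
        changes.append(Change(gateway, before, after, delta, pct))
    return sorted(changes, key=lambda c: abs(c.delta_kbps), reverse=True)


# Hypothetical averages for two adjacent time windows.
previous = {"nat-a": 100.0, "nat-b": 400.0}
current = {"nat-a": 900.0, "nat-b": 100.0, "nat-c": 50.0}

for change in compare_periods(previous, current):
    pct = "new" if change.pct is None else f"{change.pct:+.0f}%"
    print(f"{change.gateway}: {change.delta_kbps:+,.1f} Kbits/sec ({pct})")
```

Sorting by absolute change puts both the spike and the drop at the top of the list, so the gateways behind an anomaly are the first ones inspected.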

Alternatives to NAT Gateways
After we identify unnecessary NAT Gateway costs, we can consider several alternatives.
First, we can use VPC Endpoints. For workloads that only need private connectivity to AWS services, a VPC Endpoint is often a better choice. Gateway endpoints, available for Amazon S3 and DynamoDB, carry no hourly or data processing charge. Interface endpoints are billed per hour and per gigabyte, but typically at a lower per-gigabyte rate than NAT Gateway data processing. Therefore, moving this traffic to a VPC Endpoint could lead to immediate cost reductions.
Second, routing traffic through an Internet Gateway (IGW) instead of a NAT Gateway can eliminate unnecessary charges for public-facing services that already have public IPs and don’t need address translation.
Third, third-party firewall solutions or even proxies can be deployed instead of NAT Gateways, though implementing a third-party firewall could result in losing some of the cloud-native observability that NAT Gateways provide.
Lastly, if large volumes of data are being transferred between AWS VPCs or to on-premises resources, direct peering or AWS PrivateLink can bypass NAT Gateway costs entirely while improving performance and security.
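As a rough illustration, the decision logic described above can be written as a small rule set. This is a sketch under simplifying assumptions: the `NatFlow` fields and the destination categories are hypothetical labels that would have to be derived from the query results, the firewall and proxy option is left out, and real routing decisions involve more factors than these rules capture.

```python
from dataclasses import dataclass
from enum import Enum


class Alternative(Enum):
    VPC_ENDPOINT = "VPC Endpoint"
    INTERNET_GATEWAY = "Internet Gateway"
    PEERING_OR_PRIVATELINK = "direct peering or AWS PrivateLink"
    KEEP_NAT = "keep the NAT Gateway"


@dataclass(frozen=True)
class NatFlow:
    """Hypothetical summary of one flow currently traversing a NAT Gateway."""
    destination_kind: str        # "aws_service", "other_vpc", "on_prem", or "internet"
    source_has_public_ip: bool


def suggest_alternative(flow: NatFlow) -> Alternative:
    """Map a flow to the alternative the article proposes for that kind of traffic."""
    if flow.destination_kind == "aws_service":
        return Alternative.VPC_ENDPOINT
    if flow.destination_kind in ("other_vpc", "on_prem"):
        return Alternative.PEERING_OR_PRIVATELINK
    if flow.source_has_public_ip:
        return Alternative.INTERNET_GATEWAY
    return Alternative.KEEP_NAT


flows = [
    NatFlow("aws_service", False),
    NatFlow("internet", True),
    NatFlow("other_vpc", False),
    NatFlow("internet", False),
]

for flow in flows:
    print(f"{flow.destination_kind}: {suggest_alternative(flow).value}")
```

Private workloads that genuinely need outbound internet access fall through to the last rule, since a NAT Gateway remains the appropriate path for them.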
NAT Gateways are notorious for cost overruns and can incur charges even when idle. With Kentik, network and cloud engineers can identify cost-heavy NAT Gateway usage, optimize traffic routing, and explore more efficient alternatives. Eliminating unnecessary NAT Gateway costs through VPC Endpoints, IGWs, or direct peering can improve cloud network efficiency and significantly reduce expenses.