Case Study

Game on: Square Enix gains critical network insights from Kentik


To satisfy the demands of millions of gamers, Square Enix must maintain the highest levels of performance and availability in its global network infrastructure. The company uses the Kentik Observability Cloud to gain valuable insights into network operations, maximize availability, enhance security, and optimize operational efficiency.

Situation: A Demanding User Community

For millions of fans worldwide, video games from Square Enix are a passionate pursuit. But while it’s all fantasy for users, it’s serious reality for network managers at Square Enix.

At any given time, several million gamers can be accessing its global network, and every one of those players expects lightning-fast response times. Falling short can hurt revenues, since online gaming is now largely a bill-for-use business in which users can quickly switch loyalty. Consequently, Square Enix must maintain a high-performance, super-reliable global network to meet its users’ demands for always-on, low-latency gaming experiences.

Helping satisfy that requirement is the responsibility of the company’s core-network infrastructure team, headed by Tatsuya Mori, Senior Manager of IT Infrastructure/Network. His team manages the connections between the core IT infrastructure and the internet.

Challenge: Maintaining Visibility into Network Operations

With so much relying on network performance, Mori and his team must keep constant watch on all aspects of network operations. “Latency is the enemy of gamers – especially with massive role-playing games, where lots of users are connected at once – and they will quickly let us know when their expectations of response times are not being met,” Mori says. “Our top priority in cases of user complaints is to quickly identify the source of any network performance issue and resolve it.”

That task sometimes is easier said than done, because the fault could lie in any number of places – a cloud provider, an ISP, a long-haul carrier, or a backbone router in Square Enix’s own infrastructure. “The faster we can isolate the problem, the quicker we can resolve it,” Mori notes. “So, observability into all parts of the network is very important.”

For many years, the tools available to Mori’s team were limited, consisting mostly of flow monitors on routers. But these offered only a surface-layer glimpse into network operations and provided little in the way of proactive warnings of potential problems. In addition, Mori says, there was no visualization capability in these legacy tools, making it difficult to interpret results.

Solution: The Kentik Network Observability Platform

In search of better tools, Mori visited the San Francisco office of Kentik, where he saw a demonstration of the company’s extensive capabilities for observing all aspects of a complex network infrastructure. “I was so impressed by what I saw that I strongly urged the reseller we work with to sell us Kentik. And they did.”

Mori’s team deploys Kentik on the backbone routers it uses to connect to the internet. Other teams at Square Enix – including those monitoring cloud services and content delivery networks – also use Kentik.

Kentik produces a level of observability into Square Enix’s global infrastructure unmatched by any other solution. “When we first installed Kentik, we were surprised how easily you can visualize the network and how deeply you can analyze network operations,” Mori says. “When you open a graph or dashboard in Data Explorer, it just takes a click to drill down and see more detailed information. And with Kentik Synthetics, we have a clear visualization not only of our routers, but of internet connections around the world.”

Mori cites several aspects of Data Explorer that his team finds valuable.

“The number of dimensions we can select is unmatched by any other monitoring tool. We can categorize by source, destination, and other factors, and reposition them any way we want onto a graph. I can’t imagine how much more flexible it can get than this.”

The ability to illustrate the AS path of traffic is a feature “we had never seen before. We use it daily to check before and after we perform traffic engineering within our backbone network.”

Another valuable feature is the filtering option. “We can create multiple ad-hoc filters and group multiple filters into one, giving us a lot of flexibility. We use this to identify traffic to or from a specific IP address or AS number.”
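
The idea of grouping several ad-hoc filters into one can be illustrated with a minimal Python sketch. It does not use Kentik's API or data model: the `Flow` record, the helper names (`ip_filter`, `as_filter`, `any_of`, `select`), and the sample addresses and AS numbers (taken from the ranges reserved for documentation) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; real flow data carries many more fields."""
    src_ip: str
    dst_ip: str
    src_as: int
    dst_as: int
    bytes: int


FlowFilter = Callable[[Flow], bool]


def ip_filter(cidr: str) -> FlowFilter:
    """Match flows whose source or destination falls inside the given address or prefix."""
    net = ip_network(cidr)
    return lambda f: ip_address(f.src_ip) in net or ip_address(f.dst_ip) in net


def as_filter(asn: int) -> FlowFilter:
    """Match flows to or from the given AS number."""
    return lambda f: asn in (f.src_as, f.dst_as)


def any_of(*filters: FlowFilter) -> FlowFilter:
    """Group several filters into one that matches if any member matches."""
    return lambda f: any(flt(f) for flt in filters)


def select(flows: Iterable[Flow], flt: FlowFilter) -> List[Flow]:
    """Return the flows that pass the filter."""
    return [f for f in flows if flt(f)]


# Illustrative data only (documentation IP ranges and AS numbers).
SAMPLE_FLOWS = [
    Flow("192.0.2.10", "203.0.113.7", 64496, 64497, 1200),
    Flow("198.51.100.5", "192.0.2.20", 64500, 64496, 800),
    Flow("192.0.2.30", "192.0.2.40", 64496, 64496, 300),
]

# One grouped filter: traffic to/from a specific IP address OR a specific AS.
grouped = any_of(ip_filter("203.0.113.7"), as_filter(64500))
matched = select(SAMPLE_FLOWS, grouped)
total_bytes = sum(f.bytes for f in matched)
```

Building each filter as a small function keeps the grouped filter a single reusable object that can be applied to any set of flow records.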

The ready availability of such highly granular information means that if there is a pattern of user complaints about performance, detailed information about the network connections involved can be quickly accessed and analyzed, Mori notes. This sharply reduces mean time to resolution by turning troubleshooting from a scattershot activity into a thoroughly informed investigation of root causes.

Kentik also is used to assess various options for proactively avoiding bottlenecks and improving network performance. “If we have a carrier link, for instance, that is showing performance lags or is reaching capacity, Kentik’s capacity-planning tool gives us the information we need to determine if we should invest in greater capacity on that line, or if we should move some traffic to different links. And if we do decide to move traffic, Kentik also will tell us what the impacts of that change will be.”

This capability can be particularly valuable in instances such as the release of a new game, when large increases in traffic can be expected. “We always want to be ahead of events that can cause big spikes,” Mori says, “and Kentik gives us all the information we need to be proactive and prepared.”
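
The trade-off Mori describes, adding capacity on a link versus shifting traffic to other links, comes down to simple utilization arithmetic. The sketch below is not Kentik's capacity-planning tool: the `Link` class, the `move_traffic` helper, the 80% warning threshold, and the link figures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Link:
    """Hypothetical carrier link with its capacity and observed peak load."""
    name: str
    capacity_gbps: float
    peak_gbps: float

    @property
    def utilization(self) -> float:
        """Peak load as a fraction of capacity."""
        return self.peak_gbps / self.capacity_gbps


def near_capacity(link: Link, threshold: float = 0.8) -> bool:
    """Flag a link whose peak utilization meets or exceeds the (assumed) threshold."""
    return link.utilization >= threshold


def move_traffic(src: Link, dst: Link, gbps: float) -> Tuple[Link, Link]:
    """Return what both links would look like after shifting `gbps` from src to dst."""
    if gbps < 0 or gbps > src.peak_gbps:
        raise ValueError("cannot move more traffic than the source link carries")
    return (
        Link(src.name, src.capacity_gbps, src.peak_gbps - gbps),
        Link(dst.name, dst.capacity_gbps, dst.peak_gbps + gbps),
    )


# Illustrative figures only.
carrier_a = Link("carrier-a", capacity_gbps=100.0, peak_gbps=85.0)
carrier_b = Link("carrier-b", capacity_gbps=100.0, peak_gbps=40.0)

# Project the impact of moving 20 Gbps off the busy link before making the change.
after_a, after_b = move_traffic(carrier_a, carrier_b, 20.0)
```

Comparing `near_capacity` before and after the projected move shows whether shifting traffic relieves the busy link without pushing the other one over the threshold; if it does not, buying capacity is the remaining option.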

New game releases also are a time when security threats can increase, he adds: “When we have a new release on our online service, we often see changes in our traffic pattern. Kentik helps us pinpoint the sources of that traffic so we can alert our security team about the suspicious traffic.”

End users aren’t the only community impacting network traffic. Software developers and graphics studios partnering with Square Enix generate large volumes of traffic with patterns that aren’t always predictable. “When we roll out new tools for developers, we sometimes see traffic patterns or protocols that aren’t familiar to us,” Mori observes. “In these cases, we use Kentik to quickly identify the sources of this traffic to ensure that they are from legitimate parties.”

Results: Unprecedented Observability

Mori cites three categories of benefits from the Kentik Observability Cloud:

Deeper insights

With the highly granular information presented by Kentik, network operators at Square Enix have a much more detailed understanding of exactly what kind of traffic is flowing where. If there’s a sudden surge in traffic, the network team can immediately trace the cause and take action, if needed, to avoid a problem.

In addition, Kentik provides the detailed data and analytic tools to understand long-term traffic patterns, which helps in capacity planning and negotiations with carriers and cloud providers.

And while some network-monitoring tools look at traffic on only a portion of the network, such as the customer edge, Kentik gives the Square Enix team the ability to easily analyze traffic between POPs, data centers, cloud providers and its core infrastructure – with information from all those sources clearly presented on dashboards.

Enhanced security

Detecting sources of traffic is an essential element in network security, which is always a top priority. “As a gaming company, we see a lot of DDoS attacks,” Mori says, “and when we see them, we use Kentik to analyze where they’re coming from and alert our security team so they can deploy countermeasures.”

Previously, he adds, “all we had was an on-prem DDoS monitor that would send an alert about unusual changes in traffic volume, but we didn’t have any information about what type of traffic it was. We could only react to alerts and start hunting across a wide range of potential sources of an attack. With Kentik, we can pinpoint the source of suspicious traffic virtually in real time.”

Kentik also can help identify a false alarm, he adds. “In the past, we might have seen unexpected spikes in traffic from developers that set off the on-prem DDoS alarm. Now with Kentik, we can see exactly what that traffic is and where it’s coming from, to determine if it’s normal traffic or something threatening.”

Operational efficiency

With Kentik, the network team at Square Enix spends far less time troubleshooting and resolving issues that could potentially degrade network performance or threaten security. In addition, capacity planning has been enhanced, putting the team ahead of the curve of events that could impact network operations.

Operational efficiency also is enhanced by the ease of use of the Kentik cloud. “The visualization tools of Kentik make everything so clear and easy to understand,” Mori says. “We have had some turnover on our team recently, but it takes very little time for someone new to learn Kentik and start benefiting from all that it has to offer. They find the Data Explorer interface very intuitive, and they can make queries that exactly match their individual needs.”

Mori says the positive impact of Kentik on Square Enix operations is hard to overstate. “Without Kentik, we would always be in reactive mode, our operational costs would be higher, and the quality of our network service would degrade, which could lead users to stop playing our games.”

“We’ve been using Kentik for so long that it is essential to our operations.”



