TECH PERSPECTIVE / WHITE PAPER
8 OCTOBER 2018
In the first of two articles on optimizing Cloud connectivity, we investigate some of the reasons why the Internet may not be the connectivity of choice for all enterprise users or all enterprise Cloud applications.
The 21st Century Network
These days we take access to our social media and streamed TV for granted, whether it’s via our smartphone, home broadband or in the office. At the heart of today’s connected culture is the Internet: the most extensive and resilient network ever built, offering ever faster “speeds” or bandwidth.
When you want to check Facebook or stream the newest episode of Game of Thrones on a Friday night, the Internet is a wondrous thing. After all, even if the Internet “slows down”, the only impact is that your photos may take longer to load or your TV program may freeze momentarily. The consequences of “the Internet going slow” in social situations are, at worst, an inconvenience.
But look at this from the perspective of the corporate world and the reality can be very different.
Having fully embraced the “Cloud” for the agility and speed-to-market it offers, enterprises are now finding that the network that connects them to their business-critical Cloud applications is mostly outside of their control.
Yes, it is theoretically possible to minimise the number of “hops” it takes for traffic to move from end-user to a Cloud-hosted application if you select the right Internet Service Providers (ISPs). But what happens when that connection doesn’t deliver the performance that you expect, or even need, for the application to work properly?
More emphasis is now placed on the network than ever before, thanks to the widespread adoption of SaaS applications like Microsoft Office 365, Google Apps and Salesforce. In parallel, a large-scale migration of key enterprise applications from on-premises data centers into IaaS and PaaS Clouds like AWS, Microsoft Azure, Alibaba Cloud, IBM Cloud, Oracle and SAP is ongoing.
Only now “the network” includes the Internet, which is often synonymous with unpredictable performance, hackers and even government regulation.
Internet Latency: The Silent Application Killer
Take “latency” for instance. By slowing down application response time, latency impacts a user’s experience more than almost any other factor (including bandwidth availability or packet loss). And latency issues are not always obvious since latency can slow application performance even when there seems to be ample bandwidth available.
Latency is the time it takes for a data packet to travel from source to destination. It therefore has a direct impact on the traffic throughput that can be achieved for the vast majority of applications, most of which nowadays are built on the TCP protocol.
TCP is based on send / receive acknowledgements: the “sender” can have only a limited amount of unacknowledged data (its “window”) in flight, and must then wait for the “receiver” to reply with an acknowledgement of receipt. Once receipt has been confirmed, the next packets can be sent … and so on.
Therefore, the longer the sender must wait for the acknowledgement, the longer it must also wait to send the next packets. This delay, or latency, means less data can be sent in a given time; in other words, lower “throughput”. And because throughput is inversely proportional to the latency over a connection, irrespective of the available bandwidth, the higher the latency, the lower the throughput.
In other words, as latency grows, throughput is reduced, and the performance of applications is perceived to drop.
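The relationship can be sketched with the standard window-per-round-trip bound for a single TCP connection. This is an illustration rather than a measurement: the 64 KiB window and the round-trip times are assumed values, and `max_tcp_throughput_mbps` is a hypothetical helper name.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP connection: at most one window per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


if __name__ == "__main__":
    window = 64 * 1024  # assumption: 64 KiB window, no window scaling
    for rtt in (10, 50, 100, 200):  # assumption: round-trip times in ms
        ceiling = max_tcp_throughput_mbps(window, rtt)
        print(f"RTT {rtt:>3} ms -> at most {ceiling:.1f} Mbps")
```

In this model, doubling the round-trip time halves the ceiling, however much bandwidth the link itself offers.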
But what does this have to do with the Internet? Why should users accessing Cloud applications that are hosted perhaps in the same geographic region as them, over a seemingly manageable physical distance, experience performance-impacting latency?
The Effect of Internet Peering and Inter-Connection on Application Performance
On a network where bandwidth is dedicated end-to-end, or where shared infrastructure is controlled (as with an MPLS network), latency will closely correlate with the physical distance between user and application. This may also be the case with the Internet where traffic flows are “uncontended”.
But Internet bandwidth is not dedicated bandwidth. Nor is traffic controlled, thanks to the “net neutrality” principle. And most ISPs contend their connections, sharing the same capacity between many customers, to sweat their assets as much as possible.
All of this means the Internet is a “best efforts” network.
If you also consider the Internet’s “tiered” architecture of peering and inter-connects, you have a network whose performance most definitely cannot be guaranteed.
And even though we use a single term to refer to the globally inter-connected networks that we call “the Internet”, in reality, individual providers’ networks are engineered in vastly different ways with inter-connections changing on a regular basis.
This means that the time taken for an application packet to travel from a Cloud provider’s network to the ISP connecting an end-user (the latency) may vary wildly from location to location or even from ISP to ISP. This can even be the case for users located close to each other but accessing the Internet via different ISPs.
Worse still, it may also be the case for the same user accessing the application at different times.
As an example, an application packet originating in a Cloud platform located in Ireland may have to travel to a user located in France via Internet Exchanges in London, Frankfurt, Amsterdam or (in an extreme case) even New York, thanks to the network inter-connections and “peering” between the ISPs in its path. This effect, in which network traffic takes unpredictable routes over asymmetric paths and suffers added latency and packet loss, is sometimes referred to as “tromboning”.
Tromboning can have a disastrous effect on enterprise applications accessed over a widespread network; especially those hosted in the Cloud.
So, can an IT Manager ask their ISP to “fix the Internet” and resolve their performance issues? Not really. At best, the user can reduce the risk of tromboning and performance-impacting latency, but there are no guarantees.
In the networking industry, peering is a sensitive subject, so it can be difficult for an enterprise to know, at any given time, what path their traffic will take when a user connects to a Cloud application. And even if they can find this out when they initially decide which ISP(s) to use, there is a high probability traffic paths will change repeatedly in the future.
So even though packets travel through fiber at around two-thirds of the speed of light, latency can kill application performance if the packets must travel hundreds or thousands more kilometers over the Internet than they logically should.
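As a rough sketch of what a detour costs, the snippet below compares a direct Ireland-to-France path with two tromboned ones. It assumes light covers about 200 km per millisecond in optical fiber, uses great-circle distances between approximate city coordinates rather than real cable routes, and ignores router and queuing delay. The function names and the chosen cities are illustrative.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumption: ~200,000 km/s in optical fiber
EARTH_RADIUS_KM = 6371.0

# Approximate (latitude, longitude) in degrees; illustrative only.
CITIES = {
    "Dublin": (53.35, -6.26),
    "Paris": (48.86, 2.35),
    "Frankfurt": (50.11, 8.68),
    "New York": (40.71, -74.01),
}


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_rtt_ms(path):
    """Round-trip propagation delay along a list of city names."""
    one_way_km = sum(great_circle_km(CITIES[x], CITIES[y])
                     for x, y in zip(path, path[1:]))
    return 2 * one_way_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    for path in (["Dublin", "Paris"],
                 ["Dublin", "Frankfurt", "Paris"],
                 ["Dublin", "New York", "Paris"]):
        print(" -> ".join(path), f"{path_rtt_ms(path):.1f} ms round trip")
```

Real paths follow cable routes and add equipment delay, so actual figures would be higher than this lower bound.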
Internet Congestion Leads to Packet Loss
As usage grows, even the “fastest” Internet connection can become congested and application performance can appear to “slow down”. To manage congestion, network routers throughout the Internet will drop application packets at random. This is “packet loss”. In the networking industry, most network and backbone providers use the “Weighted Random Early Detection” or “WRED” discipline to manage congestion.
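To make the idea concrete, here is a minimal sketch of the drop decision used by Random Early Detection, the scheme that WRED extends with per-class thresholds. The thresholds, maximum drop probabilities and class names are assumed values for illustration, not settings from any particular router or provider.

```python
import random

# Hypothetical per-class profiles:
# (min_threshold, max_threshold, max_drop_probability), queue length in packets.
PROFILES = {
    "best_effort": (20, 40, 0.10),
    "priority": (35, 40, 0.02),
}


def drop_probability(avg_queue_len: float, traffic_class: str) -> float:
    """Probability of dropping an arriving packet for the given class."""
    min_th, max_th, max_p = PROFILES[traffic_class]
    if avg_queue_len < min_th:
        return 0.0  # queue is short: never drop
    if avg_queue_len >= max_th:
        return 1.0  # queue is full enough: always drop
    # In between, the probability rises linearly up to max_p.
    return max_p * (avg_queue_len - min_th) / (max_th - min_th)


def should_drop(avg_queue_len: float, traffic_class: str, rng=random) -> bool:
    """Randomly decide whether to drop, using the probability above."""
    return rng.random() < drop_probability(avg_queue_len, traffic_class)


if __name__ == "__main__":
    for queue in (10, 30, 38, 45):
        print(queue, {cls: round(drop_probability(queue, cls), 3)
                      for cls in PROFILES})
```

The point of the sketch is that drops start before the queue is completely full, and that lower-priority traffic is dropped earlier and more often.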
Thinking back to the fundamentals of TCP-based applications which rely on packets being received in order, packet loss can have a detrimental impact on throughput. If acknowledgement packets are not received by the sender because they have been dropped at random, previous packets must be resent and ongoing application data cannot be transmitted.
And this in turn adds latency.
If you compare packet loss in private MPLS networks to the “best efforts” Internet, you can clearly see the performance limitations of the Internet:
- Typical Packet Loss in MPLS networks: 0.05% to 0.1%
- Typical Packet Loss on the Internet: 0.5% to 1% (can exceed 2% during peak times)
At 1% packet loss, application performance will appear “poor” for the users. By the time it goes much beyond 1.5% it can simply stop an application from working.
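One way to see why these loss figures matter is the widely cited Mathis approximation for steady-state TCP throughput, roughly MSS / (RTT × √p) scaled by a constant of about 1.22. It is a rough model brought in here for illustration, not something derived from the figures above; the 1,460-byte segment size and 50 ms round-trip time are assumed values.

```python
import math


def mathis_throughput_mbps(mss_bytes: int, rtt_ms: float,
                           loss_rate: float, c: float = 1.22) -> float:
    """Approximate TCP throughput ceiling under random loss (Mathis model)."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    segment_bits = mss_bytes * 8
    rtt_s = rtt_ms / 1000
    rate_bps = (segment_bits / rtt_s) * (c / math.sqrt(loss_rate))
    return rate_bps / 1_000_000


if __name__ == "__main__":
    mss, rtt = 1460, 50  # assumptions: typical Ethernet MSS, 50 ms RTT
    for loss in (0.0005, 0.001, 0.005, 0.01, 0.02):
        rate = mathis_throughput_mbps(mss, rtt, loss)
        print(f"loss {loss:.2%} -> about {rate:.2f} Mbps")
```

Under this model, moving from 0.1% to 1% loss cuts the achievable rate by a factor of about three (√10), regardless of the bandwidth purchased.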
Latency and Packet Loss in Real Terms
So, while connecting to Cloud applications over the Internet is quick, easy and usually low cost, there are no guarantees that applications will work. And the greater the physical distance between the users and their Cloud applications, the more likely it is that latency and packet loss will render these applications unusable.
Imagine the impact of this on a software company or online gaming publisher who uses a public IaaS Cloud provider to host a variety of global dev and test environments. Excess latency and / or packet loss could mean failed testing, which in turn could lead to dev and test programs having to be restarted. The knock-on effect of this could be a delay in the release of the latest software update or smartphone game. And when individual game revenues can run into tens of millions of dollars on release, the impact of delayed revenue thanks to poor Internet performance can be material.
The Cyber Security Question
Now let’s consider Internet security. The Internet is a public network and is susceptible to malicious attack. As smartphone growth and the Internet of Things (IoT) have pushed the number of potential points-of-attack to an almost innumerable level, cyber-attacks have become increasingly sophisticated, leading to significant risk of business disruption, stolen funds or customer data, or worse. Much worse.
Internet-based cyber-attacks are headline news with massive Distributed Denial of Service (DDoS) attacks bringing even the most sophisticated Web and Cloud services to a grinding halt.
Attacks like volumetric DDoS can be costly. Verisign, a global leader in domain names and internet security, reported a 53% increase in DDoS attacks in Q1 2018¹ compared to the previous 3 months, with the “Average of Attack Peak” 47% larger than in Q4 2017. While the Financial Services industry remained the most commonly targeted, it is notable that more than ¼ of Verisign’s mitigations were on behalf of companies in IT Services / Cloud / SaaS. And as witnessed by leading DNS provider Dyn in late October 2016, a botnet-based bombardment can last for hours and is easily replicated, leading to the complete shut-down of Web or Cloud services as bandwidth becomes saturated.
According to Akamai, the world’s largest and most trusted cloud delivery platform, the scale of attacks is growing. So it stands to reason that they are becoming more challenging (and costly) to mitigate and defend against. In February 2018, “the biggest DDoS Akamai has seen to date”² was measured at 1.35 Tbps and took place against a software development company. The size of such an attack is simply breathtaking. To put it into context, the Eagle express subsea cable system, which GCX is building between Italy and Hong Kong via Mumbai, will use state-of-the-art 100G technology to deliver 120 Tbps of capacity. While the February attack represented just over 1% of this capacity, the numbers are jaw-dropping. Few enterprises on the planet could come out of it unscathed.
And when critical network systems are shut down, productivity grinds to a halt. As we’ve seen in the news recently, even the most valuable of brands can suffer when customers can’t access a website, or worse still, become casualties of a data breach.
But it’s not just about DDoS. There are also attacks on SSL/TLS such as Heartbleed, POODLE and FREAK, in which cyber-attackers exploit implementation flaws or weak cryptography, in some cases from a Man in the Middle (MitM) position, to downgrade encryption ciphers or obtain session keys. This can lead to the capture and manipulation of data passing in and out of SSL/TLS based services.
Finally, there’s Address Spoofing, which cyber-attackers use to exploit vulnerabilities in Internet-facing Cloud services and create Advanced Persistent Threats (APTs) to servers and critical applications.
The risk of suffering from any of these cyber-attacks is amplified enormously when moving applications out of the private network where they may be protected by multiple levels of security, and into the public Internet. Especially if traffic to and from the applications travels all the way over the Internet (rather than making use of a private network on its journey).
According to Gartner analyst Neil Rickard, “the internet is supposed to be a global network of networks, allowing the free flow of information between everything connected to it. However, government-mandated initiatives to filter, block and otherwise control the flow of information across the internet is creating the “splinternet,” a term used to describe the phenomenon of the internet splintering or fragmenting into a series of increasing isolated subnetworks.”³
Needless to say, this can cause chaos for an enterprise’s users trying to access Cloud-hosted applications from within one of these countries.
“Routing traffic via these filtering devices requires traffic to take longer paths than would otherwise be necessary, resulting in somewhat increased latency. More significantly, the filtering devices themselves introduce latency and can also introduce packet loss and jitter (delay variability). Together, these effects can amount to tens to hundreds of milliseconds of additional round-trip delay and high single-digit packet loss. This can be enough to render many applications unusable and can drive down voice and video call quality to unacceptably poor levels.”⁴
Neil Rickard, Gartner Analyst
Rickard also notes, “in some of the more restrictive markets, access to specific cloud services can be blocked, either permanently or sporadically, as a response to events, such as the government wishing to block news or if the cloud provider has contravened its policies.”⁵
For users based in one of these countries, their applications hosted in an offshore Cloud data center may simply be inaccessible over the Internet.
Data Sovereignty and Data Protection
A further consideration when migrating applications to the Cloud is data protection and compliance. The ruling of the European Court of Justice in October 2015 declaring the data transfer agreement between the US and Europe, known as “Safe Harbor”, invalid had wide implications for how end-user data is processed and where it is stored within the Cloud. Enterprises with European operations now have to ensure critical end-user data remains within Europe.
Cloud Service Providers (CSPs) now must ensure they are transparent with their processes for storing and handling end-user information and need to clearly specify and control where the data physically resides. Unsurprisingly this led to a number of new Cloud data centers springing up across Europe offering EU-compliant “zones” and “regions”.
The implication is that, due to the dynamic nature of the Internet, enterprises connecting to the Cloud may continue to rely on private connections to their CSPs to ensure end to end control of personal data.
The Right Cloud Connectivity to Underpin Digital Transformation
So, no matter how extensive or critical your deployment is, a successful Cloud strategy is not just about choosing the right architecture and Cloud provider. You need the right connectivity too!
What, then, can the enterprise do to ensure its investment in Cloud continues to deliver the expected value and provide the enabling platform for digital transformation?
1 Verisign, Distributed Denial of Service Trends Report Q1 2018
2 Akamai, Summer 2018 State of the Internet / Security: Web Attacks Report
3, 4, 5 Gartner, Coping With the Splinternet: The Impact of Internet Fragmentation on Global Enterprise Networks, 24 August 2018