{"id":269906,"date":"2024-01-05T11:28:00","date_gmt":"2024-01-05T16:28:00","guid":{"rendered":"https:\/\/www.webscale.com\/blog\/the-complexities-of-building-and-operating-edge-networks-and-infrastructure\/"},"modified":"2024-01-05T11:28:00","modified_gmt":"2024-01-05T16:28:00","slug":"the-complexities-of-building-and-operating-edge-networks-and-infrastructure","status":"publish","type":"post","link":"https:\/\/www.webscale.com\/blog\/the-complexities-of-building-and-operating-edge-networks-and-infrastructure\/","title":{"rendered":"The Complexities of Building and Operating Edge Networks and Infrastructure"},"content":{"rendered":"
This is the final post in a three-part series digging into the complexities that developers and operations engineers face when building, managing, and scaling edge architectures. In the first post, we looked at the complexities of replicating the cloud developer experience at the edge. In the second, we discussed how to approach application selection, deployment, and management for the edge. In this post, we'll focus on the complexities of managing the network, infrastructure, and operations in a distributed compute environment.

We are in the midst of a foundational technological shift in communications infrastructure. IDC predicts that by 2023, more than 50% of new enterprise IT infrastructure will be deployed at the edge, and the edge access market is predicted to drive $50 billion in revenues by 2027. To interconnect this hyper-distributed environment, which spans on-premises data centers, multi-clouds, and the edge, the network is evolving to become more agile, elastic, and cognitive.

Let's dive into a high-level overview of some of the critical components you need to consider when building and operating distributed networks. Areas that we'll cover include:

- DNS
- TLS
- DDoS protection
- BGP and IP address management
- Edge location selection and availability
- Workload orchestration
- Load shedding and fault tolerance
- Compute provisioning and scaling
- The messaging framework

As you're evaluating all of these considerations, bear in mind that working with Edge as a Service can solve many of these complexities for you.
## DNS: A Critical Component in Networking Infrastructure

The domain name system (DNS) is often referred to as the phonebook of the Internet, since it translates domain names to IP addresses, allowing browsers to load Internet resources. DNS provides the hierarchical naming model that lets clients "resolve," or look up, the resource records linked to names. DNS therefore represents one of the most critical components of networking infrastructure.
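To make the "phonebook" role concrete, here is a minimal lookup using Go's standard library; the domain is an arbitrary example:

```go
// resolve.go: the DNS "phonebook" in action: ask for the IP addresses
// behind a name. The domain is an arbitrary example.
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	ips, err := net.LookupIP("example.com")
	if err != nil {
		log.Fatal(err)
	}
	for _, ip := range ips {
		fmt.Println(ip)
	}
}
```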
### DNS Services Are Often Vulnerable to Threats

Other widely used Internet protocols have started to incorporate end-to-end encryption and authentication. However, many widely deployed DNS services remain unauthenticated and unencrypted, leaving DNS requests and responses vulnerable to on-path network attackers. Hence, when building out DNS services, it's critical to maintain a security-first approach.

### DNS in a Distributed Computing Environment

DNS is lightweight, robust, and distributed by design. However, new approaches to computing architecture, including multi-cloud and edge, introduce new considerations when implementing application traffic routing at the DNS level.

In a distributed compute environment, DNS routing entails ensuring that users are routed to the correct location based on a set of given objectives (e.g. performance, security/compliance, etc.). When routing, you also need to take service discovery into account. Most of the time, people use DNS for public-facing service discovery, but this can be challenging to pull off. Routing is complicated both in terms of ensuring that users reach the right location and in terms of failover: how will you update routes when systems fail?
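As a sketch of the failover half of that problem, here is a toy health-checked endpoint picker, roughly the decision an "up"-style DNS filter automates; the endpoint names and health-check URLs are hypothetical:

```go
// failover.go: a minimal sketch of state-based DNS failover logic.
// The endpoint names and health-check URLs are hypothetical.
package main

import (
	"fmt"
	"net/http"
	"time"
)

// endpoint pairs a DNS answer with the health-check target behind it.
type endpoint struct {
	name   string // value we would serve in the DNS answer
	health string // URL probed to decide whether the endpoint is "up"
}

// isUp reports whether the endpoint answers its health check in time.
func isUp(e endpoint) bool {
	client := http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(e.health)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// pick returns the first healthy endpoint in priority order,
// falling back to the last one if every probe fails.
func pick(endpoints []endpoint) endpoint {
	for _, e := range endpoints {
		if isUp(e) {
			return e
		}
	}
	return endpoints[len(endpoints)-1] // degraded: serve something rather than nothing
}

func main() {
	endpoints := []endpoint{
		{name: "edge-us-east.example.com", health: "https://edge-us-east.example.com/healthz"},
		{name: "edge-eu-west.example.com", health: "https://edge-eu-west.example.com/healthz"},
	}
	fmt.Println("answer:", pick(endpoints).name)
}
```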
### CloudFlow's Approach to DNS, Failover, and Service Discovery

As an Edge as a Service provider, CloudFlow uses a combination of managed DNS services for routing: Amazon Route 53 and NS1, which, used together with CloudFlow's Adaptive Edge Engine workload scheduler, allow us to determine which endpoints users are routed to. NS1 provides a broader and more granular set of filters, which enable state-based latency routing. (Its "up" filter is a particularly useful feature for multi-CDN or multi-cloud architectures.)

> "NS1 brings automation, velocity, and security to modern application development and delivery. Enterprise infrastructure is evolving faster than ever before. Emerging technologies make it possible to spin up microservices and cloud instances in minutes. DevOps teams are churning out code 40 times faster than legacy production environments. New edge and serverless architectures are taking computing out of the data center and closer to devices, enabling global real-time applications. Those organizations not born in the cloud-native era face the additional challenge of connecting legacy applications with new technology in the never-ending race to meet user demands for performance while driving efficiency and security." – Kris Beevers, CEO, NS1

By working with managed DNS services that are at the forefront of new developments, CloudFlow keeps applications available, performant, secure, and scalable with integrated Anycast DNS hosting, while also removing the burdens of configuration, security, and ongoing management.
## TLS: Provisioning, Management, and Deployment Across Distributed Systems

Transport Layer Security (TLS) is an encryption protocol that protects communications on the Internet. You can feel reassured that your browser is connected via TLS if your URL starts with HTTPS and a padlock indicator assures you that the connection is secure. TLS is also used in other applications, such as email and Usenet. It's important to regularly upgrade to the latest version of TLS and to retire its deprecated predecessor, the SSL protocol.

When working with TLS and/or SSL in distributed environments, you either work with your own certificates through a managed certificate authority such as DigiCert, or use a free, open certificate authority like Let's Encrypt, typically via a client such as Certbot. Managed certificate authorities like DigiCert provide automated tooling to provision certificates. With the open source route, you will find that you have to build the services that handle auto-renewal and the provisioning of new certificates yourself.
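As a small taste of the renewal tooling you end up building, here is a sketch that checks how close a host's certificate is to expiry; the hostname and the 30-day renewal window are arbitrary examples:

```go
// certcheck.go: a minimal sketch of an expiry check you might run
// across every endpoint that terminates TLS. The host and renewal
// window below are arbitrary examples.
package main

import (
	"crypto/tls"
	"fmt"
	"log"
	"time"
)

func main() {
	host := "example.com" // hypothetical endpoint
	renewWindow := 30 * 24 * time.Hour

	conn, err := tls.Dial("tcp", host+":443", nil)
	if err != nil {
		log.Fatalf("TLS handshake with %s failed: %v", host, err)
	}
	defer conn.Close()

	// The leaf certificate is first in the presented chain.
	cert := conn.ConnectionState().PeerCertificates[0]
	remaining := time.Until(cert.NotAfter)

	fmt.Printf("%s expires in %s (%s)\n", host, remaining.Round(time.Hour), cert.NotAfter.Format(time.RFC3339))
	if remaining < renewWindow {
		fmt.Println("within renewal window: trigger re-provisioning and redeploy to every edge location")
	}
}
```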
An added complexity is the question of how you deploy these certificates. In distributed systems, you will have certificates that need to be running in multiple places, possibly across multiple providers; for example, you might be using one ingress controller in one location and a different one in another. How do you ensure that your certificates are deployed everywhere they are needed to actually handle the TLS handshakes? And as the number of domains you manage increases, so too do the complexities.

This ties directly back to DNS, since you need to ensure that you're routing traffic to the correct endpoints containing the workloads where your TLS certificates are deployed. Further, you will have to take into account the state of your systems at any point in time and how you route traffic, since you never want to be servicing users incorrectly.

Ultimately, servicing your users correctly is the end goal, meaning that when implementing TLS at the edge yourself, you have to take all these different components into account.

CloudFlow's Edge as a Service includes advanced, managed SSL services to procure, install, renew, and test SSL/TLS certificates for your web applications.
## DDoS: Protecting Layers 3, 4, and 7

When protecting against Distributed Denial of Service (DDoS) attacks across distributed systems, the first question to ask should be: where are my systems most vulnerable to attack? The primary layers of focus for DDoS protection are Layers 3, 4, and 7 of the OSI model.

Firms like Wallarm, Signal Sciences, ThreatX, and Snapt will provide DDoS protection for you at the application layer (i.e. Layer 7), and when you deploy these through Edge as a Service providers like CloudFlow, you're able to leverage a distributed deployment model out of the box.
However, in an edge computing paradigm made up of heterogeneous networks of providers and infrastructure, there are more questions that need asking. The most important: how do all the different providers I'm using handle network- and transport-layer DDoS attacks (i.e. Layers 3 and 4)?

All major cloud providers typically have built-in DDoS protection, but when you begin to expand across a multi-cloud environment, and even further out to the edge, you need to ensure that your applications are protected across the entire network. This includes knowing how each underlying provider handles DDoS protection, along with implementing safeguards for any areas in your networks that may be underprotected. It also takes us back to DNS and the question of how to handle traffic routing when one (or more) of your endpoints becomes compromised.

CloudFlow's Adaptive Edge Engine works in conjunction with best-in-class WAF offerings, along with advanced DDoS mitigation, to ensure your applications are always protected across your entire compute environment.
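For intuition, here is a toy per-IP request limiter of the kind application-layer (Layer 7) protections build on; real products combine this with fingerprinting, challenge pages, and upstream filtering, and the window and threshold below are arbitrary:

```go
// ratelimit.go: a toy per-IP request limiter illustrating the kind of
// application-layer (Layer 7) control DDoS products build on.
// Real mitigations are far more sophisticated; limits here are arbitrary.
package main

import (
	"log"
	"net"
	"net/http"
	"sync"
	"time"
)

type limiter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newLimiter(window time.Duration) *limiter {
	l := &limiter{counts: make(map[string]int)}
	go func() { // reset all counters every window
		for range time.Tick(window) {
			l.mu.Lock()
			l.counts = make(map[string]int)
			l.mu.Unlock()
		}
	}()
	return l
}

// allow admits up to max requests per source IP per window.
func (l *limiter) allow(ip string, max int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.counts[ip]++
	return l.counts[ip] <= max
}

func main() {
	lim := newLimiter(10 * time.Second)
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		ip, _, _ := net.SplitHostPort(r.RemoteAddr)
		if !lim.allow(ip, 100) {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		w.Write([]byte("ok\n"))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```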
## BGP/IP Address Management

The Border Gateway Protocol (BGP) is responsible for examining the available paths that data can travel across the Internet and picking the best route, which usually involves hopping between autonomous systems. Essentially, BGP gives data routing on the Internet the flexibility to determine the most efficient route for a given scenario.

BGP is also widely considered the most challenging routing protocol to design, configure, and maintain. Underlying the complexities are the many attributes, route selection rules, configuration options, and filtering mechanisms that vary among different providers.

In an edge computing environment, rather than announcing IP addresses from a single location, BGP announcements must be made out of multiple locations, and determining the most efficient route at any given point becomes much more involved.
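As a simplified illustration, one well-known step in BGP best-path selection prefers the route with the shortest AS path; the sketch below applies just that rule to two hypothetical announcements (real BGP weighs many attributes, such as local preference and MED, before path length):

```go
// bgpselect.go: a toy illustration of one step of BGP best-path
// selection: prefer the route with the shortest AS path. Real BGP
// evaluates many attributes (weight, local preference, origin, MED,
// and more) before this tie-breaker. All values are invented examples.
package main

import "fmt"

type route struct {
	nextHop string
	asPath  []int // autonomous systems the announcement traversed
}

// bestByASPath returns the route with the fewest AS hops.
func bestByASPath(routes []route) route {
	best := routes[0]
	for _, r := range routes[1:] {
		if len(r.asPath) < len(best.asPath) {
			best = r
		}
	}
	return best
}

func main() {
	routes := []route{
		{nextHop: "198.51.100.1", asPath: []int{64500, 64501, 64502}},
		{nextHop: "203.0.113.1", asPath: []int{64510, 64502}},
	}
	fmt.Println("best next hop:", bestByASPath(routes).nextHop)
}
```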
Another important consideration when it comes to routing is load balancing at the transport layer (Layer 4). Building a Layer 4 load balancer is complicated in its own right: among other things, it has to track connection state, balance across a changing set of backends, and fail over without severing active connections. A bare-bones sketch of the proxying core follows below.
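This sketch shows only the easy part, accepting TCP connections and round-robining them across two placeholder backends; everything that makes a production balancer genuinely hard (health checking, connection draining, state synchronization) is omitted:

```go
// l4proxy.go: a bare-bones Layer 4 load balancer core: accept TCP
// connections and round-robin them across backends. Backend addresses
// are placeholders; health checking, connection draining, and failover,
// i.e. the genuinely hard parts, are omitted.
package main

import (
	"io"
	"log"
	"net"
	"sync/atomic"
)

var backends = []string{"10.0.0.1:8080", "10.0.0.2:8080"} // placeholders
var next uint64

func handle(client net.Conn) {
	defer client.Close()
	// Pick the next backend in round-robin order.
	addr := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
	server, err := net.Dial("tcp", addr)
	if err != nil {
		log.Printf("backend %s unreachable: %v", addr, err)
		return
	}
	defer server.Close()
	// Shuttle bytes in both directions until either side closes.
	go io.Copy(server, client)
	io.Copy(client, server)
}

func main() {
	ln, err := net.Listen("tcp", ":9000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go handle(conn)
	}
}
```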
The complexities of routing across multi-layer edge-cloud topologies are perhaps the most daunting part of building distributed systems. This is why organizations are increasingly turning to Edge as a Service solutions that take care of all of this for you.
## Edge Location Selection and Availability

An effective presence at the edge is based on a robust location strategy. Moving workloads as close as possible to the end user reduces latency, and selecting the appropriate geographies for your specific application within a distributed compute footprint involves careful planning.

At Webscale, we operate an OpEx model and benefit from a very different kind of network from those of cloud providers and traditional content delivery networks. Our flexible strategies and workflows allow us to tailor the correct edge network for each of our customers, delivering performance gains and reducing their costs. The CloudFlow Composable Edge Cloud is built on the foundations of AWS, GCP, Azure, Digital Ocean, Lumen, Equinix Metal, RackCorp, and others. We regularly add more hosting providers and have the capacity to deploy endpoints based on the specific needs of customers who want to define their own edge.

Compliance also plays a role in the selection of edge locations. Increasingly, regulations and compliance initiatives, such as GDPR in Europe, require companies to store data in specific locations. Edge as a Service providers with flexible edge networks enable DevOps teams to be precise about where they want their data to be processed and stored, without the burdens associated with ongoing management.
## Workload Orchestration

Managing workload orchestration across hundreds, or even thousands, of edge endpoints is no simple feat, and it can involve multiple components. You need to start with how the workload is defined (e.g. full application hosting, micro APIs, etc.). Next, ask where it will be stored. Finally, take into account how the workload is actually deployed. How do you determine which edge endpoints your code should be running on at any given time? What type of automation tooling and DevOps experience do you need to ensure that when you make changes, your code will run correctly?

Managing constant orchestration over a range of edge endpoints, among a diverse mix of infrastructure from a network of different providers, is highly complex. To migrate more advanced workloads to the edge at a faster rate, developers are increasingly turning to flexible Edge as a Service solutions that support distributing code across multiple programming languages and frameworks.
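To make one of those decisions concrete, here is a toy placement function that selects eligible endpoints for a workload under a data-residency constraint; the endpoint list and the "eu" rule are invented examples, not how any particular scheduler works:

```go
// placement.go: a toy sketch of one orchestration decision: choose
// which edge endpoints a workload may run on, honoring a data-residency
// constraint. Endpoint data and the "eu" rule are invented examples.
package main

import "fmt"

type endpoint struct {
	name    string
	region  string
	healthy bool
}

// placeable filters endpoints to those that satisfy the workload's
// residency requirement and are currently healthy.
func placeable(all []endpoint, allowedRegion string) []string {
	var out []string
	for _, e := range all {
		if e.healthy && e.region == allowedRegion {
			out = append(out, e.name)
		}
	}
	return out
}

func main() {
	endpoints := []endpoint{
		{"fra-1", "eu", true},
		{"ams-1", "eu", false},
		{"iad-1", "us", true},
	}
	fmt.Println(placeable(endpoints, "eu")) // [fra-1]
}
```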
## Load Shedding and Fault Tolerance

A load shedding system provides improved fault tolerance and resilience in message communications. Fault tolerance allows a system to continue to operate, potentially at a reduced level, when one or more of its components fail.

With load shedding and fault tolerance at the edge, the primary concern is ensuring that the systems handling your workloads and servicing requests aren't overloaded. Essentially, how do you make sure that no single location is expected to scale infinitely, and how do you ensure that load is distributed appropriately?
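A minimal sketch of the shedding side, assuming a fixed per-location capacity: cap the number of in-flight requests and reject the overflow immediately, so a saturated endpoint degrades gracefully instead of melting down (the limit of 256 is arbitrary):

```go
// shed.go: a minimal load-shedding sketch: cap in-flight requests per
// location and reject the overflow fast with 503. The capacity below
// is an arbitrary example.
package main

import (
	"log"
	"net/http"
)

// shed wraps a handler with a concurrency cap implemented as a
// buffered-channel semaphore.
func shed(limit int, next http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // capacity available
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		default: // saturated: shed immediately
			http.Error(w, "overloaded, retry elsewhere", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	log.Fatal(http.ListenAndServe(":8080", shed(256, handler)))
}
```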
Load shedding and fault tolerance bring us to autoscaling and the configuration of autoscaling systems.

## Compute Provisioning and Scaling

One of Kubernetes' biggest strengths is its ability to perform effective autoscaling of resources. Kubernetes doesn't support just one autoscaler or autoscaling approach, but three:

- The Horizontal Pod Autoscaler (HPA), which adds or removes pod replicas in response to load
- The Vertical Pod Autoscaler (VPA), which resizes the CPU and memory allocated to existing pods
- The Cluster Autoscaler, which adds or removes worker nodes as aggregate demand changes
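For a flavor of how the first of these decides, the HPA's documented scaling rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); here it is as a standalone function, with arbitrary example values:

```go
// hpa.go: the Horizontal Pod Autoscaler's documented scaling rule,
// desired = ceil(current * currentMetric / targetMetric), as a tiny
// standalone function. The values below are arbitrary examples.
package main

import (
	"fmt"
	"math"
)

func desiredReplicas(current int, currentMetric, targetMetric float64) int {
	return int(math.Ceil(float64(current) * currentMetric / targetMetric))
}

func main() {
	// 4 replicas averaging 180% CPU utilization against a 90% target:
	fmt.Println(desiredReplicas(4, 180, 90)) // 8: scale out
	// 4 replicas averaging 30% against a 90% target:
	fmt.Println(desiredReplicas(4, 30, 90)) // 2: scale in
}
```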
If you're not using a container orchestration system like Kubernetes, compute provisioning and scaling can get very challenging very quickly.
## Messaging Framework

The messaging system provides the means by which you can distribute configuration changes, cache ban requests, and trace requests to all of the running proxy instances in your edge network or CDN, and report back the results.

This involves two primary components: a fan-out path that delivers each message to every running instance, and a return path that collects acknowledgements and results.
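Here is a minimal in-process sketch of that shape, broadcasting a hypothetical cache ban to placeholder instances and gathering one result from each:

```go
// fanout.go: a minimal in-process sketch of the messaging pattern:
// broadcast a message (e.g. a cache ban) to every proxy instance and
// collect the results. Instance names and the ban rule are placeholders.
package main

import (
	"fmt"
	"sync"
)

// broadcast sends msg to every instance and gathers one result each.
func broadcast(instances []string, msg string) []string {
	results := make(chan string, len(instances))
	var wg sync.WaitGroup
	for _, inst := range instances {
		wg.Add(1)
		go func(inst string) {
			defer wg.Done()
			// Stand-in for delivering msg to a remote proxy instance.
			results <- fmt.Sprintf("%s applied %q", inst, msg)
		}(inst)
	}
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	instances := []string{"edge-1", "edge-2", "edge-3"} // placeholders
	for _, r := range broadcast(instances, "ban /products/*") {
		fmt.Println(r)
	}
}
```

In production, both paths would ride on a durable message broker, so that instances that are offline when a message is published can catch up when they reconnect.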