{"id":269427,"date":"2020-12-07T07:13:09","date_gmt":"2020-12-07T12:13:09","guid":{"rendered":"https:\/\/www.webscale.com\/?p=269288"},"modified":"2023-12-29T08:05:52","modified_gmt":"2023-12-29T13:05:52","slug":"the-challenges-of-distributed-databases-at-the-edge","status":"publish","type":"post","link":"https:\/\/www.webscale.com\/blog\/the-challenges-of-distributed-databases-at-the-edge\/","title":{"rendered":"The Challenges of Distributed Databases at the Edge"},"content":{"rendered":"
Global Internet traffic in 2021 will be equivalent to 135x the volume of the entire global Internet in 2005,\u00a0according to Cisco<\/a>. Globally, Internet traffic will reach 30 gigabytes per capita in 2021, up from 10 gigabytes per capita in 2016. Drivers of the huge increase in data volume include networked smart devices, the rapid uptake of emerging technologies such as IoT, 5G and AI, and industrial IoT (IIoT) in manufacturing. Remote working also contributed to the trend toward distributed data across 2020, and this looks set to largely continue through 2021. In parallel, end users have ever-growing expectations for reliable connectivity, superior performance, and fast service.<\/p>\n
Demand is growing for edge computing, which\u00a0offers many advantages<\/a>\u00a0in meeting these needs. By bringing data processing and storage as close as possible to the end user, edge computing offers benefits in speed, reliability and scalability, not to mention efficiency savings. Edge computing is a fast-growing market,\u00a0with Statista forecasting<\/a>\u00a0global revenue to reach $9 billion by 2024. Meanwhile,\u00a0Gartner predicts<\/a>\u00a0that by 2025, three-quarters of enterprise-generated data will be created and processed at the edge (compared to just 10% in 2018).<\/p>\n
Before edge computing can truly deliver on its promise, however, the challenge of distributed databases at the edge needs to be solved. To date, edge computing workloads have been mostly stateless, but changing edge workloads are driving the need for persistent data at the edge. Using cloud and on-premises databases is not the ideal solution. We need to figure out the most efficient way to process the tsunami of data at the edge.<\/p>\n
Why Conventional Distributed Databases Don\u2019t Work at the Edge<\/h3>\n
Conventional distributed databases<\/a>\u00a0depend on the centralized coordination of stateful data, scaling out within a centralized datacenter. They rely on a specific set of design assumptions, including:<\/p>\n
\n
An alternative approach with\u00a0geo-distributed databases<\/a>\u00a0(a single database spread across two or more geographically distinct locations) has a very different set of design assumptions:<\/p>\n
\n
For stateful edge computing and geo-distributed databases to operate at scale and handle real-world workloads, edge locations need to work together in a coordination-free way that lets edge devices move forward independently when network partitions occur. Distributed systems need to be designed to work on the Internet within an unpredictable network landscape, use a form of time-keeping that isn\u2019t lossy, and, as stated, not depend on centralized forms of consensus.<\/p>\n
What are the Challenges of Persistent Databases at the Edge?<\/h3>\n
1. The Distributed Nature of Edge Computing Systems<\/h4>\n
Edge computing systems are highly distributed by design. However, distributed systems require the designer to make decisions about sources of truth, synchronization, and replication. This can result in systems where users can access data and applications independently.<\/p>\n
Each edge device needs to work on its own to perform its function; however, these devices also need to share and synchronize data with other edge devices and nodes. Coordinating several edge devices while simultaneously enabling them to work independently has proved to be a persistent challenge for designers of distributed systems.<\/p>\n
2. Stateless Data at the Edge<\/h4>\n
Why is Stateless Straightforward?<\/h5>\n
Edge computing is straightforward when the data is stateless or when state is local (when a device maintains its own state or is trivially partitionable). What do we mean when we talk about\u00a0stateless data<\/a>? For one, there is no stored knowledge of, or reference to, past transactions in stateless data; examples of stateless protocols include HTTP, IP and DNS. Stateless transactions consist of a single request and a single response, and typically use CDN, web, or print servers to process short-lived requests. Until recently, most edge computing use cases have been stateless.<\/p>\n
Stateless design works well for web applications that present static media and query database tables to inform applications. Stateless, database-centric applications also work well for performing batch analytics on historical data.<\/p>\n
In stateless application design, application services don\u2019t have to remember what they\u2019ve done in the past. The database records an application\u2019s state, and any time the application needs to do something, it asks the database for information. This works well for data that is stored in one location, such as a centralized cloud data center.<\/p>\n
The Real World isn\u2019t Stateless<\/h5>\n
In reality, the majority of applications are stateful; they depend on data from previous request\/response transactions in order to inform subsequent requests. Consider, for example, a banking application that keeps a ledger of expenses and deposits. In order to maintain the current balance, it must draw on the record of previous transactions, that is, on state.<\/p>\n
In distributed computing architectures, each edge node has its own local context that should inform the data it generates. This context ideally needs to be maintained and made locally available to applications, so that edge computing can deliver the latency savings it promises. When application context is available at the edge, it is easier to identify relevant insights from the full dataset and discard only the unnecessary information.<\/p>\n
Stateless Design Causes Latency<\/h5>\n
Another significant challenge for stateless data at the edge is the centralized coordination it requires, which offsets the gains made on latency. If a request needs to travel across the network to a centrally stored database, latency is added with each trip (sending a data packet over a local network<\/a>, compared with computing locally at the edge, comprises a difference in magnitude of 10<sup>7<\/sup>). Network latency, even with advances like 5G, will always be subject to the speed of light, and is thus constrained by what\u2019s physically possible.<\/p>\n
Approaches to Building Stateless Edge Applications<\/h5>\n
Several approaches to building stateless edge applications have emerged, each with its own pros and cons. These include:<\/p>\n
Data filtering at the edge combined with data analysis in the cloud<\/strong><\/p>\n
\n
The use of IoT gateways or edge data centers<\/strong><\/p>\n
\n
3. Why We Need Stateful Data at the Edge<\/h4>\n
A growing number of edge computing use cases demand the processing of stateful data, which is more complex and challenging than stateless data.<\/p>\n
What is stateful data? Stateful data comes with information about the history of previous events and interactions with other devices, programs, and users. Stateful applications use the same servers every time they process a user request.<\/p>\n
\u201cWithout stateful data, the edge will be doomed to forever being nothing more than a place to execute stateless code that routes requests, redirects traffic or performs simple local calculations via serverless functions\u2026 these edge applications would be incapable of remembering anything of significance, forced, instead to constantly look up state somewhere else.\u201d–\u00a0Chetan Venkatesh and Durga Gokina<\/a>, founders of Macrometa Corporation<\/cite><\/p><\/blockquote>\n
Stateful Use Cases at the Edge<\/h5>\n
Stateful data is useful for applications that require more context about users or end-user devices to deliver more personalized experiences. These include:<\/p>\n
\n
Stateful, real-time edge computing will enable latency-critical applications by providing the means for processing and distributing streaming data across complex systems without compromising on speed.<\/p>\n
The Challenges of Stateful Data and Edge Computing<\/h5>\n
Performing stateful computing at the edge involves challenges. These include, for the edge use cases above, the ability to sync stateful data with guaranteed consistency. This is essential, for instance, to avoid lag in real-time gaming or prevent freezes in real-time streaming video calls. Without reliable consistency, different applications, devices and users will see different versions of data, leading to unreliable applications, data corruption and data loss.<\/p>\n
Do Edge-Native Databases Provide the Solution?<\/h3>\n
How do you manage and coordinate state across a range of edge locations or edge nodes and sync data with guaranteed consistency?<\/p>\n
One solution being worked on is edge-native databases, which are geo-distributed, multi-master data platforms capable of supporting multiple edge locations without the need for coordination. While they don\u2019t require centralized forms of consensus, they can still guarantee consistency and arrive at a shared version of truth in real time. These databases promise to overcome the data processing limitations experienced until now at the edge. An additional benefit is that they won\u2019t require developers to have specialist knowledge of how to design, architect or construct these databases.<\/p>\n
Conclusion<\/h3>\n