How Fleek Network Achieves Web Service-grade Performance in a Decentralized Setting
Performance bottlenecks have long been one of the main issues plaguing adoption in the web3 industry, especially on the decentralized cloud / decentralized web infrastructure side of the ecosystem. Speed and latency are critical to the web, so no rational developer or software company (web2 or web3) will switch to a decentralized alternative unless its performance is as good as, or better than, existing Big Cloud offerings. Naturally, there is a lot of healthy skepticism when Fleek Network claims to be capable of achieving web2 / corporate cloud performance, which is fully expected and appreciated. As a result, the core team thought it would be helpful to highlight a few key areas that help Fleek Network achieve web-grade performance in a decentralized setting. While the whitepaper covers the majority of the architectural decisions made to optimize for low latency and high performance, there are other intricate details we felt would be useful to expand upon so readers can better understand them. Those include:
- Optimistic payment processing
- Node incentivization
- Using gateways without traffic tunneling
- Leveraging BGP advertisement dumps
But before diving in, one very important thing to understand is that, decentralization aside, almost everything in Fleek Network's architecture is standard practice for building large, highly performant distributed systems such as a CDN or edge network (e.g., Cloudflare). So the purpose of this blog is not to explain the aspects common to every major large distributed system, but to focus on the big technical decisions that make a decentralized alternative to something like Cloudflare possible.
Unlocking scalable performance with optimistic payments
In distributed networks, ensuring seamless performance while managing billing complexities, like avoiding negative balances, can be a major challenge. It is an even bigger challenge for systems that are also decentralized.
Fleek Network introduces an innovative approach with optimistic payment processing, a key factor behind its scalability and high performance. This approach supports what we call local-first work processing, enabling nodes to process tasks efficiently while minimizing latency.
The essence of optimistic payment processing lies in allowing nodes to serve users based on ongoing micro-transactions. Instead of waiting for global payment finality after every interaction, nodes accumulate transactions locally, later broadcasting aggregated payments to the network’s global consensus. This mechanism ensures smooth operations without compromising speed or scalability, even with high transaction volumes.
However, managing payment delays introduces risks, such as users running out of balance when transactions are finalized. Fleek Network mitigates this by setting minimum balance thresholds for service eligibility, empowering nodes to decide when to submit transactions globally. This dynamic system balances scalability with security, creating a resilient network that rivals traditional Web2 systems in performance.
Fleek Network’s decentralized design not only optimizes operations but also lays the foundation for a global infrastructure with superior coverage and low latency. This article explores how Fleek achieves Web2-level performance at scale, the role of gateways as coordinators, and how innovative technologies such as WebRTC, WebTransport, and Anycast further enhance the network’s speed and security.
Diving deeper into optimistic payments
Let's consider the issue of billing and accounting, and the much-feared negative-balance problem in the decentralized space. One of the novel contributions of Fleek Network is the idea of optimistic payment processing. This key contribution is one of the primary reasons behind the network's performance, and it enables what we call local-first work.
The short explanation of optimistic payment processing is that when a node starts processing some work for a user, based on that user's account balance in the network, the node may agree to serve the user and collect many small payments from them during the interaction. For example, in the case of content delivery, the node and the client exchange every 256KB of data using a fair exchange protocol in which the node receives a micro-payment for that amount of data served. Assuming a capable node serving at a bandwidth of just 10Gbit/s, that is 5,120 micro-transactions per second for that one node alone (10Gbit/s is 1,280MB/s, divided by 256KB per payment). At 1,000 nodes, this is the equivalent of roughly 5M transactions per second.
Clearly, it would be infeasible for every node to wait for the finality of each of these transactions in a big, distributed system. Optimistic payment processing is the idea that each node can hold on to these micro-transactions locally for as long as it considers safe, and later broadcast a much smaller aggregated transaction to its peers and to the network's global consensus.
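To make this concrete, here is a minimal TypeScript sketch of a node recording per-chunk micro-payments locally and later flushing them as one aggregated transaction per user. The `Receipt` shape and the `submitToConsensus` callback are hypothetical stand-ins, not Fleek Network's actual API.

```typescript
// A minimal sketch of local-first payment aggregation on a node.
// `Receipt` and `submitToConsensus` are illustrative, not the real API.

const CHUNK_SIZE = 256 * 1024; // one micro-payment per 256KB served

interface Receipt {
  userId: string;
  amount: bigint; // price of one chunk, in the network's smallest unit
}

class LocalLedger {
  private pending = new Map<string, bigint>();

  // Record a micro-payment locally instead of waiting for global finality.
  record(receipt: Receipt): void {
    const prev = this.pending.get(receipt.userId) ?? 0n;
    this.pending.set(receipt.userId, prev + receipt.amount);
  }

  // Later, broadcast one aggregated transaction per user to consensus:
  // thousands of receipts collapse into a single global transaction.
  async flush(
    submitToConsensus: (userId: string, total: bigint) => Promise<void>,
  ): Promise<void> {
    for (const [userId, total] of this.pending) {
      await submitToConsensus(userId, total);
    }
    this.pending.clear();
  }
}
```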
However, with this approach comes the risk of a user having an insufficient balance by the time a node sends its payment receipts to be processed globally. This risk is mitigated by nodes requiring a safe threshold in the user's balance, above which they choose to serve the user. Each node, at any point, can decide to send the summary of all of its transactions with a user to everyone else. This is a probabilistic function (sketched after the list below) depending on:
- The user's active balance on the global ledger.
- The user's non-globally committed spend that the node is aware of.
- The number of nodes in the network.
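As an illustration only, since the actual decision function is set by the protocol and each node's local configuration rather than by this heuristic, a flush decision over those three inputs could look like:

```typescript
// Illustrative heuristic, not Fleek Network's actual formula: the larger
// the share of the user's global balance this node has seen spent locally,
// and the more nodes that may hold similar uncommitted receipts, the more
// likely this node is to flush its aggregated transaction.

function shouldFlush(
  globalBalance: bigint, // user's active balance on the global ledger
  localUncommitted: bigint, // spend this node has seen but not committed
  nodeCount: number, // number of nodes in the network
): boolean {
  if (globalBalance <= 0n) return true; // flush immediately if drained
  // Fraction of the balance consumed that only this node knows about.
  const localFraction = Number(localUncommitted) / Number(globalBalance);
  // Assume other nodes may each hold a similar uncommitted amount, so the
  // flush probability scales with localFraction * nodeCount, capped at 1.
  const p = Math.min(1, localFraction * nodeCount);
  return Math.random() < p;
}
```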
When one node (based on its uncommitted information) decides it is time to send the user's transactions globally, the other nodes gain an updated view as well, which may in turn trigger them to share their own transactions. At some point, the user's balance might get close to the freezing threshold; in that case, nodes will stop serving the client while the rest still have time to submit their own transactions.
In reality, this does not prevent the user from reaching a marginally negative balance. However, it makes exploiting this impractical. Given that nodes set their freezing thresholds locally, it will be up to the community to tune them to mitigate this issue.
You might be wondering what bearing this has on the billing issues discussed earlier. The billing problem is not unique to decentralized networks; it hovers over any distributed system past a certain scale. For example, imagine if AWS decided that every one of its servers had to know instantly the moment a user ran out of credit to spend. It would simply slow them down as well.
This model makes any task that can be exchanged fairly between one client and a node (or a small group of nodes) as performant as any web2 centralized alternative, apart from the client performing some sanity checks over the parameters of the fair exchange. There really isn't much overhead you wouldn't also expect from a centralized counterpart. At Fleek Network, we put substantial engineering effort into ensuring that every hot path in the code is as optimized as it can be. As an example of such low-level work, you can read our article on how we made the fastest JavaScript Blake3 implementation in existence, which is an essential part of the data-fetching logic on the client side. We made it a non-bottleneck by simply making it that much faster!
By breaking the web3 norms, Fleek Network can achieve web2-like performance and can do so at an even larger scale.
Optimizing infrastructure ops by incentivizing nodes
In the case of Fleek Network, decentralization is not a curse but a blessing. By decentralizing the supply side of the infrastructure, we anticipate Fleek Network becoming a much larger network than any corporate alternative could provide. Fleek Network node providers are incentivized to go where the usage is, and this fact can create the perfect outsourced infra-ops team. You can view the node runners as an extended part of the infra team because, practically speaking, that is what they are.
In the world of networking, more nodes in more locations simply translates into better coverage and lower latency across the network and user interactions. Even the largest operated networks today are networks of sub-500 instances. (Amazon CloudFront maxes out at 440 PoPs in only 90 cities, and something as large as Google Cloud sits at only ~100 locations at the time of this writing.)
With the incentives provided by Fleek Network's decentralized nature, we anticipate around 1,000 nodes as part of the initial mainnet, which would already give the network a competitive advantage in global coverage without being overly wasteful.
Optimizing data exchange via gateways without tunneling traffic
Given the need for gateways to interact with the network, it is fair to wonder whether the performance of the network is going to be limited by the performance of its gateways.
This is a valid question, and at its core it assumes that all of the traffic is routed through the gateways. Although the gateways will act as pass-through traffic tunnels (i.e., reverse proxies) for certain types of requests (usually those coming directly from a browser), for the majority of the traffic the gateway only acts as the payer on behalf of the user, while the user and the node have a direct line for the actual data that needs to be exchanged.
To make the matter less confusing, let's refer to the gateway as the coordinator when talking about it in the context of facilitating data transfers rather than proxying the entire traffic. In practice, these are just different parts of the same software running on the same hardware, and this distinction is only a conceptual one around roles and responsibilities.
The diagram above demonstrates the conceptual difference between a gateway and a coordinator, in addition to some example applications that could benefit greatly from the coordinator. Before that, though, it's important to understand the one big difference between the coordinator and the gateway: in the case of the coordinator, the client has to understand the Fleek Network protocol. This task is fairly easy, however, given that we aim to have seamless libraries for end users that make it as simple as adding a single `<script src="...">` tag to their main HTML file, just as easy as embedding the Google Analytics script tag.
Example 1: Serving video files
The initial motivator for this design was streaming video files. The interesting thing about video files is that they are large, and even in the traditional model programmers are used to a small setup exchange with one server, which then tells the client where to get the file from and with which tokens. Following the same idea, we realized we could have a new class of gateways that are simply there to help with the setup (and small delivery notifications from the client), and then have the client open a connection to the node and get the data directly from that node, instead of passing that large volume of traffic through centralized infrastructure. This use case alone can bring a lot of traffic. Given that the gateways try to find the optimal node for a client (the routing policies), the client can have the request served by a node closest to them, which is especially valuable in the case of video streaming.
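A rough client-side sketch of this flow follows; the `coordinator.example` endpoint, the delivery-ticket shape, and the `connectToNode` helper are hypothetical stand-ins for the setup and direct-connect steps described above.

```typescript
// Hypothetical direct-connect helper; stands in for the network's
// actual protocol client.
declare function connectToNode(addr: string): Promise<{
  fetch(cid: string, token: string): Promise<ReadableStream<Uint8Array>>;
}>;

async function streamVideo(cid: string): Promise<ReadableStream<Uint8Array>> {
  // 1. Small setup exchange: the coordinator picks an optimal node and
  //    acts as the payer for the transfer on the user's behalf.
  const res = await fetch(`https://coordinator.example/delivery/${cid}`);
  const ticket: { nodeAddr: string; token: string } = await res.json();

  // 2. Open a direct connection to the chosen node; the bulk video
  //    traffic never passes through the coordinator itself.
  const node = await connectToNode(ticket.nodeAddr);

  // 3. Fetch the content directly from the node, presenting the token
  //    the coordinator issued during setup.
  return node.fetch(cid, ticket.token);
}
```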
Example 2: Serving a website
The most challenging aspect of full decentralization is serving an entire frontend—unless, of course, you're willing to build your own browser, though that approach often suffers from limited adoption.
You can read our DNS over SGX blog to get some answers on how we think that mission is possible. But here, let's focus on serving the content directly from the node, given Fleek Network's reverse gas model for payments in the network. The idea is simple: the gateways that can act as tunnels for the traffic are used on the first load to give the browser its first response (the HTML).
After that, by leveraging JavaScript service workers, you can inspect the browser's outgoing requests on the client side and route them to a Fleek Network node instead, using the network's direct-connect protocol. (This is the same one used in the video retrieval example.)
This is made possible by the user injecting a small script into their page, which then sets up the service worker, making the whole thing a seamless experience; a sketch follows below. The upside is faster load times for the site's content, as well as verification of every bit of data the frontend receives against its content hash.
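For illustration, a minimal service worker along these lines might look like the following. The node address and the `/assets/` routing rule are placeholders, and the real direct-connect protocol involves more than a plain `fetch`.

```typescript
// sw.ts — compiled against the "webworker" lib and registered from the
// page with: navigator.serviceWorker.register("/sw.js");

const FLEEK_NODE = "https://node.example:4240"; // placeholder node address

self.addEventListener("fetch", (event) => {
  const fetchEvent = event as FetchEvent;
  const url = new URL(fetchEvent.request.url);

  // Route asset requests to a Fleek Network node on the client side; the
  // first-load HTML still arrives through the gateway acting as a proxy.
  if (url.pathname.startsWith("/assets/")) {
    fetchEvent.respondWith(fetch(`${FLEEK_NODE}${url.pathname}`));
  }
});
```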
The main technical challenge to solve was securing a connection from a client to a node when a conventional TLS certificate is not present on that node. As of now, Fleek Network supports two widely supported transports to solve this issue:
- WebRTC: The good thing about WebRTC is that it is built for peer-to-peer communication and doesn't require a pre-known key the way HTTPS does. The nodes support connections over WebRTC.
- WebTransport: Although, like HTTPS, WebTransport also requires an SSL key, a little-known part of the web standard allows the client JavaScript code to accept a self-signed certificate on the server's behalf by pinning its hash (see the sketch below).
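This is the WebTransport API's standard `serverCertificateHashes` option: the browser accepts the node's self-signed certificate as long as its SHA-256 hash matches one supplied by the client. A minimal example, with a placeholder node URL and hash, looks like:

```typescript
// Connect to a node over WebTransport, pinning its self-signed
// certificate by hash instead of relying on a CA-issued one.
async function connectOverWebTransport(url: string, certSha256Hex: string) {
  // Convert the expected certificate hash from hex to raw bytes.
  const value = Uint8Array.from(
    certSha256Hex.match(/../g)!.map((byte) => parseInt(byte, 16)),
  );

  const transport = new WebTransport(url, {
    serverCertificateHashes: [{ algorithm: "sha-256", value }],
  });
  await transport.ready; // resolves once the connection is established
  return transport;
}

// Usage with placeholder values:
// const t = await connectOverWebTransport("https://node.example:4433", "ab12...");
```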
Optimizing routing based on Fleek Network’s understanding of the Internet
Although geography does correlate with latency, the Internet is, for the most part, a complex network with unexpected routing behavior given its structure. A fiber-optic cable cutting through the ocean can make transferring bytes across a continent faster than transferring bytes between two neighboring cities, if the latter route happens to go through different ASNs. BGP (Border Gateway Protocol) is the protocol through which the routing tables of the Internet as a whole are advertised. An active part of our networking R&D is leveraging BGP advertisement dumps in the routing policies to connect a client to the node that is actually closest to them in terms of the packet's trace route, rather than relying on (oftentimes inaccurate) geo-IP lookup tables.
This results in a client connecting to a node within the same ISP, if one exists, reducing the expected latency between the client and the node, as the sketch below illustrates.
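As a simplified illustration of the idea (real routing policies weigh many more signals than this), matching a client to a node by origin ASN using a prefix table derived from BGP dumps could look like:

```typescript
// Simplified longest-prefix match over a BGP-derived table. Real tables
// come from advertisement dumps and hold hundreds of thousands of routes.

interface Route {
  prefix: string; // e.g. "203.0.113.0"
  prefixLen: number; // e.g. 24
  originAsn: number; // ASN that advertised the prefix
}

function ipToInt(ip: string): number {
  return (
    ip.split(".").reduce((acc, octet) => (acc << 8) | parseInt(octet, 10), 0) >>> 0
  );
}

// Return the origin ASN of the most specific prefix covering `ip`.
function lookupAsn(ip: string, table: Route[]): number | undefined {
  const addr = ipToInt(ip);
  let best: Route | undefined;
  for (const route of table) {
    const mask =
      route.prefixLen === 0 ? 0 : (~0 << (32 - route.prefixLen)) >>> 0;
    if ((addr & mask) === (ipToInt(route.prefix) & mask)) {
      if (!best || route.prefixLen > best.prefixLen) best = route;
    }
  }
  return best?.originAsn;
}

// Prefer a node in the client's own ASN; fall back to any node otherwise.
function pickNode(
  clientIp: string,
  nodes: { ip: string }[],
  table: Route[],
): { ip: string } | undefined {
  const clientAsn = lookupAsn(clientIp, table);
  return nodes.find((n) => lookupAsn(n.ip, table) === clientAsn) ?? nodes[0];
}
```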
The above are just a few key examples of how Fleek Network is able to achieve web-grade performance in a decentralized setting. There are several other performance enhancements we are actively exploring to push the limits of what is possible even further, including some that are only possible in a decentralized setting.