1:3 Other Ways to Make the Sausage
This chapter works to further the ideas presented in the previous brute force consensus chapter by showcasing the three main ways transactions can be validated to reach a consensus about what is placed into a shared ledger database.
Treat this chapter as a very high-level reference guide to refer back to in subsequent chapters. Many of the topics introduced here will be discussed in much more detail in future chapters. Keep in mind this “three buckets” approach is a cheap mental shortcut meant to create an easier mental map of a very complex topic.
As such, these mental shortcuts fall woefully short of doing justice to every nuance needed to understand the space. Instead, think of this chapter as an encyclopedia-esque boot-loader to orient yourself in this bizarre new world.
Let’s start. By "miner" we mean the computer(s) that place transactions into the ledger. “They Are”, “You Elect”, and “You Are” refer to who (that is, which computers) performs the computations.
"They"-are-the-miner: Various Bitcoin-like strategies where miners race to find winning random numbers, either in proof-of-work setups like the one illustrated in the previous chapter, or in other "sacrifice"-based approaches such as proof-of-capacity, where something more useful like hard drive storage space is used to tie validation to the real world. The key element is that anyone with a computer can join the race and have a random chance of being selected to validate transactions. As we will see, "anyone" is a bit of a misnomer due to economies of scale favoring larger and larger pools of miners.
"You"-elect-the-miner: Works by token holders in the system electing who they want to validate transactions, rather than through purely random chance competition. Voters cast their ballots most often with a one-token-per-vote type system, and expect to receive a portion of the network fees back for casting their vote for a particular miner. Within the elected group, an element of randomness must still be present or the system would work no differently from a centralized database. (E.g., the validators must not be able to consistently win or the ledger risks being co-opted)
"You"-are-the-miner: A catchall term for systems that do not fall neatly into the previous two paradigms. In these systems there are flavors of storing your own local chain that matches up with some subset of a global chain of transactions to verify consensus. In effect "you" do some of the work of a miner to push out transactions. “You” is a loose term which could be anything from running a wallet on an old laptop or smartphone, to an AWS virtual machine, to specialized designed networking hardware.
There are no hard and fast rules separating these three main approaches, and any distributed ledger network is free to fundamentally alter the way it validates transactions by either:
The majority choosing to support an updated version of the ledger (no split)
The minority choosing to support an updated version of the ledger (split into two projects - legacy version & updated version)
Starting a brand new ledger from scratch with new rules and new owners
Due to the decentralized nature of distributed ledgers, gaining consensus to change critical network parameters is a difficult and often contentious process. Based on which upgrade path is chosen, there will be different winners and losers which becomes a fundamentally human rather than computer science problem.
"They" Are the (Un-elected) Miner
Bitcoin was the first true distributed ledger, setting the precedent for a sacrifice based approach to ensure transactions were honestly processed in a provable, unmodifiable way.
Within this "they-are-the-miner" electricity burning server farm approach there are three primary sub approaches which seek to improve scalability and security of the network.
Off-chain: as the name implies, this method allows for transactions that are not recorded onto the ledger each time, but rather reference the ledger only when there is a conflict.
Big Blocks: increasing the block size to increase the number of transactions each block can hold
New Sacrifice: replacing the SHA-256 algorithm bitcoin uses with a different type of sacrifice using a different random number finder, or another type of potentially more useful sacrifice.
The dominant version of Bitcoin (as of this writing) chose to implement a multi year development called the lightning network which creates a second layer of transaction processing on top of the primary bitcoin blockchain.
The lightning network system works by opening a channel of transactions where parties can make many transactions with each other without needing to send each transaction to the ledger.
This approach can be hugely beneficial for:
privacy: as there is no blockchain record of the transaction, and
scalability: as the 1 megabyte of transactions bitcoin can process every 10 minutes is not filled up with off chain lightning transactions
When a channel (special network path for sending transactions without using the blockchain) between parties is opened or closed, a single transaction is registered with the blockchain. Once established, a lightning channel removes the need to register every intermediate transaction that takes place in the channel with the underlying blockchain. Like a traditional network router, transactions “hop” between “relay nodes”. (E.g., each individual running a lightning node can forward transactions without being able to steal, co-opt, or modify them.)
You could imagine a system like this working great for someone like a large retailer. The retailer might pay your one time blockchain fee to open up a channel on your behalf, then be able to credit and debit an unlimited number of transactions with you for zero miner fee as none of the transactions are posted to bitcoin itself. Only if there is a dispute between parties does bitcoin get involved as the ultimate arbiter who can move the underlying bitcoin collateral. By “bitcoin getting involved” we really just mean you can revoke a transaction and get your bitcoin back off the lightning network as long as you have your private key.
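The channel life cycle described above can be sketched in a few lines. This is a toy model only, under loud assumptions: the `PaymentChannel` class, the party names, and the amounts are all hypothetical, and real lightning channels rely on signed commitment transactions and dispute logic that are left out entirely here. The point is just to show that many payments produce only two ledger transactions.

```python
class PaymentChannel:
    """Toy two-party channel: balances change off-chain, settlement is on-chain."""

    def __init__(self, deposits):
        # deposits maps a (hypothetical) party name to its locked collateral.
        # Opening the channel costs one on-chain transaction.
        self.balances = dict(deposits)
        self.on_chain_txs = 1
        self.off_chain_updates = 0
        self.is_open = True

    def pay(self, sender, receiver, amount):
        # Payments only shuffle the locked collateral; nothing touches the ledger.
        if not self.is_open:
            raise RuntimeError("channel is closed")
        if amount <= 0 or amount > self.balances[sender]:
            raise ValueError("payment exceeds the sender's collateral")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.off_chain_updates += 1

    def close(self):
        # Closing settles the final balances with one more on-chain transaction.
        self.is_open = False
        self.on_chain_txs += 1
        return dict(self.balances)


channel = PaymentChannel({"customer": 100, "retailer": 0})
for _ in range(40):
    channel.pay("customer", "retailer", 2)
final_balances = channel.close()
```

Note that a payment can never exceed the sender's remaining collateral, which is the same limit on channel capacity discussed later in this section.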
As off-chain solutions effectively create multiple layers in which to route transactions, off-chain solutions can be seen as the evolution of distributed ledger systems into something that more closely resembles the current internet where many layers of architecture act in coordination to run a massive interconnected global ecosystem.
There are many challenges left to be solved before lightning networks can become mainstream including:
The need for both parties to remain online for transactions to process. The base Bitcoin protocol conversely is “asynchronous” in that you can send “Real Bitcoin” to a public key address that is offline, as the record of the event is stored in the shared ledger that can be synced up to at any time in the future by providing the correct private key.
More insidiously, lightning networks quasi recreate the existing banking system by giving lightning node and channel operators certain power over transactions passing through their network such as what fees to charge, and if they want to ban certain addresses.
Lightning channels are also limited by the actual bitcoin collateral backing them (which is a good thing, as it avoids recreating the fractional reserve system). This, however, favors routing transactions through lightning hubs with large amounts of collateral, which can be seen as further centralization of the protocol.
The alternate version of Bitcoin called "Bitcoin Cash" removed the ability to perform these off-chain transactions in favor of increasing the block size and retaining the simplicity of the original Bitcoin protocol. As of May 2018 each Bitcoin Cash block can process 32 megabytes worth of transactions every 10 minutes, though in practice most blocks are very far from being full.
This approach increases the storage requirements to host a full node (or full copy of the ledger). Larger storage requirements to maintain a full copy of the ledger can limit the number of distributed ledger “full nodes” people are willing to run. If you need to store 32 megabytes of new information every 10 minutes just to send a payment, most average users would not be willing to purchase a new hard drive every year just to use the protocol.
At 3.2 megabytes per minute × 525,600 minutes in a year, if all blocks are full, roughly 1.7 terabytes of new blockchain data will be created each year.
Thus most users will not run a full node (a computer holding a full copy of the ledger) and will instead rely on a lite wallet (software that does not need a full copy of the ledger to work, but instead just references someone else's full copy).
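The storage arithmetic can be checked with a short calculation. This sketch assumes every block is completely full and uses decimal units (1 terabyte = 1,000,000 megabytes); the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600
BLOCK_INTERVAL_MIN = 10           # one block roughly every 10 minutes


def yearly_growth_tb(block_size_mb, interval_min=BLOCK_INTERVAL_MIN):
    """Worst-case ledger growth per year, in decimal terabytes."""
    blocks_per_year = MINUTES_PER_YEAR / interval_min
    return block_size_mb * blocks_per_year / 1_000_000


big_block_growth = yearly_growth_tb(32)   # 32 MB blocks: about 1.68 TB per year
small_block_growth = yearly_growth_tb(1)  # 1 MB blocks: about 0.05 TB per year
```

Under the same assumptions, 1 megabyte blocks would add only about 53 gigabytes per year, which is why block size matters so much to anyone deciding whether to run a full node.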
A thought provoking blog post imagines a world with many terabyte blocks that can record every waking moment of life. In such an extreme example, the hosting of the nodes would by necessity become centralized as storing such vast amounts of information could only be done by large scale operations.
The Bitcoin protocol does not provide any incentives to host full copies of the ledger, and instead only rewards the miners that process the transactions. For a distributed ledger system to succeed at scale, creating incentives to host robust copies of the ledger will become ever more crucial.
As mentioned in the previous chapter, Bitcoin uses a very specific math puzzle called SHA2-256 to race for random numbers that will win each new block. Since the introduction of SHA2-256, many newer variants of finding random numbers have entered the market. These newer approaches aim to provide higher levels of security, with varying hardware requirements to effectively solve these newer types of math problems.
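As a refresher, the race for a winning random number can be sketched as follows. This is a simplified illustration, not Bitcoin's actual block format: the `block_data` string and the `difficulty_bits` parameter are stand-ins invented for this example, though hashing the data twice with SHA-256 does mirror what Bitcoin does.

```python
import hashlib
from typing import Tuple


def double_sha256(payload: bytes) -> int:
    """Hash the payload twice with SHA-256 and return the result as an integer."""
    return int(hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest(), 16)


def mine(block_data: str, difficulty_bits: int) -> Tuple[int, int]:
    """Try nonces one by one until the hash falls below the target."""
    target = 1 << (256 - difficulty_bits)  # more bits means a smaller target
    nonce = 0
    while True:
        value = double_sha256(f"{block_data}{nonce}".encode())
        if value < target:
            return nonce, value
        nonce += 1


def verify(block_data: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a winning nonce takes a single hash, however long the search took."""
    target = 1 << (256 - difficulty_bits)
    return double_sha256(f"{block_data}{nonce}".encode()) < target
```

Each extra difficulty bit doubles the expected number of guesses, while verification stays at one hash. That asymmetry is the "sacrifice" being paid for.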
These newer algorithms try to improve on SHA2-256 in three main ways:
Make things harder for big guys to encourage smaller users to mine
Make things more secure to safeguard against quantum computers
Find something new to sacrifice that is more energy efficient or even "useful"
Harder for the big guys
If the key strength of distributed ledgers is in being decentralized, having only a few main groups doing all of the mining is not ideal.
Thus projects like Litecoin have replaced SHA2-256 with a different algorithm called Scrypt that requires much more computer memory to search for random numbers.
Because the memory requirements are higher, mining for litecoin favors smaller users over larger users, as large users have difficulty finding economies of scale when building out server farms with high memory requirements.
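Python's standard library exposes scrypt, which makes the memory knob easy to see. The parameters below (n=1024, r=1, p=1, with the input reused as the salt) are the ones commonly cited for Litecoin, but treat them as an assumption of this sketch rather than a specification; the helper names are invented for illustration.

```python
import hashlib


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory scrypt needs for one hash: 128 * n * r bytes."""
    return 128 * n * r


def scrypt_digest(data: bytes, n: int = 1024, r: int = 1, p: int = 1) -> bytes:
    """One scrypt hash of the input, using the input itself as the salt."""
    return hashlib.scrypt(data, salt=data, n=n, r=r, p=p, dklen=32)


memory_per_hash = scrypt_memory_bytes(1024, 1)  # 131,072 bytes, i.e. 128 KiB
digest = scrypt_digest(b"example block header")
```

A plain SHA-256 guess needs almost no memory, whereas every scrypt guess must hold this scratch space, so the cost of parallel guessing grows with the memory a miner has to supply.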
Like any competitive activity though, with a high enough price creating enough incentive, ASICs (specialized chips that only do one thing) will be developed to more efficiently mine any successful protocol. This process happened to Ethereum in 2018 with the release of Bitmain ASIC miners that run only the Ethash algorithm, in place of the more general-purpose GPUs previously used to mine Ethereum, which could also do many other tasks such as playing video games or training AIs.
Instead of using 256 bit keys, other systems use a combination of longer keys and new peer reviewed encryption techniques.
In 2015, the successor to SHA-2 (aptly called SHA-3) was released as a standard, and it has been adopted along with 512 or even 1024 bit encryption keys by projects such as NEM and Stellar Lumens.
The actual encryption mechanism is mostly beside the point, as any credible evidence of an algorithm like SHA-2-256 being broken would force a split in the network to a newer encryption technique.
The downside for such a move for Bitcoin however, is that the specialized ASIC computers which miners have invested billions into can only crack SHA-2-256, and entirely new ASICs would need to be developed if the math puzzle was updated.
The goal of many in the distributed ledger community is to find a more energy efficient or useful task to perform with all of the computing power available.
The only "useful" work performed by Bitcoin miners is to slowly weaken the security of SHA-2-256 by continually guessing random numbers until eventually (decades hopefully) the security no longer works.
One twist on finding random numbers is to try and find enormous prime numbers instead. While this approach helps accomplish something scientifically useful, it still uses large amounts of electricity to process transactions.
A concept called "proof-of-capacity" or "proof-of-space", mentioned briefly at the top of the chapter, requires large amounts of hard drive space as "sacrifice", which can be used to create a decentralized storage network that provides useful storage capacity.
The holy grail of useful sacrifice is a system where general computation can be performed by miners in a manner that rivals the efficiency of centralized computation.
Turing-complete smart contracting platforms (systems that can run any arbitrary code) like Ethereum can perform this general computation, but are currently 40,000x less efficient in terms of computation cost than a centralized competitor such as Amazon Web Services.
Keep the “useful sacrifice” concept in mind as we will revisit this topic in more detail in subsequent chapters.
The first major paradigm shift away from sacrifice-based validation of blocks is to replace this process with a series of full nodes (computers with a full copy of the ledger) that will validate blocks on your behalf. This approach solves the issue of incentivizing people to keep full copies of the ledger running. In this system, professional IT managers maintain full ledgers in exchange for the fee income received when transactions are validated. As the computational resources can be as little as a single virtual machine with a few gigabytes of hard drive space, the barriers to entry for node operators are drastically lower than for Bitcoin miners.
While vastly more efficient in terms of energy spent per transaction validated, these "proof-of-stake" systems have the extremely difficult task of ensuring the nodes validating the transactions will behave honestly.
Thus a random lottery system of some kind needs to be used to ensure a malicious group of nodes controlled by a central party cannot take over the network and process transactions however they see fit (including stealing tokens, refusing to process certain transactions, etc).
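One simple form such a lottery could take is a draw weighted by stake. The sketch below is illustrative only: the node names and stake sizes are made up, and a seeded local random generator stands in for the shared, unpredictable source of randomness that a real network would need.

```python
import random
from collections import Counter


def pick_validator(stakes, rng):
    """Pick one validator at random, with odds proportional to tokens staked."""
    names = sorted(stakes)  # fixed ordering so the draw is reproducible
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # stand-in for a shared randomness source
stakes = {"node_a": 60, "node_b": 30, "node_c": 10}
wins = Counter(pick_validator(stakes, rng) for _ in range(10_000))
```

Over many rounds the largest staker wins most often, but never every time, which is the property that keeps a single party from dictating the contents of the ledger.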
A primary concern in a "you-elect-the-miner" system is when there is not a sufficient penalty for creating a network of many accounts controlled by one central party (known as a Sybil attack).
To deter Sybil attacks, many novel solutions have been developed that rely on mathematics to determine how trustworthy each node in the system is, and that attach a cost to spamming the network with fake accounts. The cost is used to deter the Nothing at Stake problem, which is often illustrated using the spam email example:
If your email system was running on a distributed ledger system, every time someone wanted to send you an email it would cost a tiny fraction of a penny. The cost is low enough it does not interfere with the normal course of interactions, but deters bad actors from sending millions of undirected marketing emails.
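To put rough numbers on the email example, the sketch below assumes a fee of one hundredth of a cent per message. That figure, and the message volumes, are hypothetical and chosen only to show the asymmetry.

```python
from decimal import Decimal

FEE_PER_EMAIL = Decimal("0.0001")  # dollars per message; a made-up figure


def sending_cost(num_emails):
    """Total fee in dollars for sending the given number of emails."""
    return num_emails * FEE_PER_EMAIL


normal_user_cost = sending_cost(50)       # a busy day of ordinary email
spammer_cost = sending_cost(10_000_000)   # one bulk marketing blast
```

With these assumptions the ordinary user pays half a cent while the bulk sender pays 1,000 dollars for a single blast. The fee is invisible at human scale and painful at spam scale.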
Popular flavors of "you-elect-the-miner" are listed below. Common elements between all of these different variations involve how validator nodes are chosen and maintain consensus with each other to continually update a global shared ledger of transactions.
*Note: not all “you-elect-the-miner” approaches have an election mechanism. Confusing, right? Some flavors of this system are simply pay-to-play, where you can buy access to become a validator rather than being freely elected. Just like politics!
Delegated-proof-of-stake (DPOS): A system of voting your tokens to a select group of full nodes who perform the block validation. This system was invented by Dan Larimer, and was first implemented on the Bitshares platform, before subtle variations were used to create the Ark, Lisk, and EOS networks, among others. In these systems only the top X number of full nodes with the most votes are allowed to validate transactions. This system incentivizes full nodes to process transactions honestly as they are rewarded with fees for doing so. However, these systems by design are as centralized as the number of elected validators.
Masternodes: A system similar to DPOS, though Masternodes are not elected directly. Instead anyone that can pay the required amount to have the minimum number of tokens needed to create a masternode becomes a validator.
Leased proof of stake: A hybrid of DPOS and Masternode models where anyone with the minimum number of tokens can validate blocks, but owners with less than the required minimum threshold can pool their resources by "leasing" their tokens to a full node that has the required number of tokens. The Waves platform uses such a system, which allows anyone with a pooled amount of 10,000 or more Waves tokens to validate blocks.
Proof of Importance: A validation system used by the NEM platform that uses an implementation of Eigentrust to deter sybil attacks by analyzing relationships between nodes to determine if users are trying to maliciously co-opt the ledger. This system currently employs "supernodes" with 3 million NEM as collateral to validate all transactions on the network.
Slot Leader: A validation system used by Cardano which employs a variation of game theory mathematics to ensure bad actors cannot co-opt the ledger by making the cost of co-opting the ledger higher than the benefits gained.
Practical Byzantine Fault Tolerance: The consensus mechanism governing the IBM Hyperledger, NEO, Stellar, and Ripple projects where interconnected networks of full nodes can be chosen by participants to validate transactions. Variations on solving the Byzantine Generals Problem are employed by many other projects, all with the same goal: to choose a validator that can be trusted to process the transactions honestly.
Each of these validation systems deserves a chapter of its own to explain in full detail, which would involve dissecting each project's individual "whitepaper" (or blueprint) for how they implement their protocol.
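To make at least the delegated-proof-of-stake idea concrete before moving on, here is a minimal sketch of a token-weighted election in which only the top vote-getters may validate. The vote format, node names, and tie-breaking rule are assumptions made for this example and are not taken from any particular project.

```python
from collections import defaultdict


def elect_delegates(votes, seats):
    """Tally one-token-one-vote ballots and return the top `seats` candidates.

    votes is a list of (tokens_held, candidate) pairs.
    """
    tally = defaultdict(int)
    for tokens, candidate in votes:
        tally[candidate] += tokens
    # Most votes first; ties are broken alphabetically (an arbitrary choice).
    ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:seats]]


votes = [
    (500, "node_a"),
    (300, "node_b"),
    (200, "node_c"),
    (250, "node_c"),
    (100, "node_d"),
]
delegates = elect_delegates(votes, seats=2)
```

Here node_a and node_c take the two seats. Shrinking `seats` makes the network faster to coordinate but, as noted above, only as decentralized as the number of elected validators.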
In an ideal world, the project with the most efficient and secure consensus mechanism would ultimately win out over projects with less efficient and thus costlier mechanisms. However in the real world we know:
Any project can upgrade to any other project's consensus mechanism with enough network support.
While Betamax had the superior image quality, it did not win out over the technically inferior VHS. This was partially due to the porn industry choosing VHS which spurred consumer demand for VHS players, but more likely because VHS tape manufacturers geared up production for the lower licensing fees VHS offered over the proprietary Betamax, and VHS being capable of recording longer videos. Read into this cautionary metaphor what you will.
(Onchain) Scaling Debate
For the last two chapters, we have built on the notion that there is an inherent inefficiency in distributed ledgers due to a monolithic single source of truth that needs to continuously grow as new transactions are added to the ledger.
But what if you could create your own distributed ledger that did not rely on a single monolithic source of truth to work?
If you follow the DLT space for long enough you'll notice how often projects tout "transactions-per-second" or how many raw transactions the network is capable of processing.
Interestingly, you never hear about the existing centralized internet discussed in terms of how many transactions per second it can process (because it's an absurd notion). The current internet has a maximum throughput limited only by the number of new web servers that can be spun up to meet demand.
This is the heart of the scaling debate: how to gain the benefits of honest and immutable data records while retaining the capacity to run the entire existing internet plus the upcoming massive expansion of internet-of-things connected devices.
One of the central questions this book tries to answer is
Public chains vs private chains:
So far we have only discussed public blockchains like bitcoin. Because bitcoin is an open source protocol, anyone is free to copy-paste the bitcoin source code (or any other open source protocol for that matter) onto their own private server network and create their own independent ledger.
This solution is excellent for keeping bloated transaction data off a universal main public chain, but is terrible for immutability. If a corporation employs a private blockchain purely inside of their network with no outside validation, there is no way to know if transactions are honest, which is the entire point of a distributed ledger in the first place. We will devote an entire chapter later on in the book to this point.
This segues nicely into why sidechains are excellent scalability solutions. In effect, a sidechain is just a private chain that decides to write transactions to a main public chain at regular intervals. As hashes are so efficient at turning data of an arbitrary size into a small fixed size, hashes of the private chain can be stored to the main chain instead of the entire private chain. This can increase the scalability of the protocol by orders of magnitude, as each full node does not need to store a copy of every transaction on each sidechain, but only needs to maintain a copy of the more streamlined main chain.
From sidechains, we can move one step further by creating systems where the main chain can be broken into many smaller parts, where nodes only need to store a portion (or shard) of the main chain, rather than the entire thing. The mechanics of sharding are complex and are still in the early development stages, as there are significant security issues involved with sharding that are fundamentally at odds with universal consensus.
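The hashing trick behind sidechains can be sketched briefly. The block format, function names, and sample transactions below are invented for illustration; the only point is that an arbitrarily long private history collapses into one fixed-size value that can be written to the main chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that comes before the first block


def block_hash(prev_hash, transactions):
    """Hash a block's transactions together with the previous block's hash."""
    payload = json.dumps({"prev": prev_hash, "txs": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def chain_tip(blocks):
    """Fold an entire private chain down to a single 64-character hash."""
    tip = GENESIS
    for transactions in blocks:
        tip = block_hash(tip, transactions)
    return tip


# A private chain of 100 blocks; only its tip gets written to the main chain.
history = [[f"payment {i}"] for i in range(100)]
anchor = chain_tip(history)

# Quietly rewriting any old block changes the tip, so it no longer matches.
tampered = [list(block) for block in history]
tampered[3] = ["payment 3, but edited later"]
tamper_detected = chain_tip(tampered) != anchor
```

The main chain stores 64 characters no matter how long the private history grows, yet anyone holding the history can recompute the tip and compare it with the anchor.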
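The simplest piece of sharding, deciding which shard is responsible for which account, might look like the sketch below. The account names and shard count are hypothetical, and the genuinely hard parts (moving value between shards and keeping each shard honest) are not addressed.

```python
import hashlib


def shard_for(account, num_shards):
    """Deterministically map an account name to one of num_shards shards."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(accounts, num_shards):
    """Split a list of accounts so each shard stores only its own slice."""
    shards = {index: [] for index in range(num_shards)}
    for account in accounts:
        shards[shard_for(account, num_shards)].append(account)
    return shards


accounts = [f"account_{i}" for i in range(1000)]
shards = group_by_shard(accounts, num_shards=4)
```

Because the mapping depends only on the account name, every node can work out which shard to ask without consulting a central directory.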
The concept of sharding is so important it will get its own chapter, stay tuned!
What if there was a system where each end user hosted only local copies of the transactions relevant to them, plus a partial copy of their neighbors' transactions to create redundancy in the system?
This is the heart of implementations such as "hashgraphs", "block lattices" and many other deep jargon terms.
At a high level, these systems want any user to be able to create their own private encrypted network that does not need to communicate with a single main public chain to work. Hence, many of the “you-are” approaches favor private implementations behind corporate or governmental walled gardens, though this does not necessarily need to be the case.
In such systems, effectively "you" are the miner that validates your own transactions, or optionally enters into relationships with other nodes on the network to validate transactions for you. (E.g., users that do not wish to run their own software can still interact with the system by paying a node operator to run a more complete copy of the ledger on their behalf.)
Such a system does not necessarily need to batch transactions into distinct blocks that must be validated by a single monolithic chain.
Instead a user running a node posts an asynchronous (not at the same time) transaction to their local ledger.
Anyone they wish to share this transaction with can receive the transaction.
No universal copy needs to exist where everyone on the network agrees that the transaction happened. Whether the architecture should include hashes that prove to all ledgers that the transaction exists will be explored in chapter 5.
If approached correctly, these new flavors of "multiple consensus" are at the vanguard of the distributed ledger space, and can potentially disrupt "monolithic consensus" ledger based systems. As we will find out further into the book, tying transactions from one ledger to another becomes crucial when trying to prevent someone from spending the same funds twice (e.g., the entire reason bitcoin exists).
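The flow just described can be sketched with each user keeping a private hash-linked ledger. Everything here (the `LocalLedger` class, the names, the payloads) is a hypothetical illustration and does not follow the design of any specific hashgraph or block lattice project.

```python
import hashlib

EMPTY_HEAD = "0" * 64  # placeholder hash for a ledger with no entries yet


class LocalLedger:
    """A ledger kept by one user: each entry is linked to the one before it."""

    def __init__(self, owner):
        self.owner = owner
        self.entries = []  # list of (entry_hash, payload) pairs

    def head(self):
        return self.entries[-1][0] if self.entries else EMPTY_HEAD

    def append(self, payload):
        # Hash the new payload together with the current head of this ledger.
        entry_hash = hashlib.sha256((self.head() + payload).encode()).hexdigest()
        self.entries.append((entry_hash, payload))
        return entry_hash

    def is_consistent(self):
        # Recompute every link; any edited entry breaks the chain after it.
        previous = EMPTY_HEAD
        for entry_hash, payload in self.entries:
            expected = hashlib.sha256((previous + payload).encode()).hexdigest()
            if expected != entry_hash:
                return False
            previous = entry_hash
        return True


alice, bob, carol = LocalLedger("alice"), LocalLedger("bob"), LocalLedger("carol")
sent = alice.append("alice sends bob 5 tokens")
bob.append(f"received from alice: {sent}")  # shared with Bob only
```

Carol's ledger stays empty because nobody shared the transaction with her, while Alice and Bob each hold a record that refers to the same hash.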
Distributed ledgers in their current incarnation are flood networks which require all nodes to update their ledgers to work. As transactions can occur asynchronously, it takes time for the latest transactions to propagate through the network. Unfortunately, the mathematics of this process are exponential making distributed flood networks orders of magnitude less efficient than centralized networks, as each node must sync itself with all other nodes over time for the system to function. (Sending bitcoin is like sending an email to every person with an email address just to get to the right person)
At a high level, we have established three possible ways distributed ledgers can reach consensus:
They-are-the-miner based proof-of-work systems maintain their relevance as users agree that "sacrifice" is important to maintain the integrity of the ledger. Rather than transacting directly on the main chain, sidechains and off-chain solutions could develop that occasionally write transactions to the main chain for events important enough to spend the electricity on.
You-elect-the-miner systems are chosen over they-are-the-miner systems with the same logic. E.g., sidechain/off-chain solutions will use a main "you-elect-the-miner" based chain to prove globally that a transaction happened.
You-are-the-miner multi-consensus private chains will operate independently of main chains in overlapping networks of trust. "They-are" and "You-elect" based systems could still exist and be referenced by such "you-are" based systems if there is enough economic incentive to keep monolithic ledgers operating.
Keep in mind there is also nothing stopping an existing protocol from voting in a new system of rules to abide by. Developers can either choose to keep existing tokens relevant via transfers to a new chain, or go out on their own and issue new tokens in a new system in hopes of gaining traction.
Now that we have thoroughly exhausted the different flavors of how these base level consensus protocols work, we can expand to the many other layers of architecture needed to create a new decentralized internet.