1:5 High Assurance
Somewhere deep in your brain, hidden underneath a haze of adolescent hormones, lies the distant memory of writing geometry proofs. Remember those silly things that prove corresponding angles are actually equal?
In the world of computer science (which distributed ledgers are an expression of) there are techniques that allow programmers to mathematically prove that code will execute the same way every time.
These techniques, commonly called "high assurance" software design, are very labor intensive to deploy and have largely been reserved for critical functions such as energy grids, airplane control systems, and other software that cannot fail.
There is an amazing level of abstraction between the words you see typed on this screen, and the 1s and 0s your processor turns on and off inside of your computer to show you this text.
Underneath this beautiful Squarespace website lies:
Below this lies an interpreter (either server side or in your web browser) that translates the programming languages into an intermediate representation of machine readable bytecode.
At the very bottom lies the execution of the intermediate representation into raw 1s and 0s using your computer's CPU, along with many libraries native to your operating system and web browser that help the process along.
This amazing ballet for the most part relies on a subset of computer science called "imperative" languages that allow programmers to write code that tells the computer "how" to do something, rather than explicitly "what" to do.
Imperative languages offer massive time and resource savings, as programmers can skip many steps by writing high level statements to accomplish a task, rather than needing to explicitly state every single sub task the computer needs to do.
By giving up control over every step of the process, programmers gain ease of use and speed, but lose the guarantee of security and reliability. This is a perfectly acceptable trade off in any type of application that is not mission critical.
Take this adorable rabbit GIF for instance.
If your web browser has a bug and crashes trying to render the GIF, no permanent damage is done. However, if the database that hosts this GIF has a bug and causes the GIF to be permanently corrupted, the world would be a slightly worse place to live in.
To a computer, it makes no difference if the file is a rabbit GIF, or an audio recording of a taped confession implicating a high ranking official in a corruption case, or a real estate title, or a bank account...
As distributed ledgers contain billions of fiat dollars worth of value today (and will eventually host services like organ donor registries), it is exceedingly important that the core levels of the protocol are written in such a way that they mathematically cannot fail.
We know how to keep avionics systems on airplanes from crashing, so we should be able to use these same techniques to keep the rest of our global IT infrastructure from crashing.
To explain this crash proof (or really theoretically crash proof) method of writing software, we must go back to a time before computers even existed.
A (Very) Brief History of the world
Please forgive us for reducing 80 years of mathematics and computer science into a few grossly oversimplified paragraphs.
The problem asks for an algorithm that takes as input a statement of a first-order logic (possibly with a finite number of axioms beyond the usual axioms of first-order logic) and answers "Yes" or "No" according to whether the statement is universally valid, i.e., valid in every structure satisfying the axioms. By the completeness theorem of first-order logic, a statement is universally valid if and only if it can be deduced from the axioms, so the Entscheidungsproblem can also be viewed as asking for an algorithm to decide whether a given statement is provable from the axioms using the rules of logic.
In plain English, the problem asks whether any statement of logic can be fed to a mechanical procedure that always returns a provable yes or no answer. In 1936, Alonzo Church and Alan Turing independently showed that no such general procedure exists. Church's proof relied on a new mathematical language he had developed, called lambda calculus.
Software crashes when the computer doesn't know what to do and gets stuck in between yes and no. Lambda calculus can be immensely powerful when expressed via programming languages, because every piece of logic is ultimately represented by a mathematical function, and in suitably restricted (typed) forms every computation is guaranteed to reach a definitive answer.
The hallmark function of lambda calculus in relation to computer science is the Y Combinator.
This amazing piece of logic allows for recursion to exist in functional programming languages. Computers gain their immense power from taking the solution to a function and feeding it back into itself potentially billions of times in a second to solve things humans never could by hand. Without recursion, there would be no Bitcoin mining, as all a Bitcoin miner does is recursively guess over and over again which random number will win the next block.
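To make the idea concrete, here is a minimal sketch in Python. It uses the Z combinator, the strict-evaluation variant of the Y Combinator, because Python evaluates arguments eagerly and the textbook form would never terminate. The names `Z`, `factorial_step`, and `recurse` are chosen for this illustration only.

```python
# Z combinator: the strict-evaluation variant of the Y Combinator.
# The inner "lambda v: ..." delays the self-application until it is needed.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A factorial "step" that never refers to itself by name.
# The recursive call is handed in as the parameter `recurse`.
factorial_step = lambda recurse: lambda n: 1 if n == 0 else n * recurse(n - 1)

# Feeding the step into the combinator produces a fully recursive function.
factorial = Z(factorial_step)

print(factorial(5))  # 120
```

Nothing in `factorial_step` calls itself; the recursion comes entirely from the combinator feeding the function back into itself.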
Recursion is of course a double edged sword: if an infinite loop gets going that slowly eats up more and more memory, your computer will crash. The beauty of functional programming over imperative programming is that the computer is told to do the recursion explicitly, rather than having the bug hidden inside an opaque structure where the logic cannot easily be reduced to find what is causing the leak.
In other words, most computer programs written today in imperative languages "trust" an opaque black box written somewhere else to do something. If programmers are not incredibly careful, an inadvertent flaw in another part of the code will break the entire program. Effectively, all functional programming does is lay bare each programmatic function as math, which can be "provably" checked to behave the same way every time.
By the 1980s, immense amounts of groundwork had been laid, from the combinatory logic work of Haskell Curry, who discovered the Y Combinator, to the development of the Hindley–Milner type system, culminating in the creation of the first usable functional programming languages.
Ever since, a small contingent of developers willing to write code in a rigorous and often unforgiving way has been rewarded with code that is mathematically guaranteed to execute the same "formally verified" way every time.
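The difference between hidden state and a function laid bare as math can be sketched in a few lines of Python. The function names and the fee formula below are invented for this illustration and do not come from any real system.

```python
# Hidden state: the result depends on a counter that lives outside the function.
_calls = 0

def next_fee_impure(amount):
    """Return a fee that silently changes with every call."""
    global _calls
    _calls += 1
    return amount // 100 + _calls

def fee_pure(amount, calls_so_far):
    """Return a fee computed only from the arguments passed in."""
    return amount // 100 + calls_so_far

first = next_fee_impure(1000)
second = next_fee_impure(1000)
print(first == second)                         # False: same input, different output
print(fee_pure(1000, 1) == fee_pure(1000, 1))  # True: same input, same output
```

The pure version can be reasoned about like an equation, because everything it depends on appears in its argument list.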
Bringing math to the masses
The world is incredibly far away from running in a formally verified way. Not only do 99% of people on earth not understand basic programming, 99% of programmers have no idea what formal verification is.
That's fine though, as long as somebody understands formal verification: the process of mathematically proving the correctness of an algorithm.
In an ideal world, a programmer would write her software using whatever language she felt most comfortable in, and the code would undergo a thorough review during the compilation process revealing all bugs.
In the real world, this does not happen, and will not for quite some time.
This leaves you with two types of systems:
One that will let you compile bad code and potentially introduce bugs into the system (99% of all computer programs)
And one that will NOT let you compile bad code. (Functional programming and formal verification)
The distributed ledger space has largely divided itself into two camps regarding this issue.
One camp limits what kind of code can be executed on the ledger network to a very narrow range of possible commands to get around the issue entirely. Bitcoin is one such system, as it only contains a fixed set of operation codes ("opcodes") that limit the types of operations that can occur on the network. In this scenario formal verification is less necessary, as the network does not allow arbitrary computation.
The other camp does not limit what types of operations can be executed on the network and allows any conceivable code to run. This way of running a distributed ledger network is fine until it is not fine, as a software bug running on a global network of shared ledgers can result in catastrophic failure.
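The first camp's approach, a small fixed instruction set, can be sketched as a toy stack machine in Python. The opcode names below are hypothetical and this is not Bitcoin Script; the point is only that anything outside the allowed set is refused.

```python
# Hypothetical opcode names for illustration; this is not Bitcoin Script.
ALLOWED_OPS = {"PUSH", "ADD", "EQUAL"}

def run(program):
    """Execute a list of (opcode, argument) pairs on a small stack machine."""
    stack = []
    for op, arg in program:
        if op not in ALLOWED_OPS:
            # Anything outside the fixed instruction set is rejected outright.
            raise ValueError(f"opcode not allowed: {op}")
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "EQUAL":
            right, left = stack.pop(), stack.pop()
            stack.append(left == right)
    return stack

# Does 2 + 3 equal 5?
print(run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 5), ("EQUAL", None)]))  # [True]

# A jump or loop instruction is simply not part of the language.
try:
    run([("JUMP", 0)])
except ValueError as err:
    print(err)  # opcode not allowed: JUMP
```

Because the language has no way to express a loop, every program in this sketch finishes after a number of steps equal to its length. The second camp gives up that guarantee in exchange for expressiveness.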
The ideal solution is likely to combine the security and integrity of the limited base protocol layer with the unlimited creativity afforded by a separate "smart contracting layer" that can execute any arbitrary command in a formally verifiable way.
When you run software on your local machine, the process is fairly straightforward: from the programming language, to the compiler, down to the bare metal of your computer's processor, RAM, etc.
Running a computer program on a distributed ledger network is not so straightforward as there is no clear relationship between the software you want to run, and the hardware it will run on.
The way this is handled on distributed ledger networks is by running applications (smart contracts) inside of self contained virtual operating systems that compile the code and run the program.
If you have ever "remoted in" to a copy of Windows you have used a virtual machine. This process was probably clunky and error prone as the operating system was not optimized to be so far removed from the base hardware, and thus degraded the performance.
The same thing happens to applications (smart contracts) running on distributed ledger networks. They can't run natively on individual hardware so the code is sent to a virtual machine to be executed and run.
Most web applications today run on a JavaScript engine inside the web browser, a kind of virtual machine that was not originally designed to run large scale applications. This is why you rarely see computationally demanding tasks like playing video games running inside of web browsers.
Fortunately, there is an emerging coalescence among existing centralized services such as Google and Amazon around "serverless" computation using tools like WebAssembly. The development path forward is most likely to borrow from technology across the compute, video game, and VR/AR industries, which are all searching for ways to run complex computations natively on the internet.
The distributed ledger networks of the future will need to address countless issues before complex real world applications can be deployed securely on top of distributed protocols.
The following is a laundry list of tasks developers will have to grapple with in the coming decades. While in no way exhaustive, we have attempted to round up an overview of key technical development areas that will need to be addressed before trust can be shifted to distributed ledgers in a meaningful way, beyond the simple sending and receiving of transactions.
Each bullet point deserves its own chapter, but in an attempt to keep the conversation as high level as possible, we will link to the relevant sources and summarize instead of explaining each concept in great detail.
Verifying Smart Contracts: To make better code, we need better tools. Rather than expecting developers to create formal methods from scratch each time they develop a new project, pre-existing libraries and dependencies can be called that have gone through rigorous formal verification. Projects such as the K Framework offer a glimpse of what these formally verified libraries can offer developers without asking them to re-configure their code into an actual functional programming language like Haskell or OCaml.
Limiting Precompiles: One of the more powerful features of technologies like WebAssembly is the ability to compile in real time inside of the framework, rather than trusting precompiled code. Critical pieces of distributed ledger infrastructure, such as the elliptic-curve cryptography used, are currently precompiled on platforms like Ethereum. While very few people on earth are capable of sneaking vulnerabilities into the precompiles, without technologies like WebAssembly we cannot be guaranteed there are no vulnerabilities.
Adopting an Agent-based Model to avoid concurrency issues: Concurrency is the ability for software to run multiple processes simultaneously. This is a wonderful advancement in computer science that allows massively more computation to be performed by running many processes in parallel, rather than waiting for one process to finish before the next can begin. However, in the realm of smart contracts the opposite is often crucial to prevent theft. The original Ethereum DAO hack was possible due to a closely related re-entrancy bug in the DAO smart contract: the contract sent funds before updating its own record of the balance. The hacker called back into the withdrawal function over and over again before the first call had finished, draining roughly 3.6 million ether, worth tens of millions of dollars at the time (and billions at later prices).
Secure execution: Several related areas of computer science including secure multi-party computation, zero knowledge proofs, and trusted execution environments will form an essential backbone of the distributed ledger infrastructure.
Secure multi-party computation: This is an area of computer science concerned with deploying code onto a distributed network of computers for computation, where the computers executing the code do not know the contents of what they are executing. For systems like secure voting to work, you want the execution to be a provably trusted black box that is mathematically guaranteed to compute the correct result, while the actual calculation is hidden from the computers processing the data.
Zero Knowledge Proofs: The development of zero knowledge proofs is intertwined with secure multi-party computation in that both want the contents of the computation hidden, with a proof receipt that the computation was performed properly. However, zero knowledge differs in that the trusted agent running the computation knows the contents, and is simply shielding the computation from outsiders. Zero knowledge proofs are thus easier to execute than truly secure multi-party computation. Shielding the sender and amount of a simple transfer transaction is available today on platforms like Zcash and Monero, but shielding any arbitrary computation is a much more difficult problem to solve.
Trusted Execution Environments: These are secure areas of computer processors that do not let any other areas of the processor interact with computations inside of the secure area. This is easier said than done, as ultimately a hardware manufacturer is trusted to create these secure environments.
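The DAO bullet above describes an attacker re-entering a contract over and over before its state is updated. The following is a plain-Python simulation of that pattern, not actual contract code. The class names, the `on_receive` callback, and the amounts are all hypothetical.

```python
class VulnerableVault:
    """Pays out before updating the balance, so a callback can re-enter."""

    def __init__(self):
        self.balances = {}
        self.reserves = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.reserves += amount

    def withdraw(self, who, on_receive):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.reserves >= amount:
            self.reserves -= amount
            on_receive(amount)        # external call happens first...
            self.balances[who] = 0    # ...and the balance is cleared too late


class SaferVault(VulnerableVault):
    """Same vault, but the balance is zeroed before the external call."""

    def withdraw(self, who, on_receive):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.reserves >= amount:
            self.balances[who] = 0
            self.reserves -= amount
            on_receive(amount)


def attack(vault, max_depth=10):
    """Deposit 10, then re-enter withdraw() from inside the payout callback."""
    stolen = []

    def on_receive(amount):
        stolen.append(amount)
        if len(stolen) < max_depth:
            vault.withdraw("attacker", on_receive)

    vault.deposit("attacker", 10)
    vault.withdraw("attacker", on_receive)
    return sum(stolen)


def fresh(vault_cls):
    """Create a vault that already holds 90 units belonging to someone else."""
    vault = vault_cls()
    vault.deposit("honest user", 90)
    return vault


print(attack(fresh(VulnerableVault)))  # 100: the attacker drains everyone's funds
print(attack(fresh(SaferVault)))       # 10: the attacker only gets their own deposit
```

The only difference between the two vaults is the order of two lines, which is exactly the kind of flaw that formal verification tools aim to catch before deployment.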
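The secure voting scenario in the multi-party computation bullet can be hinted at with additive secret sharing, one of the simplest building blocks in that field. This is a toy sketch under stated assumptions: the modulus, the party count, and the function names are chosen for illustration, and it does nothing to stop a dishonest voter or colluding parties.

```python
import secrets

MODULUS = 2**61 - 1  # arbitrary large modulus chosen for this sketch

def split(secret, parties):
    """Split `secret` into `parties` random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def tally(votes, parties=3):
    """Each voter sends one share to each party; parties only ever add shares."""
    subtotals = [0] * parties
    for vote in votes:
        for i, share in enumerate(split(vote, parties)):
            subtotals[i] = (subtotals[i] + share) % MODULUS
    # Combining the subtotals reveals the total, not any individual vote.
    return sum(subtotals) % MODULUS

votes = [1, 0, 1, 1, 0]  # 1 = yes, 0 = no
print(tally(votes))  # 3
```

Each party sees only random-looking numbers, yet the combined result is the correct count.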
This is Not Easy
If this entire chapter felt like drowning in the deep end of the pool, that's okay. The goal is not to understand the abstruse nuances of every computer science problem. Simply knowing that these problems exist (and that smart people are working to solve them) puts you far ahead of everyone who has not taken the time to familiarize themselves with the fundamental issues involved in moving every mission critical IT system on earth to a distributed ledger.
Part I is intended to serve as a menu of raw ingredients that can be combined, dismantled, and recombined at will. Which consensus mechanisms, programming languages, virtual machines, etc. will ultimately constitute the "good architecture" of this new ecosystem is anyone's guess.
However, knowing that something will emerge from this primordial soup of computer science innovation is important. So is knowing that "fit" designs (which actually solve real problems) will beat out vaporware in the long term.
In Part II we will attempt to hang a series of heuristics (ways of thinking) from evolutionary biology, to game theory, and the madness of crowds on the technical framework presented in Part I.
To bridge the technical with the socio-cultural, the final chapter in Part I will be devoted to the logic being developed to help distributed ledger systems interact with the real world, including identity management and binding arbitration use cases.