1:8 Quasi-Privacy to Data Security
Legal, moral, and existential implications aside, privacy on distributed ledgers involves nothing more than moving around 1s and 0s in a slightly different way.
Rather than make a stand for or against privacy (we will leave that to centuries of philosophy), this chapter is going to lay bare the technical implementations that make privacy mathematically possible, yet practically very difficult.
The only frame we ask of the reader is to try and approach emotionally charged use cases such as:
privacy of patient health records
and uncensorable freedom of expression
from the same practical viewpoint.
By default, distributed ledgers are "quasi-private":
Private in that transactions are not in plain text, but encrypted into random strings of letters and numbers.
Public in that transactions posted to public ledgers can be searched on a globally accessible database that anyone in the world can look up and analyze.
As our example in Chapter 1 showed:
Alice does NOT send the plain text results of her transcript (or blood test, or real estate title, or any arbitrary data) to Bob
Instead Abababab sends the encrypted results to Bcbcbcbcbc
Even though we don't know the sender, receiver, or contents of the messages directly (as they are hidden behind encryption) we have an abundance of publicly accessible Metadata about each transaction.
“Meta” + “Data” just means data that can be inferred from the base set of data, in the same way that a scary movie about scary movies is "so meta" when it references tropes from the genre indirectly.
While a meta scary movie might play on tropes to frighten you temporarily, Metadata can build a meticulous profile about your habits which in turn can color how the outside world views you as an economic & social actor.
As Edward Snowden proved, the NSA might not listen to your phone calls directly, but they do know a lot of metadata about you including:
a networked web of who you call the most frequently
who they call the most frequently and call duration
which cell phone tower this web of contacts connects through, etc.
With the power of metadata analysis, anyone that thinks their Bitcoin transactions will be anonymous forever is fooling themselves.
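To make the metadata point concrete, here is a minimal sketch. The addresses, hours, and record layout are invented for illustration and do not follow any real ledger format. Even with the contents hidden, simply counting who transacts with whom starts to build a profile.

```python
from collections import Counter, defaultdict

# Hypothetical public records: (sender, receiver, hour_of_day). The payload
# itself is never inspected; only the metadata around it is used.
RECORDS = [
    ("Abababab", "Bcbcbcbcbc", 9),
    ("Abababab", "Bcbcbcbcbc", 9),
    ("Abababab", "Cdcdcdcd", 23),
    ("Bcbcbcbcbc", "Cdcdcdcd", 10),
    ("Abababab", "Bcbcbcbcbc", 10),
]

def contact_profile(records):
    """Map each sender to a Counter of the addresses it sends to."""
    profile = defaultdict(Counter)
    for sender, receiver, _hour in records:
        profile[sender][receiver] += 1
    return profile

def most_frequent_contact(records, address):
    """Return the counterparty an address sends to most often, or None."""
    contacts = contact_profile(records).get(address)
    if not contacts:
        return None
    return contacts.most_common(1)[0][0]
```

Real chain analysis layers timing, amounts, and address clustering on top of this, but the starting point is the same counting exercise.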
As every Bitcoin transaction ever made is hashed together to all previous transactions, it is straightforward to play the entire concert of Bitcoin transactions in reverse back to the origin of the protocol in 2009.
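A rough sketch of what "hashed together" means in practice follows. It is a simplification under stated assumptions: it links one transaction per block with SHA-256 over a JSON encoding, whereas Bitcoin batches many transactions per block and uses its own serialization. The function names are illustrative only.

```python
import hashlib
import json

ORIGIN = "0" * 64  # all-zero hash marks the start of the chain

def block_hash(block):
    """Hash a block's contents, including the hash of its predecessor."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def build_chain(transactions):
    """Link each transaction to the hash of the block before it."""
    chain, prev = [], ORIGIN
    for tx in transactions:
        block = {"prev": prev, "tx": tx}
        chain.append(block)
        prev = block_hash(block)
    return chain

def replay_backwards(chain):
    """Walk from the newest block to the origin, checking every link."""
    history = []
    for i in range(len(chain) - 1, -1, -1):
        expected_prev = block_hash(chain[i - 1]) if i > 0 else ORIGIN
        if chain[i]["prev"] != expected_prev:
            raise ValueError(f"broken link at block {i}")
        history.append(chain[i]["tx"])
    return history
```

Because every block commits to the one before it, altering any old transaction breaks the link that follows it, which is what makes the full history replayable and checkable by anyone.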
Let's follow the money on a hypothetical digital extortion case to see how attackers try to maintain anonymity, while leaving behind digital breadcrumbs that can lead to their real world identities being found.
Our hypothetical heist requires three key events to pull off the job successfully:
The hack: Hackers gain access to a hotel's key card system and demand payment of 1 bitcoin to turn the system back on.
The ransom payment: The hotel converts fiat currency into bitcoin and sends a payment to the attacker's public key address.
The getaway: Once the attacker receives the bitcoin, there are many different ways she can attempt to launder it.
Ranked from dumbest to smartest, the attacker could:
Send the Bitcoin to a merchant like Overstock.com and attempt to buy something.
Send the Bitcoin to a centralized exchange to try and convert the Bitcoin to US dollars
Use a service like localbitcoins.com to attempt to sell the Bitcoin to someone in person for cash.
What the first three dumb ways to maintain privacy have in common is that each links the illicit Bitcoin address to the “real world”:
Overstock.com is going to mail an illicit purchase to some physical address
The centralized exchange is going to require some kind of know your customer proof before releasing fiat currency to a bank (which of course would track the inbound transaction)
The person you meet for cash could identify you, unless you pull off something from a spy movie like a dead drop
That’s the fascinating twist with financial crime: eventually it ends up back in the real world. Unless the ransom money is cleaned effectively (i.e. it cannot be traced back to its origin), the money is effectively useless.
This happens frequently with art heists. Sure the original “rightful” owners of the art might have been murdered by the Nazis in World War II, but someone else might want the painting secretly hanging in their South American mansion. Is the art worth millions or nothing? It all depends if a buyer can be found or not. The market for immoral money might only be a tiny fraction of global liquidity, but it is out there.
What if instead of a detective piecing together the clues after the fact, the transaction was simply flagged by the person being extorted so everyone would know not to accept the stolen funds?
This is the logic behind "coloring tokens". Just like a dye pack exploding on physical cash after a bank robbery, a digital explosion of color would go off that lets everyone in the system know they shouldn’t accept the stolen funds. Whether they are programmatically capable of receiving the funds is an entirely different matter.
We already have systems just like coloring tokens today in our email servers to prevent spam, or with services like Cloudflare that protect websites from distributed denial of service (DDoS) attacks.
It is fairly straightforward to create a white-list of approved addresses, and a black-list of blocked addresses to prevent bad actors from interacting with the system.
However, unless a protocol upgrade is made where Bitcoin miners are forced to reject transactions from a master blocked list, blanket enforcement is impossible. As Bitcoin derives its value from perceived immunity from outside collusion, implementing a censorship system directly into the core Bitcoin protocol is unlikely.
Instead, it is more likely large exchanges would act just like today's large email providers, and build their own internal block-lists to thwart bad actors intending to spend illicit funds.
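As a hedged illustration of how an exchange's internal block-list might work, the sketch below propagates a "dye pack" forward through transfers and then applies a deposit policy. The function names and the policy itself are hypothetical, and the taint rule is deliberately naive: it colors every downstream address, which real systems would need to temper.

```python
def tainted_addresses(transfers, flagged):
    """Propagate a 'dye pack' forward: any address that receives funds
    from a tainted address becomes tainted too.

    transfers: list of (sender, receiver) in chronological order.
    flagged:   set of addresses reported by the victim.
    """
    tainted = set(flagged)
    for sender, receiver in transfers:
        if sender in tainted:
            tainted.add(receiver)
    return tainted

def exchange_accepts(deposit_from, tainted, allow_list=None):
    """Hypothetical exchange policy: reject tainted senders, and if an
    allow-list is configured, reject anyone who is not on it."""
    if deposit_from in tainted:
        return False
    if allow_list is not None and deposit_from not in allow_list:
        return False
    return True
```

Note that this only governs what one exchange chooses to accept. Nothing here stops the underlying ledger from processing the transfer.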
To get around such a system, the fourth option our attacker can use involves:
Converting the Bitcoin to another token with special privacy features before attempting to spend it.
This drags us down a very deep and dark black hole that will illuminate ways to protect patient health data and build provably secure anonymous elections, as much as it will be used to evade being caught for digital extortion.
There are two key pieces at play in our attacker's fourth option:
The conversion process itself from one token to another
And the added step of converting to special kinds of tokens with built-in privacy features.
Say hypothetically the attacker sends her stolen bitcoin to an exchange and buys Ethereum to then spend on an Initial Coin Offering (more on ICOs in part II).
This is any regulator's worst nightmare: using illicit funds to acquire interest in a business.
Fortunately, in this example there is still a breadcrumb trail that can be followed back to the source. The series of transactions involved in the heist would look something like:
Hotel sends fiat currency to exchange X and receives 1 Bitcoin at address B123.
Hotel sends 1 Bitcoin to attacker's address B234
Attacker sends 1 Bitcoin at address B234 to exchange Y's address B345 and purchases 10 Ethereum.
Attacker withdraws 10 Ethereum from exchange Y and into an Ethereum wallet with address E456
Attacker sends 10 Ethereum from address E456 to the Initial Coin Offering Address E567
Finally, the ICO sends the attacker 10,000 ICO tokens back to her address E456
ICO tokens from E456 are sent to, etc, etc. ad infinitum
While complicated to dig through, everything about this series of transactions is in fact traceable. The only major issue is in step three, where the Bitcoin is converted to Ethereum inside exchange Y: authorities would need to subpoena the centralized exchange to determine how exactly the conversion happened.
If the centralized exchange fails, and every last copy of the central database is destroyed, then the chain of custody would be broken, and no one would be able to determine exactly where the Bitcoin went (outside of carefully inferring from the metadata around the time of the theft to piece back together the chain of events).
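The breadcrumb trail and its weak point can be sketched as follows. The addresses come from the example above; the shape of the exchange's internal record (a simple deposit-to-withdrawal mapping) is an assumption made for illustration.

```python
# On-chain transfers from the example: (from_address, to_address, asset).
ON_CHAIN = [
    ("B123", "B234", "BTC"),   # hotel pays the ransom
    ("B234", "B345", "BTC"),   # attacker deposits at exchange Y
    ("E456", "E567", "ETH"),   # attacker funds the ICO
]

# Exchange Y's internal record linking a deposit to a withdrawal.
# This lives in a private database, not on any ledger.
EXCHANGE_Y_RECORDS = {"B345": "E456"}

def trace(start, on_chain, exchange_records):
    """Follow funds forward from `start` and return the list of addresses
    reached. The trail stops where no further record is available."""
    next_hop = {src: dst for src, dst, _asset in on_chain}
    path, current = [start], start
    while True:
        nxt = next_hop.get(current) or exchange_records.get(current)
        if nxt is None or nxt in path:   # dead end, or a loop back
            return path
        path.append(nxt)
        current = nxt
```

With the exchange's records, the trace runs from B123 all the way to E567. Pass an empty mapping instead and it stalls at B345, which is the broken chain of custody described above.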
From a regulatory standpoint, the best an ICO can do is only accept funds from accounts that go through KYC/AML (know-your-customer & anti-money laundering) verification.
Still, once ICO tokens hit the open market, there is no way to prevent anyone from anywhere buying them outside of the aforementioned colored token blacklisting techniques.
Worse, the very notion of centralized KYC/AML verification might be fundamentally flawed. Exchanges leak KYC/AML data all of the time, which exposes verified users on crypto exchanges to extortion for their private keys and logins.
A modern twist to solve this issue is the “authorization required” flag. Such a piece of code simply prevents any account address from receiving the tokens unless they have previously gone through an approved KYC process and registered their wallet address with their real (or fake depending on the level of KYC) identity.
E.g. a hacker creates a new Stellar Lumens wallet from scratch with address ABCABC.
This new address has not been flagged as capable of receiving ICO tokens.
When the hacker attempts to buy the ICO tokens and transfer them to her new wallet ABCABC, the transaction will not process.
One step further removed from the permissionless origins of blockchain is the “authorization revocable” flag. Such a flag in essence works similarly to a 2 of 3 multi-sig address where the issuer always has the ability to remove the tokens from your account. In effect, this system re-creates the existing banking system.
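The two flags can be sketched as a toy token ledger. This follows the behavior described in the text rather than any particular network's API; the class, method names, and error types are all hypothetical.

```python
class PermissionedToken:
    """Toy token ledger with two issuer-controlled flags."""

    def __init__(self, auth_required=False, auth_revocable=False):
        self.auth_required = auth_required
        self.auth_revocable = auth_revocable
        self.authorized = set()   # addresses that passed the issuer's KYC
        self.balances = {}

    def authorize(self, address):
        self.authorized.add(address)

    def _check_receiver(self, address):
        if self.auth_required and address not in self.authorized:
            raise PermissionError(f"{address} is not authorized to hold this token")

    def issue(self, to, amount):
        self._check_receiver(to)
        self.balances[to] = self.balances.get(to, 0) + amount

    def transfer(self, sender, receiver, amount):
        self._check_receiver(receiver)
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount

    def revoke(self, address):
        """Issuer removes an address's full balance; returns the amount."""
        if not self.auth_revocable:
            raise PermissionError("this token is not revocable")
        self.authorized.discard(address)
        return self.balances.pop(address, 0)
```

A transfer to the unregistered wallet ABCABC fails before any balance changes, and `revoke` shows why the second flag puts the issuer back in the role of a bank.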
"Less" Traceable Conversion
If things weren't complicated enough, instead of converting Bitcoin into a traceable token like an Ethereum or Stellar ICO, the attacker could have converted to any number of privacy-focused tokens that intentionally attempt to conceal the sender, receiver, and transaction data.
While not an exhaustive list of privacy technologies, there are three main ways to perform "less" traceable conversions:
Off-chain transactions on second layer solutions like the lightning network
Mixing strategies that attempt to jumble so many transactions together they cannot be practically untied
The use of zero-knowledge proof technology which allows computers to verify the correctness of computations WITHOUT knowing what is inside the computation.
Off-chain privacy is the most obvious privacy solution to grasp, as off-chain transactions allow users to send transactions without leaving any proof on a public ledger.
In this case, the attacker opens an off-chain channel by committing the 1 Bitcoin to a lightning address, say L123.
Now entirely without reference to the ledger, the attacker can do anything from buying a different token entirely on the lightning network, to simply pulling down the Bitcoin via a different lightning channel to a new address. (This new address will not be linked on the blockchain to the old address.) As long as a second layer bank-esque solution allows the collateralized lightning payments to be accepted, illicit funds can potentially be converted to something else like Starbucks points or US dollars.
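A toy model of the idea is sketched below. It is an assumption-laden simplification: real Lightning channels rely on signed commitment transactions and routed payments, none of which is modeled here, and the class and party names are invented. The point is only that opening and closing touch the ledger while the payments in between do not.

```python
class Channel:
    """Toy two-party payment channel. Only opening and closing write to
    the public ledger; pay() updates stay private between the parties."""

    def __init__(self, ledger, a, b, deposit_a, deposit_b=0):
        self.ledger = ledger
        self.balances = {a: deposit_a, b: deposit_b}
        self.updates = 0
        ledger.append(("open", a, b, deposit_a + deposit_b))

    def pay(self, sender, receiver, amount):
        if self.balances[sender] < amount:
            raise ValueError("insufficient channel balance")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.updates += 1   # nothing is written to the ledger here

    def close(self):
        """Settle: publish only the final balances."""
        self.ledger.append(("close", dict(self.balances)))
        return self.balances
```

Amounts are integers (think satoshis) to avoid floating point drift. After fifty private payments, an observer of the ledger still sees exactly two entries.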
Mixing strategies are also quite straightforward to explain as they do exactly what the name implies - jumble your tokens together with a bunch of other transactions that make it difficult to know where the tokens went.
There are informal ways of mixing via sending to a centralized exchange that has no formal KYC/AML policy, or more formal methods such as Ring-CT type transactions that automate the process.
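The jumbling idea can be sketched with an equal-amount shuffle. Note that this is closer to a CoinJoin-style mix than to Ring-CT, which uses ring signatures and hidden amounts; it is swapped in here only because it fits in a few lines. All names are illustrative.

```python
import math
import random

def mix(payments, rng=None):
    """payments: list of (input_address, fresh_output_address), all for the
    same amount. Returns what an observer sees: sorted inputs and shuffled
    outputs, with the pairing between them discarded."""
    rng = rng or random.Random()
    inputs = sorted(src for src, _dst in payments)
    outputs = [dst for _src, dst in payments]
    rng.shuffle(outputs)
    return inputs, outputs

def plausible_pairings(n_participants):
    """Number of input-to-output assignments consistent with the public view."""
    return math.factorial(n_participants)
```

With ten participants there are already 3,628,800 pairings an observer cannot distinguish from amounts alone, although timing and later spending behavior can still narrow that down.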
Zero-knowledge proofs are the hardest of the three to explain; to understand them you really should just read this blog post. If there ever was a time to invoke the drive-your-car-without-understanding-the-Carnot-cycle/use-your-TV-remote-without-understanding-infrared analogy, this is it.
Without understanding the math behind how they work, the important takeaway of Zero-knowledge proofs is they provide a mathematically secure way to perform private computations.
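For readers who do want a taste of the math, below is a toy version of the Schnorr identification protocol, a classic interactive proof in which a prover shows she knows a secret number without revealing it. This is not the proof system used by any particular privacy token, and the parameters are deliberately tiny and insecure; treat it as a sketch of the flavor only.

```python
import secrets

# Toy group parameters, far too small for real use: G = 2 generates a
# subgroup of prime order Q = 11 in the integers modulo P = 23.
P, Q, G = 23, 11, 2

def keygen():
    """Return (secret x, public value y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1      # secret in [1, Q-1]
    return x, pow(G, x, P)

def commit():
    """Prover picks a one-time nonce r and sends the commitment t = G^r."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def respond(x, r, c):
    """Prover answers the verifier's challenge c; the nonce r masks x."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier checks G^s == t * y^c (mod P) without ever seeing x."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

The check works because G^s = G^(r + c·x) = G^r · (G^x)^c = t · y^c. The verifier learns that the prover knows x, but sees only t, c, and s.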
Of course even with the best math, pesky Metadata can severely limit theoretical privacy. Researchers with enough time and resources were able to significantly reduce privacy just by analyzing who sends transactions to whom in non-private transactions on the network.
There will always be an escalating arms race between math that creates new privacy solutions, and math that attempts to reverse engineer privacy back to transparency. When did gold coins get ridges on them? When coin shavers started skimming down the size of gold coins by shaving off minute amounts of the precious metal.
The Ghost of Good Architecture
This arms race leaves us with an interesting paradox to end part I on.
There is a black hole that exists underneath the distributed ledger world. Namely, there are ways of converting any transparent token into a privacy token, which nearly guarantees a clean getaway (at least for a while, until every physical kiosk demands Orwellian transparency).
Yet with known oracles and identities, colored tokens, etc. the same technology can prove exactly what someone did, and when they did it in an immutable record.
Thus the takeaway for part I of the book is: BE REALLY CAREFUL HOW YOU DESIGN DISTRIBUTED LEDGERS
Anything that makes it onto a public ledger is openly available to be analyzed using metadata techniques not just today, but at any time in the future by not just today’s humans, but potentially tomorrow’s AIs.
We have gone on quite the journey so far from cuneiform ledgers, to consensus mechanisms, systems architecture, scaling solutions, and identity to finally culminate with privacy.
In Part II, we will begin combining concepts from Part I to paint a more nuanced picture of how we see the space developing.
We will delve deeply into the “trust continuum”, or how public and private ledgers will interact with each other to preserve the privacy of critical data, while also ensuring a hashed fingerprint of private ledger data exists on a public ledger to prove to the world the data is really immutable.
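The "hashed fingerprint" idea can be sketched in a few lines. This is a minimal illustration using SHA-256 over a JSON encoding; the record fields and function names are invented, and the public ledger is stood in for by a plain list.

```python
import hashlib
import json

def fingerprint(record):
    """Deterministic SHA-256 digest of a record (keys sorted for stability)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def anchor(public_ledger, record):
    """Publish only the digest; the record itself stays private."""
    digest = fingerprint(record)
    public_ledger.append(digest)
    return digest

def is_unaltered(public_ledger, record):
    """Anyone holding the record can check it against the public digest."""
    return fingerprint(record) in public_ledger
```

One caveat worth stating: a bare hash of a small, guessable record can be brute-forced, so a practical design would likely mix in a secret random salt before hashing.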
Maybe there is still a glimmer in you that sees buggy whip centralized databases continuing to run the world indefinitely.
Unfortunately, those HIPAA compliant health records of yours are only one administrative breach away from massive punitive violations. Or maybe you are Chase losing 76 million records to Israeli hackers, or Equifax, or the records of every single classified employee in the United States government.
It's not that distributed ledgers are some panacea, it's that centralized databases are fallible by design. If you can break into something once, and steal (or subtly manipulate) everything from a single point of failure, how can you ever expect to keep data secure, let alone private?
With an image of the above table burned into your brain, a good place to end part I is a final reminder that No Keys = No Crypto: or that losing control of private keys will lose control of whatever information those keys unlock.
While some see this in an anarchist light to enable hiding digital gold under digital mattresses, we see the real ramifications for this technology in corporate and governmental data security.
Remember a centralized database has four things administrators can do:
Read, Write, Modify, and Delete
while distributed ledger databases only contain:
Read & Write
While an amazing breakthrough in itself for humanity, this unfortunately only solves the immutability half of the puzzle. (E.g. a subsequent record can amend a guilty verdict to innocent, but the original artifact of the guilty verdict will always remain unless the entire chain is deprecated and replaced with a new one.)
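The Read & Write model, and the verdict example, can be sketched as an append-only store. This is only an interface illustration: Python cannot truly prevent tampering, and a real ledger enforces immutability through hashing and consensus rather than by omitting methods. The names are hypothetical.

```python
class AppendOnlyLedger:
    """Supports read and write only; there is no modify or delete."""

    def __init__(self):
        self._entries = []

    def write(self, entry):
        self._entries.append(entry)
        return len(self._entries) - 1      # index of the new entry

    def read(self, index=None):
        if index is None:
            return tuple(self._entries)    # tuple snapshot of every entry
        return self._entries[index]

def current_verdict(ledger, case_id):
    """The latest entry for a case wins, but earlier ones stay readable."""
    verdict = None
    for entry in ledger.read():
        if entry["case"] == case_id:
            verdict = entry["verdict"]
    return verdict
```

Writing "guilty" and then "innocent" for the same case leaves both entries on record: the current state changes, the history does not.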
What distributed ledgers do not inherently solve, is the ever present issue of access.
Nothing stops an entity from placing a single transaction into a distributed ledger with a single key that can unlock the records of millions of users (or unlock billions of dollars worth of Bitcoin in a single transfer).
So what to do?
Do not store a billion dollars worth of Bitcoin using a single private key.
Instead, use multi-signature addresses to require multiple signatures before moving funds.
Moreover, break the balance into many smaller addresses using a securely air gapped machine to generate many addresses that prevent all eggs from being in one basket.
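The threshold idea behind multi-signature addresses can be sketched as below. An important assumption: real multi-sig uses public-key signatures that anyone can verify, whereas this sketch uses HMAC tags as a stand-in purely to stay within the standard library, which means the verifier here must hold the secret keys. Signer names and keys are invented.

```python
import hashlib
import hmac

def sign(secret_key, message):
    """Stand-in 'signature': an HMAC-SHA256 tag over the message."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def approved(message, signatures, signer_keys, threshold):
    """True if at least `threshold` distinct known signers signed `message`.

    signatures:  dict of signer name -> tag
    signer_keys: dict of signer name -> secret key (bytes)
    """
    valid = 0
    for name, tag in signatures.items():
        key = signer_keys.get(name)
        if key is not None and hmac.compare_digest(tag, sign(key, message)):
            valid += 1
    return valid >= threshold
```

With a threshold of 2 out of 3 signers, stealing any single key is not enough to move funds, and losing any single key does not lock them forever.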
Imagine for a moment the hotel held ransom for digital extortion in our heist example was running on a distributed ledger backbone.
Each hotel guest purchases their stay using cryptocurrency (or tokenized fiat currency).
This unlocks a smart contract that physically unlocks their hotel room door until a timestamp expires.
The hotel maintains 2 of 3 multi-signature access to each room which refreshes after each stay.
Such a system is decentralized and much harder to hack.
Instead of breaking into the cookie jar once (and holding the entire hotel ransom with unfettered access to the key card system) the hackers at best would get to a single room by compromising an individual guest, rather than all guests.
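A toy stand-in for the per-room logic described above might look like the following. It is not a real smart contract and skips the multi-signature refresh entirely; the class, key strings, and integer timestamps are hypothetical. It only illustrates why compromising one guest's key opens one door rather than all of them.

```python
class RoomAccess:
    """Toy stand-in for a per-room, time-limited access contract."""

    def __init__(self):
        self._grants = {}   # room number -> (guest key, expiry timestamp)

    def book(self, room, guest_key, paid, expires_at):
        if not paid:
            raise PermissionError("payment required before access is granted")
        self._grants[room] = (guest_key, expires_at)

    def unlocks(self, room, presented_key, now):
        """A door opens only for its own guest's key, and only before expiry."""
        grant = self._grants.get(room)
        if grant is None:
            return False
        guest_key, expires_at = grant
        return presented_key == guest_key and now < expires_at
```

There is no master key in this model, so there is no single secret whose theft unlocks the whole building.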
While crypto hotel security may be far into the future, the same concept can be used today to protect our most critical data from massive breaches that hack once and steal all.
From classified government employee data, to consumers shopping at Target, this decentralized data structure is the only long term chance at keeping our important 1s and 0s where they are supposed to be. As we will find out in part II, a decentralized data security framework might also be our best chance at checking the immense power centralized artificial intelligence systems increasingly have over our lives.
So hurry up and let’s build something.
This is a new world.
Data has value.
Data is permanent.
Data is not controlled by corruptible middlemen.