Dutchman Embarking onto a Yacht | Ludolf Backhuysen (1670)
2.5: IOT -> AI -> DLT
Data permeates our discussion of the future so much that it has become almost passé. Ask any business leader, technologist, academic, or bureaucrat and they will universally herald the arrival of our “Big Data” future. Surely with enough data we can solve all of our societal ills, or at least make a lot of money at the expense of unwitting data providers.
Throughout human history “data” (aka information) has been the REAL commodity, not gold, not wheat, not standing armies. What good did having the superior aircraft carriers and pilots do for the Japanese at the Battle of Midway when the American code breakers knew they were coming? Information precedes physical action. Whoever has information asymmetry wields asymmetric power over others.
To be useful in the world, data needs to be generated from an accurate source (IOT) -> processed efficiently (AI) -> then stored securely (DLT). This chapter presents a way to think about all three data-centric mega trends under one roof.
Starting with IOT (Internet of Things) we have a large umbrella of data gathering technologies. The IOT revolution is enabled by increasingly inexpensive internet connected computers and sensors borrowed from the smartphone revolution.
Having vast amounts of IOT generated data means nothing if it cannot be effectively analyzed. We use the term “AI” but in reality Artificial Intelligence and Machine Learning are both variations on the same theme of using statistics to make inferences. AI simply looks for statistical patterns in data by searching fitness landscapes in ways humans would never have the resources or patience to try. This “fitness landscape searching at scale” skill is immensely powerful as it uses iterative computing power to synthesize vast amounts of information until some meaning is found.
This leads to where we should store this vast amount of IOT-derived and AI-analyzed data… distributed ledgers! Not only do we need a trusted commons to keep our data out of the hands of centralized monopolies, but we also need the structure DLT inherently provides. The only way hash-linked distributed databases can work is to use a shared set of standards. For AI to be truly effective, having a clear digital paper trail with timestamp, sending account/receiving account, and a litany of additional metadata in an incorruptible format is crucial.
The IOT Problem + DLT Solution
At the beginning of our data journey from raw information to accurately processed and stored result, many issues can affect the validity and integrity of raw machine-generated data. If we look at our current paradigm, IOT data is often generated into unencrypted log files stored locally on the device, then sent to a single centralized server or cloud instance. At minimum, IOT data should be hashed, with a compressed hash root stored on a distributed ledger. Without taking this step, the integrity of the data gathered is only as good as the third party trusted to keep a record of what was generated.
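As a minimal sketch of that step (the sensor readings and field names below are hypothetical), each log entry can be hashed into a Merkle tree whose single root is the only value that needs to live on the ledger:

```python
import hashlib
import json

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of leaf hashes into one 32-byte root."""
    if not leaves:
        return sha256(b"")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:      # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical readings logged on a smart water meter
readings = [
    {"serial": "WM-1042", "ts": 1512264000, "liters": 3.2},
    {"serial": "WM-1042", "ts": 1512264600, "liters": 140.7},
]
leaves = [sha256(json.dumps(r, sort_keys=True).encode()) for r in readings]
root = merkle_root(leaves)  # this one hash is what gets anchored on-chain
```

Anyone holding the raw log can later recompute the root and compare it to the on-ledger copy; changing any single reading changes the root, so tampering after the fact becomes detectable.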
A salacious news story broke in late 2017 documenting a sharp increase in water usage from a smart water sensor between the hours of 1 and 3 in the morning. Such an innocuous data point does not seem like it would make the news, until it was revealed that someone had died that very same night. Suspicious water meter data alone did not establish a high enough legal bar to convict for murder, as the integrity of the water meter data was called into question. If life and death hinge on the accuracy of IOT devices, and the integrity of the data after it is generated, why do we continue to rely on outmoded ways of handling such crucial information?
If IOT data is hashed and stored on a distributed network, the data trail created from inception until the device stops working can never be tampered with. As long as nodes continue to host the data, such a trail can potentially last hundreds of years bouncing from hard drive to hard drive as a programmatically exact memory of what was.
Just because the data sent to the ledger is immutable does not necessarily make it accurate. Case in point: anyone is free to put their fitness tracker on their pet instead of wearing it themselves. Amusing, yes, but what if an insurance company requires 8,000 steps per day to maintain your health insurance premium?
Encrypted, user-centric IOT data handling is not only necessary to prevent corporate and governmental excesses from eroding personal liberties, but it is also the only way to guarantee the validity of supply chains and a litany of other business functions that rely on honest data generation -> processing -> and storage.
Imagine inexpensive IOT sensors embedded into every net full of Sierra Leonean fish caught. Data like temperature, humidity, g-force, and of course geographic location can be tracked from ocean to supermarket. As each IOT device has a unique serial number, by hash-chaining events together, it becomes nearly impossible to fake a supply chain. Even if device serial numbers are spoofed, for instance, checking a shared ledger for matching past events can automatically flag and ignore the malicious actor attempting to co-opt the ledger.
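A toy sketch of that check (serial numbers and fields invented for illustration): keep one hash chain per device serial on the shared ledger, and reject any event whose claimed previous hash does not match the ledger's recorded chain tip:

```python
import hashlib
import json

def event_hash(event: dict, prev_hash: str) -> str:
    payload = json.dumps(event, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

class SupplyChainLedger:
    """Toy shared ledger: one hash chain per device serial number."""
    GENESIS = "0" * 64

    def __init__(self):
        self.tip = {}  # serial -> hash of the latest accepted event

    def append(self, event: dict) -> bool:
        serial = event["serial"]
        prev = self.tip.get(serial, self.GENESIS)
        # A spoofed device does not know the real chain tip, so its
        # claimed prev_hash will not match the ledger's record.
        if event["prev_hash"] != prev:
            return False  # flag and ignore the inconsistent actor
        self.tip[serial] = event_hash(event, prev)
        return True

ledger = SupplyChainLedger()
catch = {"serial": "NET-77", "prev_hash": SupplyChainLedger.GENESIS,
         "temp_c": 4.1, "lat": 8.48, "lon": -13.23}
assert ledger.append(catch)        # genuine first event is accepted

fake = {"serial": "NET-77", "prev_hash": SupplyChainLedger.GENESIS,
        "temp_c": 4.0, "lat": 51.5, "lon": -0.1}
assert not ledger.append(fake)     # spoofed serial fails the tip check
```

The design choice is that honesty is enforced by linkage, not identity: a counterfeiter would have to reproduce the entire event history of a serial number, not just copy the number itself.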
The AI Problem + DLT Solution
So now we have terabytes of fish data… great. Not so long ago, in the pen and paper days, statistical analysis could only be done on manually generated data points, such as counting how many salmon pass by a viewing window next to a hydropower plant. Today, data is infinitely more portable, computable, granular, and ultimately useful when expressed as 1s and 0s.
While computers have leveraged statistical techniques for decades to gain deeper insights than ever possible before, recent developments have given computer algorithms increasingly more autonomy to find novel patterns in data. Rather than limit themselves to a human-derived search space, artificially intelligent algorithms can search for novel solutions for beating the best humans at Jeopardy, Chess, and, until recently thought impossible… Go. By searching fitness landscapes with clearly defined boundaries, AI programs have made massive strides towards providing real value to the world.
As amazing as these algorithms are, they cannot generate something from nothing.
In fact, AI relies on massive reams of well defined data to do anything at all. Even modern programs like AlphaZero that can start from zero knowledge and quickly master classic video games or chess boards are not learning from zero data. To become competent at anything, AI needs trial and error to build a sufficiently large “training data set”.
AlphaZero is smart enough to start from scratch and build its own massive dataset by playing itself millions of times, but it could just as easily load a pre-existing dataset of all human games played over the last century. Alas, chess is not the real world. It is immensely helpful for training your brain in higher order thinking, but not at actually navigating the intricacies of the real world.
To rehash an argument made consistently throughout the book: if the source data used to train closed-source corporate AIs is scraped/stolen/borrowed from the open internet, the link between the original data creator and the data processor becomes lost. This current “data-as-commodity” trend means the individual value of each post, tweet, email, or Fitbit sleep schedule is inconsequential. Only in aggregate does data have value, when there is a large enough sample size to make the set statistically relevant.
The world does not have to work this way, however, if we use DLT to create data sovereignty at the individual level. Does this mean each individual needs to manually select settings for every piece of data they generate? Or that querying large datasets becomes so expensive that real quality of life improvements, such as developing self driving cars, will suddenly grind to a halt?
What better than pattern-recognizing AI programs to optimize these types of contractual relationships behind the scenes in an automated fashion? Each individual in the future could shop for an automated data broker bot to navigate this tricky landscape and maximize the amount of value they receive for their data.
Even if an athlete is not a top star destined for professional greatness, the value of their biometric data, from skeletal position to heart rate and sleep patterns, can help other athletes become better performers.
How we get to this future and how we accurately price this data is unclear.
What is clear: data = value. If we continually give away our information for free in exchange for cheap dopamine hits, there will be no middle class left to buy the things the owners of the AI systems create. Instead, we can use DLT to re-imagine our economic systems into a two-way street where value is exchanged between the data provider and the data aggregator.
At first blush this seems impossible. Why would incumbents with such massive leverage over the less technologically savvy proletariat give up their siren server throne?
The reason is surprisingly straightforward. DLT data is simply higher quality than the unstructured/semi structured datasets of today. If an AI knows not just the sender and receiver of the transaction in question, but also metadata about all previous linked transactions in the chain, the possibilities for finding new statistically significant meaning are immense. Instead of random data fragments existing without context, in the DLT world all data must follow rigid standards or it will not be accepted by the rules governing the shared infrastructure.
In addition to being higher quality, AI has the potential to search a much larger shared DLT database than is possible today searching behind individual walled gardens. Creating a neutral Switzerland repository where data can be selectively shared across individuals, enterprises, and governments is no easy task. Facebook has no incentive to give Google access to their closely guarded user information, or vice versa. In fact, the entire FAANG business model is predicated on maintaining monopoly power over scraped/stolen/borrowed data.
While the ideal end state is to empower individuals by paying them for their data while balancing AI access to the most intimate details of their lives, perverse incentive structures will prevent this reality from materializing until:
DLT becomes practically useful at scale using the techniques such as sharding introduced in Part I of the book
Barriers to entry are significantly reduced through good UI design and automation of lower level logic
Incumbent network effects are replaced by better incentive structures that end users gravitate towards over using existing FAANG like products. (The least technical yet most difficult part)
The DLT Problem + DLT Solution
The discussion of the DLT problem pushes us back into the technical challenges posed in Part I, namely how to effectively scale and query data sets large enough to support important decisions. For Part II of the book, we will assume someone has solved these issues through sharding, compression, independent hashchains, etc.
In this fantasy future world, researchers would be able to unleash an AI on real time population health information. Not a small outdated sample set, but real IOT source data coming directly off of fitness trackers, scales, pharmacy windows, and blood testing machines.
As AI never gets bored endlessly searching possible fitness landscapes, maybe it can find a high predictive correlation between a certain pharmaceutical and positive or negative DNA expression.
With our current database paradigm, such a task is near impossible, as researchers cannot effectively query each siloed database. They either do not have access, or cannot search effectively across each data silo, as each one is structured in a slightly different way. Sure, tools like Splunk exist that drastically increase the speed and efficiency at which big data can be processed and analyzed, but this belies the larger point. As data is ultimately generated by humans, which future do we want to live in?
The one where we give our FAANG overlords complete dominion over our every waking moment?
Or the one with data sovereignty, where we make damn sure our data is at minimum encrypted, and then, starting from default privacy, we (or our AI-powered data brokers) negotiate to ensure we are fairly compensated for our contributions to the means of production?
Taking the high road, of course, is where the rubber meets the road of crony capitalism. When a business model is entirely predicated on monetizing a free resource, how do you expect incumbent institutions to willingly change?
We know the regulatory cudgel is wantonly ineffective, despite the best efforts of laws like GDPR. Go ahead and slap FAANG with a billion-dollar fine; profits will only be impacted over the short term. Investors know a one-time fee will not impact business as usual.
Our argument is to instead appeal to the natural animal spirits that arise whenever there is a scarce resource, instead of waiting for regulators or big tech to solve all of our problems. Bitcoin has already shown a native digital asset can be valuable. As we move from our primitive “make big boom.. fire pretty” stage where we value digital assets like Bitcoin based on the amount of electricity they waste, we will eventually have efficient digital assets tied directly to underlying value sources such as commodities, equities, bonds, and intellectual property.
Making Data Cool and Profitable
If the cryptosphere has done nothing else, it has made data exciting again. The thought of storing something important on an immutable ledger feels like a semi-religious cause worth pursuing to a small subset of the population. Even if most people don’t care, some people do (maybe even enough to read a whole book about it).
As for all of those in the crypto space who do not care, great! Every huckster, scammer, and neophyte leaving keys around brings attention to the space. We are all pawns in a global game theoretic casino.
In the next chapter, we will define what “digital ownership” can mean to regular people. Bitcoin proved that a rare digital substance (only 21 million will ever be created) holds value, at least enough to create a globally liquid market with an ever growing number of daily active wallet addresses. On the previous internet, data was infinitely copyable. On the new internet, data has value because it has scarcity and provenance.