Dutchman Embarking onto a Yacht | Ludolf Backhuysen (1670)

2.5: IOT -> AI -> DLT

Data permeates our discussion of the future so much it has become almost passé. Ask any business leader, technologist, academic, or bureaucrat and they will universally herald the arrival of our “Big Data” future. Surely with enough data we can solve all of our societal ills, or at least make A LOT of money at the expense of unwitting data providers.

Throughout human history “data” (aka information) has been the REAL commodity, not gold, not wheat, not standing armies. What good did having the superior aircraft carriers and pilots do for the Japanese at the battle of Midway when the American code breakers knew they were coming? Information precedes physical action. Whoever has information asymmetry wields asymmetric power over others.

To be useful in the world, data needs to be generated from an accurate source (IOT) -> processed efficiently (AI) -> then stored securely (DLT). This chapter presents a way to think about all three data-centric megatrends under one roof.

  • Starting with IOT (Internet of Things) we have a large umbrella of data gathering technologies. The IOT revolution is enabled by increasingly inexpensive internet connected computers and sensors borrowed from the smartphone revolution.

  • Having vast amounts of IOT generated data means nothing if it cannot be effectively analyzed. We use the term “AI” but in reality Artificial Intelligence and Machine Learning are both variations on the same theme of using statistics to make inferences. AI simply looks for statistical patterns in data by searching fitness landscapes in ways humans would never have the resources or patience to try. This “fitness landscape searching at scale” skill is immensely powerful as it uses iterative computing power to synthesize vast amounts of information until some meaning (aka Correlation, aka Fitness function, aka Goal) is found.

  • This leads to where we should store this vast amount of IOT derived and AI analyzed data… distributed ledgers! Not only do we need a trusted commons to keep our data out of the hands of centralized monopolies, but we also need the structure DLT inherently provides. The only way hash-linked distributed databases can work is to use a shared set of standards. For AI to be truly effective, having a clear digital paper trail with timestamp, sending account/receiving account, etc. information baked into the data itself is crucial.

The IOT Problem + DLT Solution

At the beginning of our data journey from raw information to accurately processed and stored result, many issues can affect the validity and integrity of raw machine-generated data. If we look at our current paradigm, IOT data is often generated into unencrypted log files stored locally on the device, then sent to a single centralized SQL server or cloud instance. At minimum, IOT data should be hashed, with a compressed hash root stored on a distributed ledger. Without taking this step, the integrity of the data gathered is only as good as the third party trusted to keep a record of what was generated.
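One way to picture the "compressed hash root" idea is a Merkle tree: every log line is hashed, the hashes are paired and hashed again, and only the single root is written to the ledger. The sketch below is a minimal illustration, not a production design. The function name `merkle_root`, the sample log lines, and the choice to duplicate the last node on odd levels are all assumptions made for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(readings: list[bytes]) -> str:
    """Fold a batch of raw readings into one hex-encoded root hash."""
    if not readings:
        raise ValueError("need at least one reading")
    level = [sha256(r) for r in readings]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Hypothetical log lines from a smart water meter (made-up values).
log = [
    b"01:00,liters=3",
    b"02:00,liters=140",
    b"03:00,liters=2",
]
root = merkle_root(log)  # this single value is what would go on the ledger

# Changing any one line after the fact produces a different root.
tampered = list(log)
tampered[1] = b"02:00,liters=4"
print(merkle_root(tampered) == root)
```

The raw log can stay on the device or in the cloud. Anyone holding the ledger entry can recompute the root and see whether the log they were handed still matches it.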

A salacious news story broke in late 2017 documenting a sharp increase in water usage from a smart water sensor between the hours of 1 and 3 in the morning. This data point alone was not enough to convict for murder as the integrity of the water meter data was called into question. If life and death hinges on the accuracy of IOT devices, and the integrity of the data after it is generated, why do we continue to rely on outmoded ways of handling such crucial information?

If IOT data is hashed and the hashes are stored on a distributed network, the data trail cannot be altered after the fact without the change being detected. This however does not stop anyone from putting their Fitbit on their dog to increase their step count. Amusing yes, but what if an insurance company requires 8,000 steps per day to maintain your health insurance premium?

Encrypted, user-centric IOT data is not only necessary to prevent corporate and governmental excesses from eroding personal liberties, BUT it is also the only way to guarantee the validity of supply chains and a litany of other business functions that rely on honest data generation -> processing -> and storage.

Imagine inexpensive IOT sensors embedded into every net full of Sierra Leonean fish caught. Data like temperature, humidity, g-force, and of course geo XY location can be tracked from ocean to supermarket. As each IOT device has a unique serial number, by hash chaining together events, it becomes nearly impossible to fake a supply chain. Even if device serial numbers are spoofed, for instance, checking a shared ledger for matching past events can automatically flag and ignore the malicious actor attempting to co-opt the ledger.
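To make "hash chaining together events" concrete, here is a rough sketch in which every event carries the hash of the event before it, and the shared ledger only needs to remember the hash of the latest one. Everything named here (`append_event`, `verify`, the `SN-001` serial, the sensor readings) is hypothetical and chosen for illustration. A real system would also sign events and handle many devices.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: placeholder "previous hash" for the first event


def event_hash(event: dict) -> str:
    """Hash an event deterministically by serializing it with sorted keys."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(chain: list[dict], serial: str, reading: dict) -> dict:
    """Add an event that points back at the hash of the previous event."""
    prev = event_hash(chain[-1]) if chain else GENESIS
    event = {"serial": serial, "reading": reading, "prev_hash": prev}
    chain.append(event)
    return event


def verify(chain: list[dict], ledger_head: str) -> bool:
    """Check every back-link, then compare the final hash to the ledger's record."""
    expected = GENESIS
    for event in chain:
        if event["prev_hash"] != expected:
            return False
        expected = event_hash(event)
    return expected == ledger_head


# Hypothetical journey of one net of fish.
chain: list[dict] = []
append_event(chain, "SN-001", {"temp_c": 2.1, "lat": 8.48, "lon": -13.23})
append_event(chain, "SN-001", {"temp_c": 2.4, "lat": 8.50, "lon": -13.30})
head = event_hash(chain[-1])  # the value a shared ledger would store

# Rewriting history: change where the first event says the fish was caught.
forged = [dict(e) for e in chain]
forged[0]["reading"] = {"temp_c": 2.1, "lat": 0.0, "lon": 0.0}

# Spoofing the serial number without knowing the chain's past events.
spoof = {"serial": "SN-001", "reading": {"temp_c": 2.0}, "prev_hash": GENESIS}

print(verify(chain, head), verify(forged, head), verify(chain + [spoof], event_hash(spoof)))
```

The forged history fails because the second event no longer points at the hash of the first. The spoofed event fails because a copied serial number is not enough: it also has to link to the real previous event.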

The AI Problem + DLT Solution

So now we have terabytes of fish data… great. Not so long ago, in the pen-and-paper days, at best a few data points could be accurately captured for basic analysis. Today, data is infinitely more portable, computable, granular, and ultimately useful when expressed as 1s and 0s.

While computers have leveraged statistical techniques for decades to gain deeper insights than was ever possible before, recent developments have given computer algorithms increasingly more autonomy to find novel patterns in data. Rather than limit themselves to a human derived search space, artificially intelligent algorithms can search for novel solutions to beat the best humans at Jeopardy, chess, and (until recently thought impossible) Go.

As amazing as these algorithms are, they cannot generate something from nothing.

In fact, AI relies on massive reams of well-defined data to do anything at all. Even modern programs like AlphaZero that can start from zero knowledge and quickly master chess, shogi, and Go are not learning from zero data. To become competent at anything, AI needs trial and error to build a sufficiently large “training data set”.

AlphaZero is smart enough to start from scratch and build its own massive dataset by playing itself millions of times, but it could just as easily load a pre-existing dataset of all human games played over the last century. Alas, chess is not the real world. It is immensely helpful for training your brain in higher-order thinking, but not for actually navigating the intricacies of the real world.

To rehash an argument made consistently throughout the book… If the source data used to train closed source corporate AIs is scraped from the open internet, the link between the original data creator and the data processor becomes lost. The current “data-as-commodity” trend means the individual value of a post, tweet, email, or Fitbit sleep schedule is inconsequential. Only in aggregate does data have value when there is a large enough sample size to make the set statistically relevant.

The world does not have to work this way, however, if we use DLT to create data sovereignty at the individual level. Does this mean each individual needs to manually select settings for every piece of data generated, deciding what is and isn’t shared with an AI? Does this mean that querying large datasets becomes so expensive that the massive AI driven improvements of the last decades will suddenly grind to a halt?

Hopefully not. What better than pattern recognizing AI programs to optimize these types of contractual relationships behind the scenes in an automated fashion? How we get to this future is unclear. What is clear… data = value. If we continually give away our information for free in exchange for cheap dopamine hits, there will be no middle class left to buy things the owners of the AI systems create. Instead, we can use DLT to re-imagine our dopamine games into a two-way street where value is exchanged between the data provider and data aggregator.

At first blush this seems impossible. Why would incumbents with such massive leverage over the less technologically savvy proletariat give up their siren server throne?

The reason is surprisingly straightforward. DLT data is simply higher quality than the unstructured/semi structured datasets of today. If an AI knows not just the sender and receiver of the transaction in question, but also metadata about all previous linked transactions in the chain, the possibilities for finding new statistically significant meaning are immense. Instead of random data fragments existing without context, in the DLT world all data must follow rigid standards or it will not be accepted by the rules governing the shared infrastructure.
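As a toy illustration of "follow rigid standards or it will not be accepted", a shared ledger could run every incoming record through a check like the one below before storing it. The field list and the `validate_record` helper are assumptions invented for this sketch, since no particular DLT schema is defined in this chapter.

```python
# Assumption: a made-up minimal schema every record must match exactly.
REQUIRED_FIELDS = {
    "timestamp": int,   # e.g. seconds since the Unix epoch
    "sender": str,      # sending account
    "receiver": str,    # receiving account
    "prev_hash": str,   # link to the previous transaction in the chain
    "payload": dict,    # the actual data being recorded
}


def validate_record(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record is accepted."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for {name}")
    extra = set(record) - set(REQUIRED_FIELDS)
    if extra:
        errors.append(f"unexpected fields: {sorted(extra)}")
    return errors


# Hypothetical records: one well-formed, one missing its sender.
good = {
    "timestamp": 1700000000,
    "sender": "alice",
    "receiver": "bob",
    "prev_hash": "0" * 64,
    "payload": {"steps": 8000},
}
bad = {k: v for k, v in good.items() if k != "sender"}

print(validate_record(good))
print(validate_record(bad))
```

The point for AI is that every accepted record arrives with the same fields in the same shape, so there is no cleanup step guessing which column means "sender" in each silo.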

In addition to being higher quality, a shared DLT database gives AI the potential to search a much larger dataset than is possible today from behind individual walled gardens. Creating a neutral Switzerland repository where data can be selectively shared across individuals, enterprises, and governments is no easy task. Facebook has no incentive to give Google access to their closely guarded user information, or vice versa. In fact, the entire FAANG business model is predicated on maintaining monopoly power over stolen data. While the ideal end state is empowering individuals to be paid for data while giving AI access to a more useful universal dataset, perverse incentive structures will prevent this reality from materializing until DLT becomes practically useful at scale and with minimal UI based barriers to entry.

The DLT Problem + DLT Solution

The discussion of the DLT problem pushes us back into the technical challenges posed in Part I: namely, how to effectively scale and query data sets at the size needed to make important decisions. For Part II of the book, we will assume someone has solved these issues through sharding, compression, independent hashchains, etc.

In a fantasy world, researchers would be able to unleash an AI on real time population health information. Not a small outdated sample set, but real IOT source data coming directly off of fitness trackers, scales, pharmacy windows, and blood testing machines.

As AI never gets bored endlessly searching the possible fitness landscape, maybe it finds a high predictive correlation between a certain pharmaceutical and positive or negative DNA expression.

With the current database paradigm, such a task is nearly impossible, as researchers cannot query siloed databases they either do not have access to or cannot search effectively because each silo is structured in a slightly different way. Sure, tools like Splunk exist that drastically increase the speed and efficiency at which big data can be processed and analyzed, but this misses the larger point. As data needs to come from somewhere, which future do we want to live in?

  • The one where we give our FAANG gods complete dominion over our every waking moment.

  • Or the one with data sovereignty, where we make damn sure our data is at minimum encrypted, then starting from default privacy, we (or our AI powered data brokers) negotiate to ensure we are fairly compensated for our contributions to the means of production.

Taking the high road of course meets the rubber of crony capitalism. When a business model is entirely predicated on monetizing a free resource, how do you expect incumbent institutions to willingly change?

We know the regulatory cudgel is woefully ineffective, despite the best efforts of laws like GDPR. Go ahead and slap FAANG with a billion dollar fine; profits will only be impacted over the short term. Investors know a one-time fine will not impact business as usual.

Our argument is to instead appeal to the natural animal spirits that arise whenever there is a scarce resource, instead of waiting for regulators or big tech to solve all of our problems. Bitcoin has already shown a native digital asset can be valuable. As we move from our primitive “make big boom… fire pretty” stage where we value digital assets like Bitcoin based on the amount of electricity they can waste, we will eventually have efficient digital assets tied directly to underlying value sources such as commodities, equities, bonds, and intellectual property.

Making Data Cool and Profitable

If the cryptosphere has done nothing else, it has made data exciting again. The thought of storing something important on an immutable ledger feels like a semi-religious cause worth pursuing to a small subset of the population. Even if most people don’t care, some people do (maybe even enough to read a whole book about it).

For all of those in the crypto space that do not care, great! Every huckster, scammer, and neophyte leaving keys around brings attention to the space. We are all pawns in a global game theoretic casino.

In the next chapter we will define what “digital ownership” can mean to regular non-techy people. Bitcoin proved a rare digital substance (only 21 million will ever be created) holds value, at least enough to create a globally liquid market with an ever-growing number of daily active wallet addresses. On the previous internet, data was infinitely copyable. On the new internet, data has value because it has scarcity and provenance.

Too few people recognize that the high technology so celebrated today is essentially a mathematical technology
— Edsger W. Dijkstra (EWD1305)