Blockchain and Data Storage: A Perfect Match?

Dr Justin Chan

September 4, 2018·7 min read

Data. It’s fast becoming one of the world’s most highly prized resources. Indeed, its recent proliferation is virtually our raison d'être here at Data Driven Investor. Of course, we DDI folk are not the only ones who believe in the power of data. Even a cursory glance at the world’s top 10 companies by market capitalization shows an intimidating dominance by data-hungry tech firms such as Amazon, Facebook and Microsoft. All of these companies will be unleashing vast quantities of data in the coming years, as will many others.

And so, the greater the onslaught of data that the world anticipates, the stronger our capabilities to store it must become. This applies not only to Big Data but also to our own personal data. After all, the more utility that data offers in the future, the more unwanted attention it’s going to receive from hackers. In fact, we’re already seeing this play out: the disastrous breaches of Equifax and Anthem in recent years show that storing valuable, often highly sensitive information on centralized servers can leave it extremely vulnerable to theft, and that a breach can be painfully expensive for the storage provider.

Decentralized Architecture for File Storage can help…

Video: https://www.youtube.com/watch?v=vl3bUzfn2lg

Decentralized file storage is by no means a new concept. If you were a rabid downloader of MP3 music files back in the late 1990s and early 2000s, then it’s likely you’re familiar with the likes of BitTorrent and LimeWire. Rather than downloading files from a centralized server, these peer-to-peer (P2P) programs enable a file to first be hosted (or ‘seeded’) by a single computer, before being divided up into smaller chunks and distributed throughout a network of computers that are also trying to download the file. Each network node can then download individual parts of the file from its peers, whilst also uploading other parts of the file to other peers. Decentralization should also boost download speeds, as files are downloaded from multiple nodes instead of a single, centralized server. But some problems have arisen with this model:

Should one computer finish downloading the file before the others in the network, there is little stopping its owner from switching off the machine upon completion, which means that in some instances, others in the network might never receive a complete copy of the file.
There is even less incentive to seed less popular files, which makes downloading them less reliable and often slow. And because seeding takes up bandwidth, there is little reason to continue hosting such files.
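
The first problem can be pictured with a small sketch. This is only an illustration under simplifying assumptions: the peer names and the `missing_pieces` helper are hypothetical, and real P2P clients track piece availability in far more detail.

```python
def missing_pieces(total_pieces: int, peers: dict[str, set[int]]) -> set[int]:
    """Return the piece indices that no remaining peer holds."""
    available = set().union(*peers.values())
    return set(range(total_pieces)) - available


# A file split into 4 pieces. The seeder holds everything; the two
# downloaders each hold only part of the file so far.
swarm = {
    "seeder": {0, 1, 2, 3},
    "peer_a": {0, 1},
    "peer_b": {1, 2},
}

print(missing_pieces(4, swarm))   # set(): every piece is still reachable

# The seeder switches off before piece 3 has spread to anyone else.
del swarm["seeder"]
print(missing_pieces(4, swarm))   # {3}: nobody left can finish the download
```

Once the only holder of a piece leaves, no amount of swapping among the remaining peers can complete the file.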

So, nodes should be incentivized to continue hosting data in all conditions…

A Blockchain Remedy

Blockchain projects are now incentivizing people to continue hosting data by offering them a token with monetary value. This model involves ‘farmers’: entities in a P2P network that offer storage space, host data, and receive token rewards for doing so. Some projects require farmers to pledge collateral through a smart contract as a guarantee that they will maintain sufficient uptime to host data. Ultimately, though, it is the chance to receive potentially lucrative returns in the form of crypto that discourages farmers from turning their computers off.

As is often the case with blockchain-based applications, however, scalability is a concern. At this stage, two specific technologies are being used to address this challenge:

Sharding – a technique to logically divide up data within a database. The data is broken up into ‘shards’ that form the original database when pieced back together.
Swarming – a process that involves collectively storing shards in a large group of nodes (a ‘swarm’) within the P2P network. Devices within a swarm can then conveniently retrieve data from the nearest nodes, which ultimately reduces latency, and boosts reliability and scalability.
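
To make the two terms concrete, here is a minimal sketch in Python. It is not how any particular project does it: the function names, the fixed shard size and the round-robin placement policy are all assumptions made for illustration.

```python
def shard(data: bytes, shard_size: int) -> list[bytes]:
    """Split data into consecutive shards of at most shard_size bytes."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def assign_to_swarm(num_shards: int, nodes: list[str], copies: int) -> dict[int, list[str]]:
    """Place each shard on `copies` nodes, round-robin (a hypothetical policy).

    Assumes copies <= len(nodes), so each shard lands on distinct nodes.
    """
    return {
        idx: [nodes[(idx + k) % len(nodes)] for k in range(copies)]
        for idx in range(num_shards)
    }


data = b"The quick brown fox jumps over the lazy dog"
shards = shard(data, shard_size=8)
placement = assign_to_swarm(len(shards), ["node-a", "node-b", "node-c"], copies=2)

assert b"".join(shards) == data      # the shards piece back together into the original
print(len(shards), placement[0])     # 6 ['node-a', 'node-b']
```

Keeping more than one copy of each shard is what lets a swarm serve data from whichever holder is closest or still online.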

Similar to the torrenting model of decentralized file storage, files are uploaded, sharded and distributed to nodes throughout the network. This ensures that should one node fail, the remaining nodes can still rebuild the file from the shards distributed throughout the network. For added security, the files are also encrypted, which prevents nodes from deciphering what a file contains.

Video: https://www.youtube.com/watch?v=EClPAFPeXIQ

To recall a file, a Distributed Hash Table (a list of keys with associated values that point to where the data exists) is used to locate all of the file’s shards. The network can then use the shards to rebuild the file, before the file owner uses a private key to decrypt it for use.
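
The store-and-recall flow above can be sketched roughly as follows. Several things here are assumptions rather than details from any real project: the hash table is a plain in-memory dictionary instead of a table that is itself distributed, shards are keyed by their SHA-256 digest, each shard is kept on two nodes, and the encryption step is left out (the payload stands in for data that has already been encrypted).

```python
import hashlib

SHARD_SIZE = 8  # bytes; tiny on purpose so the example is easy to follow


def store(data: bytes, nodes: dict[str, dict[str, bytes]], copies: int = 2):
    """Shard data, spread copies across nodes, and return (manifest, dht).

    manifest: ordered list of shard keys (SHA-256 hex digests).
    dht: maps each shard key to the names of the nodes holding that shard.
    """
    names = list(nodes)
    manifest, dht = [], {}
    for i in range(0, len(data), SHARD_SIZE):
        piece = data[i:i + SHARD_SIZE]
        key = hashlib.sha256(piece).hexdigest()
        holders = [names[(i // SHARD_SIZE + k) % len(names)] for k in range(copies)]
        for name in holders:
            nodes[name][key] = piece
        manifest.append(key)
        dht[key] = holders
    return manifest, dht


def rebuild(manifest, dht, nodes) -> bytes:
    """Fetch every shard from any live holder, verify its hash, and join."""
    out = []
    for key in manifest:
        for name in dht[key]:
            piece = nodes.get(name, {}).get(key)
            if piece is not None and hashlib.sha256(piece).hexdigest() == key:
                out.append(piece)
                break
        else:
            raise LookupError(f"shard {key[:8]} is unavailable")
    return b"".join(out)


nodes = {"node-a": {}, "node-b": {}, "node-c": {}}
payload = b"ciphertext bytes would go here"   # stands in for an already-encrypted file
manifest, dht = store(payload, nodes)

del nodes["node-b"]                           # simulate one node failing
assert rebuild(manifest, dht, nodes) == payload
```

With two copies of each shard on different nodes, the loss of any single node still leaves one live holder per shard, which is why the rebuild succeeds here.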

3 benefits of using blockchain for data storage

It’s more difficult to hack than a centralized cloud service with a single point of failure, such as Amazon Web Services. The decentralized nature of storage, coupled with such processes as sharding and encryption, means that those hackers who manage to compromise a node will only be able to access a small, encrypted chunk of your data. They’d then have to locate and decrypt all the other shards at the other nodes to be able to make any sense of the data.
Decentralized file storage is cheaper than centralized storage solutions or maintaining your own servers. That’s mainly because decentralized networks don’t have to run enormous server farms, unlike cloud storage firms. Take the storage project Sia as an example (which we will discuss in more depth in our next piece). On average, Sia’s decentralized cloud storage reportedly costs 90% less than incumbent cloud storage providers: storing 1 TB of files on Sia costs about $2 per month, compared with $23 on Amazon Web Services’ S3 service.
The incentive system means that if you have extra storage space, you can farm out free space and earn money in return. Clearly, if such models achieve success in terms of consumer demand and usability, the value of the token itself should appreciate considerably. And that means your farming services could end up yielding healthy returns.
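
As a quick sanity check on the Sia figures quoted above, the two monthly prices can be turned into a percentage saving. The variable names are ours, and the prices are simply the approximate numbers cited in this article.

```python
sia_per_tb = 2.0    # USD per TB per month, as quoted above
s3_per_tb = 23.0    # USD per TB per month, as quoted above

saving = 1 - sia_per_tb / s3_per_tb
print(f"{saving:.1%}")   # 91.3%
```

That is in line with the "90% less" figure reported for Sia.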

So, are data storage and blockchain a perfect match?

Not yet. While blockchain is being touted to solve a whole host of data security and transparency issues related to centralized storage, it’s not quite there in terms of being preferred to existing solutions such as AWS and Dropbox. Here are a few reasons why:

Although scaling techniques are being employed to boost the speed and retrievability of files, the decentralized and distributed nature of blockchain-based data storage solutions means it will be seriously difficult for them to compete with the likes of AWS on those measures. While new blockchain companies can meet consumer-grade demands, they remain unproven when it comes to supporting the big data requirements of businesses.
Will encryption always work? Currently, techniques such as asymmetric encryption work by creating two related keys. Anyone can encrypt a file with the public key, but only the owner, who holds the corresponding private key, can decrypt it. While this method is secure at the moment, there is no guarantee that powerful technology such as quantum computing won’t manage to overcome this security and decrypt the files in the future.
With Dropbox, as soon as you don’t want your file stored, you can simply delete it. In contrast, with a blockchain the file reference is stored on the immutable chain, while the file itself is out there on potentially hundreds of network nodes. This makes ensuring that a file has been completely deleted considerably more challenging. And it becomes even more pertinent since the recent introduction of new data privacy laws such as GDPR, where the ‘right to be forgotten’ requires that companies erase the personal data of individuals upon request. Indeed, it remains unclear at this stage whether data can be stored on the immutable blockchain while remaining in compliance with this obligation.
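
The two-key idea from the encryption point above can be shown with a toy version of RSA, one well-known asymmetric scheme. This is a sketch for illustration only: the primes are absurdly small, the message is a single number, and real systems use vetted libraries with much larger keys and padding.

```python
# Toy RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53
n = p * q                      # 3233, shared by both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (modular inverse), Python 3.8+

public_key, private_key = (e, n), (d, n)


def encrypt(m: int, key: tuple[int, int]) -> int:
    """Encrypt the number m with a (exponent, modulus) key."""
    exp, mod = key
    return pow(m, exp, mod)


def decrypt(c: int, key: tuple[int, int]) -> int:
    """Decrypt the number c with the matching (exponent, modulus) key."""
    exp, mod = key
    return pow(c, exp, mod)


message = 65                   # must be smaller than n
ciphertext = encrypt(message, public_key)
assert decrypt(ciphertext, private_key) == message
```

The private exponent can only be derived by someone who knows the factors of n, so the scheme is only as strong as factoring is hard. That assumption is exactly what a sufficiently powerful quantum computer would threaten.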

It goes without saying that, as is the case with many promising blockchain applications, data storage remains in the early stages of development. Scaling issues may well be overcome, while more robust security and encryption techniques are bound to be developed. Indeed, there are now a handful of blockchain projects out there seeking to take data storage into a new era. In our next piece, we will investigate the most promising of these projects.

Dr Justin Chan

Dr Chan founded DataDrivenInvestor.com (DDI) and is the CEO of JCube Capital Partners. Specializing in strategy development, alternative data analytics and behavioral finance, Dr Chan also has extensive experience in the investment management and financial services industries. Prior to forming JCube and DDI, Dr Chan worked in strategy development at multiple hedge funds and fintech companies, and served as a senior quantitative strategist at GMO. A published author in professional finance journals, Dr Chan holds a Ph.D. in finance from UCLA.
