Blockchain overview


Blockchain is a decentralized database technology used by Bitcoin and others. The original blockchain and Bitcoin were invented together by the pseudonymous Satoshi Nakamoto, but there are now other blockchain variants and other Bitcoin-inspired digital currencies.

The notes here focus on Blockchain as distinct from Bitcoin.


Blockchain is an unusual (by today's standards) database technology that attempts to solve a few specific challenges:

  • how can a database be reliable and available while being completely decentralized, open to everyone and not stored, maintained, and otherwise controlled by one central, dependable authority?
  • if anyone at all can create and maintain records in this database, how can it maintain a semblance of order and consistency in the records it stores?
  • given that the time it takes for any message to propagate across the Internet is variable and unpredictable, how can everyone's view of the records stay in sync when the data is not stored in one central place?
  • if the database is controlled by everyone and its data is distributed all over the Internet, how can it be made tamper-proof, so that one person can't edit another person's records, and people can't secretly change records of things like financial transactions after they've occurred?

Decentralized data storage technology

  • Any computer that joins the network of computers operating blockchain software is called a node.
  • Each node in the network holds a complete set of all records that have been published to the network over its entire history. This complete set of records is called a ledger.
  • Any new record is first logged by one node, and then passed peer-to-peer to the other nodes, which log it in their own copies of the ledger.
  • Any new node that joins the network must first get the complete ledger from the other nodes on the network, and validate that those records are 'authentic', before it can start to create new records of its own.

Records are validated for authenticity

  • The validity, or authenticity, of any given record depends upon the validity of the record that came immediately before it. Thus, records are 'chained' together.
  • If any given record is found to be invalid at any time (even long after it was originally published), all records that came after that record are also invalid, are removed from all copies of the data, and must be republished anew. This works via the hashing system outlined below.
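
The chaining rule above can be sketched in a few lines of Python. This is a minimal illustration, not any real blockchain's record format: the dictionary fields `data`, `prev_hash`, and `hash`, the all-zero reference used for the first record, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder reference for the very first record

def compute_hash(data, prev_hash):
    """Hash a record's data together with the hash of the record before it."""
    payload = prev_hash + json.dumps(data, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_ledger(items):
    """Chain the given items together, each record pointing at the previous hash."""
    ledger, prev = [], GENESIS_PREV
    for item in items:
        record = {"data": item, "prev_hash": prev, "hash": compute_hash(item, prev)}
        ledger.append(record)
        prev = record["hash"]
    return ledger

def first_invalid_index(ledger):
    """Return the index of the first record that fails validation, or None."""
    prev = GENESIS_PREV
    for i, record in enumerate(ledger):
        links_ok = record["prev_hash"] == prev
        hash_ok = record["hash"] == compute_hash(record["data"], record["prev_hash"])
        if not (links_ok and hash_ok):
            return i
        prev = record["hash"]
    return None

ledger = build_ledger(["a", "b", "c", "d"])
ledger[1]["data"] = "tampered"          # edit a record after it was published
bad = first_invalid_index(ledger)       # the altered record no longer matches its hash
still_valid = ledger[:bad]              # everything from `bad` onward must be redone
```

Even if the tamperer also recomputes the altered record's own `hash`, the next record's `prev_hash` stops matching, so the break only moves one position down the chain.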

No centralized data storage 'hub'

  • All nodes have a complete set of all records published by all parties for the entire history of the network's existence (the ledger)
  • No one central node has an 'official' list of transactions
  • Any conflicts in ledgers must be resolved by peers, rather than by relying on a central 'authority'.

Inherent distrust of peers is built into the system

  • Every node validates all transactions for itself, not relying on what peers say is a valid transaction
  • The 'proof-of-work' system is designed to make fraud require more work and take more time than is practically feasible
  • Rules for arbitrating conflicts are built into the system and automated according to simple rules of precedence

Records are called blocks

  • In blockchain, records are called blocks
  • Blocks can contain any sort of data
    • In Bitcoin, any single block contains a batch of financial transactions that are published at the same time
    • The promise of blockchain being used for other applications is that blocks can contain whatever any system designer wants to put into them. The applications are limitless, as with any database system.
  • The linked chain of blocks (i.e. the 'blockchain') is essentially a timeline of all blocks that have been published to the network

Any node can create a new block

  • In Bitcoin, new transactions that enter the network are first marked as 'unconfirmed' or 'unordered', meaning they do not belong to a block yet
  • Any node on the network can put a bunch of unconfirmed transactions into a block and publish that block to the other nodes on the network.
  • But creating a new block requires work...

Each block must contain three things

In order for a node to publish the block, it must contain three things:

  1. the contents of the block
    • In Bitcoin's case, this is the batch of transactions contained in the block
  2. a reference to the most recent block that preceded it
    • in the accepted chain, no two blocks can reference the same preceding block; if two competing blocks do, only one of them keeps its place
    • the very first block, called the genesis block, has no valid reference to a prior block
  3. a solution to the 'block puzzle'...
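
As a rough sketch, the three required parts could be modeled as a small Python data class. The class name `Block` and the field names `contents`, `prev_ref`, and `nonce` are hypothetical, chosen only for illustration; real systems define their own block layouts.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Block:
    contents: Tuple[str, ...]   # 1. the block's data, e.g. a batch of transactions
    prev_ref: Optional[str]     # 2. reference to the preceding block; None for the genesis block
    nonce: int                  # 3. the solution to the block puzzle

    def is_genesis(self) -> bool:
        """The genesis block is the only one without a reference to a prior block."""
        return self.prev_ref is None

# The nonce values and the reference string here are placeholders, not real puzzle solutions.
genesis = Block(contents=("first record",), prev_ref=None, nonce=0)
second = Block(contents=("alice pays bob 1",), prev_ref="reference-to-genesis", nonce=0)
```

Marking the class as frozen means a `Block` object cannot be modified after it is created, which mirrors the idea that a published block is not meant to change.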

Once a block is complete, a hash is generated for it

Based on the contents of a complete block, a cryptographic hash for that block is generated by running a well-known hashing algorithm.

  • a hash serves as a practically unique identifier for each block
  • a hash is the output of an algorithm that takes the contents of the block and produces a fixed-size number based on them. Two different blocks producing the same number is possible in principle, but vanishingly unlikely.
  • a hash is one-way (strictly speaking it is not encryption, since nothing can be decrypted)... it is not possible (or at least unreasonably computation-intensive) to recreate the contents of a block based on its hash
  • if the contents of a block are changed, the hash of those contents changes as well. So it is easy to see whether the contents of a block have been modified by checking whether its hash is correct.
  • thus, blocks are effectively immutable: any change to a published block is detectable
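
The properties listed above can be tried out with Python's standard `hashlib` module. SHA-256 is used here as an example of a well-known hashing algorithm; the function name `block_hash` and the sample strings are made up for the illustration.

```python
import hashlib

def block_hash(contents: str) -> str:
    """Return the SHA-256 hash of the given block contents as a hex string."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()

original = block_hash("alice pays bob 5")
repeated = block_hash("alice pays bob 5")   # same contents always give the same hash
altered = block_hash("alice pays bob 6")    # a one-character change gives a different hash

contents_unchanged = (original == repeated)
tampering_detected = (original != altered)
```

Whatever the length of the input, the hex string returned is always 64 characters long, which is why a hash works as a compact identifier for a block of any size.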

Blocks are chronological groups of transactions

  • Each block contains a reference to the hash of its preceding block
  • So they are 'chained' together in chronological order
  • If two blocks are created that both reference the same hash of the previous block, a conflict-resolution algorithm decides which takes the spot... the transactions in the losing block are pushed into the "unconfirmed" holding space until they are placed into a new block

The block puzzle is time-consuming to solve, but easily solvable

In cryptographic terms, the solution to the block puzzle is called a cryptographic nonce.

The 'block puzzle' might be paraphrased as: 'What value, when added to the contents of the block, would make a cryptographic hash of the block produce a number less than x?', where x is a target value chosen to make the problem time-consuming to solve.
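
A toy version of this puzzle is sketched below, assuming SHA-256 as the hash and a deliberately easy target so that it finishes quickly. The function names, the `|` separator between contents and nonce, and the target value are all assumptions for the example; real networks hash a structured block header and use far harder targets.

```python
import hashlib

def hash_as_number(contents: str, nonce: int) -> int:
    """Hash the contents plus a candidate nonce and read the result as an integer."""
    digest = hashlib.sha256(f"{contents}|{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def solve_block_puzzle(contents: str, target: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash is below the target (brute force)."""
    nonce = 0
    while hash_as_number(contents, nonce) >= target:
        nonce += 1
    return nonce

def check_solution(contents: str, nonce: int, target: int) -> bool:
    """Checking a proposed solution takes a single hash."""
    return hash_as_number(contents, nonce) < target

# Toy target: roughly 1 in 4096 hashes falls below it (2**256 / 2**244 = 2**12).
target = 1 << 244
nonce = solve_block_puzzle("alice pays bob 5", target)
solved = check_solution("alice pays bob 5", nonce, target)
```

The asymmetry is the point: finding the nonce takes many attempts, while `check_solution` lets any other node confirm it with one hash.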

The time it takes to solve this problem is designed to make it unlikely that any two nodes will publish a new block at the same time:

  • this number, if added to the contents of a block and run through a 'cryptographic hash' algorithm, must generate a number lower than the arbitrary target value
  • the arbitrary target value is adjusted regularly to make sure the solution to the puzzle is appropriately time-consuming
  • any node publishing a new block must use brute force (trying candidate values one after another) to find this added number
  • by design, a single regular computer would need, on average, several years or more to solve this problem
  • given the large number of nodes on Bitcoin's blockchain network, this problem is typically solved by some 'lucky' node about once every 10 minutes. This is by design: the target value is adjusted roughly every 2 weeks so that it still takes about 10 minutes, on average, for some node on the network to solve it
  • only if this puzzle is solved and the solution included in the block can a node publish that block to the network
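
The periodic target adjustment described above can be sketched as simple proportional scaling. This is a simplified illustration, not Bitcoin's exact rule: the function name `retarget` is hypothetical, and the period length of 2016 blocks is simply what two weeks of 10-minute blocks works out to.

```python
TARGET_BLOCK_SECONDS = 10 * 60   # desired average time between blocks
BLOCKS_PER_PERIOD = 2016         # two weeks' worth of 10-minute blocks
EXPECTED_PERIOD_SECONDS = TARGET_BLOCK_SECONDS * BLOCKS_PER_PERIOD

def retarget(old_target: int, actual_period_seconds: int) -> int:
    """Scale the target by how long the last period took compared to the plan.

    Blocks arrived too fast -> smaller target -> puzzle gets harder.
    Blocks arrived too slowly -> larger target -> puzzle gets easier.
    """
    return old_target * actual_period_seconds // EXPECTED_PERIOD_SECONDS

old_target = 1 << 220
one_week = EXPECTED_PERIOD_SECONDS // 2
harder = retarget(old_target, one_week)   # the period took half as long as planned
```

In this sketch, if the network found the last period's blocks in one week instead of two, the target halves, so on average twice as many hash attempts are needed per block.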

Solving the 'block puzzle' is called 'mining'

Nodes that solve the block puzzle are rewarded

  • in Bitcoin's implementation of the blockchain, that means the nodes that solve the 'block puzzle' and publish a new block are given a few bitcoins in payment

The blockchain system is fault-tolerant

  • Tampering with the contents of any block would make the existing 'block puzzle' solution for that block invalid, since it depends upon the contents of the block. All subsequent blocks which came after that block would also be invalid, since they reference that block. So the 'block puzzles' of the tampered block and all subsequent blocks would have to be re-solved, which would be time-consuming.
  • Ledgers get out of sync regularly, due to the different times it may take for transaction records to propagate across the network
  • Ledgers may store conflicting records (two blocks claiming the same preceding block) that must be resolved by peers on the network according to clearly defined rules of precedence
    • the transactions in the 'losing' block are placed back into the 'unconfirmed'/'unordered' pool of transactions
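
The conflict handling above can be sketched as follows. These notes do not spell out the rule of precedence, so this example assumes one common choice: whichever competing branch has grown longer wins. The function name `resolve_fork` and the representation of a block as a plain list of transaction strings are hypothetical.

```python
def resolve_fork(branch_a, branch_b, unconfirmed):
    """Pick a winner between two branches that start from the same preceding block.

    Each branch is a list of blocks, and each block is a list of transactions.
    Returns (winning_branch, updated_unconfirmed_pool). On a tie there is no
    winner yet, so the node keeps waiting for another block to arrive.
    """
    if len(branch_a) == len(branch_b):
        return None, list(unconfirmed)
    if len(branch_a) > len(branch_b):
        winner, loser = branch_a, branch_b
    else:
        winner, loser = branch_b, branch_a
    confirmed = {tx for block in winner for tx in block}
    pool = list(unconfirmed)
    for block in loser:
        for tx in block:
            # Only transactions missing from the winning branch go back to the pool.
            if tx not in confirmed and tx not in pool:
                pool.append(tx)
    return winner, pool

branch_a = [["t1", "t2"], ["t4"]]   # two blocks built on the shared predecessor
branch_b = [["t1", "t3"]]           # one competing block
winner, pool = resolve_fork(branch_a, branch_b, ["t5"])
```

Here `branch_a` wins because it is longer; `t3` appeared only in the losing block, so it returns to the unconfirmed pool, while `t1` does not because the winning branch already contains it.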