Transparent Logs for Skeptical Clients

lachenmayer · on March 1, 2019

The hypercore[1] signed append-only log which is the heart of the Dat protocol[2] is an implementation of very similar ideas. The data structure consists of an append-only log, a Merkle tree (stored in the "inorder storage layout" described in Appendix B), and signatures for each of the root hashes of the Merkle tree. The public key ("read key") of the key pair which is used to sign the tree roots can be used as a unique identifier of the contents of the log, eg. for looking up peers in a DHT which can serve the contents. It features a replication protocol which is similar to the one described in the article (if you squint hard enough). A fantastic in-depth explainer of the data structure and the protocol is available at [3].

The current implementation is in Node.js, but a Rust port is under way. [4]

[1]: https://github.com/mafintosh/hypercore [2]: https://www.datprotocol.com/ [3]: https://datprotocol.github.io/how-dat-works/ [4]: https://github.com/datrs/hypercore

rzzzt · on March 1, 2019

I wish other protocols were described in "How Dat Works"-style. Very nicely done.

k_harper · on March 4, 2019

For fun, I tried to find a simpler formula for seq(0,K) in Appendix A, but I didn't check if it's computationally faster and I doubt it is. The formula I found was: seq(0,K) = 2(K+1) - Hamming(K+1) - ffs(K+1)

Explanation: When the newest hash is at position K (zero-indexed), there are K+1 level 0 hashes being stored.

The stored permanent hashes form a set of perfect binary trees. There is either 0 or 1 of each size of perfect tree. To determine if a tree of a given size exists when there are K+1 hashes, you can check the corresponding binary digit in K+1. For instance, if the eights place in K+1 is set, then there is a perfect tree with eight level 0 hashes.

Each perfect tree contains 2N - 1 hashes, where N is the number of level 0 hashes in it. To calculate the total number of permanent hashes, we can sum the totals for the individual trees. The sum of N for all of the trees is just the total number of level 0 hashes, K+1, so the 2N term becomes 2(K+1). The total number of trees is equal to the number of set bits in the binary representation of K+1, which is the hamming weight[1]. So -1 becomes -Hamming(K+1).

We now know that when the last record is at position K (zero-indexed), there are 2(K+1) - Hamming(K+1) permanent hashes stored.

Unfortunately, the last level 0 hash might not be at the end of the file, so we need to subtract the number of hashes that come after it. The number of hashes after it is the height (L, zero-indexed) of the smallest perfect tree, which is the position of the first set bit (one-indexed)[2] of K+1, less 1: ffs(K+1) - 1. We then subtract 1 to convert the number of records to the zero-indexed position. The -(-1) and -1 cancel out.

[1] https://en.wikipedia.org/wiki/Hamming_weight#Efficient_imple... [2] https://en.wikipedia.org/wiki/Find_first_set#CTZ

yinyinwu · on March 1, 2019

What are the applications for this?

flipp3r · on March 1, 2019

Literally just a usecase for a blockchain, reinvented..

nickpsecurity · on March 1, 2019

It's a hashchain. They predate blockchains. I used them a long time as just signed, text files full of hashes. Trusted Timestamping used them. Blockchains were a highly-wasteful modification of hashchains for a different purpose. Databases, logs, and hashchains are still better for most things. Centralized services with decentralized checking of a signed log is still more efficient since centralized tech is itself more efficient.

geofft · on March 1, 2019

It's a hash tree, which is an important difference from a singly-linked list like a Bitcoin-style blockchain.

darkerside · on March 1, 2019

I was hoping this would be more informative...

https://en.wikipedia.org/wiki/Hash_chain#Hash_chain_vs._bloc...

geofft · on March 1, 2019

A blockchain is a modification of a hash chain to make it usable in cases where different transactions conflict with each other. In the case of TLS certificate transparency, Go packages, etc., a log entry "This object with this hash was published" does not conflict with a log entry "That object with that hash was published." You can append them to the chain in either order and it's fine. For a financial ledger, "This money was moved from A to B" conflicts with "This money was moved from A to C," and so one needs to be applied first so that the other can be rejected. If both transactions are signed it doesn't matter which one (for the correctness of the system, at least), but all users have to come to consensus about which of the two is accepted. Otherwise you have the "double-spend" problem.

Bitcoin's blockchain solves this problem with "proof-of-work," i.e., making it computationally difficult to add another entry to the hash chain. Because it's difficult, it's unlikely that both transactions will be accepted at the same time, which serves as a publicly-verifiable way to arbitrarily pick one. There is a small financial incentive for the network entity that successfully solved the computational problem, hopefully offsetting their expense in "mining" the "block." (As a further means of making sure only one is accepted, even if both are mined in quick succession, it's common practice in Bitcoin to wait for a few more blocks to be mined before B or C accept the transaction, which forces the network to accept one chain or the other.)

Satoshi's innovation was that proof-of-work can be used to allow arbitrary / anonymous entities to participate in this system without risking someone creating several thousand anonymous entities voting in their favor, or creating one view of the hash chain for B and another for C to convince them both to accept the transaction. (Satoshi's innovation was not the append-only cryptographically-verifiable store; that's just a hash chain. It was securing updates to that store without a central trusted coordinator.)

There are other ways to arbitrarily pick one transaction or the other, e.g., using a distributed consensus algorithm and admitting participants based on "proof of stake." Or you can redefine the problem to allow double-spend to a limited extent and make A responsible for paying out both B and C - this is roughly Stellar's approach.

But if you don't have a double-spend problem (or if you have a central trusted coordinator), you just need a boring old hash chain, which can be implemented much more efficiently - and possibly expanded to a hash tree, as described in this article.

geofft · on March 1, 2019

This is different from a blockchain because it's more efficient - the length of a log proof is O(log(N)) instead O(N). A blockchain in the Bitcoin sense does not satisfy properties 1, 2, and 3 in this article.

skybrian · on March 1, 2019

And yet, more efficient and less corrupt, because there is no money and no financial speculation. Much better!

flipp3r · on March 1, 2019

I don't see how a data structure involves money and corruption.

geofft · on March 1, 2019

The term "blockchain" is ill-defined (and often is used to mean just "Merkle tree"), but in the sense of the 2008 Nakamoto paper, a blockchain derives its security from having more computational power participate in the security of the system than the unused and available computational power in the world, and specifically from having financial incentives (mining) to make this be the case. In other words, the data structure is specifically about money and corruption, by design.

andrenth · on March 1, 2019

Why corruption?

skybrian · on March 2, 2019

Money attracts thieves. Especially combined with irreversible transactions and at least partial anonymity.

andrenth · on March 2, 2019

But isn’t this an argument both against (any) currency and privacy?

geofft · on March 3, 2019

Basically yes - if your application needs neither currency nor privacy (e.g., signing Go packages), linking it to a data structure that inherently deals with both would be a mistake. Currency and privacy are both difficult, so avoid bringing in dependencies on them if you don't need them.

(Again, the Bitcoin paper is specific in how the data structure is only secure because the consensus algorithm creates financial incentives that hopefully outweigh the incentive to misbehave. If you drop that from a blockchain, it isn't secure any more.)

skybrian · on March 2, 2019

It's an argument for being cautious about handling money and avoiding it if you can. Of course, Internet commerce deals with it a lot, but sometimes it's an unnecessary dependency and removing it simplifies the problem.

helper · on March 1, 2019

Blockchain, literally just a merkle tree..