The puzzle of the double hash

The puzzle of the double hash

There is a long-standing puzzle within Bitcoin that has not yet been solved correctly: why did I use a double hash?

First of all, the explanation of the double-hash problem as given in the CoreCoin wiki is utterly false.

 

If you double-hash a value, birthday attacks remain. If you have two values that hash to the same result, forming a birthday attack or collision, they will still collide. It is simple to show. It is assumed that we have two values X and Y. For a given hash function, we have a collision such that:

A == Hash(X) == Hash(Y).

Now, given the result that both X and Y hash to the same value, A, we can easily see that no double-hashing or squared-hashing process will help as:

Hash(A) == Hash[Hash(X)] == Hash[Hash(Y)].

In fact, we lose one bit of information for the additional hash operation. We can say that if we iterate a hash n times, it makes it n times as likely that a collision will occur. I am taking some liberty here, and the maths involved in what I’ve explained is not completely accurate, but it is true that for each time we rehash a function using the same hash function, we lose collision security for the function. In fact, if we look at how addresses in Bitcoin are created, we see that the double-hashing function increases the effect even further. In other words, the hash of the hash in the scenario is more likely to lead to a collision than a single hash or even the hash of the same hash function (a double hash).

A == Hash-a(X1) == Hash-a(Y1)
B1 == Hash-a(X2)
B2 == Hash-a(Y2)

If B1 = B2, then we have the senario from A.

But, there are also values that will exist such that:

C == Hash-b(B1) == Hash-b(B2).

As such, the probability of finding a collision in two values, that is to separate public keys, is increased (more than with just a double hash) when we utilise separate hash functions (e.g. SHA256 and RIPEMD160).

So, I have to point out that in this area (as with so many others), the Bitcoin, sorry, CoreCoin wiki is in error. Double-hashing is not used within Bitcoin to increase security. In addressing, it does add one benefit in that if one hash function is found to be vulnerable, then the other hash function will remain secure longer, but as with the scenario above, collisions could be found more easily. So, unfortunately such a seeming benefit is not the reason.

 

Ferguson and Schneier (Practical Cryptography) proposed using a double-hash function as a means to defend against “length-extension” attacks with SHA-256, and they named it SHA-256d. Merkle-Damgård hash functions such as MD5 or SHA-256 are vulnerable to such an attack. If it was an issue, we could have used an HMAC, but the reality is that Bitcoin is an economic system. The security of Bitcoin is always about the cost to attack vs the cost to defend.

Although the computation of a particular SHA-256(k||d) can exist that suffers from a length-extension attack which could be found, it will also easily be detected, monitored, and blocked. Finding script that follows the scenario is not really what I would consider an issue. In fact, it’s really a non-issue, but I won’t go into details here. There are many better ways to stop length-extension attacks in Bitcoin right now. Which is part of the power of script.

So, if it reduces security, why do we do it?

We have a number of patents coming out covering all of it. Unfortunately, all the sham forks of Bitcoin have made it necessary to patent many of the technologies that people did not understand. One use of the double-hash function in Bitcoin is the specialisation of individual verification functions as Bitcoin scales.

If you start to think about it, you will see that where:

A == Hash(X),

now:

B == Hash(A)

allows us to have the value (A) sent to a miner in a way that doesn’t allow them to broadcast a block without individual transactions. In a Merkle tree structure, we could have all of the block sent and solved apart from a single transaction. Doing so would allow us to construct an ASIC mining facility that does not host the data in blocks. The miner is now a distributed function.

As such, the proof-of-work and the verification can be handed off to specialised entities. The ASIC facility can pay the verification facility upon discovery of valid blocks, and yet the system allows them to interact without being able to cheat either party. In fact, they can even have agreements with different miners and pools and allow it to occur without worry: if one party cheats, both lose out.

From the question of where to locate facilities, the matter becomes interesting. China and many places in Central Asia have power, but do not have network access. Being able to distribute such functions is valuable.

With a Merkle tree, we can allow a verification node acting as a distributed system outside of China to send simple but easy to validate hash functions that can be statistically checked by the miner without having to validate all parts of the transaction. In other words, if block propagation is a problem in certain areas, it isn’t a problem for large-scale Bitcoin miners in BSV.

In fact, it is only the tip of the iceberg of what can be done. Later in the year when the patents start being published, you shall get to see some more.

Illicit material

The use of a double-hash function also allows us to create a system that can be distributed and validated while acting within the law. Distribution functions can be sent to specialised systems that act in local jurisdictions. Consequently, it allows us to have immutable data storage that can be filtered with the hash being validated and a subsequent prune of illicit material being allowed in certain jurisdictions. That is, the double hash within Bitcoin creates a method where we can selectively deliver content.

The blockchain is immutable, but with a double hash, the request for a record can be restricted and logged.

So I pose a hypothetical question for you…

What would happen if BTC, ETH, etc all started hosting child porn and other illicit material and only BSV can filter it?