From c209ec1ae6ffc32d83877cab98ad668f9b95de62 Mon Sep 17 00:00:00 2001
From: Alexandre Dulaunoy
Date: Sun, 9 Jul 2023 11:16:35 +0200
Subject: [PATCH] chg: [blog] potential improvements

---
 _posts/2023-07-08-HTTP-Headers-Hashing_HHHash.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/_posts/2023-07-08-HTTP-Headers-Hashing_HHHash.md b/_posts/2023-07-08-HTTP-Headers-Hashing_HHHash.md
index 25f66e5..090dd3a 100755
--- a/_posts/2023-07-08-HTTP-Headers-Hashing_HHHash.md
+++ b/_posts/2023-07-08-HTTP-Headers-Hashing_HHHash.md
@@ -21,6 +21,8 @@ HTTP Headers Hashing (HHHash) is a technique used to create a fingerprint of an
 
 To calculate the HHHash, we concatenate the list of headers returned by the HTTP server. This list is ordered according to the sequence in which the headers appear in the server's response. Each header value is separated with `:`.
 
+You may be wondering why the set of headers is not sorted. This decision stems from our observations of a significant number of servers attempting to mimic other servers, often resulting in variations in the order of their returned HTTP headers.
+
 The HHHash value is the SHA256 of the list.
 
 ### HHHash format
@@ -46,6 +48,10 @@ hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc
 
 HHHash is an effective technique; however, its performance is heavily reliant on the characteristics of the HTTP client requests. Therefore, it is important to note that correlations between a set of hashes are typically established when using the same crawler or HTTP client parameters.
 
+## Potential improvements
+
+As you can observe, HHHash incorporates a version identifier in its value, enabling seamless updates to new hashing functions or concatenation strategies for the headers. One potential approach would be to employ a locality-sensitive hashing algorithm, which calculates distances between sets of headers and facilitates efficient comparisons.
+
 # References
 
-- [Python library - HHHash](https://github.com/adulau/HHHash)
+- A first version of a [Python library - HHHash](https://github.com/adulau/HHHash)
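The blog text in this patch describes the computation in prose: take the headers in the order the server sent them, join them with `:`, hash the result with SHA-256 and prefix it with `hhh:1:`. Below is a minimal Python sketch of that description, not code taken from the linked library. It assumes that the header field names (rather than their values) are what gets joined, used exactly as received with no case folding, sorting or de-duplication; `hhhash`, `parse_hhhash` and `HHHASH_VERSION` are hypothetical names chosen for this example.

```python
import hashlib
from typing import Iterable, Tuple

# Version identifier carried in the "hhh:<version>:<digest>" prefix.
HHHASH_VERSION = 1


def hhhash(header_names: Iterable[str]) -> str:
    # Assumption: header *names* are used, in the order the server sent
    # them (no sorting), joined with ':' before hashing with SHA-256.
    joined = ":".join(header_names)
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return f"hhh:{HHHASH_VERSION}:{digest}"


def parse_hhhash(value: str) -> Tuple[int, str]:
    # Split "hhh:<version>:<digest>" so callers can dispatch on the version
    # when a later version changes the hash function or concatenation.
    prefix, version, digest = value.split(":", 2)
    if prefix != "hhh":
        raise ValueError(f"not an HHHash value: {value!r}")
    return int(version), digest


if __name__ == "__main__":
    names = ["Date", "Server", "Content-Type", "Content-Length"]
    fingerprint = hhhash(names)
    print(fingerprint)
    print(parse_hhhash(fingerprint))
```

To feed it from a live response, one option is the standard library's `http.client`, whose `HTTPResponse.getheaders()` returns `(name, value)` pairs in the order they were received; passing `[name for name, _ in response.getheaders()]` also keeps repeated headers such as multiple `Set-Cookie` lines, which a client that merges duplicate headers would collapse. Because the names are not sorted, swapping two headers yields a different fingerprint, which matches the rationale given in the patch.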
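The "Potential improvements" paragraph does not name a specific locality-sensitive hashing algorithm. Purely as one illustration, the sketch below uses MinHash, a well-known LSH family for Jaccard similarity, over the set of lower-cased header names. The helper names `minhash_signature` and `estimate_similarity`, the seeded SHA-256 hash functions and the 64-slot signature length are all assumptions made for this example. Note that a set-based sketch like this ignores header order, which HHHash version 1 deliberately keeps, so it trades away one of the signals the post relies on.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(seed: int, token: str) -> int:
    # Derive one of many hash functions by prefixing the token with a seed,
    # keeping the first 64 bits of the SHA-256 digest as an integer.
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(header_names: Iterable[str], num_hashes: int = 64) -> List[int]:
    # HTTP field names are case-insensitive, so compare them lower-cased.
    tokens = {name.lower() for name in header_names}
    if not tokens:
        raise ValueError("at least one header name is required")
    # For each hash function, keep the minimum value over all tokens.
    return [min(_seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    # The share of slots where the minima agree estimates Jaccard similarity.
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    server_a = ["Date", "Server", "Content-Type", "Content-Length"]
    server_b = ["Server", "Date", "Content-Type", "X-Powered-By"]
    print(estimate_similarity(minhash_signature(server_a), minhash_signature(server_b)))
```

The fraction of matching signature slots approximates the Jaccard similarity of the two header sets, so servers with near-identical header sets score close to 1.0 and can be grouped without comparing raw header lists pairwise.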