Query for Google PageRank
If you have used Google toolbar, you might have used the PageRank tool. It is a nifty tool, which shows PageRank of page or web site you are currently viewing. I was trying to figure out how it works. After googling for it and filtering out the sporadic bits of information, this is what I got.
Suppose you want to find the PageRank of some web site, lets just say Microsoft (www.Microsoft.com). You can either visit the web site and get page rank from the Google Toolbar or use web sites which are designed for this purpose like www.prchecker.info. In either case they generate a query which will look similar to this:
Go ahead click on the above link you’ll get some text in the browser window that appears like this:
Rank_1:1:9
The last number in the above string is the PageRank.
Most parts of the above query remain constant, the values that change are that of ch and q. The the value of q is very easy to generate, but the value of ch will need a bit of work. Let me explain the easy one first. Lets trim the fat and concentrate on the value of q.
info:http%3A%2F%2Fwww.microsoft.com
There are two parts to this value first is the "info:" part which is constant for any query, and the "http%3A%2F%2Fwww.microsoft.com" which is an escaped version of the original query that you requested( i.e., www.microsoft.com). "http://" has to be prefixed to the original query if it was excluded in the original string.
Any string an be easily converted to escaped string in almost all scripting languages. In Perl its uri_escape(), in .Net its Uri.EscapeDataString(), other scripting languages will have similar method to get the escaped string.
These are the steps to generate the value of q:
- Check if http:// is prefixed in the input string, if not then include it (original_uri).
- Get the escaped version of the input string (escaped_uri).
- Value of q = "info:" + escaped_uri.
Now that the easy part is done lets move to the other one, ch. The value of ch in generated by hashing the input uri. So, the input to the hash function will have the following format, "http://….". This is a pretty simple algorithm, I’ll try to make it as generic as possible. I have broken the algorithm into 3 parts, Hash1, Hash2 and MainHash.
Hash1:
Input: Unsigned Integers a, b, c.
Output: Unsigned Integers a, b, c.
Algorithm:
1: a = a - b
2: a = a - c
3: a = a xor (c right_shift 13)
4: b = b - c
5: b = b - a
6: b = b xor (a left_shift 8 )
7: c = c - a
8: c = c - b
9: c = c xor (b right_shift 13)
10: a = a - b
11: a = a - c
12: a = a xor (c right_shift 12)
13: b = b - c
14: b = b - a
15: b = b xor (a left_shift 16)
16: c = c - a
17: c = c - b
18: c = c xor (b right_shift 5)
19: a = a - b
20: a = a - c
21: a = a xor (c right_shift 3)
22: b = b - c
23: b = b - a
24: b = b xor (a left_shift 10)
25: c = c - a
26: c = c - b
27: c = c xor (b right_shift 15)
Output: values of a, b, c
Hash2:
Input: String (character indexing should be allowed)
Output: Unsigned Integer
Algorithm:
1: Unsigned Integer a = 0x9e3779b9
2: Unsigned Integer b = 0x9e3779b9
3: Unsigned Integer c = 0xe6359a60
4: k = 0
5: len = Length of input_string
6: Unsigned Integers va,vb,vc
7:
8: Repeat until len >= 12
9:
10: va = input_string[k + 0]
11: va = va | (input_string[k + 1] left_shift 8 )
12: va = va | (input_string[k + 2] left_shift 16)
13: va = va | (input_string[k + 3] left_shift 24)
14: a = a + va
15:
16: vb = input_string[k + 4]
17: vb = vb | (input_string[k + 5] left_shift 8 )
18: vb = vb | (input_string[k + 6] left_shift 16)
19: vb = vb | (input_string[k + 7] left_shift 24)
20: b = b + vb
21:
22: vc = input_string[k + 8]
23: vc = vc | (input_string[k + 9] left_shift 8 )
24: vc = vc | (input_string[k + 10] left_shift 16)
25: vc = vc | (input_string[k + 11] left_shift 24)
26: c = c + vc
27:
28: a, b, c = Hash1 a, b, c
29: k = k + 12
30: len = len - 12
31:
32: end of loop
33:
34: c = c + Length of input_string
35:
36: if (len > 10) c = c + (input_string[k + 10] left_shift 24)
37: if (len > 9) c = c + (input_string[k + 9] left_shift 16)
38: if (len > 8 ) c = c + (input_string[k + 8] left_shift 8 )
39: if (len > 7) b = b + (input_string[k + 7] left_shift 24)
40: if (len > 6) b = b + (input_string[k + 6] left_shift 16)
41: if (len > 5) b = b + (input_string[k + 5] left_shift 8 )
42: if (len > 4) b = b + (input_string[k + 4])
43: if (len > 3) a = a + (input_string[k + 3] left_shift 24)
44: if (len > 2) a = a + (input_string[k + 2] left_shift 16)
45: if (len > 1) a = a + (input_string[k + 1] left_shift 8 )
46: if (len > 0) a = a + (input_string[k])
47:
48: a, b, c = Hash1 a, b, c
Output: Unsigned Integer c.
MainHash
Input: String (character indexing should be allowed)
Output: Unsigned Integer
Here that algorithm get a little bit complicated. So, I’ll explain it in bit more detail. First two steps are direct, others need explanation, so here they are:
- Unsigned Integer c = Hash2 input_string
- c = ((c mod 0x0d) & 7) | (( c / 7) << 2)
- Generate a mapped string using f(x, c). This is explained below. So, new_string = string generated from table.
- Unsigned Integer c = Hash2 new_string
- Final Hash value = Concatenate "6", string version of c.
To generate a table and calculate f(x, c) for where x ranges from 0 to 19.
f(x, c) = x * 9 + c
In the following table I am using the values generated for http://www.microsoft.com . c = 404068858.
x | f(x, c) | Hex | Little-endian hex format | String |
0 | 404068858 |
0x181599fa |
0xfa, 0x99, 0x15, 0x18 |
ú |
1 | 404068849 |
0x181599f1 |
0xf1, 0x99, 0x15, 0x18 |
ñ |
2 | 404068840 |
0x181599e8 |
0xe8, 0x99, 0x15, 0x18 |
è |
3 | 404068831 |
0x181599df |
0xdf, 0x99, 0x15, 0x18 |
ß |
4 | 404068822 |
0x181599d6 |
0xd6, 0x99, 0x15, 0x18 |
Ö |
5 | 404068813 |
0x181599cd |
0xcd, 0x99, 0x15, 0x18 |
Í |
6 | 404068804 |
0x181599c4 |
0xc4, 0x99, 0x15, 0x18 |
Ä |
7 | 404068795 |
0x181599bb |
0xbb, 0x99, 0x15, 0x18 |
» |
8 | 404068786 |
0x181599b2 |
0xb2, 0x99, 0x15, 0x18 |
² |
9 | 404068777 |
0x181599a9 |
0xa9, 0x99, 0x15, 0x18 |
© |
10 | 404068768 |
0x181599a0 |
0xa0, 0x99, 0x15, 0x18 |
|
11 | 404068759 |
0x18159997 |
0x97, 0x99, 0x15, 0x18 |
|
12 | 404068750 |
0x1815998e |
0x8e, 0x99, 0x15, 0x18 |
|
13 | 404068741 |
0x18159985 |
0x85, 0x99, 0x15, 0x18 |
|
14 | 404068732 |
0x1815997c |
0x7c, 0x99, 0x15, 0x18 |
| |
15 | 404068723 |
0x18159973 |
0x73, 0x99, 0x15, 0x18 |
s |
16 | 404068714 |
0x1815996a |
0x6a, 0x99, 0x15, 0x18 |
j |
17 | 404068705 |
0x18159961 |
0x61, 0x99, 0x15, 0x18 |
a |
18 | 404068696 |
0x18159958 |
0x58, 0x99, 0x15, 0x18 |
X |
19 | 404068687 |
0x1815994f |
0x4f, 0x99, 0x15, 0x18 |
O |
Note: This is a sample table, you need to generate this table for each query.
To generate the table, first step will be to apply the value of "x" in the function f(x, c), convert the result to 32 bit Hex values and reorder the bytes in little-endian format. Next step is to convert the individual Hex bytes to corresponding characters, which will give you a string that will appear similar to that shown in the last column. Finally you have to append the strings together.
After appending all of the strings together the string will appear as shown below.
úñèßÖ….
Let me show what happens to the values in each step of five steps in this algorithm with http://www.microsoft.com as input. Remember this is the main search query not the input to this algorithm. Input to this Hash algorithm will come from another function, I’ll get to that later.
- Unsigned Integer c = Hash2 input_string. (result: c=707120502)
- c = ((c mod 0x0d) & 7) | (( c / 7) << 2). (result: c=404068858)
- String generated by creating the table. (result: new_string=ú™ñ™è™ß™Ö™…. )
- Unsigned Integer c = Hash2 new_string. (result: c=2624488910)
- Final Hash value = Concatenate "6", string version of c. (result: c=62624488910)
That ends the MainHash algorithm. Now to bring it all together in a final step to generate the query string. This is the main part of the entire query building process.
- Check for "http://" in the input query string, if "http://" is missing then prefix it to the query string. Lets call it checked_query_string.
- Initialize base_query_string with constant string "http://toolbarqueries.google.com/search?client=navclient-auto&ie=UTF-8&oe=UTF-8&features=Rank".
- info_query_string = concatenate "info:" and checked_query_string.
- Generate code string. code_string = concatenate "&ch=" and MainHash of info_query_string.
- Info string. info_string = concatenate "&q=info:" and Escaped version of checked_query_string .
- final_query_string = concatenate base_query_string, code_string and info_string.
Here is a sample result of the above algorithm:
The result of this query will be: Rank_1:1:9
The last number "9" is the PageRank.
Wow, you read through the whole thing, hats off to you.