Query for Google PageRank

Karthik Nadig

If you have used Google toolbar, you might have used the PageRank tool. It is a nifty tool, which shows PageRank of page or web site you are currently viewing. I was trying to figure out how it works. After googling for it and filtering out the sporadic bits of information, this is what I got.

Suppose you want to find the PageRank of some web site, lets just say Microsoft (www.Microsoft.com). You can either visit the web site and get page rank from the Google Toolbar or use web sites which are designed for this purpose like www.prchecker.info. In either case they generate a query which will look similar to this:

http://toolbarqueries.google.com/search?client=navclient-auto&ch=62624488910&ie=UTF-8&oe=UTF-8&features=Rank&q=info:http%3A%2F%2Fwww.microsoft.com

Go ahead click on the above link you’ll get some text in the browser window that appears like this:

Rank_1:1:9

The last number in the above string is the PageRank.

Most parts of the above query remain constant, the values that change are that of ch and q. The the value of q is very easy to generate, but the value of ch will need a bit of work. Let me explain the easy one first. Lets trim the fat and concentrate on the value of q.

info:http%3A%2F%2Fwww.microsoft.com

There are two parts to this value first is the "info:" part which is constant for any query, and the "http%3A%2F%2Fwww.microsoft.com" which is an escaped version of the original query that you requested( i.e., www.microsoft.com). "http://" has to be prefixed to the original query if it was excluded in the original string.

Any string an be easily converted to escaped string in almost all scripting languages. In Perl its uri_escape(), in .Net its Uri.EscapeDataString(), other scripting languages will have similar method to get the  escaped string.

These are the steps to generate the value of q:

  1. Check if http:// is prefixed in the input string, if not then include it (original_uri).
  2. Get the escaped version of the input string (escaped_uri).
  3. Value of q = "info:" + escaped_uri.

Now that the easy part is done lets move to the other one, ch. The value of ch in generated by hashing the input uri. So, the input to the hash function will have the following format, "http://….". This is a pretty simple algorithm, I’ll try to make it as generic as possible. I have broken the algorithm into 3 parts, Hash1, Hash2 and MainHash.

Hash1:

Input: Unsigned Integers a, b, c.

Output: Unsigned Integers a, b, c.

Algorithm:

   1:  a = a - b 
   2:  a = a - c
   3:  a = a xor (c right_shift 13)
   4:  b = b - c
   5:  b = b - a
   6:  b = b xor (a left_shift 8 )
   7:  c = c - a
   8:  c = c - b 
   9:  c = c xor (b right_shift 13)
  10:  a = a - b
  11:  a = a - c
  12:  a = a xor (c right_shift  12)
  13:  b = b - c
  14:  b = b - a
  15:  b = b xor (a left_shift  16)
  16:  c = c - a
  17:  c = c - b
  18:  c = c xor (b right_shift 5)
  19:  a = a - b
  20:  a = a - c
  21:  a = a xor (c right_shift 3)
  22:  b = b - c
  23:  b = b - a
  24:  b = b xor (a left_shift  10)
  25:  c = c - a
  26:  c = c - b
  27:  c = c xor (b right_shift 15)

Output: values of a, b, c

 

Hash2:

Input: String (character indexing should be allowed)

Output: Unsigned Integer

Algorithm:

   1:  Unsigned Integer a = 0x9e3779b9
   2:  Unsigned Integer b = 0x9e3779b9
   3:  Unsigned Integer c = 0xe6359a60
   4:  k = 0
   5:  len = Length of input_string
   6:  Unsigned Integers va,vb,vc
   7:   
   8:  Repeat until len >= 12
   9:      
  10:      va = input_string[k + 0]
  11:      va = va | (input_string[k + 1] left_shift 8 )
  12:      va = va | (input_string[k + 2] left_shift 16)
  13:      va = va | (input_string[k + 3] left_shift 24)
  14:      a = a + va
  15:   
  16:      vb = input_string[k + 4]
  17:      vb = vb | (input_string[k + 5] left_shift 8 )
  18:      vb = vb | (input_string[k + 6] left_shift 16)
  19:      vb = vb | (input_string[k + 7] left_shift 24)
  20:      b = b + vb
  21:   
  22:      vc = input_string[k + 8]
  23:      vc = vc | (input_string[k + 9] left_shift 8 )
  24:      vc = vc | (input_string[k + 10] left_shift 16)
  25:      vc = vc | (input_string[k + 11] left_shift 24)
  26:      c = c + vc
  27:   
  28:      a, b, c = Hash1 a, b, c
  29:      k = k + 12
  30:      len = len - 12
  31:   
  32:  end of loop
  33:   
  34:  c = c + Length of input_string
  35:   
  36:  if (len > 10) c = c + (input_string[k + 10] left_shift 24)
  37:  if (len > 9) c = c + (input_string[k + 9] left_shift 16)
  38:  if (len > 8 ) c = c + (input_string[k + 8] left_shift 8 )
  39:  if (len > 7) b = b + (input_string[k + 7] left_shift 24)
  40:  if (len > 6) b = b + (input_string[k + 6] left_shift 16)
  41:  if (len > 5) b = b + (input_string[k + 5] left_shift 8 )
  42:  if (len > 4) b = b + (input_string[k + 4])
  43:  if (len > 3) a = a + (input_string[k + 3] left_shift 24)
  44:  if (len > 2) a = a + (input_string[k + 2] left_shift 16)
  45:  if (len > 1) a = a + (input_string[k + 1] left_shift 8 )
  46:  if (len > 0) a = a + (input_string[k])
  47:   
  48:  a, b, c = Hash1 a, b, c

Output: Unsigned Integer c.

 

MainHash

Input: String (character indexing should be allowed)

Output: Unsigned Integer

Here that algorithm get a little bit complicated. So, I’ll explain it in bit more detail. First two steps are direct, others need explanation, so here they are:

  1. Unsigned Integer c = Hash2 input_string
  2. c = ((c mod 0x0d) & 7) | (( c / 7) << 2)
  3. Generate a mapped string using f(x, c). This is explained below. So, new_string = string generated from table.
  4. Unsigned Integer c = Hash2 new_string
  5. Final Hash value = Concatenate "6", string version of c.

To generate a table and calculate f(x, c) for where x ranges from 0 to 19.

f(x, c) = x * 9 + c

In the following table I am using the values generated for http://www.microsoft.com . c = 404068858.

x f(x, c) Hex Little-endian hex format String
0 404068858

0x181599fa

0xfa, 0x99, 0x15, 0x18

ú™

1 404068849

0x181599f1

0xf1, 0x99, 0x15, 0x18

ñ™

2 404068840

0x181599e8

0xe8, 0x99, 0x15, 0x18

è™

3 404068831

0x181599df

0xdf, 0x99, 0x15, 0x18

ߙ

4 404068822

0x181599d6

0xd6, 0x99, 0x15, 0x18

֙

5 404068813

0x181599cd

0xcd, 0x99, 0x15, 0x18

͙

6 404068804

0x181599c4

0xc4, 0x99, 0x15, 0x18

ę

7 404068795

0x181599bb

0xbb, 0x99, 0x15, 0x18

»™

8 404068786

0x181599b2

0xb2, 0x99, 0x15, 0x18

²™

9 404068777

0x181599a9

0xa9, 0x99, 0x15, 0x18

©™

10 404068768

0x181599a0

0xa0, 0x99, 0x15, 0x18

™

11 404068759

0x18159997

0x97, 0x99, 0x15, 0x18

—™

12 404068750

0x1815998e

0x8e, 0x99, 0x15, 0x18

Ž™

13 404068741

0x18159985

0x85, 0x99, 0x15, 0x18

™

14 404068732

0x1815997c

0x7c, 0x99, 0x15, 0x18

15 404068723

0x18159973

0x73, 0x99, 0x15, 0x18

16 404068714

0x1815996a

0x6a, 0x99, 0x15, 0x18

17 404068705

0x18159961

0x61, 0x99, 0x15, 0x18

18 404068696

0x18159958

0x58, 0x99, 0x15, 0x18

19 404068687

0x1815994f

0x4f, 0x99, 0x15, 0x18

Note: This is a sample table, you need to generate this table for each query.

To generate the table, first step will be to apply the value of "x" in the function f(x, c),  convert the result to 32 bit Hex values and reorder the bytes in little-endian format. Next step is to convert the individual Hex bytes to corresponding characters, which will give you a string that will appear similar to that shown in the last column. Finally you have to append the strings together.

After appending all of the strings together the string will appear as shown below.

ú™ñ™è™ß™Ö™….

Let me show what happens to the values in each step of five steps in this algorithm with http://www.microsoft.com as input. Remember this is the main search query not the input to this algorithm. Input to this Hash algorithm will come from another function, I’ll get to that later.

  1. Unsigned Integer c = Hash2 input_string. (result: c=707120502)
  2. c = ((c mod 0x0d) & 7) | (( c / 7) << 2).  (result: c=404068858)
  3. String generated by creating the table. (result: new_string=ú™ñ™è™ß™Ö™…. )
  4. Unsigned Integer c = Hash2 new_string. (result: c=2624488910)
  5. Final Hash value = Concatenate "6", string version of c. (result: c=62624488910)

That ends the MainHash algorithm. Now to bring it all together in a final step to generate the query string. This is the main part of the entire query building process.

  1. Check for "http://" in the input query string, if "http://" is missing then prefix it to the query string. Lets call it checked_query_string.
  2. Initialize base_query_string with constant string  "http://toolbarqueries.google.com/search?client=navclient-auto&ie=UTF-8&oe=UTF-8&features=Rank".
  3. info_query_string = concatenate "info:" and checked_query_string.
  4. Generate code string. code_string = concatenate "&ch=" and MainHash of info_query_string.
  5. Info string. info_string = concatenate "&q=info:" and Escaped version of checked_query_string .
  6. final_query_string = concatenate base_query_string, code_string and info_string.

Here is a sample result of the above algorithm:

http://toolbarqueries.google.com/search?client=navclient-auto&ie=UTF-8&oe=UTF-8&features=Rank&ch=62624488910&q=info:http%3A%2F%2Fwww.microsoft.com

The result of this query will be: Rank_1:1:9

The last number "9" is the PageRank.

Wow, you read through the whole thing, hats off to you.


One Response to “Query for Google PageRank”

  • Surya.N.L Says:

    While browsing we had come across your resume and found that you had done a project on offline handwritten character recognition using opencv which we have currently doing as our final year project. So could you please provide us with details on how you went about it.It will be very helpful.You can mail me. thank you.

Leave a Reply