Friday, January 09, 2009

PageRank Distribution

A continuation of "Last Google Toolbar PageRank Update of 2008"

I see very few pages with a PageRank of 1. This doesn't make much sense, logically: the vast majority of pages should have a PageRank of 1. Now, it could be that those are pages that no one (or at least not I) ever sees. I would think, roughly, the distribution of 211,111 pages would be:

no PageRank   100,000
PageRank 1    100,000
PageRank 2    10,000
PageRank 3    1,000
PageRank 4    100
PageRank 5    10
PageRank 6    1
PageRank 7    0
PageRank 8    0
PageRank 9    0
PageRank 10   0
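
As a rough sketch, the guess above can be reproduced with a few lines of Python. The function name and the factor of 10 between ranks are my own assumptions for illustration; Google has not published how the toolbar values are distributed.

```python
# Hypothetical reconstruction of the guess above: each step up in
# PageRank has one tenth as many pages, starting from 100,000 pages
# at PageRank 1. The "none" bucket is set equal to PageRank 1, which
# is an arbitrary choice.
def guessed_distribution(top_count=100_000, factor=10, max_rank=10):
    counts = {"none": top_count}
    for rank in range(1, max_rank + 1):
        # Integer division, so ranks above 6 round down to 0 pages.
        counts[rank] = top_count // factor ** (rank - 1)
    return counts


dist = guessed_distribution()
total = sum(dist.values())

for rank, count in dist.items():
    print(f"{rank!s:>4} {count:>8,}")
print(f"total {total:,}")
```

The counts add up to the 211,111 pages used in the guess.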

Now this isn't exactly true because there would be natural clumping, so a larger number of high-PageRank pages would make sense to me. But I think that means that if you had, say, 2 more PageRank 6 pages, you would lose hundreds of PageRank 2 pages.

Also, my choice of 100,000 for both no PageRank and PageRank 1 is basically arbitrary. I find the whole area of no PageRank confusing to reconcile with the logarithmic scale. Some PageRank 0 pages could have a PageRank so low that it is less than .5 and so is not displayed as 1. However, some are also penalized to that "rank"... Anyway, I just made that part up completely. The rest I am making up too, but based on the idea that, if a logarithmic scale is used, a very quick estimate is at least better than having no concept at all and just saying that 211,111 pages and 11 possible "ranks" means each rank should have about 19,192. I think my estimates above are at least closer to the true distribution than an estimate that each PageRank has an equal number of pages, even if my rough guess isn't very accurate.
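
For what it's worth, here is the arithmetic behind the "equal number" baseline, as a small Python sketch. The variable names are mine, and the three guessed counts are just the made-up numbers from this post, not real data.

```python
TOTAL_PAGES = 211_111
BUCKETS = 11  # no PageRank, plus PageRank 1 through 10

# Equal split: every "rank" gets the same share of pages.
uniform = TOTAL_PAGES / BUCKETS
print(f"equal split: about {round(uniform):,} pages per rank")

# A few of my guessed (made-up) counts, for comparison.
guess = {1: 100_000, 3: 1_000, 6: 1}
for rank, count in guess.items():
    print(f"PageRank {rank}: guess {count:,} vs equal split {round(uniform):,}")
```

The two estimates disagree by orders of magnitude at both ends, which is why I think even a crude logarithmic guess is the better starting point.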

Well, part of the problem in trying to figure this out is that I don't think Google has ever confirmed that it is a logarithmic scale. But I think it is (or something close to it). I'll try to do some research on this, just because I find it interesting.

