Thursday 11 March 2010

How Google has Blinded Itself

Page Rank: A Quick Refresher

Most people with an interest in SEO (Search Engine Optimisation) are aware of Google's Page Rank algorithm.

The way Page Rank works is, in essence, very simple: each link from a page (Page A) passes 'Page Rank' to the page it links to (Page B). The value passed is the Page Rank of Page A divided by the number of links out from Page A. The same then applies from Page B to Page C, and so on.

The smarter of you will recognise that this is an iterative calculation (the network effect means that Page Rank passed down will eventually be passed back in some form). The really smart amongst you may conclude that, left as it is, this iteration need not converge: rank can keep circulating, or pool in groups of pages that link only to each other. The published Page Rank formula deals with this through a 'damping' (decay) factor, usually quoted as 0.85, so that only a fraction of each page's rank is passed on at each step. Of course, in practice the Page Rank calculation is more complex still: Google has algorithms and methods for determining the quality of links and pages which significantly refine the calculation.
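For readers who want to see the iteration in action, here is a minimal Python sketch of the textbook Page Rank formula with a damping factor. It is not Google's production algorithm (which layers link- and page-quality signals on top); the function name `page_rank`, the damping value of 0.85, the convergence tolerance and the choice to spread the rank of pages with no outbound links evenly across all pages are illustrative assumptions.

```python
def page_rank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration Page Rank over a dict {page: [pages it links to]}.

    Each page shares its rank equally among its outbound links, and the
    damping factor (the 'decay') is what makes the iteration converge.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with rank spread evenly
    for _ in range(max_iter):
        # Every page gets a baseline share of (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Page A's rank divided by its number of outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the ranks have settled
            break
    return rank


# Hypothetical three-page site: A links to B and C, B links to C, C links back to A.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = page_rank(example)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this toy network C, which receives links from both A and B, ends up with the highest rank, and the three ranks still sum to 1 because the damping step only redistributes rank rather than creating it.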


Rel = "Nofollow": Google's Blindfold

Back in 2005, Google adopted a policy whereby if the link from Page A to Page B was given a rel="nofollow" attribute, Google would effectively ignore that link for Page Rank purposes.

The logic behind this move appears to have been somewhat confused. The intention was to let websites qualify their outbound links to help Google (e.g. so it could ignore paid links, advertisements, reciprocal links etc.); the actual outcome was that sites suddenly started to 'nofollow' as many links as possible to preserve their Page Rank.

This is potentially catastrophic for Google's Page Rank algorithm: very few websites out there now don't 'nofollow' their outbound links (e.g. almost every forum, most blogs, Facebook, Twitter, YouTube etc.). If you want to check, go to a page with outbound links, right-click and choose 'View Source', press Ctrl-F to find the link in the source code (search for its anchor text or the page URL) and look for rel="nofollow" within the link's <a> tag.
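If you would rather check every link on a page at once than hunt with Ctrl-F, a short script can do the same job. This is a minimal sketch using only Python's standard `html.parser` module; the class `NofollowFinder`, the helper `classify_links` and the sample HTML are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collects the href of every <a> tag, split by whether rel contains nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return  # named anchors etc. are not links to another page
        # rel may hold several space-separated values, e.g. rel="external nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def classify_links(html):
    """Return (followed_hrefs, nofollow_hrefs) for an HTML string."""
    finder = NofollowFinder()
    finder.feed(html)
    finder.close()
    return finder.followed, finder.nofollowed


sample = (
    '<p><a href="http://www.example.com/">a followed link</a> '
    '<a rel="external nofollow" href="http://www.example.org/">a nofollow link</a></p>'
)
followed, nofollowed = classify_links(sample)
print("followed:", followed)
print("nofollow:", nofollowed)
```

To run it against a live page you could fetch the HTML first (for example with `urllib.request.urlopen` from the standard library) and pass the decoded text to `classify_links`.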

By trying to prevent people from artificially boosting Page Rank, Google seems to have cut off its main sources of 'natural linking' Page Rank. Your site may be talked about on plenty of forums, and people may be recommending you to their blog readers ... but if the links are 'nofollow', Google is blind to them.

Then, in 2009, Google (quietly) let it be known that 'nofollow' links would still dilute the Page Rank passed on by the linking page (our 'Page A'), but that the receiving page (our 'Page B') would still get no Page Rank benefit from them. In other words, there is now no Page Rank benefit to Page A in 'nofollow'ing its outbound links: they still count when its Page Rank is divided up among its links, and the share they would have passed on is simply lost.
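To make the before-and-after concrete, here is a deliberately simplified Python model of the two treatments described above. It ignores the damping factor and all of Google's quality signals, and the function name `rank_per_followed_link` and the example numbers (a page with a rank of 1.0, 2 followed links and 8 nofollow links) are hypothetical; it is a sketch of the arithmetic, not of Google's implementation.

```python
def rank_per_followed_link(page_rank, followed, nofollowed, dilute):
    """Page Rank passed along each followed outbound link (damping ignored).

    dilute=False: the earlier treatment, where nofollow links are simply
    left out, so the followed links share all of the page's rank.
    dilute=True: the 2009 treatment, where every link counts in the
    divisor but the share assigned to nofollow links reaches no page.
    """
    if followed == 0:
        return 0.0  # no followed links, so no page receives rank from this one
    divisor = followed + nofollowed if dilute else followed
    return page_rank / divisor


for dilute in (False, True):
    per_link = rank_per_followed_link(1.0, followed=2, nofollowed=8, dilute=dilute)
    lost = 1.0 - per_link * 2  # share of the page's rank that no page receives
    label = "after 2009 (diluted)" if dilute else "before 2009"
    print(f"{label}: {per_link:.2f} per followed link, {lost:.2f} lost")
```

With these numbers each followed link passes 0.50 under the earlier treatment but only 0.10 under the 2009 one, and 0.80 of the page's rank goes nowhere, which is why 'nofollow'ing outbound links no longer 'preserves' anything for Page A.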

I guess Google's hope is that people will be far more sparing in their use of nofollow attributes as a result. Personally I greatly doubt it, and even if this changes webmaster behaviour going forwards, Google is poking its own eyes out if it continues to ignore the inbound Page Rank from nofollow links. Think about it: as it stands, Google ignores sites that are being talked about on forums, mentioned on Twitter and linked to from Facebook ... how can that be a good thing?

So What?

My personal view is that Google will quietly revoke the 'ignore nofollow' policy (if they haven't already) or see the quality of their search results decline (and allow the likes of Bing to make real in-roads).

It will also mean that new websites will find it far harder to build Page Rank (and therefore strong search engine results) than the entrenched incumbents (who have links to their sites pre-dating the whole 'nofollow' wave). So not all bad then :0)

If you want to know more about this, you should read what Matt Cutts (the most public 'Google Guy') has to say; this blog entry of his (and the ensuing comments) is fascinating (if you like that sort of thing).

If you've read this and think "that's far too geeky and dull for me to care about" ... I hope that your business's and/or your personal success does not depend on your website(s) performing well in Google searches!

Saturday 6 March 2010

Google’s latest data give-away

Not sure how broad the awareness is of the Google / DoubleClick Adplanner tool … but if I were Experian/Hitwise, I would be worried that this data is now being made available for free by Google. It is packaged as a tool for advertisers, but it is also a pretty powerful Competitive Intelligence tool for online retailers.

Remember when, before Google Analytics came along, service providers and consultants would (try to) charge you significant sums for analysing your web logs and providing click-overlay analyses? In fact, it’s not that long ago that I had a conversation with the MD of a large catalogue retailer who was telling me about the amazing analytics service they were paying (through the nose) for; from what I could see, it did little more than repackage the Google Analytics data, so there may be a bit of that still out there.

I wonder if we will soon be looking back on Hitwise (the excellent but expensive online competitive intelligence service, now owned by Experian) as another victim of Google’s policy of providing intelligence for free?

Let's look at what Adplanner does (and does not) provide, using one of my own sites (so I can compare the data given with Hitwise and our own Analytics data).

Adplanner Data for Petplanet.co.uk

The data quality certainly appears solid in terms of overall traffic levels and, as far as I can tell, the socio-demographic data looks good.

Using this tool to compare with our online specialist competitors (I'll leave it to you to work out who they are!) I can see (and more importantly now potential advertisers and suppliers can see) that we achieve more than double the reach of any of these guys.

We (uniquely amongst our peers, it seems) have elected to share our Google Analytics data, on the basis that we are proud of our numbers and aren’t shy about sharing them. I notice, though, that as a result Google underestimates our UK traffic and gets the visits/user calculation wrong; I'm sure that's a bug they'll fix, and the overall picture is still accurate.

Indeed, compare our reach with that of the pet retail giant Pets at Home and (allowing for how much of their traffic is store-finding and searching for the brand itself) I would argue the data supports the argument I expounded in an earlier blog post (Exploding the Multi-Channel Myth): that online specialists occupy a competitive space that the off-line brands’ online propositions can't reach. The Pets at Home data also highlights some glitches in the system; their traffic figures appear overstated (cf. Hitwise), and I very much doubt that 100% of www.petsathome.com traffic is from the UK.

Which brings us to the obvious question: how good is the data, and is there any systematic bias? Google states:

"DoubleClick Ad Planner combines information from a variety of sources, such as aggregated Google search data, opt-in anonymous Google Analytics data, opt-in external consumer panel data, and other third-party market research."

I think in practice what this means is that the traffic data is based on Google search (unless the site in question elects to share its Analytics data) and the demographic data is triangulated from other sources. I suspect this means that sites that use a lot of email activity to drive traffic, and/or have a high proportion of users who have 'favourited' them, will be under-represented (as they will have a higher than average proportion of 'direct' traffic).

The keyword and site affinity data is also frankly pretty ropey; it does not triangulate well with Hitwise (or intuition) and is really of little commercial value at the moment. Take the example of our gardening / garden furniture site:

Adplanner data for Greenfingers.com

Telling me that 'marks spencer' and 'sainsburys' keywords have a high affinity with my site is not of much use.

You have to wonder how long before Google extends/improves that information though (for a fee?) so that we can all see what keywords are driving traffic to our competitors and which ones they are spending PPC budget on (the main incremental value that Hitwise still offers).

The 'sites also visited' data is fairly random at the moment too. To avoid getting too carried away with my own sites and Hitwise data, I tried checking e.g. www.wiggle.co.uk versus www.chainreactioncycles.com (head-to-head competitors in their space, as confirmed by Hitwise and common sense), but they don't appear in each other's 'sites also visited' lists ... which is plain wrong.

Hard to be critical when you consider the data is free; it's pretty powerful data and a useful Competitive Intelligence tool. I'd be interested to hear others' perspectives.

Comments welcome.