Beantin

James Royal-Lawson

google

The Times and Rupert commit search engine suicide

Forget SEO: News International is breaking new ground in SES (Search Engine Suicide). In March they announced that a paywall would be put in place on all content in June 2010. That happened earlier today.

302 Redirect

For a while earlier today, the paywall launch amounted to search engine suicide because of the way tens of thousands of timesonline articles indexed by search engines were being redirected.

For a number of hours today, pretty much every URL was being redirected to an index page on a different domain, which also displayed a paywall notification layer.

Here’s an example:

http redirects times article June 2010

There you can see the “302 Moved Temporarily” redirect. This tells Google and other search engines to keep checking this URL, as at some point it will start serving content of its own again. Google generally honours this type of redirect and uses the content of the destination page for indexing and ranking purposes.

What we can spot here, though, is that The Times article we tried to read (“Fears that vuvuzela horns could harm World Cup football fans’ hearing” from June 9th 2010) is being redirected to a different domain, http://www.thetimes.co.uk, and this is the case for every redirect I’ve checked. Tens of thousands of articles are being redirected to the same URL. To Google, this is starting to look very much like the old hijacking trick used by spammers.

301 Redirect

But it doesn’t end there. Once we’ve followed the first redirect, we get hit by a second one, this time with a “301 Moved Permanently” status code.

http redirects times article June 2010

This tells us (and the search engines) that http://www.thetimes.co.uk/ has moved permanently and should no longer be shown in indexes, and that any “value” (PageRank) the old page has should be transferred to the URL the redirect points to: http://www.thetimes.co.uk/tto/news/
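To check a redirect chain like this yourself, you can send requests that do not follow redirects and log each status code along the way. Below is a minimal sketch in Python using only the standard library. It assumes a HEAD request is representative (some servers answer HEAD differently from GET), and `trace_redirects` and `head_request` are hypothetical helper names, not part of any existing tool.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes whose Location header points at the next hop.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_request(url):
    """Send a single HEAD request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=head_request, max_hops=10):
    """Follow a redirect chain hop by hop, recording (url, status) pairs.

    `fetch` is injectable so the logic can be exercised without a network.
    Stops at the first non-redirect response or after max_hops requests.
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

For an affected article on the day described above, you would expect the returned list to show a 302 hop to http://www.thetimes.co.uk/, then a 301 hop to http://www.thetimes.co.uk/tto/news/, and finally a 200. The `max_hops` limit guards against redirect loops.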

Pop goes your index

So what does this mean? Well, Google will receive the same page for every article, and that content is being served from a different domain from the one it started on. The likelihood is that the entire site will be dropped from the index as punishment for spamming tactics. The best case is that the 60,000-plus articles currently indexed by Google will be replaced by http://www.thetimes.co.uk/tto/news/

Update: 20:31

Whilst I’ve been writing this post, the redirect behaviour has changed a number of times. It seems to be stabilising and reflecting what The Times outlined in their announcement. Articles from timesonline.co.uk that are currently indexed by the search engines are still available, for free, and don’t redirect anywhere.

New articles published after the launch of the paywall are being 302 redirected to the timesplus.co.uk start page with a paywall “login” splash.

Screenshot Google timesplus.co.uk June 2010

This means that no new Times article content will be indexed by Google; it will just receive very similar content for every URL. From now on we are going to see “News and opinion from The Times, sign up now for an exclusive preview of the new Times website” (or something similar) a lot for Times pages in search results.
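Search engines don’t publish exactly how they detect duplicate pages, but a common textbook technique is to break each page’s text into overlapping word sequences (“shingles”) and compare the resulting sets with Jaccard similarity. The sketch below is a minimal Python illustration of that general technique, not Google’s method; it assumes plain text has already been extracted from each page, and the function names are hypothetical.

```python
import re


def shingles(text, k=4):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single (shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=4):
    """Similarity of two texts, from 0.0 (disjoint) to 1.0 (identical shingles)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


# Splash text as quoted above; every redirected URL would serve this.
SPLASH = ("News and opinion from The Times, sign up now for an "
          "exclusive preview of the new Times website")
```

Under this measure, any two URLs that both return the splash page score 1.0 against each other, however different the articles behind them were meant to be, which is exactly the kind of signal that lets a search engine fold them together.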

Update: 16 June 2010

As if putting up the paywall wasn’t crazy enough, I’ve also spotted that The Times have stopped updating all of their timesonline.co.uk RSS feeds.

Why on earth would you stop feeding article teasers to thousands of loyal readers who have taken the time to subscribe to your news feed? Surely these are exactly the kind of people who might actually pay a few pounds to get through the paywall!

Google search: Delivering what you want

When you enter something into a search engine, you’ve got a question and you want an answer. It’s a simple premise. Google consistently tries to improve that service – to point you in the right direction, or where possible give you the information you are looking for directly in the search results. No extra click required. Task complete!

Football World Cup 2010

Google has, not surprisingly, pushed out a whole load of helpful onebox results (also referred to as integrated results) for the duration of the World Cup. Searching for “world cup” gives you live scores and upcoming fixtures.

Screenshot Google Onebox World Cup 2010

England group table

Searching for a particular team, such as “England group table”, gives you England’s group table and their upcoming fixtures.

Screenshot Google Onebox World Cup 2010

It’s genius in its simplicity. If you search for “england group table” at the moment, you’re only after one thing. Google knows that, so it shows it. There’s no marketing hullabaloo. There’s no attempt to distract you from the task in hand. No hidden agenda. You had a task to solve and Google solved it.

The answer served on a plate

I’ve discovered three World Cup onebox variations so far, giving specific answers directly on the search result page. I expect Google are keeping an eye on search trends, and we’re likely to see more variations before the tournament is over. (Top scorers perhaps? Red and yellow cards?)

If only more websites tried as hard as Google to help visitors with the tasks they want to complete. Google’s functional simplicity is second to none.

Google’s broken date recognition

I don’t know exactly when it happened (probably an effect of the “May update”; Michael Grey spotted the date problem during April), but Google clearly has some problems with how it currently decides when a page was published.

Trick Google

Simon Sundén pointed out two weeks ago, in this article on his Swedish blog, that it was easy to trick Google into showing any date you wanted in search result pages. Simon suggested that Google was giving extra weight to dates in titles and main headings. But Google’s problems appear to be even more widespread.

Google’s algorithm is currently making some really poor guesses about the published dates of certain articles. Hans Kullin spotted today that, in its search results for old articles from the Swedish newspaper Aftonbladet, Google is replacing the correct dates with incorrect ones based on the day it happens to re-index the page.

Aftonbladet example

Let’s take this Aftonbladet article from March 2008: Bojkotta inte Kina-OS! (“Don’t boycott the China Olympics!”).

Screenshot of Aftonbladet

You can see from the date in the above picture that Aftonbladet are clearly saying that the article was published on the 20th March 2008.

Screenshot of Google SERP

When we search for that article, Google is telling us that it was published on the 27th of May 2010 (yesterday at the time of writing this).

Screenshot of source code

Why, though? Well, the first date that Google reaches when indexing the HTML of that article is indeed the 27th of May (as you can see in the above image). The date the article was published comes further down in the code. In addition, today’s date is repeated a second time towards the bottom of the page.

The most reliable date?

Screenshot of the trigger date in the Aftonbladet menu

Aftonbladet show today’s date at the very top of their left-hand navigation (and beside the search box in the page footer). Google’s currently broken way of establishing when an article was published sees this date and decides that it is the most reliable date on the page.
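Google hasn’t documented how it picks a page’s date, so the following is only a simplified model of the behaviour described above. It compares a naive “first date in the HTML wins” rule with a hypothetical alternative that discounts any date matching the day the page was crawled (the kind of date a site prints in its navigation or footer). The markup is an invented stand-in for the Aftonbladet page, and the YYYY-MM-DD date format is an assumption.

```python
import re
from datetime import date

# Assumption: dates appear as YYYY-MM-DD. Real pages use many formats.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def all_dates(html):
    """Every valid date found in the markup, in document order."""
    found = []
    for y, m, d in DATE_RE.findall(html):
        try:
            found.append(date(int(y), int(m), int(d)))
        except ValueError:
            continue  # e.g. "2010-13-45" matches the pattern but is not a date
    return found


def naive_first_date(html):
    """Model of the broken behaviour: the first date in the source wins."""
    dates = all_dates(html)
    return dates[0] if dates else None


def date_ignoring_crawl_day(html, crawl_day):
    """Hypothetical alternative: skip dates equal to the crawl day, since
    'today' printed in navigation or footers says nothing about publication."""
    dates = all_dates(html)
    older = [d for d in dates if d != crawl_day]
    if older:
        return older[0]
    return dates[0] if dates else None


# Invented markup mirroring the layout described above: today's date in the
# navigation, the real publication date in the article, today again in the footer.
SAMPLE_PAGE = """
<div class="nav">2010-05-27</div>
<h1>Bojkotta inte Kina-OS!</h1>
<p class="byline">Published: 2008-03-20</p>
<div class="footer">Search | 2010-05-27</div>
"""
```

With the crawl day set to 27 May 2010, the naive rule returns 2010-05-27, matching the wrong date in the search result, while the alternative returns 2008-03-20. The alternative has its own obvious failure mode: an article genuinely published on the crawl day that mentions an older date in its text would be misdated.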

Exploiting the problem

Hopefully Google will fix this. Given the weight that recently published content carries, if it isn’t fixed we’re going to see a lot of people exploiting this flaw in Google’s date calculation algorithm to push their old content back up the search result pages.

11 Articles worth reading… (Spotted: Week 19-20, 2010)

Free SEO Copywriting Report

Related to my SEO Checklist/SEO Guidelines for content writers, Brian Clark covers similar ground and gives some good advice in his PDF (although ultimately it’s an advert for their automated product, Scribe).

Blog Title Optimization: 6 Simple Steps for SEO Copywriters

More SEO writing tips, this time from Dan Zambonini on blog post titles. There’s no reason why his advice should be limited to blogs, although he’s missed the chance to optimise the <title> separately, which would make it possible to strike a balance between humans (readability) and machines (findability).

Why use a hierarchical, hyphenated URL structure?

Another good, educational article in LBI’s “FAQ” series. These are good to have in stock to share when someone comes to you with a “why?”. Full marks this time for the use of cheese in the example.

Google Experts Answer your SEO Questions

A gang of 5 Google experts do some straight talking and provide a few to-the-point answers for web managers.

Intranet content manifesto – 2nd draft

An updated Intranet content manifesto. Nice idea: not guidelines or rules, but a manifesto. It’s become increasingly popular to produce such “manifestos” on various topics. It’s a good way to build up some common ground and a feeling of inclusion.

The Generation Gap in Your Office

The Rise of Gen Y (Millenniums) in the Workplace – Your Company’s Communication is About to Change. An American infographic, but the pattern is the same in the UK and Sweden (and many other countries).

A Case Study on Enterprise Microblogging (PDF)

A write-up of the launch of enterprise “microblogging” (i.e. status updates) within a 150-employee company in September 2008 (the study itself covers the period March 2008 to March 2009).

Safe landing – a review of the direct deposit banking experience

In-depth article about direct banking, usability & eye tracking. James Breeze takes a look at landing pages & form completion.

7 ways to improve your call to action

We’re seeing time and time again in eye tracking studies just how little time people spend on landing pages before making a decision. This Conversion Room blog post from Google gives a whole load of tips and further reading.

What iPads and Tablets Mean for Web Analytics

Death of the dashboard and the age of segmentation? We interact with the Internet differently through mobile devices and tablets than we do through “traditional” computers, and this makes understanding visitor behaviour and statistics a whole lot more complicated. Throw in the fact that people “jump” between devices and we’re doomed!

Google Font API & Interview

At I/O, one of the things Google launched was the Google Font Directory. It’s basically @font-face using Google’s resources; it’s nothing revolutionary, but it will be useful from a speed viewpoint.

Page load times and big fat Swedish newspapers

Many major newspapers have notoriously bulky websites. In fact, they are generally some of the most overweight and unhealthy sites on the internet.

Load time matters

Why is this a bad thing? Well, mainly because of slower loading times. People have very little patience for things to happen online, and over a 3G mobile broadband connection loading times take a further hit. If testing the patience of your visitors wasn’t enough, Google has even started taking page load times into account in its search results. You could even argue that larger pages have a larger carbon footprint due to increased CPU usage!

The test

Let’s take a closer look at the major Swedish newspapers. Over a period of two weeks I tested Dagens Nyheter, Svenska Dagbladet, Expressen, Aftonbladet, Sydsvenskan and Göteborgs Posten.

Using the Firebug add-on for Firefox, I recorded how long the start page of each newspaper took to load. Each time I used the same computer, in the same place, connected to the internet via the same 3G mobile internet provider. Before loading each page I emptied Firefox’s cache so that every element of the page had to be downloaded.

The results

The websites of all the Swedish newspapers tested generally weighed more than 2MB and took between 20 and 35 seconds to load (uncached) over a 3G wireless network. Alexa classes all of the newspapers tested as “very slow” and groups them among the slowest 10% of websites on the internet.

Graph average page load time swedish newspapers april 2010

The slowest of the websites was Aftonbladet. On some occasions it was very slow (it has the honour of being the only site ever to take more than 40 seconds to fully load). It was also the website that most often made the fan on my laptop speed up considerably as it battled to cool the processor, due to the amount of Flash video being displayed simultaneously and continuously.

Graph average page sizes swedish newspapers april 2010

An interesting observation was that when reloading pages, the majority of the content was, of course, cached (with the exception of Aftonbladet, which managed to serve up almost 50% new content), but the load time remained almost the same. This was largely down to the sheer volume of requests made to build up the page. In the case of Aftonbladet, its start page normally consists of over 300 requests.
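A rough back-of-the-envelope model shows why hundreds of requests keep a page slow even when almost everything is cached: on a reload the browser still asks the server whether each cached file has changed, and every one of those round trips costs latency regardless of file size. The sketch below is a deliberately crude Python model; the round-trip time, number of parallel connections and bandwidth are illustrative assumptions, not figures measured in this test.

```python
import math


def estimated_load_seconds(requests, rtt_s, parallel, total_bytes, bandwidth_bps):
    """Crude page-load estimate: latency rounds plus transfer time.

    requests      -- number of HTTP requests needed to build the page
    rtt_s         -- network round-trip time in seconds
    parallel      -- requests the browser runs at the same time
    total_bytes   -- bytes actually transferred (close to zero if cached)
    bandwidth_bps -- usable bandwidth in bits per second

    Ignores DNS lookups, connection set-up, rendering and plugin CPU time.
    """
    rounds = math.ceil(requests / parallel)  # requests are served in batches
    latency = rounds * rtt_s
    transfer = total_bytes * 8 / bandwidth_bps
    return latency + transfer


if __name__ == "__main__":
    # Assumed 3G-like figures: 300 ms round trip, 6 parallel connections,
    # 1 Mbit/s usable bandwidth, a 2 MB page built from 300 requests.
    first_visit = estimated_load_seconds(300, 0.3, 6, 2_000_000, 1_000_000)
    cached_reload = estimated_load_seconds(300, 0.3, 6, 0, 1_000_000)
    print(f"first visit ~{first_visit:.0f} s, cached reload ~{cached_reload:.0f} s")
```

Under these assumptions, the first visit comes out at roughly 31 seconds and the fully cached reload at roughly 15. The cache removes the transfer time, but the latency half does not shrink at all, and with a longer round trip or fewer parallel connections it dominates even more.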

During the first week of testing, Sydsvenskan was by far the heaviest of the websites. In the above graphs I have only included Sydsvenskan’s figures from the first week of testing due to the significantly different results during the second week.

New Sydsvenskan

During the second week of testing, Sydsvenskan released a new version of their website. Initially I thought this would be bad for my testing, but it quickly became apparent that page size and loading time had been a specific consideration when building the new site. So instead of disrupting my testing, it gave me an opportunity to see what difference optimising a site for speed could make.

The results were impressive. Sydsvenskan is now the lightest of the Swedish newspapers by a considerable margin. It weighs in at just 43% of the size of Aftonbladet (the fattest and slowest of those tested) and loads twice as fast.

Graph page load time sydsvenskan april 2010

Above the fold content

During the second week I also recorded the time it takes for content above the fold to appear, as in reality we don’t wait until every single part of the page has loaded before we start scanning the page and reading content. During this test, I stopped the timer as soon as the leading story’s headline was visible (even though at times adverts and some other content were already visible). This test showed that the lighter newspapers displayed above-the-fold content three times as fast as the heavier ones. Dagens Nyheter was an exception here and managed to join the thin boys despite its unhealthy BMI.

Graph page load time above fold swedish newspapers april 2010

30 seconds? Goodbye!

Generally people appear to have more patience with newspaper sites than with e-commerce sites. If clicking on “confirm purchase” on your site took 30 seconds you’d be losing a lot of sales, but on a newspaper site people evidently wait for the content to load (or, more likely, start reading text content above the fold long before everything else on the page has loaded).

Flash based adverts

A large part of the bloat on Swedish newspapers’ websites is advertising, in particular Flash-based advertising. The worst offenders are “video” adverts that play automatically when the page loads.

Lighter is better

With lighter, faster, more responsive pages, the newspapers would reduce bandwidth costs, increase the number of page views, and ultimately give their readers a better overall experience. But given the seemingly never-ending focus newspapers place on making advertisers happy rather than their readers, I doubt the (global) trend for heavy, bloated online news sites is going to end soon.

Perhaps Sydsvenskan can be the catalyst for change? Well, perhaps it can be here in Sweden.
