University of Bristol | ILRT | IntDev blog


This is a blog from the Internet Development Team at ILRT, Bristol. We build websites and web applications for a wide variety of customers, many in the UK higher education sector.

A closer look at content migration

Redesigning your existing website is an exciting time. Kieren Pitts takes a look at one aspect of the project that is often given only cursory attention: content migration.

How hard can it be?

Clients, perhaps not unreasonably, assume that all their existing web content can be moved to a new website with the minimum of time and effort.

Unfortunately, even if content can be moved easily, moving everything without careful consideration can be a false economy. In this article we explore content migration and why it is important to plan the move carefully.

Plan your move

The key to a successful migration is planning. The main steps, in suggested order of completion, are:

  1. Create a list of all your existing content
  2. Determine what content should be moved
  3. Review and update the content you plan to move
  4. Move the content to a development version of your new site
  5. Set up redirects and fix broken links
  6. Launch the new site

We will look at the first five steps in more detail during the rest of this article.

Listing existing content

It may seem an odd question to ask, but “do you know what content actually resides on your website?”

On any site with multiple authors (or websites that have had a succession of authors) it is likely that no one person knows the full extent of the website. Even if you think you do you still need a list of that content so you can plan the migration.

It may be that your existing Content Management System (CMS) will generate a list of all the content for you. If you don’t use a CMS or yours doesn’t produce a list then you can use tools such as Xenu to create a list. We can also do this for you as part of our content services.
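If you would rather script the inventory yourself, the idea behind a link-crawling tool can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a proper crawler: the names `LinkExtractor`, `inventory` and the caller-supplied `fetch` function are all assumptions made for this example, and the sketch only finds pages that are reachable by following links from the start page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def inventory(start_url, fetch):
    """Breadth-first crawl of a single host; returns a sorted list of page URLs.

    `fetch` is a hypothetical caller-supplied function that takes a URL and
    returns the page's HTML as a string, or None if the page is unavailable.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        found.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and visit each URL only once.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(found)
```

In practice `fetch` might wrap an HTTP client; passing it in keeps the crawl logic easy to try out against a small in-memory copy of a site first. Orphaned pages that nothing links to will not appear in the list, which is one reason a CMS export, where available, is the better starting point.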

Determine what content should be moved

Is it a good idea to move everything from an old site to a new site? Not always.

One simple analogy is moving house. Moving everything to your new house without having a clear-out first is time-consuming and costly.

This applies to websites too. It is likely that some content on your website is out of date or no longer useful.

What content should I remove?

You can determine what content is not useful in several ways (we can also do this for you or otherwise assist in the process as part of our content services):

  1. Site traffic – you can use server log analysis or Google Analytics to work out when a piece of content was last accessed. You might then decide that pages that have not been accessed by an external user (remember to exclude automated bots and your own traffic from your analysis) in, say, the last year should not be moved to the new site.
  2. External links – by using tools such as Google Webmaster Tools you can determine what content is linked to from other websites. If content has not been linked to (and isn’t being accessed) then it may be a candidate for removal.
  3. Task analysis – traffic and external link analysis can identify some content to remove but your existing website may have shortcomings that obscure the value of some content. For example, confusing navigation and a lack of site search might mean users cannot find relevant content even though it is present. This is where user task analysis comes in. Task analysis allows you to identify groups of users and what they want to achieve on your site. You can also find out how users expect to navigate to key information by doing some usability testing.
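The site traffic check in the list above can be sketched in Python if you have access to the raw server logs. The example assumes the logs are in the common or combined log format written by many web servers; the bot detection is a deliberately crude substring test, and the names `last_access`, `stale_paths`, `BOT_HINTS` and `own_ips` are invented for this illustration.

```python
import re
from datetime import datetime

# Matches common/combined log format lines; the user agent is optional.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)

# Crude, assumed list of substrings that suggest an automated visitor.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def last_access(lines, own_ips=()):
    """Map each successfully served path to the time of its latest human visit."""
    last = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["status"] != "200":
            continue
        agent = (match["agent"] or "").lower()
        # Exclude your own traffic and anything that looks like a bot.
        if match["ip"] in own_ips or any(hint in agent for hint in BOT_HINTS):
            continue
        # Month names are parsed using the current locale (English assumed).
        when = datetime.strptime(match["when"], "%d/%b/%Y:%H:%M:%S %z")
        path = match["path"]
        if path not in last or when > last[path]:
            last[path] = when
    return last


def stale_paths(all_paths, last, cutoff):
    """Return paths never visited, or last visited before `cutoff`."""
    return sorted(p for p in all_paths if p not in last or last[p] < cutoff)
```

Feeding `stale_paths` the full content list from the inventory step matters: pages that were never requested do not appear in the logs at all, so they can only be found by comparing against that list. Treat the result as a list of candidates for review rather than a list of pages to delete.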

Always have a safety net

Once you have identified content that you do not plan to move, take a copy of it in case it is requested later.

Review and update content first, move second

Having identified the content to move to your new site, avoid moving all the content to the development version of the new site and then reviewing it.

By moving the content before the review you run the risk of:

  1. Not reviewing the content – If you move your content before you review it you may forget to review it (or run out of time). Also, if regular visitors go to your new site and see the old content in a new design it could create a bad impression.
  2. Missing the opportunity to rethink the content – You should ensure content is presented in the best way possible. For example, a single page listing the contact details for staff might be more useful if implemented as a search to allow users to find someone’s details quickly.

Moving content – can this be automated?

Clients often assume there are readily available tools that will migrate content from one website to another. Unfortunately there is no generic solution, and time and money are usually required to develop a bespoke migration tool. However, some websites and content lend themselves more easily to automated migration. We can identify content that could be migrated automatically and, for content that cannot, we offer content editing and manual migration services.

There are also good reasons why automation may not be the best approach:

  1. Quality issues – if your site is old (or implemented poorly) the code used to structure the page content (headings, lists etc.) may not adhere to the formal specifications for HTML (the language used to write web pages) or current best practice. In addition, there are several different versions of HTML and a site with code written to conform to the specification for HTML 3 would not conform to modern coding standards such as XHTML. Moving this content without editing the code might result in display problems.
  2. Technology changes – over time, the specification for computers used by the majority of your users gets better. For example, in 2002 it was likely that most users visiting your site did so using a monitor set at a screen resolution of 800 × 600 pixels. In 2009, the average resolution is 1280 × 1024. So, an image with dimensions of 400 × 300 would have filled 25% (based on total number of pixels) of the screen area in 2002. The same image on your site today will fill less than 10% of the screen area. Moving this file to the new website might be a false economy and time would be better spent replacing the image instead.
  3. Coping with ambiguity – writing a content migration tool is easy when the existing content follows a clear structure. Writing tools to cope with variation in structure or code is difficult. Where a lot of variation exists in the mark-up, the time/money is arguably better spent on a good content editor to review and move the content by hand.
  4. Fashions change – the tone of the old content might not fit the new site, or people’s understanding/expectations have changed since the content was written. For example, does your existing site refer to old web browsers (Netscape or Internet Explorer 6 for example) or use old-fashioned phrases such as “electronic mail addresses”? Moving content automatically might mean that these issues are overlooked.
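The screen area figures quoted under “Technology changes” above are easy to check. The short Python sketch below reproduces the arithmetic using the image and screen dimensions from the text; the function name `screen_share` is made up for this example.

```python
def screen_share(image, screen):
    """Return the fraction of the screen's pixels covered by the image."""
    image_width, image_height = image
    screen_width, screen_height = screen
    return (image_width * image_height) / (screen_width * screen_height)


share_2002 = screen_share((400, 300), (800, 600))
share_2009 = screen_share((400, 300), (1280, 1024))

print(f"2002: {share_2002:.1%}")  # 2002: 25.0%
print(f"2009: {share_2009:.1%}")  # 2009: 9.2%
```

The same check can be run against the images in your content inventory to flag those that are likely to look too small on the new site.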

The forgotten steps – redirects and link checking

In an ideal world online content would always be relevant and cool URIs would never change. Unfortunately, few websites are built with web address/URI longevity in mind, which is a shame and poses problems for users and content maintainers.

Restructuring or removing content causes problems and these are often overlooked when sites are moved.

For example, if you move:
http://www.example.com/about.html
to
http://www.example.com/about/
then you break all links to the original location.

This causes several problems:

  1. Broken links are frustrating for users, reduce trust and make your site look unprofessional.
  2. Google’s ranking algorithms are built to assess the presence, structure and persistence of external links to your content. If you move/remove content and break incoming links then you’ll affect your site’s ranking within search results.

There are simple steps to avoid these problems:

  1. If you’re moving content then put in a 301 redirect on the server so that users (and search engine robots) following the old link will be automatically taken to the new location. Don’t do this with a client side meta refresh.
  2. If you must remove content, and there is no replacement, show users a page explaining why the content was removed – this is much better than letting users receive the default 404 file not found error.
  3. Update all links within your site to reflect the new content structure. Don’t rely on the 301 redirects when it comes to internal links.
  4. Having corrected all the internal links to the old content, link check your site using a tool like Xenu or htcheck. If these tools find broken links or internal redirects (an internal link that results in a server redirect) then fix them.
  5. After “go live” review your logs and use Google Webmaster Tools to spot broken links from other websites. Fix these using server side 301 redirects as this is more effective than asking other webmasters to correct their links.
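Redirects are normally configured in the web server itself, but the behaviour described in the first two steps above can be illustrated with a small Python WSGI wrapper. This is a sketch under stated assumptions: the `MOVED` and `REMOVED` tables, the `/events-2005.html` path and the function name `redirect_middleware` are hypothetical, and answering with “410 Gone” for retired pages is our choice for the example rather than a requirement.

```python
# Old location -> new location (the /about.html example from the text).
MOVED = {
    "/about.html": "/about/",
}

# Retired pages with no replacement, and the explanation shown to users.
# The path and wording here are hypothetical.
REMOVED = {
    "/events-2005.html": "This page listed events from 2005 and has been retired.",
}


def redirect_middleware(app, moved=MOVED, removed=REMOVED):
    """Wrap a WSGI app so old URLs are answered before the app sees them."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path in moved:
            # Server-side permanent redirect, followed by browsers and robots.
            start_response(
                "301 Moved Permanently",
                [("Location", moved[path]), ("Content-Length", "0")],
            )
            return [b""]
        if path in removed:
            # Explain the removal instead of showing a default error page.
            body = removed[path].encode("utf-8")
            start_response(
                "410 Gone",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        return app(environ, start_response)

    return wrapped
```

Keeping the old-to-new mapping in a single table has a further benefit: the same table can be used to search your own pages for internal links that still point at old addresses, so that they can be updated rather than left to rely on the redirect.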

Conclusions

Clients often see content migration as a simple task that should be automated. However, the more time spent reviewing and updating your content before the move, and always keeping your users in mind, the better the resulting website. We have content editors who can assist with migrating content and developing new, professional and compelling content as part of our content services.

Above all, you want your new site to work for your users. They will always appreciate good quality, concise content that is easy to find. Make sure you factor in some time to focus on the quality of the content as part of any new website project.

Kieren Pitts – Senior Analyst/Programmer

This entry was posted on 11th December 2009 at 8:48 am and is filed under Briefings.

2 comments

  1. Good article, thanks.

    Could you please give me more details on how to determine when a certain webpage was last accessed in Google Analytics? I can’t determine how to do it and think it would be very valuable to know when making a decision on what content stays and what goes.

  2. Hi

    Thanks for the comment. Ideally you’d analyse the server logs as the process is a lot quicker (especially if you have to do many pages).

    If you do have to use Google Analytics then go to your main dashboard page for the site. Select “Content” in the side navigation bar, then select “Site content”. In the main part of the page look for a search box; it sits below the main graph and above the list of pages. In the search box type in the URL of the page you are interested in. For example: /about-us/

    The reporting will then be restricted to just that page. You can then extend the time period (using the box at the top of the page) to find out when the page was last accessed.
