We host scores of different websites for our clients, using a raft of different technologies. In fact, often the best solutions to our clients’ needs come from blending different technologies to deliver a single site. For example, a site might contain some simple web pages managed using a Content Management System (CMS), a large bank of questions and answers delivered from a database and some location information presented using Google Maps. However, all this technology comes at a cost – how do we make sure that all parts of your website are always working correctly?
Ensuring your website runs smoothly
One of the simplest ways to monitor your website is to visit it. Since it’s not practical to do this ourselves, we set up a monitoring system (Nagios) that does this for us. We know what we would expect to find on specific parts of your site, so we configure the monitoring system to visit these areas. We make sure that these areas are representative of all the different technologies on your site. The monitoring system visits these every few minutes and compares what it finds with what it expects to find. If there’s a discrepancy, it alerts us so that we can investigate further.
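To make the idea concrete, here is a minimal sketch in Python of the "visit and compare" check described above. It is not our Nagios configuration; the `Check` class, the `run_checks` function and the example URLs are hypothetical names chosen for illustration, and a real monitoring system does considerably more (scheduling, retries, notifications).

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.request import urlopen


@dataclass
class Check:
    url: str       # page to visit (the URLs used with this sketch are made up)
    expected: str  # text we expect to find on that page


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a page body as text; network errors propagate to the caller."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def run_checks(checks: List[Check],
               fetcher: Callable[[str], str] = fetch) -> List[str]:
    """Return one alert per page that is unreachable or lacks its expected text."""
    alerts = []
    for check in checks:
        try:
            body = fetcher(check.url)
        except Exception as exc:
            # A page we cannot reach at all is also a discrepancy.
            alerts.append(f"{check.url}: request failed ({exc})")
            continue
        if check.expected not in body:
            alerts.append(f"{check.url}: expected text {check.expected!r} not found")
    return alerts
```

Choosing one check per technology (a CMS page, a database-backed answer, a page with a map) is what makes the set representative: if any one layer fails, at least one check stops matching.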
Of course, it’s more helpful to spot potential problems before they arise. To do this, we monitor all sorts of information about the systems that host your website (using tools such as Munin and MRTG), to make sure they can meet the demand placed upon them and to spot anomalies as they occur. So, for example, this graph was generated by one of these tools and shows the number of requests made over the past few weeks to a CMS that we host. Whilst the details aren’t important, you’ll see that there’s a large ‘spike’ in the middle of week 50.
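A spike like this is easy to see on a graph, and a simple rule can flag it automatically. The following is only a sketch of one such rule, not how Munin or MRTG work internally: the function name `find_spikes` and the threshold factor of 3 are assumptions made for illustration.

```python
from statistics import median
from typing import List


def find_spikes(counts: List[int], factor: float = 3.0) -> List[int]:
    """Return the positions of samples that exceed `factor` times the median.

    `counts` is a series of request counts, one per sampling interval.
    The median is used as the baseline because a single large spike
    barely moves it, unlike the mean.
    """
    if not counts:
        return []
    threshold = median(counts) * factor
    return [i for i, count in enumerate(counts) if count > threshold]
```

For example, `find_spikes([100, 110, 95, 105, 900, 102])` flags only the fifth sample (index 4), since the median of the series is 103.5 and the threshold is therefore 310.5.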

Coincidentally, at this time we were trialling Splunk, software which collects and correlates data from all our different systems in one place. This helped us to rapidly trace the problem to a third party, whose system was inadvertently requesting content from the CMS several times a second and thus slowing down the site. We contacted the third party, who were able to address the glitch in their system. As you can see from the graph, the traffic was back to normal the following day.
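The kind of question we were asking of the collected data was, in essence, "who is making all these requests?". As a rough sketch of that idea (this is plain Python, not Splunk’s query language or API), the following counts requests per client in web server log lines. It assumes the client address is the first whitespace-separated field of each line, as in the common access log format; the function name `top_clients` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_clients(log_lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the `n` clients with the most requests as (client, count) pairs.

    Assumes each non-empty line starts with the client address.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)
```

Run over the log lines from the period of the spike, a single client far ahead of all the others is a strong hint about where to look next.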
Collecting all these data serves other purposes as well. For example, we can analyse the performance of your website and identify potential areas for improvement. Once these improvements are made, we can then confirm that they’ve had the desired effect. In fact, there’s a whole host of techniques we use when trying to squeeze the best possible performance out of your site and we’ll take a glimpse at some of these in a future posting.
This entry was posted on 15th January 2010 at 10:16 am and is filed under Briefings.
