HTML Examples - Site Maintenance


There is more to maintaining a web site than just making regular updates. In general, the tasks described here need to be done for every site, but the specifics given below apply only to this site. With Apache on a unix server, the statistics are found by examining various log files. In each case, I use a cgi shell script.


404 Errors

I consider this to be the most important statistic because it tracks users having problems finding your pages. I have successfully used this data to find various typos and several browser design problems.

On a unix server (an operating system with case-sensitive filenames) the most common typo is having the wrong case. Since I do most of my testing in Windows (a case-preserving operating system which ignores case in filenames), there were a few errors which slipped through. A regular review of the 404 errors easily identifies these.

Even after all the case-sensitivity problems were fixed, I noticed that there were still quite a few case-related 404 errors. Apparently, users are typing in the urls by hand and, since they are not aware that this site is case-sensitive, they get the case wrong.
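
Checking a failed url against the files that really exist can be sketched as follows. This is only an illustration, not part of 404.cgi: the function name suggest_case and the document root in the example are hypothetical, and the sketch assumes the failing path has already been copied out of the error log by hand.

```shell
#!/bin/sh
# suggest_case DOCROOT PATH
# Print the files under DOCROOT whose path matches PATH when case is
# ignored but differs from it as written (the page the user probably meant).
# Both the function name and its arguments are hypothetical.
suggest_case() {
    docroot=$1
    wanted=$2
    # List every file relative to the document root, strip the leading "./",
    # then keep only the paths that differ from the request in case alone.
    ( cd "$docroot" && find . -type f ) |
        sed 's|^\./||' |
        awk -v want="$wanted" 'tolower($0) == tolower(want) && $0 != want'
}

# Example (hypothetical document root):
#   suggest_case /web/docs technical/html_examples/maintenance.htm
```

If the function prints a path, the 404 was a case error. If it prints nothing, the page is missing for some other reason.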

I also noticed that several urls were failing that had no obvious problems. After a little checking, I discovered that these were browser dependent - i.e. they worked fine with IE 4.72 but failed with Netscape. I went through and fixed these so that they now work with both products.

The following is the code for 404.cgi.

  #!/bin/sh
  #
  # 404.cgi by Robert Clemenzi 1-13-00
  #
  echo Content-type: text/html
  echo
  echo \<html\>\<head\>\<title\>404 Errors\</title\>\</head\>
  echo \<body\>
  echo \<h2\>404 Errors related to clemenzi pages \</h2\> \<hr\>
  echo \<xmp\>
  /bin/tail -50000 /l/apache/logs/error_log | grep clemenzi | egrep -v "(ico|ICO|\.\.)"
  echo \</xmp\>
  echo \</body\>\</html\>

Notes on the code


How Users Find Your Site

How do users find your site? Are they using search engines, links from other pages, or bookmarks?

The Apache server collects these statistics in its log files, and a cgi script queries them and displays the results.

It turns out that most of my pages are found via search engines, mostly www.altavista.com.
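
A query of this kind might look like the sketch below. It is not the script used on this site: it assumes the access log is in Apache's "combined" format, where the referring url is the second quoted field on each line, and the function name count_referrers is hypothetical.

```shell
#!/bin/sh
# count_referrers LOGFILE
# Tally the host names of referring pages, most frequent first.
# Assumes Apache "combined" log format (an assumption, not this site's setup).
count_referrers() {
    awk -F'"' '
        # Splitting on double quotes makes $4 the referrer.
        # "-" means no referrer was sent (bookmark or typed url).
        $4 != "" && $4 != "-" {
            n = split($4, part, "/")        # "http:", "", host, path...
            if (n >= 3) count[tolower(part[3])]++
        }
        END { for (host in count) print count[host], host }
    ' "$1" | sort -k1,1nr -k2
}

# Example (hypothetical log location):
#   count_referrers /l/apache/logs/access_log
```

Each output line is a count followed by a host name, so a search engine that sends many visitors appears near the top.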


What pages link to your site

It's neat to discover who is linking to your site. The trick is to search for pages which link to you and exclude your own pages (which presumably link to each other). Unfortunately, the syntax and results depend on which search engine you are using.

Altavista

  +link:cpcug.org/user/clemenzi -url:user/clemenzi 


Page Count - What Pages People Use

Tracking usage per page tells you which pages people find most often, which, perhaps, indicates which pages are most useful. This data can help you determine which pages to spend the most time updating and which are just using space.

In my case, the pages I write cover topics that I am interested in. However, if a page gets more hits than others, then I know that time spent improving that page is well spent.

There are many ways to count the number of people using your site

I prefer the last choice, letting the server collect the statistics, because it is transparent and does not require me to modify my site. Having 100 pages and having to support an equal number of cgi files makes no sense at all. Embedded counters are executed every time your page is accessed. This not only slows down page access, but is also browser dependent. The only thing that makes any sense is to let the server collect the statistics.
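
As a rough illustration of counting from the server's own log, the sketch below tallies hits per page. It assumes the access log uses Apache's common log format, and it counts only successful GET requests (status 200), which is my assumption here rather than a rule taken from this site. The function name count_pages is hypothetical.

```shell
#!/bin/sh
# count_pages LOGFILE
# Count successful GET requests per url, most requested first.
# Assumes Apache common log format, where whitespace-separated field 6 is
# the quoted method, field 7 the url, and field 9 the status code.
count_pages() {
    awk '
        $6 == "\"GET" && $9 == 200 { hits[$7]++ }
        END { for (page in hits) print hits[page], page }
    ' "$1" | sort -k1,1nr -k2
}

# Example (hypothetical log location):
#   count_pages /l/apache/logs/access_log | grep clemenzi
```

Nothing has to be added to the pages themselves. The counts come entirely from what the server has already recorded.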

A few rules, when counting hits

I run a cgi script as a cron job every day around 3:00 AM. In unix, a cron job is a program that you can schedule to run automatically at a specific time. I picked 3:00 AM because I assumed that the server usage would be low at that time.
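
For readers who have not used cron, a schedule is one line in a crontab file: minute, hour, day of month, month, day of week, then the command. The sketch below builds such a line for a daily run. The function name daily_cron_line and the script path are hypothetical, and the crontab actually used on this site is not shown here.

```shell
#!/bin/sh
# daily_cron_line HOUR MINUTE COMMAND
# Print a crontab entry that runs COMMAND once a day at HOUR:MINUTE.
# Crontab fields: minute hour day-of-month month day-of-week command
daily_cron_line() {
    hour=$1
    minute=$2
    command=$3
    printf '%s %s * * * %s\n' "$minute" "$hour" "$command"
}

# Print the entry for a 3:00 AM run (the script path is hypothetical).
daily_cron_line 3 0 "$HOME/bin/make_stats.sh"

# To install it, append the line to the current crontab:
#   ( crontab -l 2>/dev/null; daily_cron_line 3 0 "$HOME/bin/make_stats.sh" ) | crontab -
```

The three asterisks mean every day of the month, every month, and every day of the week.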

The following script processes the log file and generates an html file which displays the data in a table.

  #!/bin/sh
  # Copy the "Last updated" line from the server statistics page
  grep 'Last updated' /web/statistics/index.html > stats.htm
  echo '<h2><center>Current stats for ...</center></h2>' >> stats.htm
  echo 'Because the log file ...<p>' >> stats.htm
  echo '<table>' >> stats.htm
  # Keep my pages, drop cgi requests and ".." paths, and
  # write one table row per page: hit count, then a link to the page
  grep clemenzi/t /web/statistics/index.html |
    egrep -v "(\.cgi|\.\.)" |
    awk '{print "<tr><td align=right>"$4" <td width=30><td><a href="$6">" \
         substr($6,16,80)"</a>"}' >> stats.htm
  echo '</table>' >> stats.htm

Notes:
Author: Robert Clemenzi - clemenzi@cpcug.org
URL: http://cpcug.org/user/clemenzi/technical/HTML_Examples/Maintenance.htm