The Web is not an Attic
The End column, the Australian Industry Standard
Issue 9, 11 September 2000
by Neale Morison
There has been talk of content management on the Internet. The idea appears to be that you buy someone's content management application, for a quarter of a million bucks, spend another couple of hundred on development, and all your content problems vanish like beer from the fridge. But the stark, terrifying facts are these: even if you use content management effectively, you only use it for adding content. If you wanted to use it to retire content you'd have to recruit content managers capable of saying "Delete" not only on their own behalf, but on behalf of their management, their users and the entire population of the Internet. Persons with such fortitude, decisiveness and vision already have jobs where Tommy Lee Jones or Sigourney Weaver plays them in the movie version. The likelihood of finding such a person through any online job network is equal to the inverse of the number of pages on the Internet that are illegible because of the background [2].
The most a content management system will do is archive content, so it's still accessible just in case you need it.
The devastating difference between the attic and the Web is this: Attics don't have search engines.
Two men in dark suits arrive at the door holding up badges you can't read. "One side, sir, we have a warrant to search your attic."
You can't even remember how incriminating that stuff is. Embarrassment on steroids. There are kindergarten reports describing your attitude to toilet training. There are creative pieces you wrote at eight with too many adverbs. There are photographs of you in seventies fashions, worn with no sense of irony. There's the letter you wrote to your boss threatening to report his human rights abuses to the UN, but didn't send. And the letter you wrote to the UN. Consider them sent. Consider it all published.
You fall to your knees pleading, you claw at their trouser cuffs, but they stride past.
Google claims to have indexed 1,060,000,000 web pages. Searches are fast - less than a second. But Google hasn't just indexed those pages. It has cached them. Somewhere in its server farm Google has copies of every page it has ever indexed. It's very convenient, because when someone moves the indexed page, you can still find the content you were looking for in Google's cached version.
I think I'll just delete that page on my site that the legal department has flagged as likely to cost us ten million in damages. Nobody will know it was ever there. Oh oh. The Google spider has been through.
Google isn't the only one. These search engines have been amassing indexes, and saving at least part of the content of pages they have indexed, since the early nineties. If you have ever posted to a newsgroup or email list, if you have ever entered information in a Web guest book, it can be found. Your prospective employer can find it. Your nosy neighbour can find it. Your web-savvy honey-bunny can find it.
Don't try republishing anybody's material on your Web site without obtaining permission. They can find it straight away. Not only that, they can assess the damage you've done by using the Google advanced search page to find out how many other sites link to yours.
But it gets worse. Search engines are starting to index the results of other search engine searches. AnswerSleuth caches its search results, so if anybody has ever performed a search that has found your material, and another search engine has been through and spidered those results, then there's a handy cross-reference to your material.
World intelligence agencies: roll up your spy networks, down your surveillance satellites, forget your phone taps and rent out your interrogation rooms.
Internet terrorists: Fun idea for your next project - find some sensitive information and publish it on the Web.
1 - The Web content bomb - spurious speculations - Grabbit and Poole, section 243.12.3
2 - ibid., section 584.2