I mentioned in the last episode that Internet Explorer was the second-worst thing that ever happened to the Internet. Today I’ll talk about the absolute worst. It’s really a long technical rant that doesn’t matter, but it feels good to let it out. What follows is an underinformed ramble about the scourge that did the most harm to the developing computer network that went on to transform our lives — damage that we still live with today. Without this one corrupting influence, we would have had Internet applications that didn’t suck a decade ago, if not longer. In fact, it was because of this electronic plague that Microsoft was able to cause so much harm with Internet Explorer.
The culprit? The ball and chain that modern technology has dragged along despite its obvious flaws? Hypertext Markup Language, or HTML.
First, let’s start with the name. HTML is not a language. Not even close. It is a document format. That its inventors did not recognize the difference tells you that the wrong guys were doing it.
Second, it’s not a very good document format. At its heart, the inventors wanted a format that did three things: connect related documents, embed external resources (like images) and contain standard formatting information that would be interpreted by viewing software consistently. They were not the only ones developing systems like this; Josten’s Learning invented a similar system when they built the first multimedia encyclopedia for Compton’s New Media. Where Berners-Lee and friends had URL’s, Josten’s engineers created BRU’s, but beyond the initials the function was the same.
I don’t want to be too harsh on Berners-Lee, Cailliau, and the others who grew HTML, but I wish they’d been a little more far-sighted. I say ‘grew’ rather than ‘invented’ because it’s clear that they never sat back and asked themselves “What is a tag? What roles do they perform?” Even now, XHTML, the supposedly more rigorous (if still misnamed) descendant of HTML has fundamental inconsistencies.
For a simple example, take the <br /> tag. It exists because in HTML all whitespace (tabs, spaces, and returns) are mushed together and presented on the screen as a single space. Thus
come out the same on the screen. That’s fine if you know what’s going on. But what if you want to put in a line break or a space? Well, for a space you add a special character code and for break you add a tag <br />. Why is one a character and one a tag? Because on the day HTML’s inventors decided they needed line breaks, a tag seemed like a good way to go, even though semantically it had nothing to do with the roles of other tags. It could just as easily been &br; or something like that. That’s how HTML grew up. And thus the World Wide Web was born.
Another fundamental flaw is that the content (what to display) is all mixed up with the presentation (how to display it). What if you want to show the same document in different formats? Nope. While some tags were geared toward identifying the type of content that they enclosed (like the <p> tag), others were direct formatting instructions (like the <i> tag). This inconsistency in the role of tags in a document is a reflection of the organic (and sloppy) way that HTML was grown.
I really can’t blame the inventors of HTML for what came next. Everyone started using it. Everyone. The flaws and inadequacies of the format quickly became apparent. Different document viewers (browsers) rendered things differently. Formatting options were extremely limited. The systems were vulnerable to abuse by unscrupulous people. Right then, there was a chance for people to say, “hold on a second! Let’s take the idea of HTML and apply the lessons we’ve already learned in other branches of computing, and make something that doesn’t suck.”
Rather than scrap HTML, browser makers and others set out to fix it. That was the Big Mistake. After twenty years of tweaking and bickering and incompatible extensions introduced by browser manufacturers and squabbles and lawsuits, HTML has been upgraded from awful to poor. Along the way, companies like Adobe and Macromedia thought to get their technology adopted as a replacement to HTML (the Web in pdf? Interesting…) but those efforts were doomed from the start because they did not provide free, simple tools to create the content.
HTML’s greatest shining virtue (and it’s an awesome one) is that it’s accessible to anyone who can type. Anyone. No special tools required.
So, now we have style sheets to help separate content and presentation, XHTML to fix some of the semantic craziness of HTML, and browsers are finally starting to agree on what all the formatting instructions actually mean. We could have had that fifteen years ago if people had just let go of HTML, but here we are now, with an almost-functional system. There are still plenty of flaws, however. Things that seem so normal now that we don’t even think about how dumb they are.
Take this blog, for instance. It’s a pretty well-built Web application, based on reasonably up-to-date practices. Yet were you to click the comment link at the bottom of this episode, you would go to a new page. On that new page the browser would reload the same header and the same sidebar it just erased. What a waste! Why does it do it? Because that’s how HTML (and HTTP, the underlying part that communicates with servers) works. There have been abortive attempts to fix that over the years, but they have all been flawed. Now, at long last, techniques have been developed to overcome that problem, but they are not quite ready for prime time yet. For one thing, they are very complicated, and for another they rely on browsers working just right. Why was it so hard to implement? Because at its core the Web was not made that way.
Even in the days when almost everyone was on dialup (except the people inventing HTML), no one stopped to say, “hey, let’s make a way to only update the content that changes.” That problem has now been ‘solved’ by adding a new layer of complexity on Web sites. By adding this layer (on top of CSS and so forth), we get sensible Web applications at last, but we take away the one super-cool thing about HTML. It is no longer a simple format that can be harnessed by anyone with a text editor. We have lost the attribute that was the only reason to keep HTML around in the first place.
So now we have a system that is both inaccessibly arcane and flawed. Yay!