Monday, December 11, 2006

XML is not a markup language

XML does not deserve its "ML", or even its "X". But first, the "ML" part.

I am one of the world's leading experts on markup languages. I'll start there. I'm a 20-year veteran of desktop publishing, am personally related to the author of one of the very first markup languages in the world (Scribe), and have actually used SGML, MML, HTML, and most of the other markup languages that came along decades before XML.

So I know what I'm talking about. XML is not a markup language.

A markup language is predicated on the idea that the markup is an exception in a river of text. That is, the markup is a departure from the state that existed at the time the markup was encountered.

One of the first instances of this was the TROFF mechanism in UNIX, used for formatting "man pages". A simple example was that a line that started with .I was italic. So you might format a sentence with an italic word in it like this:

Here is an
.I emphasized phrase
and back to normal text

The same basic approach is used in HTML, except that it's not line-oriented, so you need a "close delimiter" other than carriage return (which is actually a pretty handy closing delimiter, but I digress). So the same thing in HTML is:

Here is an <i>emphasized phrase</i> and back to normal text.

The idea of markup is that you literally mark up a text, "circling" things, if you will, giving instructions to the typesetter (or parser, or other) that this snippet of text is to be treated somehow differently.
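To make the "exception in a river of text" idea concrete, here is a minimal sketch in Python that turns the line-oriented example into the inline HTML form. It is a toy, not a troff implementation: the function name lines_to_html is hypothetical, and the only request it knows is a leading .i (or .I) followed by a space. Every other line is passed through as ordinary text.

```python
from html import escape


def lines_to_html(source: str) -> str:
    """Convert toy line-oriented markup to inline HTML.

    Assumption: the only request understood is a line starting with
    ".i " or ".I ", which italicizes the rest of that line.
    """
    parts = []
    for line in source.splitlines():
        if line[:3].lower() == ".i ":
            # The markup is the exception: wrap just this line.
            parts.append("<i>" + escape(line[3:]) + "</i>")
        else:
            # The default state: plain text flows through untouched.
            parts.append(escape(line))
    # The line break that closed the request becomes an ordinary space.
    return " ".join(parts)


sample = "Here is an\n.i emphasized phrase\nand back to normal text"
print(lines_to_html(sample))
```

Note that input containing no requests at all comes out exactly as it went in, which is the property the rest of the argument leans on.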

Another tenet of a markup language is that only the syntax is specified. What the markup means is implicit (HTML), described earlier (Scribe), or some combination of the two (CSS).

But here's the real kicker: a pure ASCII text file is a valid example of any markup language. That underscores the notion that the markup is a departure from the river of text. So a plain text file is technically a valid HTML file (though they ruined that purity with XHTML and CSS by requiring tags in it, but that's because they too didn't really know what a markup language was).
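The contrast can be checked with Python's standard library, as a rough sketch. The helper names html_text and is_well_formed_xml are made up for this illustration. One caveat: html.parser is a lenient tokenizer, not a validator, so this shows only that an HTML parser reads plain text without complaint, while an XML parser rejects the same input because it has no root element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextCollector(HTMLParser):
    """Collects the character data an HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(source: str) -> str:
    """Feed source to the HTML parser and return the text it found."""
    parser = TextCollector()
    parser.feed(source)
    parser.close()  # flush any text still buffered
    return "".join(parser.chunks)


def is_well_formed_xml(source: str) -> bool:
    """Return True if source parses as a well-formed XML document."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


plain = "Here is a river of text with no markup at all."

print(html_text(plain) == plain)                    # HTML parser: text in, text out
print(is_well_formed_xml(plain))                    # XML parser: rejected
print(is_well_formed_xml("<p>" + plain + "</p>"))   # accepted once wrapped in a tag
```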


  1. Too bad trackbacks don't work with Blogger.

    I understand the ML part of your post and think that I'll understand the X one too.
    But would you please tell us why it sucks?

  2. Wow, in your profile you've got "Creator of iMovie and iPhoto". I hope you weren't responsible for handling XML in Photocasts, because the author of that feature apparently hadn't the slightest clue how XML works.

  3. Does this imply that TeX also isn't a markup language? What about the other MLs that you listed in a later post? Do they all take plain text as valid input?

    I think your criterion for a markup language is a valid one, but not the most important one.

    Anyway, I agree that XML isn't a markup language, but for completely different reasons. It is a meta language, on top of which you can define other languages, which may or may not be markup languages. XHTML has some elements of a markup language, but the (mis)use of (X)HTML documents as "web applications" with AJAX and stuff clearly means that XHTML needs to be much more than just a markup language.

    It would be quite interesting to see a markup language based on XML that concentrates on the markup and explicitly lacks many features of XHTML, though.

  4. Every tool sucks if you do not know how to use it. XML is not a markup language, but a language to define markup languages. You can use it to design good MLs, and you can use it to describe MLs that suck. I agree that the latter are by far more numerous, but that is not so much the fault of XML as of the thousands of lazy developers who spend their time taunting XML rather than trying to grasp its principles.