 
I am frequently overly enamoured of the idea of where we might get to, forgetting that there are a lot of people still getting used to where we’ve been. I was forcibly reminded of this by Carole Goble on the weekend when I expressed a dislike of the Utopia PDF viewer that enables active figures and semantic markup of the PDFs of scientific papers. “Why can’t we just do this on the web?” I asked, and Carole pointed out the obvious: most people don’t read papers on the web. We know it’s a functionally better and simpler way to do it, but that improvement in functionality and simplicity is not immediately clear to, or in many cases even useable by, someone who is more comfortable with the printed page.
In my defence I never got to make the second part of the argument, which is that with the new generation of tablet devices, led by the iPad, there is a tremendous potential to build active, dynamic and (under the hood, hidden from the user) semantically backed representations of papers that are both beautiful and functional. The technical means and the design basis to suck people into web-based representations of research are falling into place, and this is tremendously exciting.
However, while the triumph of the iPad in the medium term may seem assured, my record on predicting the impact of technical innovations is not so good, given the decision by Google to pull out of further development of Wave primarily due to lack of uptake. Given that I was amongst the most bullish and positive of Wave advocates and yet I hadn’t managed to get onto the main site for perhaps a month or so, this cannot be terribly surprising, but it is disappointing.
The reasons for lack of adoption have been well rehearsed in many places (see the Wikipedia page or Google News for criticisms). The interface was confusing, there was a lack of clarity as to what Wave was for, and the amount of user contribution required to build something useful was simply too great. Nonetheless Wave remains for me an extremely exciting view of the possibilities. Above all it was the ability for users or communities to build dynamic functionality into documents and to make this part of the fabric of the web that was important to me. Indeed one of the most important criticisms for me was PT Sefton’s complaint that Wave didn’t leverage HTML formatting, that it was in a sense not a proper part of the document web ecosystem.
The key for me about the promise of Wave was its ability to interact with web based functionality, to be dynamic; fundamentally to treat a growing document as data and present that data in new and interesting ways. In the end this was probably just too abstruse a concept to grab hold of a user. While single demonstrations were easy to put together, building graphs, showing chemistry, marking up text, it was the bigger picture that this was generally possible that never made it through.
I think this is part of a bigger problem, similar to the one we experience in trying to break people out of the PDF habit: we are conceptually stuck in a world of communicating through static documents. There is an almost obsessive need to control the layout and look of documents. This can become hilarious when you see TeX users complaining about having to use Word and Word users complaining about having to use TeX for fundamentally the same reason, that they feel a loss of control over the layout of their document. Documents that move, resize, or respond really seem to put people off. I notice this myself with badly laid out pages with dynamic sidebars that shift around, inducing a strange form of motion sickness.
There seems to be a higher aesthetic bar that needs to be reached for dynamic content, something that has been rarely achieved on the web until recently and virtually never in the presentation of scientific papers. While I philosophically disagree with Apple’s iron grip over their presentation ecosystem I have to admit that this has made it easier, if not quite yet automatic, to build beautiful, functional, and dynamic interfaces.
The rapid development of tablets that we can expect, as the rough and ready but more flexible and open platforms do battle with the closed but elegant and safe environment provided by the iPad, offers a real possibility that we can overcome this psychological hurdle. Does this mean that we might finally see the end of the hegemony of the static document, that we can finally consign the PDF to the dustbin of temporary fixes where it belongs? I’m not sure I want to stick my neck out quite so far again, quite so soon, and say that this will happen, or offer a timeline. But I hope it does, and I hope it does soon.


I see a few things holding back web-based alternatives to the paper model, actually. One of the most frustrating to me has to be the complete lack of widespread math support as extensible and as mature as LaTeX. I feel like having LaTeX-grade math support in a web-based platform would do a lot for adoption, as it would broaden the range of fields that could benefit rather drastically.
Sure, there's some LaTeX support mixed here and there in blog engines, Wikis and similar, but from what I've seen, they tend to suffer a lack of extensibility and rely on bitmap images that don't rescale properly with the text, much less fit in typographically. MathML was supposed to deliver us from this, but between half-assed browser support and an overly verbose syntax that is unsuitable for “by-hand” use, it seems to have fizzled by comparison to LaTeX-based solutions.
A distraction from Wave, to be sure, but I think that the document model will persist in the sciences at least as long as LaTeX has a monopoly on the typesetting of mathematics, frustrating implementation of Wave-like solutions.
Indeed, maths is a serious problem on the web, and something that has definitely made people very keen on TeX. Again though my perception is that alternatives have not got traction because they are not seen as being as “pretty” as TeX. Getting TeX/MathML support into Wave was something that @axiomsofchoice and others put quite a bit of effort into and there were some nice examples but they were proof of concept not ready for the mainstream.
My feeling is that what we will see is systems that enable TeX-based authoring of maths, which then gets converted to MathML (which is really a semantics and layout tool and not much good for authoring) and displayed nicely but flexibly. The first bit exists but I'm not sure that the rendering is there yet. Ultimately it will probably be possible to just directly write the maths down and have it recognised and converted to MathML.
As someone who has spent a bit of time trying to write LaTeX maths plugins for various webapps, I feel a lot of frustration about this. The tools to do this extremely well have been around for a fairly long time, but most of them are directed towards converting complete LaTeX documents into standalone webpages and it is not easy to adapt them to plug into other systems, especially if you are trying to maintain compatibility with basic webhosting accounts. As a result, almost all LaTeX plugins known to me are hacks and they usually have fairly serious bugs because people only test them on a few simple equations rather than writing out long mathematical expositions.
The few systems that do handle mathematics very well are usually built with the idea of mathematical formulas being first class objects from the outset, rather than trying to build maths support into an existing system built by people who have no idea about the needs of mathematical scientists. One of the best I know of is http://cnx.org/ which will let authors submit LaTeX and convert it into very good HTML+MathML automatically. Basically, I think we should build more systems like this, i.e. wiki and blog engines with ground level maths support, rather than trying to hack existing systems.
On the rendering front, I am hopeful that adoption of MathJax http://www.mathjax.org/ will greatly improve the situation. This is not an issue that we can sit back and wait for the browser developers to deal with, since they have clearly demonstrated that MathML support is a marginal concern even for those browsers that proudly tout their “standards compliant” status. The situation is rather more like that of JavaScript, where people are resigned to browser inconsistencies, but handle them via libraries like jQuery rather than waiting for the browser developers to do something about it.
Cameron, I'm not quite sure I “get” the theme of this post, but I'll confine myself for a minute to Utopia Document. What bothers me here is the very strongly different nature of the tool. Compare this paper, for example: Ruthensteiner, B., & Hess, M. (2008). Embedding 3D models of biological specimens in PDF publications. Microscopy research and technique, 71(11), 778-86. doi: 10.1002/jemt.20618. This doesn't require a different viewer, but lets you view the models in PDF. It seems to work quite well for me.
I'm keen in principle on the idea of scientific articles migrating from PDF as default to HTML as default. In theory, with a little added RDF etc, we could do wonderful things, embedding supplementary data. However, in practice the result is pretty much rubbish even for quite ordinary articles. It seems to be very hard to get figures and tables to look right and be readable in context. And worse, there doesn't seem to be a standardised way of saving a packaged web page so that it can be left for years and then read even when the originating site has gone. Safari's .webarchive format is able to do that, but is not readable by anything else as far as I can see. The standard “save html” format seems to be a .html file and an associated directory of other stuff; hardly handy, even if in practice several browsers seem to be able to read it!
I've just read Brian Kelly's post on formats for repositories, in which he talks about the EPUB format (see http://ukwebfocus.wordpress.com/2010/08/04/epub… for the post). He says:
'EPub is described in Wikipedia as “a free and open e-book standard by the International Digital Publishing Forum (IDPF)”. The article goes on to add that “EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale.”
'In terms of the open standards used EPub consists of three specifications:
- Open Publication Structure (OPS) 2.0, contains the formatting of its content.
- Open Packaging Format (OPF) 2.0, describes the structure of the .epub file in XML.
- OEBPS Container Format (OCF) 1.0, collects all files as a ZIP archive.
'The article states that “EPUB internally uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled in a zip file as a packaging format.”'
It sounds much more “of the web” than PDF, and might be very interesting!
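To make the packaging described in the quote a little more concrete, here is a minimal Python sketch of the container layer: a ZIP archive holding a `mimetype` entry, a `META-INF/container.xml` pointer, and the XHTML content. It only illustrates the idea and is not a validated EPUB writer. The function name `build_minimal_epub`, the `OEBPS/` folder and the `content.opf` file name are my own assumptions for illustration, and the caller is assumed to supply a suitable OPF package document.

```python
import io
import zipfile

# container.xml tells a reading system where the package (OPF) file lives.
# The path "OEBPS/content.opf" is an illustrative choice, not a requirement.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_minimal_epub(opf_xml, chapters):
    """Bundle an OPF document and XHTML chapters into EPUB-style ZIP bytes.

    opf_xml  -- text of the package document (assumed to be supplied by the caller)
    chapters -- mapping of file name to XHTML text (hypothetical names)
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The container format expects "mimetype" to be the first entry
        # and to be stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf_xml,
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, xhtml in chapters.items():
            zf.writestr("OEBPS/" + name, xhtml,
                        compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing the returned bytes to a file ending in `.epub` gives something any ZIP tool can open and inspect, which is rather the point: the content inside is ordinary XHTML and CSS rather than an opaque page description.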
I love jsMath and MathJax, but there's a peculiar problem I've run into with using them. Unless I'm terribly mistaken, both require a few files to be placed on the same domain as a page using the toolkits. This makes it hard to use them with, say, Blogger or WordPress (running on a shared server, anyway). Since blogs form an important channel for discussion, I would have hoped that this problem would be tractable, but so far, I've not run into a good solution. That said, I wholeheartedly agree that such toolkits are a good way forward, as they allow for LaTeX to be used as an intermediate, author-facing language, and yet retain the benefits of standards-compliant web technologies.
It is fairly straightforward to configure a server on which MathJax is hosted so that it can be used from pages hosted in another domain. I understand not everyone would be willing or able to do this on a server, but the installation at mathjax.org has been configured in this way precisely to support people using Blogger. See https://sourceforge.net/projects/mathjax/forums… for details.