Archive for June 2009

Brad Scott

Neat dictionaries

Oxford Concise Dictionary of Music on the iPhone

After working on a number of online dictionary and reference projects over the years, it’s always nice to see some neat innovations. OUP have just made some of their wonderful Oxford Paperback Reference titles available on the iPhone. Maybe it’s time I got that gadget.

I was also excited to read the ReadWriteWeb item about Wordnik[1]. It may not have the breadth and currency of other online dictionaries, but it has creatively pulled together a lovely range of supporting materials in a nice user experience. It shows how effectively you can utilise data that is available via APIs from other sources. Its dictionaries include the American Heritage Dictionary, Webster’s (1913) and a few others, but what makes it exciting is the other items: the examples from texts at Project Gutenberg; thesaurus items; Twitter usages; pictures from Flickr; a graphical view of the occurrence of the word over time; etymology; and pronunciation. Users can add their own notes, as well as pronunciation examples. No doubt more funky features will get added. Do other dictionary publishers need to raise their game?

tennis entry in Wordnik
  1. Lardinois, Frederic. “Enamored With Words? You’ll Love Wordnik.” 9 June 2009. ReadWriteWeb.
Brad Scott

Is the semantic web getting easier to do?

Is the momentum building on the whole linked data and semantic web thing?

Finally catching up on some reading, I saw the piece in the Guardian about how Tim Berners-Lee is to help the UK government make its data more easily available online[1]. This can only help raise awareness, not just of how to do it, but that it can work. The Linked Data initiative certainly has some useful material on making it happen, and the spring report from PricewaterhouseCoopers also focuses on the semantic web and how some businesses such as the BBC are now beginning to engage with it.

Last week at the Semantic Technology conference held in San Jose the keynote from Tom Tague of Thomson Reuters’ OpenCalais gave a useful introduction to the trends in this very interesting area.[2] There should be more details about many of the papers and other talks appearing on the conference website soon.

Making a start with the semantic web should be getting easier, as the recent announcement about Google Rich Snippets made clear, though as Richard Padley noted in his blog, Google’s use of RDFa is not completely kosher.[3] In parallel with that development, Common Tag has also opened up an RDFa-based means of achieving decentralised interoperability between tags.[4]
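To give a rough flavour of what this involves (a sketch only — the vocabulary URI and property names here follow the examples Google published at launch, and should be checked against the current documentation before use), Rich Snippets markup in RDFa amounts to wrapping ordinary HTML in typed attributes:

```html
<!-- Illustrative sketch: a person profile marked up with RDFa,
     using the data-vocabulary.org vocabulary from Google's
     Rich Snippets examples. Names and values are placeholders. -->
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <span property="v:name">Jane Smith</span>,
  <span property="v:title">digital publisher</span> at
  <span property="v:affiliation">Example Press</span>.
</div>
```

The appeal for publishers is that the visible page is unchanged; the attributes simply make the existing content machine-readable.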

How far have you got with your engagement with the semantic web? I’d be interested to know to what extent publishers are starting to put a toe in the water.

  1. Arthur, Charles. “Web inventor to help Downing Street open up government data.” 10 June 2009. The Guardian.
  2. MacManus, Richard. “The State of the Market in Semantic Technologies.” 16 June 2009. ReadWriteWeb.
  3. Padley, Richard. “What does Google’s RDFa support mean for publishers?” 18 May 2009. The Discovery Blog.
  4. O’Dell, Jolie. “Common Tag Brings Standards to Metadata.” 10 June 2009. ReadWriteWeb.
Brad Scott

Misplaced apostrophe

Misplaced apostrophe

In Oxford yesterday and couldn’t quite believe this one outside the George Inn in Botley. I had to go back to have another look, and then got snarled at by a cyclist since I was rather distracted by the migrating character. I’ve not seen this particular occurrence before, though maybe someone in the Atrocious Apostrophe’s group on Flickr has.

Brad Scott

Lots and lots of data

I’ve been involved with the publication of products containing fairly large amounts of data for well over a decade now, and finding some old articles of mine made me think about what has changed for publishers who handle such content.

Certainly, the volume of data for individual projects has increased, which in turn has meant that publishers have got a bit better at managing and archiving their data assets, though I wish that were more generally true; valuable data can still be stored in the equivalent of a shoe box with inadequate documentation. Suppliers are generally better (and cheaper) too, not least since they now have more familiarity with the important data standards. Even so, data testing and QA can still be problematic, and that is equally true internally within publishers.

Compared with a decade ago, user requirements and expectations tend to inform data design more, and some publishers certainly have well-thought-out and documented data models that have been constructed with usage in mind. But the technology platform that delivers the content can sometimes be what shapes the data, rather than the user, and that can lead to some ugly and inflexible choices.

Nevertheless, when faced with a new data creation or migration project, there is still an unavoidably large amount of grind and planning required to get it right. That’s what I found so interesting re-reading these ten-year-old articles. Though the delivery technology has changed, the processes and thinking required aren’t very different, and I could have written similar things about many of the projects I’ve worked on since then.

Cover of Asia Official British Documents package

The articles themselves date back to when I was digital publisher at Routledge in the late 90s. One describes the creation of Asia: Official British Documents (1998),[1] which was published with the British National Archives, and comprised 40,000 page images of original archive content plus metadata; and the second focuses on the data of the Calendar of State Papers Colonial series (1999).[2]

The former was mostly an exercise in tracking bits of paper in a database, but the latter was an SGML implementation, drawing on the models of the Text Encoding Initiative (TEI) and the Model Editions Partnership. In the years since then I’ve been extending the TEI for several other projects, such as the New Palgrave Dictionary of Economics and the MLA Handbook, which has meant adding in MathML and the CALS table model. Fundamentally, though, the process for planning and creating the data for these products hasn’t changed much at all.
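For readers who haven’t seen this sort of extension in practice, the shape of it is roughly as follows (an illustrative simplification only — the element names come from the TEI Guidelines and MathML, but the content model and formula are invented for this example, not taken from any of the projects mentioned):

```xml
<!-- Illustrative sketch: a TEI paragraph with an embedded MathML
     formula, of the kind a TEI customisation might permit once
     the MathML module has been added. Content is hypothetical. -->
<p>Demand in this model is given by
  <formula notation="mathml">
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
      <m:mi>q</m:mi><m:mo>=</m:mo>
      <m:mi>a</m:mi><m:mo>&#x2212;</m:mo>
      <m:mi>b</m:mi><m:mi>p</m:mi>
    </m:math>
  </formula>.
</p>
```

The point is that the TEI framework stays intact; the customisation simply declares where the borrowed MathML (or CALS table) elements are allowed to appear.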

  1. Scott, Brad. “Creating an Image Edition of Historical Material: Asia: Official British Documents, 1945-1965.” 1998.
  2. Scott, Brad. “Retrospective Data Conversion in a Commercial Publishing Environment: The Calendar of State Papers, Colonial.” 1999.
Brad Scott

Let the blogging commence

When I was a publisher it felt like I was drowning in information about online publishing, and that was back in the 90s. Since I started my freelance digital publishing activities a few months ago it’s been interesting to see that many publishers still don’t really have enough time for reading about the industry. It can be a surprise to those of us who work on the digital end of things that there are lots of publishers who need some useful pointers and guides through the maze.

So, as I was reading, I started making some notes and passing them on to a few publishing friends. In the back of my mind I knew there was a blog trying to get out, and so here it is. I’ve had a bit of space at last to sit down and get it up and running.

Thanks to everyone who has been feeding my interest and enthusiasm these last few months. Do let me know what you’d like to see here.

Digital publishing consulting

With twenty years' experience in the information industry, and a broad range of activities in the digital/new media sector since 1994, Brambletye Publishing offer invaluable expertise for publishers and other information professionals.