I’ve been involved with the publication of products containing fairly large amounts of data for well over a decade now, and finding some old articles of mine made me think about what has changed for publishers who handle such content.
Certainly, the volume of data for individual projects has increased, which in turn has meant that publishers have got a bit better at managing and archiving their data assets, though I wish that were more generally true; valuable data can still be stored in the equivalent of a shoe box, with inadequate documentation. Suppliers are generally better (and cheaper) too, not least because they now have more familiarity with the important data standards. Even so, data testing and QA can still be problematic, and that is as true inside publishers as it is with suppliers.
Compared with a decade ago, user requirements and expectations tend to inform data design more, and some publishers certainly have well-thought-out and documented data models that have been constructed with usage in mind. But the technology platform that delivers the content can sometimes be what shapes the data, rather than the user, and that can lead to some ugly and inflexible choices.
Nevertheless, when faced with a new data creation or migration project, there is still an unavoidably large amount of grind and planning required to get it right. That’s what I found so interesting re-reading these ten-year-old articles. Though the delivery technology has changed, the processes and thinking required aren’t very different, and I could have written similar things about many of the projects I’ve worked on since then.
The articles themselves date back to when I was digital publisher at Routledge in the late 90s. One describes the creation of Asia: Official British Documents (1998),[1] which was published with the British National Archives and comprised 40,000 page images of original archive content plus metadata; the second focuses on the data of the Calendar of State Papers Colonial series (1999).[2]
The former was mostly an exercise in tracking bits of paper in a database, but the latter was an SGML implementation, drawing on the models of the Text Encoding Initiative (TEI) and the Model Editions Partnership. In the years since then I’ve been extending the TEI for several other projects, such as the New Palgrave Dictionary of Economics, and the MLA Handbook, which has meant adding in MathML and the CALS table model. Fundamentally though, the process for planning and creating the data for these products hasn’t changed much at all.
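To give a flavour of the kind of customization this involves, here is a small illustrative fragment: a TEI-style dictionary entry carrying a MathML island and a CALS-model table. The element names follow the published TEI and CALS conventions, but the structure is a hypothetical sketch, not the actual markup used for the Palgrave or MLA projects.

```xml
<!-- Hypothetical sketch: a TEI-style entry extended with MathML and CALS -->
<div type="entry" xml:id="growth-theory">
  <head>Growth theory</head>
  <p>Output is often modelled as
    <formula notation="MathML">
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>Y</mi><mo>=</mo><mi>f</mi>
        <mo>(</mo><mi>K</mi><mo>,</mo><mi>L</mi><mo>)</mo>
      </math>
    </formula>.
  </p>
  <!-- CALS table model: tgroup/row/entry rather than TEI's native table markup -->
  <table frame="all">
    <tgroup cols="2">
      <thead>
        <row><entry>Symbol</entry><entry>Meaning</entry></row>
      </thead>
      <tbody>
        <row><entry>K</entry><entry>Capital</entry></row>
        <row><entry>L</entry><entry>Labour</entry></row>
      </tbody>
    </tgroup>
  </table>
</div>
```

The point of such an extension is that the content model of the host vocabulary (TEI) stays intact while specialist vocabularies (MathML, CALS) are slotted in at well-defined points, which keeps the data usable by standard tools.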
- Scott, Brad. “Creating an Image Edition of Historical Material: Asia: Official British Documents, 1945-1965,” 1998. http://www.brambletye-publishing.co.uk/consultancy/creating-an-image-edition-of-historical-material/
- Scott, Brad. “Retrospective Data Conversion in a Commercial Publishing Environment: The Calendar of State Papers, Colonial,” 1999. http://www.brambletye-publishing.co.uk/consultancy/retrospective-data-conversion-in-a-commercial-publishing-environment/