I have been asked to participate in a panel at the annual CERL conference - and to speak for no more than five minutes or so. Initially, I was just going to wing it, but then, in writing up a couple of notes, five minutes' worth of text found its way onto the screen. In the spirit of never wasting a grammatical sentence, the text is below. I probably won't follow it at the conference, but it reflects what I wanted to say.
CERL - British Library, 31 October 2012:
We all know just how transformative the digitisation of the inherited print archive has been. Between Google Books, ECCO, EEBO, Project Gutenberg, the Burney Collection, the British Library's 19th-century newspapers, the Parliamentary Papers, the Old Bailey Online, and on and on, something new has been created. And it is a testament to twenty years of seriously hard graft. But it is one of the great ironies of the moment that the most revolutionary technical change in the history of the human ordering of information since writing - the creation of the infinite archive, with all its disruptive possibilities - has resulted in a markedly conservative and indeed reactionary model of human culture.
For both technical and legal reasons, in the rush to go online, we have given to the oldest of Western canons a new hyper-availability, and a new authority. With the exception of the genealogical sites, which themselves reflect the Western bias of their source materials and audience, the most common sort of historical web resource is dedicated to posting the musings of some elite, dead, white, Western male - some scientist, or man of letters; or more unusually, some equally elite, dead, white woman of letters. And for legal reasons as much as anything else, it is now much easier to consult the oldest forms of humanities scholarship than the more recent and fully engaged varieties. It is easier to access work from the 1890s, imbued with all the contemporary relevance of the long dead, than it is to use that of the 1990s.
Without serious intent and political will - a determination to digitise the more difficult forms of the non-canonical, the non-Western, the non-elite and the quotidian, the materials that capture the lives and thoughts of the least powerful in society - we will have inadvertently turned a major area of scholarship into a fossilised irrelevance.
And this is all the more important because, at the very moment that we have allowed our cultural inheritance to be sieved and posted in a narrowly canonical form, the siren voices of the information scientists - the Googlers, coders and Culturomics wranglers - have discovered in that body of digitised material a new object of study. All digital text is now data, that data is now available for new forms of analysis, and that data is made up of the stuff we chose to digitise. All of which embeds a subtle bias towards a particular subset of the human experience. Using measures derived from Ngrams, topic modelling, natural language processing and TF-IDF similarity, scientists are beginning to use this text/data as the basis for a new search for mathematically identifiable patterns. And in the process, the information scientists are beginning to carve out what are being presented as 'natural' patterns of change, which turn the products of human culture into a simple facet of a natural and scientifically intelligible world. The only problem with this is that the analysts undertaking this work are not overly worried by the nature of the data they are using. For most, the sheer volume of text makes its selective character irrelevant.
But if we are not careful, we will see the creation of a new 'naturalisation' of human thought based on the narrowest sample of the oldest of dead white males. And to this particular audience I just want to suggest that we need to be much more critical about what it is that we digitise and about what we allow to represent the cultures that libraries and collections stand in for, and that we need to engage more comprehensively and intelligently with the simple fact that we are in the middle of a selective recreation of inherited culture.