Institute of Historical Research
Text Mining the Old Bailey Proceedings
Discussion
Professor Tim Hitchcock
(Hertfordshire)
The Old Bailey Online is probably one of the most successful web-based projects produced in Britain thus far. Based on the Proceedings of London’s central criminal court, it is a fully searchable edition containing some 197,745 criminal trials that detail the lives of non-elite people. Tim Hitchcock, one of the project’s originators, is exploring how text mining tools can be used to examine the Proceedings and discover new things about them. Text mining is the derivation of meaningful data from a large body of unstructured text, using automated methods to reveal structure and associations. Through text mining, Hitchcock is able to compare patterns of prosecution over time and to examine changes in court behaviour and procedure.
Did you know, for instance, that the shortest trial in the Old Bailey Proceedings is just eight words long, whilst the longest runs to 320 pages and over 150,000 words? Hitchcock believes that previous attempts to show trends by averaging trial lengths per year disguise the mix of long and short trials within each year. They also obscure the fact that the accounts are not entirely complete: some trials were deliberately shortened, for very interesting reasons. Through text mining, Hitchcock shows that changes in the nature of the jury trial (and in which trials reached a jury) are vital to understanding these trends, especially when one also looks at the numbers of not-guilty versus guilty pleas and verdicts. Hitchcock argues that plea-bargaining became increasingly important.
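To illustrate why a yearly average can hide the mix of long and short trials, here is a minimal Python sketch using invented numbers, not real Old Bailey data and not Hitchcock’s own method. Two hypothetical years have exactly the same mean trial length, yet one consists mostly of very short trials plus a single very long one. The names `trials_by_year` and `summarise` are hypothetical, chosen only for this example.

```python
from statistics import mean, median

# Hypothetical trial lengths in words; invented for illustration, NOT Old Bailey data.
trials_by_year = {
    # Mostly short trials plus one very long trial.
    "year_a": [100, 200, 300, 400, 149_000],
    # Trials of broadly similar, moderate length.
    "year_b": [29_000, 29_500, 30_000, 30_500, 31_000],
}


def summarise(lengths):
    """Return the mean and median trial length (in words) for one year."""
    return mean(lengths), median(lengths)


for year, lengths in trials_by_year.items():
    avg, mid = summarise(lengths)
    print(f"{year}: mean={avg:,.0f} words, median={mid:,.0f} words, n={len(lengths)}")
```

Both years report a mean of 30,000 words, but the medians (300 versus 30,000) show how differently their trials are distributed. Looking at the full distribution of lengths, rather than a single yearly average, exposes that difference.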
At the heart of Hitchcock’s paper is an argument that data and text mining represent a new methodology for historians working with large bodies of data, and that we are only at the beginning of an exciting process of using digital tools for new historical research. All we have to do is rise to the challenge.
Digital History seminar series