Benson: Copyright Chat is a podcast dedicated to discussing important copyright matters. Host, Sara Benson, the Copyright Librarian from the University of Illinois, converses with experts from across the globe to engage the public with rights issues relevant to their daily lives.
Welcome to Copyright Chat. Today we have Peter Murray-Rust, a researcher from Cambridge University. He’s visiting me live today in my office. Welcome, Peter.
Murray-Rust: Hi there. Thanks very much, Sara.
Benson: Thank you for coming. So you’ve done some really interesting work with open access. I would call you an open access champion.
And I think one of your most interesting projects, at least to me, has been your content mining project, and I thought maybe you could talk a little bit about that, what the impetus for it was, and what kinds of projects people can do with it.
Murray-Rust: Right. So it’s more general than open access. I’m an open advocate on many fronts: open source for code, open data for experiments and other types of data that are collected, open access for access to the literature, and always for reducing the friction of going from one place to another when we’re transmitting knowledge and creating value as people receive knowledge and aggregate it and filter it and so on.
Benson: And you are very passionate about this. I saw you speak at IFLA, and I remember distinctly you saying to folks, you know, who’s a student in the room, you’re the future of open access, and I thought that was really inspirational. So your content mining software, it’s open source software, is that right?
Murray-Rust: That’s right, yes.
Benson: And what can people use it for, and what led you to develop it?
Murray-Rust: Ok, so the software is developed for a technical purpose, and it’s developed for a political purpose. So the political purpose is that all published scientific knowledge should be available to everybody on the planet. So I call it liberation software, software whose job is to liberate knowledge and make it widely available. And the technical purpose is to be able to read any paper in the literature and turn it into what we call semantic form. Semantic means that machines can understand it. That means that the machines know how to process the words in the paper. If you put in something like Anopheles, it translates it into a tropical mosquito, for example.
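The dictionary lookup described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual content mining software: the `DICTIONARY` table and the `annotate` function are hypothetical names, and the single entry is taken from the Anopheles example in the conversation.

```python
import re

# Hypothetical dictionary mapping a term to a machine-usable meaning.
# The one entry comes from the example given in the interview.
DICTIONARY = {
    "Anopheles": "tropical mosquito",
}


def annotate(text, dictionary=DICTIONARY):
    """Return (term, meaning, offset) for each dictionary term found in text."""
    hits = []
    for term, meaning in dictionary.items():
        # Match the term as a whole word only.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((term, meaning, match.start()))
    # Report hits in the order they appear in the text.
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    print(annotate("Anopheles gambiae was collected at three sites."))
```

A real system would presumably draw on much larger curated dictionaries; the point of the sketch is only that a plain word in a paper becomes something a machine can act on.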
Benson: Oh, ok. So what kinds of projects has your software been used for to date, and what do you see the future of your software use being, I guess?
Murray-Rust: Well, our vision is we want to give every reader in the world software which can help them read the scientific literature. There are probably about five papers published each minute in science, so no way can humans keep up with it without using machines. So the first job of the machines is to find the papers that people want to read. So it searches repositories. It usually comes back with far more papers than people want to read. So the next part of that is to filter it so that they find the papers which are most interesting to them for whatever reason, and each reader is different, so that each reader will have a different set of filters that they apply so they find the papers of interest to them. So shall I give an example?
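The two steps described here, a broad search followed by reader-specific filters, might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `Paper` class, the `search`, `apply_filters`, and `mentions` functions, and the three records in `PAPERS` are made up and do not come from any real repository or from the software itself.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str


def search(papers, query):
    """Broad first pass: keep every paper that mentions the query at all."""
    q = query.lower()
    return [p for p in papers if q in (p.title + " " + p.abstract).lower()]


def mentions(word):
    """Build a filter that passes papers whose abstract contains the word."""
    w = word.lower()
    return lambda paper: w in paper.abstract.lower()


def apply_filters(papers, filters):
    """Second pass: keep only papers that pass all of one reader's filters."""
    return [p for p in papers if all(f(p) for f in filters)]


# Made-up placeholder records, standing in for a repository search result.
PAPERS = [
    Paper("Mosquito mating swarms", "Observations of mating behaviour in swarms."),
    Paper("Mosquito control trial", "Resistance to insecticides was measured."),
    Paper("Soil chemistry survey", "Nitrogen levels in farmland soil."),
]

found = search(PAPERS, "mosquito")
# Two hypothetical readers apply different filters to the same search result.
reader_a = apply_filters(found, [mentions("mating")])
reader_b = apply_filters(found, [mentions("resistance")])

if __name__ == "__main__":
    print([p.title for p in reader_a])
    print([p.title for p in reader_b])
```

Keeping the filters as a list of small functions is one way to let each reader assemble a different set without changing the search step.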
Benson: Sure.
Murray-Rust: Ok. Well, our current example is malarial mosquitoes, right? People are interested in this for many reasons. They want to find out where the mosquitoes are, what they feed on, how they spread disease, how people can control them, how successful it is, what the politics of malaria are, and so forth. So every reader will have a different view on this, and one person would want to look at the mating of mosquitoes, another would want to look at insecticide resistance, another would want to look at the eradication programs and how successful they had been and so forth. So we’re trying to create an environme