Professor Stephen Pulman
What problems are you working on right now?
One of the problems that’s really interested me from the start and that I’m working on at the moment is the problem of inference in natural language.
I’ll give you an example: if you read somewhere that ‘John’s taller than Bill,’ you can conclude that Bill is shorter than John. But you can’t actually conclude that either of them is tall. If I give you an extra bit of information, if I say, ‘Bill is tall, and John is taller than Bill,’ then I can conclude that John is tall. It follows logically from the other statements. So that inference is an example of a process where you’ve deduced something from the grammatical and semantic structure of language.
There’s a separate question of how I know whether that’s actually true in the world. Well, I have some idea of what the standards of tallness are for people, which can vary: what’s tall for a basketball player is quite different from what’s tall for you and me – and that’s different from what’s tall for a tree or a building. The process of trying to write programs that will automate those inferences is really interesting to me, and it also has lots of practical applications. Take Siri: at the moment, the way it works is by trying to find some sentences or some text that contain the answer to your question directly – it doesn’t use much by way of inference. It would be much more useful if it did! If I were to ask, ‘Is President Obama tall?’, and somewhere or other it could work out that Clinton is tall but Obama is taller than Clinton, then it could tell me that, yes, Obama is tall even by the standards of US presidents – that would be really exciting.
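To make the pattern concrete, here is a minimal sketch in Python of that kind of inference over gradable adjectives. The facts, names and rules are purely illustrative – this is not how Siri or any production system actually works:

```python
# Toy knowledge base: facts asserted directly in the text we have "read".
taller_than = {("John", "Bill"), ("Obama", "Clinton")}
tall = {"Bill", "Clinton"}  # e.g. "Bill is tall", "Clinton is tall"

def transitive_closure(pairs):
    """'Taller than' is transitive: A > B and B > C implies A > C."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def shorter_than(a, b):
    # "John is taller than Bill" licenses "Bill is shorter than John".
    return (b, a) in transitive_closure(taller_than)

def is_tall(person):
    # Tall if asserted directly, or taller than someone who is tall.
    return person in tall or any(
        other in tall
        for subj, other in transitive_closure(taller_than)
        if subj == person
    )

print(shorter_than("Bill", "John"))  # True
print(is_tall("John"))               # True: taller than Bill, who is tall
print(is_tall("Obama"))              # True: taller than Clinton, who is tall
```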
Computers are really good at some things and really bad at others. What they’re really good at is putting together bits of information from different sources: like numerical information or stuff from databases. What I’d really like to be able to do is to have a machine that could go off and read normal human text, understand it and then be able to explain things about what it had read to me, when questioned. It would have to be able to figure out the basics of what it had read, without me necessarily having understood it. So getting a computer to tell me something I don’t know would be really exciting. And not just to tell me, but to be able to explain it: to teach me.
What needs to happen for that to be a reality?
Lots of things! There’s the basic ability of machines to understand English, which is very far from perfect. We can do it to a certain extent. But to actually have a machine go off and read, say, a biology textbook and then understand the content of it properly – we’re a very long way away from that.
In some ways, getting computers to understand language about particular technical domains (like, say, maths or logic) is less challenging, because those domains do have a logical structure, which you can use to guide the understanding of the text. What is really difficult are the areas where there isn’t that pre-defined logical structure, and where you and I would use our common sense and experience of the world to fill in the gaps – computers are really bad at that. We just don’t know how we do that.
A simple example would be resolving linguistic ambiguities. For example: ‘The porters refused admission to the students because they advocated violence.’ So the question is, who does ‘they’ refer to? ‘They’ would, very likely, be the students. If I change it to ‘The porters refused admission to the students because they feared violence,’ then ‘they’ is much more likely to be the porters. And we can deduce that from the sentence. But that deduction process is – at the moment – completely mysterious; we just don’t know how to make a computer simulate it. That common-sense reasoning goes into resolving ambiguities, and into almost every other aspect of language processing, and it is the most challenging part of language understanding.
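One way to see why this is so hard: the two sentences are identical except for a single verb, so a system looking only at the words or the surface syntax has nothing to distinguish them. A tiny Python illustration:

```python
s1 = ("The porters refused admission to the students "
      "because they advocated violence").split()
s2 = ("The porters refused admission to the students "
      "because they feared violence").split()

# Everything a purely surface-level analysis can see is identical here,
# apart from one verb - yet 'they' flips its referent.
print([(a, b) for a, b in zip(s1, s2) if a != b])  # [('advocated', 'feared')]
```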
What’s at the bottom of humans being able to interpret sentences so much more easily than computers?
I think it’s bigger than logic: when we’re unconsciously making all those decisions, it’s because we’ve grown up embodied in the world, connected with things, and we just learn that stuff as part of what it is to be human, a member of a family and part of a society. You couldn’t write it all down in a computer program, because it would still be missing something vital about that kind of interaction with real-world objects and people and figuring out their intentions and so on. We lay down these enormous funds of memories, and our ability to judge similarities when we’re in a particular circumstance comes down to our unconscious recognition that we’ve been in a similar situation before, and from there we can deduce what is likely to happen. That kind of human experience is something we don’t know how to codify and give to computers.
Really, in order to get a computer to emulate that kind of understanding, you’d probably have to get it to grow up like a person – to actually be physically embodied and interact with things.
You’re also a co-founder of TheySay; could you tell me more about that?
TheySay came out of work done with a DPhil student called Karo Moilanen, who’s another co-founder. He came to me because he’d been working in a company on ‘sentiment analysis’ (which is the process of discovering positive and negative attitudes towards things in text). Most people working in sentiment analysis treat it as a machine learning problem: you get a corpus of texts, some of which are labelled positive and some of which are labelled negative, and then you train the machine to distinguish the two. Karo was interested in some of the fine-grained linguistic properties that those approaches got wrong. An example would be something like ‘kill’: in isolation that has negative connotations; ‘bacteria’: again, negative connotations. But ‘killing bacteria’ is positive; ‘failing to kill bacteria’ is negative again; ‘never failing to kill bacteria’ is positive again.
So that’s interesting because you’ve got ‘never’ (which on its own is probably negative), ‘fail’ (negative), ‘kill’ (negative), ‘bacteria’ (negative). But put together they become positive. We got interested in how that happened. It wasn’t just that there were more positive words than negative words, it was to do with the grammatical structure and the semantic structure of the sentence.
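The compositional effect he describes can be pictured as operators that flip polarity as the phrase is built up. The following toy Python calculator is just a sketch of that general idea, not TheySay’s actual algorithm or lexicon:

```python
NEG, POS = -1, +1

# Illustrative lexical polarities: each word is negative in isolation.
lexicon = {"kill": NEG, "bacteria": NEG, "fail": NEG, "never": NEG}

def kill(obj_polarity):
    # Destroying something bad is good, and vice versa.
    return -obj_polarity

def fail_to(event_polarity):
    # 'Fail to P' reverses the polarity of P.
    return -event_polarity

def never(event_polarity):
    # 'Never P' reverses the polarity of P.
    return -event_polarity

bacteria = lexicon["bacteria"]          # -1  'bacteria'
print(kill(bacteria))                   # +1  'killing bacteria'
print(fail_to(kill(bacteria)))          # -1  'failing to kill bacteria'
print(never(fail_to(kill(bacteria))))   # +1  'never failing to kill bacteria'
```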
Karo built a system as part of his DPhil to test this on a large scale, developing a database with about 80,000 items in it, covering all the most frequent grammatical and lexical phenomena in English. The system seemed to work pretty well, and we demonstrated it at a departmental research day to people in industry. People became interested in it, and we started the process of commercialising it. We went through Isis Innovation and got some investment from a company called IP Group (one of the major investors in spin-outs from Oxford). We started this process in 2009, incorporated in 2011 and got the investment in early 2012.
We have two basic products. The first is an API: you send text to it, it analyses the text, and you get the result of that analysis back as a report. The second, ‘MoodRaker’, is a dashboard: you choose topics you’re interested in – for example, ‘Oxford University’ – and the text sources you want to draw on (blogs, RSS feeds, tweets, news sites, etc.), and the system then starts monitoring mentions of those topics and categorises and annotates them in various ways: for example, by sentiment.
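As a rough picture of how such an API gets used, a client posts text and reads back an analysis. The endpoint, credentials and response fields below are invented placeholders for illustration, not TheySay’s real interface:

```python
import requests

resp = requests.post(
    "https://api.example.com/v1/sentiment",   # hypothetical endpoint
    auth=("username", "password"),            # hypothetical credentials
    json={"text": "Never fails to kill bacteria."},
)
resp.raise_for_status()
# A response might look like: {"sentiment": "positive", "confidence": 0.9}
print(resp.json())
```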
We also try to detect other things, like emotion (fear, anger, surprise). It’s particularly interesting to track emotion in political discussions. We also try to look out for sarcasm, we try to detect advertisements (because of course they’re always positive, which is going to skew your results), and we try to do some ‘topic detection’: is this a text about sport or finance or politics? We’re also beginning to look at demographic properties: the gender, age and political orientation of the people writing the text.
What type of content are most of your clients interested in analysing with your tools?
It varies. Sometimes it’s social media. A lot of our clients are in the healthcare monitoring area; in those cases there are various websites (the NHS site, for example) which people use to record their experiences as patients. But it can also be company reports or press reports.
A lot of the things we’ve done recently for fun are around politics. We were tracking what people were saying about the parties or people in, for example, the Corbyn campaign, the Scottish independence referendum and the general election. It’s very interesting, but we’re never trying to predict the outcome: you can’t assume that the people on social media are representative of the voting public (in fact quite the opposite in the case of the Scottish referendum – the independence campaign absolutely dominated social media, despite losing in the actual vote).
Do you have to analyse British English differently to American English (‘not bad’ might mean really pretty good in the UK but is more likely to signal a fairly indifferent sentiment in the US)?
We do calibrate for whether we’re processing American or British text, but the more difficult thing is to take into account the context in which particular words are being used. So if you’re talking about beer, then ‘cold’ is good; but if you’re talking about coffee or food, it’s not. Then there can be differences depending on who’s producing the text. So if I describe something as ‘sick’, that’s not good. But if my son describes it as ‘sick’ – that’s a good thing.
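One simple way to picture that kind of calibration is a polarity lexicon keyed by context (topic, dialect, speaker) as well as by word. A toy sketch, with illustrative entries only:

```python
# (word, context) -> polarity; entries are illustrative, not a real lexicon.
polarity_in_context = {
    ("cold", "beer"): "positive",
    ("cold", "coffee"): "negative",
    ("sick", "standard English"): "negative",
    ("sick", "youth slang"): "positive",
}

def judge(word, context):
    return polarity_in_context.get((word, context), "unknown")

print(judge("cold", "beer"))    # positive
print(judge("cold", "coffee"))  # negative
```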
It’s also very difficult to determine who the target audience is… the intended positivity or negativity can depend on the interests of the readers. If I read about oil prices coming down, I’m probably pleased about that, but if I’m an investor or the government (who want to get taxes from oil) then that could be perceived as negative. We’ve only got the text; we don’t know about the readers’ preferences or prejudices.
What would you ultimately like to consider the final legacy of your research to be?
It’s so nice to see your students all over the place. I’ve got former students in universities here, in London, in the US and in companies. It’s getting to the point where I see students of my students!
In terms of research, I’d hope people continue to use and develop my work, particularly where I’ve been able to combine linguistics with computing or logic, and I hope TheySay continues to be successful.