
The future of voice recognition systems
Professor Robert Dale
Ever been frustrated while talking to a machine over the phone? A human/machine interaction expert explains why this happens, and takes a sneak peek into the future.
Professor Robert Dale, Director of the Centre for Language Technology at Macquarie University, believes most voice recognition systems are actually more effective than people give them credit for.
"One system that got a lot of bad press about five years ago was the Telstra directory assistance system," he recalls. "The application did exactly what it was supposed to do - it was only ever meant to recognise the top 2000 high-frequency names, which account for 10 per cent of the calls they get, and it did that very well. But the whole PR side of it was managed very badly, and so when the other 90 per cent called up and asked for something they were told 'sorry...' and got really frustrated. Some people think that put speech technology back several years in Australia."
Keep it simple
Dale says that while technology has moved on a bit since then and voice recognition systems are now used for a variety of purposes, they still struggle to deal with complicated enquiries.
"These systems work great where there are simple tasks where relatively little can go wrong, but work less well where the task is more complex," he says. "For example, I did some work on an ordering system for a local pizza company last year. The initial intention was to have a voice driven system where you could order absolutely anything on the menu. Some people used it very successfully, but when you look at what's involved in ordering a pizza, with extra toppings and half-and-halfs and all these sorts of things, it's actually incredibly complex."
Taxi!
Dale is a big fan of the taxi booking systems currently used commercially because they avoid the very difficult task of having to recognise every single street address in the state.
"I think the taxi booking thing is great, I use it all the time, it works incredibly reliably and it's really quick," he says. "It just asks me whether I'm leaving from such and such an address - which it knows because it's done a reverse number lookup - and then asks me what suburb I'm going to. It doesn't ask me what my destination address is because that would be too hard to do and because the taxi driver taking the hire asks for the address anyway."
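For readers curious how little such a call flow needs, here is a minimal Python sketch of the two-question dialogue Dale describes. It is an illustration only, not the commercial system: the directory contents, the phone number, the `book_taxi` function and the hand-off to a human operator when the pickup address cannot be confirmed are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical reverse-lookup directory: caller number -> registered address.
KNOWN_ADDRESSES: Dict[str, str] = {
    "0255550100": "12 Example Street, North Ryde",
}


def book_taxi(
    caller_number: str,
    ask: Callable[[str], str],
    directory: Optional[Dict[str, str]] = None,
) -> dict:
    """Run the two-question booking dialogue.

    `ask` poses a question to the caller and returns the recognised answer.
    The system never tries to recognise a free-form street address.
    """
    directory = KNOWN_ADDRESSES if directory is None else directory
    handoff = {"pickup": None, "destination_suburb": None, "handoff": True}

    # Reverse number lookup: find the pickup address from the caller's number.
    pickup = directory.get(caller_number)
    if pickup is None:
        return handoff  # assumption: unknown callers go to a human operator

    # A yes/no confirmation is far easier to recognise than an address.
    answer = ask(f"Are you leaving from {pickup}?").strip().lower()
    if answer not in {"yes", "y"}:
        return handoff

    # Only the suburb is requested; the driver asks for the full address.
    suburb = ask("What suburb are you going to?").strip()
    return {"pickup": pickup, "destination_suburb": suburb, "handoff": False}
```

The design choice worth noting is that both questions have a small set of expected answers (yes or no, then a suburb name), which is what keeps the recognition task manageable.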
Future improvements
Dale believes that in the next decade or so, voice recognition systems will improve in a number of key areas. Firstly, they will get better at dealing with 'difficult' calls. If they are unsure what the caller said - for whatever reason - they will have a better strategy than just asking the person to repeat themselves over and over again.
Secondly, they will increase personalisation by choosing questions based on your previous interactions. For example, if the last 10 times you booked a taxi you went to the airport, wouldn't 'Do you want to go to the airport?' be a logical and potentially time-saving first question?
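The airport example can be written down in a few lines. The Python sketch below is one simple way to choose the opening question from a caller's booking history; the function name, the default question and the rule that all of the last 10 trips must match are assumptions made for illustration, not a description of any deployed system.

```python
from typing import Sequence

# Assumed fallback when there is no clear pattern in the caller's history.
DEFAULT_QUESTION = "What suburb are you going to?"


def first_question(past_destinations: Sequence[str], window: int = 10) -> str:
    """Pick the opening question from the caller's most recent bookings."""
    recent = list(past_destinations)[-window:]
    # Personalise only when there is a full window of history and every
    # booking in that window went to the same destination.
    if len(recent) == window and len(set(recent)) == 1:
        return f"Do you want to go to {recent[0]}?"
    return DEFAULT_QUESTION
```

A caller whose last 10 bookings were all to "the airport" would be asked "Do you want to go to the airport?", while anyone with a shorter or more varied history would get the standard question.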
But that's just the beginning. Dale says that within a decade you may be able to talk directly to a human-like agent, via your camera-enabled phone or over the Internet on your computer. The agent will not only listen to what you are saying, but also read your lips for greater word recognition and analyse the speed and emphasis of your speech in order to interpret your mood.
To find out more about postgraduate study within the Centre for Language Technology, visit www.clt.mq.edu.au.

