Unfortunately, it's going to be a while yet before that kind of capability is accurate and useful. Right now the listening and transcribing aren't very accurate, and that's before taking into account background noise, distance, accent, etc. Then they'd have to use a processor powerful enough to do it all in real time, because a lag of even a second would make it aggravating to use.
I'd estimate that it'll be another 5-10 years before something like this is feasible.