Language originates as brain signals — mysterious lines of squiggles — that somehow turn into speech. Meet the neuroscientist who is turning those squiggles into conversations, using artificial intelligence (A.I.) to translate brain activity into words and sentences. Dr. Edward Chang of UCSF talks with Dr. Stieg about the painstaking “magic” of decoding that has allowed a paralyzed man to speak after 20 years of aphasia, essentially live streaming signals from his brain and transforming them into language. Plus – Why are A.I. voices always female?
Phil Stieg: Hello, I’d like to welcome Dr. Edward Chang, professor and chair of neurological surgery at the University of California, San Francisco. Dr. Chang’s team has been in the news lately for their extraordinary breakthrough – enabling Poncho, a paralyzed patient, to communicate through a computer connected to sensors directly embedded in his brain.
We’ll explore with him the decade of work that led up to this achievement, and the larger implications this work has for our understanding of language, computers, and what it means to be human.
Eddie, thank you for being with us today.
Edward Chang: Good morning. Hi, Phil.
Phil Stieg: I understand you started implanting electrodes to record patients’ brains as part of treating them for epilepsy.
Edward Chang: That’s right. So I specialize in a kind of neurosurgery to take care of people who have uncontrollable seizures. These are devastating conditions. And in many cases – the vast majority of cases – surgical removal of the part of the brain that is causing the seizures can actually stop the seizures and, in many cases, cure them. The trick there is figuring out what part of the brain is causing them. And a lot of patients nowadays have brains that look normal on the MRI but really aren’t. So we actually have to implant electrodes directly onto the brain surface to figure out where the seizures are, but also to figure out the parts that are important for speech and language so we can protect those areas as we plan a surgery. So it’s in this context of actually taking care of patients who have volunteered to help with these scientific studies that we’ve made this progress.
Phil Stieg: In your writing, you used the term neuro-prosthesis. When you’re referring to a neuro-prosthesis, what is that? Is that the grid? Is it the computer? Is it the whole thing, or is there something else?
Edward Chang: Well, neuro-prosthesis is a term that has two parts in it. The neural, of course, refers to the nervous system. Prosthesis refers to a device that has some kind of rehabilitative function to replace a lost function. A speech neuro-prosthesis is a device that’s designed to restore communication and speech functions to someone who’s lost that. It basically refers to not just the hardware, meaning the electrode grid and array that’s implanted on the brain surface, but also all of the algorithms that go along with it to translate those signals into something useful. So it really refers to the whole system, not just the device itself.
Phil Stieg: And the external device, that artificial intelligence decoder, the computer – I presume it has two functions. As you said, it’s the magical component of this, the voodoo, where it is able to interpret the electrical activity of the brain, decode it, and translate that into the motions that you and I make and take for granted. And then apparently it also synthesizes that information into something that we can read on the screen as word output?
Edward Chang: That’s right. Yeah. So it’s basically translating the commands from the brain to the vocal tract, which in someone who is paralyzed are not connected – let’s say because of a stroke in the brain stem – to the parts that are in the throat and in the vocal tract. So we’re translating those signals into words themselves, and we actually use another form of AI, or natural language processing, which is called a language model. That is a statistical model of language that allows us to look at the context word by word and make sure that the spelling, the word order, that all of it makes sense. Many of you have experienced this when you text and your cell phone corrects the spelling; that is based on what we call a language model. And we use that technology as well, because the decoder is not perfect. It is magical, but it is not perfect. And when errors come up, we can use the context of the surrounding words to actually update and correct the words in real time.
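To make that correction step concrete, here is a minimal sketch (in Python, with made-up words and probabilities rather than anything from the study) of how a decoder’s per-word guesses can be combined with a simple bigram language model, so that word-by-word context can override an uncertain neural decode, much like autocorrect on a phone.

```python
# Minimal sketch, not the UCSF system: combine hypothetical per-word decoder
# probabilities with a bigram language model so context corrects errors.
import math

# Hypothetical decoder output: at each step, candidate words and the
# probability the neural classifier assigned to each.
decoder_outputs = [
    {"i": 0.6, "hi": 0.4},
    {"am": 0.3, "ham": 0.7},          # decoder slightly prefers the wrong word
    {"thirsty": 0.8, "thursday": 0.2},
]

# Hypothetical bigram probabilities P(word | previous word), the kind of
# statistics estimated from sentences written with the restricted vocabulary.
bigram = {
    ("<s>", "i"): 0.5, ("<s>", "hi"): 0.5,
    ("i", "am"): 0.9, ("i", "ham"): 0.01,
    ("hi", "am"): 0.05, ("hi", "ham"): 0.05,
    ("am", "thirsty"): 0.6, ("am", "thursday"): 0.05,
    ("ham", "thirsty"): 0.05, ("ham", "thursday"): 0.05,
}

def decode(decoder_outputs, bigram, lm_weight=1.0):
    """Greedy left-to-right search: pick the word maximizing
    log P(word | brain) + lm_weight * log P(word | previous word)."""
    prev, sentence = "<s>", []
    for step in decoder_outputs:
        best_word, best_score = None, -math.inf
        for word, p_neural in step.items():
            p_lm = bigram.get((prev, word), 1e-6)  # small floor for unseen pairs
            score = math.log(p_neural) + lm_weight * math.log(p_lm)
            if score > best_score:
                best_word, best_score = word, score
        sentence.append(best_word)
        prev = best_word
    return " ".join(sentence)

print(decode(decoder_outputs, bigram))  # -> "i am thirsty", despite the "ham" error
```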
Phil Stieg: The thing that I find most astounding about this is the fact that it’s an engineering feat. And I don’t want to turn this into an engineering conversation. But how hard was it to take that electrical information that you were acquiring from those grids and turn it into, or decode it into, a speech format so you understood electrically what was going on in terms of the movement of our tongue, our lips, and our oropharynx?
Edward Chang: Yeah, I love that question, Phil, because in many ways, it does seem like magic. It’s not comprehensible. If you were to look at the signals that we’re actually recording from the brain, all you see is lines of squiggles. And essentially what the computer is doing is translating that very, very minute information from that form into words and ultimately sentences. So you’re absolutely right. It’s just an extraordinary transformation of information from one form to the other. Now, that being said, that is also the same magical thing that our nervous system does every day. And it is a miracle and magical, actually, to see that happen for people who are normally speaking as well. So if anything, what we’re doing with the AI is trying to recapitulate a little of what we already do.
Phil Stieg: Again, to enlighten people about how difficult this must have been, you and I speak apparently about 150 words per minute. New Yorkers do 200 words per minute. And Poncho, as I understand it, can do 15 words per minute. And you started with a vocabulary of 50 words.
Edward Chang: That’s right.
Phil Stieg: What was the process? How did you train him? He couldn’t talk. He couldn’t respond. What did you do to get to those 15 words? And how long did that take?
Edward Chang: Yeah. This is, remember, just the starting point. It was the first time it was done where we were trying to translate someone’s brain activity into full words. And there’s a lot of room for improvement on the technology. But we did start with a dictionary of 50 words that he and our research team worked on for quite some time to define what that vocabulary would look like.
Then what we did was we took those words and we built this language model that I just described by asking thousands of people through Amazon Turk to come up with all the potential sentences that you could make with those 50 words. It turns out there’s thousands. And this is why language is very special. We can flexibly combine and recombine words and sequences to generate nearly an infinite number of meanings with our vocabulary. With 50 words, you can generate thousands of sentences and thousands of meanings. And once we had that language model, once we had those words, then we had to roll up our sleeves and work, you know, really patiently with Poncho, who’s an extraordinarily committed individual, to train the A.I. algorithm to match the words he was intending to say with the actual words themselves. And that’s a process that took over 80 research sessions, dozens of hours, just to train the algorithm. It’s a very complex one. And so the magic that we’ve talked about, it’s not for free. It’s something that takes a lot of work to figure out.
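As a rough illustration of how such a language model can be built, here is a minimal sketch that estimates bigram probabilities from a handful of invented sentences over a small vocabulary; the corpus, word list, and smoothing choice are assumptions for illustration, not the sentences collected for the study, but probabilities of this kind are what the correction step described earlier relies on.

```python
# Minimal sketch, hypothetical corpus: estimate a bigram language model from
# crowd-sourced sentences that use only a restricted vocabulary.
from collections import Counter

# A few example sentences of the kind volunteers might contribute.
corpus = [
    "i am thirsty",
    "i need my glasses",
    "please bring my family",
    "i am very good",
]

vocabulary = {w for sentence in corpus for w in sentence.split()}

pair_counts, prev_counts = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()          # "<s>" marks the sentence start
    for prev, word in zip(words, words[1:]):
        pair_counts[(prev, word)] += 1
        prev_counts[prev] += 1

def bigram_prob(prev, word, alpha=0.1):
    """P(word | prev) with add-alpha smoothing over the small vocabulary."""
    return (pair_counts[(prev, word)] + alpha) / (
        prev_counts[prev] + alpha * len(vocabulary)
    )

print(round(bigram_prob("i", "am"), 3))        # relatively high: "i am" occurs twice
print(round(bigram_prob("am", "glasses"), 3))  # low: never observed in the corpus
```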
Phil Stieg: Well, I think anybody who has worked with a dictation device on their computer will remember, when we started a couple of years ago, how many errors there were; now it’s gotten quite good. But just training a computer that was set up for taking dictation is difficult. So I can imagine this is ten times more difficult.
Edward Chang: It is probably a lot more than ten times. But nonetheless, I have to say that things are moving fast. Like, we could not have done this even five years ago, to be honest with you, because the algorithms for machine learning are evolving so quickly and they’re so powerful. And these improvements that you’re talking about in dictation systems are actually tools that we’re using for this as well, and they did not exist five years ago. We really are undergoing a revolution in how machines process speech.
Phil Stieg: Could you briefly describe what a training session entailed?
Edward Chang: Sure. When we got started with this entire thing, it was very simple. He would see a word on the screen, and he had a countdown which cued him when we wanted him to try to say the word. And of course, at best, what he could produce were just maybe guttural sounds or moans, groans – again, he had a very severe form of paralysis with no intelligible speech. And so that’s what it was like. He would be looking at a computer screen. He would try to say the word when it was cued on the monitor. And meanwhile, we’re streaming the data from his brain millisecond by millisecond – a ton of information that’s coming off. And once we have that, and he’s repeating words maybe hundreds of times, then we train the machine learning algorithm to recognize the pattern for a given word. And then we switched to a different mode. Once the machine learning was trained, we could go back to him and do this in real time. So at first, it was just about collecting data. Then we moved to the phase where we had a trained algorithm, and then he could actually see what was being decoded by the machine in real time.
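Here is a minimal sketch of that training step, using synthetic numbers in place of real cortical recordings: each cued trial becomes a window of neural features labeled with the attempted word, and a standard classifier learns the pattern for each word. The feature dimensions, word list, and choice of model are assumptions for illustration, not the study’s actual pipeline.

```python
# Minimal sketch with synthetic data: learn to recognize which word a trial's
# neural feature window corresponds to, as in the cued-word training sessions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
words = ["water", "hello", "family", "good", "hungry"]   # hypothetical word list
n_trials_per_word, n_electrodes, n_timepoints = 40, 128, 50

X, y = [], []
for label, word in enumerate(words):
    # Pretend each attempted word evokes its own spatial pattern, buried in noise.
    pattern = rng.normal(size=(n_electrodes, n_timepoints))
    for _ in range(n_trials_per_word):
        trial = pattern + rng.normal(scale=3.0, size=pattern.shape)
        X.append(trial.ravel())      # flatten electrodes x time into one feature vector
        y.append(label)

X, y = np.array(X), np.array(y)

clf = LogisticRegression(max_iter=2000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated word accuracy: {scores.mean():.2f}")   # well above chance (0.20)
```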
And it did take a lot of work to figure out how to do this in real time. It’s not a trivial feat, because we knew that we could do this given a couple of weeks or months to process the signal, but now we had to figure out how to do it all within milliseconds of actually decoding the words. And so that was really the early process – a little bit boring, to be honest with you, and painstaking – but it did work. It’s a reminder to us of how much we take for granted our ability to express ideas fluently.
Phil Stieg: Yeah. Incredible. Do you anticipate, down the road… We’re all familiar with Alexa: “Alexa, do this, do that.” Is that kind of where we’re going with all of this?
Edward Chang: There is a lot of related technology. A lot of the A.I. that is there for communication systems – speech recognition, natural language processing – these are tools that we are using. We are watching closely and actually now participating with companies that are really evolving these technologies to new levels, because we do think it’s relevant. Decoding brain activity is not what these tools were intended to do, but they certainly have a lot of power to do so.
Phil Stieg: Does Poncho hear a voice, or is it just the words that are seen on the screen when he’s communicating?
Edward Chang: Right now, we are just working on words on a screen that are decoded from his intentions to speak. We are also working on speech synthesis. That goal is, I think, harder for various reasons. But our goal really is to create an interface through which he can speak, the way you and I are doing right now.
Phil Stieg: So let’s talk about that. I can’t imagine what it would be like being aphasic, unable to speak for 20 years. All of a sudden, I get hooked up with these scientific wizards, and now I’m communicating. What was the emotional and physical response that Poncho had to all this?
Edward Chang: As I alluded to earlier, the progress initially was very incremental and required a lot of training upfront. And as you can imagine, the trials were incorrect more than correct in the beginning, probably even at chance level. But the beauty of it is that the algorithm learns, and it improves over time.
I clearly remember the sessions where it seemed like there was what you and I would describe as an epiphany – that it was starting to do things reliably, and the words that were coming out were what he intended, and we knew it. Even though he couldn’t say it, we knew that was what he was feeling, because after some of these correct trials he would giggle, and he would actually shake in his chair because he was giggling and so happy to see that happening. The thing is that sometimes it was actually even disruptive, because when he would giggle like that, it caused the next word to be decoded wrong. So we actually had to instruct him to control his emotions a little bit and slow down, because it was causing interference and artifacts in the signal. But overall, of course, it was a wonderful thing, because that’s what the whole project was about – restoring this – and it was very emotional.
Interstitial Theme Music
Narrator: The technology of A.I. – Artificial Intelligence – has made enormous advances in recent years. Nearly every computer or cell phone comes with some version of an A.I. assistant installed. One engineer at Google has gone so far as to claim that one of their A.I. programs has achieved sentience and become a conscious entity. While top executives deny this claim, it does raise the question – Why are so many A.I. voices female?
Siri: I’m Siri, your personal assistant. Is there something I can help with?
Narrator: Whether they are given decidedly feminine names like Siri or Alexa …
Alexa: While there are many virtual assistants out there, I try to be the best…
Narrator: Or an intentionally non-gendered name like Google Assistant…
Google Assistant: I have a wealth of information. How can I help?
Narrator: The preferred voice for these devices is almost always female.
Sfx: mother singing lullaby
Some researchers have suggested that a biological preference for female voices begins when we are fetuses, as these sounds would soothe and calm us in the womb.
Sfx: fade out
The late Stanford professor Clifford Nass had a more sociological explanation, based on deep-rooted cultural sexism. He reasoned that “people tend to perceive female voices as helping us solve our problems, while they view male voices as authority figures who tell us the answers to our problems. We want our technology to help us, but we want to be the bosses of it, so we are more likely to opt for a female interface.”
Interstitial Theme Music
And one last theory posits that we avoid giving computers male voices because of our memories of “HAL”…
Sfx – Excerpt from 2001:
Open the pod bay doors, HAL … I’m sorry, Dave, I’m afraid I can’t do that… This mission is too important for me to allow you to jeopardize it.
Narrator: The smooth-voiced homicidal computer from “2001: A Space Odyssey” may have put us off the idea of talking to a male computer ever again…
Phil Stieg: Eddie, can you tell us how the brain is similar to or different from the computer that you’re training with A.I.?
Edward Chang: Oh, wow. That’s, I think, a very important question that we think about every week, because we try to come up with models that can explain brain activity, and a lot of those are what we call computational models. The brain and the computer nowadays are both information processors. They both use electrical activity – in the brain, it’s in the form of what we call action potentials. These are the ways that we transmit information, process it, and transform it. And computers do the same. I think nowadays a lot of the innovations that we’re seeing in A.I. – deep learning, for example – are inspired by the architectures of the brain. And now we are also learning that some of those architectures that have been developed in computer science actually do a really good job at explaining brain activity, which means that we might be at this point where we can appreciate that there’s convergence, actually, between what’s happening in computer science and neuroscience, which is a very exciting, provocative idea. But I think that’s where the similarities almost end. And there’s so much more that, let’s say, a computer can do that we as humans cannot, in terms of the precision, in terms of the scalability.
But there is something about the human brain that is so adaptive, that is so intelligent, that goes way beyond the pattern matching that the best computers are doing right now. Creativity, all of those things – some of it can be emulated, perhaps, by a computer, but it’s not the way the human brain works. And I think about that question every week when I go to the operating room. I think there are parts that do language, there are parts that control our hand, our emotions, et cetera. It’s extraordinary to think about the brain as a computer, though. Phil, you and I know that it certainly doesn’t look like one. What we see and what we feel with our fingers is extraordinary, but it’s not a computer.
Phil Stieg: You and I have both been to too many A.I. meetings or lectures where the speaker tells us that it’s going to take over our lives by 2040 – you know, you and I are going to be irrelevant. Give us a sense, from your point of view, of the pace at which A.I. is going to become an important part of our lives. I mean, it already is to an extent, with Alexa and things like that. But what about taking over the things that humans do on a routine basis?
Edward Chang: Well, I think that right now we are seeing this explosion in the application of A.I. The fundamental principles of it are relatively old now – even decades old. It’s really the availability of data and new computers, basically, that has enabled this rapid explosion. But the fundamental aspects, I think, are still relatively impoverished compared to what our brains really do. So I think that what we’re seeing is an explosion in applications to new problems where you’ve got to transform data from one very complex form to another – like, for example, from brainwaves into speech or words. That’s a perfect fit, because it’s the kind of pattern recognition problem that a computer can solve.
The bigger challenge that I see is what some people refer to as “general artificial intelligence” – the thing that emulates what we as humans do, that gives us common sense, that gives us a lot of the intuition that allows us to make inferences about the future and how we make decisions. That, I think, is farther off. I think that for sure it is on the horizon and coming, but in terms of completely replacing what humans do, I think that we need that next big conceptual breakthrough about how to design these machines.
Phil Stieg: We’re a long way from having our frontal lobes taken over. (laugh) You mentioned it earlier, and I really wanted to touch on this. You’re in the thick of it, but I’m curious what your thoughts are on the ethical components of, number one, what we’re doing now, and where this might go. What are your thoughts on that?
Edward Chang: Well, I think it’s very important to at least have a dialogue about what are the implications. Right now, we are working on decoding what someone is trying to say from the part of their brain that controls the vocal tract. So that, by definition, is what someone is intentionally and volitionally trying to communicate.
Assuming that you believe that the brain is the source of our thoughts, in theory it is possible that in the future we will have technology that will allow us to tap into, to eavesdrop on, our inner and private thoughts. I think the key questions about that are going to have to do with privacy and with autonomy – meaning, to what degree do we as individuals have control over that, versus, let’s say, a company that makes a device that can decode that information, and towards what applications? These are questions that are first and foremost in our minds. And I don’t think that, realistically, we’ve gone there yet, but it will happen someday, and I think we need to be prepared for it.
Phil Stieg: In reality, where A.I. and the brain are interacting currently is in speech function, motor function, visual function, and things like that. And as you said just a moment ago, we’re a long way away from a computer understanding our frontal lobes – our judgment, our emotion, and how we integrate things. So it probably won’t happen in our lifetime, although it would be exciting if it did.
Edward Chang: Yeah, I agree. I think that there are some really fundamental things that have to be addressed, and this is not an easy problem, by the way. It’s not like you can just record from one brain spot and understand the way the brain works, as you and I know so well. And the only reason why people can do as well as they do after brain tumors and epilepsy surgeries is because there’s this massive redundancy, and this information is extremely distributed. No one brain area stores all of that kind of information. And there’s no technology, actually, that can process that information at the resolution you would need to do it. I do think that it is far off, but I do also think that it’s possible someday.
Phil Stieg: Yeah, I do, too. But not only is there no one brain area that we can map – your brain will also map slightly differently than mine. So if we’re talking about mass production of these things, in my mind, it just becomes very overwhelming.
Edward Chang: Absolutely. I think that there’s no question that every technology in this space has to be customized for an individual because that’s who we are, and our brains are the biggest expression of that individuality.
Phil Stieg: You wrote that you anticipate being able to apply this to people who were born without the ability to speak – say, with cerebral palsy or something like that. I would imagine that’s going to be more challenging. Do you think that if you were born without the ability to speak, those parts of your brain are underdeveloped? Is that what’s going to make it more challenging, or the fact that they just never knew?
Edward Chang: I think both of those are going to be factors that make this hard to predict. You know, the other side of the coin is that there are a lot of reasons to be very optimistic about it. The young brain has the ability to learn and reorganize in ways that are very hard to do later in life – just as it’s easier to learn a new language, for example, when you’re very young compared to later in life. And so instead of a new language, let’s say, what about learning to use a neuro-prosthetic device for the first time, for that first language? We think that it could be very important, because if a child does not have the ability to produce language and to express it, there are other potential consequences, cognitively and beyond, that may come along with that. And so allowing children who have never been able to speak to have that expressive form may actually go way beyond language. And this is, again, an open question, but I think it raises a lot of opportunities to see if this is possible, and we won’t know until we actually do it in a trial.
Phil Stieg: What was the funniest moment that you had with Poncho? Did he try to crack a joke, or was his giggling just something that made all of you feel elated that you were getting there?
Edward Chang: I think that maybe one of the funniest was seeing some of his creativity come through. One of the things that shocked me happened when we came in to do one of our research sessions; it wasn’t really about decoding his brain. When I came in to see him and we were setting things up, he was doing an online class to learn French, and I just couldn’t believe it, because when he came to the country, he only spoke Spanish. He basically learned English after he was paralyzed. And he never actually spoke English, but he understands it and can write it and can hear it and process the sounds. But the thing that struck me when we observed him learning French was that it’s just a real testament to him and to the human will to overcome obstacles. It was funny, but also extraordinary and inspiring, too.
Phil Stieg: Dr. Eddie Chang, thank you so much for being with us today. You’ve made the complexity of speech understandable to us, and you’ve also given us great hope for individuals who have lost the ability to speak, or never had it, that they might gain it through artificial intelligence and computer decoders. Thanks so much for being with us.
Edward Chang: Phil, thanks for having me. It’s been great. Thank you.