On listening to text-to-speech in 2015
Starting with the Kindle 2, Amazon provided text-to-speech with their EBRs (E-Book Readers).
Text-to-speech is software which reads a book out loud to you.
It’s very different from an audiobook, which has been recorded.
That matters, because creating an audiobook clearly falls under the rights of the rightsholder of the book (initially, the author), while text-to-speech is more like increasing the font size…it’s just a way to access the material, without creating another copy (since TTS is “streaming”, ephemeral).
Ever since the K2, I have listened to TTS typically for hours a week in the car.
It’s my preferred audio in the car…I like it a lot better than talk radio, or music. I’m also not a fan of audiobooks, unless I’ve already read the book. I don’t like the reader (be it the author or an actor) interpreting the characters for me.
TTS has improved a lot since the K2!
I created a thread in the Amazon Kindle forum (six years ago today!) pointing out some of the
That’s what I called the quirky things about the voice, which was then known as “Tom”.
Almost all of those are fixed now.
Ivona, which we have now, has inflection. For example, it uses the appropriate rising inflection to indicate a question.
You can read more about the process of how it’s done in this
It almost always pronounces things correctly, now.
One problem it still has is with homographs (words that are spelled the same but mean different things). For example, I was listening today, and a character left the room with a bow. You know that should rhyme with “now”, but the TTS read it as rhyming with “know”. In other words, it sounded like the person left with a package decoration, rather than inclining at the waist.
I find it also misses on “wind”. A road might be “winding”, but the TTS doesn’t rhyme that with “finding”; it sounds like the road is blowing a breath.
One more I hear quite a bit: it makes the wrong choice on “wound”. It generally pronounces it like the injury, rather than rhyming it with “found”. So, saying that a scarf was wrapped around someone may make it sound like it took a bite out of them.😉
One other odd one: it pronounces “lower” to rhyme with “flower”, not “grower”. Of course, try to explain to a non-English speaker how we pronounce “flower grower”, and make English sound logical!
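For the technically inclined, here is a toy sketch of why homographs are hard for software: the spelling alone doesn’t settle the pronunciation, so the program has to guess from the surrounding words. This is not how Ivona (or any real TTS engine) does it, as far as I know. The cue-word lists, the respellings, and the function name pick_pronunciation are all made up for illustration.

```python
# Toy homograph disambiguation: guess a pronunciation from nearby words.
# Everything here (cue words, respellings, names) is hypothetical.
import re

HOMOGRAPHS = {
    "bow": [
        # (how to say it, context words that hint at this reading)
        ("rhymes with 'now'", {"left", "room", "curtsy", "audience"}),
        ("rhymes with 'know'", {"ribbon", "package", "arrow", "violin"}),
    ],
    "wound": [
        ("rhymes with 'found'", {"scarf", "around", "clock", "rope"}),
        ("like the injury", {"bleeding", "bandage", "knife", "healed"}),
    ],
}


def pick_pronunciation(word, sentence):
    """Return the reading whose cue words overlap most with the sentence.

    Returns None for words that are not in the table. On a tie
    (including no cue words at all), the first listed reading wins.
    """
    options = HOMOGRAPHS.get(word.lower())
    if options is None:
        return None
    # Lowercase the sentence and split it into a set of words.
    context = set(re.findall(r"[a-z']+", sentence.lower()))
    # max() keeps the first option when the overlap counts are equal.
    best = max(options, key=lambda option: len(option[1] & context))
    return best[0]


if __name__ == "__main__":
    print(pick_pronunciation("bow", "She left the room with a bow."))
    print(pick_pronunciation("wound", "The scarf was wound around her neck."))
```

A sentence with none of the cue words falls back to the first reading in the list, which is one way a simple approach like this ends up giving you the package decoration instead of the polite gesture.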
However, it’s generally very impressive.
In a book I’m reading now, for example, it correctly pronounced Edinburgh…not ending like Pittsburgh, but ending in two syllables, sort of like a New York borough, but softer.
This book is
which was recommended to me by one of my regular readers and commenters, Lady Galaxy.
It’s part of
and I was ready to start another book, so I’m reading it now.🙂
I am typically reading several books at the same time, which is true in this case, but I also usually have a main one for the commute…and I moved this one up in the list.
One interesting point is that there is a lot of dialect in the book, as Lady Galaxy pointed out to me.
I don’t at all know if it’s accurate, but it’s intended to represent a particular Scottish dialect.
For example, here are a couple of lines:
“It winna dee ye ony good, it disna ring. The salt fae the sea ruins the wiring, fast as I fix it.”
Without that dialect (and it refers to a doorbell), it would read, “It wouldn’t do you any good, it doesn’t ring. The salt from the sea ruins the wiring…”
How did TTS handle it?
About the same way most people would, I’d say. I didn’t have any more trouble understanding TTS speaking it than I would have sight reading it, I believe.
That also impresses me.
It was quite baffled by a person’s name, “Jacoline”. English-speaking people would read that as very much like Jacqueline, or Jacklyn…it read it more like “Jack OH lyn”, something like that.
Generally, though, I think most people are surprised at how good it is.
Our devices are becoming much more conversational, both in how they speak and how they listen.
I am disappointed, honestly, that the currently available non-Fire EBRs from Amazon don’t have sound at all…which means they don’t do TTS (or music or audiobooks).
I’m guessing it makes them cheaper and more reliable, and perhaps lighter. It’s possible that some people even told Amazon they preferred it, because they found music a distraction…don’t know about that.
I’m listening to TTS on my
which is also why it can use the text-to-speech software it uses.
Eventually, I think we will get a non-backlit EBR with TTS again.
After all, everything may start speaking. It may be like the toaster on Red Dwarf, or the talking bomb in the now obscure John Carpenter movie,
It seems unlikely to me that my toothbrush will talk to me, but my books won’t.😉
What do you think? Do you use TTS? How do you feel about the Voyage, for example, not having it? Does it throw you off when it mispronounces something, or are you able to let it go? Does it affect your understanding? My guess is that I’m unusually well able to cope with the mispronunciations, but I haven’t seen studies. Feel free to tell me and my readers what you think by commenting on this post.
* I am linking to the same thing at the regular Amazon site, and at AmazonSmile. When you shop at AmazonSmile, half a percent of your purchase price on eligible items goes to a non-profit you choose. It will feel just like shopping at Amazon: you’ll be using your same account. The one thing for you that is different is that you pick a non-profit the first time you go (which you can change whenever you want)…and the good feeling you’ll get. Shop ’til you help!
This post by Bufo Calvin originally appeared in the I Love My Kindle blog. To support this or other blogs/organizations, buy Amazon Gift Cards from a link on the site, then use those to buy your items. There will be no cost to you, and a benefit to them.