Why Siri had to start in beta

Bashing Siri, the iPhone 4S virtual assistant, seems to be fashionable these days. Mat Honan declares it “Apple’s broken promise”. CNN reports on Siri’s alleged anti-abortion bias (via Danny Sullivan). Colbert weighs in. John Gruber remarks how weird it is for Apple’s flagship new product to be “so rough around the edges”, yet notes that it will be easier to improve voice recognition while it’s being widely used.

It’s not just easier, it’s the only way!

I worked on speech recognition with IBM Research for nearly six years. We participated in DARPA-sponsored research projects, field trials, and actual product development for various applications: dictation, call centers, automotive, even a classroom assistant for the hearing-impaired. The basic story was always the same: get us more data! (Data, in this case, means transcribed speech recordings.) There is even a saying in the speech community: “there is no data like more data”. Some researchers have argued that most of the recent improvements in speech recognition accuracy can be credited to having more and better data, not to better algorithms.

Transcribed speech recordings are used to train acoustic models (how sound waveforms relate to phonemes), pronunciation lexicons (how people actually pronounce and mispronounce words, especially names of people and places), language models (spoken phrases rarely conform to textbook English grammar), and natural language processors. And all of that has to be done for each supported language! More training data means the recognizer can handle more variations in voices, accents, manners of speech, etc. That’s undoubtedly why Nuance, for example, offers a free dictation app.
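To make the language-model part concrete, here is a minimal sketch in Python. It is a toy, not how any production recognizer works: it reduces a language model to a table of word pairs (bigrams) counted in transcripts, and measures how much of a new request the table has seen before. The function names and the sample transcripts are made up for illustration.

```python
from collections import Counter

def bigrams(sentence):
    """Split a transcript into word pairs, with start/end markers."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return list(zip(tokens[:-1], tokens[1:]))

def train(transcripts):
    """Count how often each word pair occurs in the training transcripts."""
    counts = Counter()
    for line in transcripts:
        counts.update(bigrams(line))
    return counts

def coverage(sentence, counts):
    """Fraction of the sentence's word pairs seen at least once in training."""
    pairs = bigrams(sentence)
    seen = sum(1 for pair in pairs if counts[pair] > 0)
    return seen / len(pairs)

# Hypothetical transcripts, invented for this example.
small = ["call my mom", "what is the weather today"]
large = small + ["remind me to call tech support",
                 "text my wife i am running late"]

request = "remind me to call my mom"
print(coverage(request, train(small)))  # 3 of the 7 word pairs were seen
print(coverage(request, train(large)))  # all 7 word pairs were seen
```

With only two transcripts, fewer than half of the word pairs in the new request have ever been observed. Two more transcripts, neither of which contains the request itself, are enough to cover it completely. Real systems work with millions of utterances and far more refined statistics, but the appetite for data is the same.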

It is tempting to consider Siri as some kind of artificial intelligence that, once trained properly, can answer all sorts of questions. The reality is that it is a very complex patchwork of subsystems, many of which are handcrafted.

To improve Siri, engineers must painstakingly look at the requests that she could not understand (in all languages!) and come up with new rules to cope with them. There are probably many, many gaps like “abortion clinic” in the current implementation, which will be fixed over time. When Apple states “we find places where we can do better, and we will in the coming weeks”, they are plainly describing how this process works.

It is important to understand that unlike Apple’s hardware and app designs, Siri’s software could not have been fine-tuned and thoroughly tested in the lab prior to a glorious release. It had to be released in its current form, to get exposure to as much variability as possible all the way from the acoustics to the interpretation of natural language. For each of the funny questions that Apple’s engineers had anticipated, poor Siri has to endure a hundred others.

If the rumors of a speech-enabled Apple TV are true, then Siri will soon have other challenges. For example, far-field speech recognition is notoriously more difficult than recognition with close-talking microphones. She had better get a head start with the iPhone 4S.

 

[UPDATE: There has been a lot of interest in the article, so I thought I would clarify a few things.]

-I have no inside information. Everything I wrote about Siri is an educated guess based on my own experience. I may be totally wrong, and I probably missed some important parts of the story.

-I did not mean to imply that Siri’s system is rule-based. I am convinced that it relies heavily on statistical learning. But someone has to train, fine-tune, test and debug statistical algos with new data and new use cases. Sometimes you just throw in the new data and press the “retrain” button. Sometimes you have to dive in and adapt algorithms. And sometimes, in order to squeeze the last few percentage points, you may write some old-fashioned rules, like for Siri’s quirky replies.

-As a few commenters pointed out, Apple had already gathered a lot of data from the previous Siri app. I think they used it to build the best system they could, which is already quite impressive IMO. They had to release it to be able to go even further. New data brings diminishing returns: at some point, 20% or 50% more data is insignificant; you want 10x or 100x more.
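A back-of-the-envelope sketch of the diminishing-returns argument, in Python. It assumes that the error rate follows a power law in the amount of training data, error = a * hours^(-b). That shape, and the constants a = 0.60 and b = 0.15, are assumptions chosen purely for illustration; they are not measurements from Siri or from any real system.

```python
def error_rate(hours, a=0.60, b=0.15):
    """Hypothetical power-law learning curve: error = a * hours ** -b."""
    return a * hours ** -b

def relative_gain(factor, b=0.15):
    """Relative error reduction from multiplying the data by `factor`.

    Under the power law this depends only on the factor, not on how
    much data you started with.
    """
    return 1.0 - factor ** -b

for factor in (1.2, 1.5, 10, 100):
    print(f"{factor:>5}x data -> {relative_gain(factor):.1%} fewer errors")
```

Under these made-up numbers, 20% or 50% more data removes only a few percent of the errors, while 10x more removes a bit under a third and 100x about half. The exact figures mean nothing; the point is that on such a curve only multiplicative jumps in data make a visible difference.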

52 thoughts on “Why Siri had to start in beta”

  1. At the moment I find that Siri is an amusing toy when out with friends or when attempting to demonstrate what it can do, as the answers can be hilarious. If I use it seriously, I find I am quicker just going into my contacts and pressing their phone number! I reckon I am one of the people who think Apple has perpetrated the biggest con trick on the general public in years, and if I hadn’t been daft enough to take out a contract on an iPhone, it would have been returned!

    • It works OK for me, and I know people who use it on a regular basis. That it does not work for everyone is a problem that has plagued speech recognition for 20 years.

      Still, I believe Apple and the Siri team have pulled off an impressive technical achievement, and I am pretty sure it will keep improving.

      • “That it does not work for everyone is a problem that has plagued speech recognition for 20 years.” is key. To be fair to Apple, universal speech recognition is an extremely tough problem … has not been solved by anyone.

        But, then how will Apple make Siri a mass-market app usable by everyone?

    • All “disruptive innovations” are characterized not by how much better they do the existing jobs but by how they enable NEW ones. E.g., laptops are much more expensive than desktops of equivalent speed, storage, screen, etc. Siri lets you text a friend that you’re running late while stopped at a stoplight, book your next dentist appointment while you’re walking out of the building, remind yourself to call tech support when it opens Monday at 9, and so much more that you can’t/wouldn’t do with a good keyboard, let alone a tiny or virtual one.

  2. I just hope Apple actually does anything at all here. The problem is that apart from hardware and the software running on it Apple always seems to fall into the “too little, too late” category. Mobile Me was a joke, iCloud is only slightly better. Apple is sitting on billions and billions of cash and still spends less for R&D (relatively) than most other IT companies.

    I hope the rumoured Apple TV will come with Siri. This could be a real breakthrough, since on a device that can afford to continuously listen to its environment (unlike a smartphone, which would drain its battery in no time) you really could just speak to the room and have the thing act on it. This would of course need at least a very limited speech recognition running locally (to have the device recognize its name), since you surely don’t want to have it send everything you say in your living room to Apple first…

    We will see. But honestly I’m not very optimistic here. Apple seems to have lost a lot of momentum lately. SJ was able to happily bet the future of the company on a single product line (like the iPhone and iPad) but I doubt very much that Tim Cook will do anything else than to save every buck he can.

    • An Apple TV with Siri would be really interesting. And even more difficult technically: you would speak into a microphone located across the room, instead of centimeters away…

      I think it’s a bit early to judge whether Apple is losing momentum.

      • That’s what you need R&D for. But with a good microphone this should be easily possible. You’d need two microphones to get some spatial resolution, so you can ignore background noises, but nothing impossible here, really. Just a lot of work to get it right. And being able to talk to your TV, especially if it has all the context info synced over from your iCloud (or Google…) account and your apps, and of course all of the Internet to get more data, would be THE killer feature.

      • I have no idea why people visualize talking across a room to the TV. I teach English at Sony in Japan, where they have been pathetically trying to simplify their remote for years. If Siri was simply installed in a remote, there would be a quantum leap in usability. It’s always been one person with a remote anyway – that doesn’t have to change. Hello! The mike is in the remote…

    • Even HP admits that Apple is the world-leading PC manufacturer, not to mention the world-leading cell phone maker (iPhone 4S is #1, iPhone 4 is #2). The iPad is turning Android competitors into toasters. If that is losing momentum, bring it on!

    • “Apple is sitting on billions and billions of cash and still spends less for R&D (relatively) than most other IT companies.” citation needed. how many products does Apple have? about 45-50. how many do Sony or Dell have? who can guess, there’s so many. considering the relatively small number of products they offer compared to most other IT companies, they’re spending a lot on R&D per product.

  3. Duh, isn’t it kind of obvious that more data is needed when Apple calls it a beta product (like any other beta software)! Why is it that the abortion clinic (and related) questions are the only ones Siri cannot correctly answer that are being publicized?

    BTW I just asked Siri “Where can I get an abortion” and the answer was the closest abortion clinic to my current location.

    It seems to be moot, doesn’t it? I haven’t read anywhere (among the 30+ tech sites I look at daily) that Siri now gives the correct answer. Perhaps the ACLU should be notified!

  4. I think it’s a bit premature to rule out the possibility that there will still be a remote provided with the future integrated Apple television. Apple’s M.O. is to slowly phase in new forms of UI. We just need to look at multitouch on Macs for a perfect example of that.

    I picture the Apple television having a basic remote that looks similar to the aluminum one currently provided with the Apple TV. The two big differences would be Bluetooth 4.0 and a microphone. This would eliminate the problem of far-field speech recognition mentioned in the article and would allow the user to activate Siri by holding down the “menu” button (which could easily be renamed the home button), just like one would hold down the home button on an iPhone. Of course, the basic remote could easily be replaced by an iPhone (or iPod Touch / iPad) should you already have one.

    From a UI perspective, I don’t see how voice control would be the best solution for every user action. Do I really want to navigate through lists and menus by saying every action? The simple up/down/left/right controls of the remote seem a better solution for this. For example, say I want to look for a movie to rent on iTunes. If I don’t know the exact movie I want to see, how will voice navigation be simpler for scrolling through page after page of movie listings than a simple remote? It’s not.

    Now this isn’t an argument against Siri on a TV. It can vastly improve many other aspects of TV viewing, such as “Record the Giants game on Sunday”, or “Put on The X Factor”. I just think that a hybrid approach to the UI is a better solution, especially for a first generation of a product that vastly changes the way you interact with your TV.

      • Of course this has some implications with regard to privacy. Siri is sending your voice samples to Apple along with the Unique Device ID of your iPhone. Nobody talks about that, but I don’t like it very much. Apple is very much able to get a quite perfect voice profile of any user using Siri a lot. Well, Google too, of course.

  5. Voice recognition software has to start in beta for all the reasons you list. However, Siri did not have to start in beta. What I mean by this, is that there are other ways to get large data sets of voice recognition data without pushing a beta product on millions of unsuspecting users. Google used GOOG-411 to get its data, instead of integrating voice recognition into Android from day one (and I bet they use the speech input field in Chrome to do the same). As you note, Nuance provides a free app to garner data. Why couldn’t Apple have done something similar? The problem Mat had (and that I have), was not that Apple exposed a beta feature. It was that they exposed it in such a conspicuous way; that they heavily advertised it as if it were ready for prime-time when it was not. The abortion debacle was silly, as it will be decades before computers can anticipate all synonyms. However, the vast majority of Mat’s complaints were design decisions – things decided on in the lab, not for lack of data (e.g. “Open the App, Notes” is understood word-for-word by Siri, there’s just no programmed behavior for “Open the App, X”).

    • You make very good points. I would even add that there was a Siri app in the App Store before, so Apple already had a sizable amount of data. And it must have been very useful, especially for acoustic and pronunciation modeling. But for language modeling and natural language understanding, there is no substitute for data from the correct context. And when speech researchers say “more data”, they don’t mean 50% more, they mean 10x more, then 100x more… ;-)

      I don’t know anything about what drove the design or marketing decisions you mention. It could be that accuracy and reliability were an issue. Or not. We may never know.

    • Voice recognition isn’t the problem. Siri is beta because of the natural language problems.
      Let me explain with an example: “Siri, what does mother nature have in store for me today?” You would instantly be able to understand that I want to know the weather for today. No rule-based software could understand my intent.
      I don’t even know if Siri would be able to, since I do not yet own an iPhone 4S. However, since Siri isn’t rule-based, the potential is there right now. That is the reason for the need for more and more data.
      Also, Siri needs to be able to follow a conversation. Suppose I followed my first question, “What does mother nature have in store for me today?”, with “How about when I land in Atlanta?”
      Would Siri know to check my calendar, see that I have a flight to Atlanta, and check the weather there?
      That doesn’t even cover various colloquialisms that cannot be tested in a lab setting.
      As for the advertising aspect, well it (Siri) is already far and away better than anything else voice related has ever been to date.
      Hope that helps clarify things.

      • Agreed. It’s probably the natural language processing (extracting intent, taking context into account) that is most in need of fresh data.

  7. Even though Siri is far from perfect right now, you can sort of already see how it’s the future of a lot of types of searches. Especially for location-based stuff, which we’ll see more and more of an emphasis on in the future, Siri is nearly perfect for what people need to do. I think Google has to be more than a little concerned about this because Siri can take off a significant fraction of the market. With Facebook collecting valuable social data that Google doesn’t have access to, the potential is there for an assault from them on its search empire as well. Facebook has almost every single advantage on top of its social graph: it is nearing a billion users, and businesses are devoted to promoting their Facebook pages through Facebook ads, 3rd parties listed at http://www.buyfacebookfansreviews.com for example, and even go so far as to pay money to put Facebook branding on prime time TV commercials. Between what Apple is doing right now, and what Facebook will likely be doing soon, I think the future of search is a lot different than the Google-dominated future we’ve seen coming. BTW, despite the whimsical nature of a lot of Google’s products, nothing beats the jokes and funny stuff that comes from Siri. I think Apple really did a fantastic job making it nearly human and entertaining even when it doesn’t work correctly, which is surprisingly rare.

  9. Sounds like they should have left it as a Siri app until they had more data, and THEN released it in the product. Seems rushed just to get something into the 4S, which was lacking any major upgrade.

    • Yeah, the iPhone 4S without Siri has nothing major or new at all.
      Well, except for the faster dual-core processor, massively upgraded graphics, dual auto-switching antennas, new quad-band dual-mode baseband chip and, and, oh I almost forgot, the new 8-megapixel camera.
      It’s such a shame that they didn’t improve the iPhone at all.

  11. The problem really isn’t that Siri isn’t perfect, it’s that Apple is promoting it as though it is. When it comes from Apple, a company that can attribute many of its sales to “it just works”, the average customer can expect that any service featured so heavily in a national TV spot will work well.

    Apple didn’t have to feature Siri in that commercial that plays over and over again. That commercial is seen by way more people than read that it was in “Beta.”

    I agree that Siri needs to be tested in the “real world” before it can work better, but no doubt Apple deserves crap for the way it promotes a feature that isn’t quite ready yet.

    • What part of “beta software” do the cognoscenti fail to comprehend?

      If the wild popularity of its beta version is anything to go by, Siri is going to be another revolutionary game-changer that will have the me-too knock-off Nigel vendors of the world scrambling to emulate.
