How To Have A Personalised Hey Siri Experience with Apple ML

Apple introduced the first touch-free, voice-activated way to interact with Siri with the iPhone 6 (iOS 8). Users could invoke Siri without having to press the home button; all it took to wake Siri up was a voice command, “Hey Siri.”

For example, a user could say, “Hey Siri, how is the weather today?” and the phone wakes up upon hearing “Hey Siri” and processes the rest of the utterance as a Siri request. This feature came in handy in situations where users’ hands were otherwise occupied, such as while driving or cooking, as well as when their devices were not within arm’s reach.

Imagine a user asking the iPhone 6 on the kitchen counter to set a timer while putting a turkey into the oven. It was a great time to be running an iPhone app development company: entrepreneurs and budding app developers rushed to build apps that integrated with Siri to give users a refreshingly new experience.

 

Uplifting Personalization

Apple says that the phrase “Hey Siri” was originally chosen to be as natural as possible. They found that even before this feature was introduced, users would invoke Siri using the home button and inadvertently prepend their requests with the words “Hey Siri.” The phrase offered both brevity and ease of use.

Apple also pointed out that sometimes users might activate Siri unintentionally. Unintended activations occur in three scenarios – 1) when the primary user says a similar phrase, 2) when other users say “Hey Siri,” and 3) when other users say a similar phrase.

In order to reduce such False Accepts (FA), Apple wanted to personalize each device such that it only wakes up when the primary user says “Hey Siri.” And in order to do so, they leveraged techniques from speaker recognition.

 

Speaker Recognition

The primary motive of speaker recognition is to ascertain the identity of a person using his or her voice. Speaker recognition performed using a phrase known a priori, such as “Hey Siri,” is often referred to as text-dependent SR; otherwise, the problem is known as text-independent SR.

The performance of a speaker recognition system is measured as a combination of an Imposter Accept (IA) rate and a False Reject (FR) rate. For both the key-phrase trigger system and the speaker recognition system, a False Reject (or Miss) happens when the target user says “Hey Siri” and his or her device does not wake up. This sort of error occurs more often in acoustically noisy environments, such as in a moving car or on a bustling sidewalk.

The applied methodology of a speaker recognition system is a two-step process: enrollment and recognition. At the time of the enrollment phase, the user is asked to say a few simple phrases. These phrases are used to create a statistical model for the user’s voice. In the recognition phase, the system then compares an incoming utterance to the user-trained model and the underlying system decides whether to accept that utterance as belonging to the existing model or reject it.
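As a rough sketch of the first of these two phases, enrollment might look like the following. Here `embed_utterance` is a hypothetical placeholder for a real speaker-embedding model (a production system would run acoustic features through a trained network), and the `.wav` filenames are invented for illustration:

```python
import numpy as np

def embed_utterance(audio, rng):
    """Hypothetical stand-in for a speaker-embedding model: a real
    system would map an utterance's acoustic features to a vector;
    here we simply return a random fixed-length vector."""
    return rng.standard_normal(64)

def enroll(utterances, rng):
    """Enrollment phase: turn a handful of recorded phrases into a
    statistical profile -- modelled here as a list of speaker vectors."""
    return [embed_utterance(u, rng) for u in utterances]

rng = np.random.default_rng(42)
phrases = ["hey_siri_take_1.wav", "hey_siri_take_2.wav", "hey_siri_take_3.wav"]
profile = enroll(phrases, rng)
print(len(profile))  # one speaker vector per enrollment phrase
```

The recognition phase then only needs to compare a new utterance’s vector against this stored profile.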

 

Overview of how the system was developed

On each “Hey Siri”-enabled device, the company stores a user profile that consists of a collection of speaker vectors. After the explicit enrollment process, this profile contains five vectors, one for each “Hey Siri” utterance the user was prompted to say. During what Apple calls the model comparison stage, the system extracts a corresponding speaker vector for every incoming test utterance and computes its cosine score against each of the speaker vectors currently in the profile.

If the average of these scores is greater than a predetermined threshold (λ), then the device wakes up and processes the subsequent command.
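The comparison step described above can be sketched in a few lines. The vectors below are random stand-ins for real speaker embeddings, and the 0.5 threshold is an arbitrary illustrative choice, not Apple’s actual λ:

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two speaker vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def should_wake(test_vector, profile, threshold):
    """Average the test vector's cosine scores against every vector in
    the profile; wake the device only if the mean clears the threshold
    (the lambda in the text)."""
    mean_score = np.mean([cosine_score(test_vector, v) for v in profile])
    return mean_score > threshold

rng = np.random.default_rng(0)
voice = rng.standard_normal(64)  # the enrolled user's underlying "voice"
# five enrolled vectors: the same voice plus a little per-utterance noise
profile = [voice + 0.1 * rng.standard_normal(64) for _ in range(5)]

primary = voice + 0.1 * rng.standard_normal(64)  # primary user speaking again
impostor = rng.standard_normal(64)               # an unrelated speaker

print(should_wake(primary, profile, threshold=0.5))
print(should_wake(impostor, profile, threshold=0.5))
```

With these toy vectors, the primary user’s utterance scores close to 1 against every profile vector and wakes the device, while the unrelated speaker’s score hovers near 0 and is rejected.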

 

Further ahead

Apple says that even though average speaker recognition performance has improved significantly, their experimental evidence suggests that performance in reverberant and noisy environments remains challenging.

At its core, the purpose of the “Hey Siri” feature is to enable users to make Siri requests; however, the company also aims to make use of the Siri request portion of the utterance (e.g., “…, how’s the weather today?”) in the form of text-independent speaker recognition.

 

Conclusion

Siri has come a long way since its inception, and with machine learning capabilities, Apple has taken the game up a notch. The company has skyrocketed its sales with its recent devices and aims to push further at full throttle. Want to make an iPhone app? Now would not be a bad time to hire an iPhone developer and start working on your dream.