Voice Design 101: How to perfectly create a VUI skill?

Voice skills are becoming more and more popular around the world. Technological advancements, together with the Coronavirus crisis that forced a multitude of businesses and institutions to go online, have made VUI (Voice User Interface) design a necessity.

But if you think that designing a user interface for voice is basically taking a textual chatbot and converting it to voice via TTS (text-to-speech) – whoa, nothing could be further from the truth. Working in 2018 on a voice skill for The Commuter, a Liam Neeson movie, gave us the perfect use case to demonstrate the process of creating an engaging VUI skill, which we presented at a workshop with Hillary Black on October 20, 2020. For the full workshop, you can click the video.

So, first of all, when speaking about voice design – what are the biggest differences between writing a chatbot and writing a voice skill?

Chatbots are far more common nowadays: with today’s proliferation of messaging apps, a written dialog is easy to create and easy to reach. Voice design’s main outlets are digital assistants like Siri and Alexa, which aren’t very conversational to begin with. Creating a human-like interaction is also easier in text than in voice, because most computer-generated voices don’t sound 100% human.

So, why should you design a voice bot, and how do you do it the right way?



  • Make sure that your brand or company understands the ramifications. You have to give your customers extra value for chatting with a voice interface, and you also have to make sure that your audience has the ability – language-wise, technology-wise – to connect to your skill. For example, if your skill speaks English and most of your audience isn’t fluent in English, your dialogs won’t get through to them. And if you only want to reach commuters on the train or bus, think twice: a voice dialog depends on quiet surroundings, so it’s rarely a good fit there. Audience research in advance is a must and can spare you a lot of work and frustration.
  • And, while we’re at it, voice interfaces have an inherent advantage – you can use more than regular human speech in them. You can have multiple personalities talking with the user in the same skill; you can add things like music and sound effects to enrich the conversation and the experience. Nailing the voice design is actually an opportunity to create much more engaging and immersive experiences than a regular chatbot can offer.

Every bot needs a plan

  • Remember that your experience – whether it’s a single persona or a whole cast of them – needs a plan and a coherent personality. That’s true for textual chatbots and is at least equally important for voice ones, since a lack of cohesion can make your skill or bot seem less reliable, and your whole voice design can go down the drain.
  • You mustn’t forget an important aspect of this conversation: the fact that there’s another side to it, and you have to keep the user engaged. That means avoiding long monologues, avoiding yes/no questions and “empty” responses as much as you can, and giving a wide range of ways to invoke different prompts from the interface.
  • As in a textual chatbot, you need to write both the happy path(s) and the unhappy paths for your experience. How will the “perfect” conversation unfold? Can the user finish the conversation successfully via different paths? And what happens when the conversation goes awry, or when the user decides to troll the bot?
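
To make the happy and unhappy paths concrete, here is a minimal sketch in Python. It is not how any particular skill is built: the states, intent names, prompts and the MAX_REPROMPTS limit are all hypothetical, chosen only to show one turn that follows the plan and one that doesn't.

```python
# Hypothetical dialog flow. Every state, intent and prompt is illustrative only.
FLOW = {
    "greeting": {
        "prompt": "Where are you headed today?",
        "next": {"give_destination": "details"},
    },
    "details": {
        "prompt": "Got it. What would you like to know about the trip?",
        "next": {"ask_departure_time": "done", "change_destination": "greeting"},
    },
}
MAX_REPROMPTS = 2  # assumed limit before the bot gives up gracefully


def step(state, intent, misses=0):
    """Handle one user turn and return (next_state, reply, misses)."""
    transitions = FLOW[state]["next"]
    if intent in transitions:
        # Happy path: the intent is expected here, so move the dialog forward.
        nxt = transitions[intent]
        reply = FLOW[nxt]["prompt"] if nxt in FLOW else "Have a good trip!"
        return nxt, reply, 0
    if misses < MAX_REPROMPTS:
        # Unhappy path: unexpected input, so apologize and repeat the question.
        reply = "Sorry, I didn't catch that. " + FLOW[state]["prompt"]
        return state, reply, misses + 1
    # Too many misses (or a troll): end the conversation politely.
    return "done", "Let's try again later. Goodbye!", 0
```

Calling step("greeting", "give_destination") moves on to the "details" state, while an unexpected intent keeps the user in place with a reprompt until the limit is reached. Keeping the miss counter outside the flow table means the scripted conversation and the recovery behavior can be tuned separately.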


  • Create a collection of intents. Every intent that is necessary for the conversation flow, or that the user could reasonably bring up, should be there. Account for each one in your voice design and give it an appropriate response from the voice interface.
  • Have all of your voice responses set up. They can be generated via TTS or recorded by a voice actor; just make sure that everything you’ve written has its audio.
  • Testing, testing, testing. For The Commuter voice skill, which we released in 2018 to promote Neeson’s movie, we ran over 250 internal tests before letting actual users try it.
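
One of the cheapest internal tests is checking that no intent was left without a response. The Python sketch below assumes a simple in-house layout: the intent names, sample utterances and audio file paths are all made up for illustration and don't come from any real platform's format.

```python
# Hypothetical intent collection: intent name -> sample utterances.
INTENTS = {
    "give_destination": ["I'm going to {city}", "take me to {city}"],
    "ask_help": ["help", "what can I say"],
    "fallback": [],  # catches anything the other intents miss
}

# Hypothetical response table: intent name -> recorded or TTS-generated audio file.
RESPONSES = {
    "give_destination": "audio/destination_ack.mp3",
    "ask_help": "audio/help.mp3",
}


def missing_responses(intents, responses):
    """Return the sorted names of intents that have no response assigned."""
    return sorted(name for name in intents if not responses.get(name))
```

Here missing_responses(INTENTS, RESPONSES) returns ["fallback"], flagging that the fallback intent still has nothing to say. A check like this is worth running before every test round, since a silent intent in a voice skill sounds like a crash to the user.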


  • Start with a soft launch: even after all the internal testing, let people discover the skill by themselves. Also, roll it out to another circle of users who will test it to find possible bugs and loopholes. Try it on different users, with different levels of English. Make sure that all your sound effects work and are understood. If any of them is necessary to progress through the skill, make sure it’s audible.
  • Only when you’re done with user testing – roughly 250 sessions – go on air. Announce that the skill is available and ready for everybody to use.
  • Follow our famous Feedback Loop, Yaki Dunietz’s blueprint for successfully deploying a chatbot – iterate, fix, ad infinitum.