Finding Glowie’s voice: 8 learnings about creating voice-bot dialogue

These past six months, I’ve been seeing another woman. It’s hardly a confession, since my wife was well aware of it, and so many of my colleagues witnessed the whole thing too. What did I like so much about her? Apart from her intellect and looks, I must admit I mostly loved that voice of hers. A sound that’s best described as ‘en-US-Wavenet-F’ by Google Cloud Text To Speech. Yes, she was a voice-bot. And her name was Glowie.

Deconstructed

Why am I using past tense here? Because Glowie was part of the annual international light art festival GLOW in Eindhoven, which took place from 10 to 17 November 2018. Glowie’s not around anymore: just like all other art pieces at GLOW, she was deconstructed the day after the event. But along with her parts, we also took the insights she gave us back to our Greenhouse Group office. And these will help us (and you, since you’re reading this blog) build stronger dialogue for voice-bots in the future.

Exploring the wasteland

One insight stands out among all the others: voice technology has come a long way, but still has a long way to go. Considering the steps we’re taking, though, we feel confident about the speed at which we can explore the wasteland in front of us. The fact is that a conversation will only be experienced as truly natural with better speech recognition and a much bigger, almost infinite collection of smalltalk – random subjects you and the voice-bot can talk about.

What have we learned from this unique project? Before I share with you 8 learnings about creating voice-bot dialogue, here’s a short video giving you an impression of the design and conversational concept:

The challenge: disrupt

People taking the time for an art piece, even getting in line for it, is quite special at GLOW, because your typical GLOW visitor wants to look at a light art piece, be amazed and be off again. And that’s exactly the behavioral pattern that GLOW asked us to disrupt, if only for a moment: ‘Find a way to trigger people into a brief, magical interaction with an intelligent, innovative light art piece.’ A challenging brief to sink our teeth into.

The concept: interaction by conversation

For our art piece to embody the desired interactivity, her looks (inside & outside) would have to be changed by something visitors could do. That in itself is quite the novelty at GLOW, where most light art pieces are mind-blowingly gorgeous, but remain exactly the same level of gorgeous for the whole week.

So, to us it was a no-brainer to make an intelligent art piece that could be altered by interacting with it. And the trigger for this changing art piece would be a conversation. Why? Because at Greenhouse Group Conversational we’ve been exploring, implementing and expanding the possibilities of voice-tech through projects for Auping and other frontrunner brands. Glowie would profit from insights we got from earlier conversational projects and add her own new insights as well.

This, that and the other thing… things

To be present at a big international light art festival such as GLOW is quite something special. It’s been a dream of mine since the event was launched in 2006. Expectations are always high and visitors have seen it all. So, we really pushed ourselves. We didn’t ‘just’ combine a light art piece and a voice-bot, but also used person detection (so Glowie knew she had visitors) and added LED technology and sound design as well. All these things combined had to work at the same time. One malfunctioning link would break the chain we needed to entertain the public the way we wanted. It’s only fair to admit we got some help in this field from four tech & design students*.

Live user-test

Now, let’s get back to the dialogue part. Glowie attracted visitors to step inside for a conversation by wowing them with her motion and sound designs on the outside, made by designer Niel Heesakkers – who also edited the video above. Obviously, not all of GLOW’s 750,000 visitors had the opportunity to actually talk to Glowie. But in the end, she talked with approximately 2,500 people. Quite a big live user test, right?

Live optimization

Because voice-tech is not mainstream yet, a couple of our team members always had to be present to explain what people were about to experience, and what they could (not) expect from Glowie. At the same time, I was backstage optimizing the dialogue, based on feedback users gave in the booth, texted to me by my colleagues. These were texts about lines that didn’t work or needed tweaking in some way or another. The event started on Saturday. By Wednesday, 90% of the lines were considered relevant and entertaining by visitors. This is an estimate, though, since we didn’t actually do a survey (we were simply too busy tweaking).

Visitor satisfaction

Judging by the growing amount of laughter in the booth and the reactions we got from people leaving it, visitor satisfaction increased every day. People actually started advertising the experience to those queueing up. By day 4 the average waiting time was 20 minutes; by day 7 it was 25 to 30 minutes.

Natural conversation

We wrote lots of smalltalk: conversational items that are off-topic, but that we could predict visitors wanted to talk about, such as ‘How are you?’, ‘What’s your favorite movie?’, ‘Do you know Siri?’, ‘Can you rap for me?’, ‘Are you seeing someone?’ or ‘How old are you?’ Having smalltalk with a voice-bot makes people feel like they’re having a natural, human-like conversation. Technology has not yet evolved to the extent that a fully non-scripted conversation is possible, but we’re getting closer every day.

Scripted

Glowie’s dialogue was scripted like this: she tried to find out how visitors were feeling at GLOW; were they happy, tired, ecstatic? And when Glowie recognized their emotional state, she would try several things to make them (even) happier. For example: ‘Sing your favorite track for me’ (which people could also hear on the outside, this was Glowie’s most popular feature), and a visualization assignment – ‘Close your eyes, think of the happiest moment in your life, what colour would match this moment best?’ (And our LEDs inside would then show them this colour.)

Dialogflow

The conversational copywriting for Glowie was done by my colleague Renske van den Bogaard and myself. For the job at hand, we used Dialogflow by Google. We chose Dialogflow because it nicely incorporates Google’s machine learning expertise, the most common smalltalk and products such as Google Cloud Text-to-Speech; the latter makes it easy to instantly hear how the dialogue you’ve written will sound when your voice-bot pronounces it.

Next, as promised, I will share with you the 8 lessons the Glowie project has taught us about writing dialogue for voice-bots.

1. Take the lead

Voice-bots and chatbots are intelligent, but not smart enough yet to understand everything people say without any context. That’s why at this point in the evolution of writing bot dialogues we have to make sure the bot is in the lead. How? By letting the bot be the one asking the questions. And directing the answers of the consumer towards a predictable set of answers (‘What’s your favorite colour?’, ‘What’s your age?’, ‘Who’s your fave Barcelona player?’). Or by asking yes/no and multiple choice questions. Although we try to avoid the latter option.

2. Smalltalk ad infinitum

Smalltalk makes people feel like they’re having a real, human chat. Brands that are prepared for virtually any type of smalltalk will get lots of kudos from consumers for being so intelligent and likable.

Although people felt like it was really okay for us to push them into Glowie’s conversational flow, we found out that they responded more positively when we served them an extra smalltalk opportunity at the beginning of the conversation: ‘Before we really get started, I can imagine you want to get to know me better and ask me stuff first,’ Glowie would say. This made people feel they were in control of the conversation, which then felt less scripted and more free.

3. Avoid confirmations

Although yes/no and multiple choice questions give the dialogue a very clear, structured and swift flow, they also make your bot feel less natural. Users will feel like: ‘It’s like I’m filling out a form with my voice.’ To minimise this feeling, we made our multiple choice questions sound like open-ended ones.

For example: at the beginning of our dialogue, to give people the opportunity to get to know Glowie better, she said: ‘Go ahead, ask me anything, like "Where do you come from, Glowie?", or "What’s your favorite Netflix series?"’ People want to be original and spontaneous, so they asked her: ‘Hey, Glowie, what’s your favorite movie?’. Which was not exactly the Netflix question we had primed. But we used this priming to trigger the movie question, which – obviously – we had prepared. Result: people were amazed, thought it was sheer magic.
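The priming trick described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Dialogflow matches intents and not Glowie’s actual configuration: the intent names, the keyword sets and the `match_smalltalk` helper are all hypothetical. The point is that you prepare the neighbours of the question you prime (movie next to Netflix series), so the ‘spontaneous’ variation still lands on something you scripted.

```python
import re

# Hypothetical smalltalk intents, each with a few trigger keywords.
# The primed example ("Netflix series") sits next to its likely
# variation ("movie"), which is prepared as well.
SMALLTALK_KEYWORDS = {
    "favourite_series": {"netflix", "series", "show"},
    "favourite_movie": {"movie", "film"},
    "origin": {"where", "from"},
}


def match_smalltalk(utterance):
    """Return the intent sharing the most keywords with the utterance, or None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_hits = None, 0
    for intent, keywords in SMALLTALK_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


# The primed question and the variation visitors actually asked.
print(match_smalltalk("What's your favorite Netflix series?"))
print(match_smalltalk("Hey, Glowie, what's your favorite movie?"))
```

A question that matches none of the keyword sets returns `None`, which is where a real bot would fall back to a generic reply or steer the visitor back into the scripted flow.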

4. Sound as human as possible

One of our goals when we started the Glowie project was to make her sound as human as possible. But this desire to build voice-bot dialogues that sound, as they call it, ‘natural’ is not always the best choice. Although they keep improving, the currently available voices still sound somewhat botified. Hesitations, breaks and emphasis levels can all be used to make your voice-bot sound more human, but even so Glowie didn’t sound perfectly human.

Maybe the best thing to suggest here, is not to pretend you’re human and embrace the bot life and the limitations that come with it (e.g. no emotions, no taste/scent). Play with it in a creative way. For instance, when Glowie was asked about hobbies – if any – she said: ‘Well, of course I like a game of go. Chess I used to play a lot. But these days it’s becoming a bit of a drag, to be honest.’
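The hesitations, breaks and emphasis levels mentioned above are typically written in SSML, the markup that text-to-speech engines such as Google’s accept. Here is a minimal Python sketch that builds such markup for part of Glowie’s hobby line. The `<break>` and `<emphasis>` elements are standard SSML; the helper names, the 300 ms pause and the choice of which words to stress are assumptions made up for this example, not the markup we used at GLOW.

```python
from xml.sax.saxutils import escape


def say(text):
    """Plain text, escaped so it is safe inside SSML."""
    return escape(text)


def pause(milliseconds):
    """A silent break, for example to suggest a moment of hesitation."""
    return f'<break time="{milliseconds}ms"/>'


def stress(text, level="moderate"):
    """Emphasised text; SSML levels are strong, moderate, none and reduced."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def speak(*parts):
    """Wrap all parts in the SSML root element."""
    return "<speak>" + "".join(parts) + "</speak>"


# Illustrative markup only: pause length and stressed words are guesses.
ssml = speak(
    say("Well, "),
    pause(300),
    say("of course I like a game of go. Chess I used to play "),
    stress("a lot"),
    say("."),
)
print(ssml)
```

Keeping the markup behind small helpers like these makes it easier to try out a different pause or emphasis level per line and listen to the difference.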

5. Know the journey

In our case, this was literally, geographically important. People had just seen a light art piece on a big church. We could play with this in our dialogue. And they were on their way to the market square, so when saying goodbye Glowie said: ‘Give my best to the market square.’ Of course, this learning also applies to (the phase of) the customer journey.

The emotional state of mind and the expectations people have when starting the conversation are also key factors in making the bot dialogue succeed. For example, we wanted to have a chat about how people were feeling at GLOW and, first off, we wanted to do a bit more than just ask ‘How are you feeling?’ But we discovered that people just didn’t want to spend time talking about the why, how and what of their emotions. They simply wanted to laugh a bit and be entertained.

6. Be ready for short answers

When your voice-bot asks questions that can be answered very briefly, that’s a risk in terms of speech recognition. (Another reason not to use yes/no questions.) When asked for a specific color, people would answer ‘Red’, which Glowie didn’t understand. This is due to the way the technology currently works: the mic only starts recording when the first syllable is spoken, so a one-word answer can easily get cut off. Adding context in Dialogflow could improve this, because it lets your bot fill in the blanks. When people replied ‘The color red’ or ‘I’m thinking of the color red’, Glowie did answer correctly (as proof she showed visitors an LED that turned red).
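To illustrate what ‘filling in the blanks’ with context can look like, here is a plain-Python sketch. It is not the Dialogflow context API and not Glowie’s code: the colour list, the `resolve_colour` helper, the clipped transcript ‘ed’ and the 0.7 similarity cutoff are all assumptions for the sake of the example. The idea is that once the bot knows it has just asked for a colour, it can afford to accept a near miss that it would otherwise have to reject.

```python
import difflib

# Hypothetical colour vocabulary with RGB values for the LEDs.
COLOURS = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "purple": (128, 0, 128),
}


def resolve_colour(transcript, expecting_colour):
    """Return a colour name found in the transcript, or None if nothing fits."""
    cleaned = transcript.lower().replace(",", " ").replace(".", " ")
    words = cleaned.split()

    # Exact hit: covers 'Red' as well as 'I'm thinking of the color red'.
    for word in words:
        if word in COLOURS:
            return word

    # Near miss: only accepted when the context says a colour is expected.
    if expecting_colour:
        for word in words:
            close = difflib.get_close_matches(word, list(COLOURS), n=1, cutoff=0.7)
            if close:
                return close[0]
    return None


# A clipped one-word answer is only rescued when the context is set.
print(resolve_colour("ed", expecting_colour=True))
print(resolve_colour("ed", expecting_colour=False))
print(COLOURS[resolve_colour("I'm thinking of the color red", expecting_colour=False)])
```

The longer answers already contain an exact match, which fits what we saw in the booth: a full sentence gave the recognizer enough to work with, while a single clipped word needs the extra hint.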

7. Make it personal

As an agency that’s been working on conversational projects for some time now, we sometimes forget how magical it feels for consumers to talk to a bot that actually recognises and repeats your name, or where you come from. You can play with that: we had Glowie say ‘Howdo’ at the end of a conversation, which is a phonetic way of writing a regional greeting; the Brabant equivalent of ‘Grussgott’ in the Southern part of Germany. And, sure, speech recognition is anything but perfect at this moment, so mistakes do happen. But hasn’t Starbucks become famous on social media for (deliberately) misspelling your name on a coffee cup?

8. Mimic nonverbal communication

One of the aspects that’s different about voice-bot conversations as compared to chatbot dialogue, is the lack of visual feedback. How do you know when to speak or listen? During a pre-test at the Dutch Design Week we noticed how people were having trouble knowing it was their turn. So, before GLOW we upgraded the inside of the installation by adding an LED that either signified Glowie was talking (LED moving from left to right, "Kitt" style for those familiar with the TV series Knight Rider), or that it was your turn to speak (static light). Of course, we explained this to people when they came inside. This way we introduced a nonverbal element that made the conversation go so much smoother.
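The two LED states can be sketched as simple frame generators. This is a hedged Python illustration, not the firmware that ran in the installation: the function names, the strip width of 8 and the choice to light the whole strip as the ‘static light’ are assumptions, and the scanner is modelled as a single lit LED bouncing between the two ends, which is our reading of ‘Kitt style’.

```python
def talking_frame(step, width=8):
    """One frame of the scanner: a single lit LED bouncing between both ends."""
    period = 2 * (width - 1)      # steps for one full sweep there and back
    position = step % period
    if position >= width:         # on the way back, mirror the position
        position = period - position
    return [index == position for index in range(width)]


def listening_frame(width=8):
    """Static light: every LED on, signalling that it is the visitor's turn."""
    return [True] * width


def render(frame):
    """Text preview of a frame: '#' is a lit LED, '.' is an unlit one."""
    return "".join("#" if lit else "." for lit in frame)


# Preview one sweep on a short strip, then the listening state.
for step in range(7):
    print(render(talking_frame(step, width=4)))
print(render(listening_frame(width=4)))
```

Because both states are plain lists of on/off values, the same frames could drive a text preview like this one or be handed to whatever LED library the hardware uses.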

[Image: Glowie on Glow, Eindhoven]

*Thank you, Shane van den Bogaard, Lars Jenster, Geert Boer and Amanda van der Vleuten – and Bas Ploeg for coaching them. And while I’m at it…Thanks, Sander Kok for giving the team all the mental support and more. Kudos to Niel Heesakkers for sacrificing weekends and nights.