One lesson we have certainly learned is that customers themselves cannot be expected to train a chatbot that has already been released and is, in theory, ready to use.
In fact, to guarantee a chatbot's effectiveness, it is essential that:
- it can identify the topics on which customers need answers
- it can understand their needs without being influenced by each interlocutor's linguistic structures.
These results can be reached only if the chatbot undergoes a learning process: the client and the developer identify the topics and the related keywords by estimating what interlocutors will ask and how they will ask it.
But since these are just the estimates of a closed group, the topics and keywords are rarely exhaustive. As a result, the learning phase actually starts once the chatbot is public: all the topics that were not anticipated come to the surface, and precisely those topics are the ones that matter to real users. So a new training cycle begins, with new keywords and their answers added while the product is already live.
With such limited training, the chatbot delivers a poor user experience in its first interactions, which can lead to frustration, annoyance and even abandonment among early adopters. An incompletely trained chatbot can reduce customer retention, and that is why the learning phase needs to be brought forward.
How can you enter the market with an already-trained chatbot? And how can you do it on schedule, without delays?
The project by UNGUESS (formerly known as AppQuality), designed with INNAAS, supports the chatbot development process by making it faster, more effective and more complete before public release. We started from the problem: a learning phase that begins too late, on a product that is already on the market. The goal was to identify which parts of the development process to revise in order to improve the chatbot's learning. It immediately became clear that there was no single aspect to work on; crowdtesting had to be integrated into several development phases. From topic identification to lexical forms, everything needed to be crowdtested.
Phase 1: Identifying the topics
First, it became clear that topic identification needed to be improved. This activity involves identifying the topics the chatbot should be able to respond to promptly. It is usually delegated to customer-care analysis and to the developer's personal experience: together they draft a series of FAQs reflecting the questions they expect to receive. However, especially for a new product, it is difficult for the creator to put themselves in the shoes of a real customer approaching the tool for the first time, since the creator already has a deep knowledge of the bot.
The first step was understanding what real customers would ask the chatbot, and that meant getting to know the target. The client's marketing team therefore shared an estimate of the demographic distribution, cultural background and technological profile of its customers.
UNGUESS then selected, from its community of about 200,000 testers worldwide, 25 users who matched the client's description. They were asked to interact with the chatbot, which was still at an embryonic stage. Specifically, in this first test, testers were asked to imagine they had seen an advertisement for a new bank and to ask for information about its services. The goal was to understand what users might want to ask their own bank.
Before the test, the bank and the developer had already tried to foresee the most frequent questions. By collecting information from customer care and distribution marketing, they had selected 187 topics. The test, however, showed how limited and underdeveloped the chatbot still was: 132 new FAQs were identified, plus 13 already-identified processes that were still not detailed enough. In other words, the crowdtesting activity identified 78% more topics in just 3 days!
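As a quick sanity check on that figure, here is the back-of-the-envelope calculation, using only the numbers reported above:

```python
# Coverage uplift found by the crowdtesting round (figures from the test above).
predefined_topics = 187   # topics foreseen by the bank and the developer
new_faqs = 132            # brand-new questions surfaced by the testers
underspecified = 13       # known topics that lacked sufficient detail

uplift = (new_faqs + underspecified) / predefined_topics
print(f"{uplift:.0%} more topics")  # -> 78% more topics
```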
Such a significant increase in topics was not expected. Integrating the new questions and answers for each of the 132 topics required a considerable effort from the developer and the bank. Had this crowdtesting activity been run after the chatbot's release, managing the same issues would have cost far more in both time and resources.
Once the answers to the testers' new questions were added, we moved on to correct topic and category identification. Knowing that many users ask questions about credit cards is not enough: for a better experience, it is crucial to correctly identify the purpose of each question and limit misunderstandings and non-answers from the chatbot.
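To make that distinction concrete, here is a deliberately simplified sketch of intent resolution within a single topic. The intent names and keyword sets below are invented for illustration (they are not the bank's real taxonomy), and production chatbots use far richer NLU models than keyword overlap:

```python
# A minimal, illustrative sketch of intent resolution within one topic.
# The intents and keyword lists are hypothetical examples.
CREDIT_CARD_INTENTS = {
    "request_card": {"request", "apply", "get", "new"},
    "block_card":   {"block", "lost", "stolen", "freeze"},
    "card_limits":  {"limit", "ceiling", "maximum", "spend"},
}

def resolve_intent(question: str) -> str:
    """Pick the intent whose keywords overlap most with the question."""
    words = set(question.lower().split())
    scores = {intent: len(words & keywords)
              for intent, keywords in CREDIT_CARD_INTENTS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a clarifying answer rather than guessing blindly.
    return best if scores[best] > 0 else "ask_clarification"

print(resolve_intent("I lost my credit card, can you block it?"))  # -> block_card
```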
Phase 2: Improving the lexicon
After the topics were correctly identified and the questions defined, the second training phase focused on identifying multiple lexical forms and logical constructions for each question. Customers may use colloquial language (“hi, how can I get a credit card?”), be more concise (“credit card request”) or be wordy (“if I open a bank account with you, in how many months can I have a non-rechargeable credit card?”). The chatbot clearly needs to be flexible with language while remaining efficient at identifying the topic (in this case, the credit card activation process).
UNGUESS took the set of pre-defined questions and asked its testers (previously selected to match the bank's real customers) to rewrite them, each in their own style. In this phase it was necessary to collect a high number of alternatives in order to cover as many styles, lexical forms and synonyms as possible. In less than a week, 50 testers stressed the chatbot with more than 8,000 different interactions; 5,000 of those interactions were then validated and integrated into the chatbot's learning.
This made it possible to build a database with an average of 25 different logical structures per topic, allowing the chatbot to compare each incoming question against those structures, correctly identify the request and select the corresponding answer.
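As a rough illustration of how such a comparison can work, here is a minimal sketch that matches an incoming question against stored phrasings using TF-IDF similarity. The topic names and variants are invented, and this is just one simple way to measure closeness, not necessarily the technique used in the project:

```python
# A rough sketch of matching an incoming question against stored phrasings.
# The variant lists are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each topic keeps the alternative phrasings collected from testers.
variants = [
    ("credit_card_request", "hi, how can I get a credit card?"),
    ("credit_card_request", "credit card request"),
    ("credit_card_request", "if I open an account, when can I have a credit card?"),
    ("account_opening",     "how do I open a bank account?"),
    ("account_opening",     "open account online"),
]

topics = [t for t, _ in variants]
vectorizer = TfidfVectorizer().fit([p for _, p in variants])
matrix = vectorizer.transform([p for _, p in variants])

def match_topic(question: str) -> str:
    """Return the topic whose stored phrasing is closest to the question."""
    sims = cosine_similarity(vectorizer.transform([question]), matrix)[0]
    return topics[sims.argmax()]

print(match_topic("I'd like to request a new credit card"))
# -> credit_card_request
```

With roughly 25 validated structures per topic, this kind of nearest-phrasing lookup is what lets a bot tolerate very different formulations of the same request.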
Clearly, the learning phase of a chatbot never really ends. It is a continuous balance between covering the most likely questions and managing the specific cases that emerge. It may seem that a chatbot must be limited in scope to be efficient; in fact, it is possible to push those limits and improve its ability to respond at the same time. The way to do it is by leveraging the crowd's support.
By using crowdtesting in the pre-release phase, our client was able to:
- Identify, with greater completeness and reasonable precision, almost all the questions the chatbot must be prepared to answer
- Make the understanding of each question more independent of its structure, its lexical form and even the synonyms it uses.
So how do you train a chatbot? And how do you balance the need for complete answers with the necessity of reaching the market as fast as possible?
Three magic words: integration, collaboration and crowd.
UNGUESS' community offers great flexibility in how tests are managed. At first glance, this project does not look like a typical testing activity, but the two share a common goal: the release of a quality product.
With access to such a large number of testers, UNGUESS could select those who resembled the bank's real clients and start testing at very short notice. We were able to share the results with the developer in just a few days.
Is it possible to do a User Experience Analysis on a Chatbot?
Yes. It is a variation of classic UX (User Experience) testing, focused on the textual interface rather than the graphical one: in a chatbot, the experience is mainly shaped by the content, not the container. This is of course a generalization; some chatbots have a graphical customer-support interface created to speed up the interaction (such as the one at Milan Airports).
In the case of the banking chatbot we tested, this approach surfaced additional findings, for example about the weather feature, where the test results highlighted several design issues. The goal of that feature was to provide visual, concise information, including temperature and humidity, but the experience was criticized because the graphics were too simple and inconsistent with the overall style. For example, the icon did not make it clear whether you were looking at the current weather or a forecast. And although the humidity indication was useful, many users would have preferred information about the chance of rain.
What have we learnt?
Crowdtesting can be used as a chatbot training tool, but it needs to be integrated into the development process to reach maximum efficiency. It is also advisable to set a goal for each stage of development and to schedule an in-depth analysis of the testers' feedback. Testing a chatbot after development is doable, but far less profitable, since it would require substantial rework.
Integrating the crowd also reduces release times: the development team can focus on technical issues without wasting time managing the data, which is delegated to the crowd instead. Achieving 8,000 interactions in 3 days is simply not feasible for a company that wants to run a similar activity internally. Even if you tried, you would have to create a support structure to store and rationalize the information, and then implement an appropriate policy to manage the individual users. The advantage of turning to UNGUESS lies in the opportunity to outsource a complex, time-consuming process and an extremely onerous information-management effort.
So, long story short, crowdtesting helps you obtain a clear, organized and structured output that you can effectively plug into your company's development process. Download the white paper below to learn how you can reduce software testing costs and time with the crowdtesting method.