The importance of small interactions
User testing increasingly involves evaluating new and more complex interactions (e.g. voice user interfaces, end-to-end multi-touchpoint phygital experiences) with ever-changing analysis technologies (e.g. eye tracking, neuroergonomics).
However, this trend towards complexity should not make us forget that even the simplest and most established interactions, like a single touch of a finger on a screen, can by themselves determine the success or failure of a product (indeed, recent years have seen a growing interest in "microinteractions"). Hence the idea of telling you about a study built entirely around the "simple" numeric keypad of NEXI's mobile POS (a card reader for smartphones) for credit card payments.
100 years of keyboard evolution, not yet finished
NEXI's request for a thorough evaluation of how a keypad is used may seem surprising, especially considering that we have been living with Western Arabic numerals (0 1 2 3 4 5 6 7 8 9) for about 800 years and have been using them daily on numeric keypads for over a century.
But looking at how these ten buttons have evolved in different contexts of use makes it clear that the evolution is not over yet.
The keypads of cash registers (and calculators), on the one hand, and those of telephones, on the other, share the same 3x3 + 0 matrix arrangement (three rows of three keys, plus a fourth row for the "0"), but order the numbers differently: the former from bottom to top, the latter from top to bottom. In both cases the reason is purely practical. Calculators keep the lowest numbers (1 2 3) at the bottom, next to the "0", because they are the ones used most often. On telephones, the greater variety of numbers to be dialed, together with the need for numbers to coexist with letters (e.g. A B C), led to ordering them from top to bottom, consistent with our reading habits (top to bottom, left to right).
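As a minimal illustration of the two conventions just described, the sketch below builds both layouts as rows of keys, listed from top to bottom. The function names are ours, introduced only for this example.

```python
# Illustrative only: the two classic "3x3 + 0" numeric keypad conventions.

def telephone_layout():
    """Rows from top to bottom, as on telephones, ATMs and POS terminals."""
    return [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"], ["0"]]


def calculator_layout():
    """Same digit rows in reverse order; the "0" row stays at the bottom."""
    digit_rows = telephone_layout()[:3]
    return digit_rows[::-1] + [["0"]]


for name, layout in (("telephone", telephone_layout()),
                     ("calculator", calculator_layout())):
    print(name)
    for row in layout:
        print("  " + " ".join(row))
```

The only difference between the two is the order of the three digit rows; the "0" stays in the fourth row in both.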
Just pick up any smartphone and open the phone dialer or the calculator to see how firmly these age-old conventions still hold.
As for electronic POS terminals ("Point Of Sale" terminals for payment by credit or debit card), in the 1980s they adopted the telephone convention rather than that of cash registers, both because numbers and letters had to coexist and to avoid confusing users who were by then accustomed to typing codes on the "1 2 3" keypad found on ATMs since the 1970s (see image).
With the arrival of smartphones and the multiplicity of interfaces that their screens can display, mobile POS devices appeared around 2010: card readers that connect to the phone via Bluetooth and are managed through a dedicated app (see image).
Hence the opportunity to reflect on human performance with a keyboard that, in a very short time, has gone from physical to digital and now lives inside a device (the phone) that offers both the "1 2 3" version for dialing phone numbers and the "7 8 9" version for the calculator.
The objective of the project and the team
The high-level objective we received from NEXI was to evaluate, through a usability test with users, the "in use" performance of the mobile POS keyboard with respect to:
- entering an amount;
- visibility and use of the "00" CTA;
- comprehensibility of sum operations;
- visibility of the cancel CTA;
- perception of haptic feedback.
To turn this request into an operational plan, a working group was formed that combined the operational efficiency of UNGUESS (formerly AppQuality) in recruitment management with the technical and methodological experience of Ergoproject's UX researchers.
Ergoproject has been supporting public and private organizations for over ten years with research, consultancy and training on human performance and on the quality of interaction with digital products and services in terms of use errors, functional status, accessibility, usability and user experience.
AppQuality is the leading crowdtesting company in Italy. This methodology makes it possible to find UX bugs and glitches in any digital product (apps, websites, e-commerce, IoT devices, chatbots, etc.) through a crowd of testers ranging from expert bug finders to ordinary users (such as users of Lottomatica slot machines).
The keyboard
A comparison with competing applications shows that NEXI chose to characterize its UI with a series of distinctive elements, which made it even more important to verify some design choices.
Unlike in other apps, the keys do not have the rectangular shape of "physical" POS keys; they are inspired by the circular buttons of the phone keypad, with the asterisk and the hash replaced by "00" and "+". The "00", consistent with ordinary POS terminals, speeds up the entry of amounts with no decimals after the comma. The "+", less usual, makes it possible to add up several amounts before proceeding with a payment.
The introduction of the "+" led to another distinctive choice: placing "cancel" at the top right instead of at the bottom, where it usually sits.
All of this sits within a minimal design, reduced to the essentials.
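To make the role of the "00" and "+" keys concrete, here is a minimal sketch of a keypad model. It is not NEXI's implementation: it assumes that the amount is stored in cents and that digits shift in from the right, so the decimal comma never has to be typed, and that "cancel" removes the last digit. The class and method names are hypothetical.

```python
class AmountEntry:
    """Hypothetical model of a cents-based keypad with "00", "+" and cancel."""

    def __init__(self):
        self.committed_cents = 0  # amounts already confirmed with "+"
        self.current_cents = 0    # amount currently being typed

    def press(self, key):
        if key.isdigit():
            # "7" appends one digit, "00" appends two: digits shift in
            # from the right, so the last two are always the decimals.
            self.current_cents = self.current_cents * 10 ** len(key) + int(key)
        elif key == "+":
            # Commit the typed amount and start a new one.
            self.committed_cents += self.current_cents
            self.current_cents = 0
        elif key == "cancel":
            # Assumed behaviour: remove the last digit typed.
            self.current_cents //= 10
        else:
            raise ValueError(f"unknown key: {key!r}")

    def total(self):
        """Running total with an automatically placed decimal comma."""
        cents = self.committed_cents + self.current_cents
        return f"{cents // 100},{cents % 100:02d}"


entry = AmountEntry()
for key in ["1", "2", "00", "+", "5", "00"]:
    entry.press(key)
print(entry.total())  # 17,00
```

Under these assumptions, typing "1", "2", "00" gives 12,00 in three taps instead of four, which is the shortcut the "00" key is meant to provide.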
Test strategy and use scenarios
To evaluate use with respect to the design and the proposed CTAs, it immediately became clear that the evaluation strategy should focus on two fundamental aspects:
- metrics that would give a deeper understanding of the quality of interaction with the product (going beyond the "usual" success rate and task completion time);
- use scenarios that could reflect the variety of situations in which the product would be used in the real world (e.g. I hold the phone in one hand while handing the card reader to the customer with the other, or I leave the phone on the counter).
As for the first aspect, inspired by the 1983 Rasmussen classification (sometimes you can innovate by looking at the past), we analyzed all the errors made by users, dividing them into:
- System errors (Misses): the system does not recognize the command entered by the user (e.g. the system does not enter the number when it is tapped).
- Execution errors (Slips): incorrect insertions or deletions caused by a mistap (e.g. the user misses the key).
- Memory errors (Lapses): skipping an operation or performing a different one (e.g. in a sum, the user forgets to insert the "+" between two numbers).
- Understanding errors (Mistakes): the user expects the system to work differently from how it actually behaves (e.g. the user looks for the comma, or tries to tap an entered number to change it).
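The four categories above lend themselves to a simple coding scheme for the session recordings. The sketch below shows one possible way to tally coded errors by type; the participant IDs and observations are invented for illustration and are not data from the study.

```python
from collections import Counter
from enum import Enum


class ErrorType(Enum):
    MISS = "system error"
    SLIP = "execution error"
    LAPSE = "memory error"
    MISTAKE = "understanding error"


# Hypothetical coded observations (participant, task, error type).
observations = [
    ("P01", "enter amount", ErrorType.SLIP),
    ("P01", "sum amounts", ErrorType.LAPSE),
    ("P02", "enter amount", ErrorType.MISTAKE),
    ("P02", "sum amounts", ErrorType.LAPSE),
]


def tally_by_type(rows):
    """Count errors per category, including categories with no errors."""
    counts = Counter(error for _, _, error in rows)
    return {error: counts.get(error, 0) for error in ErrorType}


for error, count in tally_by_type(observations).items():
    print(f"{error.name:8} {count}")
```

Keeping zero counts in the output makes it visible when a category never occurred, which is itself a finding.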
As for the second aspect, to make the simulation of use as "realistic" as possible, we had users work through three different phone-handling scenarios:
- Scenario A - Free: users could hold the phone as they preferred (e.g. hold the phone with the left hand and type with the right index finger).
- Scenario B - Dominant hand: users were asked to imagine that one hand was busy (e.g. holding the mobile POS device) and to carry out the same operations while holding the phone in their dominant hand (right for right-handed users, left for left-handed users).
- Scenario C - Table: users were invited to imagine a case in which they could not hold the phone and had to carry out the same operations leaving it on the table (being able to freely use one or two hands).
Scenario A was designed to tell us how comprehensible the system is. Scenarios B and C let us "force" users into situations considered "typical" of normal M-POS transactions, to verify whether performance (e.g. the execution times of the operations) changes with how the device is handled: held in one hand only or placed on the table.
For each scenario, users had to perform 3 operations:
- Enter a defined amount.
- Sum up a series of amounts.
- Correct an error in the last amount entered.
To avoid bias in the results due to growing familiarity with the system as the sessions progressed, the sample was split in two: one group carried out the activity in the order A → B → C, the other in the order A → C → B.
This gave all users the same first approach to the system (scenario A) and gave us results for the "forced" scenarios (B and C) that were not confounded by learning effects.
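The counterbalancing described above can be sketched as follows. The article does not say how participants were assigned to the two orders, so simple alternation is assumed here, and the participant IDs are hypothetical.

```python
from itertools import cycle

# The two scenario orders used in the study: A always comes first,
# while B and C swap places.
ORDERS = (("A", "B", "C"), ("A", "C", "B"))


def assign_orders(participants):
    """Alternate the two scenario orders across participants (assumed rule)."""
    return dict(zip(participants, cycle(ORDERS)))


participants = [f"P{i:02d}" for i in range(1, 7)]  # hypothetical IDs
plan = assign_orders(participants)
for participant, order in plan.items():
    print(participant, " → ".join(order))
```

With an even number of participants, each order is used equally often, so any learning effect is spread evenly over scenarios B and C.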
Involving users: an unusual sampling
Before talking about the results, let's answer a question that we believe is of fundamental importance: where did the people who carried out the test come from? The UNGUESS community, which is committed to finding real users of digital products and services, responded quickly and precisely to the request for an unusual sample.
More than 20 users, most of them freelancers interested in using an M-POS for their current and/or future business, were quickly recruited for the test.
Users were asked to download the application and to record both themselves (from the outside) and their screen (from the inside) while performing the various operations in the different scenarios. Let's look in more detail at the type of user involved.
In this type of test, the match between the real end user and the user who takes part in the activity is not just useful but essential. This often translates into a search for users in a certain age range, subscribed to a certain service, or living in a more or less wide geographical area. In this project, however, the requirement was current or potential use of the device being tested. On the one hand there were the actuals, users who already use a POS (mobile or traditional) in their business; on the other, the prospects, users who have never used a POS but who might need one in the future given the nature of their business. This second group included various professions, such as traders, artisans, merchants, and freelancers.
The selection deliberately excluded anyone who was not an actual or potential end user of the POS, a requirement that made the sampling very stringent. The ability to draw on a community as large as AppQuality's undoubtedly also proved crucial for the speed of recruitment, which would otherwise have taken much longer and delayed the testing phase.
Results
Analysis and evaluation of user behavior
By observing users we were able to understand how they habitually use the device and where the keyboard causes problems, while measuring their performance on the various tasks.
To understand the depth of the analysis, let's briefly review the three main aspects of the study:
- behavior analysis
- performance evaluation
- learning curve
Regarding the behavior analysis, it was possible to evaluate the different handling and interaction strategies, both with the phone and with the UI. For example, we detected the two main strategies for using the device: holding the phone with one hand and typing with the index finger of the other, and holding the phone with one hand and typing with the thumb of the same hand.
This is not surprising if we think of the single-handed use typical of smartphones and of the typical "calculator" posture, in which the device is held in the other hand or placed on the table.
Through the performance measurement we then assessed both the individual tasks we had asked users to perform and how they varied across the different usage scenarios. Continuing with the previous example, we found that users make fewer entry and/or deletion errors when using the phone with two hands.
Finally, we cross-referenced qualitative and quantitative data to understand what the learning curve of the new interaction paradigms introduced by NEXI might be.
During their first uses of the system, in fact, users had some difficulty understanding two things: the absence of a comma CTA (the comma is inserted automatically by the system) and the real-time calculation of the sum (they expected to see the entered value appear in large type and then be added to the total, as on a calculator).
However, comparing the error rate on single tasks over time (a task in the first scenario vs. the same task in the second scenario vs. the same task in the third scenario), we saw that it decreased as interactions increased, pointing to a rapid and positive learning process: in the second interaction the difficulties had already diminished, and by the third they had disappeared.
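A comparison of this kind boils down to an error rate per position in the session (first, second or third scenario encountered, whichever scenario it was). The sketch below shows one way to compute it; the attempts are made up for illustration and do not reproduce the study's figures.

```python
from collections import defaultdict


def error_rate_by_position(attempts):
    """Return {position: share of attempts with at least one error}.

    `attempts` is an iterable of (position, had_error) pairs, where
    position is 1, 2 or 3 (order of exposure, not the scenario letter).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for position, had_error in attempts:
        totals[position] += 1
        errors[position] += int(had_error)
    return {pos: errors[pos] / totals[pos] for pos in sorted(totals)}


# Made-up attempts for a single task, four participants per position.
attempts = [
    (1, True), (1, True), (1, False), (1, False),
    (2, True), (2, False), (2, False), (2, False),
    (3, False), (3, False), (3, False), (3, False),
]
rates = error_rate_by_position(attempts)
for position, rate in rates.items():
    print(f"position {position}: {rate:.0%}")
```

Grouping by position rather than by scenario letter is what makes the learning effect visible, since scenarios B and C were run in different orders by the two groups.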
These are of course just some examples of what emerged. Overall, the findings made it possible to improve the product by refining the position and optimizing the behavior of some commands.
Not only advantages for the quality of the product
Carrying out a test of this type does not only bring the tangible advantage of optimizing the quality of the product. There are also higher-level effects on the life of the team, the choices to be made, the use of time and the value of the brand. We can divide these benefits into three categories:
- Focus: to concentrate resources on the salient or critical elements to work on and to keep under observation after the launch
- Objectivity: to make decisions based on data that can be shared with the rest of the BU
- Validation: the strategic choice of arriving on the market with an already validated product
Let's start with focus. Thanks to the analysis involving real users, NEXI's work team was able to concentrate time and effort on salient elements that could otherwise have gone unnoticed. For example, observing and analyzing behavior led to the resolution of usability glitches, as in the case of the sum function.
In the design phase, hypotheses are formulated based on the team's knowledge of the end user, and they are inevitably skewed by the team's own preferences, knowledge of the product, ties to the brand and much more. Objective data from real users, on the other hand, allows you to make data-driven, defensible decisions that can be shared with the rest of the BU.
Finally, at a strategic level, arriving on the market with a non-validated product would have been a huge risk. Had usability not been verified beforehand, the consequences could have shown up in the number of returns and in the perceived value of the product.