A/B & User Testing

Context

In Fall 2018, I took a UI/UX course. For this assignment, we built a site using HTML/CSS and collected data on how users interacted with it; we then ran user tests on three users through UserTesting.com. User testing and A/B testing are essential for determining actual user behavior and for understanding how design changes affect users.
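
The interaction data behind this kind of test can be captured with a small amount of client-side logging. Below is a minimal sketch of the idea; the `/log` endpoint, the `.vendor-link` selector, and the event names are assumptions for illustration, not the exact instrumentation used in the assignment.

```ts
// Minimal client-side logging for the A/B metrics (endpoint, selector, and
// event names are assumptions for illustration).
const version = "B";                     // which variant this page serves
const pageLoadedAt = performance.now();  // reference point for "time to click"

function logEvent(type: string, extra: Record<string, unknown> = {}): void {
  const payload = JSON.stringify({ version, type, t: performance.now(), ...extra });
  // sendBeacon survives page unloads, which matters for leave/return events
  navigator.sendBeacon("/log", payload);
}

// Click rate and time to click: log every click on a vendor section
document.querySelectorAll<HTMLAnchorElement>(".vendor-link").forEach((link) => {
  link.addEventListener("click", () => {
    logEvent("vendor_click", {
      vendor: link.dataset.vendor,
      timeToClickMs: performance.now() - pageLoadedAt,
    });
  });
});

// Dwell time and return rate: log when the user leaves or returns to the tab
document.addEventListener("visibilitychange", () => {
  logEvent(document.visibilityState === "hidden" ? "leave" : "return");
});
```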

Hypotheses & Goals

Our goal is to determine how the two versions of the site differ across four metrics, with hypotheses for each below; a rough sketch of how these metrics might be computed from logged session data follows the list.

  1. Click Rate

    • Null: The click-through rates for Version A and Version B will be equal.

    • Alternative: The click-through rate for Version B will be greater than that for Version A, because Version B provides hover feedback on each vendor section, indicating that each is separate, and because the “Learn More” button label encourages the user to be curious.

  2. Time to Click

    • Null: The Time to Click for Version A and Version B will be equal.

    • Alternative: The Time to Click for Version B will be greater than the Time to Click for Version A, because Version B has more content visible initially and includes a price tag users may compare before clicking.

  3. Dwell Time

    • Null: The dwell time for Version A and Version B will be equal.

    • Alternative: The dwell time for Version A will be greater than the dwell time for Version B, because the user who clicks “Learn More” (Version B) may be searching multiple sites for more information and will therefore return sooner than the user who has likely already decided to reserve a taxi (Version A).

  4. Return Rate

    • Null: The return rate for Version A and Version B will be equal.

    • Alternative: The return rate for Version B will be greater than the return rate for Version A because users who seek to “Learn More” (Version B) are more likely to seek information from multiple sites, whereas those ready to reserve may not come back to check other options (Version A).
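
Given per-session logs of these events, the four metrics above reduce to straightforward per-version aggregation. The following is a rough sketch assuming one simplified record per session; the field names and exact definitions (e.g., computing return rate over sessions that clicked through) are illustrative assumptions, not the precise definitions used in our spreadsheet.

```ts
// One record per user session (simplified; field names are assumed, not our schema).
interface Session {
  version: "A" | "B";
  clicked: boolean;          // did the user click through to a vendor?
  timeToClickMs?: number;    // defined only when clicked is true
  dwellTimeMs?: number;      // time spent away before returning, if they returned
  returned: boolean;         // did the user come back to the landing page?
}

function mean(xs: number[]): number {
  return xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
}

// Aggregate the four metrics for one version of the site.
function summarize(sessions: Session[], version: "A" | "B") {
  const group = sessions.filter((s) => s.version === version);
  const clicks = group.filter((s) => s.clicked);
  return {
    clickRate: group.length ? clicks.length / group.length : 0,
    meanTimeToClickMs: mean(clicks.map((s) => s.timeToClickMs ?? 0)),
    meanDwellTimeMs: mean(
      group.filter((s) => s.dwellTimeMs !== undefined).map((s) => s.dwellTimeMs as number)
    ),
    // Return rate computed over sessions that clicked through (one possible definition).
    returnRate: clicks.length ? clicks.filter((s) => s.returned).length / clicks.length : 0,
  };
}
```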


Keep in mind that Version A was a template from our class; I designed Version B to contrast with it for testing purposes.

Version A

[screenshot: versionB.JPG]

Version B

[screenshot: versionA.JPG]

Data Analysis and Results

After gathering data from various sources (e.g., posts on Facebook and in-class data collection in our UI/UX class), I consolidated the full analysis into a compact “Stats / Results” spreadsheet, shown below:

[Stats / Results spreadsheet: data_1.png, data_2.png, data_3.png]
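
One standard way to check the click-rate hypothesis (that Version B's rate is higher) is a one-sided two-proportion z-test. The sketch below is an illustrative version of that test, not necessarily the method used in the spreadsheet, and the counts in the usage comment are made-up numbers rather than our data.

```ts
// One-sided two-proportion z-test: H0: pB = pA  vs.  H1: pB > pA (click-through rates).
function twoProportionZTest(clicksA: number, nA: number, clicksB: number, nB: number): number {
  const pA = clicksA / nA;
  const pB = clicksB / nB;
  const pooled = (clicksA + clicksB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  const z = (pB - pA) / se;
  return 1 - standardNormalCdf(z);       // upper-tail p-value
}

// Standard normal CDF via the Abramowitz & Stegun 26.2.17 polynomial approximation.
function standardNormalCdf(z: number): number {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const density = Math.exp((-z * z) / 2) / Math.sqrt(2 * Math.PI);
  const poly =
    t * (0.31938153 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  const upperTail = density * poly;      // P(Z > |z|)
  return z >= 0 ? 1 - upperTail : upperTail;
}

// Illustrative numbers only, not our data:
// twoProportionZTest(18, 60, 27, 62) -> p-value for "B's click rate is higher"
```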

User Testing

Having analyzed the A/B testing data, we next used UserTesting.com to observe, through video, how users interact with a hi-fi mobile app prototype of the Memphis Taxis site. We used InVision to allow direct interaction with the prototype, located here. Below are the screens used.

[Select City screen]

[Memphis initial screen]

[Memphis selected screen]

Our hypothesis regarding user performance is that a user tasked with learning more about Memphis Uber through this prototype will be able to do so swiftly, intuitively, and with minimal errors due to the simplicity of the design and its utilization of affordances in menus.

The users were presented with the following hypothetical scenario:

 

Imagine that you are a college student in Memphis, Tennessee looking for the cheapest way to get from your dorm to tonight's party. You need to decide which taxi company you will use.

 

Then, four sub-tasks were provided to the user to reach the objective:

1.) As you search for the Memphis button, describe your searching process and the expected outcome of tapping on the Memphis button.

2.) Without leaving this page, in your own words, describe what you think you can do here. Be specific.

3.) Expand the Memphis Uber box and describe your interpretation of the information that will appear.

4.) Learn more about Memphis Uber and describe the process for doing so.

 

The following table quantifies the results of each sub-task above for the three users who participated in the user test, using three key metrics for their interactions:

[User testing results table: data_4.png]

After analyzing each UserTesting video, it became clear that the mobile app interface is particularly simple in design and satisfied the above hypothesis; with a 100% completion rate for each task, no major issues arose. However, two errors occurred in sub-task 2. The first was a lapse, in which the user did not continue to the next task (as he should have after sub-task 1); the second was a planning mistake, in which the user thought pressing the drop-down menu for a taxi service would direct her to a separate app rather than expanding a box. Finally, while the UserTesting metrics show relatively high times on task, most of that time was spent narrating to the camera; otherwise, each task was completed within 5 seconds. Overall, two users were satisfied, while the third felt the prototype was “too cheap.”

 

Each user's criticism seemed to center on the perceived “cheapness” of the design. This could be remedied by following the color scheme and Apple's design guidelines more cleanly and consistently, to appeal to affordance and familiarity. Apps involving financial transactions or eCommerce especially need to be polished enough that users trust the app to handle transactions or to point them to safe places to make purchases.

 

Additionally, one user was confused by the drop-down caret; while the caret already relies on a familiar affordance, adding a “More” label to the left of each caret would better signal that a box will expand in place rather than a different site or app loading.
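
If that fix were applied to an HTML version of the screen (the tested prototype itself was an InVision mockup), it could be as small as the sketch below; the class names are assumptions for illustration.

```ts
// Prepend a "More" label to every drop-down caret so the affordance reads as
// "expands in place" rather than "navigates elsewhere" (class names are assumed).
document.querySelectorAll<HTMLElement>(".vendor-caret").forEach((caret) => {
  const label = document.createElement("span");
  label.className = "caret-label";
  label.textContent = "More ";
  caret.before(label);                   // sits to the left of the caret
});
```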

Conclusion / Takeaways

For Memphis Taxis, the A/B testing was at a clear disadvantage relative to the UserTesting data: because the Heroku link for the website was shared mostly among students and people we asked for help, the data was not as organic as that from the UserTesting participants, who thought out loud and recorded their interactions. On the other hand, UserTesting is highly qualitative in nature, and it is hard to quantify user activity and decisions from it automatically; in the ideal environment, A/B testing lets us directly see how users interact with specific elements and track the timing of those interactions.

User Testing offered feedback that A/B testing simply cannot: what are users actually thinking as they use the prototype, and what have competitors drilled into them as a standard? I also realized that the caret affordance for a drop-down is not as intuitive as I had thought. Otherwise, all users were able to complete the tasks with ease and speed, which was a success.
