Rethinking user testing

[Image: beakers]

Expanding the horizons and expanding the parameters,
Expanding the rhymes of sucker MC amateurs
– The Beastie Boys, The Sounds of Science, 1989

I’ve always thought it’d be cool to be a scientist–a real scientist, with the lab coat and the beakers and whatnot. You could win friends and influence people (and pwn enemies) anytime, anywhere.

Sometimes I get the impression that I share this secret ambition with web and UI designers at large. After all, making design decisions that are “only” based on a team’s collective experience, thoughtfulness, observation, trial and error, etc. leaves those decisions open to critique. But grounding your work in some sort of rigorous-sounding, quantifiable, testable result: that’s science…you can’t beat that!

Now, don’t get me wrong. Testing your design in appropriate ways can be invaluable (I’ll talk more about this in another post). But I really take issue with the idea that User Testing, per se, leads to great design. I’ve seen just the opposite happen.

I think this is the case because we’re trying to appropriate a tool that loses its power, and actually becomes counter-productive, when used outside of the context it was designed for. We’re borrowing from the experimental design paradigm in cognitive science, a scholarly discipline closely related to the applied field of HCI. But have you ever seen an actual experiment in cog sci? When I ran a few of these back in school, they usually worked something like this:

  • Sit someone down in front of a computer in a small room (maybe there’s a video camera or some kind of monitoring equipment set up)
  • Have them stare at a dot in the middle of the computer monitor and hit A, B, or C as soon as they recognize some sort of visual stimulus presented on the screen
  • Record the timing and number of errors
  • Repeat the experiment with a bunch of different people, changing up one or two “explanatory variables” that you’ve guessed will have an impact on performance
  • Run statistical analyses on the results and use these to draw conclusions (sketched below)
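
To make that last step concrete, here’s a minimal sketch in Python of the sort of analysis such an experiment ends with: an independent-samples t-test comparing reaction times across two conditions of one explanatory variable. The data and condition names are made up for illustration.

```python
# Hypothetical reaction-time data for two conditions of a single
# explanatory variable; every number here is invented for illustration.
from statistics import mean, stdev

from scipy import stats

# Reaction times in milliseconds, one value per trial.
condition_a = [412, 398, 441, 405, 430, 389, 417]
condition_b = [455, 478, 442, 490, 463, 471, 486]

# Independent-samples t-test: did the manipulation affect performance?
t_stat, p_value = stats.ttest_ind(condition_a, condition_b)

print(f"A: mean {mean(condition_a):.0f} ms, sd {stdev(condition_a):.1f}")
print(f"B: mean {mean(condition_b):.0f} ms, sd {stdev(condition_b):.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Notice how little even a clean result tells you: a significant p-value says the manipulation mattered, not why it mattered, and certainly not whether the design is any good.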

Now this kind of experiment is obviously very narrowly focused in its scope, and necessarily so. There are a bunch of reasons why, but the two main ones are *reliability* and *validity*. For an experiment to be reliable means that repeating it over and over again yields the same result. For an experiment to be valid means that it gives cogent answers (even if they’re only partial answers) to the questions you asked in the first place.

Interactive experiences like websites are complex phenomena. They don’t naturally lend themselves to the kind of experimental protocol described above, because user performance varies greatly from person to person and from session to session. There are so many potentially confounding variables in play that *reliability suffers*. (The model experiment above tries to eliminate this problem by paring down the user’s task to a few basic actions.) You end up measuring 10th-order effects, and it becomes nearly impossible to establish causal connections between design characteristics and user performance.

If we do streamline things so that we’re just measuring one or two explanatory variables vs. 50 (say, by temporarily removing elements from the design), the experiment becomes more reliable, but *less valid*. That is to say, the results, while repeatable, can’t really be generalized to answer the type of questions that we want to ask in the first place (questions like “is this design easy to understand and use?”), because we’re not truly testing the design.

Sometimes usability researchers will employ something called a “talk-aloud protocol” to try to tap into the cognitive processes underlying user performance in a given scenario. This involves asking users who are testing a given design to explain what’s going through their heads as they move through some sort of task flow.

Again, I have real problems with this pseudo-scientific approach to evidence-based design. For one, the act of talking about what you’re doing changes the nature of that experience. But more importantly, most people can’t accurately report on why they do what they do; that’s why fields of inquiry like cognitive science and psychology exist in the first place!

I don’t want to be overly cynical here, but I do want to caution usability professionals and interaction designers in general: user testing can be helpful, but it can also mislead, and it can serve as a powerful political gambit or rhetorical expedient. If you’re going to test something, make sure you’re asking the right kinds of questions. User testing can be used to tune or optimize a design, but it cannot and should not substitute for creativity or thoughtful trial and error.

David Gillis