Evaluation of three SEQ variants – MeasuringU


The Single Ease Question (SEQ®) is a single seven-point item that measures perceived ease of task completion. It is commonly used in usability testing.


Since its introduction in 2009 [PDF], some researchers have made changes to its design. Figure 1 shows the version we currently use.


Figure 1: Current version of SEQ.

In 2022, we decided to test some other SEQ variants that we have used in the past or seen in the UX literature to help us decide whether we should continue using our current version. The three variations each vary one of the following elements of the scale:


- The polarity of the scale endpoints
- The wording of the item stem
- The presence or absence of numbers on the response options

In this article, we summarize the findings and conclusions of these three experiments.

In all three experiments, we used our MUIQ® platform to conduct unmoderated remote UX studies with a Greco-Latin experimental design to compare standard and alternative versions of the SEQ in the context of attempting easy and difficult tasks. As a reminder, a Greco-Latin square design combines the best aspects of within-subjects and between-subjects experimental designs. For example, within-subjects analyses are more sensitive to statistically significant effects and allow preference assessments, whereas between-subjects analyses are resistant to the possibility of asymmetric transfer.

The Easy and Hard Tasks

- Easy task: Find a blender for less than $50 on the Amazon website. Copy or remember the brand name of the blender.
- Hard task: Find out how much an iPhone 12 with 64GB of storage costs monthly with service for one line on the AT&T website. Copy or remember the monthly cost (including all fees).

There were three independent variables in this experimental design: item format (standard or alternate), rating context (easy or hard task), and order of presentation (standard/easy then alternate/hard; standard/hard then alternate/easy; alternate/easy then standard/hard; alternate/hard then standard/easy).
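The counterbalancing described above can be sketched as a simple round-robin assignment of participants to the four presentation orders. This is only an illustration of the counterbalancing logic, not the MUIQ platform's actual assignment code; the condition names are assumptions based on the description above.

```python
from itertools import cycle

# The four presentation orders from the Greco-Latin square:
# (item format, task difficulty) pairs attempted first, then second.
ORDERS = [
    [("standard", "easy"), ("alternate", "hard")],
    [("standard", "hard"), ("alternate", "easy")],
    [("alternate", "easy"), ("standard", "hard")],
    [("alternate", "hard"), ("standard", "easy")],
]

def assign_orders(participant_ids):
    """Round-robin participants across the four counterbalanced orders."""
    return {pid: order for pid, order in zip(participant_ids, cycle(ORDERS))}

assignments = assign_orders(range(8))
```

Note that every participant attempts both item formats and both task difficulties (the within-subjects part), while each format/difficulty pairing is seen by only a subset of participants (the between-subjects part).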

After attempting each task, participants completed either the standard or alternative version of the SEQ according to the experimental design. Finally, participants indicated which SEQ version they preferred.

Experiment 1: Endpoint Polarity

In our first experiment, we wanted to know whether changing the scale polarity from easy → difficult to difficult → easy would change scores.

We manipulated endpoint polarity using versions that differed in whether the endpoints were "Very Difficult" on the left and "Very Easy" on the right (as in the 2006 Tedesco and Tullis study [PDF]) or the reverse (as in the 2009 Sauro and Dumas study).

Figure 2 shows the two versions used in this experiment. Using the Greco-Latin experimental design described previously, we crossed these item formats with the difficulty of the tasks performed by the participants. We used this data to see whether there was evidence of a left-side bias.

Here is what we found:

There was no evidence of a left-side bias. When the left side was "Very Easy," the means were not statistically different. Overall, there was no significant difference in top-box scores, but there was an interaction between item format and task difficulty such that the difference in top-box scores was statistically significant, but only for the hard task. A majority (53%) of respondents had no preference for either variant; among those who had a preference, the preference for the standard variant (29%) was significantly stronger than for the alternative (18%).
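Top-box scoring, used throughout these comparisons, is simply the proportion of respondents selecting the most favorable point (7) on the seven-point scale. A minimal sketch, with made-up ratings purely for illustration (not the study's data):

```python
def top_box(responses, top=7):
    """Proportion of ratings at the scale's most favorable point."""
    if not responses:
        raise ValueError("no responses")
    return sum(r == top for r in responses) / len(responses)

# Hypothetical seven-point ratings for illustration only.
easy_task = [7, 7, 6, 7, 5, 7, 6, 7]
hard_task = [4, 5, 7, 3, 6, 4, 5, 2]

print(top_box(easy_task))  # 0.625
print(top_box(hard_task))  # 0.125
```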

Figure 2: The two SEQ item formats differ in the polarity of the endpoints.

Experiment 2: Item Stem

Our second experiment used the same experimental design as the first, but the manipulation was only the wording of the item stem. Specifically, as shown in Figure 3, we compared the original stem ("Overall, this task was:") with our current version ("How easy or difficult was this task to complete?").

Across the multiple methods of analyzing the data (means and top-box scores), we did not find any statistically significant differences (in fact, the smallest p-value in the analyses was .30). Preferences were split evenly between the versions among those who had one, and 55% had no preference.

Figure 3: The two SEQ item formats differ in the wording of their stems.

Experiment 3: Response Option Numbering

Our third experiment used the same experimental design as the other two, but the manipulation was only the presence or absence of numbers as response option labels. As shown in Figure 4, we compared our current numbered version with an otherwise identical version without numbers.

In a broad set of analyses of means and top-box scores, we found no statistically significant differences, although some of the top-box differences were large enough to be of interest. The numbered version appeared to better discriminate between easy and difficult tasks, and among those who had a preference, the preference for the numbered version was statistically significant at just over 2:1.
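A preference split like the 2:1 one reported above can be checked against a 50/50 null with an exact binomial test. The sketch below uses hypothetical counts chosen only to illustrate the calculation (the article does not report the raw counts); it is not the study's actual analysis code.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test p-value for k successes in n trials.
    For p = 0.5 the distribution is symmetric, so we double the smaller tail."""
    tail = min(k, n - k)
    cum = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(tail + 1))
    return min(1.0, 2 * cum)

# Hypothetical counts: of 90 respondents with a preference,
# 60 prefer the numbered version (a 2:1 split).
p_value = binomial_two_sided_p(60, 90)
print(p_value < 0.05)  # True
```

With these illustrative counts, a 2:1 split among 90 respondents is comfortably significant, while an even 45/45 split would not be.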

Figure 4: Two SEQ item formats that differ in the presence or absence of numerical labels on response options.

Item Variation Takeaways

These analyses (and other considerations) support our continued use of the current version (Figure 1).

Endpoint polarity: Any concerns we may have had about differences in endpoint polarity in our historical SEQ data were mitigated by the results of the first experiment. We found a significant difference in top-box scores for the hard task but no significant differences in means, which is what we typically analyze and report. We prefer our current polarity because we know from experience that it is easier to discuss results with stakeholders when a larger number indicates a better result, and those who had a preference significantly preferred the current polarity. It's comforting to know that data collected with either format are likely to be comparable.

Item stem wording: Any concerns we had regarding the potential impact of item stem wording on response selection were addressed by the second experiment. There was no evidence that mentioning "easy" earlier in the stem produced measurements different from those of the bare-bones stem ("Overall, this task was:"). This suggests that either wording will produce similar results.

Numeric labels: Providing numeric labels for response options is the more commonly used format, appeared to better discriminate between easy and difficult tasks (when using top-box scores), and was better liked than the version without numbers. We recommend keeping the numbers.
