Evaluation of the Labeled Hedonic Scale under different experimental conditions

https://doi.org/10.1016/j.foodqual.2010.02.001

Abstract

The present study explored the performance, psychophysical properties, and potential limitations of the Labeled Hedonic Scale (LHS) in two different test conditions. The LHS was first compared against the 9-point hedonic scale in a psychophysical laboratory setting by having subjects evaluate two sets of taste stimuli with hedonic ranges of different sizes. Both scales showed equal discrimination power, but the LHS was more resistant to ceiling effects and its data satisfied the normality assumption for statistical analysis. In the second experiment, subjects with different levels of prior experience with the 9-point scale were recruited to evaluate juice samples and food item names in a large-scale consumer test, after receiving either common written instructions or detailed instructions emphasizing the use of the LHS. Improper use of the LHS (e.g. categorical ratings) was evident for those who received written instructions, especially among subjects who had previous experience with the 9-point scale. The present results support the efficacy of the LHS in various test settings, while highlighting the importance of providing adequate instructions to subjects so that its advantages can be fully utilized.

Introduction

The traditional 9-point hedonic scale (Jones et al., 1955, Peryam and Pilgrim, 1957), which was originally designed to measure consumer acceptability of foods, has been the most common device used in sensory and consumer testing. It has also been widely used in the field of psychophysics, where the objective is often to measure individual or group differences in hedonic perception. The primary reason for its wide acceptance is that, compared to other scaling methods (e.g. magnitude estimation), the 9-point scale’s categorical nature and limited choices make it simple and easy for both consumers and researchers to use.

Because of its simplicity, a series of fundamental limitations associated with the scale have been largely overlooked. First, although its labels are equally spaced, the intervals between the labels are psychologically unequal (Jones and Thurstone, 1955, Moskowitz and Sidel, 1971, Peryam and Pilgrim, 1957). Consequently, the scale produces only ordinal- or, at best, interval-level data, as opposed to ratio-level data (Stevens, 1951). Second, because it is a category scale, the 9-point hedonic scale offers little freedom for consumers to express their preference among products (Marchisano et al., 2003, Villegas-Ruiz et al., 2008). In addition, because there are only nine choices available and also because consumers often avoid using extreme response categories (Hollingworth, 1910, Moskowitz, 1982, O’Mahony, 1982), the scale is highly vulnerable to ceiling effects. This reduces the discriminability of the scale for highly liked or disliked samples (Lim et al., 2009, Schutz and Cardello, 2001, Villanueva and Da Silva, 2009). Lastly, data obtained using the 9-point hedonic scale frequently violate the normality and homoscedasticity assumptions for parametric statistics (Gay and Mead, 1992, Giovanni and Pangborn, 1983, Villanueva et al., 2000), particularly the data for extremely liked or disliked samples (Lim et al., 2009, Peryam et al., 1960). In theory, the types of statistical analyses that can be applied to such non-normal data are limited to nonparametric analyses, which are less sensitive than parametric analyses.
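The ceiling effect described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each consumer has a latent liking value that is rounded and clipped onto the nine categories; the function name `to_nine_point` and the latent values are hypothetical and are not data from this or any cited study.

```python
from statistics import mean


def to_nine_point(latent: float) -> int:
    """Map a latent liking value onto a 1-9 category by rounding and clipping."""
    return max(1, min(9, round(latent)))


# Hypothetical latent liking values for two well-liked products (illustrative only).
product_a = [7.6, 8.2, 8.8, 9.4, 10.0]
product_b = [8.6, 9.2, 9.8, 10.4, 11.0]

# Difference between the products before the category scale is applied.
latent_gap = mean(product_b) - mean(product_a)

# Difference after every response is forced into one of nine categories.
rated_a = [to_nine_point(x) for x in product_a]
rated_b = [to_nine_point(x) for x in product_b]
rated_gap = mean(rated_b) - mean(rated_a)

print(f"latent gap: {latent_gap:.2f}, 9-point gap: {rated_gap:.2f}")
```

Under these assumptions the observed gap between the two products shrinks once responses pile up in the top category, which is the loss of discriminability for highly liked samples that the paragraph describes.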

To overcome the above-mentioned limitations of the 9-point hedonic scale, researchers have made continuous efforts to find a better way to quantify hedonic responses. At first, magnitude estimation (ME) (Stevens, 1956, Stevens and Galanter, 1957), which was originally developed to measure proportional magnitudes of perceived sensations, seemed to be a better solution, as it answers the objections raised against category scales (e.g. ordinal vs. ratio data, limited choices vs. greater freedom in number usage). Thus, ME was adapted to rate the pleasantness of odors and tastes (Engen and McBurney, 1964, Moskowitz, 1971, Moskowitz et al., 1976) and the hedonic attributes of foods (Giovanni and Pangborn, 1983, McDaniel and Sawyer, 1981, Moskowitz and Sidel, 1971, Shand et al., 1985). These studies showed that ME can be used to quantify the relation between physical concentration and overall acceptability in terms of ratios, with equal or slightly better sensitivity to differences among food items compared to the 9-point hedonic scale. Nevertheless, the method fell out of favor and into disuse due to (1) the difficulty untrained subjects have in estimating ratios and translating them into appropriate numbers (Lawless and Malone, 1986a, Lawless and Malone, 1986b, Moskowitz, 1977), (2) the complexity of analyzing the data associated with normalization and standardization, and (3) the lack of qualitative information about acceptance level (Moskowitz & Sidel, 1971).

More recently, different types of scales have been developed that aim to yield data comparable to those obtained with ME while maintaining the chief advantages of the traditional category scale (i.e. ease of use, semantic information about sensation magnitude). “Category-ratio” scales such as the Labeled Magnitude Scale (LMS) (Green et al., 1993, Green et al., 1996) are continuous line scales with verbal descriptors whose spacing and location were empirically determined by measuring their semantic magnitudes via ME. The key elements and benefits of such scales are: (1) because they were derived and validated using ratio scaling, they can be assumed to yield ratio-level data equivalent to ME, which are particularly valuable for illustrating the relation between perceived intensity (or acceptability) and an underlying quantitative dimension of the food (e.g. concentrations of salt); (2) because the positions of their semantic labels have been empirically determined, they provide meaningful semantic information about subjective experience; (3) because they are continuous line scales, study participants can express subtle differences in perceived intensity or preference among stimuli; and (4) because they encompass a broad frame of reference, they enable comparison of individual and group differences (Bartoshuk et al., 2004) within the context of the full range of the given domain.

In recognition of those benefits, Schutz and Cardello (2001) developed an affective version of the category-ratio scale for the specific purpose of assessing food affect. Development of the Labeled Affective Magnitude (LAM) scale was patterned after the procedure originally used to create the LMS. Yet, some critical aspects of the psychophysical procedure (e.g. training of subjects, a frame of reference) were somewhat different from those used to develop the LMS, raising questions about the validity of the scale compared to ME or the LMS. Noting these potential limitations, Lim et al. (2009) recently developed a new hedonic category-ratio scale using a method that adhered closely to the method used to develop the LMS. The resulting Labeled Hedonic Scale (LHS) (Fig. 1) was then compared against ME and the 9-point hedonic scale by asking different sets of subjects to rate their liking of a variety of food item names. The results suggested that the LHS has some important advantages over both the 9-point hedonic scale and ME. First, the LHS yielded data that were almost identical to those obtained using ME, indicating the validity of the placement of the semantic descriptors and of the assumption of ratio-level data. Second, the LHS afforded slightly better discrimination among stimuli and much greater resistance to ceiling effects while producing more normally distributed data compared to the 9-point hedonic scale.

While those findings indicate the possibility that the LHS can be an advantageous device for hedonic measurement of taste, flavor, foods and potentially any other consumer products, some concerns about the LHS remain unanswered. The most fundamental question for any category-ratio scale, particularly in the field of sensory evaluation, has been whether a wider frame of reference produces a smaller range of ratings for test samples, thus reducing the discriminating power of the scale (Cardello, Lawless, & Schutz, 2008). In our previous study (Lim et al., 2009), the LHS was shown to be somewhat better at differentiating affective values of 26 food item names than the 9-point hedonic scale. However, it has yet to be determined if the slightly higher discrimination power of the LHS, along with its other quantitative and statistical advantages, will hold for data obtained in common sensory tests where the potential hedonic range of test stimuli is often narrower.

Another concern arises from the potential misuse of the scale by naive subjects or subjects who have extensive experience with other scales such as the 9-point hedonic scale. Since subjects’ full understanding of the nature and mechanics of the scale is of obvious importance to obtaining valid data that truly reflect their liking/disliking of test stimuli, giving proper instructions prior to testing will be crucial, especially when using a scale that is not familiar to subjects. However, instructions about scale usage are often given a low priority, especially in a large consumer test setting. In a recent study, Cardello et al. (2008) reported that “a large number of panelists (50 of 100 subjects at one study site and 65 of 100 subjects at another site) used the scale (i.e. LAM scale) in a categorical manner, making marks on the tick marks corresponding to the verbal labels”. Without clear instructions, subjects may use the LHS as if it were another category scale with two additional descriptors, one at each end (i.e. an 11-point scale).
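One way to screen for the categorical usage described above is to count how many of a subject's marks fall on, or very near, a label position. The sketch below is only an illustration under stated assumptions: the function `categorical_fraction`, the tolerance, and the label positions in `HYPOTHETICAL_LABELS` are placeholders introduced here, not the published LHS anchors or the criterion used by the authors.

```python
def categorical_fraction(ratings, label_positions, tolerance=1.0):
    """Return the share of ratings lying within `tolerance` of any label position."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    on_label = sum(
        1
        for rating in ratings
        if any(abs(rating - position) <= tolerance for position in label_positions)
    )
    return on_label / len(ratings)


# Placeholder label positions on a -100..100 line (assumed for illustration;
# substitute the empirically derived positions of the scale actually used).
HYPOTHETICAL_LABELS = [-100, -63, -42, -18, -6, 0, 6, 18, 44, 66, 100]

# A subject who marks only on the tick marks versus one who uses the full line.
marks_on_labels = [44, 18, 66, 0, -6]
marks_continuous = [51.3, 12.0, 70.2, 3.1, -25.4]

print(categorical_fraction(marks_on_labels, HYPOTHETICAL_LABELS))
print(categorical_fraction(marks_continuous, HYPOTHETICAL_LABELS))
```

A subject whose fraction is close to 1 is treating the line as an 11-point category scale; where to set the cutoff and the tolerance is a judgment call that depends on how the marks were digitized.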

The purpose of the present study was therefore to evaluate the performance and psychophysical properties of the LHS in various test settings. The first experiment was designed to compare the LHS against the 9-point hedonic scale in terms of data distributions, sensitivity, and other statistical considerations. Two types of taste stimuli (chemical stimuli covering a wide hedonic range, and juice samples covering a relatively narrow hedonic range) were used to determine whether (1) the sensitivity of the scales changed depending on the size of the hedonic range, and (2) the scales could be used to assess liking/disliking for both simple and complex flavor systems. The second experiment was carried out in a large-scale consumer test setting to evaluate the performance and possible limitations of the LHS, as well as to find a way to avoid potential misuse of the scale. Specifically, (1) the effect of prior experience with the 9-point hedonic scale, and (2) the effectiveness of detailed verbal instructions were assessed in terms of how subjects used the scale.

Section snippets

Subjects

Forty-five subjects (34 F, 11 M) between 18 and 38 (mean = 23) years old were recruited on the Oregon State University Campus and were paid to participate. The experimental protocol was approved by the Oregon State University Institutional Review Board, and subjects gave written informed consent. All participants were non-smoking, native English speakers who reported that they were free from deficits in taste or smell and were not taking any prescription medications. None of the subjects had prior

Subjects

A total of 157 subjects (102 F, 55 M) between 18 and 50 (mean = 31) years old were recruited from the Oregon State University campus and the surrounding community and were paid to participate. The experimental protocol was approved by the Oregon State University Institutional Review Board, and subjects gave written informed consent.

Subjects were recruited based on their previous experience with food consumer tests in which the 9-point hedonic scale was used. Approximately half of the subjects (N = 81) were

General discussion

The results of the present study indicate that the LHS is a useful tool for assessing hedonic values for different stimulus systems ranging from chemical stimuli to commercial food products in various test settings. Although the original purpose of the LHS was to measure hedonic magnitudes of sensations in a broad context, which enables meaningful comparisons between individuals and groups, the ability of the scale to be used in diverse test settings is desirable. In the current study, we have

References (37)

  • B.G. Green et al.

    Evaluating the ‘Labeled Magnitude Scale’ for measuring sensations of taste and smell

    Chemical Senses

    (1996)
  • B.G. Green et al.

    Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties

    Chemical Senses

    (1993)
  • J.L. Greene et al.

    Effectiveness of category and line scales to characterize consumer perception of fruity fermented flavors in peanuts

    Journal of Sensory Studies

    (2006)
  • H.L. Hollingworth

    The central tendency of judgment

    Journal of Philosophy, Psychology, and Scientific Methods

    (1910)
  • L.V. Jones et al.

    Development of a scale for measuring soldiers’ food preferences

    Food Research

    (1955)
  • L.V. Jones et al.

    The psychophysics of semantics: An experimental investigation

    Journal of Applied Psychology

    (1955)
  • H.T. Lawless et al.

    The discriminative efficiency of common scaling methods

    Journal of Sensory Studies

    (1986)
  • H.T. Lawless et al.

    A comparison of rating scales: Sensitivity, replicates and relative measurement

    Journal of Sensory Studies

    (1986)