Types of Response Scales
An absolute open-ended quantifier is a numerical text-input scale used to ask respondents for an open numerical answer.
A relative open-ended quantifier is a similar numerical text-input scale, which additionally requires specifying the meaning of a standard value in advance.
A relative metric scale likewise requires the specification of a standard against which relative evaluations are given.
An absolute metric scale asks respondents to select a point on a continuum.
Dichotomous scales provide only two substantive response options; typical dichotomous scales are yes–no and true–false scales.
Rating scales provide three or more categorical options.
Closed quantifiers are mainly used for objective variables such as the frequency of activities; when their response alternatives are omitted, such scales become open-ended quantifiers.
Branching scales are used to simplify the respondents’ task when answering long bipolar scales. A configuration sketch of these formats follows below.
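To make these definitions concrete, here is a minimal sketch of how the formats might be modelled as question configurations, including the two-step logic of a branching scale. All class, field, and option names are hypothetical illustrations, not part of any cited instrument.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Scale:
    kind: str                              # e.g. "dichotomous", "rating", "open_quantifier"
    options: Optional[List[str]] = None    # categorical response options, if any
    standard: Optional[str] = None         # reference standard for relative formats

# Dichotomous scale: exactly two substantive options.
yes_no = Scale(kind="dichotomous", options=["yes", "no"])

# Rating scale: three or more categorical options.
agree_5p = Scale(kind="rating", options=[
    "strongly disagree", "disagree", "neither", "agree", "strongly agree"])

# Absolute open-ended quantifier: free numerical entry, no preset options.
hours_tv = Scale(kind="open_quantifier")

# Relative open-ended quantifier: a standard value must be defined first.
rel_quant = Scale(kind="relative_open_quantifier",
                  standard="rate a typical day as 100")

def branch(direction: str) -> Scale:
    """Branching: first ask direction (e.g. agree vs. disagree), then ask
    extremity with a short rating scale instead of one long bipolar scale."""
    return Scale(kind="rating", options=["somewhat " + direction,
                                         "mostly " + direction,
                                         "extremely " + direction])

follow_up = branch("agree")   # shown only to respondents who chose "agree"
```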
Theoretical arguments
- Metric scales are comparable to categorical scales; what matters most is not the type of scale but the conditions under which it is administered (Hjermstad et al. 2011).*
- Dichotomous scales are clearer in meaning and require less interpretive effort than rating scales, whose ambiguity can harm consistency (Krosnick et al. 2005).*
- Relative open-ended scales (or magnitude scaling) are a difficult method to administer and reveal only ratios among stimuli, not absolute judgments (Krosnick and Fabrigar 1997).*
- Respondents are more likely to provide rounded answers in 101-point metric scales, as an easy way out (Liu and Conrad 2016).*
- The labels provided in closed-range quantifiers can influence the results if they do not represent the population distribution (Revilla 2015).*
- Line production (or relative metric) scales are better than relative open-ended quantifiers because rounding is avoided (Saris and Gallhofer 2014).*
- Magnitude estimates (or relative open-ended quantifiers) pose problems in choosing an appropriate standard and in recoding answers into categorical distinctions (Schaeffer and Bradburn 1989).*
- Branching has the advantage of offering a large number of categories without presenting them all visually (Schaeffer and Presser 2003).*
- Closed-range quantifiers inform respondents about the researcher’s expectations and, compared to absolute open-ended formats, add systematic bias to respondents’ reports and related judgments (Schwarz et al. 1985).*
- Open quantifiers should be preferred over closed quantifiers for numerical answers, to avoid misleading the respondent (Sudman and Bradburn 1983).*
- Rounded answers in open-ended quantifiers may signal unwillingness to come up with a more exact answer and can introduce systematic bias in continuous scales (Tourangeau et al. 2000)* (a sketch of flagging such answers follows this list).

*DeCastellarnau, A. Qual Quant (2018) 52: 1523. DOI: 10.1007/s11135-017-0533-4
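Two of these arguments concern rounded answers as a satisficing signal. As a minimal sketch, one might flag round values in open-ended numerical responses as follows; the multiples-of-five-or-ten heuristic is an assumption for illustration, not a rule taken from the cited studies.

```python
def is_round(value: float, bases=(5, 10, 100)) -> bool:
    """Flag an answer as 'round' if it is a nonzero multiple of a common base.
    The choice of bases is a heuristic assumption for illustration."""
    return value != 0 and any(value % b == 0 for b in bases)

# Hypothetical open-ended quantifier responses (e.g. hours per month).
responses = [12, 50, 37, 100, 45, 3]
share_round = sum(is_round(r) for r in responses) / len(responses)
print(f"Share of round answers: {share_round:.0%}")  # prints: 50%
```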
Empirical evidence on data quality
- Numerical open-ended scales are as accurate as vague closed-quantifier options [Rank-order correlations and regression slopes] (Al Baghal 2014) → NO*
- Rating scales have higher reliabilities than dichotomous scales but are comparable to metric scales [Wiley-Wiley reliability] (Alwin 2007) → YES*
- Metric scales are less reliable than radio-button rating scales [Score reliability] (Cook et al. 2001) → YES*
- Metric scales suffer more missing data than categorical scales or open-ended quantifiers [Item-nonresponse] (Couper et al. 2006) → YES*
- Metric scales are comparable to 5-point scales on item-nonresponse [Item-nonresponse] (Funke and Reips 2012) → NO*
- Absolute open-ended scales are comparable to rating scales on reliability [Cramer’s V reliability] (Koskey et al. 2013) → NO*
- Metric scales have lower reliability than rating scales; dichotomous scales also yield lower reliabilities; branching provides higher reliabilities than rating scales [Pearson product-moment test-retest correlations] (Krosnick 1991) → YES*
- Branching improves reliability compared to no branching (rating scale) [Item reliability] (Krosnick and Berent 1993) → YES*
- Non-significant differences in item-nonresponse between absolute open-ended, rating, and metric scales [Item-nonresponse] (Liu and Conrad 2016) → NO*
- Dichotomous scales are less valid than rating scales [Concurrent validity] (Lundmark et al. 2016) → YES*
- There is no difference in reliability or validity between metric and rating scales [Test-retest reliability and test validity] (McKelvie 1978) → NO*
- Magnitude scaling is less reliable than rating scales [Test-retest reliability] (Miethe 1985) → YES*
- 2-point scales are less reliable and valid than longer rating scales [Test-retest reliability, Cronbach’s alpha and criterion validity] (Preston and Colman 2000) → YES* (two of these criteria are sketched in code after this list)
- Open-ended quantifiers and metric scales have significantly higher reliability but lower validity than rating scales [True-score MTMM reliability and validity] (Saris and Gallhofer 2007) → YES*
*DeCastellarnau, A. Qual Quant (2018) 52: 1523. DOI: 10.1007/s11135-017-0533-4
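Several of the bracketed quality criteria above are simple statistics. As an illustration only, the sketch below computes two of them, test-retest reliability (a Pearson correlation between two waves) and Cronbach’s alpha, on made-up data; it does not reproduce any cited study’s analysis.

```python
import statistics

def pearson(x, y):
    """Test-retest reliability: Pearson correlation between wave 1 and wave 2."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns of equal length."""
    k = len(items)
    item_vars = sum(statistics.pvariance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum scores
    return k / (k - 1) * (1 - item_vars / statistics.pvariance(totals))

# Made-up example: 5 respondents, 3 items, plus two waves of the first item.
wave1 = [4, 2, 5, 3, 4]
wave2 = [4, 3, 5, 3, 5]
items = [[4, 2, 5, 3, 4], [3, 2, 4, 3, 5], [4, 1, 5, 2, 4]]

print(f"test-retest r = {pearson(wave1, wave2):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```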
References
Al Baghal, T. (2014). Numeric estimation and response options: an examination of the accuracy of numeric and vague quantifier responses. J. Methods Meas. Soc. Sci. 6, 58–75. DOI: 10.2458/azu_jmmss.v5i2.18476
Alwin, D.F. (2007). Margins of Error: A Study of Reliability in Survey Measurement. Wiley, Hoboken.
Cook, C., Heath, F., Thompson, R.L., Thompson, B. (2001). Score reliability in web- or internet-based surveys: unnumbered graphic rating scales versus Likert-type scales. Educ. Psychol. Meas. 61, 697–706. DOI: 10.1177/00131640121971356
Couper, M.P., Tourangeau, R., Conrad, F.G., Singer, E. (2006). Evaluating the effectiveness of visual analog scales: a web experiment. Soc. Sci. Comput. Rev. 24, 227–245. DOI: 10.1177/0894439305281503
Funke, F., Reips, U.-D. (2012). Why semantic differentials in web-based research should be made from visual analogue scales and not from 5-point scales. Field Methods 24, 310–327. DOI: 10.1177/1525822X12444061
Hjermstad, M.J., Fayers, P.M., Haugen, D.F., Caraceni, A., Hanks, G.W., Loge, J.H., Fainsinger, R., Aass, N., Kaasa, S. (2011). Studies comparing numerical rating scales, verbal rating scales, and visual analogue scales for assessment of pain intensity in adults: a systematic literature review. J. Pain Symptom Manag. 41, 1073–1093. DOI: 10.1016/j.jpainsymman.2010.08.016
Koskey, K.L.K., Sondergeld, T.A., Beltyukova, S.A., Fox, C.M. (2013). An experimental study using Rasch analysis to compare absolute magnitude estimation and categorical rating scales as applied in survey research. J. Appl. Meas. 14, 1–21.
Krosnick, J.A. (1991). The stability of political preferences: comparisons of symbolic and nonsymbolic attitudes. Am. J. Pol. Sci. 35, 547–576. DOI: 10.2307/2111553
Krosnick, J.A., Berent, M.K. (1993). Comparisons of party identifications and policy preferences: the impact of survey question format. Am. J. Pol. Sci. 37, 941–964. DOI: 10.2307/2111580
Krosnick, J.A., Judd, C.M., Wittenbrink, B. (2005). The measurement of attitudes. In: Albarracin, D., Johnson, B.T., Zanna, M.P. (eds.) The Handbook of Attitudes, pp. 21–78. Lawrence Erlbaum, Mahwah.
Krosnick, J.A., Fabrigar, L.R. (1997). Designing rating scales for effective measurement in surveys. In: Lyberg, L.E., Biemer, P.P., Collins, M., De Leeuw, E.D., Dippo, C., Schwarz, N., Trewin, D. (eds.) Survey Measurement and Process Quality, pp. 141–164. Wiley, Hoboken.
Liu, M., Conrad, F.G. (2016). An experiment testing six formats of 101-point rating scales. Comput. Hum. Behav. 55, 364–371. DOI: 10.1016/j.chb.2015.09.036
Lundmark, S., Gilljam, M., Dahlberg, S. (2016). Measuring generalized trust: an examination of question wording and the number of scale points. Public Opin. Q. 80, 26–43. DOI: 10.1093/poq/nfv042
McKelvie, S.J. (1978). Graphic rating scales—How many categories? Br. J. Psychol. 69, 185–202. DOI: 10.1111/j.2044-8295.1978.tb01647.x
Miethe, T.D. (1985). The validity and reliability of value measurements. J. Psychol. 119, 441–453. DOI: 10.1080/00223980.1985.10542914
Preston, C.C., Colman, A.M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychol. (Amst.) 104, 1–15. DOI: 10.1016/S0001-6918(99)00050-5
Revilla, M. (2015). Effect of using different labels for the scales in a web survey. Int. J. Mark. Res. 57, 225–238. DOI: 10.2501/IJMR-2014-028
Saris, W.E., Gallhofer, I.N. (2007). Design, Evaluation, and Analysis of Questionnaires for Survey Research. Wiley, Hoboken.
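Saris, W.E., Gallhofer, I.N. (2014). Design, Evaluation, and Analysis of Questionnaires for Survey Research, 2nd edn. Wiley, Hoboken.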
Schaeffer, N.C., Bradburn, N.M. (1989). Respondent behavior in magnitude estimation. J. Am. Stat. Assoc. 84, 402–413. DOI: 10.2307/2289923
Schaeffer, N.C., Presser, S. (2003). The science of asking questions. Annu. Rev. Sociol. 29, 65–88. DOI: 10.1146/annurev.soc.29.110702.110112
Schwarz, N., Hippler, H.-J., Deutsch, B., Strack, F. (1985). Response scales: effects of category range on reported behavior and comparative judgments. Public Opin. Q. 49, 388–395. DOI: 10.1086/268936
Sudman, S., Bradburn, N.M. (1983). Asking Questions: A Practical Guide to Questionnaire Design. Jossey Bass, San Francisco.
Tourangeau, R., Rips, L.J., Rasinski, K. (2000). The Psychology of Survey Response. Cambridge University Press, Cambridge.