## Response Scales' Length

**The minimum and maximum possible values** are used to evaluate the length of continuous scales.

**The number of categories** is used to evaluate the length of categorical scales.
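The two definitions above can be sketched as a minimal Python example; the function names are illustrative assumptions, not terminology from the source:

```python
# Illustrative sketch of the two definitions of scale length above.
# Function names are assumptions for this example, not from the source.

def continuous_scale_length(minimum: float, maximum: float) -> float:
    """Continuous scales: length is evaluated from the minimum and maximum possible values."""
    return maximum - minimum

def categorical_scale_length(categories: list) -> int:
    """Categorical scales: length is the number of response categories offered."""
    return len(categories)

# A 0-10 continuous scale and a 5-point agree-disagree scale.
print(continuous_scale_length(0, 10))  # 10
print(categorical_scale_length(
    ["disagree strongly", "disagree", "neither", "agree", "agree strongly"]))  # 5
```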

**Theoretical arguments**

- The optimal number of points in a scale should be considered in relation to the scale's polarity (Alwin 2007).
- There is no single number of response alternatives that is appropriate under all circumstances (Cox III 1980).
- Choosing the optimal length is a complex decision: too few categories may compromise the information gathered, while too many compromise the clarity of meaning (Krosnick and Fabrigar 1997).
- The optimal length of continuous scales depends on the size of the device screen (Reips and Funke 2008).
- More categories can compromise discrimination, since respondents' capacity to make finer distinctions between the options is limited (Schaeffer and Presser 2003).

**Empirical evidence on data quality**

- Reliability remained constant despite changing the number of categories [Internal consistency reliability] (Aiken 1983) → NO
- 11p scales are more reliable than 7p scales [True-score MTMM reliability] (Alwin 1997) → YES
- The use of 4p scales improves reliability in unipolar scales, while reliability in bipolar scales is higher for 2, 3 and 5p and lowest for 7p [Wiley-Wiley reliability] (Alwin 2007) → YES
- There is no difference between agree-disagree (AD) scales with 2 and 5p; item-specific (IS) reliability increases from 3 to 9p, but there is no difference between 7 and 9p [Proportion of variance attributed to true attitudes] (Alwin and Krosnick 1991) → YES
- The number of categories has the biggest effect on data quality: more categories are better, although 3p is worse than 2p [MTMM validity, method effect and residual error] (Andrews 1984) → YES
- Reliability is independent of the number of scale categories [Test reliability] (Bendig 1954) → NO
- Reliability and validity are independent of the number of points [Test-retest reliability, concurrent validity and predictive validity] (Jacoby and Matell 1971) → NO
- Reliability increases with the number of points up to 6p [Cronbach's alpha] (Komorita and Graham 1965) → YES
- Validity is higher with 7p and 11p than with 2p [Concurrent validity] (Lundmark et al. 2016) → YES
- Reliability is independent of the number of points [Internal consistency and test-retest reliability] (Matell and Jacoby 1971) → NO
- Validity is slightly better with 7p than with 11p; reliability is unaffected by scale length [Test-retest reliability and test validity] (McKelvie 1978) → NO
- Reliability is lower for 2, 3 and 4p, higher for 7, 8, 9 and 10p, and decreases with more than 10p [Test-retest reliability] (Preston and Colman 2000) → YES
- 11p positively affects the quality of IS scales [True-score MTMM reliability and validity] (Revilla and Ochoa 2015) → YES
- Quality does not improve beyond 5p for AD scales [True-score MTMM reliability and validity] (Revilla et al. 2014) → YES
- The number of points has the biggest effect on validity; using at least 5 to 7p gives better quality [MTMM construct validity] (Rodgers et al. 1992) → YES
- Reliability can be improved by using more categories (11p) without decreasing validity [True-score MTMM reliability and validity] (Saris and Gallhofer 2007) → YES
- The maximum value of a continuous scale has a significant effect on reliability and validity [True-score MTMM reliability and validity] (Saris and Gallhofer 2007) → YES
- Validity is highest with 4, 5 or 7p [True-score MTMM validity] (Scherpenzeel and Saris 1997) → YES
- Using 5 points in AD scales reduces extreme response style [Extreme response style through log odds] (Weijters et al. 2010) → YES
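Several of the studies above measure reliability as internal consistency, e.g. Cronbach's alpha (Komorita and Graham 1965). A minimal pure-Python sketch of that criterion, using toy data invented for this example rather than data from any cited study:

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondent rows, each a list of item scores."""
    k = len(items[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*items)]  # per-item sample variances
    total_var = variance([sum(row) for row in items])   # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Toy data: 6 respondents rating 3 items on a 5-point scale (illustrative only).
scores = [
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [1, 2, 1],
    [3, 3, 4],
    [5, 5, 4],
]
print(round(cronbach_alpha(scores), 2))  # 0.94
```

Comparing alpha across versions of a scale administered with different numbers of points is the kind of analysis behind the YES/NO verdicts above, though the cited studies also use test-retest and MTMM designs.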


*Source: DeCastellarnau, A. (2018). Qual. Quant. 52, 1523. DOI: 10.1007/s11135-017-0533-4*

**References**

Aiken, L.R. (1983). Number of response categories and statistics on a teacher rating scale. *Educ. Psychol. Meas. 43*, 397–401. DOI: 10.1177/001316448304300209

Alwin, D.F. (1997). Feeling thermometers versus 7-point scales: which are better? *Sociol. Methods Res. 25*, 318–340. DOI: 10.1177/0049124197025003003

Alwin, D.F. (2007). Margins of Error: A Study of Reliability in Survey Measurement. Wiley, Hoboken.

Alwin, D.F., Krosnick, J.A. (1991). The reliability of survey attitude measurement: the influence of question and respondent attributes. *Sociol. Methods Res. 20*, 139–181. DOI: 10.1177/0049124191020001005

Andrews, F.M. (1984). Construct validity and error components of survey measures: a structural modelling approach. *Public Opin. Q. 48*, 409–442. DOI: 10.1086/268840

Bendig, A.W. (1954). Reliability and the number of rating-scale categories. *J. Appl. Psychol. 38*, 38–40. DOI: 10.1037/h0055647

Cox III, E.P. (1980). The optimal number of response alternatives for a scale. *J. Mark. Res. 17*, 407–422. DOI: 10.2307/3150495

Jacoby, J., Matell, M.S. (1971). Three-point Likert scales are good enough. *J. Mark. Res. 8*, 495–500. DOI: 10.2307/3150242

Komorita, S.S., Graham, W.K. (1965). Number of scale points and the reliability of scales. *Educ. Psychol. Meas. 25*, 987–995. DOI: 10.1177/001316446502500404

Krosnick, J.A., Fabrigar, L.R. (1997). Designing rating scales for effective measurement in surveys. In: Lyberg, L.E., Biemer, P.P., Collins, M., De Leeuw, E.D., Dippo, C., Schwarz, N., Trewin, D. (eds.) *Survey Measurement and Process Quality*, pp. 141–164. Wiley, Hoboken.

Lundmark, S., Gilljam, M., Dahlberg, S. (2016). Measuring generalized trust: an examination of question wording and the number of scale points. *Public Opin. Q. 80*, 26–43. DOI: 10.1093/poq/nfv042

Matell, M.S., Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Study I: reliability and validity. *Educ. Psychol. Meas. 31*, 657–674. DOI: 10.1177/001316447103100307

McKelvie, S.J. (1978). Graphic rating scales: how many categories? *Br. J. Psychol. 69*, 185–202. DOI: 10.1111/j.2044-8295.1978.tb01647.x

Preston, C.C., Colman, A.M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. *Acta Psychol. (Amst.) 104*, 1–15. DOI: 10.1016/S0001-6918(99)00050-5

Reips, U.-D., Funke, F. (2008). Interval-level measurement with visual analogue scales in Internet-based research: VAS Generator. *Behav. Res. Methods 40*, 699–704. DOI: 10.3758/BRM.40.3.699

Revilla, M., Ochoa, C. (2015). Quality of different scales in an online survey in Mexico and Colombia. *J. Polit. Lat. Am. 7*, 157–177.

Revilla, M., Saris, W.E., Krosnick, J.A. (2014). Choosing the number of categories in agree-disagree scales. *Sociol. Methods Res. 43*, 73–97. DOI: 10.1177/0049124113509605

Rodgers, W.L., Andrews, F.M., Herzog, A.R. (1992). Quality of survey measures: a structural modeling approach. *J. Off. Stat. 8*, 251–275.

Saris, W.E., Gallhofer, I.N. (2007). Design, Evaluation, and Analysis of Questionnaires for Survey Research. Wiley, Hoboken.

Schaeffer, N.C., Presser, S. (2003). The science of asking questions. *Annu. Rev. Sociol. 29*, 65–88. DOI: 10.1146/annurev.soc.29.110702.110112

Scherpenzeel, A.C., Saris, W.E. (1997). The validity and reliability of survey questions: a meta-analysis of MTMM studies. *Sociol. Methods Res. 25*, 341–383.

Weijters, B., Cabooter, E., Schillewaert, N. (2010). The effect of rating scale format on response styles: the number of response categories and response category labels. *Int. J. Res. Mark. 27*, 236–247. DOI: 10.1016/j.ijresmar.2010.02.004