## Correction of correlation and covariance between simple concepts

Imagine that we would like to know the strength of the relationship between two opinions (the variables of interest), for example job satisfaction (f_{1}) and life satisfaction (f_{2}), represented by the correlation coefficient ρ(f_{1}, f_{2}). This coefficient cannot be obtained directly. One can only estimate the relationship ρ(y_{1}, y_{2}) between the observed variables, i.e. the responses to questions about job satisfaction (y_{1}) and life satisfaction (y_{2}). The relationships between f_{1} and y_{1} and between f_{2} and y_{2} will not be perfect because of the measurement errors (e_{1} and e_{2}). The standardized effect of a variable of interest f_{i} on its observed variable y_{i} is called the quality coefficient (q_{i}). This simple idea is presented in Figure 1. A more elaborate measurement model will be presented later.

The two correlations are equal only if the quality of both measures is perfect (equal to 1), i.e. there are no measurement errors. Unfortunately, this will never occur. What happens if the qualities of the measures differ from 1 is presented in the table below for a correlation ρ(f_{1}, f_{2}) of .9:

In the example we assume, for illustrative purposes, that the correlation between the two variables of interest is .9, so ρ(f_{1}, f_{2}) = .9. Whenever the quality of the two measures goes down, the observed correlation ρ(y_{1}, y_{2}) also goes down, and it does so faster than the quality coefficients themselves because it depends on their product. If the quality (q^{2}) of the measures is equal to .5, the average quality in survey research (Alwin 2007), then the quality coefficients q_{i} are about .7 and the expected correlation between the observed variables is only half the size of the correlation between the variables of interest (.45 instead of .9). If the quality coefficients go down to .6, this correlation is roughly a third of the true value (.32 instead of .9). The correlations between the variables of interest are therefore seriously underestimated if one does not take the measurement quality into account, i.e. does not correct for measurement errors.
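The attenuation described above can be reproduced with a few lines of Python. This is only an illustrative sketch: it assumes the relationship ρ(y_{1}, y_{2}) = q_{1} ρ(f_{1}, f_{2}) q_{2} implied by the text, gives both measures the same quality coefficient, and uses a function name (`observed_correlation`) that is purely hypothetical.

```python
# Sketch: expected observed correlation when both measures have the same
# quality coefficient q, assuming rho(y1, y2) = q1 * rho(f1, f2) * q2.

def observed_correlation(rho_true, q1, q2):
    """Return the correlation expected between the observed variables."""
    return q1 * rho_true * q2

rho_true = 0.9  # correlation between the variables of interest (from the text)

attenuation_table = []
for q in (1.0, 0.9, 0.8, 0.7, 0.6):
    rho_obs = observed_correlation(rho_true, q, q)
    attenuation_table.append((q, q * q, rho_obs))

# Print one row per quality level: q, q^2 and the expected observed correlation.
for q, q_squared, rho_obs in attenuation_table:
    print(f"q = {q:.1f}   q^2 = {q_squared:.2f}   rho(y1, y2) = {rho_obs:.3f}")
```

With q = .7 the observed correlation comes out at about .44, roughly half of .9, and with q = .6 at about .32, roughly a third, in line with the figures quoted in the text.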

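Fortunately, correcting for measurement errors is not difficult. A simple procedure is to correct the correlation matrix or covariance matrix for measurement errors. After that, one can analyse the data as if there were no measurement errors. This approach follows directly from equation (1), which expresses the observed correlation as ρ(y_{1}, y_{2}) = q_{1} ρ(f_{1}, f_{2}) q_{2}. Solving for the correlation between the variables of interest gives: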

ρ(f_{1}, f_{2}) = ρ(y_{1}, y_{2}) / (q_{1} q_{2}) (2)

Equation (2) shows that the correlation between the variables of interest is equal to the correlation between the observed variables divided by the product of the quality coefficients of the measures used. Correcting the observed correlation for measurement error is therefore very simple if the qualities of the observed variables are known. This result holds for single questions as well as for composite scores and has been known in psychology for a long time (Lord and Novick 1968), but the approach is hardly used.
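As a minimal sketch, the correction in equation (2) can be written as a small Python function. The function name and the numerical values are hypothetical and serve only as illustration; in practice the quality coefficients would have to come from an external source.

```python
# Sketch of equation (2): rho(f1, f2) = rho(y1, y2) / (q1 * q2).

def correct_correlation(rho_observed, q1, q2):
    """Correct an observed correlation for measurement error."""
    if not (0 < q1 <= 1 and 0 < q2 <= 1):
        raise ValueError("quality coefficients must lie in the interval (0, 1]")
    return rho_observed / (q1 * q2)

# Hypothetical example: observed correlation .30, quality coefficients .7 and .8.
rho_corrected = correct_correlation(0.30, 0.7, 0.8)
print(f"corrected correlation: {rho_corrected:.3f}")
```

Because the quality coefficients are themselves estimates, a corrected value can occasionally exceed 1 in absolute size; the sketch does not guard against that.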

The problem is, of course, that the quality of the questions cannot be obtained if one has only one measure for each of the variables of interest. The solution suggested in the past was to use at least two indicators for each of the variables of interest. However, in that case one has to ask at least twice as many questions as there are latent variables of interest.

A simpler solution is to use the Survey Quality Predictor (SQP) to obtain a prediction of the quality of the measures. With these estimates one has sufficient information to estimate the correlation between the latent variables of interest using equation (2).

## Correcting for Random and Systematic Error

In the section about the quality of survey questions we suggested that there are not only random errors but also systematic errors connected with the method used. Therefore, we have to specify a slightly more complex model that also introduces method effects. This model, together with its parameters, is presented in Figure 2.

As we have mentioned above, it can be shown that the correlation between the observed variables ρ(y_{1j}, y_{2j}) is equal to the joint effect of the variables we want to measure (f_{1} and f_{2}) plus the spurious correlation due to the method factor, as equation (3) demonstrates:

ρ(y_{1j}, y_{2j}) = r_{1j}v_{1j}ρ(f_{1}, f_{2})v_{2j}r_{2j} + r_{1j}m_{1j}m_{2j}r_{2j} (3)

Note that r_{ij} and v_{ij}, which are always smaller than 1, decrease the correlation (see the first term), while the method effects, if they are not zero, can increase the correlation (see the second term). It is therefore possible that an observed correlation is lower than the correlation between the variables of interest because of low reliabilities. However, it is also possible that the correlation between the two observed variables is higher than the correlation between the latent variables of interest because of large systematic effects of the method used.
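Both possibilities can be illustrated numerically with equation (3). The sketch below is not taken from the source: the function name and all coefficient values are hypothetical, chosen only to show one case in each direction.

```python
# Sketch of equation (3):
# rho(y1j, y2j) = r1*v1*rho(f1, f2)*v2*r2 + r1*m1*m2*r2

def observed_correlation_with_method(rho_true, r1, v1, m1, r2, v2, m2):
    """Return the expected observed correlation under equation (3)."""
    substantive_term = r1 * v1 * rho_true * v2 * r2  # attenuated true correlation
    method_term = r1 * m1 * m2 * r2                  # spurious method correlation
    return substantive_term + method_term

# Hypothetical case A: low reliability, no method effect -> underestimation.
case_a = observed_correlation_with_method(0.5, r1=0.7, v1=1.0, m1=0.0,
                                          r2=0.7, v2=1.0, m2=0.0)

# Hypothetical case B: high reliability, strong method effect -> overestimation.
case_b = observed_correlation_with_method(0.2, r1=0.9, v1=0.8, m1=0.6,
                                          r2=0.9, v2=0.8, m2=0.6)

print(f"case A: true .50, observed {case_a:.3f}")
print(f"case B: true .20, observed {case_b:.3f}")
```

In case A the observed correlation falls below the true value of .5, whereas in case B the method term pushes the observed correlation above the true value of .2.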

It follows directly from equation (3) that we can estimate the correlation between the latent variables if we know the coefficients that appear in that equation. This is shown in equation (4), which is derived from equation (3):

ρ(f_{1}, f_{2}) = (ρ(y_{1j}, y_{2j}) - r_{1j}m_{1j}m_{2j}r_{2j}) / (r_{1j}v_{1j}v_{2j}r_{2j}) (4)

We see that all quantities on the right-hand side of the equation are available: the correlation ρ(y_{1j}, y_{2j}) can be obtained from the research data, and the coefficients can be predicted by SQP.
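A short round-trip check can make equation (4) concrete. In the sketch below an observed correlation is first constructed from a known latent correlation via equation (3) and then recovered with equation (4). The function name and all numbers are hypothetical; in real applications the coefficients would be predictions rather than known values.

```python
# Sketch of equation (4):
# rho(f1, f2) = (rho(y1j, y2j) - r1*m1*m2*r2) / (r1*v1*v2*r2)

def correct_with_method_effects(rho_observed, r1, v1, m1, r2, v2, m2):
    """Correct an observed correlation for random and systematic error."""
    denominator = r1 * v1 * v2 * r2
    if denominator == 0:
        raise ValueError("reliability and validity coefficients must be non-zero")
    method_term = r1 * m1 * m2 * r2
    return (rho_observed - method_term) / denominator

# Hypothetical coefficients for two questions asked with the same method.
r1, v1, m1 = 0.90, 0.95, 0.30
r2, v2, m2 = 0.85, 0.90, 0.40
rho_true = 0.5

# Build the observed correlation with equation (3) ...
rho_observed = r1 * v1 * rho_true * v2 * r2 + r1 * m1 * m2 * r2
# ... and recover the latent correlation with equation (4).
rho_recovered = correct_with_method_effects(rho_observed, r1, v1, m1, r2, v2, m2)

print(f"observed: {rho_observed:.3f}, recovered: {rho_recovered:.3f}")
```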

This result suggests that no complex research with many different measures for each latent variable is needed. SQP can predict the values of the coefficients q, v and m, and with this information the observed correlations can be corrected for measurement errors to obtain an estimate of the correlation between the latent variables of interest. This procedure can be applied to all correlations in a correlation matrix. The corrected correlation matrix can then be analysed as if there were no measurement errors, for example to estimate a proposed structural equation model.
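Applied to a whole correlation matrix, the procedure might look like the following sketch. It assumes that all questions were asked with the same method, so that the method term of equation (3) applies to every pair; for pairs measured with different methods that term would have to be dropped. The function name, the matrix and the coefficient values are hypothetical.

```python
# Sketch: correct every off-diagonal element of a correlation matrix with
# equation (4), assuming one shared method for all questions.

def correct_correlation_matrix(observed, r, v, m):
    """Return a corrected copy of `observed` (a list of lists)."""
    n = len(observed)
    corrected = [[1.0] * n for _ in range(n)]  # the diagonal stays at 1
    for i in range(n):
        for k in range(n):
            if i == k:
                continue
            method_term = r[i] * m[i] * m[k] * r[k]
            denominator = r[i] * v[i] * v[k] * r[k]
            corrected[i][k] = (observed[i][k] - method_term) / denominator
    return corrected

# Hypothetical observed correlations and predicted coefficients for 3 questions.
observed = [
    [1.00, 0.40, 0.30],
    [0.40, 1.00, 0.35],
    [0.30, 0.35, 1.00],
]
r = [0.90, 0.85, 0.80]  # reliability coefficients
v = [0.95, 0.90, 0.90]  # validity coefficients
m = [0.30, 0.40, 0.40]  # method effects

corrected = correct_correlation_matrix(observed, r, v, m)
for row in corrected:
    print("  ".join(f"{value:.3f}" for value in row))
```

The corrected matrix stays symmetric because the correction for the pair (i, k) uses the same product of coefficients as the correction for the pair (k, i).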

Note that this formula can be simplified by defining q_{ij} = r_{ij} v_{ij} and ɱ_{ij} = r_{ij} m_{ij}.

For this model we get

ρ(y_{1j},y_{2j}) = q_{1j} ρ(f_{1},f_{2}) q_{2j} + ɱ_{1j} ɱ_{2j}

and it follows that

ρ(f_{1}, f_{2}) = (ρ(y_{1j}, y_{2j}) - ɱ_{1j} ɱ_{2j}) / (q_{1j} q_{2j})
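For readers who want to see the substitution written out, the steps from equation (3) to the simplified form can be sketched as follows. The symbol \tilde{m}_{ij} is used here only as a LaTeX stand-in for ɱ_{ij}; it is a notational assumption, not the source's own notation.

```latex
\begin{align*}
\rho(y_{1j}, y_{2j})
  &= r_{1j} v_{1j}\,\rho(f_{1}, f_{2})\,v_{2j} r_{2j} + r_{1j} m_{1j} m_{2j} r_{2j} \\
  &= (r_{1j} v_{1j})\,\rho(f_{1}, f_{2})\,(r_{2j} v_{2j}) + (r_{1j} m_{1j})(r_{2j} m_{2j}) \\
  &= q_{1j}\,\rho(f_{1}, f_{2})\,q_{2j} + \tilde{m}_{1j}\,\tilde{m}_{2j} \\[4pt]
\rho(f_{1}, f_{2})
  &= \frac{\rho(y_{1j}, y_{2j}) - \tilde{m}_{1j}\,\tilde{m}_{2j}}{q_{1j}\,q_{2j}}
\end{align*}
```

When the method effects are zero, the second term vanishes and the expression reduces to equation (2).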

This approach of estimating a causal model after correcting the correlation or covariance matrix for measurement errors is illustrated in detail in the European Social Survey Edunet:

De Castellarnau, A. and W.E. Saris (2016). A simple procedure to correct for measurement error in survey research. Edunet, ESS, chapters 4-7. http://essedunet.nsd.uib.no/cms/topics/measurement/4/