Sanjay G. Reddy*
*Department of Economics, The New School for Social Research, firstname.lastname@example.org
This special symposium, titled “Discrepancies,” explores a theme that is of great significance for those who produce and use data but is often overlooked. It provides a window on fundamental epistemic, methodological, and substantive questions of great relevance for those wishing to understand and to intervene in the social world. When data from different sources purporting to be about the same or a similar aspect of the world provide very different portraits, this necessarily raises basic questions. Do the discrepancies that are observed arise because the objects being studied are different, although not at first recognised as such? Do they arise because of distinct methods of inference employed in apprehending the “same” object (as with the blind men and the elephant)? Or do they arise because of different value schemes that enter implicitly into the description of the object (for instance, in determining how many hours a person must work to count as “employed”)?1
The urgency of the problem is made apparent by considering the requirements of comparison, for example over time or space, as well as of aggregation. Bringing like together with like is necessary for either of these operations. It is not very meaningful to say that Rashid is taller than Raji if the former’s measured height includes the platform he is standing on, and the latter’s does not. It is similarly misleading to report their total (or average) height if they are respectively measured in these different ways. To avoid discrepancies in reported data arising for such inappropriate reasons is a necessary condition for meaningful comparison, for instance over time or space, and aggregation, for instance over territory or populations. Reports on the situation in the country and on how it has changed over time will be undermined severely by a failure to ensure that like is being treated alike. Judgment is necessarily involved in determining what aspects of likeness or unlikeness “count” for the purpose of a specific exercise of description, but this is no embarrassment. It is self-evident, for instance, that the weights of Rashid and Raji are quite irrelevant to determining their relative heights.
Treating like objects alike is merely a necessary condition for adequacy of description. Like can be treated alike in terms of how it is described, without the uniform description being at all correct. It is desirable to describe correctly, both in order to understand the world and to provide a suitable guide to action. To seek this is to demand satisfactoriness of the external correspondence between the objects of our description of the world and the resulting descriptions themselves. While we cannot know definitively whether our descriptions are “correct,” we can at least know that we have made efforts to ensure that they are. Such efforts will, as a necessary aspect of credibility, require treating like objects in a like way, and unlike objects in an unlike way. A correct description would therefore suffice for describing like objects alike, although treating like objects alike is not in itself sufficient for correct description. Finding that seemingly like objects have not been treated alike is certainly a reason for perturbation for the analyst and the practitioner alike, but it is hardly uncommon.
What should one do when faced with discrepant data? Since reality is unknown except through our efforts to know it, using data of diverse and possibly discrepant sorts, it is necessary for us to “triangulate” between different sources of data and the understandings of reality that we have reason to hold, which may themselves reflect the diverse influences on our perception and judgment that we have experienced over time. We may resolve some discrepancies through a fuller evaluation of the methodological and evaluative choices made by each source, leading us to choose one of the sources or to recommend adjustments to one or both. Such an evaluation can be based on “internal” reasoning alone or on “external” considerations, involving other sources of knowledge about the same reality or our “prior” (in the Bayesian sense) judgments concerning the reality, which will have been shaped by diverse observations and experiences. The finding that there are discrepancies can provide diagnostic focus and motivational impetus to improve methods and enhance knowledge. At a minimum, it is an indication that there is reason to avoid false certainties and to avoid taking descriptions of social and economic reality as settled. We need not take the view that reality itself is “constructed” to recognise that our perceptions of reality are the products of our inferential and evaluative choices, and that this can, especially where those choices are unexamined, create troubles.
In recent years there have been a number of controversies centred on statistical discrepancies in the field of development:
This list can be very easily extended. One may even suggest that the presence of poor measurement (and discrepant measurement between different sources) is the rule. Nevertheless, the use of the lens of discrepancies as a diagnostic tool can be very helpful in drawing attention to possible specific reasons for mismeasurement.
Let us now turn to the contributions to this symposium.
In his paper on estimating the extent and distribution of agricultural land in India, Deepak Kumar shows that different sources give rise to quite widely varying estimates. This is partially attributable to differences in definitions, but it appears also to be due to there being distinct sources of underlying data and different methods of estimation. Kumar uses the data available to him from a comprehensive source (the PARI archive) to demonstrate that the NSS method of sampling of households in order to ask them more detailed questions on their land holdings tends to lead to a lower estimate of total land area (and of the share of land held by richer households) than is in fact the case, and that this accounts for some of the observed discrepancy. The reason is that the chance of richer households (who form a small share of the whole but possess a large share of total land) being picked in any given small sample is low. Although the sample mean will be less than the true mean in a great proportion of the cases, when averaged over a large number of such individually unrepresentative small samples it will still be close to the true mean (because when calculating the mean the disproportionately greater land size in those cases where richer households are in fact picked will compensate for the cases in which richer households are not picked). Could a mechanism of this type be more broadly present as a source of under-estimation of aggregate holdings and of inequalities by survey data of various kinds? Kumar’s paper shows that the analytical lens of “discrepancies” can be used in a fruitful way to identify some likely sources of mismeasurement, i.e., not merely of differences between the reports that different sources present of the reality they purport to measure but of differences between the characterisations presented by each source and the reality they aim to represent.
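The sampling mechanism described above can be illustrated with a small simulation. The sketch below is not drawn from Kumar's paper or from the PARI data: the population size, the share of rich households, the land sizes, and the sample size are all hypothetical values chosen only to make the skewness visible. It draws many small samples from a population in which a few households hold most of the land, and reports how often the sample mean falls below the true mean and what the sample means average to.

```python
import random
import statistics


def simulate(n_households=10_000, rich_share=0.02, rich_land=50.0,
             other_land=1.0, sample_size=20, n_samples=5_000, seed=0):
    """Draw many small samples from a skewed (hypothetical) land distribution.

    Returns (true_mean, share_of_samples_below_true_mean, mean_of_sample_means).
    All default values are illustrative assumptions, not survey figures.
    """
    rng = random.Random(seed)
    n_rich = round(n_households * rich_share)
    # A few rich households hold large plots; everyone else holds a small plot.
    population = [rich_land] * n_rich + [other_land] * (n_households - n_rich)
    true_mean = statistics.fmean(population)

    # Each small sample is drawn without replacement, as in a household survey.
    sample_means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    share_below = sum(m < true_mean for m in sample_means) / n_samples
    return true_mean, share_below, statistics.fmean(sample_means)


if __name__ == "__main__":
    true_mean, share_below, avg = simulate()
    print(f"true mean holding:              {true_mean:.2f}")
    print(f"share of samples below it:      {share_below:.2f}")
    print(f"average of all the sample means: {avg:.2f}")
```

Under these assumptions, any sample that happens to contain no rich household has a mean equal to the small holding size, which lies below the true mean, and such samples are the majority. The average across all samples nonetheless stays close to the true mean, because the occasional sample that does include a rich household overshoots by a wide margin.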
In their paper on agricultural wage data in India, Kurosaki and Usami show that different reporting centres and districts report wages for work that may differ according to the nature of the work done, seasonality and other factors. Moreover, observations from certain reporting centres and of certain kinds are more likely to be missing. As a result, comparing averages without taking note of these internal differences can be deeply misleading as to both the relative levels of wages in different places and their trend over time. The data from different centres can not only mean very different things but also have very different levels of reliability. The authors attempt to “compare like with like” where this was not previously done, by using a panel-data framework incorporating dummy variables for the different factors that may affect reported wages. They conclude that such adjustment is essential to developing a more adequate picture of agricultural wage levels in India.
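The logic of adjusting with dummy variables can be shown in a toy example. What follows is only a minimal sketch, not Kurosaki and Usami's specification: the two centres, the two operations, and all wage figures are hypothetical, and a simple pooled least-squares regression stands in for their fuller panel-data framework. Centre B pays more than centre A for the same work, but A mostly reports harvesting (which is better paid) and B mostly reports ploughing, so the raw averages point the wrong way.

```python
def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical reports: (centre, operation, daily wage).
observations = (
    [("A", "harvesting", 140.0)] * 3 + [("A", "ploughing", 100.0)]
    + [("B", "ploughing", 110.0)] * 3 + [("B", "harvesting", 150.0)]
)


def mean_wage(centre):
    wages = [w for c, _, w in observations if c == centre]
    return sum(wages) / len(wages)


# Raw comparison: ignores which operations each centre happened to report.
raw_gap = mean_wage("B") - mean_wage("A")

# Adjusted comparison: wage = const + b * is_B + c * is_harvesting.
X = [[1.0, float(c == "B"), float(op == "harvesting")]
     for c, op, _ in observations]
y = [w for _, _, w in observations]
const, centre_b_effect, harvesting_effect = ols(X, y)

if __name__ == "__main__":
    print(f"raw gap (B minus A):      {raw_gap:+.1f}")
    print(f"adjusted gap (B minus A): {centre_b_effect:+.1f}")
```

In this constructed case the raw gap suggests that centre A pays more, while the coefficient on the centre dummy, which compares the same operation across centres, shows that centre B does. The reversal comes entirely from the different mix of work reported by each centre.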
The paper by Jayan Jose Thomas and M. P. Jayesh shows that estimates of the number and proportion of persons in the labour force engaged in agriculture and in other occupations (such as construction) vary depending on whether the National Sample Survey or the Census of India is used as the source. The NSS shows a decrease in the agricultural labour force overall, while the Census shows an increase in marginal agricultural workers in particular. The two sources thus suggest very different characterisations of whether a “Lewisian” process of absorption of labour from agriculture is or is not occurring.
Finally, in his paper, Morten Jerven brings in a global perspective, and shows that different sources of national income estimates can give very different results. He demonstrates, using data from Sub-Saharan Africa in particular, that considerable obscurity accompanies the origins of much of the data actually reported, but that digging into the methods used to estimate missing values, choose base years, or reconcile sources reveals that these matter a great deal to what is eventually presented and accepted as authoritative by international institutions, both in relation to levels of income and growth rates. In some instances, there seems to be circularity, with institutions reporting each other as the source of data. He also suggests that there may be institutional and political reasons for the observed distortions to run in specific directions. Even historical data is subject to considerable uncertainty, as it can periodically be revised in accordance with present-day imperatives, although such revisions may not be announced. A prescription follows of higher levels of transparency in the generation and reporting of data, in order that users can be at least aware of the uncertainties that are present and of their possible sources. The lens of discrepancy provides an invitation to a deeper examination and to the more careful collection, collation, and use of national income statistics in the future.
Taken together, the essays in this symposium underline the need to beware (caveat utilitor) when using data, even of a seemingly elementary descriptive kind. The identification of discrepancies is the sort of discomfiting finding that, nevertheless, opens the way to greater awareness of and justification of choices and implicit values, leading, one can hope, to better understanding of what data does and doesn’t in fact tell us, and, in the longer term, to improvements in methods.
Keywords: Data, statistics, mismeasurement, missing data, statistical inference, political economy of statistics, National Accounts, economic growth, poverty, inequality, land.
1 On this issue, see Putnam (2002).
Putnam, Hilary (2002), The Collapse of the Fact-Value Dichotomy and Other Essays, Harvard University Press, Cambridge.