Thinking geographically about segregation: fitting a multilevel index of segregation to census data in R

The measurement of segregation has been debated in the social sciences for well over half a century. Concerns about segregation, and the potential for it to harm society, are prevalent within recent Government reports and proposals, occasionally generating lurid (and often misleading) headlines in the media. Understandably, policy makers and other interested parties would like to know how much segregation there is and whether it is increasing. However, their desire can be frustrated. The quest – sometimes taken by social scientists – for one perfect measure that could definitively answer those questions continues to elude researchers and always will. Whilst much academic ink has been spilt, for example, discussing the mathematical properties of different indices, there is more to the measurement than maths: different approaches reflect different conceptions of what segregation means as a process or as an outcome. Moreover, as new data, new computational tools and new thinking emerge, it is right to re-evaluate what is being measured, how, and why.
A forthcoming issue of the journal Environment and Planning B: Urban Analytics and City Science to be published later this year (volume 45, issue 6) notes that after decades in which studies of residential segregation have been dominated by the use of descriptive indices – such as those of dissimilarity and isolation – there has been a recent surge of interest in developing those measures to provide greater insights into the observed patterns and the processes that produce them. Part of that interest lies in multi-scale and multi-level measures, which were showcased at a special session at the recent ESRC Research Methods Festival (see https://www.ncrm.ac.uk/RMF2018/programme/session.php?id=E4). These enable the measurement of segregation simultaneously at multiple scales of analysis, aiming to generate insight into the different processes that create patterns of segregation at micro-, meso- and macro-scales.
One such method is the Multilevel Index of Dissimilarity (MLID), which builds on the commonly used Index of Dissimilarity but can capture both the numeric scale of segregation (the amount) and the spatial scale (the pattern), which the standard index cannot. The MLID aims to be as simple as possible, to be user-friendly and to run in open source software as a package in R, in recognition of the importance of reproducibility. Because it is simple, it is also fast. Whereas other approaches may take hours or days to run and/or are limited to small data sets and study regions, the MLID operates in the order of minutes on small area census data for the whole of England and Wales. Even if it is used as a precursor to more advanced approaches, it offers more ‘interactivity’ with the data at the early stages of analysis, helping to judge whether more complex approaches are warranted. A description of the MLID, its theoretical derivation, its interpretation and its implementation in R is available in the open access article at http://journals.sagepub.com/doi/abs/10.1177/2399808317748328.
To see its usefulness, consider the following example. Standard indices of segregation treat the four patterns shown in Figure 1 as the same – more accurately, they don’t consider the patterns at all: numerically, the amount of segregation is the same in each case. But clearly the patterns, and therefore the nature of the segregation, are not the same in all four cases. The MLID captures the differences – it is a measure of clustering as well as unevenness across the study region.

Figure 1. Standard indices of segregation cannot differentiate between these patterns but the multilevel index of dissimilarity (MLID) can.
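
To make the point concrete, here is a minimal sketch in Python (not the MLID package, which runs in R) of the standard Index of Dissimilarity, calculated as half the sum over neighbourhoods of the absolute difference between each neighbourhood's share of one group and its share of the other. The neighbourhood counts and the two arrangements are hypothetical illustrations rather than the patterns drawn in Figure 1, and the function name is an assumption made for this example.

```python
def dissimilarity_index(group_a, group_b):
    """Standard Index of Dissimilarity: 0.5 * sum(|a_i / A - b_i / B|)."""
    total_a = sum(group_a)
    total_b = sum(group_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts for 16 neighbourhoods on a 4 x 4 grid, listed row by row.
# Clustered: group A is concentrated in the left half of the grid.
clustered_a = [80, 80, 20, 20] * 4
clustered_b = [20, 20, 80, 80] * 4

# Chequerboard: the same neighbourhoods, rearranged so that they alternate.
chequer_a = [80, 20, 80, 20, 20, 80, 20, 80] * 2
chequer_b = [20, 80, 20, 80, 80, 20, 80, 20] * 2

d_clustered = dissimilarity_index(clustered_a, clustered_b)
d_chequer = dissimilarity_index(chequer_a, chequer_b)

# The index depends only on the counts, not on where the neighbourhoods are.
print(round(d_clustered, 3), round(d_chequer, 3))
```

Both arrangements give the same value (0.6 here) because the index sums over neighbourhoods without any reference to their location. That is the information about spatial scale that the MLID is designed to recover.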

Because it is able to differentiate between the numeric and geographic scales of segregation, the MLID can be used to quantify a measure of spatial diffusion, as in the example below. In fact, this is the kind of process that is occurring within the UK where the segregation of ‘minority’ groups is decreasing as they spread out from the areas in which previously they were concentrated into more mixed neighbourhoods.

Figure 2. The MLID can be used to measure a process of spatial diffusion.

To introduce potential users to the MLID, a tutorial is available at https://cran.r-project.org/web/packages/MLID/vignettes/MLID.html which also provides an overview of the MLID package in R. It is a case study using census data to consider patterns of ethnic segregation at a range of scales – for example, the differences between London and the rest of the country but also the internal heterogeneity of London. Further information (the slides from the Research Methods Festival) is available at https://www.dropbox.com/s/eizgtj41rcjg8ny/MLID.pptx?dl=0.
Better measurement, better models and better data are not a complete panacea for understanding how and why segregation is created or what its consequences are. The challenge of understanding how patterns relate to processes and to outcomes remains, but the hope is that a (relatively) easy to use, multiscale measure of segregation will help to enhance our understanding of segregation as a geographical outcome and as a geographical contributor to geographical processes that are therefore better measured geographically.

Richard Harris is Professor of Quantitative Social Geography at the School of Geographical Sciences at the University of Bristol and co-author (with Ron Johnston) of the forthcoming book, Ethnic Segregation between Schools: is it increasing or decreasing in England? (Bristol Press, 2019). The research was funded under the ESRC’s Urban Big Data Centre (http://ubdc.ac.uk/), ES/L011921/1.
