The code underlying the Social Metrics Commission's new poverty measure

Emily Harris and Matthew Oakley from the Social Metrics Commission’s Secretariat introduce the release of the code underlying the Commission’s new poverty measure.

Who is the Social Metrics Commission?

The Social Metrics Commission is an independent commission founded in 2016 to develop a new approach to measuring poverty in the UK. Led by the CEO of the Legatum Institute, Baroness Stroud, the Commission’s membership draws on a group of top UK poverty thinkers from different political and professional backgrounds. Currently there is no agreed UK Government measure of poverty and the Commission’s goal is to provide a new consensus around poverty measurement that enables action and informs policy making to improve the lives of people in poverty.

Our landmark report in 2018 proposed a new measure of poverty for the UK and, since then, we have received support from across the political spectrum. The Government has now committed to developing experimental national statistics based on the Commission’s approach.

Aside from analysts in government, we want our measure to be used far and wide, from researchers to charities, media, and policymakers. That is why we have made the code that underpins the creation of the poverty metric freely available to download from our website.

What has the Commission produced?

The Commission’s measure of poverty goes beyond conventional metrics that look only at incomes by also accounting for the positive impact of people’s liquid assets (such as savings, stocks, and shares) on alleviating immediate poverty and the range of inescapable costs that reduce people’s spending power. These inescapable costs include rent or mortgage payments, childcare, and the extra costs of disability.

As well as reporting on the number of families in poverty, our approach also seeks to understand more about the nature of that poverty and families’ experiences. It provides measures of the depth and persistence of poverty, as well as including a range of Lived Experience Indicators that capture issues such as mental and physical health, employment, isolation, and community engagement.

By taking all of these things into account, we believe the new metric reflects the realities and experiences of those living in poverty more accurately than previous measures. If taken up as an agreed measure, it would allow the Government to take meaningful steps to reduce poverty, to improve outcomes for those who do experience it, and to track the success of these policies.

How can you use the Commission’s work?

The code underlying the Commission's measure of poverty, along with detailed user guides, can be downloaded easily from our website. It is written as a series of Stata do files and draws on data from the Family Resources Survey (FRS), Households Below Average Income (HBAI), and Understanding Society.

Once the folder for the code is downloaded, users will see that it is arranged in two separate collections of do files according to the data source being used. The first set of code draws on Family Resources Survey (FRS) and related Households Below Average Income (HBAI) data to operationalise the Commission’s core measure of poverty, measures of depth, and selected Lived Experience Indicators. The second set of code draws on Understanding Society data to operationalise the Commission’s measure of persistent poverty and the remaining Lived Experience Indicators.

Within each folder there is a series of individual do files coordinated by one master Command file. When you run the Command file, it executes each individual do file in turn, successively building the measure and then producing the final results. These results are stored in an Excel spreadsheet that is produced automatically when the code is run.

There are also a number of modification options that users can explore in the Command file. You can specify, for example, the name of the data cut you are running so that files are named accordingly, and you can set which years of analysis, or which country within the UK, the measure should apply to.

Our hope is that analysts and researchers will download our code and use it to replicate our analysis, but will also extend it to further analyse UK poverty based on the Commission’s approach. We look forward to seeing what additional insights others can discover by using our code and building on our analysis.


Matthew Oakley is the Head of Secretariat for the Social Metrics Commission and Director of WPI Economics, an economics and public policy consultancy. He is a respected economist and expert on welfare reform and the future of the welfare state.

Before founding WPI Economics, Matthew was Chief Economist and Head of Financial Services Policy at the consumer champion Which?, Head of Economics and Social Policy at the think tank Policy Exchange, and an Economic Advisor at the Treasury. He has an MSc in Economics from University College London, where he specialised in microeconomics, labour markets, public policy and econometrics.

 

Emily Harris is a Senior Analyst for the Social Metrics Commission and is based at the Legatum Institute. She most recently worked in the Social Policy Section at the Commonwealth Secretariat.

Originally from South Africa, Emily was at the University of Cape Town's Poverty and Inequality Initiative, where she managed a project developing indicators for youth well-being at the small-area level. In this post she played a leading role in constructing an index of multidimensional youth poverty and setting up the Youth Explorer data portal. She has also worked as a data analyst on two cash transfer studies in India and consulted as a data manager at a private research company. Emily graduated cum laude with a Master's in Development Studies from the University of KwaZulu-Natal.

Rabia and Carstairs

Rabia Butt, one of our summer Q-Step interns, explains the process she went through to calculate and map the Carstairs deprivation index using UK Census data.

After downloading the data needed for the research to measure deprivation, we identified some issues and therefore needed to make some changes.

Matching our variables with those used in published research papers was quite a challenge. The papers used the Carstairs method, in which the census variables by definition refer to households. However, the social class indicator had to be changed from 'head of household in a lower social class' to a per-person measure, because the data from Scotland did not include household references for social class.

To make comparisons across the UK, and to allow us to replicate our calculations against the research papers, the per-person variable was used. No other research paper had calculated scores for the whole of the UK: most used the Scottish census only, and some covered England and Wales together but not the other countries of the UK.

For our investigation, the Carstairs scores were calculated in R, which required importing a dataset containing the relevant variables for the calculation. The reason for choosing R for the analysis was the large quantity of data we were handling and the number of calculations we needed to make. R can comfortably produce results for large quantities of data, which makes it well suited to this sort of research.
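To illustrate the core calculation, here is a minimal sketch of how a Carstairs score can be computed in R. The data frame and column names (census, unemployed, econ_active, and so on) are hypothetical placeholders rather than the names used in our actual script; each of the four indicators is expressed as a percentage for each area, standardised, and summed.

# Hypothetical data frame 'census' with one row per area.
# Express each Carstairs indicator as a percentage of its denominator.
census$pct_unemployed  <- 100 * census$unemployed  / census$econ_active
census$pct_overcrowded <- 100 * census$overcrowded / census$households
census$pct_no_car      <- 100 * census$no_car      / census$households
census$pct_low_class   <- 100 * census$low_class   / census$all_persons

# Standardise each indicator across areas (z-scores) and sum the four
# z-scores to give the Carstairs score for each area.
z <- scale(census[, c("pct_unemployed", "pct_overcrowded",
                      "pct_no_car", "pct_low_class")])
census$carstairs <- rowSums(z)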

The variables required for the Carstairs score had different names in the dataset for each census year. For example, for the overcrowding variable, the count of total households was named 'all household' in 2011, whereas in 2001 it was given the codename 'cs0520001'. As a result, we decided to rename the variables so that they were consistent throughout our calculations.

The geographical variables also had different headings and codes. For example, what the 2011 data called 'GEO_CODE', the 2001 data called 'Zone.Code'. For these reasons, the script was altered to accommodate these differences. Although these are minor adjustments, they were necessary for the R scripts to run without any difficulties.
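As an illustration of this kind of adjustment, the snippet below renames a couple of columns so that the older data use the same names as the 2011 data. The data frame name (data_2001) and the target names are hypothetical; only the renaming pattern is the point.

# Harmonise 2001 column names with the 2011 naming (hypothetical example).
names(data_2001)[names(data_2001) == "cs0520001"] <- "all_households"
names(data_2001)[names(data_2001) == "Zone.Code"] <- "GEO_CODE"
# After renaming, the same script can be run on either census year
# without any year-specific changes.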

I have learnt to use R to calculate the proportions, means, standard deviations and z-scores for my project, and to perform many other functions as well.

The calculations for the project were completed, and the next step was to match our z-scores with the published results, to ensure that the variables used in calculating the scores were correct and to support our research project. We discovered that the variables and geographical areas of the different censuses we used were correct, but some research papers' z-scores differed from ours because they did not include information on how they weighted the population counts. After reading many research papers and trying to match their results with ours, we finally found one paper whose scores matched ours very closely.

The Carstairs z-scores were split into quintiles, ranging from

  • 1 (least deprived) to
  • 5 (most deprived)

A quintile represents 20% of a given population, so the first quintile contains the lowest fifth of the data, the second quintile the next fifth, and so on. It should be acknowledged that the quintiles in this project are area-based, meaning that 20% of all areas fall into each quintile. We calculated the quintiles in R, with the formula dividing the fifth quintile's sum by the sum of the data set. The reason for creating quintiles was map visualisation, which allowed us to observe which areas are most and least deprived across the selected geographical areas of the UK. Quintiles were therefore very convenient for plotting maps, as they offer a geographical perspective on the spread of deprivation across the UK.
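For reference, here is one common way of assigning area-based quintiles in R; this is a sketch of the general approach rather than the exact formula used in the project.

# Split areas into quintiles of the Carstairs score.
breaks <- quantile(census$carstairs, probs = seq(0, 1, 0.2), na.rm = TRUE)
census$quintile <- cut(census$carstairs, breaks = breaks,
                       labels = 1:5, include.lowest = TRUE)
# Quintile 1 holds the least deprived 20% of areas, quintile 5 the most deprived.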

QGIS is software that allows users to examine and edit spatial information, as well as composing and exporting graphical maps. For our research, this software was used to generate 3D maps of the Carstairs scores for data visualisation.

At first, I practised using QGIS with census data that interested me, so I chose the Pakistan census data of 2017 and 1998 to look at differences in the gender distribution across Pakistan in 3D maps. From the UK census of 2011, I explored how proficiency in English could influence general health. The 3D maps of my data showed me which places had the highest peaks and where they were lowest.

For the main project, we created 3D maps of the whole of the UK at ward, Output Area and local authority level from 1991 to 2011. 3D maps of Great Britain (i.e. England, Wales and Scotland) were also created at District level.

To produce Carstairs scores from the 1971 census through to the most recent one in 2011, Northern Ireland could not be included in the calculation, as no census data are available for Northern Ireland for 1971 and 1981. As a result, Northern Ireland was excluded from this analysis.

For a more detailed examination of deprivation, 3D maps at Output Area level were also created for the capital cities of the UK and for Greater Manchester. The boundary data for the maps were downloaded from Casweb and the borders from the UK Data Service. The boundary data were simplified in mapshaper and the quintiles were dissolved into five layers, making the 3D maps easier to produce and to understand.

The main findings of this research are portrayed in Tables 1 and 2. Table 1 shows the most deprived areas in the UK, while Table 2 shows the least deprived areas based on each area's total score. The results are listed for all the years and output levels for which data were available. The least deprived areas, as can be seen in Table 2, are mostly located around London, in the south of England. This finding is consistent from 1981 to 2011; in 1971, however, the least deprived area was Bearsden, in Scotland.

To understand the overall change in the level of deprivation across Great Britain, maps for 1981, 1991, 2001 and 2011 were created using quintiles. The lighter colours represent less deprived areas and vice versa.

Based on these scores, deprivation in Great Britain decreased greatly between 1981 and 2011. The largest change occurred in Scotland. However, compared with England and Wales, Scotland is still more deprived.

There has been a positive change for England as well, although not to the same extent as for the other countries. The north of England and Cornwall have improved their deprivation scores the most.

Nevertheless, cities in the north, and areas in Birmingham and London, continue to have high deprivation scores. A trend emerges across the four censuses of Great Britain: in general, the south has consistently been less deprived than the north.

Great Britain 1981

Great Britain 1991

Great Britain 2001

Great Britain 2011

 

Below are maps of Manchester and Greater Manchester showing deprivation at Output Area level.

Darker colours represent more deprived areas (higher quintiles). Most of Greater Manchester, and especially Manchester itself, is highly deprived, with some exceptions. There is also a pattern: the inner parts of the towns in Greater Manchester are deprived, while the outskirts are lighter, meaning less deprived.

Manchester by Output Area 2011

Greater Manchester by OA 2011

 

It needs to be taken into consideration that one of the indicators of deprivation in the Carstairs score is car ownership. Given that owning a car in a city centre might not be convenient, city-centre areas can be expected to score higher on this variable, which pushes their overall scores higher.

My 3D maps above also correspond with the results of the Index of Multiple Deprivation scores, which have stated that “Manchester is one of the local authority districts which has the largest proportions of highly deprived neighbourhoods in England” (The English Indices of Deprivation 2015, Baljit Gill).

The next stage is to present the 3D maps in VR. Virtual reality is defined as "the use of computer technology to create a simulated environment". Although VR is still in its developing stages, it is being used across various platforms, and presenting data is one of them.

Klara and Carstairs

It has been seven weeks since I started working at the UK Data Service/Jisc.

Although it seems like yesterday since I first entered the office, I have learned so many new things and skills over the past weeks, and I have finally applied what I have been learning at the University for two years.

My time here at Jisc and the UK Data Service has been one of the most valuable experiences in my life and has convinced me that work with data and data analysis is the right path for me once I graduate.

My fellow intern Rabia and I at the start of our internship

For the first few weeks, I was getting familiar with the Carstairs index (more about Carstairs can be read in my previous blog post), and was accessing all the data needed for the calculations. I encountered several problems during this process, which in turn helped me to gain even more experience and new skills.

One of the issues I had was that I needed data across five different Census years and for the whole of the UK, and not all of it was available.

For example, one of the indicators for the Carstairs score is 'low social class of household reference person'. This information was not available for Scotland in the 2001 Census, so the definition of the indicator had to be changed to 'low social class of all persons' in order to proceed with the research. This subsequently meant redownloading all the data for the other years, and left very "messy" folders full of data we no longer needed.

At that point I realised it was essential to organise all my folders and documents and to give them sensible names, so I could be sure I knew what each document contained and where to find it. Even though I have always considered myself an organised person, working with such large amounts of data showed me I had to be even more organised to be able to progress to the analysis itself.

Another issue I faced was that different research papers used different Census variables to calculate the Carstairs score.

This was caused by the papers focusing either on one country rather than the whole of the UK, or on a specific Census year; they all used the variables that were available for their particular projects. This, however, meant that Rabia (my fellow intern) and I had to adjust the definitions of the variables so they fitted our own needs. We were lucky enough to find a few researchers who used indicators that were available for all the Census years we were analysing and for all the countries in the UK, and so we were able to support our decisions on the definition changes.

I managed to solve all those problems and moved onto calculating the scores and analysing the data. This was done in R, which saved me a lot of time and enabled easy replicability. This was extremely useful as I had to recalculate my scores multiple times due to the problems outlined above. I have always been keen on working with R as I find it very intuitive and yet challenging, which I enjoy. I feel I have become proficient in R and I am very confident using it in a work environment.

I have always enjoyed working with data, but regardless, the analysis gets a bit more exciting when you can finally see a story the data tells.

To do this, I uploaded all my results into QGIS, an open-source Geographic Information System which enables the creation of maps. All the data I have worked with concerns geographical areas in the UK, so mapping deprivation, as well as providing tables with the specific scores, felt like the right (and very visually appealing) choice.

I have learned how to use a lot of functions QGIS offers, including creating 3D maps and analysing geospatial data.

Working with the software has woken up a very creative side of me I never knew I even had. I discovered that data analysis can be very artistic and original, especially when it comes to presenting the findings.

Deprivation in the UK from 2011 to 1991, by local authority

One of the things I have found out by just looking at my maps was that deprivation decreased massively between 1991 and 2011 according to the Carstairs index.

The darker the area, the more deprived the local authority is and vice versa.

Looking at the raw numbers alone, one would have no idea about this without further analysis, so plotting the numbers on a map is a great way of spotting whether something is going on before moving on to the more specific analysis.

As I saw that the older maps are much darker than the ones from more recent years, I decided to explore whether this trend is significant.

I created confidence intervals and boxplots, presented below, to support my initial hypothesis that deprivation had decreased. The confidence intervals do not overlap, so I could be confident that deprivation decreased significantly between 1991 and 2011 in the UK.

Boxplots and 95% confidence intervals for deprivation levels in the UK from 2011 to 1991
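For readers who want to reproduce this kind of check, here is a minimal sketch in R. It assumes a hypothetical data frame called scores with one row per local authority and columns year and carstairs; the actual analysis code may differ.

# Boxplots of Carstairs scores by census year.
boxplot(carstairs ~ year, data = scores,
        main = "Carstairs scores by census year")

# 95% confidence interval for the mean score in each year; if the
# intervals do not overlap, the decrease is unlikely to be due to chance.
by(scores$carstairs, scores$year, function(x) t.test(x)$conf.int)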

Currently, I am working closely with Matt Ramirez, Futures senior innovation developer here at Jisc, who is turning my 3D maps into a virtual reality environment.

I have been given the opportunity to think about interesting and unique ways of using VR to display my results. I was particularly excited about one of my ideas, which consisted of a lift that would take users up to different Census years. They could then get out of the lift and move around the map, resulting in a great interaction between the data and the users, and thus enhanced learning.

Unfortunately, given the short time frame, this was not possible to complete, so we had to simplify the visualisations and leave out the lift. Regardless, the idea may be implemented in the future; it would require more time than we have at the moment, but would be worth trying out.

Calculating Townsend Scores: Resources to learn R

Sanah Yousaf, one of our interns, talks about how she approached the task of learning R.

The current project I am working on as an intern with UK Data Service Census Support is creating Townsend deprivation scores from UK 2011 Census data. To allow UK Data Service Census Support users to produce their own Townsend deprivation scores, I created an R script that generates them.

Initial experience with R

My first experience with R came about during my degree in a module concerning data analysis. R is a language and environment for statistical computing and graphics. The software is free and allows for data manipulation, calculation and graphical display.

There are many packages available to use in R and I only had experience with using “R Commander” and “Deducer”. Both R Commander and Deducer are data analysis graphical user interfaces. Knowledge of coding in R is not particularly necessary when using R Commander and Deducer as menu options are available for easy navigation to get to what you need, whether that’s obtaining summary statistics of your data or creating graphs.

Perhaps it is fair to say that the R skill set I gained from my degree module was limited for the task at hand. Having said that, I was both eager and intrigued to learn more about R's capabilities, but I would have to do so quickly to be able to produce an R script to create Townsend scores.

Useful resources to learn R

After a few Google searches, I found many resources that taught the basics of R. Some were great, others not so much. I found R tutorials on YouTube particularly useful, as opposed to some websites that provided code for certain functions in R but lacked explanations. In addition, I often found myself on Stack Overflow, an online community where developers learn and share their knowledge of programming.

R Tutorials on YouTube

If you are new to R, I would recommend the "MarinStatsLectures" channel on YouTube. The channel has tutorials ranging from how to import data of different formats into R to working with data once it is loaded. There are over 50 tutorials on the channel, each no longer than 10 minutes. The tutorials gave me a grounding in different R commands and explained basic R concepts well.

R packages

The R package "swirl" allows users to learn R interactively through the R console. This was useful because I could learn different R commands whilst practising within R.
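Getting started with swirl only takes a few lines in the R console:

install.packages("swirl")   # install the package (one-off)
library(swirl)              # load it
swirl()                     # start an interactive lesson and follow the prompts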

Google search

A simple Google search of “how to… in R?” will usually provide you with the answer you are looking for! You will most probably bump into other R users who have asked the same question on Stack Overflow.

Ask R for help in the R Console

Typing help() or the ? operator into the R console will bring up the R documentation in the Help pane of RStudio. For example, typing ?matrix in the console loads the documentation for the matrix() function.
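Both forms are equivalent, for example:

?matrix          # opens the documentation page for matrix()
help("matrix")   # does the same thing via the help() function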

References

More about R: https://www.r-project.org/about.html

Downloading R: https://cran.r-project.org/bin/windows/base/

Downloading R Studio: https://www.rstudio.com/products/rstudio/download/

MarinStatsLectures Channel on YouTube: https://www.youtube.com/user/marinstatlectures

More about Swirl in R: http://swirlstats.com/

Stack Overflow: https://stackoverflow.com/questions/tagged/r

Calculating Townsend scores: Replicating published results

Amy Bonsall, one of our interns, talks about how she approached the task of working out how to calculate Townsend scores and then of finding others' work to compare against as a way to quality-assure the methodology.

As part of the internship project to calculate deprivation scores, after finding sources that outline how to calculate Townsend Deprivation Scores, it was important to ensure that the methodology would produce scores matching those already published.

We wanted to calculate scores and compare them to those that had already been published, using the same dataset, to be sure we were using the same methodology. While I focused on this, Sanah Yousaf, my partner in this internship, was creating an R script to calculate the scores. While the script was being developed, I used Excel to calculate the scores, not only because we did not yet have an R script, but also because I was already comfortable with Excel and it made it easy to visualise the results of each step in the calculation.

Replicating scores proved more difficult than anticipated. Not only were published scores in limited supply, but many of the people who had already calculated scores had access to unadjusted census data, meaning our outcomes differed. The main problem was that there was no way of knowing whether the different data was the only reason for the contrasting scores, or whether a different formula could also have been responsible.

I went through what felt like an endless number of attempts to replicate another's scores. Each time I would attempt to follow the often limited detail of the methodology, and each time I failed I would try a slight variation in the calculation to see if that would work, without success. Eventually, I found a source of results calculated for 1991 by Paul Norman. Included with the results were the data used to calculate the scores as well as the z-scores for each of the indicators. The materials provided with these scores were very useful, as I could check the scores against the exact same data. It also meant that I could check that the z-scores were right before confirming that the Townsend Deprivation Scores were correct. This dataset brought success, and meant I could go on to calculate deprivation scores for 2011 knowing that the calculation would be correct.

The next step was to create scores based on datasets at various output-area levels, which was much easier than the previous task. Once my internship partner, Sanah, had created an R script allowing us to calculate the scores, getting results didn't take long. From here it will be interesting to see what other obstacles we come across, including mapping the results and comparing them to past censuses. Considering the process so far, however, I look forward to confronting them head on.

 

Calculating Townsend scores: An introduction

Amy Bonsall, one of our interns, talks about what deprivation is and how it can be calculated.

As a student at the University of Manchester studying criminology, I was lucky enough to get the opportunity to work on a project with the UK Data Service as an intern, calculating Townsend Deprivation Scores for the UK and, importantly, learning workplace skills that will be useful once I graduate. My fellow intern (Sanah) and I came with a thirst to learn and an ambition to make the project a success, which has made the exciting aspects more rewarding and the obstacles we needed to overcome much more bearable.

Deprivation is a lack of reasonable provisions, whether social or material. Because there are so many indicators of deprivation, and because it is a construct that cannot be measured in one objective way, many deprivation indices have been developed. Each of these indices has its benefits for measuring deprivation, as well as areas where it is lacking.

Different methods of calculation have been developed because of a long-term need to research deprivation through census data and the ever-changing indicators of deprivation. I am currently using the 2011 census to calculate deprivation scores for the UK using the Townsend index. This is just one of many ways deprivation can be calculated; however, we decided it was appropriate because it measures material deprivation exclusively rather than incorporating social deprivation, which means it can be calculated consistently over time. It is also comparable across the UK.

Before jumping into the data and calculating the deprivation scores, it was important first to understand what Townsend's index measures and how to measure it. Information on the index was readily available and easy to find, giving the initial impression that the resources required at each stage of the project would be easy to find (they weren't).

Research taught us that Townsend Deprivation scores are calculated based on 4 indicators of deprivation: non-home ownership, non-car ownership, unemployment and overcrowding.

This is calculated by first finding the percentage of non-car ownership, non-home ownership, unemployment and overcrowding in each area.
The percentages for the unemployment and overcrowding indicators then need to be normalised, as their distributions are very skewed; this is done by taking ln(percentage value + 1).

z-scores are then calculated from the percentage values for each ward under each indicator. For the unemployment and overcrowding variables, the logged versions are used instead.
z-score = (percentage - mean of all percentages) / SD of all percentages
z-score for a logged variable = (log percentage - mean of log percentages) / SD of log percentages

Townsend Deprivation Score = sum of the four z-scores
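Putting the steps above together, a minimal R sketch of the calculation might look like the following. The data frame and column names (census, pct_unemployed, and so on) are hypothetical placeholders, not the names used in our published script.

# Log-transform the two skewed indicators, as described above.
census$log_unemp <- log(census$pct_unemployed  + 1)
census$log_crowd <- log(census$pct_overcrowded + 1)

# z-score helper: (value - mean) / standard deviation, taken across all areas.
zscore <- function(x) (x - mean(x)) / sd(x)

# Townsend Deprivation Score = sum of the four z-scores.
census$townsend <- zscore(census$log_unemp) +
                   zscore(census$log_crowd) +
                   zscore(census$pct_no_car) +
                   zscore(census$pct_not_owned)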

 

From the sources found, it wasn't perfectly clear how to calculate z-scores for the logged variables: there was no clarification of whether to take the mean and standard deviation of the percentages before or after logging them. Taking information from different sources gave a good idea of the correct formula; however, the important next step is to test the formula against existing scores to ensure it is correct before continuing with the project.

Creating Consistent Deprivation Measures Across the UK

We’ve been lucky enough to have two interns come and work with us over the summer. They have been working on creating a set of Townsend Deprivation scores, using the UK 2011 Census data we have available via InFuse.

The interns came to us through the University of Manchester Q-Step Centre, which coordinates with different types of workplaces to offer 2nd year students the chance to practice the data skills taught through their degree courses at the university.

Sanah Yousaf is studying Law with Criminology.

I am currently a student at the University of Manchester studying Law with Criminology. As part of my degree, I chose a module called Data Analysis for Criminologists which exposed me to the world of data. I enjoyed the course so much that I decided to apply to work as an intern at UK Data Service via the Q-Step internship programme offered at the University of Manchester. As a result, I am now an intern at UK Data Service, specifically in the Census Support team based in Manchester. The project I am working on with my fellow intern (Amy) is calculating Townsend deprivation scores for the UK 2011 Census data.

 

Amy Bonsall is studying Criminology.

As a student at the University of Manchester studying criminology, I was lucky enough to get the opportunity to work on a project with the UK Data Service as an intern, calculating Townsend Deprivation Scores for the UK and, importantly, learning workplace skills that will be useful once I graduate. My fellow intern (Sanah) and I came with a thirst to learn and an ambition to make the project a success, which has made the exciting aspects more rewarding and the obstacles we needed to overcome much more bearable.

Amy and Sanah have agreed to write blogs about the project, which we’ll publish over the coming weeks, together with the resources that Amy and Sanah created, to include the raw data and the scores.