Rabia and Carstairs

Rabia Butt, one of our summer Q-Step interns, explains the process she went through to calculate and map the Carstairs deprivation index using UK Census data.

After downloading the data needed to measure deprivation, we identified some issues and had to make some changes.

Matching our variables with those used in published research papers was quite a challenge. The papers used the Carstairs method, in which, by definition, the census variables refer to households. However, we had to change the social class variable from ‘head of household in a lower social class’ to a per-person measure, because the data for Scotland did not include household references for social class.

We used the per-person variable so that we could make comparisons across the UK and also replicate the calculations in the research papers. No other research paper had calculated scores for the whole of the UK. Most used the Scottish census only, and some covered England and Wales together, but none included the other countries of the UK.

For our investigation, the Carstairs scores were calculated in R, which required importing a dataset containing the relevant variables. We chose R because of the large quantity of data we were handling and the number of calculations we needed to make. R can produce results for large quantities of data, which makes it very well suited to this sort of research.

The variables required for the Carstairs score had different names in the dataset for each census year. For example, for the overcrowding variable, total households was named ‘all household’ in 2011, whereas in 2001 it was given the code name ‘cs0520001’. As a result, we decided to change the names so that they were consistent throughout our calculations.

The geographical identifiers also had different headings and codes. For example, what the 2011 data called ‘GEO_CODE’, data from an earlier census called ‘Zone. Code’. The script was altered to handle these differences. Although these are minor adjustments, they were necessary for the R scripts to run without errors.
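The renaming step described above can be sketched as a simple lookup table. The interns did this in R; the sketch below uses Python purely for illustration. The four source names are the ones quoted in this post, while the common names (`area_code`, `total_households`), the function name and the sample row are hypothetical.

```python
# Hypothetical lookup: census-specific column name -> common name.
# The keys are the names quoted in the post; the values are invented.
ALIASES = {
    "GEO_CODE": "area_code",
    "Zone. Code": "area_code",
    "all household": "total_households",
    "cs0520001": "total_households",
}


def harmonise(row):
    """Return a copy of one data row with known aliases renamed.

    Columns that are not in ALIASES keep their original name.
    """
    return {ALIASES.get(name, name): value for name, value in row.items()}


# Made-up sample row, only to show the renaming.
row = {"Zone. Code": "X001", "cs0520001": 1200}
clean = harmonise(row)
```

Keeping every alias in one table means the same calculation code can be run on each census year without further changes.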

I have learnt to use R to calculate the proportions, means, standard deviations and Z-scores for my project, and to perform many other functions as well.

Once the calculations for the project were complete, the next step was to try to match our Z-scores with published results, to check that the variables used in calculating the scores were correct and to support our research project. We found that the variables and geographical areas we had used for the different censuses were correct, but some research papers’ Z-scores differed from ours, as they did not include information on how they weighted the population count. After reading many research papers and trying to match their results with ours, we finally found one paper whose scores matched ours very closely.

The Carstairs Z-scores were split into quintiles, ranging from

  • 1 least deprived to
  • 5 most deprived

A quintile is one of five equal groups into which a data set is divided, each containing 20% of the observations, so the first quintile is the lowest fifth of the data, the second quintile is the second fifth, and so on. It should be acknowledged that the quintiles in this project are based on area, meaning that 20% of all areas fall into each quintile. We calculated the quintiles in R, and the formula divided the fifth quintile’s sum by the data-set sum. The quintiles were created for map visualisation, which allowed us to observe which areas are most and least deprived across the selected geographical areas of the UK. Quintiles are very convenient for plotting maps, as they offer a geographical perspective on the spread of deprivation across the UK.
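As an illustration of area-based quintiles, here is a minimal sketch that ranks areas by score and puts roughly a fifth of them into each group. It is a rank-based approach written in Python for illustration, not the R formula the project used; the function name, area labels and scores are all invented.

```python
def area_quintiles(scores):
    """Map each area to a quintile from 1 (least deprived) to 5 (most deprived).

    `scores` maps area name -> deprivation score (higher = more deprived).
    Areas are ranked by score and split so that about 20% of areas land
    in each quintile; ties keep their original order.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {area: (rank * 5) // n + 1 for rank, area in enumerate(ordered)}


# Invented scores for ten hypothetical areas.
example_scores = {
    "A": -3.2, "B": -1.5, "C": -0.8, "D": -0.1, "E": 0.4,
    "F": 1.1, "G": 1.9, "H": 2.6, "I": 3.8, "J": 5.0,
}
quintiles = area_quintiles(example_scores)
```

Because the split is by number of areas rather than by population, a quintile here can contain many more or fewer people than 20% of the total.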

QGIS is software that allows users to examine and edit spatial information, and to compose and export graphical maps. For our research, this software was used to generate 3D maps of the Carstairs scores for data visualisation.

At first, I practised with QGIS using census data that interested me. I chose the Pakistan census data of 2017 and 1998 to look at differences by gender in Pakistan in 3D maps. From the UK census of 2011, I explored how proficiency in English could influence general health. The 3D maps showed me where the values were highest (the peaks) and where they were lowest.

For the main project, we created 3D maps of the whole of the UK at ward, Output Area and local authority level from 1991 to 2011. 3D maps of Great Britain (i.e. England, Wales and Scotland) were also created at District level.

To produce Carstairs scores from the census of 1971 through to the most recent one in 2011, Northern Ireland could not be included in the calculation, as no census data is available for Northern Ireland for 1981 and 1971. As a result, Northern Ireland was excluded from this analysis.

For a more detailed examination of deprivation, 3D maps of the capital cities of the UK and of Greater Manchester were also created at Output Area level. The boundary data for the maps was downloaded from Casweb and borders from the UK Data Service. The boundary data was simplified in mapshaper and the quintiles were dissolved into 5 layers, so that the 3D maps were easier to produce and easier for people to understand.

The main findings of this research are shown in Tables 1 and 2. Table 1 shows the most deprived areas in the UK, while Table 2 shows the least deprived areas, based on each area’s total score. The results are listed for all the years and output levels for which data was available. The least deprived areas, as can be seen in Table 2, are mostly located around London, in the South of England. This finding is consistent from 1981 to 2011; in 1971, however, the least deprived area was Bearsden, which is in Scotland.

To understand the overall change in the level of deprivation across Great Britain, maps for 1981, 1991, 2001 and 2011 were created using quintiles. The lighter colours represent less deprived areas and vice versa.

Based on these scores, deprivation in Great Britain decreased greatly between 1981 and 2011. The largest change occurred in Scotland, compared with the rest of Great Britain. However, Scotland is still more deprived than England and Wales.

There has been a positive change for England as well; however, it has not been to the same extent as for the other countries. The north of England and Cornwall have improved their deprivation scores the most.

Nevertheless, cities in the North, areas of Birmingham and areas of London continue to have high deprivation scores. A trend emerges across the four censuses for GB: the South has generally been less deprived than the North.

Great Britain 1981

Great Britain 1991

Great Britain 2001

Great Britain 2011

 

Below are maps of Manchester and Greater Manchester showing deprivation at Output Area level.

Darker colours represent more deprived areas (higher quintile). Most of Greater Manchester, and especially Manchester, is highly deprived, with some exceptions. There is also a pattern: the inner parts of the towns in Greater Manchester are deprived, while the outskirts are lighter, which means less deprived.

Manchester by Output Area 2011

Greater Manchester by OA 2011

 

It needs to be taken into consideration that one of the indicators of deprivation for the Carstairs score is ‘Car ownership’. Considering that owning a car in the city centre might not be convenient, it is to be expected that city centre areas may score higher on this variable, resulting in a higher overall score.

My 3D maps above also correspond with the results of the Index of Multiple Deprivation scores, which have stated that “Manchester is one of the local authority districts which has the largest proportions of highly deprived neighbourhoods in England” (The English Indices of Deprivation 2015, Baljit Gill).

The next stage is to present the 3D maps we produced in VR. Virtual Reality is defined as “the use of computer technology to create a simulated environment”. Although VR is still in its developing stages, it is being used across various platforms, and presenting data is one of its uses.

Klara and Carstairs

It has been seven weeks since I started working at the UK Data Service/Jisc.

Although it seems like yesterday since I first entered the office, I have learned so many new things and skills over the past weeks, and I have finally applied what I have been learning at the University for two years.

My time here at Jisc and the UK Data Service has been one of the most valuable experiences in my life and has convinced me that work with data and data analysis is the right path for me once I graduate.

My fellow intern Rabia and I at the start of our internship

For the first few weeks, I was getting familiar with the Carstairs index (more about Carstairs can be read in my previous blog post), and was accessing all the data needed for the calculations. I encountered several problems during this process, which in turn helped me to gain even more experience and new skills.

One of the issues I had was that I needed data across five different Census years and for the whole of the UK, and not all of it was available.

For example, one of the indicators for Carstairs score is ‘low social class of household reference person’. This information was not available for Scotland in the 2001 Census, and so the definition of the indicator had to be changed to ‘low social class of all persons’ in order to proceed with the research. This subsequently resulted in us needing to redownload all the data for the other years, and very “messy” folders with lots and lots of data we did not need anymore.

At that point I realised it was essential for me to organise all my folders and documents, and to name them with sensible names so I was sure I knew what each document contained and where to find it. Even though I have always considered myself as an organised person, working with such large amounts of data showed me I had to be even more organised to be able to progress to the analysis per se.

Another issue I faced was that different research papers used different Census variables to calculate the Carstairs score.

This was because the papers focused either on one country rather than the whole of the UK, or on a specific Census year. They therefore used the variables that were available for their particular projects. This, however, meant that Rabia (my fellow intern) and I had to adjust the definitions of the variables so that they fitted our own needs. We were lucky enough to find a few researchers who used indicators that were available for all the Census years we were analysing and for all the countries in the UK, and so we were able to support our decisions on the definition changes.

I managed to solve all those problems and moved onto calculating the scores and analysing the data. This was done in R, which saved me a lot of time and enabled easy replicability. This was extremely useful as I had to recalculate my scores multiple times due to the problems outlined above. I have always been keen on working with R as I find it very intuitive and yet challenging, which I enjoy. I feel I have become proficient in R and I am very confident using it in a work environment.

I have always enjoyed working with data, but regardless, the analysis gets a bit more exciting when you can finally see a story the data tells.

To do this, I uploaded all my results into QGIS, an open source Geographic Information System that enables the creation of maps. All the data I have worked with concerns geographical areas in the UK, so mapping deprivation, as well as providing tables with specific scores, felt like the right (and very visually appealing) choice.

I have learned how to use a lot of functions QGIS offers, including creating 3D maps and analysing geospatial data.

Working with the software has woken up a very creative side of me I never knew I even had. I discovered that data analysis can be very artistic and original, especially when it comes to presenting the findings.

Deprivation in the UK from 2011 to 1991, by local authority

One of the things I found out just by looking at my maps was that deprivation decreased massively between 1991 and 2011 according to the Carstairs index.

The darker the area, the more deprived the local authority is and vice versa.

When looking at the raw numbers, one would have no idea about this without doing some further analysis, so plotting numbers on a map is a great way of finding out whether something is going on before moving on to more specific analysis.

As I saw that the older maps are much darker than the ones from more recent years, I decided to explore whether this trend is significant.

I created confidence intervals and boxplots, presented below, to support my initial hypothesis about deprivation getting lower. The confidence intervals do not overlap and so I could be confident that deprivation decreased significantly between 1991 and 2011 in the UK.
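The post does not say how the confidence intervals were computed, so the following is only a sketch of one common approach: a normal-approximation 95% interval for a mean, plus a check for overlap. It is written in Python for illustration (the project itself used R), and the function names and both sets of scores are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width


def overlap(a, b):
    """True if intervals a and b (each a (low, high) pair) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented deprivation scores for two census years, for illustration only.
scores_1991 = [2.1, 3.4, 1.8, 2.9, 3.1]
scores_2011 = [-0.5, 0.2, -1.1, 0.4, -0.3]
ci_1991 = mean_ci(scores_1991)
ci_2011 = mean_ci(scores_2011)
intervals_overlap = overlap(ci_1991, ci_2011)
```

Non-overlapping intervals point to a significant difference, as described above. The reverse does not hold: intervals that overlap slightly can still come from means that differ significantly.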

Boxplots and 95% confidence intervals for deprivation levels in the UK from 2011 to 1991

Currently, I am working closely with Matt Ramirez, Futures senior innovation developer here at Jisc, who is turning my 3D maps into a virtual reality environment.

I have been given the opportunity to think about interesting and unique ways of using VR to display my results. I was particularly excited about one of my ideas, a lift that would take users up to different Census years. They could then get out of the lift and move around the map, which would create great interaction between the data and the users, and thus enhance learning.

Unfortunately, given the short time frame this was not possible to complete, so we had to simplify the visualisations and not use the lift. The idea may still be implemented in the future: it requires more time than we have at the moment, but would be worth trying out.

Meet our interns: Klara

Rabia Butt and Klara Valentova are our Q-Step interns from the University of Manchester. Q-Step is a £19.5 million programme designed to promote a step-change in quantitative social science training, funded by the Nuffield Foundation and the ESRC. We asked Rabia and Klara to tell us a bit about themselves and their journey to this internship.

Klara

I am one of a small cohort of students taking a degree pathway ‘Sociology and Quantitative Methods’ at the University of Manchester.

I have been very enthusiastic about data analysis since 2014, when I completed a study exchange programme in the USA. I attended a local high school in Georgia and took a module called AP Statistics. I really enjoyed it, and decided I would like to study statistics even further. This, together with my interest in sociology and especially social inequalities, has led me to study Sociology and Quantitative Methods at the University of Manchester.

In my second year of University, I chose a module about data modelling. I have learned how to use R and developed critical thinking and problem-solving skills, but wanted to enhance these to a professional level. I have been very interested in working with the Census data, and in learning more about deprivation while improving my data skills. Thus, I am very lucky to have been given the opportunity to work at the UK Data Service on calculating Carstairs Deprivation Scores for the UK.

The Carstairs index is a summary measure of relative material deprivation that was developed in the 1980s. It comprises four indicators from the Census, which relate to material deprivation (overcrowding, male unemployment, low social class and lack of car ownership).

Some of these variables, however, are a bit outdated, and so for our project, we have decided to include other indicators, which we propose are more up to date.

For instance, we will include total unemployment (female and male combined) in our calculations, as there are many more women in the labour force than there were nearly 40 years ago when the Carstairs index was created. We also take into consideration that lack of car ownership does not automatically imply deprivation: in urban areas not having a car might just be more convenient, while in rural areas it is a necessity rather than an indication of wealth.

Over the last two weeks I have learned a lot about the Census, Carstairs scores and deprivation in the UK. Nonetheless, the greatest lesson I have learned is the importance of long, proper research.

We were researching for about two days, and overall found that all the papers did their analyses in the same way. Satisfied with our findings, we moved onto getting the data we thought we needed for the project and started with the analysis.

New questions then arose and we had to do some more researching. Suddenly, we discovered many new research papers, some of which did their analysis differently than us. We began wondering whether our work so far was correct. I started checking all the data we downloaded, did more and more researching, and realised that with this additional research I could do a lot of things in a more efficient way. I therefore regretted not spending more time on the initial research as in the end it would have saved me a fair amount of time.

On the other hand, practice is the best way of learning, and so I learned a great lesson, which will be very valuable to me in the future.

I am very motivated and excited now to learn other new things during the next 8 weeks. I am particularly thrilled that I will have the opportunity to use 3D printing and VR to visualise the findings of the project. These new technologies have an incredible potential and are beginning to be widely used in many job sectors, and so learning how to use them, and how to use them for presenting statistics is an enticing prospect for me.

Read Klara’s previous blog for the University of Manchester.

Meet our interns: Rabia


Rabia

As a second-year undergraduate studying Sociology at the University of Manchester, I find that research, data collection and statistics are a large part of my course, which has given me a new-found appreciation for quantitative methods.

The survey method in social research module introduced me to social statistics and the basic statistical concepts required for working with numeric survey data. This module enabled me to participate in the amazing Q-Step programme, which provides placement opportunities for students from the University of Manchester.

I stumbled across the UK Data Service, when I was exploring internship options for the summer. Through this programme I was lucky enough to be selected for a summer internship at the UK Data Service.

I was very interested in and curious about where data originates and how it is produced for research, which is why I chose to do my summer internship here. I am delighted to work for the UK Data Service as I will be learning many new skills, gaining invaluable experience of a working environment, and working out what type of job I would like to go into after I graduate.

My fellow intern (Klara) and I are working together on a project that we will be creating ourselves and presenting at the end of our internship. The project requires us to calculate deprivation measures for England, Scotland, Wales and Northern Ireland using data from the Censuses of 2011, 2001, 1991, 1981 and 1971.

The methodology is Carstairs, which is an index of deprivation used in spatial epidemiology to identify socio-economic measures. Deprivation can be defined as the damaging lack of material benefits considered to be necessities in a society.

The Carstairs index is based on four Census variables:

  • low social class,
  • lack of car ownership,
  • overcrowding
  • male unemployment

The overall index reflects the material deprivation of an area.

We will be using the data to calculate the population-weighted mean percentages and standard deviations (SD) for each component variable. To ensure that all components have an even impact on the final score, each variable will be standardised to have a population-weighted mean of zero and a variance of one. Standardising involves subtracting the population mean from each value and dividing the result by the SD (z-score method).
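The standardisation described above can be sketched in a few lines. This is an illustration in Python rather than the R script used for the project, and it assumes that the weights are each area's population and that the final score is the sum of the standardised indicators. The function names and example figures are invented.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Population-weighted mean of per-area values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_sd(values, weights):
    """Population-weighted standard deviation (weights treated as populations)."""
    m = weighted_mean(values, weights)
    variance = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return sqrt(variance)


def z_scores(values, weights):
    """Standardise values to a weighted mean of 0 and a weighted variance of 1."""
    m = weighted_mean(values, weights)
    sd = weighted_sd(values, weights)
    return [(v - m) / sd for v in values]


def carstairs(indicators, population):
    """Sum the standardised indicators for each area.

    `indicators` maps indicator name -> list of percentages, one per area.
    """
    per_indicator = [z_scores(vals, population) for vals in indicators.values()]
    return [sum(zs) for zs in zip(*per_indicator)]


# Invented percentages for three hypothetical areas.
population = [100, 300, 600]
indicators = {
    "low_social_class": [10.0, 20.0, 40.0],
    "no_car": [15.0, 25.0, 30.0],
    "overcrowding": [2.0, 4.0, 9.0],
    "male_unemployment": [3.0, 6.0, 8.0],
}
scores = carstairs(indicators, population)
```

Weighting by population means that a small area with an extreme percentage has less influence on the mean and SD than a large one.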

The variables Carstairs uses are outdated, since the index was originally developed in the 1980s. We discovered an article that introduced some new variables; for example, it suggested replacing male unemployment in the Carstairs score with overall unemployment, in order to take account of women’s participation in the labour force. We decided to do this in our research as well, so we will be using the old methodology and also including overall unemployment and qualification levels.

After we have collected all the data we need, the next step will be learning the R language, as we will be using this for our analysis.

At first, I thought finding and downloading data would be quite easy. However, this was not the case, because each census was different from the others, especially the earlier ones.

For example, in the censuses for 1991, 1981 and 1971 the social class variable was based on a 10% sample for all the countries apart from Northern Ireland. This would cause difficulties when comparing censuses across different geographical areas. This was just one of the problems we encountered with the census data.

I will be using R to calculate the mean, standard deviation and z-score for each variable, for each year and area. This is a task I am quite excited about, because I have never learned a programming language before and it would be a valuable skill to have on my CV.

Read Rabia’s previous blog for the University of Manchester.

Calculating Townsend Scores: Resources to learn R

Sanah Yousaf, one of our interns, talks about how she approached the task of learning R.

The current project I am working on as an intern at UK Data Service Census Support is creating Townsend deprivation scores with UK 2011 Census data. To allow UK Data Service Census Support users to produce their own Townsend deprivation scores, I used R to create an R script that produces the Townsend deprivation scores.

Initial experience with R

My first experience with R came about during my degree in a module concerning data analysis. R is a language and environment for statistical computing and graphics. The software is free and allows for data manipulation, calculation and graphical display.

There are many packages available to use in R and I only had experience with using “R Commander” and “Deducer”. Both R Commander and Deducer are data analysis graphical user interfaces. Knowledge of coding in R is not particularly necessary when using R Commander and Deducer as menu options are available for easy navigation to get to what you need, whether that’s obtaining summary statistics of your data or creating graphs.

Perhaps it is fair to say that the R skill set I gained from my degree module was limited for the task at hand. Having said that, I was both eager and intrigued to learn more about R’s capabilities, but I would have to do this quickly to be able to produce an R script to create Townsend scores.

Useful resources to learn R

After a few Google searches, I found many resources that taught the basics of R. Some were great, others not so much. I found R tutorials on YouTube particularly useful, as opposed to some websites that provided code for certain functions in R but lacked explanations. In addition, I often found myself on Stack Overflow, which is an online community for developers to learn and share their programming knowledge.

R Tutorials on YouTube

If you are new to R, I would recommend the “MarinStatsLectures” channel on YouTube. The channel has tutorials ranging from how to import data of different formats into R to working with data in R. There are over 50 tutorials on the channel, none longer than 10 minutes. The tutorials provided me with knowledge of different R commands and explained basic R concepts well.

R packages

The R package “Swirl” allows R users to interactively learn through the R console. This was useful because I could learn different R commands whilst practicing within R.

Google search

A simple Google search of “how to… in R?” will usually provide you with the answer you are looking for! You will most probably bump into other R users who have asked the same question on Stack Overflow.

Ask R for help in the R Console

Typing help() or ? followed by a topic into the R Console will bring up the R Documentation in the help window in R Studio. For example, typing ?matrix in the R Console should load the R documentation for matrix.

References

More about R: https://www.r-project.org/about.html

Downloading R: https://cran.r-project.org/bin/windows/base/

Downloading R Studio: https://www.rstudio.com/products/rstudio/download/

MarinStatsLectures Channel on YouTube: https://www.youtube.com/user/marinstatlectures

More about Swirl in R: http://swirlstats.com/

Stack Overflow: https://stackoverflow.com/questions/tagged/r

Calculating Townsend scores: Replicating published results

Amy Bonsall, one of our interns, talks about how she approached the task of working out how to calculate Townsend scores, and then of finding others’ work to compare against as a way to quality assure the methodology.

As part of the internship project to calculate deprivation scores, once we had found sources outlining how to calculate Townsend Deprivation Scores, it was important to ensure that the methodology would produce scores that matched those already published.

We wanted to calculate scores and compare them to those that had already been calculated by using the same dataset to be sure we were using the same methodology. Whilst I was focused on this, Sanah Yousaf, my partner in this internship, was creating an R script to calculate the scores. Whilst this was being developed I used Excel to calculate the scores. This was not only because we did not yet have an R script but also because I was already comfortable with Excel and it made it easy to visualise the results of each step in the calculation.

Replicating scores proved more difficult than anticipated. Not only were there limited sources of published scores, but we also found that many of the people who had already calculated scores had access to unadjusted census data, meaning we had different outcomes. The main problem was that there was no way of knowing whether the different data was the only reason for the contrasting scores, or whether a different formula could also have played a part.

I went through what felt like an endless number of attempts to replicate another’s scores. Each time I would attempt to follow the often-limited detail of the methodology. Each time I failed, I’d attempt a slight variation in the calculation to see if this would work, with no success. Eventually, I found a source of results calculated for 1991 by Paul Norman. Included with the results was the data used to calculate the scores, as well as the Z scores for each of the indicators. The materials provided with these scores were very useful, as I could ensure the scores were the same based on the exact same data. It also meant that I could check whether the Z scores were right before checking that the Townsend Deprivation Scores were correct. I succeeded with this dataset, which meant I could go on to calculating deprivation scores for 2011 knowing that the calculation would be correct.

The next step was creating scores for datasets at various output levels, which was much easier than the previous task. After my partner in the internship, Sanah, had created an R script allowing us to calculate the scores, getting results didn’t take long. From here it will be interesting to see what other obstacles we come across, including in mapping the results and comparing them to past censuses. Considering the process so far, however, I look forward to confronting them head on.

 

Calculating Townsend scores: An introduction

Amy Bonsall, one of our interns, talks about what deprivation is and how it could be calculated.

As a student at the University of Manchester studying criminology, I was lucky enough to get the opportunity to work on a project with the UK Data Service as an intern calculating Townsend Deprivation Scores for the UK and, importantly, learning work environment skills that will be useful once I graduate. My fellow intern (Sanah) and I came with a thirst to learn and an ambition to make the project a success, which has made the exciting aspects more rewarding and the obstacles we need to overcome much more bearable.

Deprivation is a lack of reasonable provisions. This could be social or material. Because there are so many indicators of deprivation, and because it is a construct that cannot be measured in one objective way, many deprivation indices have been developed. Each of these indices has its benefits for measuring deprivation, as well as areas where it is lacking.

Different methods of calculation have been developed due to a long-term need to research deprivation through census data and the ever-changing indicators of deprivation. I am currently using the 2011 census to calculate deprivation scores for the UK using the Townsend index. This is just one of many ways deprivation can be calculated; however, we have decided this one is appropriate as it measures material deprivation exclusively rather than incorporating social deprivation, meaning it can be consistently calculated over time. It is also comparable across the UK.

Before jumping into the data and calculating the deprivation scores it was important to first understand what Townsend’s Index measures and how to measure it. Information on the index was readily available and easy to find giving the initial feeling that the resources required at each stage of the project would be easily found (they weren’t).

Research taught us that Townsend Deprivation scores are calculated based on 4 indicators of deprivation: non-home ownership, non-car ownership, unemployment and overcrowding.

This is calculated by first finding the percentage non-car ownership, percentage non-home ownership, percentage unemployment and percentage overcrowding for each area.
The percentages for the unemployment and overcrowding indicators then need to be log-transformed, as these results are very skewed. This is done by: ln(percentage value + 1).

Z scores are then calculated using the percentage values for each ward under each indicator. For the unemployment and overcrowding variables, the logged versions are used instead.
Z score = (percentage – mean of all percentages) / SD of all percentages
Z score of logged variable = (log percentage – mean of log percentages) / SD of log percentages

Townsend Deprivation Score = total of the 4 Z scores
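The steps above can be sketched as follows. This is an illustration in Python, not the project's Excel workbook or R script. It assumes that the mean and SD for the logged indicators are taken after logging (as the formula above states), and it uses the population SD, since the sources do not say which SD is meant. The function names, indicator labels and example percentages are invented.

```python
from math import log
from statistics import mean, pstdev

# Indicators described in the text as very skewed, so ln(x + 1) is applied.
LOGGED = {"unemployment", "overcrowding"}


def z_scores(values):
    """Unweighted z-scores: (value - mean) / SD, using the population SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def townsend_scores(percentages):
    """Sum the four indicator z-scores for each area.

    `percentages` maps indicator name -> list of percentages, one per area.
    """
    total = None
    for name, values in percentages.items():
        if name in LOGGED:
            values = [log(v + 1) for v in values]  # log first, then standardise
        zs = z_scores(values)
        total = zs if total is None else [t + z for t, z in zip(total, zs)]
    return total


# Invented percentages for three hypothetical wards.
example = {
    "no_car": [20.0, 40.0, 60.0],
    "not_owner_occupied": [30.0, 50.0, 70.0],
    "unemployment": [2.0, 5.0, 12.0],
    "overcrowding": [1.0, 3.0, 9.0],
}
scores = townsend_scores(example)
```

Positive scores indicate areas more deprived than average and negative scores less deprived, since each indicator's z-scores average to zero across the areas.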

 

Through the sources found it wasn’t perfectly clear how to calculate Z scores from the logged variables. There was no clarification about whether to take the mean and standard deviation of the percentages after they are logged or before. Taking information from different sources gave a good idea of the correct formula, however, the important next step is to test this formula against existing scores to ensure it is correct before continuing the process of this project.

Creating Consistent Deprivation Measures Across the UK

We’ve been lucky enough to have two interns come and work with us over the summer. They have been working on creating a set of Townsend Deprivation scores, using the UK 2011 Census data we have available via InFuse.

The interns came to us through the University of Manchester Q-Step Centre, which coordinates with different types of workplaces to offer 2nd year students the chance to practice the data skills taught through their degree courses at the university.

Sanah Yousaf is studying Law with Criminology.

I am currently a student at the University of Manchester studying Law with Criminology. As part of my degree, I chose a module called Data Analysis for Criminologists which exposed me to the world of data. I enjoyed the course so much that I decided to apply to work as an intern at UK Data Service via the Q-Step internship programme offered at the University of Manchester. As a result, I am now an intern at UK Data Service, specifically in the Census Support team based in Manchester. The project I am working on with my fellow intern (Amy) is calculating Townsend deprivation scores for the UK 2011 Census data.

 

Amy Bonsall is studying Criminology.

As a student at the University of Manchester studying criminology, I was lucky enough to get the opportunity to work on a project with the UK Data Service as an intern calculating Townsend Deprivation Scores for the UK and, importantly, learning work environment skills that will be useful once I graduate. My fellow intern (Sanah) and I came with a thirst to learn and an ambition to make the project a success, which has made the exciting aspects more rewarding and the obstacles we need to overcome much more bearable.

Amy and Sanah have agreed to write blogs about the project, which we’ll publish over the coming weeks, together with the resources that Amy and Sanah created, to include the raw data and the scores.