Our Q-Step interns look back over their time with us

This summer, we were lucky enough to have Manchester students Rabia Butt and Klara Valentova join us for a Q-Step internship. Q-Step was developed as a strategic response to the shortage of quantitatively skilled social science graduates. Manchester is one of 15 centres taking part in a £19.5 million programme designed to promote a step-change in quantitative social science training.

Here they review their time with us.

The purpose of our research was to explore deprivation in the UK using the Census data and the Carstairs Index of Deprivation.

To understand the overall change in the level of deprivation within the UK, we used Census data from 1971 to the most recent Census of 2011 in the calculation. In addition, the calculation covered several geographical levels: local authority, ward, lower super output area, output area and district. This has now been accomplished. Throughout this process, we learned many new techniques and skills, which we applied to achieve the final outcomes.

Our research has found that most of the extremely deprived areas are in cities.

To investigate why that might be the case, we examined the individual scores of each indicator used to produce the Carstairs scores, as they could show which variable contributed most to a given level of deprivation.

It was found that in cities ‘Non-car ownership’ caused the final deprivation score to be so large. Owning a car in a city, however, can be very impractical, and so this suggests that the indicators used for the calculation of the Carstairs index are outdated and may need to be revised and possibly replaced with more relevant ones.

The most deprived areas overall are located in the City of London and Glasgow City, while the least deprived areas can be found in the suburbs of London, particularly in the southwest of London in towns such as Wokingham or cities such as St Albans.

That being said, our project has also found that some of the least deprived areas are located in the City of London and around Glasgow.

This came as a surprise, since these two areas in particular seem to have the highest levels of deprivation. This discovery was only possible because the Carstairs index allows for the analysis of small geographies such as output areas.

These results demonstrate that the Carstairs index is valuable for identifying small areas with high levels of deprivation which would not be recognised as deprived when using other deprivation measures, for example the Indices of Multiple Deprivation.

We were also able to compare deprivation across different Census years for the whole of the UK, and found that deprivation decreased significantly between 1981 and 2011.

Deprivation in Northern Ireland, Scotland and Wales decreased the most, while in England this was the case for only certain areas. There was only slight improvement for the south of England, as these areas have always been less deprived. The north of England, on the other hand, used to be greatly deprived, especially in comparison to the South. Areas in the North have lowered their deprivation scores greatly; however, they still remain more deprived than the South.

Rabia

I have learnt a great deal during our research, going from not knowing much about Carstairs, the R language, 3D mapping in QGIS, virtual reality and much more, to now having completed a research report.

I am most grateful for the experience I have gained during my internship. I have not only improved the skills I had but also acquired a new set of skills, which I will definitely benefit from in the near future.

One of the key skills I have developed is data analysis, especially of quantitative data. This will definitely assist me in the final year of my degree, when I work on my dissertation.

Likewise, this internship has enhanced my knowledge of the role of a data analyst, which was one of the reasons why I wanted to do my internship here. I wanted the experience of working for the UK Data Service to help me determine the career path to take after I graduate.

I have enjoyed creating this project with a team member (Klara), as we were able to assist each other throughout the process in achieving the desired outcomes. This whole journey, which has now sadly come to an end, has definitely helped me in my development.

Klara

I started this internship with basic data analysis skills, which I wanted to make use of and enhance to a professional level. At the same time, I was hoping this experience would help me to decide what I would like to do after graduating, and whether data analysis and statistics would be something I would enjoy even outside of courses at the University. Working at the UK Data Service has fulfilled all the above.

I have learned many new skills and have developed on a professional level while discovering a whole new range of what I can do with data and its results (more about that in my previous blog at https://lab.ukdataservice.ac.uk/2018/08/21/klara-and-carstairs/).

Apart from developing knowledge of using different software such as QGIS or Microsoft Access, the one skill I developed the most was how to be successful when working in a team.

Having to work with another person, taking their ideas on board and compromising has been very challenging for me, yet incredibly rewarding. It was due to our excellent communication and mutual respect and understanding that we were able to calculate all the scores, do the analysis and finally produce the final report.

I have also discovered that I quite enjoyed guiding and helping Rabia throughout the internship, which helped me to gain very strong interpersonal skills that will be important in both my personal and professional life.

I am now very excited to have completed this project and to have gained so many invaluable skills and experiences. Nonetheless, I am sad to be leaving the UK Data Service as I enjoyed every task I had to complete, and I always felt very excited about working on this research. I am happy, however, that I now know I would like to do this type of job after finishing University because I enjoy it and find it very fulfilling and worthwhile.

Rabia and Carstairs

Rabia Butt, one of our summer Q-Step interns, explains the process she went through to calculate and map the Carstairs deprivation index using UK Census data.

After downloading the data needed for the research to measure deprivation, some issues were recognised. Therefore, we needed to make some changes.

Matching our variables with those used in published research papers was quite a challenge. The papers used the Carstairs method, in which by definition the census variables refer to households. However, the social class indicator had to be changed from ‘head of household in a lower social class’ to a per-person measure, because the data for Scotland did not include household references for social class.

To make comparisons across the UK, and to be able to replicate the calculations in the research papers, we used the per-person variable. No other research paper had calculated scores for the whole of the UK. Most papers used the Scotland census only, and some covered England and Wales together, but not the other countries in the UK.

For our investigation, the calculations for the Carstairs scores were performed in R, which required importing a dataset with the relevant variables. We chose R for the analysis because of the large quantity of data we were handling and the number of calculations we needed to make. R can produce results for large quantities of data, which made it indispensable for this sort of research.

The variables required for the Carstairs score had different names in the dataset for each census year. For example, for the overcrowding variable, total households was named ‘all household’ in 2011, whereas in 2001 it was given the codename ‘cs0520001’. As a result, we decided to change the names so that they were consistent throughout our calculations.

The geographical variables also had different headings and codes. For example, what the 2011 data called ‘GEO_CODE’, another census year called ‘Zone. Code’. The script was altered to handle these differences. Although these are minor adjustments, they were necessary for the R scripts to run without difficulty.
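
The renaming step can be sketched roughly as follows. This is an illustration in Python rather than the R scripts used in the project. Only the four original names (‘all household’, ‘cs0520001’, ‘GEO_CODE’ and ‘Zone. Code’) come from the text above; the target names `area_code` and `total_households`, the assignment of ‘Zone. Code’ to 2001, and the sample values are all assumptions made for the example.

```python
# Hypothetical per-year rename maps. The harmonised names on the right
# ("area_code", "total_households") are invented for this sketch, and
# placing 'Zone. Code' under 2001 is an assumption.
RENAME = {
    2011: {"GEO_CODE": "area_code", "all household": "total_households"},
    2001: {"Zone. Code": "area_code", "cs0520001": "total_households"},
}


def harmonise(row, year):
    """Return a copy of one census record with year-specific names replaced.

    Names that have no entry in the rename map are kept unchanged.
    """
    mapping = RENAME[year]
    return {mapping.get(name, name): value for name, value in row.items()}


# Invented example records, one per census year.
row_2011 = harmonise({"GEO_CODE": "A1", "all household": 120}, 2011)
row_2001 = harmonise({"Zone. Code": "A1", "cs0520001": 110}, 2001)
```

After harmonising, records from both years share the same keys, so one calculation script can run over every census year.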

I have learnt to use R to calculate the proportions, mean, standard deviation and Zscores for my project and perform many other functions as well.
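
As a rough illustration of that calculation, the sketch below standardises each indicator across areas and sums the Z-scores to give one score per area. It is written in Python, not R, and is not the project's script. The four indicator names follow the Carstairs indicators named elsewhere in these posts; the area names and proportions are invented, and the sketch assumes unweighted Z-scores using the population standard deviation, whereas published work may weight by population.

```python
from statistics import mean, pstdev

# The four Carstairs indicators; the key names are chosen for this sketch.
INDICATORS = ["overcrowding", "male_unemployment", "low_social_class", "no_car"]


def z_scores(values):
    """Standardise values: subtract the mean, divide by the standard deviation.

    Assumes the values are not all identical (otherwise the SD is zero).
    """
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


def carstairs_scores(areas):
    """Sum the Z-scores of the four indicators for every area.

    areas maps an area name to a dict of indicator proportions.
    """
    names = list(areas)
    totals = dict.fromkeys(names, 0.0)
    for indicator in INDICATORS:
        zs = z_scores([areas[name][indicator] for name in names])
        for name, z in zip(names, zs):
            totals[name] += z
    return totals


# Invented proportions for three hypothetical areas.
areas = {
    "A": {"overcrowding": 0.02, "male_unemployment": 0.05,
          "low_social_class": 0.10, "no_car": 0.15},
    "B": {"overcrowding": 0.05, "male_unemployment": 0.10,
          "low_social_class": 0.20, "no_car": 0.30},
    "C": {"overcrowding": 0.10, "male_unemployment": 0.20,
          "low_social_class": 0.35, "no_car": 0.60},
}
scores = carstairs_scores(areas)
```

Higher scores indicate more deprivation, and because each indicator is centred on its mean, the scores across all areas sum to roughly zero.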

With the calculations for the project completed, the next step was to match our Z-scores with published results, to ensure that the variables used in calculating the scores were correct and to support our research project. We found that the variables and geographical areas we used for the different censuses were correct, but some research papers’ Z-scores differed from ours, and because those papers did not state how they weighted the population count, we could not reproduce them exactly. After reading many research papers and trying to match their results with ours, we finally found one paper whose scores matched ours very closely.

The Carstairs Z-scores were split into quintiles, ranging from

  • 1 least deprived to
  • 5 most deprived

A quintile divides a data set into five equal parts, each holding 20% of the population, so the first quintile is the lowest fifth of the data, the second quintile is the second fifth, and so on. The quintiles in this project are based on area, meaning that 20% of all areas fall into each quintile. We calculated the quintiles in R, and the formula divided the fifth quintile’s sum by the data-set sum. The reason for creating quintiles was map visualisation, which allowed us to observe which areas are most and least deprived across the selected geographies in the UK. Quintiles were very convenient for plotting maps, as they offer a geographical perspective on the spread of deprivation across the UK.
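
One simple way to form area-based quintiles is to rank the areas by score and cut the ranking into five equal groups. The Python sketch below shows that idea only; it is not the R formula used in the project, and the area names and scores are invented.

```python
def area_quintiles(scores):
    """Assign each area a quintile from 1 (least deprived) to 5 (most deprived).

    Areas are ranked by score, so about 20% of areas land in each quintile
    regardless of how the scores themselves are distributed.
    """
    ordered = sorted(scores, key=scores.get)  # lowest score first
    n = len(ordered)
    return {name: (rank * 5) // n + 1 for rank, name in enumerate(ordered)}


# Ten hypothetical areas with invented scores from -4.5 to 4.5.
example_scores = {f"area{i}": i - 4.5 for i in range(10)}
quintiles = area_quintiles(example_scores)
```

With ten areas, two fall into each quintile; when the number of areas is not a multiple of five, the groups differ in size by at most one.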

QGIS is software that allows users to examine and edit spatial information, as well as composing and exporting graphical maps. For our research, this software was used to generate 3D maps of the Carstairs scores for data visualisation.

At first, I practised on QGIS using census data that interested me. I chose the Pakistan census data of 2017 and 1998 to look at the difference in the gender population of Pakistan in 3D maps. From the UK census of 2011, I explored how proficiency in English could influence general health. The 3D maps of my data showed me which places had the highest peaks and which had the lowest.

For the main project, we created 3D maps of the whole of the UK at ward, Output Area and local authority level from 1991 to 2011. 3D maps of Great Britain (i.e. England, Wales and Scotland) were also created at District level.

Producing Carstairs scores for every census from 1971 to the most recent one of 2011 meant that Northern Ireland could not be included, as there is no census data available for Northern Ireland for 1971 and 1981. As a result, Northern Ireland was excluded from this analysis.

For a detailed examination of deprivation, 3D maps of the capital cities of the UK and of Greater Manchester were also created at output area level. The boundary data for the maps were downloaded from Casweb and from Borders at the UK Data Service. The boundary data were simplified in mapshaper and the quintiles were dissolved into 5 layers, so that the 3D maps were easier to produce and easier for people to understand.

The main findings of this research are portrayed in Tables 1 and 2. Table 1 shows the most deprived areas in the UK, while Table 2 shows the least deprived areas based on the area’s total score. The results are listed for all the years and output levels for which data were available. The least deprived areas, as can be seen in Table 2, are mostly located around London, in the South of England. This finding has been consistent from 1981 to 2011; in 1971, however, the least deprived area was Bearsden, which is in Scotland.

To understand the overall change in the level of deprivation across Great Britain, maps for 1981, 1991, 2001 and 2011 were created using quintiles. The lighter colours represent less deprived areas and vice versa.

Based on these scores, deprivation in Great Britain has decreased greatly between 1981 and 2011. The largest change has occurred in Scotland, compared with the rest of Great Britain. However, when comparing Scotland with England and Wales, it is still more deprived.

There has been a positive change for England as well; however, it has not been to the same extent as for the other countries. The north of England and Cornwall have improved their deprivation scores the most.

Nevertheless, cities in the North and areas in Birmingham and London continue to have high deprivation scores. A trend emerges across the four censuses of GB: generally, the South has always been less deprived than the North.

Great Britain 1981

Great Britain 1991

Great Britain 2001

Great Britain 2011

 

Below are maps of Manchester and Greater Manchester showing deprivation at Output Area level.

Darker colours represent more deprived areas (higher quintile). Most of Greater Manchester, and especially Manchester, is greatly deprived, with some exceptions. There is also a pattern: the inner parts of all towns in Greater Manchester are deprived, while the outskirts are lighter, which means less deprived.

Manchester by Output Area 2011

Greater Manchester by OA 2011

 

It needs to be taken into consideration that one of the indicators of deprivation for the Carstairs score is ‘Car ownership’. Considering that owning a car in the city centre might not be convenient, it is to be expected that city centre areas may score higher on this variable resulting in the overall score being higher.

My 3D maps above also correspond with the results of the Index of Multiple Deprivation scores, which have stated that “Manchester is one of the local authority districts which has the largest proportions of highly deprived neighbourhoods in England” (The English Indices of Deprivation 2015, Baljit Gill).

The next stage is to present the 3D maps in VR. Virtual Reality is defined as “the use of computer technology to create a simulated environment”. Although VR is still in its developing stages, it is being used across various platforms, and presenting data is one of its uses.

Klara and Carstairs

It has been seven weeks since I started working at the UK Data Service/Jisc.

Although it seems like yesterday since I first entered the office, I have learned so many new things and skills over the past weeks, and I have finally applied what I have been learning at the University for two years.

My time here at Jisc and the UK Data Service has been one of the most valuable experiences in my life and has convinced me that work with data and data analysis is the right path for me once I graduate.

My fellow intern Rabia and I at the start of our internship

For the first few weeks, I was getting familiar with the Carstairs index (more about Carstairs can be read in my previous blog post), and was accessing all the data needed for the calculations. I encountered several problems during this process, which in turn helped me to gain even more experience and new skills.

One of the issues I had was that I needed data across five different Census years and for the whole of the UK, and not all of it was available.

For example, one of the indicators for Carstairs score is ‘low social class of household reference person’. This information was not available for Scotland in the 2001 Census, and so the definition of the indicator had to be changed to ‘low social class of all persons’ in order to proceed with the research. This subsequently resulted in us needing to redownload all the data for the other years, and very “messy” folders with lots and lots of data we did not need anymore.

At that point I realised it was essential for me to organise all my folders and documents, and to name them with sensible names so I was sure I knew what each document contained and where to find it. Even though I have always considered myself as an organised person, working with such large amounts of data showed me I had to be even more organised to be able to progress to the analysis per se.

Another issue I faced was that different research papers used different Census variables to calculate the Carstairs score.

This was because the papers focused either on one country rather than the whole of the UK, or on a specific Census year. They therefore all used the variables that were available for their particular projects. This, however, meant that Rabia (my fellow intern) and I had to adjust the definitions of the variables so that they fitted our own needs. We were lucky enough to find a few researchers who used indicators that were available for all the Census years we were analysing and for all the countries in the UK, and so we were able to support our decisions on the definition changes.

I managed to solve all those problems and moved onto calculating the scores and analysing the data. This was done in R, which saved me a lot of time and enabled easy replicability. This was extremely useful as I had to recalculate my scores multiple times due to the problems outlined above. I have always been keen on working with R as I find it very intuitive and yet challenging, which I enjoy. I feel I have become proficient in R and I am very confident using it in a work environment.

I have always enjoyed working with data, but regardless, the analysis gets a bit more exciting when you can finally see a story the data tells.

To do this, I uploaded all my results into QGIS, an open source Geographic Information System, which enables the creation of maps. All the data I have worked with concerns geographical areas in the UK, so mapping deprivation, as well as providing tables with specific scores, felt like the right (and very visually appealing) choice.

I have learned how to use a lot of functions QGIS offers, including creating 3D maps and analysing geospatial data.

Working with the software has woken up a very creative side of me I never knew I even had. I discovered that data analysis can be very artistic and original, especially when it comes to presenting the findings.

Deprivation in the UK from 2011 to 1991, by local authority

One of the things I have found out by just looking at my maps was that deprivation decreased massively between 1991 and 2011 according to the Carstairs index.

The darker the area, the more deprived the local authority is and vice versa.

Looking at the raw numbers alone, one would have no idea about this without further analysis, so plotting the numbers on a map is a great way of finding out whether something is going on before moving on to a more specific analysis.

As I saw that the older maps are much darker than the ones from more recent years, I decided to explore whether this trend is significant.

I created confidence intervals and boxplots, presented below, to support my initial hypothesis about deprivation getting lower. The confidence intervals do not overlap and so I could be confident that deprivation decreased significantly between 1991 and 2011 in the UK.
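
The idea of comparing two confidence intervals can be sketched as below. This is a Python illustration, not the analysis actually run for the project: it assumes a normal-approximation 95% interval for the mean (mean ± 1.96 standard errors), and both sets of scores are invented.

```python
from math import sqrt
from statistics import mean, stdev


def ci95(values):
    """Normal-approximation 95% confidence interval for the mean."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return (m - half_width, m + half_width)


def overlap(a, b):
    """True if intervals a and b, given as (low, high), share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented deprivation scores for two hypothetical Census years.
scores_1991 = [2.1, 3.4, 2.8, 3.9, 2.5, 3.1, 2.9, 3.6]
scores_2011 = [-0.4, 0.3, -0.1, 0.6, -0.7, 0.2, 0.0, 0.4]

intervals_overlap = overlap(ci95(scores_1991), ci95(scores_2011))
```

Non-overlapping intervals point to a real difference between the two years. The reverse does not hold: overlapping intervals do not by themselves show that there is no difference.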

Boxplots and 95% confidence intervals for deprivation levels in the UK from 2011 to 1991

Currently, I am working closely with Matt Ramirez, Futures senior innovation developer here at Jisc, who is turning my 3D maps into a virtual reality environment.

I have been given the opportunity to think about interesting and unique ways of using VR to display my results. I was particularly excited about one of my ideas, which consisted of a lift that would take users up to different Census years. They could then get out of the lift and move around the map, which would result in great interaction between the data and the users, and thus enhanced learning.

Unfortunately, given the short time frame, this was not possible to complete, and so we had to simplify the visualisations and not use the lift. The idea may still be implemented in the future: it requires more time than we have at the moment, but would be worth trying out.

Mapping divorce and religion in the Czech Republic

Divorces, religion and education in the regions of the Czech Republic, 2011 (data about divorces from 2018)

Klara Valentova explores mapping data from her home country.

Note: In the following maps, darker colour and higher layer signify higher proportions of whichever variable is being portrayed.

Map 1 presents the divorce rate in different regions of the Czech Republic in 2018.

Divorces are most prevalent in the Central Bohemian Region, which surrounds the capital city, Prague. Prague itself has a lower percentage of divorces. One could argue that this is because young people move to Prague, where they find a partner, get married and start a family, and then move out of Prague to the Central Bohemian Region, where they eventually get divorced. We can also see quite high divorce rates in the North and South East of the Czech Republic.

Map 1: Divorce rates in the regions of the Czech Republic, 2018

 

Map 2 shows the proportion of religious population in regions of the Czech Republic in 2011.

The most religious regions are in Moravia, the eastern part of the country, while the religious population in the North West is small. A surprising finding is that some of the most religious regions have quite high divorce rates, as seen in Map 1, while in the non-religious regions of West Bohemia divorce rates are relatively low. The Czech Republic therefore does not necessarily follow the commonly held belief that religious people divorce less often than non-religious people.

Map 2: Rates of religious population in the regions of the Czech Republic, 2011

 

Map 3 shows the distribution of people with a university degree across the regions in the Czech Republic in 2011.

In general, the North and the division between Bohemia and Moravia have the smallest number of people with degrees, while in the capital city, there is an enormous peak with nearly half of the population having a university degree. The South Moravian Region has the second highest proportion of people with university education, which can be explained by the second biggest city in Czechia, Brno, being situated there. However, there seems to be no correlation between education and religion or divorce rates in Czechia.

Map 3: The distribution of people with a university degree in the Czech Republic, 2011

 

The data are available at: https://www.czso.cz/csu/czso/home, and the boundary data for Czech regions at: http://www.diva-gis.org/datadown. Both files were then uploaded to QGIS, joined, coloured by the proportions, and subsequently turned into 3D maps with higher areas corresponding to higher proportions to enhance the differences even more.

You can play with the 3D maps by following the links below. Please note that the maps can take some time to load.

Mapping the census – connections between language ability and health

Rabia Butt uses mapping to explore possible connections between health conditions and fluency in English.

From the UK census of 2011, I decided to compare two groups of people whose first language isn’t English: those who can speak English very well and those who cannot speak it at all. I was trying to discover how their proficiency in English might influence their general health.

I got my data from the UK Data Service Infuse website and compared the results at ward level in England.

My first 3D map showed the results for people who said that their health is good.

The results were what I had expected: considerably more people who can speak English said that their health is good than people who cannot speak English.

However, when I compared the results for people who said their health is not good, more people who can speak English reported this than people who cannot speak English.

I had expected these results to be the other way around; however, there may be many other reasons or factors that influenced the results. The 3D maps of my data showed me which places had the highest peaks and which had the lowest.

This 3D map shows people in England whose health isn’t good, comparing those who can speak English with those who cannot. Orange represents people whose health is not good but who can speak English, and green represents people whose health isn’t good and who can’t speak English. The lighter the colour, the fewer people there are whose health isn’t good; the darker the colour, the more there are.

 

This 3D map is of people who have good health and can speak English.

 

This map is of people whose health is good and who cannot speak English.

You can play with the 3D map by following the link below. Please note that the map can take some time to load.

Mapping gender in Pakistan

Rabia Butt explores mapping data from her home country.

I created my first 3D maps using census data that interested me. I chose the Pakistani census data of 2017 and 1998 to look at the difference in the gender population of Pakistan in 3D maps.

I got my data from the Pakistani census website. I created a map showing the population of Pakistan in 2017 by gender: male, female and transgender. The transgender population was extremely low, and there was also a difference between the male and female populations, with the male population higher than the female.

I therefore decided to compare the 2017 results with the previous census of Pakistan, held 19 years earlier. The previous census did not include transgender people, and there was still a gap between the male and female populations, with the male population again higher than the female.

This map shows the population of Pakistan from the census of 1998. The blue represents male and pink female.

 

This map shows the population of Pakistan from the census of 2017. The blue represents male and pink female.

 

You can play with the 3D maps by following the links below. Please note that the maps can take some time to load.

Mapping annual net income in the UK

Annual net income in the UK in 2016 for Middle Super Output Areas (MSOA) – Before and after housing costs

Klara Valentova has been exploring mapping of data.

Note: In the following maps, darker colour and higher layer signify higher income for the specific area.

Map 1 shows the annual net income before housing costs in the UK in 2016. The highest income is distributed in the South East, notably around London with peaks in central London such as Westminster or Chelsea. However, income in nearly all areas in Wales is lower than in most areas in England.

Map 1: Annual Net Income Before Housing Costs in the UK for MSOA, 2016

 

Nonetheless, when looking at Map 2, displaying annual net income after housing costs, suddenly the huge differences between the areas have vanished.

The highest incomes are still distributed in the South East, but we can see that in big cities in the North of England, the incomes are almost as high as down south. The peaks in the London area persist but there are more of them now, and they are mostly around London rather than in the city centre as it used to be before accounting for housing costs. This can be explained by the incredibly expensive living costs inside London.

Map 2: Annual Net Income After Housing Costs in the UK for MSOA, 2016

 

The data for both of the maps are available at: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/earningsandworkinghours/datasets/smallareaincomeestimatesformiddlelayersuperoutputareasenglandandwales.

The files were uploaded to QGIS, together with boundary data for MSOA, available from: https://borders.ukdataservice.ac.uk/. These two layers were then joined, and the map coloured by the income level.
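
The logic of that join, matching each boundary to its row of data by a shared area code, can be sketched as follows. This is plain Python showing only the idea of an attribute join; it is not QGIS’s own interface, and the field name `area_code`, the codes and the income figures are all invented for the example.

```python
def join_by_code(boundaries, table, key="area_code"):
    """Attach table rows to boundary records that share the same code.

    Returns the joined records plus the codes of boundaries with no match,
    which is worth checking before colouring a map.
    """
    lookup = {row[key]: row for row in table}
    joined, unmatched = [], []
    for feature in boundaries:
        match = lookup.get(feature[key])
        if match is None:
            unmatched.append(feature[key])
        else:
            joined.append({**feature, **match})
    return joined, unmatched


# Invented boundary records and income rows.
boundaries = [
    {"area_code": "X01", "name": "Area one"},
    {"area_code": "X02", "name": "Area two"},
    {"area_code": "X03", "name": "Area three"},
]
income = [
    {"area_code": "X01", "net_income": 31000},
    {"area_code": "X02", "net_income": 27500},
]
joined, unmatched = join_by_code(boundaries, income)
```

Boundaries left in `unmatched` would appear blank on the map, so a mismatch in codes between the two sources shows up immediately.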

The map was subsequently turned into 3D with the height of the areas corresponding to the income level to enhance the differences even more.

You can play with the 3D maps by following the links below. Please note that the maps can take some time to load.

Meet our interns: Klara

Rabia Butt and Klara Valentova are our Q-Step interns from the University of Manchester. Q-Step is a £19.5 million programme designed to promote a step-change in quantitative social science training, funded by the Nuffield Foundation and the ESRC. We asked Rabia and Klara to tell us a bit about themselves and their journey to this internship.

Klara

I am one of a small cohort of students taking a degree pathway ‘Sociology and Quantitative Methods’ at the University of Manchester.

I have been very enthusiastic about data analysis since 2014, when I completed a study exchange programme in the USA. I attended a local high school in Georgia and took a module called AP Statistics. I really enjoyed it, and decided I would like to study statistics even further. This, together with my interest in sociology and especially social inequalities, has led me to study Sociology and Quantitative Methods at the University of Manchester.

In my second year of University, I chose a module about data modelling. I have learned how to use R and developed critical thinking and problem-solving skills, but wanted to enhance these to a professional level. I have been very interested in working with the Census data, and in learning more about deprivation while improving my data skills. Thus, I am very lucky to have been given the opportunity to work at the UK Data Service on calculating Carstairs Deprivation Scores for the UK.

The Carstairs index is a summary measure of relative material deprivation that was developed in the 1980s. It comprises four indicators from the Census, which relate to material deprivation (overcrowding, male unemployment, low social class and lack of car ownership).

Some of these variables, however, are a bit outdated, and so for our project, we have decided to include other indicators, which we propose are more up to date.

For instance, we will include total unemployment (female and male combined) in our calculations, as there are many more women in the labour force than there were nearly 40 years ago when the Carstairs index was created. We also take into consideration that lack of car ownership does not automatically imply deprivation: in urban areas not having a car might simply be more convenient, while in rural areas a car is a necessity rather than an indication of wealth.

Over the last two weeks I have learned a lot about the Census, Carstairs scores and deprivation in the UK. Nonetheless, the greatest lesson I have learned is the importance of long, proper research.

We were researching for about two days, and overall found that all the papers did their analyses in the same way. Satisfied with our findings, we moved onto getting the data we thought we needed for the project and started with the analysis.

New questions then arose and we had to do some more researching. Suddenly, we discovered many new research papers, some of which did their analysis differently than us. We began wondering whether our work so far was correct. I started checking all the data we downloaded, did more and more researching, and realised that with this additional research I could do a lot of things in a more efficient way. I therefore regretted not spending more time on the initial research as in the end it would have saved me a fair amount of time.

On the other hand, practice is the best way of learning, and so I learned a great lesson, which will be very valuable to me in the future.

I am now very motivated and excited to learn other new things during the next eight weeks. I am particularly thrilled that I will have the opportunity to use 3D printing and VR to visualise the findings of the project. These new technologies have incredible potential and are beginning to be widely used in many job sectors, so learning how to use them, and how to use them for presenting statistics, is an enticing prospect for me.

Read Klara’s previous blog for the University of Manchester.

Meet our interns: Rabia

Rabia Butt and Klara Valentova are our Q-Step interns from the University of Manchester. Q-Step is a £19.5 million programme designed to promote a step-change in quantitative social science training, funded by the Nuffield Foundation and the ESRC. We asked Rabia and Klara to tell us a bit about themselves and their journey to this internship.

Rabia

I am a second-year undergraduate student, currently studying Sociology at the University of Manchester. Research, data collection and statistics are a large part of my course, which has given me a new-found appreciation for quantitative methods.

The module on the survey method in social research introduced me to social statistics and the basic statistical concepts required for working with numeric survey data. This module enabled me to participate in the amazing Q-Step programme, which provides placement opportunities for students from the University of Manchester.

I stumbled across the UK Data Service when I was exploring internship options for the summer. Through this programme I was lucky enough to be selected for a summer internship at the UK Data Service.

I was very curious about where data originates and how it is produced for researchers, which is why I chose to do my summer internship here. I am delighted to work for the UK Data Service, as I will be learning many new skills and gaining invaluable experience of a working environment, which will also help me determine what type of job I would like to go into after I graduate.

My fellow intern, Klara, and I are working together on a project that we will create ourselves and present at the end of our internship. The project requires us to calculate deprivation measures for England, Scotland, Wales and Northern Ireland using data from the Censuses of 2011, 2001, 1991, 1981 and 1971.

The methodology we are using is the Carstairs index, an index of deprivation used in spatial epidemiology to summarise the socio-economic conditions of an area. Deprivation can be defined as the damaging lack of material benefits considered to be necessities in a society.

The Carstairs index is based on four Census variables:

  • low social class
  • lack of car ownership
  • overcrowding
  • male unemployment

The overall index reflects the material deprivation of an area.

We will be using the data to calculate the population-weighted mean percentage and standard deviation (SD) for each component variable. To ensure that all components have an equal influence on the final score, each variable will be standardised to have a population-weighted mean of zero and a variance of one. Standardising involves subtracting the population-weighted mean from each variable and dividing the result by the SD (the z-score method).
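The standardisation described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R script used in the project. The three areas, their populations and their percentages are made up, and all function and variable names are hypothetical. The final step, adding the four z-scores together without weights, follows the usual description of the Carstairs score and is an assumption as far as this project's own script is concerned.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Population-weighted mean of an indicator across areas."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_sd(values, weights):
    """Population-weighted standard deviation of an indicator."""
    mean = weighted_mean(values, weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return sqrt(variance)


def z_scores(values, weights):
    """Subtract the weighted mean and divide by the weighted SD."""
    mean = weighted_mean(values, weights)
    sd = weighted_sd(values, weights)
    return [(v - mean) / sd for v in values]


# Made-up example: three areas with their populations and indicator percentages.
population = [1000, 3000, 2000]
indicators = {
    "low_social_class": [20.0, 30.0, 25.0],
    "no_car": [40.0, 20.0, 30.0],
    "overcrowding": [5.0, 10.0, 6.0],
    "male_unemployment": [8.0, 12.0, 7.0],
}

# Standardise each indicator so that none dominates because of its scale.
standardised = {name: z_scores(vals, population) for name, vals in indicators.items()}

# Assumed final step: one score per area, the unweighted sum of its four z-scores.
carstairs = [
    sum(standardised[name][i] for name in indicators)
    for i in range(len(population))
]

for area, score in enumerate(carstairs, start=1):
    print(f"Area {area}: {score:+.2f}")
```

Positive scores indicate areas more deprived than the population-weighted average, and negative scores indicate less deprived areas.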

The variables Carstairs uses are outdated, since the index was originally developed in the 1980s. We discovered an article that introduced some new variables into its research; for example, the authors suggested replacing male unemployment in the Carstairs score with overall unemployment, in order to take account of women's participation in the labour force. We decided to do this in our research as well, so we will be using the original methodology and also including overall unemployment and qualification levels.

After we have collected all the data we need, the next step will be learning the R language, as we will be using it for our analysis.

At first, I thought finding and downloading the data would be quite easy. However, this was not the case, because each census was different from the others, especially the earlier ones.

For example, in the 1991, 1981 and 1971 censuses the social class variable was based on a 10% sample for all the countries apart from Northern Ireland. This would cause difficulties when comparing censuses across different geographical areas. This was just one of the problems we encountered with the census.

I will be using R to calculate the mean, standard deviation and z-score for each variable, for each year and area. This is a task which I am quite excited about, because I have never learned a programming language before and it would be a valuable skill to have on my CV.

Read Rabia’s previous blog for the University of Manchester.

Calculating Townsend Scores: Resources to learn R

Sanah Yousaf, one of our interns, talks about how she approached the task of learning R.

The project I am currently working on as an intern at UK Data Service Census Support is creating Townsend deprivation scores with UK 2011 Census data. To allow UK Data Service Census Support users to produce their own Townsend deprivation scores, I wrote an R script that calculates them.

Initial experience with R

My first experience with R came about during my degree in a module concerning data analysis. R is a language and environment for statistical computing and graphics. The software is free and allows for data manipulation, calculation and graphical display.

There are many packages available to use in R, but I only had experience with “R Commander” and “Deducer”. Both R Commander and Deducer are graphical user interfaces for data analysis. Knowledge of coding in R is not particularly necessary when using them, as menu options are available for easy navigation to what you need, whether that’s obtaining summary statistics of your data or creating graphs.

Perhaps it is fair to say that the R skill set I gained from my degree module was limited for the task at hand. Having said that, I was both eager and intrigued to learn more about R’s capabilities, but I would have to do this quickly to be able to produce an R script to create Townsend scores.

Useful resources to learn R

After a few Google searches, I found many resources that taught the basics of R. Some were great, others not so much. I found R tutorials on YouTube particularly useful, as opposed to some websites that provided code for certain functions in R but lacked explanations. In addition, I often found myself on Stack Overflow, which is an online community for developers to learn and share their programming knowledge.

R Tutorials on YouTube

If you are new to R, I would recommend the “MarinStatsLectures” channel on YouTube. The channel has tutorials ranging from how to import data of different formats into R to working with data in R. There are over 50 tutorials on the channel, none longer than 10 minutes. The tutorials provided me with knowledge of different R commands and explained basic R concepts well.

R packages

The R package “Swirl” allows R users to learn interactively through the R console. This was useful because I could learn different R commands whilst practising within R.

Google search

A simple Google search of “how to… in R?” will usually provide you with the answer you are looking for! You will most probably bump into other R users who have asked the same question on Stack Overflow.

Ask R for help in the R Console

Typing help() or ? followed by a function name into the R Console will bring up the R Documentation in the help window in R Studio. For example, typing ?matrix in the R Console should load the R documentation for the matrix function.

References

More about R: https://www.r-project.org/about.html

Downloading R: https://cran.r-project.org/bin/windows/base/

Downloading R Studio: https://www.rstudio.com/products/rstudio/download/

MarinStatsLectures Channel on YouTube: https://www.youtube.com/user/marinstatlectures

More about Swirl in R: http://swirlstats.com/

Stack Overflow: https://stackoverflow.com/questions/tagged/r