Exploring Kepler.gl

[Image: the home page of the Kepler.gl site]

Kepler.gl is an open source mapping tool that claims to work for large scale datasets.

It was developed by Uber, which built it as an in-house solution from open source components to analyse its own data. Luckily for us, Uber decided to make the solution open source and available to everyone.

Kepler.gl works within your browser, which is a nice feature as it means you retain control of your data. That could be important if you want to map data that contains sensitive information.

To try the system out I downloaded our 2011 Census Headcounts, in particular the file called UK postcode data and supporting metadata for 2011 frozen postcodes, which is a zip file.

I unzipped this, ready for me to load into Kepler.gl. I chose this dataset as I know it contains latitude and longitude information, as well as population and deprivation data.

Uploading data was pretty straightforward. There’s an option to browse for your data file or to drag and drop the file into the browser.

A slightly annoying bit for me was that the map opens focused on San Francisco, even though the data I added was for the UK. But it was easy to refocus the map on the UK using the standard grab-and-pull functionality.

To map the data, I needed to add a layer and choose the type of data.

For this data, I knew it was point data. I also entered a name for the layer, calling it 2011 Census Postcodes. Kepler.gl lets you add more than one layer, so giving each new layer a meaningful name is useful.

Next, it asked for the fields that contain the Lat (latitude) and Lng (longitude).

In our data I discovered that we mislabelled them, so the field names were the opposite of what they should be (I’ll get this corrected).
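
A quick check before uploading can catch this kind of mix-up. The Python sketch below is only an illustration: the field names, the sample coordinates and the rough UK bounding box are my own assumptions rather than anything taken from the actual postcode file. It relies on the fact that UK latitudes (roughly 49 to 61) and longitudes (roughly -9 to 2) do not overlap.

```python
# Rough bounding box for the UK (an approximation chosen for this sketch).
UK_LAT_RANGE = (49.0, 61.0)
UK_LNG_RANGE = (-9.0, 2.0)


def in_range(value, bounds):
    """Return True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def columns_look_swapped(rows, lat_field, lng_field):
    """Return True if most rows have UK-like longitudes in the latitude
    field and UK-like latitudes in the longitude field."""
    if not rows:
        return False
    suspicious = 0
    for row in rows:
        lat = float(row[lat_field])
        lng = float(row[lng_field])
        if in_range(lat, UK_LNG_RANGE) and in_range(lng, UK_LAT_RANGE):
            suspicious += 1
    return suspicious > len(rows) / 2


# Hypothetical rows with the two fields mislabelled (approximate
# coordinates for Belfast and Manchester).
mislabelled = [
    {"latitude": "-5.93", "longitude": "54.60"},
    {"latitude": "-2.24", "longitude": "53.48"},
]
# The same rows with the two values put back in the right fields.
corrected = [
    {"latitude": row["longitude"], "longitude": row["latitude"]}
    for row in mislabelled
]

print(columns_look_swapped(mislabelled, "latitude", "longitude"))
print(columns_look_swapped(corrected, "latitude", "longitude"))
```

If the check flags a file, you can either swap the columns or simply pick the opposite fields when Kepler.gl asks for Lat and Lng.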


You’ll notice that there is the option to add a field to represent the Altitude. For this initial visualisation, I left that blank.

This now created a map showing UK postcodes, but (to be honest) it was a bit boring.

Kepler.gl has the option to colour the postcode points based on the value of a field.

This data includes the UK Townsend Deprivation scores as quintiles, calculated at the output area level, so I used this field to colour-code the points. I also sized the points based on the number of people living in each postcode.

The finished map of the UK shows a very mixed picture, but if you zoom into a town or city you can see the differences between postcodes.
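
For readers wondering what "scores as quintiles" means in practice, here is a minimal Python sketch of one way to split a set of scores into five equal-sized groups. The scores are made up, and the cut-point method is simply the default of Python's statistics.quantiles, so it may not match how the published quintiles were actually derived.

```python
from bisect import bisect_right
from statistics import quantiles


def assign_quintiles(scores):
    """Return a quintile number (1 to 5) for each score.

    Quintile 1 holds the lowest scores and quintile 5 the highest.
    With Townsend scores, higher values indicate more deprivation.
    """
    # Four cut points that divide the scores into five groups.
    cuts = quantiles(scores, n=5)
    return [bisect_right(cuts, score) + 1 for score in scores]


# Made-up scores for ten hypothetical areas.
example_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
example_quintiles = assign_quintiles(example_scores)
print(example_quintiles)
```

A field like this is convenient for colouring a map because it has only five values, so each one can be given its own colour.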


For example, here’s a map of Belfast showing differences in deprivation between postcodes. Dark red is least deprived and yellow is most deprived.

Overall I found this web app easy to use, but it may pose some difficulties for people unfamiliar with mapping.

However, as a free tool that maps data without sending it back to a server, it offers a way to map more personal data without the worry of having that data hosted somewhere you don’t know.


Rob Dymond-Green is a Senior Technical Co-ordinator for the UK Data Service, working with aggregate census and international data. 

Shaping the 2021 UK Census

The Office for National Statistics (ONS) has recently been working on question design and gauging the effect of asking questions on topics such as gender identity, sexual orientation and the armed forces community.

These questions are new, added either following feedback from an earlier topic consultation or to address a policy need. The new armed forces question relates to the Armed Forces Covenant, a promise from the nation that those who serve or have served in the armed forces, and their families, are treated fairly. The ONS has looked at using data from the Ministry of Defence to identify people who have served in the armed forces, but as the Veterans Leavers Database only goes back to 1975, a new question is needed to identify the number of veterans who served before 1975.

National Records of Scotland (NRS) has also been looking at the topics for the 2021 Census and has published a report on progress so far. NRS is running some events to talk about how the data will be disseminated. These events are taking place in February 2018 and early March 2018.

Northern Ireland Statistics and Research Agency (NISRA) has also been looking at the topics for the 2021 census and has produced a report following its consultation activity.

A change for the 2021 census will be the push to get more respondents to fill in their census forms online.

Each of the agencies is also looking at using existing administrative data collected by government or other national bodies, which could provide accurate data and remove the need to ask some questions on the form.

It’s hoped that the use of administrative data could deliver a more timely census. There is also the opportunity to increase the frequency of subsequent updates of the data, possibly even annually for some topics.

The agencies are also investigating systems that would let users define their own census datasets. These would aggregate the data from the unit records (individual records about people) and apply anonymisation as the data is generated.

If you are interested in finding out more about the agencies’ plans for the 2021 census, please follow these links to their 2021 websites: ONS has its Census Transformation Programme, NISRA has a 2021 Census page and NRS has set up Scotland’s 2021 Census.

 

Calculating Townsend Scores: Resources to learn R

Sanah Yousaf, one of our interns, talks about how she approached the task of learning R.

The current project I am working on as an intern at UK Data Service Census Support is creating Townsend deprivation scores with UK 2011 Census data. To allow UK Data Service Census Support users to produce their own Townsend deprivation scores, I wrote an R script that calculates them.

Initial experience with R

My first experience with R came about during my degree in a module concerning data analysis. R is a language and environment for statistical computing and graphics. The software is free and allows for data manipulation, calculation and graphical display.

There are many packages available to use in R and I only had experience with using “R Commander” and “Deducer”. Both R Commander and Deducer are data analysis graphical user interfaces. Knowledge of coding in R is not particularly necessary when using R Commander and Deducer as menu options are available for easy navigation to get to what you need, whether that’s obtaining summary statistics of your data or creating graphs.

Perhaps it is fair to say that the R skill set I gained from my degree module was limited for the task at hand. Having said that, I was both eager and intrigued to learn more about R’s capabilities, but I would have to do this quickly to be able to produce an R script to create Townsend scores.

Useful resources to learn R

After a few Google searches, I found many resources that taught the basics of R. Some were great, others not so much. I found R tutorials on YouTube particularly useful, as opposed to some websites that provided code for certain functions in R but lacked explanations. In addition, I often found myself on Stack Overflow, which is an online community for developers to learn and share their knowledge of programming.

R Tutorials on YouTube

If you are new to R, I would recommend the “MarinStatsLectures” channel on YouTube. The channel has tutorials ranging from how to import data of different formats into R to working with data in R. There are over 50 tutorials on the channel, none longer than 10 minutes. The tutorials provided me with knowledge of different R commands and explained basic R concepts well.

R packages

The R package “swirl” lets R users learn interactively through the R console. This was useful because I could learn different R commands whilst practising within R.

Google search

A simple Google search of “how to… in R?” will usually provide you with the answer you are looking for! You will most probably bump into other R users who have asked the same question on Stack Overflow.

Ask R for help in the R Console

Typing help() or ? followed by a topic into the R Console will bring up the R documentation in the help window in RStudio. For example, typing ?matrix in the R Console should load the R documentation for matrix.

References

More about R: https://www.r-project.org/about.html

Downloading R: https://cran.r-project.org/bin/windows/base/

Downloading R Studio: https://www.rstudio.com/products/rstudio/download/

MarinStatsLectures Channel on YouTube: https://www.youtube.com/user/marinstatlectures

More about Swirl in R: http://swirlstats.com/

Stack Overflow: https://stackoverflow.com/questions/tagged/r


Calculating Townsend scores: Replicating published results

Amy Bonsall, one of our interns, talks about how she approached the task of working out how to calculate Townsend scores and then of finding others’ work to compare against as a way to quality assure the methodology.

As part of the internship project to calculate deprivation scores, once we had found sources outlining how to calculate Townsend Deprivation Scores, it was important to ensure the methodology would produce scores that matched those already published.

We wanted to calculate scores and compare them to those that had already been calculated by using the same dataset to be sure we were using the same methodology. Whilst I was focused on this, Sanah Yousaf, my partner in this internship, was creating an R script to calculate the scores. Whilst this was being developed I used Excel to calculate the scores. This was not only because we did not yet have an R script but also because I was already comfortable with Excel and it made it easy to visualise the results of each step in the calculation.

Replicating scores proved more difficult than anticipated. Not only were there limited sources of published scores, but we also found that many of the people who had already calculated scores had access to unadjusted census data, meaning we had different outcomes. The main problem was that there was no way of knowing whether the different data was the only reason for the contrasting scores or whether a different formula was also involved.

I went through what felt like an endless number of attempts to replicate other people’s scores. Each time, I would attempt to follow the often-limited detail of the methodology. Each time I failed, I’d try a slight variation in the calculation to see if this would work, with no success. Eventually, I found a source of results calculated for 1991 by Paul Norman. Included with the results were the data used to calculate the scores as well as the Z scores for each of the indicators. These materials were very useful, as I could check that my scores matched when based on exactly the same data. It also meant that I could check whether the Z scores were right before checking that the Townsend Deprivation Scores were correct. I succeeded with this dataset, which meant I could go on to calculating deprivation scores for 2011 knowing that the calculation would be correct.

The next step meant creating scores based on datasets for a variety of output areas, which was much easier than the previous task. After Sanah, my partner in the internship, had created an R script allowing us to calculate the scores, getting results didn’t take long. From here it will be interesting to see what other obstacles we come across, including mapping the results and comparing them to past censuses. Considering the process so far, however, I look forward to confronting them head on.
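
The kind of check described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet I actually used: the area codes, the scores and the tolerance are all made up, and none of the values come from Paul Norman’s published results.

```python
import math


def find_mismatches(computed, published, tolerance=0.01):
    """Compare computed scores with published ones, area by area.

    Returns a list of (area, computed score, published score) for every
    area that is missing from the computed results or differs from the
    published score by more than the tolerance.
    """
    mismatches = []
    for area, published_score in published.items():
        computed_score = computed.get(area)
        if computed_score is None or not math.isclose(
            computed_score, published_score, abs_tol=tolerance
        ):
            mismatches.append((area, computed_score, published_score))
    return mismatches


# Hypothetical scores for three hypothetical areas.
published_scores = {"W1": -1.23, "W2": 0.50, "W3": 2.40}
computed_scores = {"W1": -1.234, "W2": 0.502, "W3": 1.90}

example_mismatches = find_mismatches(computed_scores, published_scores)
print(example_mismatches)
```

Running the same comparison on the Z scores for each indicator first, and only then on the final scores, makes it easier to see which step of a calculation has gone wrong.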

 

Calculating Townsend scores: An introduction

Amy Bonsall, one of our interns, talks about what deprivation is and how it could be calculated.

As a student at the University of Manchester studying criminology, I was lucky enough to get the opportunity to work on a project with the UK Data Service as an intern, calculating Townsend Deprivation Scores for the UK and, importantly, learning work environment skills that will be useful once I graduate. My fellow intern (Sanah) and I came with a thirst to learn and an ambition to make the project a success, which has made the exciting aspects more rewarding and the obstacles we need to overcome much more bearable.

Deprivation is a lack of reasonable provisions. This could be social or material. Because deprivation is a construct with many indicators, it cannot be measured in one objective way, so many deprivation indices have been developed. Each of these indices has its benefits for measuring deprivation, as well as areas where it is lacking.

Different methods of calculation have been developed due to a long-term need to research deprivation through census data and the ever-changing indicators of deprivation. I am currently using the 2011 census to calculate deprivation scores for the UK using the Townsend index. This is just one of many ways deprivation can be calculated; however, we decided it is appropriate because it measures material deprivation exclusively rather than incorporating social deprivation, meaning it can be calculated consistently over time. It is also comparable across the UK.

Before jumping into the data and calculating the deprivation scores it was important to first understand what Townsend’s Index measures and how to measure it. Information on the index was readily available and easy to find giving the initial feeling that the resources required at each stage of the project would be easily found (they weren’t).

Research taught us that Townsend Deprivation scores are calculated based on 4 indicators of deprivation: non-home ownership, non-car ownership, unemployment and overcrowding.

The score is calculated by first finding the percentage non-car ownership, percentage non-home ownership, percentage unemployment and percentage overcrowding for each area.

The percentages for the unemployment and overcrowding indicators then need to be normalised, as these results are very skewed. This is done by taking ln(percentage value + 1).

Z scores are then calculated using the percentage values for each ward under each indicator. For the unemployment and overcrowding variables, the logged versions are used instead.

Z score = (percentage - mean of all percentages) / SD of all percentages
Z score of logged variable = (log percentage - mean of log percentages) / SD of log percentages

Townsend Deprivation Score = total of the 4 Z scores
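
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the R script produced during the project: the field names and percentages are made up, and I have assumed the population standard deviation, since the sources do not say whether the population or sample version should be used.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values: (value - mean) / standard deviation.

    Assumption: uses the population standard deviation (pstdev).
    """
    average = mean(values)
    spread = pstdev(values)
    return [(value - average) / spread for value in values]


def townsend_scores(areas):
    """Return one Townsend score per area, in the same order as areas.

    Each area is a dict of the four percentage indicators
    (hypothetical field names chosen for this sketch).
    """
    no_car = [area["pct_no_car"] for area in areas]
    not_owned = [area["pct_not_home_owner"] for area in areas]
    # Unemployment and overcrowding are skewed, so use ln(percentage + 1).
    unemployed = [math.log(area["pct_unemployed"] + 1) for area in areas]
    overcrowded = [math.log(area["pct_overcrowded"] + 1) for area in areas]

    columns = [no_car, not_owned, unemployed, overcrowded]
    standardised = [z_scores(column) for column in columns]
    # The score for each area is the total of its four Z scores.
    return [sum(area_z) for area_z in zip(*standardised)]


# Made-up percentages for three hypothetical areas.
example_areas = [
    {"pct_no_car": 10, "pct_not_home_owner": 20,
     "pct_unemployed": 2, "pct_overcrowded": 1},
    {"pct_no_car": 30, "pct_not_home_owner": 40,
     "pct_unemployed": 6, "pct_overcrowded": 4},
    {"pct_no_car": 50, "pct_not_home_owner": 60,
     "pct_unemployed": 12, "pct_overcrowded": 9},
]

example_scores = townsend_scores(example_areas)
print(example_scores)
```

Because every indicator is standardised before being added up, the scores across all areas total zero, with positive scores for the more deprived areas and negative scores for the less deprived ones.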

 

From the sources we found, it wasn’t perfectly clear how to calculate Z scores from the logged variables. There was no clarification about whether to take the mean and standard deviation of the percentages after they are logged or before. Combining information from different sources gave a good idea of the correct formula; however, the important next step is to test this formula against existing scores to ensure it is correct before continuing with the project.

Creating Consistent Deprivation Measures Across the UK

We’ve been lucky enough to have two interns come and work with us over the summer. They have been working on creating a set of Townsend Deprivation scores, using the UK 2011 Census data we have available via InFuse.

The interns came to us through the University of Manchester Q-Step Centre, which coordinates with different types of workplaces to offer 2nd year students the chance to practice the data skills taught through their degree courses at the university.

Sanah Yousaf is studying Law with Criminology.

I am currently a student at the University of Manchester studying Law with Criminology. As part of my degree, I chose a module called Data Analysis for Criminologists which exposed me to the world of data. I enjoyed the course so much that I decided to apply to work as an intern at UK Data Service via the Q-Step internship programme offered at the University of Manchester. As a result, I am now an intern at UK Data Service, specifically in the Census Support team based in Manchester. The project I am working on with my fellow intern (Amy) is calculating Townsend deprivation scores for the UK 2011 Census data.

 

Amy Bonsall is studying Criminology.

As a student at the University of Manchester studying criminology, I was lucky enough to get the opportunity to work on a project with the UK Data Service as an intern, calculating Townsend Deprivation Scores for the UK and, importantly, learning work environment skills that will be useful once I graduate. My fellow intern (Sanah) and I came with a thirst to learn and an ambition to make the project a success, which has made the exciting aspects more rewarding and the obstacles we need to overcome much more bearable.

Amy and Sanah have agreed to write blog posts about the project, which we’ll publish over the coming weeks, together with the resources they created, including the raw data and the scores.