Mapping census data with ArcGIS Online

Rachel Oldroyd and Luke Burns work step by step through the process of using ArcGIS Online to map census data.

Geographical Information Systems (GIS) have been hailed as one of the most important technological developments of the 21st century, providing powerful analytical tools which inform decision making across a number of disciplines. GIS now forms part of the Secondary Geography Curriculum in England, but it’s often difficult for teachers to dedicate time to learning a new piece of software alongside other competing priorities.

In this tutorial we use a free, online GIS to map UK 2001 census data. ArcGIS Online, provided by ESRI, is easy to use and does not require downloading or installing, so it is well suited for use in the classroom. We provide step-by-step instructions to map and interpret census data, along with debugging tips which cover some of the common problems encountered with ArcGIS Online.

Open ArcGIS Online now in your web browser by visiting www.arcgis.com/home/webmap/viewer.html. There is no need to make an account or sign in for this tutorial, although doing so provides access to more advanced functionality.

Click on the ‘Modify map’ link from the top right hand corner of the page to begin.

The Web Map Viewer

  1. On the homepage you will see a new map which is centred on the UK. There is a toolbar along the top of the window with a number of different tools and a side panel on the left hand side.

 

  1. At the top of the side panel, you will see three buttons, ‘About’, ‘Content’ and ‘Legend’ which provide further information about the map and the content. If you click on the legend and content buttons now, you will see that the map is currently empty aside from the base map which is called ‘topographic’.

 

  1. To navigate the map window you can click and drag the map to pan to a different location. You can also zoom in and out, zoom to the default extent (the UK) or zoom to your current location using the buttons on the left hand side of the window.

  1. At the moment we are using a base map called ‘topographic’. You can change the base map by clicking on the ‘Basemap’ button on the left hand side of the top toolbar.

  1. Using the zoom and pan buttons, locate Leeds and zoom in such that it occupies the majority of your screen.

Finding Census Data

We will now add 2001 census data. The UK Data Service website contains a wide range of census data to work with.

  1. In a new tab in your web browser, visit the census pages of the UK Data Service website at the following link: http://casweb.mimas.ac.uk. This website contains huge amounts of census data and as such you will need to specify which data you would like and at which geography (e.g. which country, county, city etc).

 

  1. From the CasWeb homepage, click the ‘Start CasWeb’ link followed by the first link ‘2001 Aggregate Statistics Datasets (with digital boundary data)’. This data comes in a geographical format (a shapefile) and is ready to map. [Note: unfortunately we cannot use 2011 data as it does not come in the same geographical format, but later in the exercise we compare the 2001 dataset to the 2011 dataset and discuss the changes]. Notice that you can also download data from as far back as 1971.

  1. We now need to specify where we would like to download our census data for – this could be anywhere, but we will focus on West Yorkshire. Use the on-screen options to locate West Yorkshire by selecting the country (England), then ‘select lower geographies’, then select the region (Yorkshire and The Humber), then select ‘Select Counties’, then select the County (West Yorkshire). The illustrations below step you through this process:
  • In Step 1, we specify the country to select. Then choose ‘select lower geographies’ to select a region within England.

  • In Step 2, we narrow down our search for West Yorkshire by specifying the region in which it belongs.

  • In Step 3, we are able to select the West Yorkshire county, having searched through the country and region to find it. Here we can click the ‘Select output Level’ button as we do not need to continue our search. If we wanted to continue and filter to Leeds, Bradford etc. we could do so.

 

  1. Before we choose the variables to download we first need to specify a geography. Notice the four options presented to you at the bottom of the screen: District, ST Wards, CAS Wards and OA. The smaller the geography, the more detail you will get – you can think of this as being similar to cutting a cake: you can cut it into big or small pieces. In this example, these pieces range from Districts (5 pieces – one for each of Leeds, Bradford, Wakefield, Calderdale and Kirklees) to OAs (7,131 tiny pieces). Let’s select CAS Wards, which breaks West Yorkshire down into a manageable 126 areas – select CAS Wards followed by the ‘Select Data…’ button.

  1. Now it is over to you! Using the table towards the bottom of the page, browse the different datasets that you can download for West Yorkshire.  You can highlight a row and click the ‘Display Table Layout’ button to explore the range of data available within each themed table.  Select two datasets that may interest you.
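The cake analogy from the geography step above can be made concrete with a little arithmetic. The sketch below is written in R purely for illustration and uses only the area counts quoted in this tutorial (ST Wards are left out because no count is given for them); the variable names are our own.

```r
# Number of areas West Yorkshire is divided into at each geography,
# taken from the figures quoted in the tutorial.
areas <- c(District = 5, "CAS Wards" = 126, OA = 7131)

# On average, how many smaller pieces sit inside each larger piece?
wards_per_district <- areas[["CAS Wards"]] / areas[["District"]]
oas_per_ward <- areas[["OA"]] / areas[["CAS Wards"]]

wards_per_district      # 25.2
round(oas_per_ward, 1)  # 56.6
```

In other words, each district is cut into roughly 25 wards, and each ward into roughly 57 output areas, which is why CAS Wards are a manageable middle ground.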

Example using ‘people with poor health’

The instructions below show how to select the number of people with poor health but you should choose a dataset that interests you.

  1. To select persons with poor health visit the ‘Health and provision of unpaid care’ row (KS008) and click the ‘Display Table Layout’ button.

  1. Browse the options available and select box 6 – people who report their general health as ‘Not Good’.

  1. To add this data to your ‘basket’ to download, click the ‘Add variables to data selection’ button above the table. Notice how this adds to the list of data to be downloaded on the right-hand side of the page.

  1. You can then continue searching for data by using the back button above the table or you can proceed to download your data by selecting the ‘Get Data’ button to the top right of your screen.

 

  1. The final step is to give your data a name and select the file format. As we want to map the data we need to check the little button next to Digital Boundary data. You can then click ‘Execute Query’ which will start saving your data to a specified location.

 

Mapping the Data

We have now sourced and downloaded some census data. You may have downloaded the health data stepped through above or you may have downloaded different data. Hopefully you have at least two datasets to explore here as we look to map this.

 

  1. Now, let’s add the data you have just downloaded from CasWeb to ArcGIS Online. In ArcGIS Online, click on the down-arrow beside the Add button to the top left of the main window and choose Add Layer from File. Browse to find the zipped CasWeb file you downloaded earlier, select this and click Import Layer. This may take a few seconds to display on screen.

 

  1. ArcGIS Online will add a default style but this is not always appropriate. Using the drop-down attribute box to the left of the map window, select one of the variables you downloaded (these will be the longer numbers, in order of selection on CasWeb). You may need to go back into CasWeb to find out which variables the numbers refer to.

  1. Select one of the variables and notice how the software tries to map this for you. Experiment with the display options to show this in a way that you are happy with (e.g. using the Counts and Amounts (colour) option).  Click Done once complete to save this.

 

  1. Spend some time navigating the map and trying to understand the spatial distribution of your data (in the example provided, people who report ‘not good’ health). You may wish to add area labels to make this easier. Click on the three dots “…” beside your map layer and choose Create Labels. Choose Area Labels as opposed to the code to add more useful labels.

 

  1. By this point you may be happy to stop and re-practice the above and if so that is fine – you have downloaded data, mapped it and looked for spatial patterns. However, a nice addition to this session is to compare the currently mapped data with a second dataset to see if any patterns exist between the two.  As you already have one dataset mapped (in my case, people who report ‘not good’ health), it is time to add a second.  To do this, click on the little three dots beside your map layer again “…” and click Copy.

 

  1. This will create a duplicate map layer. You can now repeat the steps followed previously to display a second dataset using this layer (for me, my second dataset is households without access to a car).  Note that this time it would be wise to choose a different method of visualisation so one dataset can be seen ‘on top’ of the other.  If you used Counts and Amounts (colour) last time, using Counts and Amounts (size) this time will enable you to see both datasets at the same time and hopefully draw some comparisons – see my example overleaf.

 

  1. Once selected, click Done and ensure that the map legend (or key) is showing by clicking on the appropriate icon to the top left of the screen. Doing this will enable you to see what colours/symbols represent high values and those that represent low values. You can then explore and compare the data layers.
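To get a feel for what a colour-based style does with a variable, the sketch below groups some values into five equal-width classes, each of which could then be given its own shade. It is written in R purely for illustration: the ward counts are invented, and equal intervals are only one common classification method, so we are not claiming this is how ArcGIS Online itself assigns colours.

```r
# Invented counts for seven wards (illustration only).
ward_counts <- c(120, 340, 560, 610, 880, 1020, 1500)

# Six evenly spaced break points give five equal-width classes.
class_breaks <- seq(min(ward_counts), max(ward_counts), length.out = 6)

# Assign each ward to a class; include.lowest keeps the minimum value.
ward_class <- cut(
  ward_counts,
  breaks = class_breaks,
  labels = c("lowest", "low", "middle", "high", "highest"),
  include.lowest = TRUE
)

# How many wards fall into each class?
table(ward_class)
```

Changing the number or position of the breaks changes which areas share a colour, which is worth bearing in mind when comparing maps.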

Follow up questions

If you are running this tutorial with your students, you may want to ask them to think about the following questions:

Q1.  What patterns do both of your datasets show?  What parts of West Yorkshire show particularly high and low values?  Are there any reasons for this?

Q2.  Do both datasets seem to correlate in any way – for example, do they both have high and low values in the same areas or are these rather different?  Does this pattern match what you might have expected?

Q3.  Can you think of any problems with presenting data in this way?  Are the colours or symbols misleading?

Q4.  Visit the Datashine website to compare your dataset(s) from 2001 to those from 2011.  Are the patterns the same or have things changed?  http://datashine.org.uk/ [Note: You will need to pan the map to find West Yorkshire and then use the menu (top right) to locate and map the data – you may find that your dataset(s) aren’t available to select though as the website only contains a selection!]
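For Q2, students who want to go beyond a visual comparison could put a number on how closely two datasets move together. The sketch below is one possible approach, written in R with invented ward values; the variable names are our own, and cor() is used with its default Pearson method.

```r
# Invented values for six wards (illustration only).
poor_health <- c(850, 1200, 640, 1530, 990, 720)   # people reporting 'not good' health
no_car      <- c(900, 1700, 500, 2100, 1100, 650)  # households without access to a car

# A value near 1 means the two datasets rise and fall together,
# near -1 means one rises as the other falls, near 0 means no clear link.
relationship <- cor(poor_health, no_car)
relationship
```

A high value here only says that the two variables tend to be high in the same places; it does not by itself explain why.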

Debugging tips

ArcGIS Online is extremely easy to use and you should only occasionally run into problems. Here are a few common scenarios and how to fix them:

1) The ‘Add’ button isn’t visible on the top toolbar.

Simply click the ‘Modify Map’ button in the top right hand corner and it will appear.

2) The ‘table of contents’ panel has disappeared.

Click the ‘Details’ button on the top left hand side of the page and it will reappear.

5) I can’t find the option for ‘change style’

In the ‘Details’ window on the left hand side of the page (see 2 if you can’t see this), ensure that the middle tab, named ‘Show Contents of Map’, is selected. Hover over the layer name and you will see the ‘Change style’ icon.

6) I’ve changed the style and now I can’t get rid of the ‘Change Style’ window.

Ensure you have clicked ‘OK’ or ‘Done’ at the bottom of the left hand window. The default ‘Details’ window should then appear.

In the very unlikely event that you run into a problem you can’t fix, close the window down and reopen the map.



Rachel Oldroyd is one of our UK Data Service Data Impact Fellows. Rachel is a quantitative human geographer based at the Consumer Data Research Centre (CDRC) at the University of Leeds, researching how different types of data (including TripAdvisor reviews and social media) are used to detect illness caused by contaminated food or drink.

Luke Burns is a Lecturer in Quantitative Human Geography at the University of Leeds. His work focuses on the advanced application of geographical information systems to socioeconomic problems & the development of geodemographic classification systems and composite indicators.  

Analysing Food Hygiene Rating Scores in R: a guide

Rachel Oldroyd, one of the UK Data Service Data Impact Fellows, takes a step-by-step approach to using R and RStudio to analyse Food Hygiene Rating Scores.

Data download and Preparation

In this tutorial we will look at generating some basic statistics in R using a subset of the Food Hygiene Rating Scores dataset provided by the Food Standards Agency (FSA).

Visit http://ratings.food.gov.uk/open-data/en-GB now and download the data for an area you are interested in. I’ve downloaded City of London Corporation.

R is able to parse XML files but it’s easier to load the file into Excel (or a similar package) and save as a CSV file (visit this page if you’re unsure how to do this: https://support.office.com/en-us/article/import-xml-data-6eca3906-d6c9-4f0d-b911-c736da817fa4).

R and RStudio

R is a statistical programming language and data environment.

Unlike other statistics software packages (such as SPSS and Stata) which have point and click interfaces, R runs from the command line. The main advantage of using the command line is that scripts can be saved and quickly rerun, promoting reproducible outputs. If you’re completely new to R, you may want to follow a basic tutorial beforehand to learn R’s basic syntax.

The most commonly used Graphical User Interface for R is called RStudio (https://www.rstudio.com/products/rstudio/) and I highly recommend you use this as it has nifty functionality such as syntax highlighting and auto completion which helps ease the transition from point and click to command line programming.

Basic Syntax

Once installed, launch RStudio. You should see something similar to this setup with the ‘Console’ on the left-hand side, the ‘Environment window’ on the top right and another window with several tabs (Files, Plots, Packages, Help, Viewer) on the bottom right:

Don’t worry if your screen looks slightly different; you can visit View > Panes from the top menu to change the layout of the windows.

The console area is where code is executed. Outputs and error messages are also printed here but content within this area cannot be saved. As one of the main advantages of using R is its ability to create easily reproducible outputs, let’s create a new script which we can save and rerun later. Hit CTRL+SHIFT+N to create a new script. Save this within your working directory using the save icon.

Loading Data

Let’s get on with loading our data. Type

data = read.csv(file.choose())

into the script file and hit CTRL + Enter whilst your cursor is on the same line to run the command. You can also highlight a block of code and use CTRL + Enter to run the whole thing.

You should see a file browser window; navigate to the CSV file you saved earlier containing the FHRS data. Note the syntax of this command: it creates a variable called data on the left hand side of the equals sign and assigns to it the contents of the file loaded by the read.csv command. Once loaded, you should see the new variable, data, appear in the environment window on the right hand side. To view the data you can double click on the variable name in the environment window and it will appear as a new tab in the left hand window. Note the variables that this data contains. The object includes useful information such as the business name, rating value, last inspection date and address.
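If you would rather inspect the data from the console than by clicking, a few base R functions give a quick overview. The sketch below is only illustrative: it writes a tiny CSV of made-up records to a temporary file so that it runs without the real download, and the column names are assumptions that may not match your file exactly.

```r
# Write a tiny CSV of invented records so the sketch is self-contained.
example_file <- tempfile(fileext = ".csv")
writeLines(c(
  "BusinessName,RatingValue,RatingDate",
  "Cafe A,5,2018-01-10",
  "Cafe B,3,2017-11-02",
  "Takeaway C,Exempt,2016-05-20"
), example_file)

# Non-interactive alternative to file.choose(): pass the path directly.
example_data <- read.csv(example_file, stringsAsFactors = FALSE)

names(example_data)    # column names
nrow(example_data)     # number of records
str(example_data)      # type of each column
head(example_data, 2)  # first two rows
```

Passing a file path instead of file.choose() also means the script can be rerun later without anyone having to pick the file by hand.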

Summary statistics

Let’s do some basic analysis. To remove any records with missing values first run the complete.cases command:

data = data[complete.cases(data),]

Here we pass our data variable into complete.cases, which flags the rows that have no missing values. We keep only those rows and overwrite our original object.
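To see what complete.cases does on its own, here is a small sketch using an invented three-row data frame (the names and values are made up for illustration).

```r
# Invented records; the second has a missing rating.
example_data <- data.frame(
  BusinessName = c("Cafe A", "Cafe B", "Takeaway C"),
  RatingValue  = c(5, NA, 3),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: TRUE where no column is NA.
keep <- complete.cases(example_data)
keep  # TRUE FALSE TRUE

# Use the logical vector to keep only the complete rows.
cleaned <- example_data[keep, ]
nrow(cleaned)  # 2
```

Note that complete.cases only looks for NA. By default read.csv leaves a blank field in a text column as an empty string rather than NA, so such rows are not treated as incomplete.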

To run some basic statistics we need to convert the RatingValue variable to an integer (any value that cannot be read as a whole number becomes NA):

data$RatingValue = strtoi(data$RatingValue, base = 0L)

Note how we use the $ to access the variables of our data object.
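The conversion can be tried on a handful of values first. In this sketch the ratings are invented, including one text value to show what happens when a rating is not a number, and base 10 is used for plain decimal numbers.

```r
# Invented rating values as they might appear when read in as text.
ratings <- c("5", "0", "3", "Exempt")

# strtoi converts each string to an integer; anything it cannot parse becomes NA.
converted <- strtoi(ratings, base = 10L)
converted              # 5 0 3 NA

# Count how many values could not be converted.
sum(is.na(converted))  # 1
```

Any NA values created at this point appear after the earlier complete.cases step, so it is worth checking for them before calculating statistics.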

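To see the minimum and maximum rating values of food outlets in London we can use the min and max functions. We pass na.rm = TRUE so that any missing values left by the conversion are ignored; without it, a single NA would make both functions return NA:

min(data$RatingValue, na.rm = TRUE)
max(data$RatingValue, na.rm = TRUE)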

These commands simply give us the minimum and maximum values without any additional information. To see the full records for these particular establishments we can take a subset of our data to only include those which have been awarded a zero star rating for example:

star0 = data[which(data$RatingValue==0), ]

Creating a graph

Lastly, let’s create a barchart to look at the distribution of star ratings for food outlets in London. We will use the ggplot2 library. To install and then load this library, call:

install.packages('ggplot2')
library(ggplot2)

To create a simple barchart use the following code:

ggplot(data = data, aes(x = RatingValue)) + geom_bar(stat = "count")

Here you can see we have passed RatingValue as the X axis variable in the ‘aesthetics’ function and passed in ‘count’ as the statistic. The output of which should look something like this:

To add x and y labels and a title to your graph use the labs command at the end of the previous line of code:

ggplot(data = data, aes(x = RatingValue)) + geom_bar(stat = "count") + labs(x = "Rating Value", y = 'Number of Food Outlets', title = 'Food Outlet Rating Values in London')
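The ‘count’ statistic used in the barchart simply tallies how many records share each rating value. If you want to see those numbers rather than the bars, base R’s table function produces the same tally. The ratings below are invented for illustration.

```r
# Invented rating values for seven outlets.
example_ratings <- c(5, 5, 4, 3, 5, 0, 4)

# Tally how many outlets have each rating; these are the bar heights.
rating_counts <- table(example_ratings)
rating_counts

# Look up the count for a single rating by name.
rating_counts[["5"]]  # 3
```

Comparing this table with the chart is a quick way to check that the graph shows what you expect.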




Exploring Kepler.gl


Kepler.gl is an open source mapping tool that claims to work for large scale datasets.

It was developed by Uber as an in-house solution, built on open source components, for analysing their own data. Luckily for us, they decided to make it open source and freely available.

Kepler.gl works within your browser, which is a nice feature as it means you retain control of your data. This could be important if you wanted to map data that might be sensitive.

To try the system out I downloaded our 2011 Census Headcounts, in particular the file called UK postcode data and supporting metadata for 2011 frozen postcodes, which is a zip file.

I unzipped this, ready for me to load into Kepler.gl. I chose this dataset as I know it contains latitude and longitude information, as well as population and deprivation data.

Uploading data was pretty straightforward. There’s an option to browse for your data file or drag and drop the file into the browser.

A slightly annoying bit for me was that the map opens focused on San Francisco, even though the data I added was for the UK. But it was easy to refocus the map on the UK using the standard grab-and-pull functionality.

To map the data, I needed to add a layer and choose the type of data.

For this data I knew it was point data. I also entered a name for the layer: 2011 Census Postcodes. Kepler.gl lets you add more than one layer, so giving your new layer a meaningful name is useful.

It next asked for the fields that contain the Lat (latitude) and Lng (longitude).

In our data I discovered that we mislabelled them, so the field names were the opposite of what they should be (I’ll get this corrected).
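A mix-up like this can be caught before uploading by checking whether the values in each column fall within a plausible range. The sketch below, in R, is only an illustration: the postcodes and coordinates are invented, the column names are hypothetical rather than those in the real file, and the bounding box for the UK is a rough approximation.

```r
# Invented records whose coordinate columns are labelled the wrong way round.
postcodes <- data.frame(
  postcode  = c("BT1 1AA", "LS1 1UR", "EH1 1YZ"),
  latitude  = c(-5.93, -1.55, -3.19),  # these are really longitudes
  longitude = c(54.60, 53.80, 55.95),  # these are really latitudes
  stringsAsFactors = FALSE
)

# Rough UK bounding box (approximate, for illustration only).
looks_like_uk_lat <- function(x) all(x >= 49 & x <= 61)
looks_like_uk_lon <- function(x) all(x >= -9 & x <= 2)

# The columns are swapped if 'latitude' fails the latitude check
# while the two columns pass each other's checks.
swapped <- !looks_like_uk_lat(postcodes$latitude) &&
  looks_like_uk_lat(postcodes$longitude) &&
  looks_like_uk_lon(postcodes$latitude)

# Fix the labels by exchanging the two column names.
if (swapped) {
  position <- match(c("latitude", "longitude"), names(postcodes))
  names(postcodes)[position] <- c("longitude", "latitude")
}

names(postcodes)
```

A check like this only works where the two ranges do not overlap, which happens to be the case for the UK.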


You’ll notice that there is the option to add a field to represent the Altitude. For this initial visualisation, I left that blank.

This now created a map showing UK postcodes, but (to be honest) it was a bit boring.

Kepler.gl has the option to colour the postcode points based on the value of a field.

This data contains the UK Townsend Deprivation scores as quintiles, calculated at the output area level, so I used this field to colour-code the points. I also sized the points based on the number of people living in each postcode.
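For anyone unfamiliar with quintiles, they split a set of values into five groups containing roughly equal numbers of areas. The sketch below shows one way to derive them in R. The scores are invented, and this is not necessarily how the quintiles in the census file were calculated.

```r
# Invented deprivation scores for ten areas (illustration only).
scores <- c(-3.2, -1.5, -0.4, 0.0, 0.8, 1.9, 2.7, 3.5, 4.1, 6.0)

# Break points at the 0%, 20%, 40%, 60%, 80% and 100% positions.
quintile_breaks <- quantile(scores, probs = seq(0, 1, by = 0.2))

# Label each area 1 to 5 according to the band its score falls in.
quintile <- cut(scores, breaks = quintile_breaks, labels = 1:5,
                include.lowest = TRUE)

# Each quintile should hold about a fifth of the areas.
table(quintile)
```

Because each group holds a similar number of areas, a five-colour scale based on quintiles uses every colour about equally often.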

The finished map of the UK shows a very mixed view, but if you zoom into a town or city you can then see the differences between postcodes.


For example, here’s a map of Belfast showing differences in deprivation between postcodes. Dark red is least deprived and yellow is most deprived.

Overall I found this web app easy to use, but it may pose some difficulties for people unfamiliar with mapping.

However, as a free tool that maps data without sending it back to a server, it offers a way to map more personal data without the worry of having this data hosted somewhere you don’t know.


Rob Dymond-Green is a Senior Technical Co-ordinator for the UK Data Service, working with aggregate census and international data.