UKDS.Stat and DKAN – Ingesting data

Over the past few weeks the team has been creating a new instance of UKDS.Stat, one of our products for viewing international data. Doing so has been a challenge, but it has been interesting to compare it with the data loading process we use when ingesting data onto DKAN.

For our DKAN instance we create a dataset using either the onsite creator or the feeds module. With the onsite creator it is just a case of adding the metadata, including descriptions, titles, DOIs etc. The data can then be uploaded as a CSV file, and that is it.

The feeds module simplifies that process when you are uploading multiple datasets: you fill in the details on a CSV file and upload that first, which then adds the datasets to the instance.
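To give a feel for the feeds approach, here is a minimal sketch of building that metadata CSV in Python. The column names (title, description, doi) are our assumption for illustration only; the actual feeds importer defines its own column mapping, so treat this as a sketch rather than the format our instance uses.

```python
import csv
import io

# Hypothetical column names; the real feeds importer defines its own mapping.
FIELDS = ["title", "description", "doi"]


def build_metadata_csv(datasets):
    """Return CSV text with a header row and one metadata row per dataset."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in datasets:
        # Refuse to write a row with a blank or missing field, so that gaps
        # are caught before the file is uploaded.
        missing = [field for field in FIELDS if not record.get(field)]
        if missing:
            label = record.get("title") or "<untitled>"
            raise ValueError(f"{label}: missing {missing}")
        writer.writerow({field: record[field] for field in FIELDS})
    return buffer.getvalue()
```

Writing the returned text to a file gives one CSV that describes every dataset, instead of filling in the onsite form once per dataset.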

Data loading on UKDS.Stat, on the other hand, has been a much more intensive process. Before any files are moved anywhere, the data itself needs to be changed into the correct format. This involves converting it from its initial .ivt file to CSV, and then to tab-delimited txt format. After that the tabs need to be changed to pipes (the | character) and special characters such as bullet points removed.
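The last step of that conversion, going from tab-delimited text to pipe-delimited text, could be scripted along the following lines. This is a rough sketch and not the tool we actually used: the set of "special characters" is our guess (a few bullet-like symbols), and replacing a literal | inside a value with / is an assumption made so that stray pipes are not read as delimiters.

```python
import csv
import io

# Assumption: the characters to strip are bullet-like symbols.
# The real list will depend on the dataset.
SPECIAL_CHARS = "\u2022\u25cf\u25aa\u00b7"


def tabs_to_pipes(text):
    """Convert tab-delimited text to pipe-delimited text, one line per row."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out_lines = []
    for row in reader:
        cleaned = []
        for cell in row:
            for ch in SPECIAL_CHARS:
                cell = cell.replace(ch, "")
            # Assumption: a literal pipe inside a value is swapped for a slash
            # so it cannot be mistaken for a field separator.
            cell = cell.replace("|", "/")
            cleaned.append(cell.strip())
        out_lines.append("|".join(cleaned))
    return "".join(line + "\n" for line in out_lines)
```

In practice you would read the tab-delimited txt file, pass its contents through tabs_to_pipes, and write the result to the file that goes onto the server.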

Once all that is done, the data needs to be uploaded to the correct file location on the server. We then need to build a bat file to run the relevant program that ingests the data onto the actual site, making sure every parameter is set exactly right, otherwise it won't work.
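Because a single wrong parameter stops the ingest, a small pre-flight check before running the bat file can save time. The sketch below is purely illustrative: the parameter names (data_file, dataset_code, target_database) are hypothetical placeholders, not the real parameters of the loading program.

```python
from pathlib import Path

# Hypothetical parameter names; the real loading program defines its own.
REQUIRED_PARAMS = ("data_file", "dataset_code", "target_database")


def check_ingest_params(params):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for name in REQUIRED_PARAMS:
        value = params.get(name, "")
        if not str(value).strip():
            problems.append(f"missing value for {name}")
    # The data file must already be in place on the server before ingest.
    data_file = params.get("data_file")
    if data_file and not Path(data_file).is_file():
        problems.append(f"data file not found: {data_file}")
    return problems
```

An empty list means the basics are in place; anything else is worth fixing before the bat file is run.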

If all that works correctly the dataset will be loaded. However, the metadata also needs to be set up in pretty much the same way, and if the data has any flags (for example, one that indicates the data is unavailable) another set of files needs to be set up to get them to work.

Once all the workflows are in place, data processing for UKDS.Stat gets easier. We are exploring a greater focus on automated upload and are continuing to talk to data users about how they want to access, download and use the data.

If you wish to view our new instance of UKDS.Stat, click here. Do note that at the time of writing the site is still in beta and is subject to change.
