streamline the boring stuff: the CSO edition

I’ve recently been working a lot with APIs (otherwise known as Application Programming Interfaces) in my role with GRID Finance as we aim to apply “API-first” principles to our analytics offering. It reminded me of the post I created about Ireland and her Exports because the data that inspired that post came from the Central Statistics Office of Ireland (CSO).

What I didn’t realise at the time was that the CSO has a free to use API so instead of manually downloading a CSV file and then importing it into a tool like Python, developers can connect into the CSO directly and… streamline the boring stuff! This post will show how the API works using Python so you can spend more time working with the data. If you want to get straight to the good stuff, feel free to jump to the end of the page where you’ll find a text file with the code you need to get started.

connect to the CSO with python in 6 easy steps

  1. Install the package json-rpc. This allows you to explore the response you get from the API which is returned as a JSON object. 
  2. Find the dataset you want to work with on the CSO site. In this example we’re going to use the monthly unemployment dataset, specifically the “MUM01 – Seasonally Adjusted Monthly Unemployment” from the following site: https://data.cso.ie
  3. Select the data you want to include in your dataset. We’re going to select the unemployment rate as a percentage, the last 12 months, all age groups and all sexes.
  4. Scroll down the page and expand the API Data Query option, making sure you’ve selected ‘JSON-RPC’ within the drop down – this will show you the commands you need to query the API, including the options you selected in step 3. What we want is the GET URL, and the POST Body.
  5. Create 1 variable for the URL and another for the API request – I’ve named them URL & payload.
  6. Call the API using the requests.post(url, Jason=payload).json() command.

code to access the CSO API for monthly unemployment figures

# monthly unemployment figures - using the API request from the CSO site 

url = "https://ws.cso.ie/public/api.jsonrpc?data=%7B%22jsonrpc%22:%222.0%22,%22method%22:%22PxStat.Data.Cube_API.ReadDataset%22,%22params%22:%7B%22class%22:%22query%22,%22id%22:%5B%22STATISTIC%22,%22TLIST(M1)%22,%22C02076V02508%22,%22C02199V02655%22%5D,%22dimension%22:%7B%22STATISTIC%22:%7B%22category%22:%7B%22index%22:%5B%22MUM01C02%22%5D%7D%7D,%22TLIST(M1)%22:%7B%22category%22:%7B%22index%22:%5B%22202207%22,%22202206%22,%22202205%22,%22202204%22,%22202203%22,%22202202%22,%22202201%22%5D%7D%7D,%22C02076V02508%22:%7B%22category%22:%7B%22index%22:%5B%22316%22%5D%7D%7D,%22C02199V02655%22:%7B%22category%22:%7B%22index%22:%5B%22-%22%5D%7D%7D%7D,%22extension%22:%7B%22pivot%22:null,%22codes%22:false,%22language%22:%7B%22code%22:%22en%22%7D,%22format%22:%7B%22type%22:%22JSON-stat%22,%22version%22:%222.0%22%7D,%22matrix%22:%22MUM01%22%7D,%22version%22:%222.0%22%7D%7D"

payload = {
"jsonrpc": "2.0",
"method": "PxStat.Data.Cube_API.ReadDataset",
"params": {
    "class": "query",
    "id": [
        "STATISTIC",
        "TLIST(M1)",
        "C02076V02508",
        "C02199V02655"
    ],
    "dimension": {
        "STATISTIC": {
            "category": {
                "index": [
                    "MUM01C02"
                ]
            }
        },
        "TLIST(M1)": {
            "category": {
                "index": [
                   "202208",
						"202207",
						"202206",
						"202205",
						"202204",
						"202203",
						"202202",
						"202201",
						"202112",
						"202111",
						"202110",
						"202109",
						"202108"
                ]
            }
        },
        "C02076V02508": {
            "category": {
                "index": [
                    "316"
                ]
            }
        },
        "C02199V02655": {
            "category": {
                "index": [
                    "-"
                ]
            }
        }
    },
    "extension": {
        "pivot": 'null',
        "codes": 'false',
        "language": {
            "code": "en"
        },
        "format": {
            "type": "JSON-stat",
            "version": "2.0"
        },
        "matrix": "MUM01"
    },
    "version": "2.0"
}
}

# calling the API with the requests package, and json to explore the data
response = requests.post(url, json=payload).json()

exploring the output of the JSON object

When we take a look at the response, it can be daunting – it’s not in a nice data frame that we can automatically use for data exploration so a bit of data wrangling is needed. To access the data within the response we can use commands like response[‘result’] – the API response is like a set of lists that are easily accessible. Each API provider is different, so a bit of time up front is needed to understand each one. But it also means that once you figure out how the CSO works, you can use it for any of their published datasets!

What we need from this response is the ‘result’, which we can access using response[‘result’].items() and save this as a pandas data frame. From there we extract the date range and values and then combine the 2 to create our final dataset. I’ve skipped a few steps on how I extracted the data but I suggest you have a play around with the queries by breaking them into their own queries to get a better understanding of how its done.

bonus step

If you know you’re going to query this API on a regular basis and don’t want to have to manually select or deselect months, you can use variables to assign dates that you pass into the API query instead. For example, in the attached python script, I’ve future-proofed the API call by creating variables with future dates and including them into the query.

I hope this was useful. Hit me up if you’ve any questions.

The full script can be found below.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: