Loading and Visualizing LiPD Proxy Data in Python

By Michael Erb (michael.erb@nau.edu), October 28, 2022

Introduction

Proxy records (e.g., ice cores, sediment cores, and speleothems) provide the main source of climate information used in paleoclimate data assimilation. Recent efforts have compiled proxy data into large machine-readable databases.

In this notebook, we'll load and plot proxy data from the Temperature 12k database v. 1.0.2, which can be visualized here: https://lipdverse.org/Temp12k/1_0_2/. The data can be downloaded on the left side of that page.

1. Setting up Google Colab

This notebook was made in Google Colab. The first cell below sets up Google Colab. If you're using this code in a different environment, you can probably skip this first code cell.

2. Importing necessary libraries

Python libraries provide additional functionality. Here, we import some that will be used later.

3. Loading proxy data

Now, let's load the Temp12k database v1.0.2 (https://lipdverse.org/Temp12k/1_0_2/). In Python, we can load either the LiPD files or a pickle file. The two code cells below demonstrate this. You only need to run one or the other, depending on which data format you downloaded. The pickle file is significantly faster to load.

If you download the LiPD files, unzip the folder. You can load multiple files in the folder (see below) or specify a single file instead.
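As a minimal sketch of the pickle route, the snippet below writes a tiny stand-in database to a temporary pickle file and reads it back with Python's built-in pickle module. The file name and the two-record list are made up for illustration. The real Temp12k pickle file may wrap its records in a different structure, so inspect what you get back before using it.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load and return whatever Python object is stored in a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in data (hypothetical): two very small "records".
demo_records = [
    {"dataSetName": "DemoSiteA", "archiveType": "LakeSediment"},
    {"dataSetName": "DemoSiteB", "archiveType": "Speleothem"},
]

# Write the stand-in data to a temporary file, then load it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_proxies.pkl")
with open(demo_path, "wb") as handle:
    pickle.dump(demo_records, handle)

loaded = load_pickle(demo_path)
print(type(loaded), len(loaded))
```

Pickle files can execute code when loaded, so only open ones from a source you trust.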

The commands below will do several things:

4. Exploring the Proxy Metadata

The proxy database is now stored as a list called "proxy_ts", which is 1276 entries long. Each entry in the list is a dictionary. You can see for yourself:
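For example, with a small stand-in list in place of the real database (the two records below are invented for illustration), the checks might look like this:

```python
# Hypothetical stand-in for the real proxy_ts list (which has 1276 entries).
proxy_ts = [
    {"dataSetName": "DemoSiteA", "archiveType": "LakeSediment"},
    {"dataSetName": "DemoSiteB", "archiveType": "Speleothem"},
]

print(type(proxy_ts))     # the container type
print(len(proxy_ts))      # the number of records
print(type(proxy_ts[0]))  # the type of a single entry
```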

Python is a zero-indexed language, so to print the first proxy record, we would use the command:
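Here is a short sketch of zero-based indexing, using an invented two-record list rather than the real database:

```python
# Hypothetical stand-in for the real proxy_ts list.
proxy_ts = [
    {"dataSetName": "DemoSiteA", "archiveType": "LakeSediment"},
    {"dataSetName": "DemoSiteB", "archiveType": "Speleothem"},
]

first_record = proxy_ts[0]   # index 0 is the first entry
last_record = proxy_ts[-1]   # negative indices count from the end

print(first_record)
```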

As you can see, there's a lot of data and metadata. Since "proxy_ts[0]" is a dictionary, we need to use the right key to get a specific piece of data or metadata from it. To see all of the keys, you could use the command:

print(proxy_ts[0].keys())

Some of the more important keys are:

Key                        Explanation
paleoData_values           The proxy record data
age                        The proxy record ages
dataSetName                Data set name
paleoData_TSid             The "TSid," which is a unique identifier for the proxy record
archiveType                Archive type
paleoData_proxyGeneral     General proxy type
paleoData_proxy            Specific proxy type
paleoData_variableName     Variable
geo_meanLat                Latitude (-90 to 90)
geo_meanLon                Longitude (-180 to 180)
geo_meanElev               Elevation (m)
paleoData_interpretation   Notes about the interpretation of the proxy record
originalDataUrl            The URL of the original proxy record
paleoData_units            Units of the data
ageUnits                   Units of the ages

To demonstrate how keys are used, the code below gets some data and metadata from the proxy record, then makes a simple figure.
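One way this could look is sketched below. The record is invented, the function names extract_series and plot_record are hypothetical, and the cleaning step assumes that ages and values may contain entries that are missing or not numeric. Matplotlib is imported inside plot_record so that the rest of the sketch runs without it.

```python
import math


def extract_series(record):
    """Return (ages, values) as lists of floats, skipping unusable pairs."""
    ages, values = [], []
    for age, value in zip(record["age"], record["paleoData_values"]):
        try:
            age_f, value_f = float(age), float(value)
        except (TypeError, ValueError):
            continue  # entry cannot be converted to a number
        if math.isnan(age_f) or math.isnan(value_f):
            continue  # entry is a NaN placeholder
        ages.append(age_f)
        values.append(value_f)
    return ages, values


def plot_record(record):
    """Plot one record's values against age and return the figure."""
    import matplotlib.pyplot as plt  # needed only for plotting

    ages, values = extract_series(record)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(ages, values, marker="o")
    ax.set_xlabel(f"Age ({record.get('ageUnits', 'unknown units')})")
    ax.set_ylabel(f"{record.get('paleoData_variableName', 'value')} "
                  f"({record.get('paleoData_units', 'unknown units')})")
    ax.set_title(record.get("dataSetName", ""))
    return fig


# Invented record that uses the keys from the table above.
demo_record = {
    "dataSetName": "DemoSiteA",
    "paleoData_variableName": "temperature",
    "paleoData_units": "degC",
    "ageUnits": "yr BP",
    "age": [100, 500, "nan", 1500],
    "paleoData_values": [10.2, 9.8, 9.9, "NA"],
}

ages, values = extract_series(demo_record)
print(ages, values)
```

Calling plot_record(demo_record) would then draw the two usable points. With real data, the same call on proxy_ts[0] should work as long as the record has the keys used here.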

To get a better sense of what's in the database, let's make a function to summarize a chosen metadata key across all 1276 records.

Note: Defining functions is useful when you want to run the same code multiple times in different contexts.

Run the code below to display the counts for the archive types across all of the records.
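A minimal sketch of such a summary function, using collections.Counter from the standard library. The function name summarize_key and the four demo records are hypothetical, and the choice to count records that lack the key under "missing" is an assumption.

```python
from collections import Counter


def summarize_key(records, key):
    """Count how many records have each value of a metadata key.

    Values are converted to strings so that unhashable values (such as
    lists) can still be counted. Records without the key are counted
    under "missing".
    """
    return Counter(str(record.get(key, "missing")) for record in records)


# Hypothetical stand-in for the real proxy_ts list.
proxy_ts = [
    {"dataSetName": "DemoSiteA", "archiveType": "LakeSediment"},
    {"dataSetName": "DemoSiteB", "archiveType": "Speleothem"},
    {"dataSetName": "DemoSiteC", "archiveType": "LakeSediment"},
    {"dataSetName": "DemoSiteD"},
]

counts = summarize_key(proxy_ts, "archiveType")
for value, count in counts.most_common():
    print(f"{value}: {count}")
```

The same function can be reused with any other key from the table, such as paleoData_proxyGeneral or paleoData_units.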

5. Making figures

Okay, let's make some more figures. First, let's make a map of all proxy locations. To do this, let's get the latitudes and longitudes of all of our proxy records.
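Collecting the coordinates could be sketched as follows. The function name get_coordinates and the demo records are hypothetical, and skipping records without coordinates is an assumption; because of that skip, the returned lists may be shorter than the record list.

```python
def get_coordinates(records):
    """Return (lats, lons) as lists of floats for records that have both."""
    lats, lons = [], []
    for record in records:
        lat = record.get("geo_meanLat")
        lon = record.get("geo_meanLon")
        if lat is None or lon is None:
            continue  # skip records with no location
        lats.append(float(lat))
        lons.append(float(lon))
    return lats, lons


# Hypothetical stand-in for the real proxy_ts list.
proxy_ts = [
    {"dataSetName": "DemoSiteA", "geo_meanLat": 30.0, "geo_meanLon": 80.0},
    {"dataSetName": "DemoSiteB", "geo_meanLat": 45.0, "geo_meanLon": -100.0},
    {"dataSetName": "DemoSiteC"},
]

lats, lons = get_coordinates(proxy_ts)
print(lats, lons)
```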

Now, let's create two functions:

Both of the functions above use an input list with four values: [lon_min, lon_max, lat_min, lat_max]. Let's try it:

Now, let's use both functions to make a map and list the proxies in a particular region. In the code below, I've selected a part of southern Asia.
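The sketch below shows how two such functions might be written, assuming the bounds are ordered [lon_min, lon_max, lat_min, lat_max]. All function names and demo records are hypothetical. The map is a plain matplotlib scatter plot with no coastlines, and matplotlib is imported inside map_region so that the listing function runs without it. Regions that cross the 180° meridian are not handled.

```python
def in_region(lat, lon, bounds):
    """Return True if (lat, lon) falls inside the bounds (edges included)."""
    lon_min, lon_max, lat_min, lat_max = bounds
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max


def list_proxies_in_region(records, bounds):
    """Return (index, dataSetName) for every record inside the bounds."""
    selected = []
    for index, record in enumerate(records):
        lat = record.get("geo_meanLat")
        lon = record.get("geo_meanLon")
        if lat is None or lon is None:
            continue  # skip records with no location
        if in_region(float(lat), float(lon), bounds):
            selected.append((index, record.get("dataSetName", "unknown")))
    return selected


def map_region(records, bounds):
    """Scatter-plot the records inside the bounds and return the figure."""
    import matplotlib.pyplot as plt  # needed only for plotting

    lon_min, lon_max, lat_min, lat_max = bounds
    indices = [index for index, _ in list_proxies_in_region(records, bounds)]
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter([float(records[i]["geo_meanLon"]) for i in indices],
               [float(records[i]["geo_meanLat"]) for i in indices])
    ax.set_xlim(lon_min, lon_max)
    ax.set_ylim(lat_min, lat_max)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return fig


# Hypothetical stand-in for the real proxy_ts list.
proxy_ts = [
    {"dataSetName": "DemoSiteA", "geo_meanLat": 30.0, "geo_meanLon": 80.0},
    {"dataSetName": "DemoSiteB", "geo_meanLat": 45.0, "geo_meanLon": -100.0},
    {"dataSetName": "DemoSiteC", "geo_meanLat": 10.0, "geo_meanLon": 100.0},
]

region = [60, 120, 0, 40]  # [lon_min, lon_max, lat_min, lat_max]
selected = list_proxies_in_region(proxy_ts, region)
for index, name in selected:
    print(index, name)
```

Keeping the bounds check in one small helper means the map and the listing always agree on which records are inside the region.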

Hopefully this helped you get started using LiPD files in Python!