Searching and Downloading GEDI L4A Dataset#

This tutorial will demonstrate how to search and download Global Ecosystem Dynamics Investigation (GEDI) L4A Footprint Level Aboveground Biomass Density (AGBD)[Dubayah et al., 2022] dataset. GEDI L4A dataset is available for the period starting 2019-04-17 and covers latitudes of 52 North to 52 South. GEDI L4A data files are natively in HDF5 format, and each file represents one International Space Station (ISS) orbit.

In this tutorial, we will use earthaccess Python module to search for GEDI L4A files or granules for time and area of interest. The earthaccess module uses NASA’s Earthdata Common Metadata Repository (CMR) Application Programming Interface (API). The CMR catalogs metadata records of NASA Earth Science data and make them available for easy programmatic access. Area of interest can be defined using a bounding box (Option 1) or using polygons (Option 2).

Learning Objectives

  • Learn how to search GEDI L4A data programatically in Python.

  • Learn how to restrict search results by space (study area) and time (project period).

  • Plot data extent/boundaries in a map.

  • Download data files of interest.

Requirements#

Additional prerequisites are provided here.

import datetime as dt 
import pandas as pd
import geopandas as gpd
import earthaccess
from shapely.geometry import MultiPolygon, Polygon, box
from shapely.ops import orient

1. Searching with a bounding box#

The dataset Digital Object Identifier or DOI for the GEDI 4A dataset is needed for searching the files (or granules). For this tutorial, let’s use a bounding box of Brazil, which extends north to south from 5.24448639569 N to -33.7683777809 S latitude and east to west from 34.7299934555 E to 73.9872354804 W longitude. We will search and download all the files for July, 2020.

# Brazil bounding box
bound = (-73.9872354804, -33.7683777809, -34.7299934555, 5.24448639569) 

# time bound
start_date = dt.datetime(2020, 7, 1) # specify your own start date
end_date = dt.datetime(2020, 7, 31)  # specify your end start date

The bounding box and time-bound can be used to search for GEDI L4A files using the earthaccess module. We will use pandas dataframe to store the download URLs of each file and their bounding geometries.

granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    bounding_box = bound,
    temporal=(start_date, end_date), # time bound
    doi='10.3334/ORNLDAAC/2056' # GEDI L4A DOI 
)
print(f"Total granules found: {len(granules)}")
Total granules found: 259

Let’s print the details of the first of the files.

granules[0]

As we see above, the granules are hosted in the NASA Earthdata Cloud.

The granules object contains metadata about the granules, including the bounding geometry, publication dates, data providers, etc. Now, let’s convert the above granule metadata from json-formatted to geopandas dataframe. Converting to geopandas dataframe will let us generate plots of the granule geometry.

def convert_umm_geometry(gpoly):
    """converts UMM geometry to multipolygons"""
    multipolygons = []
    for gl in gpoly:
        ltln = gl["Boundary"]["Points"]
        points = [(p["Longitude"], p["Latitude"]) for p in ltln]
        multipolygons.append(Polygon(points))
    return MultiPolygon(multipolygons)

def convert_list_gdf(datag):
    """converts List[] to geopandas dataframe"""
    # create pandas dataframe from json
    df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])
    # keep only last string of the column names
    df.columns=df.columns.str.split('.').str[-1]
    # convert polygons to multipolygonal geometry
    df["geometry"] = df["GPolygons"].apply(convert_umm_geometry)
    # return geopandas dataframe
    return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")

# only keep three columns
gdf = convert_list_gdf(granules)[['GranuleUR', 'size', 'geometry']]

Now, we have stored the granule URLs and their bounding geometries into the geopandas dataframe gdf. The first few rows of the table look like the following.

gdf.head()
GranuleUR size geometry
0 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830000... 95.152991 MULTIPOLYGON (((-85.70455 -51.75975, -79.78647...
1 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830132... 273.306011 MULTIPOLYGON (((-103.72545 -51.43057, -97.9257...
2 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830305... 213.742908 MULTIPOLYGON (((-87.70489 -38.51482, -84.17964...
3 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830305... 228.248412 MULTIPOLYGON (((-52.33814 0.03862, -50.18518 3...
4 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830438... 176.070236 MULTIPOLYGON (((-75.94952 0.01201, -73.79122 3...

We can now plot the bounding geometries of the granules (shown with green lines in the figure below). The bounding box (of Brazil) is plotted in red color.

# plotting granule geometry
m = gdf.explore(color='green',  fillcolor='green')

# plotting bounding box of Brazil
b = list(bound)
gdf_bound = gpd.GeoDataFrame(index=[0], crs='epsg:4326', 
                             geometry=[box(b[0], b[1], b[2], b[3])])
gdf_bound.explore(m = m, color='red', fill=False)
Make this Notebook Trusted to load map: File -> Trust Notebook

2. Searching for a polygonal area of interest#

If an area of interest is already defined as a polygon, the polygon file in geojson, shapefile or kml formats can be used to find overlapping GEDI L4A files.

For this tutorial, we will use the boundary of a northern state of Brazil, Amapá, to search for the overlapping GEDI files. The boundary polygon is stored in a geojson file called amapa.json (shown in red polygon in the figure below).

poly = gpd.read_file("polygons/amapa.json").geometry
poly.explore(color='red',  fill=False)
Make this Notebook Trusted to load map: File -> Trust Notebook

In this example, we will use earthaccess python module to search for all the GEDI L4A overlapping the above polygon.

# bounding lon, lat as a list of tuples
poly = poly.apply(orient, args=(1,))
# simplifying the polygon to bypass the coordinates 
# limit of the CMR with a tolerance of .01 degrees
xy = poly.simplify(0.01).get_coordinates()

granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    doi="10.3334/ORNLDAAC/2056", # GEDI L4A DOI 
    polygon=list(zip(xy.x, xy.y))
)
print(f"Total granules found: {len(granules)}")
Total granules found: 846

Now, let’s convert the above granule metadata from json-formatted to geopandas dataframe. Converting to geopandas dataframe will let us generate plots of the granule geometry.

# only keep three columns
gdf = convert_list_gdf(granules)[['GranuleUR', 'size', 'geometry']]

We have stored the granule bounding geometries into the geopandas dataframe gdf. The first few rows of the gdf dataframe look like the following.

gdf.head()
GranuleUR size geometry
0 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191120750... 201.574527 MULTIPOLYGON (((-54.57329 0.01230, -52.41297 3...
1 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191121839... 307.588886 MULTIPOLYGON (((-135.22130 51.77855, -129.2297...
2 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191160604... 213.247052 MULTIPOLYGON (((-88.24925 -39.41727, -84.61887...
3 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191160604... 211.508106 MULTIPOLYGON (((-51.48309 -0.00009, -28.56284 ...
4 GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191190507... 178.484945 MULTIPOLYGON (((-54.99693 0.03973, -52.83558 3...

We can now plot the bounding geometries of the granules (shown with green lines in the figure below) using geopandas. The Amapá state is plotted in red color.

# plotting granule geometry
m = gdf.explore(color='green',  fillcolor='green')
poly.explore(m=m, color='red',  fill=False)
Make this Notebook Trusted to load map: File -> Trust Notebook

3. Downloading the files#

We recommend using earthaccess to download GEDI data granules from the NASA Earthdata. You will first need to authenticate your Earthdata Login (EDL) information using the earthaccess python library as follows:

# works if the EDL login already been persisted to a netrc
auth = earthaccess.login(strategy="netrc") 
if not auth.authenticated:
    # ask for EDL credentials and persist them in a .netrc file
    auth = earthaccess.login(strategy="interactive", persist=True)

The following will download the first two files. If you want to download all the granules (846 total), please uncomment the third line below.

downloaded_files = earthaccess.download(granules[:2], local_path="full_orbits")
# download all files 
# downloaded_files = earthaccess.download(granules, local_path="full_orbits")
downloaded_files
['full_orbits/GEDI04_A_2019112075017_O02026_02_T00059_02_002_02_V002.h5',
 'full_orbits/GEDI04_A_2019112075017_O02026_02_T00059_02_002_02_V002.h5.sha256',
 'full_orbits/GEDI04_A_2019112183906_O02033_03_T04182_02_002_02_V002.h5',
 'full_orbits/GEDI04_A_2019112183906_O02033_03_T04182_02_002_02_V002.h5.sha256']

References#

[1]

R.O. Dubayah, J. Armston, J.R. Kellner, L. Duncanson, S.P. Healey, P.L. Patterson, S. Hancock, H. Tang, J. Bruening, M.A. Hofton, J.B. Blair, and S.B. Luthcke. GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1. 2022. doi:10.3334/ORNLDAAC/2056.