Searching and Downloading GEDI L4A Dataset#
This tutorial will demonstrate how to search and download Global Ecosystem Dynamics Investigation (GEDI) L4A Footprint Level Aboveground Biomass Density (AGBD)[Dubayah et al., 2022] dataset. GEDI L4A dataset is available for the period starting 2019-04-17 and covers latitudes of 52 North to 52 South. GEDI L4A data files are natively in HDF5 format, and each file represents one International Space Station (ISS) orbit.
In this tutorial, we will use earthaccess
Python module to search for GEDI L4A files or granules for time and area of interest. The earthaccess
module uses NASA’s Earthdata Common Metadata Repository (CMR) Application Programming Interface (API). The CMR catalogs metadata records of NASA Earth Science data and make them available for easy programmatic access. Area of interest can be defined using a bounding box (Option 1) or using polygons (Option 2).
Learning Objectives
Learn how to search GEDI L4A data programatically in Python.
Learn how to restrict search results by space (study area) and time (project period).
Plot data extent/boundaries in a map.
Download data files of interest.
Requirements#
Additional prerequisites are provided here.
import datetime as dt
import pandas as pd
import geopandas as gpd
import earthaccess
from shapely.geometry import MultiPolygon, Polygon, box
from shapely.ops import orient
1. Searching with a bounding box#
The dataset Digital Object Identifier or DOI for the GEDI 4A dataset is needed for searching the files (or granules). For this tutorial, let’s use a bounding box of Brazil, which extends north to south from 5.24448639569 N to -33.7683777809 S latitude and east to west from 34.7299934555 E to 73.9872354804 W longitude. We will search and download all the files for July, 2020.
# Brazil bounding box
bound = (-73.9872354804, -33.7683777809, -34.7299934555, 5.24448639569)
# time bound
start_date = dt.datetime(2020, 7, 1) # specify your own start date
end_date = dt.datetime(2020, 7, 31) # specify your end start date
The bounding box and time-bound can be used to search for GEDI L4A files using the earthaccess
module. We will use pandas dataframe to store the download URLs of each file and their bounding geometries.
granules = earthaccess.search_data(
count=-1, # needed to retrieve all granules
bounding_box = bound,
temporal=(start_date, end_date), # time bound
doi='10.3334/ORNLDAAC/2056' # GEDI L4A DOI
)
print(f"Total granules found: {len(granules)}")
Total granules found: 259
Let’s print the details of the first of the files.
granules[0]
Data: GEDI04_A_2020183000018_O08784_01_T02857_02_002_02_V002.h5GEDI04_A_2020183000018_O08784_01_T02857_02_002_02_V002.h5.sha256
Size: 95.15 MB
Cloud Hosted: True
As we see above, the granules are hosted in the NASA Earthdata Cloud.
The granules
object contains metadata about the granules, including the bounding geometry, publication dates, data providers, etc. Now, let’s convert the above granule metadata from json-formatted to geopandas
dataframe. Converting to geopandas
dataframe will let us generate plots of the granule geometry.
def convert_umm_geometry(gpoly):
"""converts UMM geometry to multipolygons"""
multipolygons = []
for gl in gpoly:
ltln = gl["Boundary"]["Points"]
points = [(p["Longitude"], p["Latitude"]) for p in ltln]
multipolygons.append(Polygon(points))
return MultiPolygon(multipolygons)
def convert_list_gdf(datag):
"""converts List[] to geopandas dataframe"""
# create pandas dataframe from json
df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])
# keep only last string of the column names
df.columns=df.columns.str.split('.').str[-1]
# convert polygons to multipolygonal geometry
df["geometry"] = df["GPolygons"].apply(convert_umm_geometry)
# return geopandas dataframe
return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")
# only keep three columns
gdf = convert_list_gdf(granules)[['GranuleUR', 'size', 'geometry']]
Now, we have stored the granule URLs and their bounding geometries into the geopandas dataframe gdf
. The first few rows of the table look like the following.
gdf.head()
GranuleUR | size | geometry | |
---|---|---|---|
0 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830000... | 95.152991 | MULTIPOLYGON (((-85.70455 -51.75975, -79.78647... |
1 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830132... | 273.306011 | MULTIPOLYGON (((-103.72545 -51.43057, -97.9257... |
2 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830305... | 213.742908 | MULTIPOLYGON (((-87.70489 -38.51482, -84.17964... |
3 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830305... | 228.248412 | MULTIPOLYGON (((-52.33814 0.03862, -50.18518 3... |
4 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201830438... | 176.070236 | MULTIPOLYGON (((-75.94952 0.01201, -73.79122 3... |
We can now plot the bounding geometries of the granules (shown with green lines in the figure below). The bounding box (of Brazil) is plotted in red color.
# plotting granule geometry
m = gdf.explore(color='green', fillcolor='green')
# plotting bounding box of Brazil
b = list(bound)
gdf_bound = gpd.GeoDataFrame(index=[0], crs='epsg:4326',
geometry=[box(b[0], b[1], b[2], b[3])])
gdf_bound.explore(m = m, color='red', fill=False)
2. Searching for a polygonal area of interest#
If an area of interest is already defined as a polygon, the polygon file in geojson
, shapefile
or kml
formats can be used to find overlapping GEDI L4A files.
For this tutorial, we will use the boundary of a northern state of Brazil, Amapá, to search for the overlapping GEDI files. The boundary polygon is stored in a geojson file called amapa.json
(shown in red polygon in the figure below).
poly = gpd.read_file("polygons/amapa.json").geometry
poly.explore(color='red', fill=False)
In this example, we will use earthaccess
python module to search for all the GEDI L4A overlapping the above polygon.
# bounding lon, lat as a list of tuples
poly = poly.apply(orient, args=(1,))
# simplifying the polygon to bypass the coordinates
# limit of the CMR with a tolerance of .01 degrees
xy = poly.simplify(0.01).get_coordinates()
granules = earthaccess.search_data(
count=-1, # needed to retrieve all granules
doi="10.3334/ORNLDAAC/2056", # GEDI L4A DOI
polygon=list(zip(xy.x, xy.y))
)
print(f"Total granules found: {len(granules)}")
Total granules found: 846
Now, let’s convert the above granule metadata from json-formatted to geopandas dataframe. Converting to geopandas dataframe will let us generate plots of the granule geometry.
# only keep three columns
gdf = convert_list_gdf(granules)[['GranuleUR', 'size', 'geometry']]
We have stored the granule bounding geometries into the geopandas dataframe gdf
. The first few rows of the gdf
dataframe look like the following.
gdf.head()
GranuleUR | size | geometry | |
---|---|---|---|
0 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191120750... | 201.574527 | MULTIPOLYGON (((-54.57329 0.01230, -52.41297 3... |
1 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191121839... | 307.588886 | MULTIPOLYGON (((-135.22130 51.77855, -129.2297... |
2 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191160604... | 213.247052 | MULTIPOLYGON (((-88.24925 -39.41727, -84.61887... |
3 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191160604... | 211.508106 | MULTIPOLYGON (((-51.48309 -0.00009, -28.56284 ... |
4 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191190507... | 178.484945 | MULTIPOLYGON (((-54.99693 0.03973, -52.83558 3... |
We can now plot the bounding geometries of the granules (shown with green lines in the figure below) using geopandas. The Amapá state is plotted in red color.
# plotting granule geometry
m = gdf.explore(color='green', fillcolor='green')
poly.explore(m=m, color='red', fill=False)
3. Downloading the files#
We recommend using earthaccess
to download GEDI data granules from the NASA Earthdata. You will first need to authenticate your Earthdata Login (EDL) information using the earthaccess
python library as follows:
# works if the EDL login already been persisted to a netrc
auth = earthaccess.login(strategy="netrc")
if not auth.authenticated:
# ask for EDL credentials and persist them in a .netrc file
auth = earthaccess.login(strategy="interactive", persist=True)
The following will download the first two files. If you want to download all the granules (846 total), please uncomment the third line below.
downloaded_files = earthaccess.download(granules[:2], local_path="full_orbits")
# download all files
# downloaded_files = earthaccess.download(granules, local_path="full_orbits")
downloaded_files
['full_orbits/GEDI04_A_2019112075017_O02026_02_T00059_02_002_02_V002.h5',
'full_orbits/GEDI04_A_2019112075017_O02026_02_T00059_02_002_02_V002.h5.sha256',
'full_orbits/GEDI04_A_2019112183906_O02033_03_T04182_02_002_02_V002.h5',
'full_orbits/GEDI04_A_2019112183906_O02033_03_T04182_02_002_02_V002.h5.sha256']
References#
R.O. Dubayah, J. Armston, J.R. Kellner, L. Duncanson, S.P. Healey, P.L. Patterson, S. Hancock, H. Tang, J. Bruening, M.A. Hofton, J.B. Blair, and S.B. Luthcke. GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1. 2022. doi:10.3334/ORNLDAAC/2056.