Accessing GEDI L4A variables using NASA Harmony API#

Overview#

NASA’s Harmony Services allows seamless access and production of analysis-ready Earth observation data across different DAACs, by enabling cloud-based spatial, temporal, and variable subsetting and data conversions. The Global Ecosystem Dynamics Investigation (GEDI) L4A Footprint Level Aboveground Biomass Density (AGBD)[Dubayah et al., 2022] is available from the Harmony Trajectory Subsetter API.

This tutorial demonstrates how to directly access and subset the GEDI L4A variables using Harmony API for an area in NASA’s Delta-X project. The Delta-X project collects field and airborne measurements of ecological and hydrology variables over the two river basins (Atchafalaya and Terrebonne) in the Mississippi River Delta of the United States. The subset of the GEDI L4A dataset for the Delta-X area can enable a comparison of aboveground biomass between GEDI L4A and the field measurements.

Learning Objectives

  • Use NASA’s Harmony Services to retrieve the GEDI L4A dataset. The Harmony API allows access to selected variables for the dataset within the spatial-temporal bounds without having to download the whole data file.

  • Compute summaries of AGBD across various plant functional types (PFTs)

Requirements#

While NASA’s Harmony services are available directly through RESTful API, we will use Harmony-Py Python library for this tutorial. Harmony-Py provides a friendly interface for integrating with NASA’s Harmony Services. In addition to the Harmony-Py python module, additional prerequisites are provided here.

# import python modules
from datetime import datetime
import earthaccess
import h5py
import pandas as pd
import geopandas as gpd
from harmony import BBox, Client, Collection, Request

Harmony Client#

NASA Harmony API requires NASA Earthdata Login (EDL). We recommend authenticating your Earthdata Login (EDL) information using the earthaccess python library as follows:

# works if the EDL login already been persisted to a netrc
try:
    auth = earthaccess.login(strategy="netrc")
except FileNotFoundError:
    # ask for EDL credentials and persist them in a .netrc file
    auth = earthaccess.login(strategy="interactive", persist=True)

Alternatively, you can also login to harmony_client directly by passing EDL authentication as the following in the Jupyter Notebook itself:

harmony_client = Client(auth=("your EDL username", "your EDL password"))

First, we create a Harmony Client object. If you are passing the EDL authentication, please do as shown above with auth parameter.

harmony_client = Client()

Retrieve Concept ID#

Now let’s retrieve the Concept ID of the GEDI L4A dataset. The Concept ID is NASA Earthdata’s unique ID for its dataset.

# GEDI L4A DOI
DOI = '10.3334/ORNLDAAC/2056'

def get_concept_id(doi):
    """get concept id from DOI using earthaccess"""
    col = earthaccess.search_datasets(doi=doi)
    for ds in col:
        # check if Harmony trajectory service exists
        if 'S2836723123-XYZ_PROV' in ds.services().keys():
            return(ds.concept_id())

concept_id = get_concept_id(DOI)
concept_id
'C2237824918-ORNL_CLOUD'

Define Request Parameters#

Let’s create a Harmony Collection object with the concept_id retrieved above. We will also define the GEDI L4A variables of interest and temporal range.

# harmony collection
collection = Collection(id=concept_id)

def create_var_names(gedi_vars):
    """combines variable names with the beam and returns as a list"""
    # gedi beams
    beams = ['BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011',
             'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011']
    # combine variables and beam names
    return [f'/{b}/{v}' for b in beams for v in gedi_vars]

variables = create_var_names(['agbd', 'l4_quality_flag'])
temporal_range = {'start': datetime(2019, 4, 1),
                  'stop': datetime(2023, 3, 31)}

We will use the spatial extent of a Pre-DeltaX Vegetation Structure dataset[Castaneda et al., 2020]. The location and aboveground biomass of the Salix nigra plots collected in the Spring of 2015 are provided as a GeoJSON file at polygons/atchafalaya_salix_spring15.json. Let’s open this file and compute its bound.

salix = gpd.read_file("polygons/atchafalaya_salix_spring15.json")
b = salix.total_bounds
# bounding box for Harmony
bounding_box = BBox(w=b[0], s=b[1], e=b[2], n=b[3])
# map of Salix plots
m = salix.explore("agb", color='red', style_kwds={"radius": 10}, legend=True)
m
Make this Notebook Trusted to load map: File -> Trust Notebook

Create and Submit Harmony Request#

Now, we can create a Harmony request with variables, temporal range, and bounding box and submit the request using the Harmony client object. We will use the download_all method, which uses a multithreaded downloader and returns a concurrent future. Futures are asynchronous and let us use the downloaded file as soon as the download is complete while other files are still being downloaded.

request = Request(collection=collection,
                  variables=variables,
                  temporal=temporal_range,
                  spatial=bounding_box,
                  ignore_errors=True)

# submit harmony request, will return job id
subset_job_id = harmony_client.submit(request)

print(f'Processing job: {subset_job_id}')

print('Waiting for the job to finish')
results = harmony_client.result_json(subset_job_id, show_progress=True)

print('Downloading subset files...')
futures = harmony_client.download_all(subset_job_id,
                                      directory='subsets',
                                      overwrite=True)
downloaded = []
for f in futures:
    # all subsetted files have this suffix
    if f.result().endswith('subsetted.h5'):
        downloaded.append(f.result())

print('Done downloading files')
Processing job: 1f4ccffb-bca5-4c33-9c0e-56ffd8587224
Waiting for the job to finish
 [ Processing:  81% ] |#########################################          | [\]
Job is running with errors.
 [ Processing: 100% ] |###################################################| [|]
Downloading subset files...
subsets/90785461_GEDI04_A_2019200010439_O03388_02_T00337_02_002_02_V002_subsetted.h5
subsets/90785460_GEDI04_A_2019148212155_O02594_02_T04606_02_002_02_V002_subsetted.h5
subsets/90785463_GEDI04_A_2020126054750_O07904_02_T00337_02_002_02_V002_subsetted.h5
subsets/90785464_GEDI04_A_2020176175100_O08687_03_T04351_02_002_02_V002_subsetted.h5
subsets/90785462_GEDI04_A_2020102150820_O07538_02_T04606_02_002_02_V002_subsetted.h5
subsets/90785465_GEDI04_A_2020180161620_O08748_03_T00082_02_002_02_V002_subsetted.h5
subsets/90785466_GEDI04_A_2020298094127_O10573_02_T07452_02_002_02_V002_subsetted.h5
subsets/90785467_GEDI04_A_2021081223144_O12891_02_T06029_02_002_02_V002_subsetted.h5
subsets/90785469_GEDI04_A_2021149035919_O13933_03_T10043_02_002_02_V002_subsetted.h5
subsets/90785472_GEDI04_A_2022014085738_O17501_03_T05774_02_002_02_V002_subsetted.h5
subsets/90785468_GEDI04_A_2021085205857_O12952_02_T10298_02_002_02_V002_subsetted.h5
subsets/90785470_GEDI04_A_2021268203812_O15788_02_T06029_02_002_02_V002_subsetted.h5
subsets/90785471_GEDI04_A_2021308125851_O16403_03_T07197_02_002_02_V002_subsetted.h5
subsets/90785477_GEDI04_A_2022266205747_O21417_02_T06029_02_003_01_V002_subsetted.h5
subsets/90785476_GEDI04_A_2022240072928_O21005_02_T10298_02_003_01_V002_subsetted.h5
subsets/90785478_GEDI04_A_2022331191527_O22424_02_T00337_02_003_01_V002_subsetted.h5
Done downloading files

Read Subset files#

All the subsetted files are saved as _subsetted.h5. Let’s read these h5 files into the pandas dataframe.

subset_df = pd.DataFrame()
for subfile in downloaded:
    with h5py.File(subfile, 'r') as hf_in:
        for v in list(hf_in.keys()):
            if v.startswith('BEAM'):
                beam = hf_in[v]
                col_names = []
                col_val = []
                # read all variables
                for key, value in beam.items():
                    # check if the item is a group
                    if isinstance(value, h5py.Group):
                        # looping through subgroups
                        for key2, value2 in value.items():
                            col_names.append(key2)
                            col_val.append(value2[:].tolist())
                    else:
                        col_names.append(key)
                        col_val.append(value[:].tolist())

                # Appending to the subset_df dataframe
                beam_df = pd.DataFrame(map(list, zip(*col_val)), columns=col_names)
                subset_df = pd.concat([subset_df, beam_df])

# print head of dataframe
subset_df.head()
agbd delta_time l4_quality_flag lat_lowestmode lon_lowestmode shot_number
0 11.257590 4.873545e+07 0 29.484904 -91.425244 33880300200240209
1 10.988078 4.873545e+07 0 29.485281 -91.424842 33880300200240210
2 10.737959 4.873545e+07 0 29.485658 -91.424439 33880300200240211
3 11.125208 4.873545e+07 0 29.486035 -91.424037 33880300200240212
0 10.388836 4.873545e+07 0 29.484931 -91.429370 33880500200240091

Quality Filter and Plot#

We can now quality filter the dataset and only retrieve the good quality shots.

gdf = gpd.GeoDataFrame(subset_df,
                       geometry=gpd.points_from_xy(subset_df.lon_lowestmode,
                                                   subset_df.lat_lowestmode), 
                       crs="EPSG:4326")

# creating mask with good quality shots
mask = gdf['l4_quality_flag']==1
# plotting
gdf = gdf[['lat_lowestmode', 'lon_lowestmode', 'agbd', 'geometry']]
m = gdf[mask].explore("agbd", style_kwds={"radius": 5}, legend=True)
m
Make this Notebook Trusted to load map: File -> Trust Notebook

Now that we have GEDI data downloaded, can you compare the aboveground estimates of GEDI L4A with that of the Pre-DeltaX vegetation dataset?

References#

[1]

E. Castaneda, A.I. Christensen, M. Simard, A. Bevington, R. Twilley, and A. Mccall. Pre-Delta-X: Vegetation Species, Structure, Aboveground Biomass, MRD, LA, USA, 2015. 2020. doi:10.3334/ORNLDAAC/1805.

[2]

R.O. Dubayah, J. Armston, J.R. Kellner, L. Duncanson, S.P. Healey, P.L. Patterson, S. Hancock, H. Tang, J. Bruening, M.A. Hofton, J.B. Blair, and S.B. Luthcke. GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1. 2022. doi:10.3334/ORNLDAAC/2056.