Exploring GEDI L4A data structure

Exploring GEDI L4A data structure#

This tutorial will explore data structure, variables, and quality flags of the Global Ecosystem Dynamics Investigation (GEDI) L4A Footprint Level Aboveground Biomass Density (AGBD)[Dubayah et al., 2022] dataset. GEDI L4A dataset is currently available for the period starting 2019-04-17 and covers 52 North to 52 South latitudes.

Details about the GEDI L4A dataset are available in the User Guide and the Algorithm Theoretical Basis Document (ATBD)] [Kellner et al., 2022]. Please refer to other tutorials for searching and downloading the GEDI L4A dataset and subsetting GEDI L4A footprints by area of interest.

Learning Objectives

Understand GEDI L4A data structure, variables and quality flags (study area).

Requirements#

Additional prerequisites are provided here.

# import all the modules
from os import path
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
import h5py
import earthaccess
from shapely.geometry import Point
from matplotlib_scalebar.scalebar import ScaleBar
from matplotlib.lines import Line2D
from matplotlib import colormaps
from matplotlib.colors import ListedColormap
from scipy import stats
import contextily as ctx
plt.style.use('seaborn-v0_8-whitegrid')

2. Example file#

In this tutorial, we will use one GEDI L4A file, GEDI04_A_2020207182449_O09168_03_T03028_02_002_02_V002.h5. Let’s first search the file using the earthaccess module. Please refer to the previous tutorial for more details on downloading programmatically.

# GEDI L4a file
GEDIL4A = 'GEDI04_A_2020207182449_O09168_03_T03028_02_002_02_V002.h5'

# authenticate using the EDL login
try:
    auth = earthaccess.login(strategy="netrc")
except FileNotFoundError:
    auth = earthaccess.login(strategy="interactive", persist=True)

# search the granule
granules = earthaccess.search_data(
    doi="10.3334/ORNLDAAC/2056", # GEDI L4A DOI
    granule_name = f"*{GEDIL4A}",
)
# print the granule details
granules[0]

Data: GEDI04_A_2020207182449_O09168_03_T03028_02_002_02_V002.h5

Size: 358.96 MB

Cloud Hosted: True

The file is ~359 MB in size. We will go ahead and use earthaccess python module to get the file directly and save it to the full_orbits directory.

# download files to full_orbits directory
downloaded_files = earthaccess.download(granules, local_path="full_orbits")

GEDI footprints data, including the Level 4A, are natively in HDF5 format, and each file represents one of the four International Space Station (ISS). The file naming convention of the GEDI L4A is as GEDI04_A_YYYYDDDHHMMSS_O[orbit_number]_[granule_number]_T[track_number]_[PPDS_type]_ [release_number]_[production_version]_V[version_number].h5:

GEDI04_A: product short name
YYYYDDDHHMMSS: date and time of acquisition in Julian day of year, hours, minutes, and seconds format
[orbit_number]: orbit number
[granule_number]: sub-orbit granule (or file) number
[track_number]: track number
[PPDS_type]: positioning and pointing determination system (PPDS) type (00: predict, 01: rapid, > 02: final)
[release_number]: release number 002, representing the SOC SDS (software) release
[production_version]: granule production version
[version_number]: dataset version

The GEDI L4A file GEDI04_A_2020207182449_O09168_03_T03028_02_002_02_V002.h5 represents orbit number 09168, sub-orbit 03, and track number 03028 for the starting date-time in YYYYDDDHHMMSS format 2020207182449 or 2020-07-25 18:24:49 UTC. The file has the PPDS type of 02, which means it is final quality data. The file has a granule production version of 02 and Version 002 (Release 002).

3. Root Groups#

Let’s open this file using the h5py python library and print the root-level groups of the file.

l4af = path.join('full_orbits', GEDIL4A)
# read the L4A file
hf = h5py.File(l4af, 'r')
# printing root-level groups
list(hf.keys())

['ANCILLARY',
 'BEAM0000',
 'BEAM0001',
 'BEAM0010',
 'BEAM0011',
 'BEAM0101',
 'BEAM0110',
 'BEAM1000',
 'BEAM1011',
 'METADATA']

All science variables are organized by eight beams of the GEDI. Please refer to the GEDI L4A user guide and GEDI L4A data dictionary for details. The root-level group also includes ANCILLARY and METADATA groups.

3. METADATA group#

The METADATA group contains information about the file metadata. All the metadata attributes are provided in the DatasetIdentification group within METADATA. Let’s print its content.

# read the METADATA group
metadata = hf['METADATA/DatasetIdentification']
# store attributes and descriptions in an array
data = []
for attr, desc in metadata.attrs.items():
    data.append([attr, desc])

# display `data` as a pandas table
df = pd.DataFrame(data, columns=['attribute', 'description'])
tbl_n = 1 # table number
caption = f'Table {tbl_n}. Attributes and discription from `METADATA` group'
df.style.set_caption(caption).hide(axis="index")

Table 1. Attributes and discription from `METADATA` group
attribute	description
PGEVersion	002
VersionID	02
abstract	The GEDI L4A standard data product contains predictions of aboveground biomass density within each laser footprint.
characterSet	utf8
creationDate	2022-02-21T10:02:04.611768Z
credit	The software that generates the L4A product was implemented at the Department of Geographical Sciences at the University of Maryland (UMD), in collaboration with the GEDI Science Data Processing System at the NASA Goddard Space Flight Center (GSFC) in Greenbelt, Maryland and the Institute at Brown for Environment and Society at Brown University.
fileName	GEDI04_A_2020207182449_O09168_03_T03028_02_002_02_V002.h5
gedi_l4a_githash	15f188fa6d18cd509186d8d36bea668faa22b1f5
language	eng
originatorOrganizationName	GSFC GEDI-SDPS > GEDI Science Data Processing System and University of Maryland
purpose	The purpose of the L4A dataset is to provide an estimate of aboveground biomass density from each GEDI waveform.
shortName	GEDI_L4A
spatialRepresentationType	along-track
status	onGoing
topicCategory	geoscientificInformation
uuid	97b127da-9915-4e71-8929-fdae799447b0

4. ANCILLARY group#

The ANCILLARY group has three sub-groups - model_data, pft_lut, and region_lut.

# read the ANCILLARY group
ancillary = hf['ANCILLARY']
# print the subgroups
list(ancillary.keys())

['model_data', 'pft_lut', 'region_lut']

The model_data subgroup contains information about parameters and variables from the L4A models used to generate predictions. The details about model_data variables are available in ‘Table 6’ of the user guide. Let’s print the model_data variables.

# read model_data subgroup
model_data = ancillary['model_data']
# print variables, data types, data dimension
model_data.dtype

dtype([('predict_stratum', 'O'), ('model_group', 'u1'), ('model_name', 'O'), ('model_id', 'u1'), ('x_transform', 'O'), ('y_transform', 'O'), ('bias_correction_name', 'O'), ('fit_stratum', 'O'), ('rh_index', 'u1', (8,)), ('predictor_id', 'u1', (8,)), ('predictor_max_value', '<f4', (8,)), ('vcov', '<f8', (5, 5)), ('par', '<f8', (5,)), ('rse', '<f4'), ('dof', '<u4'), ('response_max_value', '<f4'), ('bias_correction_value', '<f4'), ('npar', 'u1')])

Some model_data parameters have more than one dimension (e.g., vcov, which is the variance-covariance matrix of model parameters, has a dimension of 5 x 5). First, let’s read all one-dimensional variables into a pandas dataframe.

# initialize an empty dataframe
model_data_df = pd.DataFrame()
# loop through parameters
for v in model_data.dtype.names:
    # exclude multidimensional variables
    if len(model_data[v].shape) == 1:
        # copy parameters as dataframe column
        model_data_df[v] = model_data[v]
        # converting object datatype to string
        if model_data_df[v].dtype.kind=='O':
            model_data_df[v] = model_data_df[v].str.decode('utf-8')

# print the parameters
tbl_n += 1
caption = f'Table {tbl_n}. Parameters and their values in `model_data` subgroup'
model_data_df.style.set_caption(caption).hide(axis="index")

Table 2. Parameters and their values in `model_data` subgroup
predict_stratum	model_group	model_name	model_id	x_transform	y_transform	bias_correction_name	fit_stratum	rse	dof	response_max_value	bias_correction_value	npar
DBT_Af	4	DBT_Af	3	sqrt	sqrt	Snowdon	DBT_Af	1.950498	487	386.758881	1.092463	3
DBT_Au	4	EBT_Au	1	sqrt	sqrt	Snowdon	EBT_Au	2.923645	210	830.582031	1.017920	3
DBT_Eu	4	DBT_Eu	1	sqrt	sqrt	Snowdon	Eu	2.818270	747	724.057983	0.962632	3
DBT_NAs	4	DBT_coarse	1	sqrt	sqrt	Snowdon	DBT	2.661183	1695	724.057983	1.016632	3
DBT_SA	4	EBT_SA	1	sqrt	sqrt	Snowdon	EBT_SA	3.439619	3438	1578.000000	1.105528	3
DBT_SAs	4	EBT_coarse	1	sqrt	sqrt	Snowdon	EBT	3.920913	4811	1578.000000	1.113366	3
DBT_NAm	4	DBT_NAm	1	sqrt	sqrt	Snowdon	NAm	3.126976	2279	1768.699341	1.052422	3
EBT_Af	4	EBT_Af	1	sqrt	sqrt	Snowdon	EBT	3.920913	4811	1578.000000	1.113366	3
EBT_Au	4	EBT_Au	1	sqrt	sqrt	Snowdon	EBT_Au	2.923645	210	830.582031	1.017920	3
EBT_Eu	4	DBT_Eu	1	sqrt	sqrt	Snowdon	Eu	2.818270	747	724.057983	0.962632	3
EBT_NAs	4	DBT_coarse	1	sqrt	sqrt	Snowdon	DBT	2.661183	1695	724.057983	1.016632	3
EBT_SA	4	EBT_SA	1	sqrt	sqrt	Snowdon	EBT_SA	3.439619	3438	1578.000000	1.105528	3
EBT_SAs	4	EBT_coarse	1	sqrt	sqrt	Snowdon	EBT	3.920913	4811	1578.000000	1.113366	3
EBT_NAm	4	DBT_NAm	1	sqrt	sqrt	Snowdon	NAm	3.126976	2279	1768.699341	1.052422	3
ENT_Af	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
ENT_Au	3	ENT_Au	1	sqrt	sqrt	Snowdon	ENT_Au	3.075415	139	571.716431	0.897592	3
ENT_Eu	4	ENT_Eu	1	sqrt	sqrt	Snowdon	Eu	2.818270	747	724.057983	0.962632	3
ENT_NAs	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
ENT_SA	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
ENT_SAs	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
ENT_NAm	4	ENT_NAm	2	sqrt	sqrt	Snowdon	ENT	3.310075	1983	1768.699341	1.012534	3
DNT_Af	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
DNT_Au	3	ENT_Au	1	sqrt	sqrt	Snowdon	ENT_Au	3.075415	139	571.716431	0.897592	3
DNT_Eu	4	ENT_Eu	1	sqrt	sqrt	Snowdon	Eu	2.818270	747	724.057983	0.962632	3
DNT_NAs	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
DNT_SA	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
DNT_SAs	4	ENT_coarse	4	sqrt	sqrt	Snowdon	ENT	3.292492	1983	1768.699341	1.018368	3
DNT_NAm	4	ENT_NAm	2	sqrt	sqrt	Snowdon	ENT	3.310075	1983	1768.699341	1.012534	3
GSW_Af	4	GSW_coarse	1	sqrt	sqrt	Snowdon	GSW	1.626275	87	192.562592	1.118195	2
GSW_Au	4	GSW_Au	3	sqrt	sqrt	Snowdon	GSW	1.637400	85	192.562592	1.127970	4
GSW_Eu	4	GSW_coarse	1	sqrt	sqrt	Snowdon	GSW	1.626275	87	192.562592	1.118195	2
GSW_NAs	4	GSW_coarse	1	sqrt	sqrt	Snowdon	GSW	1.626275	87	192.562592	1.118195	2
GSW_SA	4	GSW_coarse	1	sqrt	sqrt	Snowdon	GSW	1.626275	87	192.562592	1.118195	2
GSW_SAs	4	GSW_coarse	1	sqrt	sqrt	Snowdon	GSW	1.626275	87	192.562592	1.118195	2
GSW_NAm	4	GSW_coarse	1	sqrt	sqrt	Snowdon	GSW	1.626275	87	192.562592	1.118195	2

The predict_stratum names are divided into two parts - the first part corresponding to PFTs and the second part to world regions.

PFTs (corresponding MCD12Q1 V006 classes are in parenthesis)
DBT: deciduous broadleaf trees (Class 4), DNT: deciduous needleleaf trees (Class 3), EBT: evergreen broadleaf trees (Class 2), ENT: evergreen needleleaf trees (Class 1), GSW: grasses, shrubs and woodlands (Classes 5, 6, 11)

World Regions
Af: Africa, Au: Australia and Oceania, Eu: Europe, NAm : North America, NAs: North Asia, SA: South America, SAs : South Asia

The information on PFTs and World Regions are also provided in pft_lut and region_lut groups within the ANCILLARY group.

# read pft_lut subgroup
pft_lut = ancillary['pft_lut']
headers = pft_lut.dtype.names
# print pft class and names
df = pd.DataFrame(zip(pft_lut[headers[0]], pft_lut[headers[1]]),
                  columns=headers) 
df.pft_name = df.pft_name.apply(lambda x: x.decode('utf-8'))
df.style.hide(axis="index")

pft_class	pft_name
1	ENT
3	DNT
2	EBT
4	DBT
5	GSW
6	GSW
11	GSW

# read region_lut subgroup
region_lut = ancillary['region_lut']
headers = region_lut.dtype.names
# print region class and names
df = pd.DataFrame(zip(region_lut[headers[0]], region_lut[headers[1]]),
                  columns=headers) 
df.region_name = df.region_name.apply(lambda x: x.decode('utf-8'))
df.style.hide(axis="index")

region_class	region_name
1	Eu
2	NAs
5	SAs
3	Au
4	Af
6	SA
7	NAm

In Table 2 above, the model_group column represents model group:

Model Group
1: all predictors considered, 2: no RH metrics below RH50, 3: forced inclusion of RH98, 4: forced inclusion of RH98 and no RH metrics below RH50

x_transform and y_transform are predictor and response transform functions (e.g., sqrt, log). The column bias_correction_name specifies the bias correction methods used - Snowdon (1991) or Baskerville (1972). For details, please refer to Section 4.3 of the ATBD.

To retrieve values from multidimensional variables, we can directly slice using the index from pandas dataframe model_data_df. For example, to print the variance-covariance matrix of the DBT_NAm prediction stratum (deciduous broadleaf trees North America, index=6), we can use:

# index of DBT_NAm predict_stratum, idx = 6
idx = model_data_df[model_data_df['predict_stratum']=='DBT_NAm'].index.item()
# print vcov matrix
model_data['vcov'][idx]

array([[ 4.19623613, -0.30473042, -0.09762442,  0.        ,  0.        ],
       [-0.30473042,  0.05645894, -0.02532849,  0.        ,  0.        ],
       [-0.09762442, -0.02532849,  0.03291459,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

Let’s also print out the variables predictor_id, rh_index, and par for the DBT_NAm prediction stratum.

## get predictor_id, rh_index and par for idx = 6
predictor_id = model_data['predictor_id'][idx]
rh_index = model_data['rh_index'][idx]
par = model_data['par'][idx]

# print
print_s = f"""predictor_id: {predictor_id}
rh_index: {rh_index}
par: {par}"""
print(print_s)

predictor_id: [1 2 0 0 0 0 0 0]
rh_index: [50 98  0  0  0  0  0  0]
par: [-120.77709198    5.50771856    6.80801821    0.            0.        ]

The predictor_id variable gives the identifier of the predictor variable. The same consecutive predictor values indicate that the product of two metrics was used in the linear model. The variable rh_index gives the index of the relative height (RH) metrics used as predictors. For DBT_NAm stratum, two predictors RH50 and RH98 were used. The variable par gives model coefficients - the first element of the list is always the intercept of the linear model.

We can use the values from these three variables (predictor_id, rh_index, and par) to construct the linear AGBD model for each prediction strata. Refer to GEDI_L4A_Common_Queries for more information on how to assign coefficients and predictor variables to the linear models.

# initialize arrays
stratum_arr, modelname_arr, fitstratum_arr, agbd_arr = [], [], [], []
# loop the model_data_df dataframe
for idx, row in model_data_df.iterrows():
    stratum_arr.append(model_data_df['predict_stratum'][idx])
    modelname_arr.append(model_data_df['model_name'][idx])
    fitstratum_arr.append(model_data_df['fit_stratum'][idx])
    i_0 = 0
    predictor_id = model_data['predictor_id'][idx]
    rh_index = model_data['rh_index'][idx]
    par = model_data['par'][idx]
    model_str = 'AGBD = ' + str(par[0]) # intercept
    for i in predictor_id[predictor_id>0]:
        # use product of two RH metrics when consecutive
        # predictor_id have same values
        if i == i_0:
            model_str += ' x RH_' + str(rh_index[i-1])
        # adding slope coefficients
        else:
            model_str += ' + ' + str(par[i]) + ' x RH_' + str(rh_index[i-1])
        i_0 = i
    # agbd model
    agbd_arr.append(model_str)

# unique agbd models
unique_models = list(set(agbd_arr))

# printing agbd models by predict_stratum
data=[]
for model in unique_models:
    s, m, f = [], [], []
    for i, x in enumerate(agbd_arr):
        if x == model:
            s.append(stratum_arr[i])
            m.append(modelname_arr[i])
            f.append(fitstratum_arr[i])
    data.append([", ".join(s), ", ".join(list(set(m))),
                 ", ".join(list(set(f))), model])

headers = ["predict_stratum", "model_name", "fit_stratum", "AGBD model"]
df = pd.DataFrame(data, columns=headers)
tbl_n += 1
caption = f'Table {tbl_n}. AGBD Linear Models by Prediction Stratum'
df.style.set_caption(caption).hide(axis="index")

Table 3. AGBD Linear Models by Prediction Stratum
predict_stratum	model_name	fit_stratum	AGBD model
DBT_Au, EBT_Au	EBT_Au	EBT_Au	AGBD = -155.41419982910156 + 7.816701889038086 x RH_70 + 7.7095465660095215 x RH_98
DBT_Af	DBT_Af	DBT_Af	AGBD = -118.40806579589844 + 1.956794023513794 x RH_50 + 9.96180534362793 x RH_98
ENT_Au, DNT_Au	ENT_Au	ENT_Au	AGBD = -101.98419952392578 + 6.396724700927734 x RH_40 + 4.258835315704346 x RH_98
DBT_SA, EBT_SA	EBT_SA	EBT_SA	AGBD = -134.77015686035156 + 6.653591632843018 x RH_50 + 6.687118053436279 x RH_98
ENT_Af, ENT_NAs, ENT_SA, ENT_SAs, DNT_Af, DNT_NAs, DNT_SA, DNT_SAs	ENT_coarse	ENT	AGBD = -118.41133880615234 + 7.776707172393799 x RH_60 + 4.377904415130615 x RH_98
GSW_Au	GSW_Au	GSW	AGBD = -151.3834991455078 + 4.490713596343994 x RH_50 + -2.346837282180786 x RH_80 + 12.941423416137695 x RH_98
DBT_SAs, EBT_Af, EBT_SAs	EBT_coarse, EBT_Af	EBT	AGBD = -104.9654541015625 + 6.802174091339111 x RH_50 + 3.9553122520446777 x RH_98
GSW_Af, GSW_Eu, GSW_NAs, GSW_SA, GSW_SAs, GSW_NAm	GSW_coarse	GSW	AGBD = -124.83228302001953 + 12.426309585571289 x RH_98
DBT_Eu, EBT_Eu, ENT_Eu, DNT_Eu	ENT_Eu, DBT_Eu	Eu	AGBD = -96.53070068359375 + 7.175395488739014 x RH_70 + 2.9214682579040527 x RH_98
ENT_NAm, DNT_NAm	ENT_NAm	ENT	AGBD = -114.35456848144531 + 8.400657653808594 x RH_70 + 3.3459579944610596 x RH_98
DBT_NAs, EBT_NAs	DBT_coarse	DBT	AGBD = -110.05912780761719 + 5.133802890777588 x RH_60 + 6.171630382537842 x RH_98
DBT_NAm, EBT_NAm	DBT_NAm	NAm	AGBD = -120.77709197998047 + 5.507718563079834 x RH_50 + 6.808018207550049 x RH_98

As we see in the table above, some predict strata share the same AGBD model. There are a total of 12 AGBD models representing 35 prediction strata.

5. BEAM groups#

GEDI instrument has two full-power lasers, each producing two beams and one coverage laser producing four beams, generating a total of 8 beams. GEDI L4A data are grouped by these beams (group name starting with BEAMXXXX). Let’s print the name and description of BEAM groups. Each of the beams is assigned beam and channel numbers.

data = []
# loop through the root groups
for v in list(hf.keys()):
    if v.startswith('BEAM'):
        beam = hf.get(v)
        b_beam = beam.get('beam')[0]
        channel = beam.get('channel')[0]
        data.append([v, hf[v].attrs['description'], b_beam, channel])

# print as a table
headers = ["beam name", "description", "beam", "channel"]
df = pd.DataFrame(data, columns=headers)
tbl_n += 1
caption = f'Table {tbl_n}. GEDI Beams'
df.style.set_caption(caption).hide(axis="index")

Table 4. GEDI Beams
beam name	description	beam	channel
BEAM0000	Coverage beam	0	0
BEAM0001	Coverage beam	1	0
BEAM0010	Coverage beam	2	1
BEAM0011	Coverage beam	3	1
BEAM0101	Full power beam	5	2
BEAM0110	Full power beam	6	3
BEAM1000	Full power beam	8	4
BEAM1011	Full power beam	11	5

Let’s look into the data structure of one of the beams, BEAM0110, a full-powered beam. All of the 8 beams have the same set of variables and variable groups.

# get BEAM0110
beam_str = 'BEAM0110'
beam0110 = hf[beam_str]

data = []

# loop over all the variables within BEAM0110 group
for v, var in beam0110.items():
    source = ''
    # if the key is a subgroup assign GROUP tag
    if isinstance(var, h5py.Group):
        data.append([v, 'GROUP', 'GROUP', 'GROUP'])
    # read source, description, units attributes of each variables
    else:
        if 'source' in var.attrs.keys():
            source = var.attrs['source']
        data.append([v, var.attrs['description'], var.attrs['units'], source])

# print all variable name and attributes as a table
headers = ["variable", "description", "units", "source"]
df = pd.DataFrame(data, columns=headers)
tbl_n += 1
caption = f'Table {tbl_n}. Variables within {beam_str} group'
df.style.set_caption(caption).hide(axis="index")

Table 5. Variables within BEAM0110 group
variable	description	units	source
agbd	Aboveground biomass density (Mg / ha)	Mg / ha
agbd_pi_lower	Lower prediction interval (see alpha attribute for the level)	Mg / ha
agbd_pi_upper	Upper prediction interval (see alpha attribute for the level)	Mg / ha
agbd_prediction	GROUP	GROUP	GROUP
agbd_se	Aboveground biomass density (Mg / ha) prediction standard error	Mg / ha
agbd_t	Model prediction in fit units	-
agbd_t_se	Model prediction standard error in fit units (needed for calculation of custom prediction intervals)	-
algorithm_run_flag	The L4A algorithm is run if this flag is set to 1. This flag selects data which have sufficient waveform fidelity for AGBD estimation.	-
beam	Beam identifier	-	L2A
channel	Channel identifier	-	L2A
degrade_flag	Flag indicating degraded state of pointing and/or positioning information	-	L2A
delta_time	Time delta since Jan 1 00:00 2018.	s	L2A
elev_lowestmode	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
geolocation	GROUP	GROUP	GROUP
l2_quality_flag	Flag identifying the most useful L2 data for biomass predictions	-
l4_quality_flag	Flag simplifying selection of most useful biomass predictions	-
land_cover_data	GROUP	GROUP	GROUP
lat_lowestmode	Latitude of center of lowest mode	degrees	L2A
lon_lowestmode	Longitude of center of lowest mode	degrees	L2A
master_frac	Master time, fractional part. master_int+master_frac is equivalent to /BEAMXXXX/delta_time.	s	L2A
master_int	Master time, integer part. Seconds since master_time_epoch. master_int+master_frac is equivalent to /BEAMXXXX/delta_time.	s	L2A
predict_stratum	Character ID of the prediction stratum name for the 1 km cell	-
predictor_limit_flag	Predictor value is outside the bounds of the training data (0=In bounds; 1=Lower bound; 2=Upper bound)	-
response_limit_flag	Prediction value is outside the bounds of the training data (0=In bounds; 1=Lower bound; 2=Upper bound)	-
selected_algorithm	selected_algorithm	-	L2A
selected_mode	ID of mode selected as lowest non-noise mode	-	L2A
selected_mode_flag	Flag indicating status of selected_mode	-	L2A
sensitivity	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
shot_number	Shot number	-	L2A
solar_elevation	Solar elevation angle	degrees	L2A
surface_flag	Indicates elev_lowestmode is within 300m of Digital Elevation Model (DEM) or Mean Sea Surface (MSS) elevation	-	L2A
xvar	Predictor variables (offset and transformation have been applied)	-

‘L2A’ in the ‘source’ column indicates that the variable originates from the GEDI Level 2A dataset, which provides footprint level elevation and canopy height metrics. Let’s look into each of the variables in more detail. The AGBD variables include agbd, agbd_pi_lower, agbd_pi_upper, agbd_prediction, agbd_se. The variable delta_time provides time for each shot. shot_number is a unique number assigned to each GEDI shot. Quality flags are included in l2_quality_flag and l4_quality_flag, among others. Geolocation of shots is provided in lat_lowestmode, lon_lowestmode, and elev_lowestmode. Let’s use this information to plot the path of each of the eight beams. First, let’s read the latitudes, longitudes, and time variables into a pandas dataframe.

# initialize lists
lat_l = []
lon_l = []
beam_n = []
time_l=[]

# loop over all base groups
for var in list(hf.keys()):
    if var.startswith('BEAM'):
        # reading lat, lon, time
        beam = hf.get(var)
        lat = beam.get('lat_lowestmode')[:] # latitude
        lon = beam.get('lon_lowestmode')[:] # longitude
        time = beam.get('delta_time')[:] # time
        # appending each beam into the array
        lat_l.extend(lat.tolist())
        lon_l.extend(lon.tolist())
        time_l.extend(time.tolist())
        # beam_n as a new column indicating beam number
        n = lat.shape[0] # number of shots in the beam group
        beam_n.extend(np.repeat(str(var), n).tolist())

# read the lists into a dataframe
l4adf = pd.DataFrame(list(zip(beam_n,lat_l,lon_l,time_l)), 
                     columns=["beam", "lat_lowestmode", 
                              "lon_lowestmode", "delta_time"])

# print the first few shots of the BEAM0110
l4adf[l4adf['beam']=='BEAM0110'].head()

	beam	lat_lowestmode	lon_lowestmode	delta_time
835937	BEAM0110	51.793301	15.546454	8.093945e+07
835938	BEAM0110	51.793303	15.547279	8.093945e+07
835939	BEAM0110	51.793304	15.548105	8.093945e+07
835940	BEAM0110	51.793306	15.548934	8.093945e+07
835941	BEAM0110	51.793308	15.549704	8.093945e+07

We can now plot each of the eight beams. The total number of shots (n) in each beam is shown in the parenthesis.

# first converting pandas dataframe into geopandas
l4agdf = gpd.GeoDataFrame(l4adf, crs="EPSG:4326",
                          geometry=gpd.points_from_xy(l4adf.lon_lowestmode,
                                                      l4adf.lat_lowestmode))

# plot each beam as subplot
fig, axs = plt.subplots(nrows=2, ncols=4, sharex=True, sharey=True, figsize=(14,6))
fig_n = 1 # figure number
fig.suptitle(f'Figure {fig_n}. Plots of beams for {GEDIL4A}. Color indicates `delta_time`')
axs = axs.ravel()

# loop over each beam
for i, beam in enumerate(l4adf.beam.unique()):
    subdf = l4agdf[l4agdf['beam'] == beam]
    # color the beam using delta_time
    subdf.to_crs(epsg=3857).plot(ax=axs[i],column='delta_time', 
                                 cmap='viridis', markersize=2)
    axs[i].set_title(f'{beam}, n={str(len(subdf.index))}')
    axs[i].axis('off')
    ctx.add_basemap(axs[i], source=ctx.providers.CartoDB.Positron)

plt.show()

../_images/65b896245fd94c8bd4f5ea9e429ff33080231024aab6037f6d16b4fd2b36c27c.png

As shown in Figure 1 above, there are different numbers of shots in each beam group. The beams are usually always turned on over the land surface. This particular orbit scanned from west to east, as indicated by the delta_time variable (blue being the minimum and yellow the maximum).

The variable delta_time provides time for each shot in seconds since Jan 1 00:00 2018. We can convert the time to actual date and time using datetime module.

# datetime since
dt1 = datetime(2018, 1, 1, 0, 0, 0)
# min, max  and range datetime 
min_dt = dt1 + pd.to_timedelta(l4adf.delta_time.min(), unit='s')
max_dt = dt1 + pd.to_timedelta(l4adf.delta_time.max(), unit='s')
range_dt = max_dt - min_dt
# printing 
print_s = f"""Start time: {min_dt}
End time: {max_dt}
Total period: {range_dt}
Total shots: {len(l4adf.index)}"""
print(print_s)

Start time: 2020-07-25 19:10:52.328367
End time: 2020-07-25 19:34:02.323636
Total period: 0:23:09.995269
Total shots: 1336839

As shown above, the GEDI instrument acquired the first shot of this sub-orbit at 2020-07-25 19:10:52 and the last shot at 2020-07-25 19:34:02. In this 23 minutes period, all eight beams recorded a total of ~1.3 million shots!

To look at the spatial distribution of the GEDI shots, let’s zoom in to an area in the Mount Leuser National Park in Indonesia, where the orbit overpasses, and plot all the beams together.

# Mt Leuser Buffer of 6km
mt_leuser_gdf= gpd.GeoSeries([Point((97.1736, 3.7563))], crs="EPSG:4326")
mt_leuser_3857_gdf = mt_leuser_gdf.to_crs(epsg=3857)
mt_leuser_buffer = mt_leuser_3857_gdf.buffer(6000)

# assigning CRS of base map
l4agdf_3857 = l4agdf.to_crs(epsg=3857)
# clipping the beam_var to the buffer
leuser_gdf = l4agdf_3857[l4agdf_3857['geometry'].within(
    mt_leuser_buffer.geometry[0])]

# plotting the shots
ax = leuser_gdf.plot(alpha=0.8,  column='delta_time',
                     cmap='viridis', figsize=(7, 7))
# plotting buffer circle
mt_leuser_buffer.plot(ax=ax,color='white',
                      edgecolor='red', linewidth=5, alpha=0.4)
# plotting Mt Leuser
mt_leuser_3857_gdf.plot(ax=ax, color='red')

# Annotating Mt. Leuser
ax.annotate('MT LEUSER', xy=(mt_leuser_3857_gdf.geometry.x[0],
                             mt_leuser_3857_gdf.geometry.y[0]),
            xytext=(3, 3), textcoords="offset points", weight='bold')
j = 0
for i, beam in enumerate(l4adf.beam.unique()):
    firstrow = leuser_gdf[leuser_gdf['beam']==beam].iloc[j]
    # Annotating the beam number
    ax.annotate(str(firstrow.beam),
                xy=(firstrow.geometry.x, firstrow.geometry.y),
                arrowprops={"arrowstyle": '->'},
                xytext=(3, 10), textcoords="offset points", weight='bold')
    # display labels apart by 10
    j +=10
# scalebar
ax.add_artist(ScaleBar(1))
fig_n += 1
ax.set_title(f'Figure {fig_n}. {GEDIL4A} beams within a 6 km buffer around \
Mt Leuser, Indonesia')
#adding basemap
ctx.add_basemap(ax, source=ctx.providers.OpenTopoMap)
plt.show()

../_images/a295cf04410cdc0dde290e2478ad6a6f4850abda8b41375a604cdf71ff5786ac.png

GEDI beams are ~600m apart across the track. The distance between the two outer beams is ~4.2 km.

Now, let’s look at variables within one of the beam groups, BEAM0110, in more detail. We will read all the base variables within this beam group into a pandas dataframe.

# arrays to store variable names and values
var_names = []
var_val = []
# read BEAM0110
beam0110 = hf.get('BEAM0110')
# loop through all the items in BEAM0110 and append
# to the arrays var_names, var_val
for key, value in beam0110.items():
    if not isinstance(value, h5py.Group):
         # xvar variables have 2D
        if key.startswith('xvar'):
            for r in range(4):
                var_names.append(key + '_' + str(r+1))
                var_val.append(value[:, r].tolist())
        else:
            var_names.append(key)
            var_val.append(value[:].tolist())

# create a pandas dataframe
beam_df = pd.DataFrame(map(list, zip(*var_val)), columns=var_names)
# set shot_number as dataframe index
beam_df = beam_df.set_index('shot_number')

Now, the beam_df dataframe contains all the variables within the BEAM0110 group. Let’s convert it to a geopandas dataframe to assign spatial context.

# convert to a geopandas dataframe for plotting
beam_gdf = gpd.GeoDataFrame(beam_df,
                            geometry=gpd.points_from_xy(beam_df.lon_lowestmode,
                                                        beam_df.lat_lowestmode))
# assigning CRS of WGS84
beam_gdf.crs = "EPSG:4326"
# turning fill values (-9999) to nan
beam_gdf = beam_gdf.replace(-9999, np.nan)

5.1 Prediction Stratum#

Let’s compute percent shots by the predict_stratum variable for the BEAM0110.

# percent shots in each predict_stratum
predict_stratum=beam_gdf.predict_stratum.str.decode(
    'utf-8').value_counts(normalize=True).mul(100).round(3)
# plotting the values as bar diagram
ax = predict_stratum.plot(kind='bar', figsize=(20, 4))
ax.legend(["percent shots"])
fig_n += 1
ax.set_title(f'Figure {fig_n}. Percent shots by predict_stratum in {beam0110.name} \
of {GEDIL4A}')
# displaying values for each bar
for container in ax.containers:
    ax.bar_label(container)
plt.show()

../_images/bf1ff8747b4c05538f15ccea7e884b34e6b6bc30862b7ba1627e4fe69531aeb1.png

This particular suborbit passes through 10 different prediction strata, as shown in Figure 3, covering several forest regions. About 23% of the total shots have no prediction stratum assigned and are not in one of the 35 GEDI predict strata (e.g., water).

There are three sub-groups within each beam group indicated by ‘GROUP’ in Table 5 above - agbd_prediction, geolocation, and land_cover_data.

6. Land Cover Group#

The group land_cover_data within the beam group contains land cover information for the GEDI shots. Let’s read all the variables within the \BEAM0110\land_cover_data group.

var_names = []
var_val = []
data = []

# loop through all variables within land_cover_data subgroup
for key, value in beam0110['land_cover_data'].items():
    var_names.append(key)
    var_val.append(value[:].tolist())
    data.append([key, value.attrs['description'], value.attrs['units']])

# create a pandas dataframe        
beam_lc_df = pd.DataFrame(map(list, zip(*var_val)), columns=var_names)
beam_lc_df = beam_lc_df.set_index('shot_number')

# join with the previous dataframe conaining root variables
# pandas will use `shot_number` as a common index to join
beam_gdf = beam_gdf.join(beam_lc_df)

# print variable description as table
headers = ["variable", "description", "units"]
df = pd.DataFrame(data, columns=headers)
tbl_n += 1
caption = f'Table {tbl_n}. Variables within land_cover_data group'
df.style.set_caption(caption).hide(axis="index")

Table 6. Variables within land_cover_data group
variable	description	units
landsat_treecover	Tree cover in the year 2010, defined as canopy closure for all vegetation taller than 5m in height (Hansen et al.). Encoded as a percentage per output grid cell.	percent
landsat_water_persistence	The percent UMD GLAD Landsat observations with classified surface water between 2018 and 2019. Values > 80 usually represent permanent water while values < 10 represent permanent land.	percent
leaf_off_doy	GEDI 1 km EASE 2.0 grid leaf-off start day-of-year derived from the NPP VIIRS Global Land Surface Phenology Product.	days
leaf_off_flag	GEDI 1 km EASE 2.0 grid flag derived from leaf_off_doy, leaf_on_doy and pft_class, indicating if the observation was recorded during leaf-off conditions in deciduous needleleaf or broadleaf forests and woodlands. 1 = leaf-off and 0 = leaf-on.	-
leaf_on_cycle	Flag that indicates the vegetation growing cycle for leaf-on observations. Values are 0 (leaf-off conditions), 1 (cycle 1) or 2 (cycle 2).	-
leaf_on_doy	GEDI 1 km EASE 2.0 grid leaf-on start day-of-year derived from the NPP VIIRS Global Land Surface Phenology Product.	-
pft_class	GEDI 1 km EASE 2.0 grid Plant Functional Type (PFT) derived from the MODIS MCD12Q1v006 Product. Values follow the Land Cover Type 5 Classification scheme.	-
region_class	GEDI 1 km EASE 2.0 grid world continental regions (0: Water, 1: Europe, 2: North Asia, 3: Australasia, 4: Africa, 5: South Asia, 6: South America, 7: North America).	-
shot_number	Shot number	-
urban_focal_window_size	The focal window size used to calculate urban_proportion. Values are 3 (3x3 pixel window size) or 5 (5x5 pixel window size).	pixels
urban_proportion	The percentage proportion of land area within a focal area surrounding each shot that is urban land cover. Urban land cover is derived from the DLR 12 m resolution TanDEM-X Global Urban Footprint Product.	percent

Let’s look at some key variables in more detail.

6.1 Plant Functional Types#

The pft_class variable provides the plant functional types for the shots, which are derived from the MODIS land cover type product (MCD12Q1). It uses LC_Type5 or Annual PFT classification. Please refer to Table 7 of the MCD12Q1 user guide for the PFT type legend. Let’s plot distribution of shots by PFT class for the BEAM0110.

# percentage of shots by pft_class
pft_class=beam_gdf.pft_class.value_counts(normalize=True).mul(100).round(3).sort_index()

# MCD12Q1 PFT types
pft_legend = {'Water Bodies': 0, 'Evergreen Needleleaf Trees': 1,
              'Evergreen Broadleaf Trees': 2, 'Deciduous Needleleaf Trees': 3, 
              'Deciduous Broadleaf Trees': 4, 'Shrub': 5, 'Grass': 6,
              'Cereal Croplands': 7, 'Broadleaf Croplands': 8, 
              'Urban and Built-up Lands': 9, 
              'Permanent Snow and Ice': 10, 'Barren' : 11, 
              'Unclassified': 255}
# get color from tab20
cmap = colormaps.get_cmap('tab20')
patchList = []
tab20 = []
# creating a legend
for key, i in pft_legend.items():
    label = f'{key} [{str(i)}] - {pft_class.get(i, 0)}%'
    data_key = Line2D([0], [0], marker='o', color='w', label=label,
                          markerfacecolor=cmap(i), markersize=8)
    tab20.append(cmap(i))
    patchList.append(data_key)
newcmp = ListedColormap(tab20)

ax=beam_gdf.to_crs(epsg=3857).plot(column='pft_class', alpha=0.8,
                                   categorical=True, cmap=newcmp,
                                   legend_kwds={'facecolor': 'lightgray',
                                                'framealpha': 1},
                                   figsize=(12, 7), markersize=3, legend=True)
fig_n += 1
ax.set_title(f'Figure {fig_n}. Distribution of shots by PFT class in {beam0110.name} \
of {GEDIL4A}')
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
ax.legend(handles=patchList, bbox_to_anchor=(1, 1), loc='upper left')
plt.show()

../_images/4a655d7bbe1f815574ed0859e33d4444340d76d11fb2cf4e78e8e15b1ec433b5.png

As we noted earlier (Figures 3 and 4), water bodies constitute ~23% of the total shots.

6.2 Urban Proportion#

Another variable of interest is urban_proportion, which provides the proportion of urban land cover within the search area defined by urban_focal_window_size variable. The window size is defined as the number of pixels in the 25 m global urban mask developed by the GEDI science team using the TerraSAR-X and TanDEMX global urban footprint data products.

# subsetting the shots with urban cover > 50%
beam_urban_gdf = beam_gdf[beam_gdf['urban_proportion'] > 50]

ax = beam_gdf[beam_gdf['urban_proportion'] <= 50].to_crs(epsg=3857).plot(
    color='grey', alpha=0.01, markersize=2, linewidth=0, figsize=(12, 7))
beam_urban_gdf.to_crs(epsg=3857).plot(ax=ax,column='urban_proportion',
                                      alpha=0.4, cmap='seismic',
                                      markersize=4, legend=True,
                                      legend_kwds={'label': 'Urban Land Cover (%)'})

# figure title
n_urban = len(beam_urban_gdf.index)
percent_urban = (n_urban/len(beam_gdf.index)) * 100
fig_n += 1
ax.set_title(f'''Figure {fig_n}. Shots with proportion of urban land cover \
> 50% in {beam0110.name} of {GEDIL4A},
n = {len(beam_urban_gdf.index)} ({round(percent_urban,3)}%) total shots''')
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
plt.show()

../_images/734644615e382f836f9212730c1e5906b8e263274c6679fe19a2b15d04b519b6.png

As shown in Figure 5, 0.9% of the total GEDI BEAM0110 shots falls over the urban areas (>50% urban cover). The shots with <=50% urban land cover is shown in the grey color.

6.3 Phenology#

GEDI L4A footprints include phenology information derived from VIIRS/S-NPP Land Surface Phenology VNP22Q2 in the variables leaf_on_doy, leaf_off_doy and leaf_on_cycle. These variables together with the PFT are used to indicate whether the GEDI shots were recorded during the leaf-off or leaf-on conditions, provided in the leaf_off_flag. When the shot is collected after the onset of maximum greenness and before the midpoint of the senescence phase, they are marked with a flag of 1 (leaf-off).

Let’s plot the leaf_off_flag on a map.

# plotting leaf-on as grey
ax = beam_gdf[beam_gdf['leaf_off_flag']==0].to_crs(epsg=3857).plot(
    color='grey', alpha=0.01, linewidth=0, markersize=2, figsize=(12, 7))
# plotting leaf-off as red
leaf_on_gdf =beam_gdf[beam_gdf['leaf_off_flag']==1]
leaf_on_gdf.to_crs(epsg=3857).plot(ax=ax, color='red', alpha=0.4, markersize=4)

# getting counts by pft_class
leaf_off=leaf_on_gdf.pft_class.value_counts()
# total shots with leaf-off
n_leaf_off = leaf_off.sum()
# pft classes of leaf-off shots
pft_leaf_off = ",".join([list(pft_legend.keys())[i] for i,v in leaf_off.items()])
# percent leaf off
percent_leaf_off = (n_leaf_off/len(beam_gdf.index)) * 100

# figure title
fig_n += 1
ax.set_title(f'''Figure {fig_n}. Leaf off shots indicated by red color {beam0110.name}\
of {GEDIL4A}, 
n = {n_leaf_off} ({round(percent_leaf_off,3)}%) total shots in {pft_leaf_off} PFT''')
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
plt.show()

../_images/db01c341424dc10271d4d37b46df89f771175a758ec2b2e3db3d7f3102460d56.png

In Figure 6 above, the leaf-off shots are indicated with red. The GEDI orbit was collected on July 25, 2020. There were only 1263 shots with leaf-off conditions, all within the Deciduous Broadleaf Trees PFT.

6.4 Tree Cover#

GEDI uses Landsat-derived tree cover, which is defined as canopy closure for all vegetation taller than 5 m in height (Hansen et al., 2013). The following Figure 7 shows the percent tree cover.

ax = beam_gdf.to_crs(epsg=3857).plot(color='grey', alpha=0.01,
                                     linewidth=0, markersize=2, figsize=(12, 7))
beam_gdf[beam_gdf['landsat_treecover']!=0].to_crs(epsg=3857).plot(
    ax=ax, column='landsat_treecover', cmap='viridis', alpha=0.4,
    markersize=4, legend=True, legend_kwds={'label': 'Tree Cover (%)'})

# figure title
fig_n += 1
ax.set_title(f'''Figure {fig_n}. Tree cover {beam0110.name} of {GEDIL4A}''')
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
plt.show()

../_images/471ff4ba994a2fcd35b7f16dc7c32d1def721d6df334f17d78f7034d3a3015b1.png

7. Geolocation Group#

The geolocation subgroup within the BEAM group contains elevation, latitude, and longitude of the lowest mode for each of the seven algorithm settings groups. Canopy height metrics are computed relative to the elevation of the lowest mode or the last mode of the GEDI waveform. GEDI uses six different algorithm settings (a1 to a6) for webform processing (refer to the GEDI L1/L2 Waveform Processing ATBD, Table 5 for more details). Additional algorithm settings of a10 in GEDI L4A indicate that a5 has been used, but the lowest detected mode is likely noise. When this occurs, a higher mode was used to calculate RH metrics (see GEDI L4A ATBD for details).

The geolocation subgroup also provides a beam sensitivity metric for each of the seven algorithm settings. The sensitivity indicates signal strength of the GEDI shot to penetrate canopies and detect the true ground level. The subgroup also includes stale_return_flag (0=good) to indicate the signal quality.

Let’s read all the variables within the \BEAM0110\geolocation group.

var_names = []
var_val = []
data = []

# loop through all variables within land_cover_data subgroup
for key, value in beam0110['geolocation'].items():
    var_names.append(key)
    var_val.append(value[:].tolist())
    data.append([key, value.attrs['description'], value.attrs['units'],
                 value.attrs['source']])

# create a pandas dataframe        
beam_geo_df = pd.DataFrame(map(list, zip(*var_val)), columns=var_names)
beam_geo_df = beam_geo_df.set_index('shot_number')

# join with the previous dataframe conaining root variables
# pandas will use `shot_number` as a common index to join
beam_gdf = beam_gdf.join(beam_geo_df)

# print variable description as table
headers = ["variable", "description", "units", "source"]
df = pd.DataFrame(data, columns=headers)
tbl_n += 1
caption = f'Table {tbl_n}. Variables within geolocation group'
df.style.set_caption(caption).hide(axis="index")

Table 7. Variables within geolocation group
variable	description	units	source
elev_lowestmode_a1	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
elev_lowestmode_a10	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
elev_lowestmode_a2	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
elev_lowestmode_a3	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
elev_lowestmode_a4	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
elev_lowestmode_a5	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
elev_lowestmode_a6	Elevation of center of lowest mode relative to reference ellipsoid	m	L2A
lat_lowestmode_a1	Latitude of center of lowest mode	degrees	L2A
lat_lowestmode_a10	Latitude of center of lowest mode	degrees	L2A
lat_lowestmode_a2	Latitude of center of lowest mode	degrees	L2A
lat_lowestmode_a3	Latitude of center of lowest mode	degrees	L2A
lat_lowestmode_a4	Latitude of center of lowest mode	degrees	L2A
lat_lowestmode_a5	Latitude of center of lowest mode	degrees	L2A
lat_lowestmode_a6	Latitude of center of lowest mode	degrees	L2A
lon_lowestmode_a1	Longitude of center of lowest mode	degrees	L2A
lon_lowestmode_a10	Longitude of center of lowest mode	degrees	L2A
lon_lowestmode_a2	Longitude of center of lowest mode	degrees	L2A
lon_lowestmode_a3	Longitude of center of lowest mode	degrees	L2A
lon_lowestmode_a4	Longitude of center of lowest mode	degrees	L2A
lon_lowestmode_a5	Longitude of center of lowest mode	degrees	L2A
lon_lowestmode_a6	Longitude of center of lowest mode	degrees	L2A
sensitivity_a1	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
sensitivity_a10	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
sensitivity_a2	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
sensitivity_a3	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
sensitivity_a4	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
sensitivity_a5	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
sensitivity_a6	Maxmimum canopy cover that can be penetrated considering the SNR of the waveform	-	L2A
shot_number	Shot number	-	L2A
stale_return_flag	Flag from digitizer indicating the real-time pulse detection algorithm did not detect a return signal above its detection threshold within the entire 10 km search window. The pulse location of the previous shot was used to select the telemetered waveform.	-	L2A

8. AGBD variables#

The agbd variable within the BEAM group contains estimated aboveground biomass density. Before looking further into agbd variables, let’s go and read in the variables agbd_prediction subgroup. The agbd_prediction subgroup contains AGBD variables for all 7 algorithm settings groups (see Section 7 above).

var_names = []
var_val = []

# loop through all variables within agbd_prediction subgroup
for key, value in beam0110['agbd_prediction'].items():
    # xvar variables have 2D
    if key.startswith('xvar'):
        for r in range(4):
            var_names.append(key + '_' + str(r+1))
            var_val.append(value[:, r].tolist())
    else:
        var_names.append(key)
        var_val.append(value[:].tolist())

# create a pandas dataframe
beam_agbd_df = pd.DataFrame(map(list, zip(*var_val)), columns=var_names)
beam_agbd_df = beam_agbd_df.set_index('shot_number')

# join with the previous dataframe conaining root variables
# pandas will use `shot_number` as a common index to join
beam_gdf = beam_gdf.join(beam_agbd_df)

The agbd_prediction subgroup contains additional important prediction parameters used. Let’s print the attributes and the values.

data = []
agbd_prediction = beam0110['agbd_prediction']
# loop through attributes
for attr in agbd_prediction.attrs.keys():
    data.append([attr, agbd_prediction.attrs[attr]])
# print attributes and values as table
headers = ["Attributes", "value"]
df = pd.DataFrame(data, columns=headers)
tbl_n += 1
caption = f'Table {tbl_n}. AGBD Prediction Attributes'
df.style.set_caption(caption).hide(axis="index")

Table 8. AGBD Prediction Attributes
Attributes	value
alpha	0.100000
l2a_alg_count	7.000000
max_nvar	4.000000
pft_grid_version	4.000000
pft_infilled_grid_version	6.000000
phenology_grid_version	5.000000
predictor_offset	100.000000
region_grid_version	5.000000
response_offset	0.000000
urban_grid_version	1.000000
water_grid_version	1.000000

In the table above, predictor_offset and response_offset are the offset appplied to predictors and response variables before fitting the AGBD model. The variable alpha is the alpha value used for calculation of prediction intervals. The AGBD models currently have maximum number of 4 predictors (max_nvar) and 7 algorithm settings groups (l2a_alg_count).

Now, let’s plot the agbd variable for the BEAM0110 along the longitude axis.

# compute bounds of beam_gdf
minx, miny, maxx, maxy = beam_gdf.total_bounds

# x, y labels
ylabel = f"{beam0110['agbd'].attrs['long_name']} ({beam0110['agbd'].attrs['units']})"

xlabel = beam0110['lon_lowestmode'].attrs['long_name']

# subset where agbd is not null (-9999)
agbd_gdf = beam_gdf[~beam_gdf['agbd'].isnull()]

# plot
percent_agbd = (len(agbd_gdf.index)/len(beam_gdf.index))*100
ax = agbd_gdf.plot.scatter(x =['lon_lowestmode'], y=['agbd'],  cmap='viridis',
                           c='agbd', alpha=0.1, ylabel=ylabel, xlabel=xlabel,
                           figsize=(20, 7), grid=True, legend=False)
fig_n += 1
ax.set_title(f'''Figure {fig_n}. {ylabel} for {beam0110.name} of {GEDIL4A},
             n= {len(agbd_gdf.index)} ({round(percent_agbd,2)}% of total)''')
plt.show()

../_images/c0df18d7136421d0a13429619f55c065494170ecdc26d711a518d0bbce5322da.png

As shown in Figure 8 above, ~56% of the shots in BEAM0110 have agbd values. The shots over water surface and areas with no vegetation have no agbd values and are indicated with fill values of -9999.

Let’s also plot the agbd on a map. In the map below, the shots with no agbd values are indicated with grey dots.

ax = beam_gdf[beam_gdf['agbd'].isnull()].to_crs(epsg=3857).plot(
    color='grey', alpha=0.01, linewidth=0, markersize=2, figsize=(12, 7))
beam_gdf.to_crs(epsg=3857).plot(ax=ax, column='agbd', cmap='gist_ncar',
                                alpha=0.4, markersize=4,
                                legend=True, legend_kwds={'label': ylabel})

fig_n += 1
ax.set_title(f'''Figure {fig_n}. {ylabel} for {beam0110.name} of {GEDIL4A},
             n= {len(agbd_gdf.index)} ({round(percent_agbd,2)}% of total)''')
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
plt.show()

../_images/2dedcee922853291e14161f137379b95cf612ec9535b17360d7c54ecdcb4b02c.png

Plotting thousands of shots from the BEAM0110 in one figure may not show the variability in details. Let’s subset the data into a smaller area for a closer look. We will subset an area in the Mount Leuser National Park in Indonesia, where the GEDI orbit overpasses. We will clip around 6 km from Mt. Leuser.

# Mt Leuser Buffer of 5km
mt_leuser_gdf= gpd.GeoSeries([Point((97.1736, 3.7563))])
mt_leuser_gdf.crs = "EPSG:4326"
mt_leuser_3857_gdf = mt_leuser_gdf.to_crs(epsg=3857)
mt_leuser_buffer = mt_leuser_3857_gdf.buffer(6000)

# assigning CRS of base map
beam_gdf_3857 = beam_gdf.to_crs(epsg=3857)
# clipping the beam_var to the buffer
leuser_gdf = beam_gdf_3857[beam_gdf_3857['geometry'].within(
    mt_leuser_buffer.geometry[0])]
# plotting first few lines
leuser_gdf.head()

	agbd	agbd_pi_lower	agbd_pi_upper	agbd_se	agbd_t	agbd_t_se	algorithm_run_flag	beam	channel	degrade_flag	...	xvar_a4_3	xvar_a4_4	xvar_a5_1	xvar_a5_2	xvar_a5_3	xvar_a5_4	xvar_a6_1	xvar_a6_2	xvar_a6_3	xvar_a6_4
shot_number
91680600300633705	NaN	NaN	NaN	NaN	NaN	NaN	0	6	3	30	...	-9999.0	-9999.0	-9999.000000	-9999.000000	-9999.0	-9999.0	-9999.000000	-9999.000000	-9999.0	-9999.0
91680600300633706	NaN	NaN	NaN	NaN	NaN	NaN	0	6	3	30	...	-9999.0	-9999.0	-9999.000000	-9999.000000	-9999.0	-9999.0	-9999.000000	-9999.000000	-9999.0	-9999.0
91680600300633707	55.617771	0.421164	203.532837	17.128305	7.067859	3.922277	1	6	3	30	...	0.0	0.0	10.310674	10.592922	0.0	0.0	10.057336	10.298059	0.0	0.0
91680600300633708	187.650787	47.486774	420.499054	17.121983	12.982439	3.921553	1	6	3	30	...	0.0	0.0	10.689247	11.437220	0.0	0.0	10.441743	11.184812	0.0	0.0
91680600300633709	255.015137	83.940132	518.769043	17.120995	15.134360	3.921440	1	6	3	30	...	0.0	0.0	10.914211	11.594396	0.0	0.0	10.296116	10.914211	0.0	0.0

5 rows × 204 columns

The geopandas dataframe leuser_gdf contains all the variables within BEAM0110 group for an area of 6 km around Mt. Leuser. Let’s plot the shots on a map.

# plotting the shots
ax = leuser_gdf.plot(alpha=0.8,  column='delta_time', cmap='viridis',
                     figsize=(8, 8))
# plotting buffer circle
mt_leuser_buffer.plot(ax=ax,color='white', edgecolor='red',
                      linewidth=5, alpha=0.4)
# plotting Mt Leuser
mt_leuser_3857_gdf.plot(ax=ax, color='red')

# Annotating Mt. Leuser
ax.annotate('MT LEUSER', xy=(mt_leuser_3857_gdf.geometry.x[0],
                             mt_leuser_3857_gdf.geometry.y[0]),
            xytext=(3, 3), textcoords="offset points", weight='bold')
# Annotating the first shot_number
ax.annotate('shot_number: '+ str(leuser_gdf.iloc[0].name),
            xy=(leuser_gdf.iloc[0].geometry.x, leuser_gdf.iloc[0].geometry.y),
            xytext=(3, 3), textcoords="offset points", weight='bold')
# Annotating the last shot_number
ax.annotate('shot_number: '+ str(leuser_gdf.iloc[-1].name),
            xy=(leuser_gdf.iloc[-1].geometry.x, leuser_gdf.iloc[-1].geometry.y),
            xytext=(3, 3), textcoords="offset points", weight='bold')
# scalebar
ax.add_artist(ScaleBar(1))

# number modulus 100000000 gives last eight digits for shot number
# computing difference between the first and the last shot numbers
total_shots = ((abs(leuser_gdf.iloc[-1].name) % 100000000) -
               (abs(leuser_gdf.iloc[0].name) % 100000000))
# figure title
fig_n += 1
ax.set_title(f'''
Figure {fig_n}. GEDI L4A {GEDIL4A} beam {beam0110.name} within a buffer \
of 6 km around Mt Leuser, Indonesia. n = {total_shots+1}
''')

#adding basemap
ctx.add_basemap(ax, source=ctx.providers.OpenTopoMap)
plt.show()

../_images/de5344410f079f99b06d1f226f34ab8ea092da08d52a34300ca815e4d337c2b5.png

The start and end shot numbers are also shown on the map. GEDI shot numbers are 18 characters and in the format OOOOOBBFFFNNNNNNNN, where:

OOOOO: Orbit number
BB: Beam number
RR: Reserved for future use
G: Sub-orbit granule number
NNNNNNNN: Shot index

In Figure 10 above, the start shot has a value of 091680600300633705, which means it is from the orbit number 09168, the beam number 06, the sub-orbit granule number of 3 and the shot index of 00633705 within the orbit. There are a total of 169 shots from beam 06 (or BEAM0110) within 6km of Mt. Leuser.

Now, let’s plot the AGBD variables.

# compute the distance from the first point
leuser_gdf = leuser_gdf.assign(distance=leuser_gdf.geometry.distance(
    leuser_gdf.iloc[0].geometry).values)

## x, y labels
ylabel = f"{beam0110['agbd'].attrs['long_name']} ({beam0110['agbd'].attrs['units']})"
xlabel = 'Distance in m'

# plotting agbd
ax = leuser_gdf.plot.scatter(x ='distance', y='agbd', s=15, ylabel=ylabel,
                             xlabel=xlabel, figsize=(20, 8), grid=True,
                             color='black', label="agbd")
# figure title
fig_n += 1
ax.set_title(f'''
Figure {fig_n}. {ylabel} near Mt. Leuser, Indonesia for {beam0110.name} of {GEDIL4A} \
(n= {str(len(leuser_gdf.index))}, {len(leuser_gdf[leuser_gdf['agbd'].isnull()].index)} with agbd fill values)
''')

# getting alpha attribute from the agbd_prediction group
alpha =  beam0110['agbd_prediction'].attrs['alpha']
# plotting agbd_pi_upper/lower as fill area
ax.fill_between(leuser_gdf['distance'], leuser_gdf['agbd_pi_lower'],
                leuser_gdf['agbd_pi_upper'], alpha=0.1, color='green',
                label=f"agbd prediction interval, alpha={alpha})" )
ax.legend()
plt.show()

../_images/2fe8a76f7397f4af6c4025023d2111ae70c1302af3d8f8190f2b39b0a37d0793.png

The agbd shows the selected aboveground biomass density, shown by black dots in Figure 11 above. The upper and lower prediction intervals are shown with a green-filled area.

GEDI shots are ~60m apart along the track. In the figure above, there are ~16-17 shots (shown by dots) within a distance of 1000 m. Note that a total of 25 shots with agbd fill values (-9999) are not plotted.

The above plot shows the AGBD estimates at alpha=0.1 (90% confidence interval), the default confidence interval used for these variables. To compute the estimates at a different confidence interval (e.g., 95%), we can use agbd_t and agbd_t_se variables, which provides the prediction in transform space. For more details, please refer to GEDI L4A Common Queries.

# 99% confidence interval
alpha_2 = 0.01
# degree of freedom for shots

leuser_gdf['predict_stratum'] = leuser_gdf['predict_stratum'].str.decode('utf-8')
for i, row in leuser_gdf.iterrows():
    # model_data_df dataframe from Section 4 above contains ancillary variables
    midx = model_data_df['predict_stratum']==row['predict_stratum']
    dof = model_data_df[midx].dof.item()
    bias = model_data_df[midx].bias_correction_value.item()
    leuser_gdf.at[i,'dof']=dof
    leuser_gdf.at[i,'bias_correction_value']=bias

tse = leuser_gdf['agbd_t_se'] * stats.t.ppf(1-(alpha_2/2), leuser_gdf['dof'])
leuser_gdf = leuser_gdf.assign(agbd_pi_lower_2 = (leuser_gdf['agbd_t'] - tse)**2 * \
                                    leuser_gdf['bias_correction_value'])
leuser_gdf = leuser_gdf.assign(agbd_pi_upper_2 = (leuser_gdf['agbd_t'] + tse)**2 * \
                                    leuser_gdf['bias_correction_value'])
# plotting agbd
ax = leuser_gdf.plot.scatter(x ='distance', y='agbd', s=15, ylabel=ylabel, xlabel=xlabel,
                        figsize=(20, 8), grid=True, color='black', 
                             label="agbd (aboveground biomass density)")
# figure title
fig_n += 1
ax.set_title(f'''
Figure {fig_n}. {ylabel} prediction interval at two confidence levels (90% and 99%) \
near Mt. Leuser, Indonesia for {beam0110.name} of {GEDIL4A} \
(n= {str(len(leuser_gdf.index))}, {len(leuser_gdf[leuser_gdf['agbd'].isnull()].index)} \
with agbd fill values)
''')

# getting alpha attribute from the agbd_prediction group
alpha =  beam0110['agbd_prediction'].attrs['alpha'] 
# plotting agbd_pi_upper/lower as fill area

ax.fill_between(leuser_gdf['distance'], leuser_gdf['agbd_pi_lower_2'],
                leuser_gdf['agbd_pi_upper_2'], alpha=0.1, color='blue',
                label=f"agbd_pi_lower/agbd_pi_upper (prediction interval, alpha={alpha_2})")
ax.fill_between(leuser_gdf['distance'], leuser_gdf['agbd_pi_lower'],
                leuser_gdf['agbd_pi_upper'], alpha=0.2, color='green',
                label=f"agbd_pi_lower/agbd_pi_upper (prediction interval, alpha={alpha})")
ax.legend()
plt.show()

../_images/1a9af0b0df6e47172cc479eddf3b82b34f2fc7e5eea3e831ea9172ff8b0385c4.png

As noted above in Section 7, the subgroup agbd_prediction provides AGBD estimates for all seven algorithm settings groups (a1-6 and a10). Let’s plot all of them together. For clarity, we will plot selected shots in the following figure.

## x, y labels
ylabel = f"{beam0110['agbd'].attrs['long_name']} ({beam0110['agbd'].attrs['units']})"
xlabel = 'Distance in m'

# plotting agbd
ax = leuser_gdf[-55:-15].plot.scatter(x ='distance', y='agbd', s=100, ylabel=ylabel,
                                      xlabel=xlabel, figsize=(20, 7), grid=True,
                                      marker = "$\u25EF$", alpha=0.5, color='black',
                                      label="agbd")
l2a_alg_count =  beam0110['agbd_prediction'].attrs['l2a_alg_count']

for i in range(l2a_alg_count):
    if i == 6:
        i = 10
    else:
        i += 1
    agbd_aN = 'agbd_a' +  str(i)
    # plotting agbd
    leuser_gdf[-55:-15].plot.line(ax=ax, x ='distance', y=agbd_aN, ylabel=ylabel,
                                  xlabel=xlabel, figsize=(20, 8), grid=True,
                                  color=cmap(i), style='--o', alpha=0.6, label=agbd_aN)
for i, row in leuser_gdf[-55:-15].iterrows():
    ax.annotate(row["selected_algorithm"], (row["distance"],row["agbd"]+15))

# figure title
fig_n += 1
ax.set_title(f'''
Figure {fig_n}. {ylabel} near Mt. Leuser, Indonesia for {beam0110.name} of {GEDIL4A}.
The selected algorithm setting group for each shot is labeled and highlighted with \
a black circle.
''')
ax.legend()
plt.show()

../_images/8ff5f8aa0acb55bbfe40eee6ef812a2075983c7a27ecbcef09f0be5a9100e7c5.png

Figure 13 above shows the variability of the predicted biomass among the different algorithm setting groups.

9. Predictor variables#

The xvar variable provides predictor variables. Let’s print the number of shots per predict_stratum in the leuser_gdf subset.

leuser_gdf.predict_stratum.value_counts()

predict_stratum
EBT_SAs    169
Name: count, dtype: int64

As shown above, all the GEDI shots near Mt. Leuser is classified as EBT_SAs prediction stratum (evergreen broadleaf trees PFT and South Asia region). The AGBD prediction model for EBT_SAs is given in Table 3 above as: AGBD = -104.9654541015625 + 6.802174091339111 x RH_50 + 3.9553122520446777 x RH_98.

The xvar variable has 4 columns, which we read into the dataframe above as xvar_1, xvar_2, xvar_3, and xvar_4. Let’s print the columns.

tbl_n += 1
caption = f'Table {tbl_n}. Predictor variables for the shots near Mt. Leuser, Indonesia \
for {beam0110.name} of {GEDIL4A}'
leuser_gdf[['xvar_1', 'xvar_2', 'xvar_3', 'xvar_4']].head().style.set_caption(caption)

Table 9. Predictor variables for the shots near Mt. Leuser, Indonesia for /BEAM0110 of GEDI04_A_2020207182449_O09168_03_T03028_02_002_02_V002.h5
	xvar_1	xvar_2	xvar_3	xvar_4
shot_number
91680600300633705	nan	nan	nan	nan
91680600300633706	nan	nan	nan	nan
91680600300633707	10.310674	10.592922	0.000000	0.000000
91680600300633708	10.689247	11.437220	0.000000	0.000000
91680600300633709	10.914211	11.594396	0.000000	0.000000

Since the linear model for EBT_SAs prediction stratum has only two predictor variables, RH_50 and RH_98, the columns xvar_3 and xvar_4 in the above table are filled with zero values.

The xvar predictors are transformed and as well as offset applied. Table 3 above shows that square-root (sqrt) transformation is applied to the EBT_SAs prediction stratum both x- and y- variables, and a predictor offset (predictor_offset) of 100 are applied. We will need the offset and transformation to derive the true RH metric, and plot them along with agbd value.

# deriving true RH metric
leuser_gdf['xvar_1_true']=leuser_gdf['xvar_1']**2-100
leuser_gdf['xvar_2_true']=leuser_gdf['xvar_2']**2-100

# plotting xvar_1_true (smaller green dots)
ax = leuser_gdf.plot.line(x ='distance', y='xvar_1_true', figsize=(20, 8), grid=True,
                          color='green', style='o', alpha=0.4, markersize=7,
                          label='Corrected xvar_1 (RH50)')

# adding xvar_2_true as stem plot (larger green dots)
markerline, stemline, baseline = ax.stem(leuser_gdf['distance'], leuser_gdf['xvar_2_true'],
                                         linefmt='-g', markerfmt='C2o', basefmt='C2-',
                                         label='Corrected xvar_2 (RH98)')
plt.setp(stemline, linewidth = 1.0, alpha=0.6)
plt.setp(markerline, markersize = 15, alpha=0.6)
plt.setp(baseline, linewidth = 0)

# adding agbd in the secondary axis
ax2 = ax.twinx()
markerline, stemline, baseline = ax2.stem(leuser_gdf['distance'], leuser_gdf['agbd'],
                                          linefmt='-r', markerfmt='C5o', basefmt='C2-',
                                        label='agbd (mg/ha)')
plt.setp(markerline, markersize = 0)
plt.setp(stemline, linewidth = 3.0, alpha=0.2)
plt.setp(baseline, linewidth = 0)
# figure title
fig_n += 1
ax.set_title(f'''Figure {fig_n}. Corrected 'xvar' and 'agbd' near Mt. Leuser, Indonesia \
for {beam0110.name} of {GEDIL4A}.''')
# ylabels
ax.set_ylabel('RH')
ax2.set_ylabel(ylabel)
# legend
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc=0)
plt.show()

../_images/6ed9dd0ffe85ad0c820668ec956b1a1e8ee546d088ae88ac814f61eda9d71b61.png

10. Flag variables#

Different flag variables are included for each shot. Most of these flags originate from lower-level products such as GEDI L2A, while some of the flags are specific to the GEDI L4A dataset. Let’s print all the flag variables’ names and descriptions.

data = []
# loop over all the variables within BEAM0110 group
for v in beam0110.keys():
    var = beam0110[v]
    source = ''
    # if the key is a subgroup assign GROUP tag
    if isinstance(var, h5py.Group):
        for v2 in var.keys():
            var2 = var[v2]
            if v2.endswith('flag'):
                if 'source' in var2.attrs.keys():
                    source = var2.attrs['source']
                data.append([v2, var2.attrs['description'], source, var.name])
                
    else:
        if v.endswith('flag'):
            if 'source' in var.attrs.keys():
                source = var.attrs['source']
            data.append([v, var.attrs['description'], source, beam0110.name])

# print all variable name and attributes as a table

headers = ["variable", "description", "source", "group"]
df = pd.DataFrame(data, columns=headers)
tbl_n += 1
caption = f'Table {tbl_n}. Flag variables for GEDI L4A {beam_str} group'
df.style.set_caption(caption).hide(axis="index")

Table 10. Flag variables for GEDI L4A BEAM0110 group
variable	description	source	group
algorithm_run_flag	The L4A algorithm is run if this flag is set to 1. This flag selects data which have sufficient waveform fidelity for AGBD estimation.		/BEAM0110
degrade_flag	Flag indicating degraded state of pointing and/or positioning information	L2A	/BEAM0110
stale_return_flag	Flag from digitizer indicating the real-time pulse detection algorithm did not detect a return signal above its detection threshold within the entire 10 km search window. The pulse location of the previous shot was used to select the telemetered waveform.	L2A	/BEAM0110/geolocation
l2_quality_flag	Flag identifying the most useful L2 data for biomass predictions		/BEAM0110
l4_quality_flag	Flag simplifying selection of most useful biomass predictions		/BEAM0110
leaf_off_flag	GEDI 1 km EASE 2.0 grid flag derived from leaf_off_doy, leaf_on_doy and pft_class, indicating if the observation was recorded during leaf-off conditions in deciduous needleleaf or broadleaf forests and woodlands. 1 = leaf-off and 0 = leaf-on.		/BEAM0110/land_cover_data
predictor_limit_flag	Predictor value is outside the bounds of the training data (0=In bounds; 1=Lower bound; 2=Upper bound)		/BEAM0110
response_limit_flag	Prediction value is outside the bounds of the training data (0=In bounds; 1=Lower bound; 2=Upper bound)		/BEAM0110
selected_mode_flag	Flag indicating status of selected_mode	L2A	/BEAM0110
surface_flag	Indicates elev_lowestmode is within 300m of Digital Elevation Model (DEM) or Mean Sea Surface (MSS) elevation	L2A	/BEAM0110

Let’s plot algorithm_run_flag, l2_quality_flag, and l4_quality_flag into a plot.

# plot flags as stacked bar diagram
ax = leuser_gdf.plot(y=['algorithm_run_flag', 'l2_quality_flag','l4_quality_flag'],
                     alpha=0.4, kind='bar', figsize=(22, 8), stacked=True,
                     use_index=True,legend=True)
# plot agbd in the secondary y-axis as line
ax2 = ax.twinx()
ax2.plot(leuser_gdf[['agbd']].values, linestyle='-', alpha=0.6, marker='o',
         linewidth=2.0, color ='black')
# figure title
fig_n += 1
ax.set_title(f'''
Figure {fig_n}. Flag variables and {ylabel} near Mt. Leuser, Indonesia for \
{beam0110.name} of {GEDIL4A}. 
A flag value of 1 in the figure indicates good values. \
(n= {str(len(leuser_gdf.index))}, {len(leuser_gdf[leuser_gdf['agbd'].isnull()].index)} \
with agbd fill values)
''')
# labels
ax.set_ylabel("flags")
ax2.set_ylabel(ylabel)
plt.show()

../_images/f5dbd8813dbbfd45c49d6e35d7ef767ad1d69cb92deee017717ca33de458be62.png

The algorithm_run_flag set is set to 1 when the shot quality is good to run the GEDIL4_A algorithm. When the algorithm is not run for the shot (indicated by algorithm_run_flag=0), all the AGBD variables will be set to the fill values (-9999). The l2_quality_flag identifies shots with good GEDI L2A quality metrics for the prediction of AGBD. The l4_quality_flag identifies shots that are representative of the applied AGBD models. Let’s apply both l2_quality_flag and l4_quality_flag to the dataframe.

qidx = (leuser_gdf['l2_quality_flag'] == 1) & (leuser_gdf['l4_quality_flag'] == 1)
leuser_gdf = leuser_gdf[qidx]

# close the GEDI L4A HDF file
hf.close()

Now, the dataframe leuser_gdf contains all the rows where both l2_quality_flag and l4_quality_flag filters are applied. Refer to the GEDI L4A ATBD and GEDI L4A Common Queries documents for more details about the quality flags.

11. Final Note#

As we saw in this tutorial, in addition to AGBD estimates, a GEDI L4A file has information about predictor variables from the lower level GEDI products and other associated variables derived from multiple sensors such as MODIS, VIIRS, TanDEM-X, and Landsat, among others. GEDI generates ~15-16 orbits each day. GEDI is still collecting data every day and will do so for the next few years, which is an immense amount of data at our hands.

References#

[1]

R.O. Dubayah, J. Armston, J.R. Kellner, L. Duncanson, S.P. Healey, P.L. Patterson, S. Hancock, H. Tang, J. Bruening, M.A. Hofton, J.B. Blair, and S.B. Luthcke. GEDI L4A Footprint Level Aboveground Biomass Density, Version 2.1. 2022. doi:10.3334/ORNLDAAC/2056.

[2]

J.R. Kellner, J. Armston, and L. Duncanson. Algorithm theoretical basis document for GEDI footprint aboveground biomass density. Earth and Space Science, 9:e2022EA002516, 2022. doi:10.1029/2022EA002516.