7. Exploring and Visualizing BioSCape LVIS Data#

7.1. BioSCape Data Skills Workshop: From the Field to the Image#

Bioscape

BioSCape, the Biodiversity Survey of the Cape, is NASA’s first biodiversity-focused airborne and field campaign that was conducted in South Africa in 2023. BioSCape’s primary objective is to study the structure, function, and composition of the region’s ecosystems, and how and why they are changing.

BioSCape’s airborne dataset is unprecedented, with AVIRIS-NG, PRISM, and HyTES imaging spectrometers capturing spectral data across the UV, visible and infrared at high resolution and LVIS acquiring coincident full-waveform lidar. BioSCape’s field dataset is equally impressive, with 18 PI-led projects collecting data ranging from the diversity and phylogeny of plants, kelp and phytoplankton, eDNA, landscape acoustics, plant traits, blue carbon accounting, and more

This workshop will equip participants with the skills to find, subset, and visualize the various BioSCape field and airborne (imaging spectroscopy and full-waveform lidar) data sets. Participants will learn data skills through worked examples in terrestrial and aquatic ecosystems, including: wrangling lidar data, performing band math calculations, calculating spectral diversity metrics, machine learning and image classification, and mapping functional traits using partial least squares regression. The workshop format is a mix of expert talks and interactive coding notebooks and will be run through the BioSCape Cloud computing environment.

Date: October 9 - 11, 2024 Cape Town, South Africa

Host: NASA’s Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), in close collaboration with BioSCape, the South African Environmental Observation Network (SAEON), the University of Wisconsin Madison (Phil Townsend), The Nature Conservancy (Glenn Moncrieff), the University of California Merced (Erin Hestir), the University of Cape Town (Jasper Slingsby), Jet Propulsion Laboratory (Kerry Cawse-Nicholson), and UNESCO.

Instructors:

  • In-person contributors: Anabelle Cardoso, Erin Hestir, Phil Townsend, Henry Frye, Glenn Moncrieff, Jasper Slingsby, Michele Thornton, Rupesh Shrestha

  • Virtual contributors: Kerry Cawse-Nicholson, Nico Stork, Kyle Kovach

Audience: This training is primarily intended for government natural resource management agency representatives and field technicians in South Africa, as well as local academics and students, especially those connected to the BioSCape Team.

7.2. Overview#

This tutorial will demonstrate accessing and visualizing Land, Vegetation, and Ice Sensor (LVIS) data available on the BioSCape SMCE. LVIS is an airborne, wide-swath imaging full-waveform laser altimeter. LVIS collects in a 1064 nm-wavelength (near infrared) range with 3 detectors mounted in an airborne platform, flown typically ~10 km above the ground producing a data swath of 2km wide with 7-10 m footprints. More information about the LVIS instrument is here.

The LVIS instrument has been flown above several regions of the world since 1998. The LVIS flights were collected for the BioSCAPE campaigns from 01 Oct 2023 through 30 Nov 2023.

There are three LVIS BioSCape data products currently available and archived at NSIDC DAAC

Dataset

Dataset DOI

LVIS L1A Geotagged Images V001

10.5067/NE5KKKBAQG44

LVIS Facility L1B Geolocated Return Energy Waveforms V001

10.5067/XQJ8PN8FTIDG

LVIS Facility L2 Geolocated Surface Elevation and Canopy Height Product V001

10.5067/VP7J20HJQISD

For this tutorial, we’ll examine the LVIS L2 Geolocated Surface Elevation and Canopy Height Product V001

LVIS Waveform Figure: Sample LVIS product waveforms illustrating possible distributions of reflected light (Blair et al., 2020)

from glob import glob
import numpy as np
import pandas as pd
import earthaccess
import geopandas as gpd
import requests as re
import s3fs
import h5py
from os import path
from datetime import datetime
from shapely.geometry import Polygon, MultiPolygon
import matplotlib.pyplot as plt
from shapely.geometry import box
from harmony import BBox,Client, Collection, Request, LinkType
# esri background basemap for maps
xyz = "https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}"
attr = "ESRI"

The LVIS data from the BioSCape project are available at the SMCE S3 bucket. Let’s list the LVIS folders at the BioSCape SMCE.

s3 = s3fs.S3FileSystem(anon=False)
lvis = s3.ls('bioscape-data/LVIS')
lvis
['bioscape-data/LVIS/L1B', 'bioscape-data/LVIS/L2']

Both LVIS L1B and L2 products are available at the S3 bucket.

7.3. Structure LVIS L1B Geolocated Waveform#

First, let’s find out how many L1B files are there on the SMCE.

count_l1 = len(s3.ls('bioscape-data/LVIS/L1B'))
print(f"There are {count_l1} LVIS L1 granules")
There are 2328 LVIS L1 granules

Now, we go ahead and open a file to look at its structure.

l1_first = 'bioscape-data/LVIS/L1B/LVISF1B_BioSCape2023_1109_R2404_047611.h5'
with s3.open(l1_first, mode='rb') as fh:
    with h5py.File(fh) as hf:
        for v in list(hf.keys()):
            if v != "ancillary_data":
                print(f"- {v} : {hf[v].attrs['description'][0].decode('utf-8')}")
- AZIMUTH : Azimuth angle of laser beam
- INCIDENTANGLE : Off-nadir incident angle of laser beam
- LAT0 : Latitude of the highest sample of the return waveform (degrees north) 
- LAT1215 : Latitude of the lowest sample of the return waveform (degrees north) 
- LFID : LVIS file identification.  Together with 'LVIS shotnumber' these are a unique identifier for every LVIS laser shot. Format is XXYYYYYZZZ where XX identifies instrument version, YYYYY is the Modified Julian Date of the flight departure day, ZZZ represents file number.
- LON0 : Longitude of the highest sample of the return waveform (degrees east)
- LON1215 : Longitude of the lowest sample of the return waveform (degrees east) 
- RANGE : Distance along laser path from the instrument to the ground
- RXWAVE : Returned waveform (1216 samples long)
- SHOTNUMBER : Laser shot number assigned during collection.  Together with 'LVIS LFID' provides a unique identifier to every LVIS laser shot.
- SIGMEAN : Signal mean noise level, calculated in-flight
- TIME : UTC decimal seconds of the day
- TXWAVE : Transmitted waveform (128 samples long)
- Z0 : Elevation of the highest sample of the waveform with respect to the reference ellipsoid
- Z1215 : Elevation of the lowest sample of the waveform with respect to the reference ellipsoid
with s3.open(l1_first, mode='rb') as fh:
    with h5py.File(fh) as hf:
        l_lat = hf['LAT1215'][:]
        l_lon = hf['LON1215'][:]
        l_range = hf['Z1215'][:] # ground elevation
        shot = hf['SHOTNUMBER'][:] # ground elevation
geo_arr = list(zip(shot, l_range,l_lat,l_lon))
l1b_df = pd.DataFrame(geo_arr, columns=["shot_number", "elevation", "lat", "lon"])
l1b_df
shot_number elevation lat lon
0 82378820 59.477188 -34.699020 19.651972
1 82378821 58.855934 -34.699012 19.651895
2 82378822 59.752430 -34.699084 19.651964
3 82378823 59.403404 -34.699076 19.651888
4 82378824 60.092033 -34.699148 19.651957
... ... ... ... ...
349995 82729294 -8.561060 -34.678439 19.528331
349996 82729295 -9.040883 -34.678432 19.528254
349997 82729296 -8.609493 -34.678375 19.528340
349998 82729297 -8.090749 -34.678367 19.528263
349999 82729298 -8.993920 -34.678310 19.528348

350000 rows × 4 columns

l1b_gdf = gpd.GeoDataFrame(l1b_df, crs="EPSG:4326",
                           geometry=gpd.points_from_xy(l1b_df.lon, 
                                                       l1b_df.lat))
l1b_gdf.sample(frac=0.01).explore("elevation", cmap = "plasma", tiles=xyz, attr=attr)
Make this Notebook Trusted to load map: File -> Trust Notebook
i = np.where(shot==82446258)[0][0]
with s3.open(l1_first, mode='rb') as fh:
    with h5py.File(fh) as hf:
        rxwaveform = hf['RXWAVE'][i, :] # waveform
        elev_1215 = hf['Z1215'][i] # elevation ground
        elev_0 = hf['Z0'][i] # elevation top
c = len(rxwaveform) #
elev = np.linspace(elev_1215, elev_0, num=c)
# plot
plt.rcParams["figure.figsize"] = (3,10)
plt.xlabel("rxwaveform (counts)")
plt.ylabel("elevation (m)")
plt.plot(rxwaveform, elev, linestyle='--', marker='.',)
plt.show()
../../_images/9ab0a3d19405f1ec4a6598db3cc3e815626c39de7415c52634aa04141ae8a511.png

7.4. Structure of LVIS L2 Canopy Height Product#

First, let’s see how many L2 files are there at BioSCape SMCE.

count_l2 = len(s3.ls('bioscape-data/LVIS/L2'))
print(f"There are {count_l2} LVIS L2 granules")
There are 2327 LVIS L2 granules

Now, we’ll go ahead and open a file to look at it’s structure.

l2_first = 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1109_R2404_047611.TXT'
with s3.open(l2_first, 'rb') as f:
    head = [next(f) for _ in range(15)]
    print(head)
[b"# Level2 (v2.0.5) data set collected by NASA's Land, Vegetation and Ice Sensor (LVIS) Facility in October-November 2023 as part of NASA's BioSCape Field Campaign. This file contains elevation,\n", b"# height and surface structure metrics derived from LVIS's geolocated laser return waveform. The corresponding spatially geolocated laser return waveforms are contained in the Level 1B data set.\n", b'# For supporting information please visit https://lvis.gsfc.nasa.gov/Data/Maps/SouthAfrica2023Map.html. LVIS-targeted data collections occurred at flight altitudes of ~7 km above the surface from\n', b'# the NASA G-V, however data collections also occurred at lower altitudes. The "Range" parameter can be used to determine flight altitude. Note: Two alternate ground elevations options are provided\n', b'# in order to provide flexibility for local site conditions; if selected for use, adjust the values of the RH parameters accordingly. Please consult the User Guide for further information. The data\n', b'# have had limited quality assurance checking, although obvious lower quality data have been removed, e.g., clouds and cloud-obscured returns. We highly recommend that end users contact us about\n', b'# any idiosyncrasies or unexpected results.\n', b'#\n', b'# LFID SHOTNUMBER TIME GLON GLAT ZG ZG_ALT1 ZG_ALT2 HLON HLAT ZH TLON TLAT ZT RH10 RH15 RH20 RH25 RH30 RH35 RH40 RH45 RH50 RH55 RH60 RH65 RH70 RH75 RH80 RH85 RH90 RH95 RH96 RH97 RH98 RH99 RH100 AZIMUTH INCIDENTANGLE RANGE COMPLEXITY SENSITIVITY CHANNEL_ZT CHANNEL_ZG CHANNEL_RH\n', b'1960257687  82378820   47611.75433   19.651985  -34.699046    99.55    99.55    99.55   19.651985  -34.699046    99.55   19.651985  -34.699047   100.79   -0.86   -0.67   -0.56   -0.45   -0.37   -0.30   -0.22   -0.15   -0.07   -0.04    0.04    0.07    0.15    0.22    0.30    0.37    0.45    0.60    0.63    0.71    0.75    0.86    1.23  336.95    4.480   6340.45 0.250 0.955  2  2  2\n', b'1960257687  82378821   47611.75458   19.651909  -34.699038    99.57    99.57    99.57   19.651909  -34.699038    99.57   19.651909  -34.699039   100.76   -0.86   -0.67   -0.56   -0.45   -0.37   -0.30   -0.22   -0.19   -0.11   -0.04    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.60    0.63    0.67    0.75    0.86    1.20  336.26    4.511   6339.85 0.241 0.941  2  2  2\n', b'1960257687  82378822   47611.75483   19.651978  -34.699109    99.91    99.91    99.91   19.651978  -34.699109    99.91   19.651978  -34.699110   101.10   -0.93   -0.75   -0.60   -0.49   -0.41   -0.34   -0.26   -0.19   -0.11   -0.07    0.00    0.04    0.11    0.19    0.26    0.34    0.41    0.56    0.60    0.67    0.71    0.82    1.20  336.57    4.423   6339.25 0.291 0.944  2  2  2\n', b'1960257687  82378823   47611.75508   19.651902  -34.699102    99.93    99.93    99.93   19.651902  -34.699102    99.93   19.651902  -34.699103   101.20   -0.90   -0.71   -0.60   -0.49   -0.37   -0.30   -0.22   -0.19   -0.11   -0.04    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.60    0.63    0.71    0.78    0.90    1.27  335.87    4.455   6340.00 0.265 0.953  2  2  2\n', b'1960257687  82378824   47611.75533   19.651970  -34.699173   100.21   100.21   100.21   19.651970  -34.699173   100.21   19.651970  -34.699174   101.41   -0.90   -0.71   -0.56   -0.45   -0.37   -0.30   -0.22   -0.15   -0.11   -0.04    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.56    0.60    0.67    0.75    0.82    1.20  336.16    4.368   6338.05 0.255 0.949  2  2  2\n', b'1960257687  82378825   47611.75558   19.651894  -34.699165   100.20   100.20   100.20   19.651894  -34.699165   100.20   19.651895  -34.699166   101.44   -0.93   -0.75   -0.60   -0.49   -0.41   -0.34   -0.26   -0.19   -0.11   -0.07    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.60    0.64    0.67    0.75    0.86    1.23  335.46    4.400   6338.50 0.327 0.950  2  2  2\n']

LVIS L2A files have a number of lines as header as shown above. Let’s define a python function to read the number of header rows.

def get_line_number(filename):
    """find number of header rows in LVIS L2A"""
    count = 0
    with s3.open(filename, 'rb') as f:
        for line in f:
            if line.startswith(b'#'):
                count = count + 1
                columns = line[1:].split()
            else:
                return count, columns
h_no, col_names = get_line_number(l2_first)
with s3.open(l2_first, 'rb') as f:
    l2a_df = pd.read_csv(f, skiprows=h_no, header=None, engine='python', sep=r'\s+')
    l2a_df.columns =  [x.decode() for x in col_names]
l2a_df
LFID SHOTNUMBER TIME GLON GLAT ZG ZG_ALT1 ZG_ALT2 HLON HLAT ... RH99 RH100 AZIMUTH INCIDENTANGLE RANGE COMPLEXITY SENSITIVITY CHANNEL_ZT CHANNEL_ZG CHANNEL_RH
0 1960257687 82378820 47611.75433 19.651985 -34.699046 99.55 99.55 99.55 19.651985 -34.699046 ... 0.86 1.23 336.95 4.480 6340.45 0.250 0.955 2 2 2
1 1960257687 82378821 47611.75458 19.651909 -34.699038 99.57 99.57 99.57 19.651909 -34.699038 ... 0.86 1.20 336.26 4.511 6339.85 0.241 0.941 2 2 2
2 1960257687 82378822 47611.75483 19.651978 -34.699109 99.91 99.91 99.91 19.651978 -34.699109 ... 0.82 1.20 336.57 4.423 6339.25 0.291 0.944 2 2 2
3 1960257687 82378823 47611.75508 19.651902 -34.699102 99.93 99.93 99.93 19.651902 -34.699102 ... 0.90 1.27 335.87 4.455 6340.00 0.265 0.953 2 2 2
4 1960257687 82378824 47611.75533 19.651970 -34.699173 100.21 100.21 100.21 19.651970 -34.699173 ... 0.82 1.20 336.16 4.368 6338.05 0.255 0.949 2 2 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
349995 1960257689 82729294 47699.37382 19.528347 -34.678446 32.62 32.62 32.62 19.528347 -34.678446 ... 0.75 1.12 298.96 2.287 6392.51 0.076 0.980 1 1 1
349996 1960257689 82729295 47699.37407 19.528270 -34.678439 32.26 32.26 32.26 19.528270 -34.678439 ... 0.74 1.30 298.38 2.346 6392.81 0.076 0.991 1 2 1
349997 1960257689 82729296 47699.37432 19.528356 -34.678382 32.57 32.57 32.57 19.528356 -34.678382 ... 0.71 1.16 300.44 2.312 6392.96 0.174 0.987 1 1 1
349998 1960257689 82729297 47699.37457 19.528279 -34.678375 32.49 32.49 32.49 19.528279 -34.678375 ... 0.71 1.16 299.82 2.370 6392.21 0.076 0.981 1 1 1
349999 1960257689 82729298 47699.37482 19.528364 -34.678318 32.62 32.62 32.62 19.528364 -34.678318 ... 0.72 1.17 301.87 2.339 6392.51 0.116 0.987 1 2 1

350000 rows × 45 columns

Let’s print LVIS L2 variable names.

l2a_df.columns.values
array(['LFID', 'SHOTNUMBER', 'TIME', 'GLON', 'GLAT', 'ZG', 'ZG_ALT1',
       'ZG_ALT2', 'HLON', 'HLAT', 'ZH', 'TLON', 'TLAT', 'ZT', 'RH10',
       'RH15', 'RH20', 'RH25', 'RH30', 'RH35', 'RH40', 'RH45', 'RH50',
       'RH55', 'RH60', 'RH65', 'RH70', 'RH75', 'RH80', 'RH85', 'RH90',
       'RH95', 'RH96', 'RH97', 'RH98', 'RH99', 'RH100', 'AZIMUTH',
       'INCIDENTANGLE', 'RANGE', 'COMPLEXITY', 'SENSITIVITY',
       'CHANNEL_ZT', 'CHANNEL_ZG', 'CHANNEL_RH'], dtype=object)

Key variables:

  • The GLAT and GLON variables provide coordinates of the lowest detected mode within the LVIS waveform.

  • The RHXX variables provide height above the lowest detected mode at which XX percentile of the waveform energy.

  • RANGE provides the distance from the instrument to the ground.

  • SENSITIVITY provides sensitivity metric for the return waveform.

  • ZG,ZG_ALT1, ZG_ALT2: Mean elevation of the lowest detected mode using alternate 1/2 mode detection settings. The alternative ZG to use to adjust the RH parameters for local site conditions.

More information about the variables are provided in the user guide.

Let’s select the particular shot we are interested in.

l2a_shot_df = l2a_df[l2a_df['SHOTNUMBER'] == 82446258]
l2a_shot_df
LFID SHOTNUMBER TIME GLON GLAT ZG ZG_ALT1 ZG_ALT2 HLON HLAT ... RH99 RH100 AZIMUTH INCIDENTANGLE RANGE COMPLEXITY SENSITIVITY CHANNEL_ZT CHANNEL_ZG CHANNEL_RH
67405 1960257687 82446258 47628.61402 19.627168 -34.703296 131.79 131.79 131.79 19.627174 -34.703288 ... 11.41 12.79 209.65 5.433 6319.34 0.312 0.993 1 1 1

1 rows × 45 columns

Now, we can plot the RH metrics with elevation.

elev_zg = l2a_shot_df.ZG.values[0]
elev_zh = l2a_shot_df.ZH.values[0]
rh40 = l2a_shot_df.RH40.values[0]
rh80 = l2a_shot_df.RH80.values[0]
rh100 = l2a_shot_df.RH100.values[0]
rh = l2a_shot_df.filter(like ='RH').drop('CHANNEL_RH', axis=1).values.tolist()[0]

# plotting
plt.rcParams["figure.figsize"] = (5,7)
rh_vals = l2a_shot_df.filter(like = 'RH').columns[:-1].str.strip('RH').astype(int).tolist()
fig, ax1 = plt.subplots()
ax1.plot(rh_vals, elev_zg+rh, alpha=0.3, marker='o', color='black', label='RH metrics' )
ax1.axhline(y=elev_zg+rh40,  color='g', linestyle='dotted', linewidth=2, label='RH40')
ax1.axhline(y=elev_zg+rh80,  color='g', linestyle='-.', linewidth=2, label='RH80')
ax1.axhline(y=elev_zg+rh100,  color='g', linestyle='--', linewidth=3, label='RH100')
ax1.set_xlabel("Percentile of waveform energy (%)")
ax1.set_ylabel("elevation (meters)")
ax1.legend(loc="lower right")
plt.show()
../../_images/d8d04bf66ec7e325f43bb0cf23fa503f7ab327cfa10d862382166e3d2694f6cb.png

7.5. Search LVIS L2A granules over Bioscape Campaign#

We will use earthaccess module to search the LVIS granules using their bounding polygons.

temporal = ("2023-10-01", "2023-11-30") # Bioscape campaign dates
doi = "10.5067/VP7J20HJQISD" # LVIS L2A doi
granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    doi=doi,
    temporal=temporal
)
Granules found: 2328

There are 2328 granules LVIS L2 product found for the time period.

7.5.1. Plot LVIS Bioscape Campaigns#

Lets first define two functions that we can use to plot the search results above over a basemap.

def convert_umm_geometry(gpoly):
    """converts UMM geometry to multipolygons"""
    multipolygons = []
    for gl in gpoly:
        ltln = gl["Boundary"]["Points"]
        points = [(p["Longitude"], p["Latitude"]) for p in ltln]
        multipolygons.append(Polygon(points))
    return MultiPolygon(multipolygons)

def convert_list_gdf(datag):
    """converts List[] to geopandas dataframe"""
    # create pandas dataframe from json
    df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])
    # keep only last string of the column names
    df.columns=df.columns.str.split('.').str[-1]
    # convert polygons to multipolygonal geometry
    df["geometry"] = df["GPolygons"].apply(convert_umm_geometry)
    # return geopandas dataframe
    return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")

Now, we will convert the JSON return from the earthaccess granule search into a geopandas dataframe so we could plot over an ESRI basemap.

gdf = convert_list_gdf(granules)
#plot
gdf[['BeginningDateTime','geometry']].explore(tiles=xyz, attr=attr)
Make this Notebook Trusted to load map: File -> Trust Notebook

The above shows the flight lines of all LVIS collection during the BioSCape campaign in 2023.

7.5.2. Burned and Unburned Areas#

The following are the burned and unburned sites at Cape Penninsula.

poly_gdf = gpd.read_file("assets/burned_unburned_sites.json", crs="EPSG:4326")
# bounding box
bounding_box = poly_gdf.total_bounds
# plot
m1 = poly_gdf.explore(categorical=True, column='description', cmap='Paired', 
                      tiles=xyz, attr=attr, marker_kwds={'radius':10})
m1
Make this Notebook Trusted to load map: File -> Trust Notebook

Print the poly_gdf geopandas dataframe.

poly_gdf
Name description geometry
0 Milkwood forest unburned POINT (18.41627 -34.31636)
1 Tall proteoid fynbos on marine sands unburned POINT (18.43615 -34.30710)
2 Low wet restioid fynbos unburned POINT (18.43210 -34.31315)
3 Low wet restioid fynbos burned POINT (18.42820 -34.31069)
4 Dry fynbos on ridge with sparse tall Proteaceae burned POINT (18.43254 -34.31609)
5 Dry fynbos on ridge with sparse tall Proteaceae unburned POINT (18.42938 -34.31610)

The above shows some burned and unburned sites around Cape Point. Milkwood forest has the tallest vegetation (±5m), followed by Tall proteoid (3-4m). The other four sites show pairs of very low vegetation (0.5-1.5m) across a scar from an old fire (March 2016). The wet restioid should be uniformly low, while the Dry fynbos has sparse taller shrubs.

7.5.3. Search LVIS L2A granules over the burned and unburned sites#

Let’s search for LVIS L2A granules that are within the bounds of the above sites.

granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    doi=doi, # LVIS L2A doi
    temporal=temporal, # Bioscape campaign dates
    bounding_box = tuple(poly_gdf.total_bounds.tolist())
)
Granules found: 11

Let’s check when these LVIS flights were flown.

gdf = convert_list_gdf(granules)
gdf['BeginningDateTime'] = pd.to_datetime(gdf['BeginningDateTime'])
gdf_daily = gdf.resample(rule='D', on='BeginningDateTime')['concept-id'].nunique()
gdf_daily.index = pd.to_datetime(gdf_daily.index).date
gdf_daily.plot(kind="bar", xlabel="Day of Flight", ylabel="No. of flights")
plt.show()
../../_images/9a303387a8216790bf5eab4e4eaf817a1165c812533aa1e865a208560968fcb0.png

For simplicity, we will limit the granules from the LVIS flights for a single date. Let’s plot these flight lines over a basemap.

lvis_day = '2023-10-29'
gdf['BeginningDateTime'] = gdf['BeginningDateTime'].astype(str)
gdf = gdf[gdf['BeginningDateTime'].str.startswith(lvis_day)]
gdf[['BeginningDateTime','geometry']].explore(tiles=xyz, attr=attr, 
                                              style_kwds={'fillOpacity':0.1})
Make this Notebook Trusted to load map: File -> Trust Notebook

7.5.4. Build the granule names for BioSCape SMCE S3 access#

In the above section, we retrieved the overlapping granules using the earthaccess module, which will show the granule download links of the NASA NSIDC DAAC.

granules[1].data_links()
['https://n5eil01u.ecs.nsidc.org/DP6/ICEBRIDGE/LVISF2.001/2023.10.28/LVISF2_BioSCape2023_1028_R2404_051949.TXT']

In the following codes, we will replace the overlapping granule names with the one from BioSCape S3 bucket.

lvis_f = []
for s in gdf.index:
    lvis_f.append(path.join('s3://bioscape-data/LVIS/L2/', 
                            path.basename(granules[s].data_links()[0])))
lvis_f
['s3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_037983.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_038553.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_039707.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_039795.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_040388.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_041561.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_043943.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_044034.TXT']

7.5.5. Read LVIS L2A Files#

Now, we can read the LVIS L2A files from the S3 bucket into a pandas dataframe.

# create a buffer around fire sites
poly_utm = poly_gdf.to_crs("EPSG:22234") # UTM 34 S
poly_utm.geometry = poly_utm.geometry.buffer(100) # 100m buffer
poly_wgs_gdf = poly_utm.to_crs("EPSG:4326")

# read the LVIS L2A files
lvis_l2a = []
for f in lvis_f:
    h_no, col_names = get_line_number(f)
    temp_df = pd.read_csv(f, skiprows=h_no, header=None, 
                          engine='python', sep=r'\s+')
    temp_df.columns =  [x.decode() for x in col_names]
    temp_gdf = gpd.GeoDataFrame(temp_df, 
                                geometry=gpd.points_from_xy(temp_df.GLON, 
                                                            temp_df.GLAT),
                                crs="EPSG:4326")
    temp_sub = gpd.sjoin(temp_gdf, poly_wgs_gdf, predicate='within')
    if not temp_sub.empty:
        print(f'Subsetting {path.basename(f)}')
        lvis_l2a.append(temp_sub)
lvis_l2a_gdf = pd.concat(lvis_l2a)
Subsetting LVISF2_BioSCape2023_1029_R2404_037983.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_038553.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_039795.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_040388.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_041561.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_043943.TXT

7.5.6. Height of Top Canopy#

Let’s plot RH100 (height of top canopy) values in a map.

cols = ['RH100', 'Name', 'description', 'geometry']
lvis_l2a_gdf[cols].explore(
    "RH100", cmap = "YlGn", alpha=0.5, radius=10, legend=True)
Make this Notebook Trusted to load map: File -> Trust Notebook

As we can see in the map above, the “Tall proteoid fynbos (unburned)” and “Milwood forest (unburned)” sites appear to have taller vegetation.

7.5.7. Relative Height Distribution of Burned vs Unburned sites#

We can generate a plot of RH metrics to check if the vegetation height across the percentile of waveform energy indicates the same.

fig, ax = plt.subplots(figsize=(7, 8))
for i in poly_gdf.index:
    pdesc = poly_gdf.loc[i].description
    pname = poly_gdf.loc[i].Name
    desc_df = lvis_l2a_gdf[lvis_l2a_gdf.description == pdesc]
    plot_df = desc_df[desc_df.Name == pname]
    plot_df = plot_df.filter(like='RH').drop('CHANNEL_RH', axis=1).T
    plot_df.index = plot_df.index.str.strip('RH').astype(int)
    style = 's:'
    if pdesc == "unburned":
        style = 'o-'
    std_df = plot_df.std(axis=1)
    median_df = plot_df.median(axis=1)
    median_df.plot(ax=ax, alpha=0.5, style=style, label=f"{pdesc}: {pname}")
    ax.fill_between(plot_df.index, median_df - std_df, 
                    median_df + std_df, alpha=0.1)

    plt.legend()
ax.set_xlabel("Percentile of waveform energy (%)")
ax.set_ylabel("Height (m)")
plt.show()
../../_images/1edc4ec637de3c06c8ac5df466bb5290c0458cd1d101544bbef7bc97bb809420.png
# exporting 
# lvis_l2a_gdf.geometry.to_file('lvis2.geojson', driver="GeoJSON")
# lvis_l2a_gdf.to_csv('lvis.csv')

7.6. GEDI L2A Canopy Height Metrics#

This section of the tutorial will demonstrate how to directly access and subset the GEDI L2A canopy height metrics using NASA’s Harmony Services and compute a summary of aboveground biomass density for a forest reserve. The Harmony API allows seamless access and production of analysis-ready Earth observation data across different DAACs by enabling cloud-based spatial, temporal, and variable subsetting and data conversions. The GEDI datasets are available from the Harmony API.

We will use NASA’s Harmony Services to retrieve the GEDI L2A dataset and canopy heights (RH100) for the burned and unburned plots . The Harmony API allows access to selected variables for the dataset within the spatial-temporal bounds without having to download the whole data file.

7.6.1. Dataset#

The GEDI Level 2A Geolocated Elevation and Height Metrics product GEDI02_A provides waveform interpretation and extracted products from eachreceived waveform, including ground elevation, canopy top height, and relative height (RH) metrics. GEDI datasets are available for the period starting 2019-04-17 and covers 52 N to 52 S latitudes. GEDI L2A data files are natively in HDF5 format.

7.6.2. Authentication#

NASA Harmony API requires NASA Earthdata Login (EDL). You can use the earthaccess Python library to set up authentication. Alternatively, you can also login to harmony_client directly by passing EDL authentication as the following in the Jupyter Notebook itself:

harmony_client = Client(auth=("your EDL username", "your EDL password"))

7.6.3. Create Harmony Client Object#

First, we create a Harmony Client object. If you are passing the EDL authentication, please do as shown above with the auth parameter.

harmony_client = Client()

7.6.4. Retrieve Concept ID#

Now, let’s retrieve the Concept ID of the GEDI L2A dataset. The Concept ID is NASA Earthdata’s unique ID for its dataset.

def get_concept_id(doi):
    """get concept id from DOI using CMR API"""
    doisearch = f'https://cmr.earthdata.nasa.gov/search/collections.json?doi={doi}' 
    return re.get(doisearch).json()['feed']['entry'][0]['id']

concept_l2a = get_concept_id('10.5067/GEDI/GEDI02_A.002') # GEDI L2A DOI
print(f"{concept_l2a}")
C2142771958-LPCLOUD

7.6.5. Define Request Parameters#

Let’s create a Harmony Collection object with the concept_id retrieved above. We will also define the GEDI L2A RH variables and temporal range.

# harmony collection
collection_l2a = Collection(id=concept_l2a)

def create_var_names(variables):
    # gedi beams
    beams = ['BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011']
    # combine variables and beam names
    return [f'/{b}/{v}' for b in beams for v in variables]

# gedi variables
variables_l2a = create_var_names(['rh', 'quality_flag'])

# time range
temporal_range = {'start': datetime(2019, 4, 17), 
                  'stop': datetime(2023, 3, 31)}

# bounding box for Harmony
b = poly_gdf.total_bounds
bounding_box = BBox(w=b[0], s=b[1], e=b[2], n=b[3])

7.6.6. Create and Submit Harmony Request#

Now, we can create a Harmony request with variables, temporal range, and bounding box and submit the request using the Harmony client object. We will use the download_all method, which uses a multithreaded downloader and returns a concurrent future. Futures are asynchronous and let us use the downloaded file as soon as the download is complete while other files are still being downloaded.

def submit_harmony(collection, variables):
    """submit harmony request"""
    request = Request(collection=collection, 
                  variables=variables, 
                  temporal=temporal_range,
                  spatial=bounding_box,
                  ignore_errors=True)
    # submit harmony request, will return job id
    subset_job_id = harmony_client.submit(request)
    return harmony_client.result_urls(subset_job_id, show_progress=True, 
                                      link_type=LinkType.s3)

results = submit_harmony(collection_l2a, variables_l2a)

A temporary S3 Credentials is needed for read-only, same-region (us-west-2), direct access to S3 objects on the Earthdata cloud. We will use the credentials from the harmony_client.

s3credentials = harmony_client.aws_credentials()

We will pass S3 credentials to S3Fs class S3FileSystem.

fs_s3 = s3fs.S3FileSystem(anon=False, 
                          key=s3credentials['aws_access_key_id'], 
                          secret=s3credentials['aws_secret_access_key'], 
                          token=s3credentials['aws_session_token'])

7.6.7. Read Subset files#

First, we will define a python function to read the GEDI variables, which are organized in a hierarchical way.

def read_gedi_vars(beam):
    """reads through gedi variable hierarchy"""
    col_names = []
    col_val = []
    # read all variables
    for key, value in beam.items():
        # check if the item is a group
        if isinstance(value, h5py.Group):
            # looping through subgroups
            for key2, value2 in value.items():
                col_names.append(key2)
                col_val.append(value2[:].tolist())
        else:
            col_names.append(key)
            col_val.append(value[:].tolist())
    return col_names, col_val

Let’s direct access the subsetted h5 files and retrieve its values into the pandas dataframe.

# define an empty pandas dataframe
subset_df = pd.DataFrame()
# loop through the Harmony results
for s3_url in results:
    print(s3_url)
    with fs_s3.open(s3_url, mode='rb') as fh:
        with h5py.File(fh) as l2a_in:
            for v in list(l2a_in.keys()):
                if v.startswith('BEAM'):
                    c_n, c_v = read_gedi_vars(l2a_in[v])
                    # Appending to the subset_df dataframe
                    subset_df = pd.concat([subset_df, 
                                           pd.DataFrame(map(list, zip(*c_v)), 
                                                        columns=c_n)])
 [ Processing:  45% ] |######################                             | [|]
Job is running with errors.
 [ Processing: 100% ] |###################################################| [|]
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964417/GEDI02_A_2020315071558_O10835_04_T05741_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964414/GEDI02_A_2020183200708_O08797_01_T02411_02_003_01_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964415/GEDI02_A_2020252081142_O09859_04_T11280_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964418/GEDI02_A_2020327023547_O11018_04_T07164_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964419/GEDI02_A_2020342202216_O11262_04_T05741_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964416/GEDI02_A_2020256063818_O09920_04_T08434_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964421/GEDI02_A_2021059114129_O12543_04_T07164_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964423/GEDI02_A_2021132153555_O13677_01_T03681_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964424/GEDI02_A_2021140122952_O13799_01_T06527_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964426/GEDI02_A_2021262033314_O15684_04_T05741_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964427/GEDI02_A_2021337213716_O16858_04_T06858_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964429/GEDI02_A_2021361121724_O17224_04_T08434_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964430/GEDI02_A_2022038193259_O17880_04_T02742_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964431/GEDI02_A_2022245094404_O21084_04_T07164_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964435/GEDI02_A_2023032212120_O23449_04_T00049_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964432/GEDI02_A_2022258132230_O21288_01_T09373_02_003_02_V002_subsetted.h5

Let’s remove the duplicate columns, if any, and print the first two rows of the pandas dataframe.

# remove duplicate columns
subset_df = subset_df.loc[:,~subset_df.columns.duplicated()].copy()
subset_df.head(2)
delta_time lat_lowestmode lon_lowestmode quality_flag rh
0 9.023263e+07 -34.307185 18.427758 1 [-3.9700000286102295, -3.4800000190734863, -3....
1 9.023263e+07 -34.307538 18.428207 1 [-4.039999961853027, -3.549999952316284, -3.22...

7.6.8. RH metrics#

Let’s first split the rh variables into different columns. There are a total of 101 steps of percentiles defined starting from 0 to 100%.

# create rh metrics column
rh = []
# loop through each percentile and create a RH column
for i in range(101):
    y = pd.DataFrame({f'RH{i}':subset_df['rh'].apply(lambda x: x[i])})
    rh.append(y)
rh = pd.concat(rh, axis=1)

# concatenate RH columns to the original dataframe
subset_df = pd.concat([subset_df, rh], axis=1)
# print the first row of dataframe
subset_df.head(2)
delta_time lat_lowestmode lon_lowestmode quality_flag rh RH0 RH1 RH2 RH3 RH4 ... RH91 RH92 RH93 RH94 RH95 RH96 RH97 RH98 RH99 RH100
0 9.023263e+07 -34.307185 18.427758 1 [-3.9700000286102295, -3.4800000190734863, -3.... -3.97 -3.48 -3.14 -2.92 -2.73 ... 1.61 1.68 1.76 1.87 1.98 2.09 2.24 2.43 2.73 3.37
1 9.023263e+07 -34.307538 18.428207 1 [-4.039999961853027, -3.549999952316284, -3.22... -4.04 -3.55 -3.22 -2.95 -2.77 ... 1.53 1.61 1.72 1.79 1.91 2.02 2.17 2.36 2.62 3.22

2 rows × 106 columns

gedi_gdf = gpd.GeoDataFrame(subset_df, 
                                geometry=gpd.points_from_xy(subset_df.lon_lowestmode, 
                                                            subset_df.lat_lowestmode),
                                crs="EPSG:4326")
gedi_sub = gpd.sjoin(gedi_gdf, poly_wgs_gdf, predicate='within')
gedi_sub.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
plotname = gedi_sub.Name.unique()
descrip = gedi_sub.description.unique()

fig, ax = plt.subplots(figsize=(7, 10))

for j, ditem in enumerate(descrip):
    desc_df = gedi_sub[gedi_sub.description == ditem]
    for i, item in enumerate(plotname):
        plot_df = desc_df[desc_df.Name == item]
        plot_df = plot_df.filter(like='RH').T
        plot_df.index = plot_df.index.str.strip('RH').astype(int)
        style = 's:'
        if ditem == "unburned":
            style = 'o-'
        plot_df.median(axis=1).plot(ax=ax, alpha=0.5,
                                    style=style, label=f"{ditem} site: {item}")
        plt.legend()
ax.set_xlabel("Percentile of waveform energy (%)")
ax.set_ylabel("Height (m)")
plt.show()
../../_images/e681a646d1bbb779863381fb9104b308b78d960c20f678d556556a66b2623402.png