Exploring and Visualizing BioSCape LVIS Data

7. Exploring and Visualizing BioSCape LVIS Data#

7.1. BioSCape Data Skills Workshop: From the Field to the Image#

Bioscape

BioSCape, the Biodiversity Survey of the Cape, is NASA’s first biodiversity-focused airborne and field campaign that was conducted in South Africa in 2023. BioSCape’s primary objective is to study the structure, function, and composition of the region’s ecosystems, and how and why they are changing.

BioSCape’s airborne dataset is unprecedented, with AVIRIS-NG, PRISM, and HyTES imaging spectrometers capturing spectral data across the UV, visible and infrared at high resolution and LVIS acquiring coincident full-waveform lidar. BioSCape’s field dataset is equally impressive, with 18 PI-led projects collecting data ranging from the diversity and phylogeny of plants, kelp and phytoplankton, eDNA, landscape acoustics, plant traits, blue carbon accounting, and more

This workshop will equip participants with the skills to find, subset, and visualize the various BioSCape field and airborne (imaging spectroscopy and full-waveform lidar) data sets. Participants will learn data skills through worked examples in terrestrial and aquatic ecosystems, including: wrangling lidar data, performing band math calculations, calculating spectral diversity metrics, machine learning and image classification, and mapping functional traits using partial least squares regression. The workshop format is a mix of expert talks and interactive coding notebooks and will be run through the BioSCape Cloud computing environment.

Date: October 9 - 11, 2024 Cape Town, South Africa

Host: NASA’s Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), in close collaboration with BioSCape, the South African Environmental Observation Network (SAEON), the University of Wisconsin Madison (Phil Townsend), The Nature Conservancy (Glenn Moncrieff), the University of California Merced (Erin Hestir), the University of Cape Town (Jasper Slingsby), Jet Propulsion Laboratory (Kerry Cawse-Nicholson), and UNESCO.

Instructors:

In-person contributors: Anabelle Cardoso, Erin Hestir, Phil Townsend, Henry Frye, Glenn Moncrieff, Jasper Slingsby, Michele Thornton, Rupesh Shrestha
Virtual contributors: Kerry Cawse-Nicholson, Nico Stork, Kyle Kovach

Audience: This training is primarily intended for government natural resource management agency representatives and field technicians in South Africa, as well as local academics and students, especially those connected to the BioSCape Team.

7.2. Overview#

This tutorial will demonstrate accessing and visualizing Land, Vegetation, and Ice Sensor (LVIS) data available on the BioSCape SMCE. LVIS is an airborne, wide-swath imaging full-waveform laser altimeter. LVIS collects in a 1064 nm-wavelength (near infrared) range with 3 detectors mounted in an airborne platform, flown typically ~10 km above the ground producing a data swath of 2km wide with 7-10 m footprints. More information about the LVIS instrument is here.

The LVIS instrument has been flown above several regions of the world since 1998. The LVIS flights were collected for the BioSCAPE campaigns from 01 Oct 2023 through 30 Nov 2023.

There are three LVIS BioSCape data products currently available and archived at NSIDC DAAC

Dataset	Dataset DOI
LVIS L1A Geotagged Images V001	10.5067/NE5KKKBAQG44
LVIS Facility L1B Geolocated Return Energy Waveforms V001	10.5067/XQJ8PN8FTIDG
LVIS Facility L2 Geolocated Surface Elevation and Canopy Height Product V001	10.5067/VP7J20HJQISD

For this tutorial, we’ll examine the LVIS L2 Geolocated Surface Elevation and Canopy Height Product V001

Citation: Blair, J. B. & Hofton, M. (2020). LVIS Facility L2 Geolocated Surface Elevation and Canopy Height Product, Version 1 [Data Set]. Boulder, Colorado USA. NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/VP7J20HJQISD.
User Guide LVIS Facility L2 Geolocated Surface Elevation and Canopy Height Product, Version 1

LVIS Waveform Figure: Sample LVIS product waveforms illustrating possible distributions of reflected light (Blair et al., 2020)

from glob import glob
import numpy as np
import pandas as pd
import earthaccess
import geopandas as gpd
import requests as re
import s3fs
import h5py
from os import path
from datetime import datetime
from shapely.geometry import Polygon, MultiPolygon
import matplotlib.pyplot as plt
from shapely.geometry import box
from harmony import BBox,Client, Collection, Request, LinkType

# esri background basemap for maps
xyz = "https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}"
attr = "ESRI"

The LVIS data from the BioSCape project are available at the SMCE S3 bucket. Let’s list the LVIS folders at the BioSCape SMCE.

s3 = s3fs.S3FileSystem(anon=False)
lvis = s3.ls('bioscape-data/LVIS')
lvis

['bioscape-data/LVIS/L1B', 'bioscape-data/LVIS/L2']

Both LVIS L1B and L2 products are available at the S3 bucket.

7.3. Structure LVIS L1B Geolocated Waveform#

First, let’s find out how many L1B files are there on the SMCE.

count_l1 = len(s3.ls('bioscape-data/LVIS/L1B'))
print(f"There are {count_l1} LVIS L1 granules")

There are 2328 LVIS L1 granules

Now, we go ahead and open a file to look at its structure.

l1_first = 'bioscape-data/LVIS/L1B/LVISF1B_BioSCape2023_1109_R2404_047611.h5'
with s3.open(l1_first, mode='rb') as fh:
    with h5py.File(fh) as hf:
        for v in list(hf.keys()):
            if v != "ancillary_data":
                print(f"- {v} : {hf[v].attrs['description'][0].decode('utf-8')}")

- AZIMUTH : Azimuth angle of laser beam
- INCIDENTANGLE : Off-nadir incident angle of laser beam
- LAT0 : Latitude of the highest sample of the return waveform (degrees north) 
- LAT1215 : Latitude of the lowest sample of the return waveform (degrees north) 
- LFID : LVIS file identification.  Together with 'LVIS shotnumber' these are a unique identifier for every LVIS laser shot. Format is XXYYYYYZZZ where XX identifies instrument version, YYYYY is the Modified Julian Date of the flight departure day, ZZZ represents file number.
- LON0 : Longitude of the highest sample of the return waveform (degrees east)
- LON1215 : Longitude of the lowest sample of the return waveform (degrees east) 
- RANGE : Distance along laser path from the instrument to the ground
- RXWAVE : Returned waveform (1216 samples long)
- SHOTNUMBER : Laser shot number assigned during collection.  Together with 'LVIS LFID' provides a unique identifier to every LVIS laser shot.
- SIGMEAN : Signal mean noise level, calculated in-flight
- TIME : UTC decimal seconds of the day
- TXWAVE : Transmitted waveform (128 samples long)
- Z0 : Elevation of the highest sample of the waveform with respect to the reference ellipsoid
- Z1215 : Elevation of the lowest sample of the waveform with respect to the reference ellipsoid

with s3.open(l1_first, mode='rb') as fh:
    with h5py.File(fh) as hf:
        l_lat = hf['LAT1215'][:]
        l_lon = hf['LON1215'][:]
        l_range = hf['Z1215'][:] # ground elevation
        shot = hf['SHOTNUMBER'][:] # ground elevation
geo_arr = list(zip(shot, l_range,l_lat,l_lon))
l1b_df = pd.DataFrame(geo_arr, columns=["shot_number", "elevation", "lat", "lon"])
l1b_df

	shot_number	elevation	lat	lon
0	82378820	59.477188	-34.699020	19.651972
1	82378821	58.855934	-34.699012	19.651895
2	82378822	59.752430	-34.699084	19.651964
3	82378823	59.403404	-34.699076	19.651888
4	82378824	60.092033	-34.699148	19.651957
...	...	...	...	...
349995	82729294	-8.561060	-34.678439	19.528331
349996	82729295	-9.040883	-34.678432	19.528254
349997	82729296	-8.609493	-34.678375	19.528340
349998	82729297	-8.090749	-34.678367	19.528263
349999	82729298	-8.993920	-34.678310	19.528348

350000 rows × 4 columns

l1b_gdf = gpd.GeoDataFrame(l1b_df, crs="EPSG:4326",
                           geometry=gpd.points_from_xy(l1b_df.lon, 
                                                       l1b_df.lat))
l1b_gdf.sample(frac=0.01).explore("elevation", cmap = "plasma", tiles=xyz, attr=attr)

Make this Notebook Trusted to load map: File -> Trust Notebook

i = np.where(shot==82446258)[0][0]
with s3.open(l1_first, mode='rb') as fh:
    with h5py.File(fh) as hf:
        rxwaveform = hf['RXWAVE'][i, :] # waveform
        elev_1215 = hf['Z1215'][i] # elevation ground
        elev_0 = hf['Z0'][i] # elevation top
c = len(rxwaveform) #
elev = np.linspace(elev_1215, elev_0, num=c)
# plot
plt.rcParams["figure.figsize"] = (3,10)
plt.xlabel("rxwaveform (counts)")
plt.ylabel("elevation (m)")
plt.plot(rxwaveform, elev, linestyle='--', marker='.',)
plt.show()

../../_images/9ab0a3d19405f1ec4a6598db3cc3e815626c39de7415c52634aa04141ae8a511.png

7.4. Structure of LVIS L2 Canopy Height Product#

First, let’s see how many L2 files are there at BioSCape SMCE.

count_l2 = len(s3.ls('bioscape-data/LVIS/L2'))
print(f"There are {count_l2} LVIS L2 granules")

There are 2327 LVIS L2 granules

Now, we’ll go ahead and open a file to look at it’s structure.

l2_first = 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1109_R2404_047611.TXT'
with s3.open(l2_first, 'rb') as f:
    head = [next(f) for _ in range(15)]
    print(head)

[b"# Level2 (v2.0.5) data set collected by NASA's Land, Vegetation and Ice Sensor (LVIS) Facility in October-November 2023 as part of NASA's BioSCape Field Campaign. This file contains elevation,\n", b"# height and surface structure metrics derived from LVIS's geolocated laser return waveform. The corresponding spatially geolocated laser return waveforms are contained in the Level 1B data set.\n", b'# For supporting information please visit https://lvis.gsfc.nasa.gov/Data/Maps/SouthAfrica2023Map.html. LVIS-targeted data collections occurred at flight altitudes of ~7 km above the surface from\n', b'# the NASA G-V, however data collections also occurred at lower altitudes. The "Range" parameter can be used to determine flight altitude. Note: Two alternate ground elevations options are provided\n', b'# in order to provide flexibility for local site conditions; if selected for use, adjust the values of the RH parameters accordingly. Please consult the User Guide for further information. The data\n', b'# have had limited quality assurance checking, although obvious lower quality data have been removed, e.g., clouds and cloud-obscured returns. We highly recommend that end users contact us about\n', b'# any idiosyncrasies or unexpected results.\n', b'#\n', b'# LFID SHOTNUMBER TIME GLON GLAT ZG ZG_ALT1 ZG_ALT2 HLON HLAT ZH TLON TLAT ZT RH10 RH15 RH20 RH25 RH30 RH35 RH40 RH45 RH50 RH55 RH60 RH65 RH70 RH75 RH80 RH85 RH90 RH95 RH96 RH97 RH98 RH99 RH100 AZIMUTH INCIDENTANGLE RANGE COMPLEXITY SENSITIVITY CHANNEL_ZT CHANNEL_ZG CHANNEL_RH\n', b'1960257687  82378820   47611.75433   19.651985  -34.699046    99.55    99.55    99.55   19.651985  -34.699046    99.55   19.651985  -34.699047   100.79   -0.86   -0.67   -0.56   -0.45   -0.37   -0.30   -0.22   -0.15   -0.07   -0.04    0.04    0.07    0.15    0.22    0.30    0.37    0.45    0.60    0.63    0.71    0.75    0.86    1.23  336.95    4.480   6340.45 0.250 0.955  2  2  2\n', b'1960257687  82378821   47611.75458   19.651909  -34.699038    99.57    99.57    99.57   19.651909  -34.699038    99.57   19.651909  -34.699039   100.76   -0.86   -0.67   -0.56   -0.45   -0.37   -0.30   -0.22   -0.19   -0.11   -0.04    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.60    0.63    0.67    0.75    0.86    1.20  336.26    4.511   6339.85 0.241 0.941  2  2  2\n', b'1960257687  82378822   47611.75483   19.651978  -34.699109    99.91    99.91    99.91   19.651978  -34.699109    99.91   19.651978  -34.699110   101.10   -0.93   -0.75   -0.60   -0.49   -0.41   -0.34   -0.26   -0.19   -0.11   -0.07    0.00    0.04    0.11    0.19    0.26    0.34    0.41    0.56    0.60    0.67    0.71    0.82    1.20  336.57    4.423   6339.25 0.291 0.944  2  2  2\n', b'1960257687  82378823   47611.75508   19.651902  -34.699102    99.93    99.93    99.93   19.651902  -34.699102    99.93   19.651902  -34.699103   101.20   -0.90   -0.71   -0.60   -0.49   -0.37   -0.30   -0.22   -0.19   -0.11   -0.04    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.60    0.63    0.71    0.78    0.90    1.27  335.87    4.455   6340.00 0.265 0.953  2  2  2\n', b'1960257687  82378824   47611.75533   19.651970  -34.699173   100.21   100.21   100.21   19.651970  -34.699173   100.21   19.651970  -34.699174   101.41   -0.90   -0.71   -0.56   -0.45   -0.37   -0.30   -0.22   -0.15   -0.11   -0.04    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.56    0.60    0.67    0.75    0.82    1.20  336.16    4.368   6338.05 0.255 0.949  2  2  2\n', b'1960257687  82378825   47611.75558   19.651894  -34.699165   100.20   100.20   100.20   19.651894  -34.699165   100.20   19.651895  -34.699166   101.44   -0.93   -0.75   -0.60   -0.49   -0.41   -0.34   -0.26   -0.19   -0.11   -0.07    0.00    0.07    0.11    0.19    0.26    0.34    0.45    0.60    0.64    0.67    0.75    0.86    1.23  335.46    4.400   6338.50 0.327 0.950  2  2  2\n']

LVIS L2A files have a number of lines as header as shown above. Let’s define a python function to read the number of header rows.

def get_line_number(filename):
    """find number of header rows in LVIS L2A"""
    count = 0
    with s3.open(filename, 'rb') as f:
        for line in f:
            if line.startswith(b'#'):
                count = count + 1
                columns = line[1:].split()
            else:
                return count, columns

h_no, col_names = get_line_number(l2_first)
with s3.open(l2_first, 'rb') as f:
    l2a_df = pd.read_csv(f, skiprows=h_no, header=None, engine='python', sep=r'\s+')
    l2a_df.columns =  [x.decode() for x in col_names]
l2a_df

	LFID	SHOTNUMBER	TIME	GLON	GLAT	ZG	ZG_ALT1	ZG_ALT2	HLON	HLAT	...	RH99	RH100	AZIMUTH	INCIDENTANGLE	RANGE	COMPLEXITY	SENSITIVITY	CHANNEL_ZT	CHANNEL_ZG	CHANNEL_RH
0	1960257687	82378820	47611.75433	19.651985	-34.699046	99.55	99.55	99.55	19.651985	-34.699046	...	0.86	1.23	336.95	4.480	6340.45	0.250	0.955	2	2	2
1	1960257687	82378821	47611.75458	19.651909	-34.699038	99.57	99.57	99.57	19.651909	-34.699038	...	0.86	1.20	336.26	4.511	6339.85	0.241	0.941	2	2	2
2	1960257687	82378822	47611.75483	19.651978	-34.699109	99.91	99.91	99.91	19.651978	-34.699109	...	0.82	1.20	336.57	4.423	6339.25	0.291	0.944	2	2	2
3	1960257687	82378823	47611.75508	19.651902	-34.699102	99.93	99.93	99.93	19.651902	-34.699102	...	0.90	1.27	335.87	4.455	6340.00	0.265	0.953	2	2	2
4	1960257687	82378824	47611.75533	19.651970	-34.699173	100.21	100.21	100.21	19.651970	-34.699173	...	0.82	1.20	336.16	4.368	6338.05	0.255	0.949	2	2	2
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
349995	1960257689	82729294	47699.37382	19.528347	-34.678446	32.62	32.62	32.62	19.528347	-34.678446	...	0.75	1.12	298.96	2.287	6392.51	0.076	0.980	1	1	1
349996	1960257689	82729295	47699.37407	19.528270	-34.678439	32.26	32.26	32.26	19.528270	-34.678439	...	0.74	1.30	298.38	2.346	6392.81	0.076	0.991	1	2	1
349997	1960257689	82729296	47699.37432	19.528356	-34.678382	32.57	32.57	32.57	19.528356	-34.678382	...	0.71	1.16	300.44	2.312	6392.96	0.174	0.987	1	1	1
349998	1960257689	82729297	47699.37457	19.528279	-34.678375	32.49	32.49	32.49	19.528279	-34.678375	...	0.71	1.16	299.82	2.370	6392.21	0.076	0.981	1	1	1
349999	1960257689	82729298	47699.37482	19.528364	-34.678318	32.62	32.62	32.62	19.528364	-34.678318	...	0.72	1.17	301.87	2.339	6392.51	0.116	0.987	1	2	1

350000 rows × 45 columns

Let’s print LVIS L2 variable names.

l2a_df.columns.values

array(['LFID', 'SHOTNUMBER', 'TIME', 'GLON', 'GLAT', 'ZG', 'ZG_ALT1',
       'ZG_ALT2', 'HLON', 'HLAT', 'ZH', 'TLON', 'TLAT', 'ZT', 'RH10',
       'RH15', 'RH20', 'RH25', 'RH30', 'RH35', 'RH40', 'RH45', 'RH50',
       'RH55', 'RH60', 'RH65', 'RH70', 'RH75', 'RH80', 'RH85', 'RH90',
       'RH95', 'RH96', 'RH97', 'RH98', 'RH99', 'RH100', 'AZIMUTH',
       'INCIDENTANGLE', 'RANGE', 'COMPLEXITY', 'SENSITIVITY',
       'CHANNEL_ZT', 'CHANNEL_ZG', 'CHANNEL_RH'], dtype=object)

Key variables:

The GLAT and GLON variables provide coordinates of the lowest detected mode within the LVIS waveform.
The RHXX variables provide height above the lowest detected mode at which XX percentile of the waveform energy.
RANGE provides the distance from the instrument to the ground.
SENSITIVITY provides sensitivity metric for the return waveform.
ZG,ZG_ALT1, ZG_ALT2: Mean elevation of the lowest detected mode using alternate 1/2 mode detection settings. The alternative ZG to use to adjust the RH parameters for local site conditions.

More information about the variables are provided in the user guide.

Let’s select the particular shot we are interested in.

l2a_shot_df = l2a_df[l2a_df['SHOTNUMBER'] == 82446258]
l2a_shot_df

	LFID	SHOTNUMBER	TIME	GLON	GLAT	ZG	ZG_ALT1	ZG_ALT2	HLON	HLAT	...	RH99	RH100	AZIMUTH	INCIDENTANGLE	RANGE	COMPLEXITY	SENSITIVITY	CHANNEL_ZT	CHANNEL_ZG	CHANNEL_RH
67405	1960257687	82446258	47628.61402	19.627168	-34.703296	131.79	131.79	131.79	19.627174	-34.703288	...	11.41	12.79	209.65	5.433	6319.34	0.312	0.993	1	1	1

1 rows × 45 columns

Now, we can plot the RH metrics with elevation.

elev_zg = l2a_shot_df.ZG.values[0]
elev_zh = l2a_shot_df.ZH.values[0]
rh40 = l2a_shot_df.RH40.values[0]
rh80 = l2a_shot_df.RH80.values[0]
rh100 = l2a_shot_df.RH100.values[0]
rh = l2a_shot_df.filter(like ='RH').drop('CHANNEL_RH', axis=1).values.tolist()[0]

# plotting
plt.rcParams["figure.figsize"] = (5,7)
rh_vals = l2a_shot_df.filter(like = 'RH').columns[:-1].str.strip('RH').astype(int).tolist()
fig, ax1 = plt.subplots()
ax1.plot(rh_vals, elev_zg+rh, alpha=0.3, marker='o', color='black', label='RH metrics' )
ax1.axhline(y=elev_zg+rh40,  color='g', linestyle='dotted', linewidth=2, label='RH40')
ax1.axhline(y=elev_zg+rh80,  color='g', linestyle='-.', linewidth=2, label='RH80')
ax1.axhline(y=elev_zg+rh100,  color='g', linestyle='--', linewidth=3, label='RH100')
ax1.set_xlabel("Percentile of waveform energy (%)")
ax1.set_ylabel("elevation (meters)")
ax1.legend(loc="lower right")
plt.show()

../../_images/d8d04bf66ec7e325f43bb0cf23fa503f7ab327cfa10d862382166e3d2694f6cb.png

7.5. Search LVIS L2A granules over Bioscape Campaign#

We will use earthaccess module to search the LVIS granules using their bounding polygons.

temporal = ("2023-10-01", "2023-11-30") # Bioscape campaign dates
doi = "10.5067/VP7J20HJQISD" # LVIS L2A doi
granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    doi=doi,
    temporal=temporal
)

Granules found: 2328

There are 2328 granules LVIS L2 product found for the time period.

7.5.1. Plot LVIS Bioscape Campaigns#

Lets first define two functions that we can use to plot the search results above over a basemap.

def convert_umm_geometry(gpoly):
    """converts UMM geometry to multipolygons"""
    multipolygons = []
    for gl in gpoly:
        ltln = gl["Boundary"]["Points"]
        points = [(p["Longitude"], p["Latitude"]) for p in ltln]
        multipolygons.append(Polygon(points))
    return MultiPolygon(multipolygons)

def convert_list_gdf(datag):
    """converts List[] to geopandas dataframe"""
    # create pandas dataframe from json
    df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])
    # keep only last string of the column names
    df.columns=df.columns.str.split('.').str[-1]
    # convert polygons to multipolygonal geometry
    df["geometry"] = df["GPolygons"].apply(convert_umm_geometry)
    # return geopandas dataframe
    return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")

Now, we will convert the JSON return from the earthaccess granule search into a geopandas dataframe so we could plot over an ESRI basemap.

gdf = convert_list_gdf(granules)
#plot
gdf[['BeginningDateTime','geometry']].explore(tiles=xyz, attr=attr)

Make this Notebook Trusted to load map: File -> Trust Notebook

The above shows the flight lines of all LVIS collection during the BioSCape campaign in 2023.

7.5.2. Burned and Unburned Areas#

The following are the burned and unburned sites at Cape Penninsula.

poly_gdf = gpd.read_file("assets/burned_unburned_sites.json", crs="EPSG:4326")
# bounding box
bounding_box = poly_gdf.total_bounds
# plot
m1 = poly_gdf.explore(categorical=True, column='description', cmap='Paired', 
                      tiles=xyz, attr=attr, marker_kwds={'radius':10})
m1

Make this Notebook Trusted to load map: File -> Trust Notebook

Print the poly_gdf geopandas dataframe.

poly_gdf

	Name	description	geometry
0	Milkwood forest	unburned	POINT (18.41627 -34.31636)
1	Tall proteoid fynbos on marine sands	unburned	POINT (18.43615 -34.30710)
2	Low wet restioid fynbos	unburned	POINT (18.43210 -34.31315)
3	Low wet restioid fynbos	burned	POINT (18.42820 -34.31069)
4	Dry fynbos on ridge with sparse tall Proteaceae	burned	POINT (18.43254 -34.31609)
5	Dry fynbos on ridge with sparse tall Proteaceae	unburned	POINT (18.42938 -34.31610)

The above shows some burned and unburned sites around Cape Point. Milkwood forest has the tallest vegetation (±5m), followed by Tall proteoid (3-4m). The other four sites show pairs of very low vegetation (0.5-1.5m) across a scar from an old fire (March 2016). The wet restioid should be uniformly low, while the Dry fynbos has sparse taller shrubs.

7.5.3. Search LVIS L2A granules over the burned and unburned sites#

Let’s search for LVIS L2A granules that are within the bounds of the above sites.

granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    doi=doi, # LVIS L2A doi
    temporal=temporal, # Bioscape campaign dates
    bounding_box = tuple(poly_gdf.total_bounds.tolist())
)

Granules found: 11

Let’s check when these LVIS flights were flown.

gdf = convert_list_gdf(granules)
gdf['BeginningDateTime'] = pd.to_datetime(gdf['BeginningDateTime'])
gdf_daily = gdf.resample(rule='D', on='BeginningDateTime')['concept-id'].nunique()
gdf_daily.index = pd.to_datetime(gdf_daily.index).date
gdf_daily.plot(kind="bar", xlabel="Day of Flight", ylabel="No. of flights")
plt.show()

../../_images/9a303387a8216790bf5eab4e4eaf817a1165c812533aa1e865a208560968fcb0.png

For simplicity, we will limit the granules from the LVIS flights for a single date. Let’s plot these flight lines over a basemap.

lvis_day = '2023-10-29'
gdf['BeginningDateTime'] = gdf['BeginningDateTime'].astype(str)
gdf = gdf[gdf['BeginningDateTime'].str.startswith(lvis_day)]
gdf[['BeginningDateTime','geometry']].explore(tiles=xyz, attr=attr, 
                                              style_kwds={'fillOpacity':0.1})

Make this Notebook Trusted to load map: File -> Trust Notebook

7.5.4. Build the granule names for BioSCape SMCE S3 access#

In the above section, we retrieved the overlapping granules using the earthaccess module, which will show the granule download links of the NASA NSIDC DAAC.

granules[1].data_links()

['https://n5eil01u.ecs.nsidc.org/DP6/ICEBRIDGE/LVISF2.001/2023.10.28/LVISF2_BioSCape2023_1028_R2404_051949.TXT']

In the following codes, we will replace the overlapping granule names with the one from BioSCape S3 bucket.

lvis_f = []
for s in gdf.index:
    lvis_f.append(path.join('s3://bioscape-data/LVIS/L2/', 
                            path.basename(granules[s].data_links()[0])))
lvis_f

['s3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_037983.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_038553.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_039707.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_039795.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_040388.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_041561.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_043943.TXT',
 's3://bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1029_R2404_044034.TXT']

7.5.5. Read LVIS L2A Files#

Now, we can read the LVIS L2A files from the S3 bucket into a pandas dataframe.

# create a buffer around fire sites
poly_utm = poly_gdf.to_crs("EPSG:22234") # UTM 34 S
poly_utm.geometry = poly_utm.geometry.buffer(100) # 100m buffer
poly_wgs_gdf = poly_utm.to_crs("EPSG:4326")

# read the LVIS L2A files
lvis_l2a = []
for f in lvis_f:
    h_no, col_names = get_line_number(f)
    temp_df = pd.read_csv(f, skiprows=h_no, header=None, 
                          engine='python', sep=r'\s+')
    temp_df.columns =  [x.decode() for x in col_names]
    temp_gdf = gpd.GeoDataFrame(temp_df, 
                                geometry=gpd.points_from_xy(temp_df.GLON, 
                                                            temp_df.GLAT),
                                crs="EPSG:4326")
    temp_sub = gpd.sjoin(temp_gdf, poly_wgs_gdf, predicate='within')
    if not temp_sub.empty:
        print(f'Subsetting {path.basename(f)}')
        lvis_l2a.append(temp_sub)
lvis_l2a_gdf = pd.concat(lvis_l2a)

Subsetting LVISF2_BioSCape2023_1029_R2404_037983.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_038553.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_039795.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_040388.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_041561.TXT
Subsetting LVISF2_BioSCape2023_1029_R2404_043943.TXT

7.5.6. Height of Top Canopy#

Let’s plot RH100 (height of top canopy) values in a map.

cols = ['RH100', 'Name', 'description', 'geometry']
lvis_l2a_gdf[cols].explore(
    "RH100", cmap = "YlGn", alpha=0.5, radius=10, legend=True)

Make this Notebook Trusted to load map: File -> Trust Notebook

As we can see in the map above, the “Tall proteoid fynbos (unburned)” and “Milwood forest (unburned)” sites appear to have taller vegetation.

7.5.7. Relative Height Distribution of Burned vs Unburned sites#

We can generate a plot of RH metrics to check if the vegetation height across the percentile of waveform energy indicates the same.

fig, ax = plt.subplots(figsize=(7, 8))
for i in poly_gdf.index:
    pdesc = poly_gdf.loc[i].description
    pname = poly_gdf.loc[i].Name
    desc_df = lvis_l2a_gdf[lvis_l2a_gdf.description == pdesc]
    plot_df = desc_df[desc_df.Name == pname]
    plot_df = plot_df.filter(like='RH').drop('CHANNEL_RH', axis=1).T
    plot_df.index = plot_df.index.str.strip('RH').astype(int)
    style = 's:'
    if pdesc == "unburned":
        style = 'o-'
    std_df = plot_df.std(axis=1)
    median_df = plot_df.median(axis=1)
    median_df.plot(ax=ax, alpha=0.5, style=style, label=f"{pdesc}: {pname}")
    ax.fill_between(plot_df.index, median_df - std_df, 
                    median_df + std_df, alpha=0.1)

    plt.legend()
ax.set_xlabel("Percentile of waveform energy (%)")
ax.set_ylabel("Height (m)")
plt.show()

../../_images/1edc4ec637de3c06c8ac5df466bb5290c0458cd1d101544bbef7bc97bb809420.png

# exporting 
# lvis_l2a_gdf.geometry.to_file('lvis2.geojson', driver="GeoJSON")
# lvis_l2a_gdf.to_csv('lvis.csv')

7.6. GEDI L2A Canopy Height Metrics#

This section of the tutorial will demonstrate how to directly access and subset the GEDI L2A canopy height metrics using NASA’s Harmony Services and compute a summary of aboveground biomass density for a forest reserve. The Harmony API allows seamless access and production of analysis-ready Earth observation data across different DAACs by enabling cloud-based spatial, temporal, and variable subsetting and data conversions. The GEDI datasets are available from the Harmony API.

We will use NASA’s Harmony Services to retrieve the GEDI L2A dataset and canopy heights (RH100) for the burned and unburned plots . The Harmony API allows access to selected variables for the dataset within the spatial-temporal bounds without having to download the whole data file.

7.6.1. Dataset#

The GEDI Level 2A Geolocated Elevation and Height Metrics product GEDI02_A provides waveform interpretation and extracted products from eachreceived waveform, including ground elevation, canopy top height, and relative height (RH) metrics. GEDI datasets are available for the period starting 2019-04-17 and covers 52 N to 52 S latitudes. GEDI L2A data files are natively in HDF5 format.

7.6.2. Authentication#

NASA Harmony API requires NASA Earthdata Login (EDL). You can use the earthaccess Python library to set up authentication. Alternatively, you can also login to harmony_client directly by passing EDL authentication as the following in the Jupyter Notebook itself:

harmony_client = Client(auth=("your EDL username", "your EDL password"))

7.6.3. Create Harmony Client Object#

First, we create a Harmony Client object. If you are passing the EDL authentication, please do as shown above with the auth parameter.

harmony_client = Client()

7.6.4. Retrieve Concept ID#

Now, let’s retrieve the Concept ID of the GEDI L2A dataset. The Concept ID is NASA Earthdata’s unique ID for its dataset.

def get_concept_id(doi):
    """get concept id from DOI using CMR API"""
    doisearch = f'https://cmr.earthdata.nasa.gov/search/collections.json?doi={doi}' 
    return re.get(doisearch).json()['feed']['entry'][0]['id']

concept_l2a = get_concept_id('10.5067/GEDI/GEDI02_A.002') # GEDI L2A DOI
print(f"{concept_l2a}")

C2142771958-LPCLOUD

7.6.5. Define Request Parameters#

Let’s create a Harmony Collection object with the concept_id retrieved above. We will also define the GEDI L2A RH variables and temporal range.

# harmony collection
collection_l2a = Collection(id=concept_l2a)

def create_var_names(variables):
    # gedi beams
    beams = ['BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011']
    # combine variables and beam names
    return [f'/{b}/{v}' for b in beams for v in variables]

# gedi variables
variables_l2a = create_var_names(['rh', 'quality_flag'])

# time range
temporal_range = {'start': datetime(2019, 4, 17), 
                  'stop': datetime(2023, 3, 31)}

# bounding box for Harmony
b = poly_gdf.total_bounds
bounding_box = BBox(w=b[0], s=b[1], e=b[2], n=b[3])

7.6.6. Create and Submit Harmony Request#

Now, we can create a Harmony request with variables, temporal range, and bounding box and submit the request using the Harmony client object. We will use the download_all method, which uses a multithreaded downloader and returns a concurrent future. Futures are asynchronous and let us use the downloaded file as soon as the download is complete while other files are still being downloaded.

def submit_harmony(collection, variables):
    """submit harmony request"""
    request = Request(collection=collection, 
                  variables=variables, 
                  temporal=temporal_range,
                  spatial=bounding_box,
                  ignore_errors=True)
    # submit harmony request, will return job id
    subset_job_id = harmony_client.submit(request)
    return harmony_client.result_urls(subset_job_id, show_progress=True, 
                                      link_type=LinkType.s3)

results = submit_harmony(collection_l2a, variables_l2a)

A temporary S3 Credentials is needed for read-only, same-region (us-west-2), direct access to S3 objects on the Earthdata cloud. We will use the credentials from the harmony_client.

s3credentials = harmony_client.aws_credentials()

We will pass S3 credentials to S3Fs class S3FileSystem.

fs_s3 = s3fs.S3FileSystem(anon=False, 
                          key=s3credentials['aws_access_key_id'], 
                          secret=s3credentials['aws_secret_access_key'], 
                          token=s3credentials['aws_session_token'])

7.6.7. Read Subset files#

First, we will define a python function to read the GEDI variables, which are organized in a hierarchical way.

def read_gedi_vars(beam):
    """reads through gedi variable hierarchy"""
    col_names = []
    col_val = []
    # read all variables
    for key, value in beam.items():
        # check if the item is a group
        if isinstance(value, h5py.Group):
            # looping through subgroups
            for key2, value2 in value.items():
                col_names.append(key2)
                col_val.append(value2[:].tolist())
        else:
            col_names.append(key)
            col_val.append(value[:].tolist())
    return col_names, col_val

Let’s direct access the subsetted h5 files and retrieve its values into the pandas dataframe.

# define an empty pandas dataframe
subset_df = pd.DataFrame()
# loop through the Harmony results
for s3_url in results:
    print(s3_url)
    with fs_s3.open(s3_url, mode='rb') as fh:
        with h5py.File(fh) as l2a_in:
            for v in list(l2a_in.keys()):
                if v.startswith('BEAM'):
                    c_n, c_v = read_gedi_vars(l2a_in[v])
                    # Appending to the subset_df dataframe
                    subset_df = pd.concat([subset_df, 
                                           pd.DataFrame(map(list, zip(*c_v)), 
                                                        columns=c_n)])

 [ Processing:  45% ] |######################                             | [|]
Job is running with errors.
 [ Processing: 100% ] |###################################################| [|]

s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964417/GEDI02_A_2020315071558_O10835_04_T05741_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964414/GEDI02_A_2020183200708_O08797_01_T02411_02_003_01_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964415/GEDI02_A_2020252081142_O09859_04_T11280_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964418/GEDI02_A_2020327023547_O11018_04_T07164_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964419/GEDI02_A_2020342202216_O11262_04_T05741_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964416/GEDI02_A_2020256063818_O09920_04_T08434_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964421/GEDI02_A_2021059114129_O12543_04_T07164_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964423/GEDI02_A_2021132153555_O13677_01_T03681_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964424/GEDI02_A_2021140122952_O13799_01_T06527_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964426/GEDI02_A_2021262033314_O15684_04_T05741_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964427/GEDI02_A_2021337213716_O16858_04_T06858_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964429/GEDI02_A_2021361121724_O17224_04_T08434_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964430/GEDI02_A_2022038193259_O17880_04_T02742_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964431/GEDI02_A_2022245094404_O21084_04_T07164_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964435/GEDI02_A_2023032212120_O23449_04_T00049_02_003_02_V002_subsetted.h5
s3://harmony-prod-staging/public/3caed903-c2ed-4b20-a64c-67242c2893c1/80964432/GEDI02_A_2022258132230_O21288_01_T09373_02_003_02_V002_subsetted.h5

Let’s remove the duplicate columns, if any, and print the first two rows of the pandas dataframe.

# remove duplicate columns
subset_df = subset_df.loc[:,~subset_df.columns.duplicated()].copy()
subset_df.head(2)

	delta_time	lat_lowestmode	lon_lowestmode	quality_flag	rh
0	9.023263e+07	-34.307185	18.427758	1	[-3.9700000286102295, -3.4800000190734863, -3....
1	9.023263e+07	-34.307538	18.428207	1	[-4.039999961853027, -3.549999952316284, -3.22...

7.6.8. RH metrics#

Let’s first split the rh variables into different columns. There are a total of 101 steps of percentiles defined starting from 0 to 100%.

# create rh metrics column
rh = []
# loop through each percentile and create a RH column
for i in range(101):
    y = pd.DataFrame({f'RH{i}':subset_df['rh'].apply(lambda x: x[i])})
    rh.append(y)
rh = pd.concat(rh, axis=1)

# concatenate RH columns to the original dataframe
subset_df = pd.concat([subset_df, rh], axis=1)
# print the first row of dataframe
subset_df.head(2)

	delta_time	lat_lowestmode	lon_lowestmode	quality_flag	rh	RH0	RH1	RH2	RH3	RH4	...	RH91	RH92	RH93	RH94	RH95	RH96	RH97	RH98	RH99	RH100
0	9.023263e+07	-34.307185	18.427758	1	[-3.9700000286102295, -3.4800000190734863, -3....	-3.97	-3.48	-3.14	-2.92	-2.73	...	1.61	1.68	1.76	1.87	1.98	2.09	2.24	2.43	2.73	3.37
1	9.023263e+07	-34.307538	18.428207	1	[-4.039999961853027, -3.549999952316284, -3.22...	-4.04	-3.55	-3.22	-2.95	-2.77	...	1.53	1.61	1.72	1.79	1.91	2.02	2.17	2.36	2.62	3.22

2 rows × 106 columns

gedi_gdf = gpd.GeoDataFrame(subset_df, 
                                geometry=gpd.points_from_xy(subset_df.lon_lowestmode, 
                                                            subset_df.lat_lowestmode),
                                crs="EPSG:4326")
gedi_sub = gpd.sjoin(gedi_gdf, poly_wgs_gdf, predicate='within')
gedi_sub.explore()

Make this Notebook Trusted to load map: File -> Trust Notebook

plotname = gedi_sub.Name.unique()
descrip = gedi_sub.description.unique()

fig, ax = plt.subplots(figsize=(7, 10))

for j, ditem in enumerate(descrip):
    desc_df = gedi_sub[gedi_sub.description == ditem]
    for i, item in enumerate(plotname):
        plot_df = desc_df[desc_df.Name == item]
        plot_df = plot_df.filter(like='RH').T
        plot_df.index = plot_df.index.str.strip('RH').astype(int)
        style = 's:'
        if ditem == "unburned":
            style = 'o-'
        plot_df.median(axis=1).plot(ax=ax, alpha=0.5,
                                    style=style, label=f"{ditem} site: {item}")
        plt.legend()
ax.set_xlabel("Percentile of waveform energy (%)")
ax.set_ylabel("Height (m)")
plt.show()

../../_images/e681a646d1bbb779863381fb9104b308b78d960c20f678d556556a66b2623402.png

Exploring and Visualizing BioSCape LVIS Data

Contents

7. Exploring and Visualizing BioSCape LVIS Data#

7.1. BioSCape Data Skills Workshop: From the Field to the Image#

7.2. Overview#

7.3. Structure LVIS L1B Geolocated Waveform#

7.4. Structure of LVIS L2 Canopy Height Product#

7.5. Search LVIS L2A granules over Bioscape Campaign#

7.5.1. Plot LVIS Bioscape Campaigns#

7.5.2. Burned and Unburned Areas#

7.5.3. Search LVIS L2A granules over the burned and unburned sites#

7.5.4. Build the granule names for BioSCape SMCE S3 access#

7.5.5. Read LVIS L2A Files#

7.5.6. Height of Top Canopy#

7.5.7. Relative Height Distribution of Burned vs Unburned sites#

7.6. GEDI L2A Canopy Height Metrics#

7.6.1. Dataset#

7.6.2. Authentication#

7.6.3. Create Harmony Client Object#

7.6.4. Retrieve Concept ID#

7.6.5. Define Request Parameters#

7.6.6. Create and Submit Harmony Request#

7.6.7. Read Subset files#

7.6.8. RH metrics#