2. BioSCape SMCE Basics#

2.1. BioSCape Data Skills Workshop: From the Field to the Image#


BioSCape, the Biodiversity Survey of the Cape, is NASA’s first biodiversity-focused airborne and field campaign, conducted in South Africa in 2023. BioSCape’s primary objective is to study the structure, function, and composition of the region’s ecosystems, and how and why they are changing.

BioSCape’s airborne dataset is unprecedented, with the AVIRIS-NG, PRISM, and HyTES imaging spectrometers capturing spectral data across the UV, visible, and infrared at high resolution, and LVIS acquiring coincident full-waveform lidar. BioSCape’s field dataset is equally impressive, with 18 PI-led projects collecting data that span the diversity and phylogeny of plants, kelp, and phytoplankton, eDNA, landscape acoustics, plant traits, blue carbon accounting, and more.

This workshop will equip participants with the skills to find, subset, and visualize the various BioSCape field and airborne (imaging spectroscopy and full-waveform lidar) data sets. Participants will learn data skills through worked examples in terrestrial and aquatic ecosystems, including: wrangling lidar data, performing band math calculations, calculating spectral diversity metrics, machine learning and image classification, and mapping functional traits using partial least squares regression. The workshop format is a mix of expert talks and interactive coding notebooks and will be run through the BioSCape Cloud computing environment.

Date: October 9 - 11, 2024, Cape Town, South Africa

Host: NASA’s Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), in close collaboration with BioSCape, the South African Environmental Observation Network (SAEON), the University of Wisconsin Madison (Phil Townsend), The Nature Conservancy (Glenn Moncrieff), the University of California Merced (Erin Hestir), the University of Cape Town (Jasper Slingsby), Jet Propulsion Laboratory (Kerry Cawse-Nicholson), and UNESCO.

Instructors:

  • In-person contributors: Anabelle Cardoso, Erin Hestir, Phil Townsend, Henry Frye, Glenn Moncrieff, Jasper Slingsby, Michele Thornton, Rupesh Shrestha

  • Virtual contributors: Kerry Cawse-Nicholson, Nico Stork, Kyle Kovach

Audience: This training is primarily intended for government natural resource management agency representatives and field technicians in South Africa, as well as local academics and students, especially those connected to the BioSCape Team.

2.2. Overview#

This tutorial explores the BioSCape Science Managed Cloud Environment (SMCE), including how to access and explore Amazon Simple Storage Service (S3) using open source Python tools.

2.2.1. Load Python Modules#

import s3fs
s3 = s3fs.S3FileSystem(anon=False)

2.3. Explore the BioSCape SMCE S3 data holdings#

Let’s start by exploring the BioSCape airborne data currently available in the cloud on Amazon S3. This S3 storage is specific to the SMCE, which provides a data access and analytics environment to the BioSCape Science Team as well as to interested researchers.
We’ll learn how to mount the S3 object storage on our local environment, as well as how to bring other data to the analysis.

  • SMCE = Science Managed Cloud Environment

  • S3 = Amazon Simple Storage Service (S3) is a cloud storage service that allows users to store and retrieve data

  • S3 Bucket = Buckets are the basic containers that hold data in S3. A bucket can be likened to a top-level file folder, although what it holds are objects rather than files

  • S3Fs is a Pythonic open source tool that mounts S3 object storage locally. S3Fs provides a filesystem-like interface for accessing objects on S3.

    • The top-level class S3FileSystem holds connection information and allows typical file-system style operations like ls, cp, mv

    • ls is a UNIX command to list computer files and directories

# Use S3Fs to list the BioSCape data on the BioSCape SMCE S3 storage
files = s3.ls('bioscape-data/')
files
['bioscape-data/AVNG',
 'bioscape-data/BioSCapeVegPolys2023_10_18',
 'bioscape-data/BioSCapeVegPolys2023_10_18.geoparquet',
 'bioscape-data/LVIS',
 'bioscape-data/PRISM',
 'bioscape-data/bioscape_avng.yaml']

2.4. BioSCape Flight Data#

The BioSCape Campaign (Oct - Nov, 2023) flew 4 airborne instruments on 2 aircraft.

  • Gulfstream 3: AVIRIS-Next Generation and PRISM

  • Gulfstream 5: HyTES and LVIS

The NASA Jet Propulsion Laboratory provides the BioSCape Data Portal (https://popo.jpl.nasa.gov/mmgis-aviris/?mission=BIOSCAPE) which shows flight line visualization, quick look images, and BioSCape Team Regions of Interest (ROIs). Download access to preliminary airborne data is available.

2.4.1. Let’s look deeper into each airborne folder on the SMCE#

2.4.1.1. AVIRIS-Next Generation (AVNG)#

  • We’ll spend a little more time looking into the AVIRIS-NG files as these data are a focus of many Notebooks

AVNG_flightlines = s3.ls('bioscape-data/AVNG/')
AVNG_flightlines[:10]
['bioscape-data/AVNG/',
 'bioscape-data/AVNG/ang20231022t092801',
 'bioscape-data/AVNG/ang20231022t094938',
 'bioscape-data/AVNG/ang20231022t101052',
 'bioscape-data/AVNG/ang20231022t103357',
 'bioscape-data/AVNG/ang20231022t105533',
 'bioscape-data/AVNG/ang20231022t111800',
 'bioscape-data/AVNG/ang20231022t113923',
 'bioscape-data/AVNG/ang20231022t120313',
 'bioscape-data/AVNG/ang20231022t122317']
AVNGfl_count = len(AVNG_flightlines)
AVNGfl_count
394
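
The flight line folder names appear to encode the acquisition date and time as `angYYYYMMDDtHHMMSS`. The sketch below parses that pattern and counts flight lines per day. It works on a few names copied from the listing above rather than on a live `s3.ls()` call, and it assumes the time zone is whatever the instrument team used in the file names (not stated in this tutorial). Note that the listing also returns the prefix entry `bioscape-data/AVNG/` itself, so the count above may include one entry that is not a flight line. The helper name `parse_flightline` is illustrative only.

```python
from collections import Counter
from datetime import datetime


def parse_flightline(path):
    """Return the datetime encoded in an 'angYYYYMMDDtHHMMSS' folder name."""
    name = path.rstrip("/").split("/")[-1]
    return datetime.strptime(name, "ang%Y%m%dt%H%M%S")


# A few entries copied from the listing above, including the prefix placeholder.
listing = [
    "bioscape-data/AVNG/",
    "bioscape-data/AVNG/ang20231022t092801",
    "bioscape-data/AVNG/ang20231022t094938",
    "bioscape-data/AVNG/ang20231022t101052",
]

# Keep only entries whose last path component looks like a flight line ID.
flightlines = [p for p in listing if p.rstrip("/").split("/")[-1].startswith("ang")]
times = [parse_flightline(p) for p in flightlines]

# Count flight lines per acquisition date.
per_day = Counter(t.date().isoformat() for t in times)
print(len(flightlines), dict(per_day))
```

With the full result of `s3.ls('bioscape-data/AVNG/')` in place of `listing`, the same code would summarize the whole campaign by day.
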
# looking into the ang20231022t092801 folder
AVNG_scenes = s3.ls('bioscape-data/AVNG/ang20231022t092801')
AVNG_scenes
['bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_001',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_002',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_003',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_004',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_005',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_006',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_007',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_008',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_009',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_010',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_011',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_012',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_013',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_014',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_015',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_016',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_017',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_018',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_019',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_020',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_021',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_022',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_023',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_024',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_025',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_026',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_027',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_028',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_029',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_030',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_031',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_032',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_033',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_034',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_035',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_036',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_037',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_038',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_039',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_040',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_041',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_042',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_043',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_044',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_045',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_046',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_047',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_048']
# Explore the AVNG folder that holds an AVNG scene's data
AVNG_scene_data = s3.ls('bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000')
AVNG_scene_data
['bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT.json',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT_QL.tif',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L2A_OE_main_27577724_UNC_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L2A_OE_main_27577724_UNC_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_RFL_UNC_COMBINED_ORT.json']
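
A scene folder mixes several products, so it is often useful to pick out one kind of file. The sketch below filters a listing with shell-style patterns from the standard library; the paths are a subset of the output above, and the helper name `select` is made up for this example. S3Fs also offers a `glob` method that can apply similar patterns directly against the bucket, which is not shown here because it needs a live connection.

```python
from fnmatch import fnmatchcase


def select(paths, pattern):
    """Return the paths whose file name matches a shell-style pattern."""
    return [p for p in paths if fnmatchcase(p.split("/")[-1], pattern)]


scene = "bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/"
names = [
    "ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC",
    "ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC.hdr",
    "ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT",
    "ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT.hdr",
    "ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT.json",
    "ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT_QL.tif",
    "ang20231022t092801_000_L2A_OE_main_27577724_UNC_ORT",
    "ang20231022t092801_000_L2A_OE_main_27577724_UNC_ORT.hdr",
]
paths = [scene + n for n in names]

reflectance_binary = select(paths, "*_RFL_ORT")   # binary file only, no extension
headers = select(paths, "*.hdr")                  # every ENVI header in the scene
level2 = select(paths, "*_L2A_*")                 # all Level 2A files

print(reflectance_binary)
print(len(headers), len(level2))
```
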

2.4.1.2. File Naming Conventions provide information about each flight line, scene, and data type.#

  • L1B are AVIRIS-NG Surface Radiance

  • L2A are AVIRIS-NG Surface Reflectance

| Dataset | Description |
| --- | --- |
| *_RFL_ORT_QL.tif | Reflectance Quick Look Image (3 band) |
| *_RFL_ORT | Reflectance ENVI binary file (425 band) |
| *_RFL_ORT.hdr | Reflectance ENVI header file (txt file) |
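
Because the file names follow a regular pattern, they can be split into their parts with a regular expression. The sketch below is one possible reading of the names seen in the scene listing; the group names (`flightline`, `scene`, `level`, `step`, `build`, `product`, `ext`) are labels chosen for this example, and the meaning of the `main_<hash>` part is an assumption (it looks like a processing build tag, but the tutorial does not say).

```python
import re

# Pattern inferred from names such as
# ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT.hdr
SCENE_FILE = re.compile(
    r"(?P<flightline>ang\d{8}t\d{6})_"   # flight line ID
    r"(?P<scene>\d{3})_"                 # scene number within the flight line
    r"(?P<level>L\d[A-Z])_"              # processing level, e.g. L1B or L2A
    r"(?P<step>[A-Z]+)_"                 # e.g. ORT or OE
    r"(?P<build>[a-z]+_[0-9a-f]{8})_"    # assumed build tag, e.g. main_27577724
    r"(?P<product>[A-Z_]+)"              # e.g. RFL_ORT, UNC_ORT, LOC
    r"(?P<ext>\.\w+)?"                   # optional extension; binaries have none
)


def parse_scene_file(name):
    """Return the name's parts as a dict, or None if the pattern does not fit."""
    match = SCENE_FILE.fullmatch(name.split("/")[-1])
    return match.groupdict() if match else None


hdr_info = parse_scene_file("ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT.hdr")
loc_info = parse_scene_file("ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC")
other = parse_scene_file("ang20231022t092801_000_RFL_UNC_COMBINED_ORT.json")

print(hdr_info)
print(loc_info)
print(other)  # this name has no level/build fields, so the pattern does not match
```
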


2.4.1.3. Open a file from an S3 Bucket - S3Fs#

  • Calling open() on an S3FileSystem (typically using a context manager) provides an S3File for read or write access to a particular key.

  • open can be used with other projects that consume the file interface like gzip or pandas.
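
As a small illustration of the second point, the sketch below passes a binary file-like object to `gzip.open`, which accepts an existing file object as well as a file name. An in-memory buffer stands in for the object that `s3.open(key, mode='rb')` would return, so the example runs without a connection; the helper name `read_gzip_text` and the sample text are made up for this example.

```python
import gzip
import io


def read_gzip_text(fileobj, encoding="utf-8"):
    """Decompress a gzip stream from any binary file-like object and return text."""
    with gzip.open(fileobj, mode="rt", encoding=encoding) as gz:
        return gz.read()


# Stand-in for `s3.open(key, mode="rb")`: an in-memory binary buffer
# holding a small gzip-compressed text payload.
buffer = io.BytesIO(gzip.compress(b"samples = 719\nlines = 615\n"))

text = read_gzip_text(buffer)
print(text)
```
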

# Print the first 12 lines of an ENVI header file
hdr_link = 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L2A_OE_main_27577724_RFL_ORT.hdr'
with s3.open(hdr_link, mode='r') as hdr:
    for i in range(12):
        line = next(hdr).strip()
        print(line) 
ENVI
description = {
L2A Analytyical per-pixel surface retrieval}
samples = 719
lines = 615
bands = 425
header offset = 0
file type = ENVI Standard
data type = 4
interleave = bil
byte order = 0
map info = {UTM, 1, 1, 290290.1514036929, 6352647.360537699, 6.3, 6.3, 34, South, WGS-84, units=Meters, rotation=0.0}
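
The header fields are enough to work out how the binary file is laid out. The sketch below parses the twelve lines printed above into a dictionary, then derives the expected size of the binary file, the pixel size from `map info`, and the byte offset of a value in band-interleaved-by-line (BIL) order. It is a minimal parser for this excerpt only, not a full ENVI reader; the function names are illustrative, and the data type table lists only a subset of the ENVI codes (code 4 is 32-bit float).

```python
HEADER_TEXT = """ENVI
description = {
L2A Analytyical per-pixel surface retrieval}
samples = 719
lines = 615
bands = 425
header offset = 0
file type = ENVI Standard
data type = 4
interleave = bil
byte order = 0
map info = {UTM, 1, 1, 290290.1514036929, 6352647.360537699, 6.3, 6.3, 34, South, WGS-84, units=Meters, rotation=0.0}
"""

# Bytes per value for a subset of ENVI data type codes.
DTYPE_BYTES = {1: 1, 2: 2, 3: 4, 4: 4, 5: 8, 12: 2}


def parse_envi_header(text):
    """Parse 'key = value' lines; brace-delimited values may span several lines."""
    fields = {}
    lines = iter(text.splitlines())
    for line in lines:
        if "=" not in line:
            continue  # skips the leading 'ENVI' line
        key, value = (part.strip() for part in line.split("=", 1))
        # Keep reading until the closing brace is found.
        while value.startswith("{") and not value.endswith("}"):
            value += " " + next(lines, "}").strip()
        fields[key.lower()] = value.strip("{} ")
    return fields


hdr = parse_envi_header(HEADER_TEXT)
samples, n_lines, bands = int(hdr["samples"]), int(hdr["lines"]), int(hdr["bands"])
itemsize = DTYPE_BYTES[int(hdr["data type"])]

# Expected size of the binary file that pairs with this header.
expected_bytes = int(hdr["header offset"]) + samples * n_lines * bands * itemsize

# In 'map info', items 6 and 7 are the x and y pixel sizes.
map_info = [v.strip() for v in hdr["map info"].split(",")]
pixel_size = tuple(float(v) for v in map_info[5:7])


def bil_offset(line, band, sample):
    """Byte offset of one value in BIL order: lines, then bands, then samples."""
    return ((line * bands + band) * samples + sample) * itemsize


print(expected_bytes, pixel_size, bil_offset(0, 1, 0))
```

For this scene the reflectance binary should be about 752 MB (719 x 615 x 425 values at 4 bytes each), which is worth knowing before reading a whole scene into memory.
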

2.4.1.4. Portable Remote Imaging Spectrometer (PRISM)#

PRISM_flightlines = s3.ls('bioscape-data/PRISM')
PRISM_flightlines
['bioscape-data/PRISM/L2']

2.4.1.5. Level 2 (L2) PRISM data are available in the SMCE#

  • ENVI file format in binary/header pairs

PRISM_flightlines = s3.ls('bioscape-data/PRISM/L2')
PRISM_flightlines[:10]
['bioscape-data/PRISM/L2/prm20231022t141344_rfl_ort',
 'bioscape-data/PRISM/L2/prm20231022t141344_rfl_ort.hdr',
 'bioscape-data/PRISM/L2/prm20231025t060817_rfl_ort',
 'bioscape-data/PRISM/L2/prm20231025t060817_rfl_ort.hdr',
 'bioscape-data/PRISM/L2/prm20231025t062740_rfl_ort',
 'bioscape-data/PRISM/L2/prm20231025t062740_rfl_ort.hdr',
 'bioscape-data/PRISM/L2/prm20231025t063541_rfl_ort',
 'bioscape-data/PRISM/L2/prm20231025t063541_rfl_ort.hdr',
 'bioscape-data/PRISM/L2/prm20231025t064655_rfl_ort',
 'bioscape-data/PRISM/L2/prm20231025t064655_rfl_ort.hdr']
  • The PRISM data are available in ENVI format as binary/header pairs
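
Since each ENVI binary needs its `.hdr` file to be readable, a quick check that every binary in a listing has a matching header can save trouble later. The sketch below pairs them up using a few paths copied from the listing above; the helper name `pair_envi_files` is made up for this example.

```python
def pair_envi_files(paths):
    """Map each ENVI binary path to its '.hdr' path when both are in the listing."""
    listed = set(paths)
    return {
        p: p + ".hdr"
        for p in paths
        if not p.endswith(".hdr") and p + ".hdr" in listed
    }


listing = [
    "bioscape-data/PRISM/L2/prm20231022t141344_rfl_ort",
    "bioscape-data/PRISM/L2/prm20231022t141344_rfl_ort.hdr",
    "bioscape-data/PRISM/L2/prm20231025t060817_rfl_ort",
    "bioscape-data/PRISM/L2/prm20231025t060817_rfl_ort.hdr",
]

pairs = pair_envi_files(listing)

# Binaries that appear in the listing without a header next to them.
orphans = [p for p in listing if not p.endswith(".hdr") and p not in pairs]

for binary, header in pairs.items():
    print(binary, "->", header)
print("binaries without a header:", orphans)
```
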

2.4.1.6. Land, Vegetation, and Ice Sensor (LVIS)#

LVIS_flightlines = s3.ls('bioscape-data/LVIS/')
LVIS_flightlines
['bioscape-data/LVIS/L1B', 'bioscape-data/LVIS/L2']

2.4.1.7. Let’s look into the LVIS Level 2 (L2) data#

LVIS_flightlines_L2 = s3.ls('bioscape-data/LVIS/L2')
LVIS_flightlines_L2[:10]
['bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_027373.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_027526.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_027815.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_027902.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_027990.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_028077.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_028551.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_028761.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_028852.TXT',
 'bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_028939.TXT']
  • LVIS L2 data are ASCII Text files
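
The LVIS file names also carry metadata. The sketch below splits a name from the listing above into its parts. The interpretation is an assumption based on the usual LVIS naming pattern and is not stated in this tutorial: the four digits after the campaign name are taken as month and day, `R2404` as a release tag, and the final six digits as seconds since midnight UTC on the collection day. Check the LVIS product documentation before relying on the derived start time.

```python
import re
from datetime import datetime, timedelta

LVIS_NAME = re.compile(
    r"(?P<product>[A-Z0-9]+)_"                   # e.g. LVISF2
    r"(?P<campaign>[A-Za-z]+)(?P<year>\d{4})_"   # e.g. BioSCape + 2023
    r"(?P<mmdd>\d{4})_"                          # assumed month and day
    r"(?P<release>R\d{4})_"                      # assumed release tag
    r"(?P<seconds>\d{6})\.TXT"                   # assumed seconds of day (UTC)
)


def parse_lvis_name(path):
    """Split an LVIS L2 file name into parts and derive an assumed start time."""
    match = LVIS_NAME.fullmatch(path.split("/")[-1])
    if match is None:
        return None
    info = match.groupdict()
    day = datetime.strptime(info["year"] + info["mmdd"], "%Y%m%d")
    info["start"] = day + timedelta(seconds=int(info["seconds"]))
    return info


info = parse_lvis_name(
    "bioscape-data/LVIS/L2/LVISF2_BioSCape2023_1020_R2404_027373.TXT"
)
print(info["product"], info["campaign"], info["release"], info["start"])
```
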

2.4.1.8. Hyperspectral Thermal Emission Spectrometer (HyTES)#

  • BioSCape HyTES data are currently available from the NASA JPL HyTES distribution site