4. Exploring EMIT data and inspecting spectra for alien species#

4.1. Overview#

In this notebook, we will start analysing satellite imaging spectroscopy data. We will open data from the Earth Surface Mineral Dust Source Investigation (EMIT) stored remotely on AWS and explore it. We will then extract spectra at sites with invasive plant species and visualize them.

4.2. Learning Objectives#

  1. Understand how to access data directly from AWS S3 storage

  2. Load, mask, orthorectify, and visualize EMIT data

  3. Extract spectra at locations with and without alien plants and inspect these

4.2.1. Load libraries#

#imports
import xarray as xr
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import pandas as pd
import numpy as np
import math
import geopandas as gpd
import panel as pn
import xvec
#our modules
import emit_tools #from https://github.com/nasa/EMIT-Data-Resources/blob/main/python/modules/emit_tools.py

# Function to adjust gamma across all bands - adjust brightness
def gamma_adjust(rgb_ds, bright=0.2, white_background=False):
    array = rgb_ds.reflectance.data
    gamma = math.log(bright)/math.log(np.nanmean(array)) # Exponent for gamma scaling - target brightness is set by the bright argument
    scaled = np.power(array,gamma).clip(0,1) # Apply scaling and clip to 0-1 range
    if white_background:
        scaled = np.nan_to_num(scaled, nan=1) # Assign NaNs to 1 so they appear white in plots
    rgb_ds.reflectance.data = scaled
    return rgb_ds
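To see what this scaling does, here is a tiny worked example with synthetic data: the exponent is chosen so a uniformly dark scene with mean reflectance 0.04 is lifted until its mean equals the bright target of 0.2.

```python
import math
import numpy as np

# Toy data: a uniform dark scene with mean reflectance 0.04
arr = np.full((2, 2, 3), 0.04)
bright = 0.2

# Same scaling as gamma_adjust above: pick gamma so the scene mean maps to bright
gamma = math.log(bright) / math.log(np.nanmean(arr))  # log(0.2)/log(0.04) = 0.5
scaled = np.power(arr, gamma).clip(0, 1)
print(scaled.mean())  # 0.2 - the dark mean is lifted to the target brightness
```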

4.2.2. Authenticate with NASA Earthdata portal#

earthaccess is a Python library for searching, downloading, and streaming NASA Earth science data. You will also need an account on NASA’s Earthdata portal: https://search.earthdata.nasa.gov/

import earthaccess
auth = earthaccess.login(persist=True)

4.2.3. Accessing data on AWS S3#

The EMIT files are stored on Amazon S3 object storage. We will access them directly without downloading, as S3 files can be streamed. Also, because we are using xarray with dask, and the data is stored in a chunked file format (NetCDF), we only need to stream the chunks we are currently working on and not the entire file at once.

The best way to access the data is from a virtual machine located in us-west-2, the AWS region where the data is hosted. This means you will get the fastest reading and writing of data possible, and all read/writes will be free. If your machine is not in us-west-2, or you are on a local machine, I do not recommend streaming from S3, as it might be slower and there will be costs involved.
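The chunked, lazy access pattern described above can be sketched with a toy dataset: slicing a dask-backed xarray only builds a task graph, and just the chunks that overlap the selection are read when you compute.

```python
import numpy as np
import xarray as xr

# Toy example (synthetic data): chunk a dataset, then slice it
ds = xr.Dataset(
    {"reflectance": (("y", "x"), np.random.rand(100, 100))}
).chunk({"y": 50, "x": 50})

subset = ds.reflectance[:10, :10]  # lazy - no chunk has been computed yet
values = subset.compute()          # only the one overlapping chunk is materialized
print(values.shape)
```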

4.2.4. Finding data#

earthaccess has really convenient methods for searching and loading NASA data over S3 using earthaccess.search_data(). Here is an example:

results = earthaccess.search_data(
    short_name='EMITL2ARFL',
    point=(19.03711,-33.94747),
    temporal=('2023-01-18','2023-01-24'),
    count=2
)

You can then load the results into an xarray dataset:

files = earthaccess.open(results)
ds = xr.open_dataset(files[0].f)

If you do not want to stream the data from S3, you can download it directly using:

files = earthaccess.download(results, "./local_folder")

For more info on using earthaccess, see the docs.

4.2.5. Open the dataset#

If you are in us-west-2: authenticate with S3 and stream the data

results = earthaccess.search_data(
    short_name='EMITL2ARFL',
    point=(19.03711,-33.94747),
    temporal=('2023-01-18','2023-01-24'),
    count=1
)
file_handlers = earthaccess.open(results)
file_handlers
Granules found: 2
Opening 1 granules, approx size: 3.5 GB
using endpoint: https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials
[<File-like object S3FileSystem, lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230119T114247_2301907_005/EMIT_L2A_RFL_001_20230119T114247_2301907_005.nc>,
 <File-like object S3FileSystem, lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230119T114247_2301907_005/EMIT_L2A_RFLUNCERT_001_20230119T114247_2301907_005.nc>,
 <File-like object S3FileSystem, lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230119T114247_2301907_005/EMIT_L2A_MASK_001_20230119T114247_2301907_005.nc>]

Inspect the raw data

xr.open_dataset(file_handlers[0].f,engine='h5netcdf',chunks='auto')
<xarray.Dataset> Size: 2GB
Dimensions:      (downtrack: 1280, crosstrack: 1242, bands: 285)
Dimensions without coordinates: downtrack, crosstrack, bands
Data variables:
    reflectance  (downtrack, crosstrack, bands) float32 2GB dask.array<chunksize=(538, 522, 119), meta=np.ndarray>
Attributes: (12/38)
    ncei_template_version:             NCEI_NetCDF_Swath_Template_v2.0
    summary:                           The Earth Surface Mineral Dust Source ...
    keywords:                          Imaging Spectroscopy, minerals, EMIT, ...
    Conventions:                       CF-1.63
    sensor:                            EMIT (Earth Surface Mineral Dust Sourc...
    instrument:                        EMIT
    ...                                ...
    southernmost_latitude:             -34.15500797254596
    spatialResolution:                 0.000542232520256367
    spatial_ref:                       GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
    geotransform:                      [ 1.84708746e+01  5.42232520e-04 -0.00...
    day_night_flag:                    Day
    title:                             EMIT L2A Estimated Surface Reflectance...

If you are not in us-west-2: Download locally

#get the files we want
results = earthaccess.search_data(
    short_name='EMITL2ARFL',
    point=(19.03711,-33.94747),
    temporal=('2023-01-18','2023-01-24'),
    count=1
)
#download them
files = earthaccess.download(results, "data/downloads")
Granules found: 2
 Getting 1 granules, approx download size: 3.5 GB
Accessing cloud dataset using dataset endpoint credentials: https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials
Downloaded: data/downloads/EMIT_L2A_RFL_001_20230119T114247_2301907_005.nc
Downloaded: data/downloads/EMIT_L2A_RFLUNCERT_001_20230119T114247_2301907_005.nc
Downloaded: data/downloads/EMIT_L2A_MASK_001_20230119T114247_2301907_005.nc
# Open dataset with xarray
fp = 'data/downloads/EMIT_L2A_RFL_001_20230119T114247_2301907_005.nc'
fp_mask = 'data/downloads/EMIT_L2A_MASK_001_20230119T114247_2301907_005.nc'

ds = xr.open_dataset(fp,engine='h5netcdf',chunks='auto')
ds
<xarray.Dataset> Size: 2GB
Dimensions:      (downtrack: 1280, crosstrack: 1242, bands: 285)
Dimensions without coordinates: downtrack, crosstrack, bands
Data variables:
    reflectance  (downtrack, crosstrack, bands) float32 2GB dask.array<chunksize=(538, 522, 119), meta=np.ndarray>
Attributes: (12/38)
    ncei_template_version:             NCEI_NetCDF_Swath_Template_v2.0
    summary:                           The Earth Surface Mineral Dust Source ...
    keywords:                          Imaging Spectroscopy, minerals, EMIT, ...
    Conventions:                       CF-1.63
    sensor:                            EMIT (Earth Surface Mineral Dust Sourc...
    instrument:                        EMIT
    ...                                ...
    southernmost_latitude:             -34.15500797254596
    spatialResolution:                 0.000542232520256367
    spatial_ref:                       GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
    geotransform:                      [ 1.84708746e+01  5.42232520e-04 -0.00...
    day_night_flag:                    Day
    title:                             EMIT L2A Estimated Surface Reflectance...

4.2.6. Load the mask#

The EMIT data is composed of 3 files: the actual reflectance data, a mask that tells us about various quality issues with the data, and the reflectance uncertainties (we will ignore the uncertainties for now). We can select which quality filter (mask) to use. Here we use the aggregate filter - the combination of all individual filters - to be very strict about what data we include. The quality_mask function provided by the EMIT team reads and processes this file for us.

flags=[7]
mask = emit_tools.quality_mask(file_handlers[2].f,flags)
Flags used: ['Aggregate Flag']
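Under the hood, applying a quality mask boils down to setting flagged pixels to NaN. A toy sketch with synthetic data, using xarray's where():

```python
import numpy as np
import xarray as xr

# Synthetic reflectance and quality mask (1 = bad pixel)
refl = xr.DataArray(np.ones((2, 2)), dims=("y", "x"))
qmask = xr.DataArray([[0, 1], [0, 0]], dims=("y", "x"))

# Keep only pixels where the mask is clear; flagged pixels become NaN
masked = refl.where(qmask == 0)
print(int(masked.isnull().sum()))  # 1 pixel masked out
```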

4.2.7. Load the data#

As you saw, the raw data is the unprojected, unmasked focal plane array, and it does not include much metadata for us to tell what is going on. The emit_xarray function provided by the EMIT team reads and processes the file for us. It can also orthorectify the data and apply the mask if we want.

Warning: in order to orthorectify the data, the entire array must be loaded into memory. This means you need a decent amount of RAM or the operation will fail. I found that 16GB is the minimum needed.
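Before attempting the orthorectification, you can sanity-check your machine's memory. A minimal sketch using only the standard library (POSIX-only; the sysconf names below are not available on Windows):

```python
import os

# Estimate total physical RAM from page size x number of pages (POSIX only)
total_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3

if total_gb < 16:
    print(f"Only {total_gb:.1f} GB RAM - orthorectification may fail")
else:
    print(f"{total_gb:.1f} GB RAM - should be enough")
```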

ds = emit_tools.emit_xarray(file_handlers[0].f, ortho=True, qmask=mask)
ds
<xarray.Dataset> Size: 5GB
Dimensions:           (latitude: 1901, longitude: 2346, wavelengths: 285)
Coordinates:
  * wavelengths       (wavelengths) float32 1kB 381.0 388.4 ... 2.493e+03
    fwhm              (wavelengths) float32 1kB ...
    good_wavelengths  (wavelengths) float32 1kB ...
  * latitude          (latitude) float64 15kB -33.12 -33.13 ... -34.15 -34.15
  * longitude         (longitude) float64 19kB 18.47 18.47 18.47 ... 19.74 19.74
    elev              (latitude, longitude) float32 18MB -9.999e+03 ... -9.99...
    spatial_ref       int64 8B 0
Data variables:
    reflectance       (latitude, longitude, wavelengths) float32 5GB -9.999e+...
Attributes: (12/40)
    ncei_template_version:             NCEI_NetCDF_Swath_Template_v2.0
    summary:                           The Earth Surface Mineral Dust Source ...
    keywords:                          Imaging Spectroscopy, minerals, EMIT, ...
    Conventions:                       CF-1.63
    sensor:                            EMIT (Earth Surface Mineral Dust Sourc...
    instrument:                        EMIT
    ...                                ...
    spatial_ref:                       GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
    geotransform:                      [ 1.84708746e+01  5.42232520e-04 -0.00...
    day_night_flag:                    Day
    title:                             EMIT L2A Estimated Surface Reflectance...
    granule_id:                        EMIT_L2A_RFL_001_20230119T114247_23019...
    Orthorectified:                    True

We will chunk the data so that we can apply any analysis to chunks rather than the entire array.

ds = ds.chunk({'latitude':100,'longitude':100,'wavelengths':10})

The first thing we do is drop the bad bands (those in the atmospheric water-absorption zones) and replace -9999, which is used to code no data, with NaN.

ds = ds.where(ds.good_wavelengths.compute()==1,drop=True)
ds = ds.where(ds.reflectance != -9999)

4.2.8. Quick plot#

#select rgb
rgb = ds.sel(wavelengths=[650, 560, 470], method='nearest')
#make colors nicer
rgb = gamma_adjust(rgb)

Plot natural colors:

rgb.hvplot.rgb(x='longitude', y='latitude', bands='wavelengths', frame_width=400,rasterize=True, data_aspect=1)

Plot a single wavelength:

ds.sel(wavelengths=850, method='nearest').hvplot(cmap='Magma', geo=True, 
                                                 tiles='ESRI', rasterize=True, alpha=0.8, 
                                                 frame_height=600,clim =(0,0.7)
                                                ).opts(title=f"Reflectance at 850nm (Orthorectified)")

We can make nice interactive plots in Python. Here we use a slider to select the wavelength we want to view.

def plot_reflectance(wavelength):
    return ds.sel(wavelengths=wavelength, method='nearest').hvplot(
        cmap='Magma', geo=True, tiles='ESRI', rasterize=True, alpha=0.8,
        frame_height=600, clim=(0, 0.7)
    ).opts(title=f"Reflectance at {wavelength}nm (Orthorectified)")

wavelength_slider = pn.widgets.IntSlider(name='Wavelength (nm)', start=int(min(ds.wavelengths.values)), end=int(max(ds.wavelengths.values)), step=5, value=850)

interactive_plot = pn.bind(plot_reflectance, wavelength=wavelength_slider)

pn.Column(wavelength_slider, interactive_plot)

4.2.9. Extract points#

We will load a small dataset of points for Fynbos and Pine covered sites across the EMIT image we are working with.

#load points
gdf = gpd.read_file("/shared/users/gmoncrieff/emit_unmix.gpkg")
#explore the data
gdf[['Category','geometry']].explore('Category')

The xvec package extracts data at points or areas within an xarray dataset and returns the result as an xarray dataset.

extracted = ds.xvec.extract_points(gdf['geometry'], x_coords="longitude", y_coords="latitude",index=True)
extracted
<xarray.Dataset> Size: 113kB
Dimensions:           (geometry: 111, wavelengths: 244)
Coordinates:
  * wavelengths       (wavelengths) float32 976B 381.0 388.4 ... 2.493e+03
    fwhm              (wavelengths) float32 976B 8.415 8.415 ... 8.807 8.809
    good_wavelengths  (wavelengths) float32 976B 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0
    elev              (geometry) float32 444B dask.array<chunksize=(111,), meta=np.ndarray>
    spatial_ref       int64 8B 0
  * geometry          (geometry) object 888B POINT (19.2194 -33.9549) ... POI...
    index             (geometry) int64 888B 0 1 2 3 4 5 ... 106 107 108 109 110
Data variables:
    reflectance       (geometry, wavelengths) float32 108kB dask.array<chunksize=(111, 10), meta=np.ndarray>
Indexes:
    geometry  GeometryIndex (crs=EPSG:4326)
Attributes: (12/40)
    ncei_template_version:             NCEI_NetCDF_Swath_Template_v2.0
    summary:                           The Earth Surface Mineral Dust Source ...
    keywords:                          Imaging Spectroscopy, minerals, EMIT, ...
    Conventions:                       CF-1.63
    sensor:                            EMIT (Earth Surface Mineral Dust Sourc...
    instrument:                        EMIT
    ...                                ...
    spatial_ref:                       GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
    geotransform:                      [ 1.84708746e+01  5.42232520e-04 -0.00...
    day_night_flag:                    Day
    title:                             EMIT L2A Estimated Surface Reflectance...
    granule_id:                        EMIT_L2A_RFL_001_20230119T114247_23019...
    Orthorectified:                    True

Let’s turn our xarray into a dataframe and join it with the original dataset.

extracted = (extracted
             .to_dataframe() #turn into a dataframe
             .reset_index() #convert the row names to a column
             .merge( #merge with the original dataset
                 gdf.reset_index(),on='index')
            )
#select the columns we want
extracted = extracted[['wavelengths','reflectance','geometry_x','Category','index']]
extracted
wavelengths reflectance geometry_x Category index
0 381.005585 0.017709 POINT (19.2194 -33.9549) PINE 0
1 388.409210 0.018134 POINT (19.2194 -33.9549) PINE 0
2 395.815826 0.018555 POINT (19.2194 -33.9549) PINE 0
3 403.225403 0.018993 POINT (19.2194 -33.9549) PINE 0
4 410.638000 0.019613 POINT (19.2194 -33.9549) PINE 0
... ... ... ... ... ...
27079 2463.381592 0.044974 POINT (19.38343349154113 -33.445019637266505) FYNBOS 110
27080 2470.767822 0.045954 POINT (19.38343349154113 -33.445019637266505) FYNBOS 110
27081 2478.153076 0.044455 POINT (19.38343349154113 -33.445019637266505) FYNBOS 110
27082 2485.538574 0.042368 POINT (19.38343349154113 -33.445019637266505) FYNBOS 110
27083 2492.923828 0.041585 POINT (19.38343349154113 -33.445019637266505) FYNBOS 110

27084 rows × 5 columns

Finally, let’s select a class and plot the spectra for all points of that class.

extracted_plot = extracted.query('Category == "FYNBOS"')
extracted_plot.hvplot.line(y='reflectance',x='wavelengths',by='index',
                                    color='green',ylim=(0,0.5),alpha=0.5,legend=False)

4.2.10. Credits#

This lesson has borrowed from:

the EMIT-Data-Resources repository by LPDAAC/ the EMIT team