Access CMS LiDAR data#

Learning Objectives

  • Learn how to search for files within a dataset programmatically in Python.

  • Learn how to restrict search results by space (study area) and time (project period).

  • Download data files of interest.

  • Work with the data files.

In this tutorial, we will use the earthaccess Python module to search for CMS datasets.

import earthaccess  # search and download NASA Earthdata
import numpy as np
import pandas as pd
import geopandas as gpd
import contextily as cx  # basemaps for plots
import laspy  # read LAS/LAZ point cloud files
from shapely.geometry import box
from shapely.ops import orient

LiDAR Surveys over Selected Forest Research Sites, Brazilian Amazon, 2008-2018#

This dataset [dos-Santos et al., 2019] provides the complete catalog of point cloud data collected during LiDAR surveys over selected forest research sites across the Amazon rainforest in Brazil between 2008 and 2018 for the Sustainable Landscapes Brazil Project. Flight lines were selected to overfly key field research sites in the Brazilian states of Acre, Amazonas, Bahia, Goias, Mato Grosso, Para, Rondonia, Santa Catarina, and Sao Paulo. The point clouds have been georeferenced, noise-filtered, and corrected for misalignment of overlapping flight lines. They are provided in 1 km x 1 km tiles. There are a total of 3,152 tiles in compressed LAS (*.laz) file format.

[1] M.N. dos-Santos, M.M. Keller, and D.C. Morton. LiDAR Surveys over Selected Forest Research Sites, Brazilian Amazon, 2008-2018. 2019. doi:10.3334/ORNLDAAC/1644.

We will use the earthaccess module to search for and download the LiDAR data.

Authentication#

We recommend authenticating with your Earthdata Login (EDL) credentials using the earthaccess Python library as follows:

# works if the EDL login has already been persisted to a .netrc file
auth = earthaccess.login(strategy="netrc") 
if not auth.authenticated:
    # ask for EDL credentials and persist them in a .netrc file
    auth = earthaccess.login(strategy="interactive", persist=True)

Search data granules#

We will use earthaccess’s search_data API to find the granules within the data collection.

granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    granule_name = '*.laz', # only find laz files
    doi="10.3334/ORNLDAAC/1644" # data collection doi
)
print(f"Total granules found: {len(granules)}")
Total granules found: 3152

Let’s print the details of the first file.

granules[0]

Data: TAP_A04_2008_laz_0.laz

Size: 90.2 MB

Cloud Hosted: True

As we see above, the file is hosted in the NASA Earthdata Cloud.

The granules object contains metadata about the granules, including the bounding boxes, publication dates, data providers, etc. Now, let’s convert the granule metadata from JSON to a geopandas GeoDataFrame, which will let us easily generate various plots of the granules.

def convert_umm_bbox(b):
    """Converts a UMM bounding rectangle to a shapely box."""
    try:
        return box(b[0]['WestBoundingCoordinate'],
                   b[0]['SouthBoundingCoordinate'],
                   b[0]['EastBoundingCoordinate'],
                   b[0]['NorthBoundingCoordinate'])
    except TypeError:
        # granule has no bounding rectangle metadata
        return None

def convert_list_gdf(datag):
    """Converts a list of granules to a geopandas GeoDataFrame."""
    # create a pandas dataframe from the granule JSON metadata
    df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])
    # keep only the last part of each dotted column name
    df.columns = df.columns.str.split('.').str[-1]
    df['geometry'] = df['BoundingRectangles'].apply(convert_umm_bbox)
    return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")

gdf = convert_list_gdf(granules)
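
As a quick check, we can look at the shape and first few rows of the resulting GeoDataFrame:

print(f"GeoDataFrame shape: {gdf.shape}")
gdf.head()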

Plot granule bounding boxes#

Let’s now plot the bounding box of each granule. For display purposes, we limit the output to three columns: GranuleUR, size, and geometry.

gdf[['GranuleUR', 'size', 'geometry']].explore(color='red',  fillcolor='red')

Region of Interest (ROI)#

We will use the boundary of the Reserva Florestal Adolpho Ducke as the region of interest. Reserva Adolpho Ducke is a 10,000-hectare (25,000-acre) protected area of the Amazon rainforest on the outskirts of the city of Manaus, Brazil. It is part of a long-term ecological research network and is one of the most intensively studied rainforest sites in the world.

Let’s plot the study area over a base map.

poly = gpd.read_file("assets/reserva_ducke.json")
poly.explore(color='red',  fill=False)

Search granules over ROI#

We will use the earthaccess Python module to search for granules overlapping the ROI defined above.

# orient the polygon exterior counter-clockwise, as the CMR requires
poly.geometry = poly.geometry.apply(orient, args=(1,))
# simplify the polygon with a tolerance of 0.005 degrees to stay
# within the CMR's limit on the number of polygon coordinates
xy = poly.geometry.simplify(0.005).get_coordinates()

granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    granule_name = '*.laz', # only find laz files
    doi="10.3334/ORNLDAAC/1644", # dataset doi
    polygon=list(zip(xy.x, xy.y))
)
print(f"Total granules found: {len(granules)}")
Total granules found: 34
gdf = convert_list_gdf(granules)
gdf[['GranuleUR', 'size', 'geometry']].explore(color='red',  fillcolor='red')
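
The learning objectives also call for restricting results by time. earthaccess supports this through the temporal parameter; as an illustrative sketch (the date range below is an assumption, not part of the original search), we could limit the ROI search to granules from the 2017 survey:

granules_2017 = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    granule_name = '*.laz', # only find laz files
    doi="10.3334/ORNLDAAC/1644", # dataset doi
    polygon=list(zip(xy.x, xy.y)),
    temporal=("2017-01-01", "2017-12-31") # illustrative (start, end) dates
)
print(f"Granules from 2017: {len(granules_2017)}")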

Download data files#

Let’s download the granules found above to a local downloads directory.

downloaded_files = earthaccess.download(granules, local_path="downloads")
downloaded_files
['downloads/DUC_A01_2008_laz_13.laz',
 'downloads/DUC_A01_2008_laz_1.laz',
 'downloads/DUC_A01_2008_laz_6.laz',
 'downloads/DUC_A01_2008_laz_12.laz',
 'downloads/DUC_A01_2008_laz_8.laz',
 'downloads/DUC_A01_2008_laz_5.laz',
 'downloads/DUC_A01_2008_laz_2.laz',
 'downloads/DUC_A01_2008_laz_9.laz',
 'downloads/DUC_A01_2008_laz_7.laz',
 'downloads/DUC_A01_2008_laz_0.laz',
 'downloads/DUC_A01_2008_laz_4.laz',
 'downloads/DUC_A01_2008_laz_10.laz',
 'downloads/DUC_A01_2008_laz_11.laz',
 'downloads/DUC_A01_2008_laz_3.laz',
 'downloads/DUC_A01_2017_LAS_16.laz',
 'downloads/DUC_A01_2017_LAS_11.laz',
 'downloads/DUC_A01_2017_LAS_2.laz',
 'downloads/DUC_A01_2017_LAS_14.laz',
 'downloads/DUC_A01_2017_LAS_5.laz',
 'downloads/DUC_A01_2017_LAS_4.laz',
 'downloads/DUC_A01_2017_LAS_1.laz',
 'downloads/DUC_A01_2017_LAS_3.laz',
 'downloads/DUC_A01_2017_LAS_6.laz',
 'downloads/DUC_A01_2017_LAS_18.laz',
 'downloads/DUC_A01_2017_LAS_19.laz',
 'downloads/DUC_A01_2017_LAS_10.laz',
 'downloads/DUC_A01_2017_LAS_0.laz',
 'downloads/DUC_A01_2017_LAS_7.laz',
 'downloads/DUC_A01_2017_LAS_17.laz',
 'downloads/DUC_A01_2017_LAS_9.laz',
 'downloads/DUC_A01_2017_LAS_15.laz',
 'downloads/DUC_A01_2017_LAS_13.laz',
 'downloads/DUC_A01_2017_LAS_8.laz',
 'downloads/DUC_A01_2017_LAS_12.laz']
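
Let’s read one of the downloaded LAZ tiles with laspy and inspect it.
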
point_cloud = laspy.read(downloaded_files[0])
print(point_cloud)
<LasData(1.0, point fmt: <PointFormat(1, 0 bytes of extra dims)>, 71736108 points, 0 vlrs)>
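
This tile holds over 71 million points in LAS point format 1. Let’s list the dimensions recorded for each point:
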
point_format = point_cloud.point_format
list(point_format.dimension_names)
['X',
 'Y',
 'Z',
 'intensity',
 'return_number',
 'number_of_returns',
 'scan_direction_flag',
 'edge_of_flight_line',
 'classification',
 'synthetic',
 'key_point',
 'withheld',
 'scan_angle_rank',
 'user_data',
 'point_source_id',
 'gps_time']
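
With the dimension names known, we can work with the point records directly. Below is a minimal sketch that computes the elevation range and counts ground returns; it assumes the standard ASPRS convention where classification code 2 marks ground points (check the dataset user guide to confirm):

# scaled, georeferenced elevations (laspy applies the scale and offset)
z = np.asarray(point_cloud.z)
print(f"Elevation range: {z.min():.2f} to {z.max():.2f} m")

# assuming ASPRS classification code 2 labels ground returns
ground_mask = np.asarray(point_cloud.classification) == 2
print(f"Ground returns: {ground_mask.sum():,} of {len(point_cloud.points):,}")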