Access CMS LiDAR data#
Learning Objectives
Learn how to search for files within a dataset programmatically in Python.
Learn how to restrict search results by space (study area) and time (project period).
Download data files of interest.
Work with the data files.
In this tutorial, we will use the earthaccess Python module to search for CMS datasets.
import earthaccess
import numpy as np
import pandas as pd
import geopandas as gpd
import contextily as cx
import laspy
from shapely.geometry import box
from shapely.ops import orient
LiDAR Surveys over Selected Forest Research Sites, Brazilian Amazon, 2008-2018#
This dataset [dos-Santos et al., 2019] provides the complete catalog of point cloud data collected during LiDAR surveys over selected forest research sites across the Amazon rainforest in Brazil between 2008 and 2018 for the Sustainable Landscapes Brazil Project. Flight lines were selected to overfly key field research sites in the Brazilian states of Acre, Amazonas, Bahia, Goias, Mato Grosso, Para, Rondonia, Santa Catarina, and Sao Paulo. The point clouds have been georeferenced, noise-filtered, and corrected for misalignment of overlapping flight lines. They are provided in 1 km x 1 km tiles. There are a total of 3,152 tiles in compressed LAS (*.laz) file format.
M.N. dos-Santos, M.M. Keller, and D.C. Morton. LiDAR Surveys over Selected Forest Research Sites, Brazilian Amazon, 2008-2018. 2019. doi:10.3334/ORNLDAAC/1644.
We will use the earthaccess module to search for and download the LiDAR data.
Authentication#
We recommend authenticating your Earthdata Login (EDL) credentials with the earthaccess Python library as follows:
# works if the EDL login has already been persisted to a .netrc file
auth = earthaccess.login(strategy="netrc")
if not auth.authenticated:
# ask for EDL credentials and persist them in a .netrc file
auth = earthaccess.login(strategy="interactive", persist=True)
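If you prefer not to store credentials in a .netrc file, earthaccess can also read them from environment variables. A minimal sketch, assuming EARTHDATA_USERNAME and EARTHDATA_PASSWORD are set in your shell:
import os
# alternative: read EDL credentials from environment variables
# (assumes EARTHDATA_USERNAME and EARTHDATA_PASSWORD are exported)
if "EARTHDATA_USERNAME" in os.environ:
    auth = earthaccess.login(strategy="environment")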
Search data granules#
We will use earthaccess's search_data API to find the granules within the data collection.
granules = earthaccess.search_data(
count=-1, # needed to retrieve all granules
granule_name = '*.laz', # only find laz files
doi="10.3334/ORNLDAAC/1644" # data collection doi
)
print(f"Total granules found: {len(granules)}")
Total granules found: 3152
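search_data also accepts a temporal argument, which is how we would restrict results to a project period. A hedged sketch, with an illustrative date range:
# restrict the same search to granules acquired in 2017 (illustrative range)
granules_2017 = earthaccess.search_data(
    count=-1,                              # retrieve all matching granules
    granule_name='*.laz',                  # only find laz files
    doi="10.3334/ORNLDAAC/1644",           # data collection doi
    temporal=("2017-01-01", "2017-12-31")  # (start, end) as ISO date strings
)
print(f"Granules in 2017: {len(granules_2017)}")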
Let's print the details of the first file.
granules[0]
As we see above, the file is hosted in the NASA Earthdata Cloud.
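Each granule object also exposes its download locations. For example, data_links() lists the HTTPS URL(s) of the file (or, with access="direct", the S3 URIs usable from within AWS us-west-2):
# list the HTTPS download URL(s) for the first granule
granules[0].data_links()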
The granules object contains metadata about each granule, including its bounding box, publication date, data provider, etc. Now, let's convert the granule metadata from its JSON form into a geopandas GeoDataFrame, which will let us easily generate various plots of the granules.
def convert_umm_bbox(b):
"""converts UMM bounding rectangles"""
try:
return box(b[0]['WestBoundingCoordinate'],
b[0]['SouthBoundingCoordinate'],
b[0]['EastBoundingCoordinate'],
b[0]['NorthBoundingCoordinate'])
    except TypeError:
        return None
def convert_list_gdf(datag):
"""converts List[] to geopandas dataframe"""
# create pandas dataframe from json
df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])
# keep only last string of the column names
df.columns=df.columns.str.split('.').str[-1]
df['geometry'] = df['BoundingRectangles'].apply(convert_umm_bbox)
return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")
gdf = convert_list_gdf(granules)
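Before plotting, a quick sanity check on the GeoDataFrame is useful. A brief sketch, assuming the size column reports each granule's size in megabytes:
# peek at the converted table and estimate the total data volume
print(gdf.shape)
print(f"total size: {gdf['size'].astype(float).sum() / 1024:.1f} GB (assuming MB units)")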
Plot granule bounding boxes#
Let's now plot the bounding box of each granule. For display purposes, we limit the output to three columns: GranuleUR, size, and geometry.
gdf[['GranuleUR', 'size', 'geometry']].explore(color='red', fillcolor='red')
Region of Interest (ROI)#
We will use a boundary of the Reserva Florestal Adolpho Ducke as the region of interest. Reserva Adolpho Ducke is a 10,000-hectare (25,000-acre) protected area of the Amazon rainforest on the outskirts of the city of Manaus, Brazil. It is part of a long-term ecological research network and is one of the most intensively studied rainforest sites in the world.
Let’s plot the study area over a base map.
poly = gpd.read_file("assets/reserva_ducke.json")
poly.explore(color='red', fill=False)
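As a quick check that the boundary matches the reserve's reported ~10,000 ha extent, we can reproject to a locally appropriate UTM CRS and compute the area. A sketch relying on geopandas' estimate_utm_crs():
# reproject to an estimated UTM CRS and report the ROI area in hectares
area_ha = poly.to_crs(poly.estimate_utm_crs()).area.sum() / 1e4
print(f"ROI area: {area_ha:,.0f} ha")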
Search granules over ROI#
We will use the earthaccess Python module to search for the granules overlapping the ROI above.
# CMR requires polygon coordinates in counterclockwise order
poly.geometry = poly.geometry.apply(orient, args=(1,))
# simplify the polygon with a tolerance of 0.005 degrees to stay
# under the CMR's limit on the number of polygon coordinates
xy = poly.geometry.simplify(0.005).get_coordinates()
granules = earthaccess.search_data(
count=-1, # needed to retrieve all granules
granule_name = '*.laz', # only find laz files
doi="10.3334/ORNLDAAC/1644", # dataset doi
    polygon=list(zip(xy.x, xy.y))  # ROI vertices as (lon, lat) tuples
)
print(f"Total granules found: {len(granules)}")
Total granules found: 34
gdf = convert_list_gdf(granules)
gdf[['GranuleUR', 'size', 'geometry']].explore(color='red', fillcolor='red')
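To confirm the spatial filter worked, we can overlay the ROI boundary on the same interactive map; explore() accepts an existing folium map through its m argument:
# draw the granule footprints, then add the ROI outline on the same map
m = gdf[['GranuleUR', 'size', 'geometry']].explore(color='red', fillcolor='red')
poly.explore(m=m, color='blue', fill=False)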
Download data files#
downloaded_files = earthaccess.download(granules, local_path="downloads")
downloaded_files
['downloads/DUC_A01_2008_laz_13.laz',
'downloads/DUC_A01_2008_laz_1.laz',
'downloads/DUC_A01_2008_laz_6.laz',
'downloads/DUC_A01_2008_laz_12.laz',
'downloads/DUC_A01_2008_laz_8.laz',
'downloads/DUC_A01_2008_laz_5.laz',
'downloads/DUC_A01_2008_laz_2.laz',
'downloads/DUC_A01_2008_laz_9.laz',
'downloads/DUC_A01_2008_laz_7.laz',
'downloads/DUC_A01_2008_laz_0.laz',
'downloads/DUC_A01_2008_laz_4.laz',
'downloads/DUC_A01_2008_laz_10.laz',
'downloads/DUC_A01_2008_laz_11.laz',
'downloads/DUC_A01_2008_laz_3.laz',
'downloads/DUC_A01_2017_LAS_16.laz',
'downloads/DUC_A01_2017_LAS_11.laz',
'downloads/DUC_A01_2017_LAS_2.laz',
'downloads/DUC_A01_2017_LAS_14.laz',
'downloads/DUC_A01_2017_LAS_5.laz',
'downloads/DUC_A01_2017_LAS_4.laz',
'downloads/DUC_A01_2017_LAS_1.laz',
'downloads/DUC_A01_2017_LAS_3.laz',
'downloads/DUC_A01_2017_LAS_6.laz',
'downloads/DUC_A01_2017_LAS_18.laz',
'downloads/DUC_A01_2017_LAS_19.laz',
'downloads/DUC_A01_2017_LAS_10.laz',
'downloads/DUC_A01_2017_LAS_0.laz',
'downloads/DUC_A01_2017_LAS_7.laz',
'downloads/DUC_A01_2017_LAS_17.laz',
'downloads/DUC_A01_2017_LAS_9.laz',
'downloads/DUC_A01_2017_LAS_15.laz',
'downloads/DUC_A01_2017_LAS_13.laz',
'downloads/DUC_A01_2017_LAS_8.laz',
'downloads/DUC_A01_2017_LAS_12.laz']
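A hedged sketch to confirm the files landed on disk and to gauge their combined size:
from pathlib import Path
# list the downloaded LAZ files and report their total size
laz_files = sorted(Path("downloads").glob("*.laz"))
total_gb = sum(f.stat().st_size for f in laz_files) / 1e9
print(f"{len(laz_files)} files, {total_gb:.1f} GB total")
Now let's open one of the downloaded LAZ files with laspy and inspect its contents.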
point_cloud = laspy.read(downloaded_files[0])
print(point_cloud)
<LasData(1.0, point fmt: <PointFormat(1, 0 bytes of extra dims)>, 71736108 points, 0 vlrs)>
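The LasData header carries the tile's point count, spatial extent, and coordinate scaling, which is worth checking before touching the (large) point arrays. A brief sketch using laspy 2.x header attributes:
hdr = point_cloud.header
print(hdr.point_count)          # number of points in the tile
print(hdr.mins, hdr.maxs)       # [x, y, z] bounding box in the file's CRS
print(hdr.scales, hdr.offsets)  # applied when converting raw X/Y/Z to coordinates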
point_format = point_cloud.point_format
list(point_format.dimension_names)
['X',
'Y',
'Z',
'intensity',
'return_number',
'number_of_returns',
'scan_direction_flag',
'edge_of_flight_line',
'classification',
'synthetic',
'key_point',
'withheld',
'scan_angle_rank',
'user_data',
'point_source_id',
'gps_time']
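With the dimensions known, we can pull values into numpy and compute simple statistics. Note that z holds elevations, not canopy heights (deriving heights would require using the ground classification); this is purely an illustrative sketch:
# lowercase x/y/z give coordinates with scale and offset already applied
z = np.asarray(point_cloud.z)
print(f"z range: {z.min():.2f} to {z.max():.2f} m")
# fraction of points that are first returns
first = np.asarray(point_cloud.return_number) == 1
print(f"first returns: {first.mean():.1%}")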