LAGOS-US Overview

LAGOS-US provides an extensible research-ready platform to study the 479,950 lakes and reservoirs larger than or equal to 1 ha in the conterminous US at multiple, broad scales of space and time. Although lakes are our focal unit of study, studying land-water interactions requires not only in situ lake water quality measurements, but also descriptions of the lakes, their watersheds, and their landscape ecological context (i.e., the land use, geologic, climatic, and hydrologic setting of lakes). Each lake’s ecological context can be characterized at a variety of spatial extents (e.g., ecoregions, watersheds), which we call spatial divisions. Some of these ecological context variables are relatively static through time and are therefore characterized for a single date, whereas others are dynamic through time and are characterized at multiple time steps. Wherever possible, we include data for all lakes. These data provide a ‘census’ population of all lakes in the study area. 

The LAGOS-US research platform includes three core data modules: 

  1. LAGOS-US LOCUS for locational, identifying, and physical information on all lakes > 1 ha in surface area and their watersheds in the conterminous US,
  2. LAGOS-US GEO for geospatial and temporal ecological context variables (e.g., land use, climate, hydrology) for all lakes in LOCUS characterized at multiple spatial divisions (e.g., equidistant buffers around lakes, watersheds, ecoregions), and 
  3. LAGOS-US LIMNO for in situ surface-water limnological physical, chemical, and biological measurements for a subset of lakes > 1 ha through time.

The LAGOS-US research platform was designed to be modular, i.e., each data module consists of data tables of themed variables that were derived using similar methods or data sources. This modularity facilitates documentation of the entire database and keeps the data tables to a manageable size. In addition, our vision is for LAGOS-US to be easily extensible (i.e., to allow other users to build extension modules that can be easily integrated into the LAGOS-US research platform).

Members of our research team are currently developing four LAGOS-US extension modules that will connect to LAGOS-US LOCUS through common lake identifiers: RESERVOIR provides a predicted classification of all 137,465 lakes > 4 ha as either a natural lake or a reservoir, using a machine-learning algorithm and aerial imagery; LAKE DEPTH includes mean and/or maximum depth measurements of over 17,000 lakes > 1 ha that were manually compiled from a wide range of online sources; NETWORKS uses graph theory to identify 898 lake networks that include 86,511 lakes > 1 ha and provides quantitative surface water connectivity metrics for those networks and lakes; and LANDSAT provides predicted water quality measurements for chlorophyll a, Secchi depth, and colored dissolved organic matter for all lakes > 4 ha, using machine-learning models based on atmospherically corrected Landsat imagery and LIMNO data, in addition to lake-wide values of reflectance for each Landsat band and satellite overpass.
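
As a rough illustration of how a module might connect to LOCUS through a common lake identifier, the sketch below joins two tiny, invented tables using only the Python standard library. The column names (`lagoslakeid`, `lake_name`, `area_ha`, `lake_type`), the helper functions, and all values are assumptions made for this example and are not taken from the published data tables. The join keeps every lake from the core table, so lakes that an extension module does not cover (here, a lake smaller than 4 ha) remain in the result with missing values.

```python
import csv
import io

# Hypothetical miniature tables. Column names and values are invented for
# illustration and do not reflect the actual LAGOS-US table schemas.
LOCUS_CSV = """lagoslakeid,lake_name,area_ha
1,Lake A,12.5
2,Lake B,3.2
3,Lake C,150.0
"""

RESERVOIR_CSV = """lagoslakeid,lake_type
1,natural lake
3,reservoir
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(core_rows, extension_rows, key="lagoslakeid"):
    """Attach extension columns to every core row that shares the key.

    Core rows without a match are kept, with the extension columns set to
    None, so the full lake population of the core table is preserved.
    """
    by_id = {row[key]: row for row in extension_rows}
    extension_cols = []
    if extension_rows:
        extension_cols = [c for c in extension_rows[0] if c != key]
    joined = []
    for row in core_rows:
        match = by_id.get(row[key])
        merged = dict(row)
        for col in extension_cols:
            merged[col] = match[col] if match else None
        joined.append(merged)
    return joined


locus = read_table(LOCUS_CSV)
reservoir = read_table(RESERVOIR_CSV)
joined = left_join(locus, reservoir)

for row in joined:
    print(row["lagoslakeid"], row["lake_name"], row["lake_type"])
```

A left join is used here, rather than an inner join, so that the core table's full set of lakes stays intact; that is a design choice for this sketch rather than a documented requirement of the platform.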

To create LAGOS-US, we used existing datasets from a variety of data sources, such as national-scale climate, land use/cover, and hydrology data, as well as government, tribal, and citizen science lake data. In building this research platform, we followed three fundamental principles similar to those that we used to create LAGOS-NE, an earlier version of the database system for a subset of US states (Soranno et al. 2015, 2017). First, LAGOS-US is built on a foundation of open science: we make our data publicly available once each module is completed, error-checked, and documented, and we provide a permanent identifier and a versioning system for each module to facilitate future reuse of the data. Second, we document and describe the original data sources, our methods for integrating data, and errors that may exist in the data, and we provide code for these methods when possible. Third, we preserve the provenance of the original data as much as possible.