Welcome to the LAGOS Visualization Blog. Here we will post interesting and fun visualizations of data from LAGOS as well as other posts that we find interesting to share and talk about. In this blog, we focus mostly on data visualizations because it is not easy to fully capture the complexity of macroscale data, which includes a range of environmental characteristics across both space and time. So, this space is devoted to thinking creatively about visualization for macrosystems ecology. Current members of the ‘Continental Limnology’ Team will be posting here.
By Patrick Hanly
While citizen scientists are already known to be a vital source of water quality data, they have also been quietly amassing a substantial collection of species records through digital platforms such as the popular iNaturalist. For example, there are 900,000 dragonfly and damselfly records on iNaturalist as of August 2020. Although iNaturalist was created with the goal of connecting people with nature, a fortunate byproduct of this effort is an extensive database of species records with spatial and temporal coverage that vastly exceeds the capacity of the scientific community.
You may have heard of eBird, a well-established citizen science project run by the Cornell Lab of Ornithology that tracks observations of birds. Similarly, iNaturalist accounts for 66% of the U.S.’s 470,000+ georeferenced records in the Global Biodiversity Information Facility (GBIF), an international organization that focuses on compiling biodiversity data and making it publicly accessible. However, unlike eBird, iNaturalist encompasses all biota and relies primarily on photographic records that can be corroborated by the community. Corroboration is an important verification step that increases the quality of the data and allows researchers to be part of the identification process to fix errors prior to use. Observations can achieve “Research Grade” when they are properly dated and georeferenced, submitted with verifiable evidence, and when greater than two-thirds of users agree on identification.
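To make the Research Grade criteria above concrete, here is a minimal sketch of how they could be checked for a single record. The `Observation` class and its field names are hypothetical and made up for illustration; this is not the iNaturalist data model, and the real rules include details not covered in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical, simplified record (not the iNaturalist data model)."""
    observed_on: Optional[str]   # ISO date string, or None if undated
    latitude: Optional[float]
    longitude: Optional[float]
    has_evidence: bool           # e.g. at least one photo attached
    agreeing_ids: int            # identifications that agree with the leading taxon
    total_ids: int               # all identifications submitted


def is_research_grade(obs: Observation) -> bool:
    """Apply the criteria described in the text to one observation."""
    dated = obs.observed_on is not None
    georeferenced = obs.latitude is not None and obs.longitude is not None
    if not (dated and georeferenced and obs.has_evidence):
        return False
    if obs.total_ids == 0:
        return False
    # "Greater than two-thirds": compare integers to avoid float rounding.
    return 3 * obs.agreeing_ids > 2 * obs.total_ids
```

Note that exactly two out of three agreeing identifications does not pass in this sketch, because the threshold is strictly greater than two-thirds.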
I am developing tools to help people access these important data. All records (Research Grade or not) are freely accessible in an open database through the iNaturalist API. This online tool facilitates downloads into R using a package I am developing called iNatTools that provides data processing tools such as ways to determine sampling efforts for ecological research. Research Grade records are also exported to the biodiversity data compiler GBIF. To date, these GBIF records have generated 738 citations, showing that Research Grade iNaturalist records are an increasingly important source of contemporary distribution data for many taxa.
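For readers who do not use R, the sketch below shows roughly what a request to the public iNaturalist API could look like from Python. It only builds the request URL and does not download anything. The endpoint and the parameter names (`taxon_id`, `quality_grade`, `per_page`, `page`) reflect my understanding of version 1 of the API and should be checked against the official documentation; the function name is made up for this example, and it is not part of iNatTools.

```python
from urllib.parse import urlencode

# Assumed root of the public v1 observations endpoint; verify before use.
API_ROOT = "https://api.inaturalist.org/v1/observations"


def build_observation_query(taxon_id: int,
                            research_only: bool = True,
                            per_page: int = 200,
                            page: int = 1) -> str:
    """Return a URL requesting one page of observations for a taxon."""
    params = {"taxon_id": taxon_id, "per_page": per_page, "page": page}
    if research_only:
        # Restrict results to Research Grade records.
        params["quality_grade"] = "research"
    return f"{API_ROOT}?{urlencode(params)}"


# 47792 is assumed here to be the iNaturalist taxon ID for Odonata
# (dragonflies and damselflies); confirm it on the site before relying on it.
url = build_observation_query(47792)
```

Passing `research_only=False` leaves out the quality filter, which matches the point above that all records are accessible whether or not they are Research Grade.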
Although vouchered specimens from museums and universities offer a wide breadth of species for many taxonomic groups, citizen science is an important source of recent and geographically widespread data for easily documented species such as dragonflies. These data will be essential for understanding biogeography and other investigations into species and ecological communities. Despite the large and growing number of observations, the biodiversity of many areas remains poorly documented. You can help fill these gaps — get started as an observer, identifier, or both.
By Jessica Díaz Vázquez
I joined the Data Intensive Landscape Limnology Lab in October 2018 to gain research experience in the general field of ecology. As I learned more about the database LAGOS and the openness of the lab for interdisciplinary research, I saw an opportunity to incorporate my interest in environmental justice.
I grew up in Northeast Houston, Texas, in a predominantly Latinx and low-income community that is adjacent to petrochemical plants and oil refineries. Living in a ‘frontline’ or environmental justice community means that the topics of health, racial/ethnic identity, economic status, and natural environment are extremely interconnected. Just like any other community, we love our backyard gardens, neighborhood parks, and local bayous. However, the disproportionate burden of air and water pollution makes outdoor activities much less pleasant and less healthy. Drawing on my lived experiences and my training as a rising senior in MSU’s Department of Fisheries & Wildlife, I seek to improve the habitat of wildlife and to expose and correct environmental injustices. I am excited to apply my combined knowledge in fisheries & wildlife and environmental justice through this REU position.
The overall goals of this REU position are to integrate information about lake watersheds and lake water quality with human demographics and apply an environmental justice lens. I hope to answer the question: Are people and communities within marginalized demographics (e.g., low income, people of color, younger/older people) disproportionately affected by low water quality lakes and their watersheds?
For my research, I am using lake and watershed data from the LAGOS database that covers the conterminous U.S. Therefore, the human demographic data used must be compatible with this large scale. I am using tract-level data from the 2010 Decennial Census and the American Community Survey (ACS). The main variables that I will focus on for lakes are those that together serve as a measure of water quality: water clarity, phosphorus, and nitrogen. For the human demographic variables, I will choose those of interest in the environmental field, such as median household income, race/ethnicity, population, and sex. Figure 1 is an example of a visual output resulting from linking watersheds and median household income for LAGOS-NE.
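One simple way to link tract-level demographics to a watershed is to average the tract values, weighting each tract by the area it shares with the watershed. The sketch below illustrates that idea only; it is an assumption for the sake of example, not necessarily the method used in this project. It also assumes the overlap areas have already been computed in GIS, and an area-weighted average of tract medians is only an approximation of a watershed-level median.

```python
from typing import Iterable, Optional, Tuple


def area_weighted_income(
    overlaps: Iterable[Tuple[float, Optional[float]]]
) -> Optional[float]:
    """Area-weighted average of tract median household income.

    overlaps: (overlap_area, tract_median_income) pairs for one watershed,
    where overlap_area is the area the tract shares with the watershed.
    """
    weighted_sum = 0.0
    total_area = 0.0
    for area, income in overlaps:
        if income is None or area <= 0:
            continue  # skip tracts with no reported income or no overlap
        weighted_sum += area * income
        total_area += area
    # Return None when no tract contributes any usable data.
    return weighted_sum / total_area if total_area > 0 else None


# A watershed overlapping two tracts: 3 units of area in a tract with a
# median income of 40,000 and 1 unit in a tract with 80,000.
example = area_weighted_income([(3.0, 40000.0), (1.0, 80000.0)])
```

Here `example` works out to 50,000, since (3 × 40,000 + 1 × 80,000) / 4 = 50,000.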
Although I expect challenges to arise from working with two unique databases (LAGOS and ACS), I look forward to bringing a new perspective to the research group. Stay tuned for an update at the conclusion of my summer 2020 REU!
I was motivated to apply for this particular REU position as, growing up in northern Michigan, I have always been interested in nature and ecology, and I wanted to be able to apply my math degree in areas that would allow me to pursue these interests. It has been an amazing learning opportunity for me to apply things I have been learning in my math, computer science, and statistics classes into areas where I did not expect to apply them. Being able to work in such a diverse research group has helped me greatly in learning how to translate and apply mathematical skills into different useful applications.
The project that I mainly focused on over the course of the summer involved classifying lakes in LAGOS-NE (www.lagoslakes.org) into two categories: natural lakes and reservoirs. Since this involved such a large number of lakes (~50,000), much of my work revolved around training a deep-learning algorithm with the help of a computer science REU student, Laura Danilla. I manually classified a subset of the lakes using GIS layers and satellite imagery. We then used this subset of lakes with confirmed types to train our deep-learning model to identify lake types based solely on the shape of the lake.
Over the course of the summer, I created a training set containing 5,334 lakes, roughly half natural lakes and half reservoirs. Using these lakes of known type, we estimated the performance of our model as we prepared to apply it to all lakes in LAGOS-NE. From this testing, we estimated that our model has around 80-85% accuracy when determining the type of a given lake in LAGOS-NE. Then, we applied our model to the ~45,000 remaining unclassified lakes in LAGOS-NE and obtained the following results: 63% of these lakes (28,733) are natural lakes, and 37% (16,864) are reservoirs. For reservoirs, the average predictive confidence was 45%, and for natural lakes the average confidence was 61%. The model estimates this confidence as it determines the type of a lake: it assigns each lake a probability of belonging to each category (natural lake or reservoir), and the confidence metric is the absolute value of the difference between those two probabilities.
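The confidence metric is easy to compute by hand. As a small sketch, assuming the model outputs a single probability that a lake is natural and that the two category probabilities sum to one (the function names here are made up for illustration and are not from our code):

```python
def confidence(p_natural: float) -> float:
    """Absolute difference between the two category probabilities."""
    p_reservoir = 1.0 - p_natural  # assumes the two probabilities sum to 1
    return abs(p_natural - p_reservoir)


def predicted_type(p_natural: float) -> str:
    """Pick the category with the higher probability (ties go to natural)."""
    return "natural lake" if p_natural >= 0.5 else "reservoir"


# A lake given an 80% chance of being natural: |0.8 - 0.2| = 0.6
c = confidence(0.8)
label = predicted_type(0.8)
```

Under the same assumption, a confidence of 45% corresponds to a winning probability of (1 + 0.45) / 2 = 72.5%, and a confidence of 0 means the model considered the two types equally likely.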
Figures 1 and 2 show the distributions of the model’s reservoir predictions and natural lake predictions, respectively. Note that natural lakes tend to appear in clusters, whereas reservoirs are more evenly distributed. Also note that regions with many lakes such as Minnesota have high concentrations of both reservoirs and natural lakes. Both of these results are very promising for our model because they match up well with what we expect.
For instance, we expect natural lakes to be found in clusters created by processes such as glaciation, and we expect reservoirs to be more evenly distributed because they can be built anywhere that water can be pooled. Finally, Figure 3 shows the count of lakes in each category for each state.
Working in this REU position over the summer has been a great experience for me. I’m still working in the lab this semester, hoping to extend my work to the entire conterminous US. After graduating in May 2020, I hope to continue to apply the skills I have developed working in the lab in similar areas. I have become particularly interested in deep-learning algorithms after working so closely with one, and I hope to find a position where I can continue to pursue this interest.
This animation shows the accumulation of water quality observations for each of the lakes in the LAGOS-NE database. For each year and lake, the cumulative count of water quality observations to date is shown by color. The first field observation in the database was recorded in 1933 from Lake Pepin (WI/MN). The lake with the most data points across the time period is Lake Champlain (NY/VT/QC). Approximately 12,000 lakes have at least one observation and appear as points on this map. Another 39,000 lakes in LAGOS-NE do not have water quality observations, but do have watersheds delineated and a large range of GIS-derived ecological context variables calculated.
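The quantity mapped in each frame is a running total per lake. As a rough sketch of that calculation, assuming the observations are available as simple (lake, year) pairs (the function name and input format are made up for this example and are not the code behind the animation):

```python
from collections import Counter, defaultdict
from itertools import accumulate
from typing import Dict, Hashable, Iterable, List, Tuple


def cumulative_counts(observations: Iterable[Tuple[Hashable, int]],
                      first_year: int,
                      last_year: int) -> Dict[Hashable, List[int]]:
    """Return, for each lake, the cumulative observation count per year.

    Observations dated outside first_year..last_year are ignored.
    """
    per_year = defaultdict(Counter)
    for lake_id, year in observations:
        per_year[lake_id][year] += 1
    years = range(first_year, last_year + 1)
    # A Counter returns 0 for years without observations, so the running
    # total simply stays flat in those years.
    return {
        lake: list(accumulate(counts[y] for y in years))
        for lake, counts in per_year.items()
    }


obs = [("a", 1933), ("a", 1935), ("a", 1935), ("b", 1934)]
totals = cumulative_counts(obs, 1933, 1935)
```

In this toy example, lake "a" has running totals of 1, 1, and 3 for 1933 to 1935, and lake "b" has 0, 1, and 1. Each animation frame would color the lakes by the value for that frame's year.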
We are thrilled to announce that the LAGOS-NE data paper is published, which means that the underlying data are live: https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/gix101/4555226
Creating something like LAGOS-NE takes a wide range of contributions, expertise, and types of work. We want to extend a HUGE thanks to everyone who contributed. This effort could not have happened without the willingness of people to work within an open-science perspective: to share their data, skills, and tools openly. I am also particularly struck by the important contributions of the many early-career scientists who provided creative and novel approaches and ideas to make this effort happen. Specifically, it took five major efforts and types of contributions to create LAGOS-NE, and every individual played a key role:
- The data providers spent time sharing their data and documentation with us to create the database, and fielded numerous questions about the data.
- The data integrators spent time manipulating the data and authoring metadata for the individual datasets, and were the point of contact for the data providers.
- The geo-data creators developed the GIS tools to create the large number of metrics calculated from the many national-scale geographic datasets that are part of LAGOS-NE.
- The information managers designed and created the database.
- The data-accessibility managers designed the strategy for sharing the data by preparing the data and metadata tables in a format that could be shared and made publicly available, and designed and wrote an R package for users to work with the LAGOS-NE data.
I would like to extend a personal thanks to each and every one of the following individuals:
1. Data providers
- Provided water quality datasets by finding funding for data collection, sampling, entering data, conducting quality-control, writing documentation and metadata, and sharing — Linda Bacon, Michael Beauchene, Karen Bednar, Marvin Boyer, Mary Tate Bremigan, Steve Carpenter, Jamie Carr, Kendra S Cheruvelil, Matt Claucherty, Joseph Conroy, John Downing, Jed Dukett, Chris Filstrup, Clara Funk, Maria Gonzalez, Linda Green, John Halfman, Steve Hamilton, Paul Hanson, Elizabeth Herron, Celeste Hockings, James Jackson, Kari Jacobson-Hedin, Lorraine Janus, William Jones, Jack Jones, Caroline Keson, Scott Kishbaugh, Barbara Lathrop, Jo Latimore, Yuehlin Lee, Noah Lottig, Jason Lynch, Leslie Mathews, William McDowell, Karen Moore, Brian Neff, Sarah Nelson, Mike Pace, Donald Pierson, Amina Pollard, David Post, Paul Reyes, Donald Rosenberry, Karen Roy, Lars Rudstam, Orlando Sarnelle, Nancy Schuldt, Pat Soranno, Nick Spinelli, Emily Stanley, John Stoddard, Jason Tallant, Anthony Thorpe, Mike Vanni, Gretchen Watkins, Kathie Weathers, Kathy Webster, Jeff White, and Marcy Wilmes
2. Data integrators
- Authored metadata and prepared individual datasets — Mary Tate Bremigan, Claire Boudreau, Kendra S. Cheruvelil, Sarah Collins, C. Emi Fergus, Chris Filstrup, Emily N. Henry, Noah Lottig, Sam Oliver, Nick Skaff, Pat Soranno, Emily Stanley, Kathy Webster
- Prepared the integrated metadata document for LAGOS-NE — C. Emi Fergus
- Prepared EML metadata for water quality datasets — C. Emi Fergus
- Prepared EML metadata for some water quality datasets — Claire Boudreau
- Designed and implemented the quality-control analysis for the water quality data — Noah Lottig
- Wrote parts of the technical documentation for LAGOS-NE that was part of the documentation article — Ed Bissell, Mary Bremigan, Kendra S. Cheruvelil, Sarah Collins, C. Emi Fergus, Corinna Gries, Noah Lottig, Caren Scott, Nick Skaff, Nicole Smith, Scott Stopyak, Pat Soranno, Craig Stow, Ty Wagner, Kathy Webster
- Editor of the technical documentation for LAGOS-NE that was part of the documentation article — Jean-Francois Lapierre
3. Geo-data creators
- Developed geospatial tools and performed geospatial analyses — Scott Stopyak, Nicole J. Smith
- Developed methods for delineating lake watersheds — Scott Stopyak
- Developed freshwater metrics — Scott Stopyak, C. Emi Fergus, Nicole J. Smith, Patricia Soranno
- Created LAGOS-NE_LOCUS and conducted quality control — Ed Bissell
- Designed the quality-control analysis for the geo-data — Sarah Collins, Caren Scott
- Conducted quality-control analysis for the geo-data — Sarah Collins, Caren Scott, C. Emi Fergus, Nick Skaff, Kathy Webster
- Authored geospatial metadata — Nicole J. Smith
- Prepared geodatabase for sharing — Nicole J. Smith
4. Information managers
- Database design, database creator, database manager — Ed Bissell
- Database design — Pang-Ning Tan
- Database design – Corinna Gries
- Database design contributor — Patricia Soranno
- Wrote R code to import water quality datasets into LAGOS data model — Ed Bissell, Sam Christel, Noah Lottig, Shuai Yuan
5. Data-accessibility managers
- Designed the strategy for sharing the data by preparing the data and metadata tables in a format that could be shared and made publicly available — Corinna Gries, Colin Smith (Environmental Data Initiative)
- Wrote the LAGOS-NE R package to make LAGOS-NE accessible to users — Jem Stachelek, Sam Oliver
Without all of these individuals, LAGOS-NE could never have happened. Thanks to you all.
— Pat Soranno, October 19, 2017
Some pictures of the CSI-Limnology team that worked together to integrate the data into LAGOS-NE:
These maps are from a recent paper by JF Lapierre et al. that compared the factors that control lake CO2 at the continental scale. He found that the spatial patterns in lake pCO2 driver‐response relationships translated into the formation of spatial clusters of pCO2 “regulation” that are shown in map (e) even though there is little apparent regional pattern in pCO2 itself (shown in (d)).
The figure legend is reproduced here:
“The spatially varying relationships of (a) Color, (b) alkalinity, and (c) Chl a with pCO2 in U.S. lakes. Colored dots on Figures a–c represent significant relationship between the proxy and pCO2 based on geographically weighted regressions. Colors indicate classes of t values (slope of the regression divided by standard error of the estimate), with red denoting a positive effect, blue denoting a negative effect, and white denoting no statistically significant effect on pCO2. Despite an absence of spatial pattern in (d) pCO2, the spatial patterns in lake pCO2 driver‐response relationships translated into the formation of (e) spatial clusters of pCO2 “regulation.” Clusters include lakes with comparable response of pCO2 to Chl a, Color, and tAlk (see Table ). Note that the map displays the boundary of U.S. territories, not just the land area.”
This map is from this article by Sarah Collins et al. It clearly shows how spatial patterns in TP and TN alone do not lead to similar patterns in the ratio of TN:TP. Also, the spatial patterns in TN and TP are similar but not identical, and there are some interesting outliers, e.g., Michigan lakes. See the article for details on these cool patterns.
Emi Fergus et al. published a recent paper that describes the complex features of the freshwater landscape. These maps are very compelling in that they show that there are very different patterns between freshwater ABUNDANCE versus CONNECTIVITY.
FIGURE DESCRIPTION (Figure and text from Fergus et al. 2017): Freshwater abundance and connectivity maps by system type. Freshwater abundance is quantified as the total proportion area or stream density within the Hydrologic Unit (HU) 12 spatial unit for lakes (a), wetlands (b), and streams (c). Abundance values are binned as quantiles. Freshwater connectivity for lakes (d), wetlands (e), and streams (f) is represented by connectivity cluster groups determined by k‐means cluster group using principal components analysis (PCA) scores from lake, wetland, and stream connectivity metrics at the HU12 spatial scale. Dominated is in reference to where spatial units plotted on the PCA axes using relative proportion connectivity metric values. The solid black line represents the estimated boundary of the Wisconsin glacial period—north of the line is glaciated area and south of the line is unglaciated area.
This figure shows the distribution of the underlying data in Oliver et al. 2017 (see below). Interestingly, when you look at all of the data together, it is difficult to discern any patterns. While the study found that, on average, lakes weren’t changing in chlorophyll, roughly 15% of lakes were either increasing or decreasing in chlorophyll.
Samantha Oliver made this GIF using data and results from her recent paper in Global Change Biology published in 2017. She also shared her code for the above GIF here, as well as the data that the article is based on. You can read more about this article at this blog and this press release. This article was also recently featured in the Minneapolis Star Tribune.