Duck, Here Come the Big Data!

The Backstory of One Young Graduate Student’s Research

By Marcella Domka

In August of 2020, I started my ‘next step’ after college as a graduate student at Michigan State University. I knew from the moment I received an invitation to join the Data Intensive Landscape Limnology (DILL) lab, a lab that bases many of its projects and endeavors on big data, that I was facing an exciting challenge. Big data was a fairly new concept to me. I had several research experiences during my time as an undergraduate that helped me understand the vastly complex world of ecological interactions and freshwater systems, but none that spanned the extent of the entire United States and included thousands of lakes. It was daunting, to say the least!

However, I knew that I had a strong interest in freshwater ecology and wildlife, and wanted to better understand environmental interactions between biota (living components of an ecosystem) and abiota (non-living components of an ecosystem) at a much larger scale. The DILL lab was the perfect place to embark on this journey of greater understanding. I began my first semester as a master’s student working with my advisor, Dr. Kendra Spence Cheruvelil, a co-founder of the DILL lab. I remember that one of the first ‘tasks’ she assigned me was to brainstorm ideas for what would eventually become my thesis research. I had quite a few thoughts bouncing around in my head already, so I was happy to take this on and dive right into the questions that the fields of ecology, limnology, and big data science have to offer.

Right away, I knew that I wanted to study something about eutrophication. Eutrophication, or the excess concentration of nutrients in freshwater environments, is often caused by point and nonpoint sources of pollution (e.g., septic tanks and fertilizer runoff from agriculture, respectively). This excess of nutrients can lead to algal blooms, and as the algae die and decompose, oxygen is removed from the water. This results in hypoxic conditions and subsequently the death of many aquatic organisms.

During an internship I had through the NSF-funded LAKES (Linking Applied Knowledge in Environmental Sustainability) program in summer of 2019, I studied the widespread eutrophication and phosphorus pollution of Lake Menomin in Menomonie, Wisconsin. It was a fantastic experience in which I learned a wide variety of laboratory and field techniques, conducted literature reviews and composed a graduate-level research poster, and most of all, learned which elements of freshwater ecology were most intriguing to me.

Photo Caption: Lake Menomin on June 18th, 2019. The lake was not yet experiencing algal blooms and no green coloration can be seen (yet!).
Photo Caption: Lake Menomin on July 15th, 2019. As you can see, in about 1 month, the lake turned a bright green color due to excessive nutrient content (particularly high levels of phosphorus pollution).

I realized that integrating eutrophication as a major component of my thesis research would allow me to continue studying this concept that I clearly had a passion for. Eutrophication, and the concentrations of nutrients that may indicate eutrophic conditions, would be the foundation of my research. 

Before my first research question could be fully formulated, however, I had to consider an additional component. I knew upon my acceptance to the DILL lab that I would be working with the LAGOS-US RSVR database, which contained hundreds of thousands of data rows about two types of waterbodies: natural lakes (NLs) and reservoirs (RSVRs). A ‘natural lake’ is typically naturally formed, with no apparent flow-altering structures present, whereas a ‘reservoir’ is a lake that is likely to be human-made, or that includes some sort of large, flow-altering, human-made structure. Natural lakes and reservoirs differ in a variety of ways: reservoirs are typically warmer, with larger watershed areas and larger ratios of basin to lake/reservoir surface area. Additionally, reservoirs are not well studied in comparison with natural lakes.

Part of the cornerstone of my research would be to investigate major differences between these two types of lakes, including differences in nutrient concentrations, which is where eutrophication comes back into play. Thus, my first research question is: are natural lakes or reservoirs more likely to have higher concentrations of total phosphorus and chlorophyll-a and lower water transparency (variables that typically indicate eutrophication)?

Yes, I did mention ‘first’ research question. And yes, while I was happy that I would be studying nutrient concentrations across two distinct waterbody types (natural lakes and reservoirs), I knew there was something else missing from the scope of my research. From a very young age, I’d always loved everything about the natural world, but something about wildlife was especially captivating. I knew my master’s thesis wouldn’t feel complete without a wildlife component. 

After bringing this interest up to my advisor and getting feedback from my fellow DILL lab members, I did some searching to find readily available wildlife data. I decided to use waterfowl (aquatic birds such as ducks, geese, mergansers, etc.) data from the United States Fish and Wildlife Service (find more at Migratory Bird Data Center – About the Atlantic Flyway Breeding Waterfowl Survey). After performing a detailed literature search, I understood that there were myriad factors that may influence waterfowl use of aquatic habitats (such as lakes and reservoirs), so I knew that incorporating these data with the LAGOS nutrient data would form the perfect ‘second’ research question. Thus, I’m asking: are natural lakes or reservoirs more likely to be associated with more waterfowl species and greater waterfowl abundance?

Once these questions were finalized, I felt that my thoughts and passions were truly fueling my research questions. I came to graduate school to address multifaceted ecological questions, and I feel that I have embraced that process. While brainstorming, writing a thesis proposal, and performing months of data exploration and statistical analysis have proved challenging, I haven’t regretted anything. The only way forward is understanding.
