On a mid-August morning at his office in Chicago, Paris Collingsworth opens a spreadsheet attachment with 15,000 rows of data. He’s received similar files a handful of times over the last three years. The data they’ve contained has launched a critical examination of how agencies manage one of the largest freshwater fisheries in the world and opened the door to solutions to some of Lake Erie’s trickiest problems.
“Advances in how we collect and analyze data are yielding a more detailed picture of hypoxia in Lake Erie than we’ve ever had,” said Collingsworth, an ecosystem specialist with IISG who works on projects supported by the Illinois Water Resources Center (IWRC) through the Great Lakes Restoration Initiative (GLRI).
These improvements aren’t limited to Lake Erie, though. Research projects throughout the region are benefiting from cutting-edge analytical tools that make it possible to decipher and model massive data sets in a fraction of the time it took just a few years ago.
And thanks to an international effort to coordinate Great Lakes research and a partnership with leading supercomputing experts, the walls that traditionally separated these data sets are coming down.
U.S. Geological Survey (USGS) Research Fishery Biologist Richard Kraus assumed his dissolved oxygen sensors had failed when they returned puzzling results during a field test on Lake Erie. That is, until an Ohio Department of Natural Resources (Ohio DNR) official conducting similar tests nearby said his instruments were giving the same readings.
Data loggers deployed at 10 sampling stations collecting measurements every 10 minutes would later confirm the explanation: Hypoxia doesn’t just spread out from the central basin of the lake like scientists have long believed. Pockets of low oxygen also continuously spring up at the edge of the basin, where they’re sloshed around by internal waves.
“We were in awe when we looked at the data from the first season,” said Kraus, who began continuously monitoring dissolved oxygen in Lake Erie in 2011, three years before the U.S. Environmental Protection Agency (U.S. EPA) Great Lakes National Program Office (GLNPO) deployed their loggers. “Sometimes an area would switch from normal to hypoxic conditions in a matter of hours.
“We wouldn’t have been able to see that short-term variability without such a large data set.”
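The rapid switches Kraus describes are easy to spot programmatically in a high-frequency record. Here is a minimal sketch, assuming the commonly used 2 mg/L hypoxia threshold and made-up readings rather than actual survey data:

```python
# Sketch: flagging shifts to hypoxia in a 10-minute dissolved oxygen
# (DO) time series, assuming the commonly used 2 mg/L cutoff.
# The readings below are illustrative, not real Lake Erie data.

HYPOXIA_THRESHOLD = 2.0  # mg/L, a widely used hypoxia cutoff

def flag_hypoxic_transitions(readings, threshold=HYPOXIA_THRESHOLD):
    """Return timestamps where DO crosses from normal to hypoxic.

    readings: list of (minutes_elapsed, do_mg_per_l) tuples,
    spaced 10 minutes apart as in the Lake Erie deployment.
    """
    transitions = []
    for i in range(1, len(readings)):
        prev_do = readings[i - 1][1]
        curr_do = readings[i][1]
        if prev_do >= threshold and curr_do < threshold:
            transitions.append(readings[i][0])
    return transitions

# Illustrative series: DO falls below 2 mg/L within an hour.
series = [(0, 6.1), (10, 5.8), (20, 4.9), (30, 3.2), (40, 1.7), (50, 1.4)]
print(flag_hypoxic_transitions(series))  # -> [40]
```

With loggers at 10 stations sampling every 10 minutes, even a simple pass like this surfaces the hours-scale variability that sparser surveys would miss.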
The finding moves U.S. EPA and Environment Canada substantially closer to fulfilling their commitment to pinpoint the extent and severity of the hypoxic zone in Lake Erie, one of many priorities codified in the Great Lakes Water Quality Agreement.
It also has potentially sweeping repercussions on fishery management in the lake. The Great Lakes Fishery Commission bases annual commercial catch limits on models that assume the number of fish and the effectiveness—or catchability—of different fishing gear are the same throughout the lake.
But dynamic dead zones mean inconsistency. Fish and other aquatic wildlife numbers spike at the edge of hypoxic waters as some flee suffocation and others hunt those on the run. The result is an ever-changing patchwork of high- and low-density habitats that could lead managers to think there are more or fewer fish in the lake than actually reside there.
“When the Ohio DNR Division of Wildlife sampled at the edge of the hypoxic zone this year, they caught huge numbers of yellow perch,” noted Collingsworth. “If they put that data point in the model, it changed the population estimate for the entire lake by about 30 percent.”
Ohio DNR, USGS and others are examining survey designs that would offer a more accurate picture of species behavior and numbers. In the meantime, the Lake Erie Committee of the Great Lakes Fishery Commission has approved an interim rule that allows managers to keep outlying data out of the models.
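To see why a single edge-of-hypoxia sample can swing a lake-wide figure, consider a toy catch-per-unit-effort (CPUE) average. The numbers are invented for illustration and are not the actual Lake Erie survey data or assessment model:

```python
# Sketch of outlier sensitivity in a lake-wide estimate: a simple
# catch-per-unit-effort (CPUE) mean, with hypothetical station catches.

def mean_cpue(samples):
    """Average catch per unit effort across survey stations."""
    return sum(samples) / len(samples)

# Routine stations, plus one that mimics fish crowded at a hypoxic edge.
routine = [42, 38, 51, 45, 40, 47, 39, 44, 46]
with_outlier = routine + [180]

base = mean_cpue(routine)
inflated = mean_cpue(with_outlier)
print(f"change: {(inflated - base) / base:.0%}")  # -> change: 31%
```

One extreme station out of ten shifts this toy estimate by roughly a third, which is the same order of effect Collingsworth describes, and why an interim rule for excluding outlying points matters.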
Better results together
For Collingsworth and others, the discoveries in Lake Erie are a shining example of how the Cooperative Science Monitoring Initiative (CSMI) can heighten research into issues plaguing the region.
“CSMI lets us push the boundaries of what we’re trying to do,” said Collingsworth, who aided GLNPO’s deployment of the dissolved oxygen loggers during the 2014 CSMI field year.
The binational effort focused on Lake Erie that year. U.S. EPA, USGS, Ohio DNR, the National Oceanic and Atmospheric Administration (NOAA) and other state and federal agencies, along with researchers at University of Toledo, Ohio State University, Heidelberg University and Case Western Reserve University, came together to focus their monitoring programs on closing some of the research gaps identified by lake managers.
The same process has occurred each year since the early 2000s in a continuous rotation around the five Great Lakes.
“CSMI is an opportunity for agency and university researchers to collaborate and coordinate ongoing research to find synergies,” said Joel Hoffman, a research biologist with U.S. EPA who has participated in four CSMI field years. “There are lots of people working on different things on the lake, but we rarely get a picture of the whole lake. Collaborating gives us that big picture.”
And that picture doesn’t always match expectations. During the 2015 Lake Michigan field year, researchers set out to map what they believed would be an incremental change in the number of phytoplankton living near the surface as they surveyed from nearshore to offshore.
“The idea is that zebra and quagga mussels filter the water so well that in the spring when the nutrients come in and start to fuel the phytoplankton population, the mussels pull them out before they can reach the middle of the lake,” said Collingsworth. “It’s called the nearshore phosphorus shunt.”
Initial analysis of the data collected with the Triaxus sensors towed behind the R/V Lake Guardian confirms the base of the food web is smaller at the center of the lake, but the expected gradient isn’t there. Although it’s too soon to say for sure, this finding suggests that the role of surface productivity in Lake Michigan’s food web is more dynamic than previously thought.
Smarter data analysis
The scale and scope of CSMI field years means final results from the Lake Michigan Triaxus surveys won’t be released for roughly another year. But that’s considerably sooner than it would be without computer code that automates some of the analysis.
Developed by engineers at the University of Illinois and programmers at the National Center for Supercomputing Applications (NCSA), these algorithms can read thousands of lines of data in near real time, filling in missing information and flagging unexpected results as they go—a process that would otherwise take months and suffer from more user errors.
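A minimal sketch of that kind of automated pass follows; the gap-filling rule, plausibility thresholds and values are assumptions for illustration, not the actual NCSA algorithms:

```python
# Sketch: fill short gaps in a sensor series by neighbor averaging and
# flag readings outside a plausible range. Thresholds are illustrative.

def clean_series(values, low=0.0, high=20.0):
    """Fill single missing values (None) and flag implausible readings.

    Returns (filled_values, indices_flagged).
    """
    filled, flagged = list(values), []
    for i, v in enumerate(filled):
        if (v is None and 0 < i < len(filled) - 1
                and filled[i - 1] is not None and filled[i + 1] is not None):
            filled[i] = (filled[i - 1] + filled[i + 1]) / 2
    for i, v in enumerate(filled):
        if v is not None and not (low <= v <= high):
            flagged.append(i)
    return filled, flagged

values = [5.2, None, 5.8, 96.0, 6.1]  # one gap, one suspect spike
filled, flagged = clean_series(values)
print(filled)   # -> [5.2, 5.5, 5.8, 96.0, 6.1]
print(flagged)  # -> [3]
```

Run over thousands of lines at a time, even simple rules like these replace months of hand-checking, and the real systems go further by learning which patterns to flag.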
And with each use, the modeling software running these algorithms gets better at finding insights without being explicitly programmed to look for them.
“This is at the cutting edge of data analysis,” noted Collingsworth.
Beyond the Triaxus, open source algorithms have been developed for GLNPO’s Seabird sensors, which measure water quality characteristics throughout the water column, as well as underwater gliders that Hoffman and scientists at the University of Minnesota deployed during the 2016 Lake Superior field year to collect high-resolution data on the location of upwelling events, the structure of the deep chlorophyll layer and other lake dynamics.
“We’re talking about gigabytes of data—hundreds of lines per second,” said Hoffman.
Multiple data sets, one platform
Although CSMI field year data is currently managed by individual agencies and researchers, GLNPO and NCSA hope in the coming years to integrate the results into the online portal and visualization tool, Great Lakes Monitoring.
The planned expansion is made possible by the hybrid database NCSA built the tool on. Luigi Marini and eight other developers merged elements of the traditional relational database—imagine a series of spreadsheets—with something computer scientists call NoSQL.
“Because the tool wasn’t built for specific sources or data types, it is easier for us to ingest new data sources into the system,” explained Marini, senior research programmer at NCSA.
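The flexibility Marini describes can be illustrated with a toy schemaless store: records from different sources carry different fields, and nothing like a fixed column list has to change to accept a new source. The field names and sources below are invented, not Great Lakes Monitoring’s actual schema:

```python
# Sketch: a document-style (NoSQL-like) store accepting records of
# differing shapes, unlike a relational table with fixed columns.

store = []  # stand-in for a document collection

def ingest(record):
    """Accept any dict-shaped record; no predefined schema required."""
    store.append(record)

# A nutrient sample and a high-frequency DO reading coexist even
# though they share almost no fields.
ingest({"source": "GLNPO", "lake": "Erie", "total_phosphorus_ug_l": 12.4})
ingest({"source": "USGS", "station": 7, "do_mg_l": 1.8, "minute": 40})

erie_records = [r for r in store if r.get("lake") == "Erie"]
print(len(erie_records))  # -> 1
```

In the real platform the relational side still handles the structured queries researchers expect, while the NoSQL side absorbs new sources and data types without schema changes.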
For Great Lakes scientists and natural resource managers, the result is a platform that makes it possible to quickly search and analyze decades of nutrient, contaminant and water characteristic data collected by multiple agencies using a variety of sampling methods. Much of the information available now comes from GLNPO, but USGS, NOAA, Heidelberg University and others have also contributed data from their environmental monitoring programs.
“Access to high-quality, continuous data has historically been a major hurdle to Great Lakes research,” said Brian Miller, director of IISG and IWRC. “What used to take months to retrieve now takes minutes.”
Marini, Collingsworth and the others behind Great Lakes Monitoring also plan to take advantage of the hybrid database to incorporate information from the Lake Erie dissolved oxygen loggers as well as the results of U.S. EPA biological surveys.
“It’s challenging but exciting to find ways to visualize these disparate types of data and sources—to bridge different types of data and make them accessible as a complete data set,” said Marini. “A tool that can do that is unique.”
“With these visualizations worked out,” added Collingsworth, “the growth potential for Great Lakes Monitoring is huge.”