Use Cases Gallery¶

Real-world scenarios demonstrating how AquaScope supports water research, monitoring, and decision-making across diverse contexts.

1. Taiwan River Basin Water Quality Monitoring¶

Problem. Taiwan's Environmental Protection Administration monitors over 300 river stations monthly. Researchers need a reproducible workflow to fetch the latest data, assess quality, detect long-term trends, and identify pollution hotspots — without manually downloading spreadsheets from multiple portals.

AquaScope Workflow

# Collect the latest river water quality data
aquascope collect --source taiwan_moenv --api-key $MOENV_KEY

# Profile the dataset
aquascope eda --file data/raw/taiwan_moenv_latest.json

# Auto-fix quality issues (duplicates, outliers)
aquascope quality --file data/raw/taiwan_moenv_latest.json --fix

# Detect long-term trends with Mann-Kendall
aquascope run --method trend_analysis \
  --file data/clean/taiwan_moenv_latest.json

# Compute River Pollution Index scores
aquascope run --method wqi_calculation \
  --file data/clean/taiwan_moenv_latest.json

Expected Outputs

EDA report: 15 parameters profiled, missing-value heatmap, distributions.
Trend results: statistically significant decreasing BOD at 40 % of stations (p < 0.05), indicating improving water quality.
RPI scores and pollution category per station per month.

2. US Groundwater Contamination Detection¶

Problem. A county health department suspects PFAS contamination in shallow groundwater wells near an industrial site. They need to pull multi-agency data from the Water Quality Portal, flag anomalous readings, and classify contamination severity.

AquaScope Workflow

from aquascope.collectors import WQPCollector
from aquascope.analysis.quality import assess_quality, preprocess
from aquascope.analysis.eda import profile_dataset
from aquascope.ai_engine.recommender import recommend
from aquascope.pipelines.model_builder import run_pipeline
import pandas as pd

# Collect WQP data for the target county
collector = WQPCollector()
records = collector.collect(state="US:36", county="US:36:001")
df = pd.DataFrame([r.model_dump() for r in records])

# Quality check and preprocess
report = assess_quality(df)
df_clean = preprocess(df, steps=report.recommended_steps)

# Profile and get AI recommendations
profile = profile_dataset(df_clean)
profile.research_goal = "Detect PFAS contamination anomalies"
recs = recommend(profile, top_k=3)

# Run the top-recommended pipeline
result = run_pipeline(recs[0].methodology.id, df_clean)
print(result.summary)

Expected Outputs

Quality report highlighting 12 % missing values and 3 duplicate records.
AI recommender suggests: PCA + Clustering, Random Forest classification, Correlation Analysis.
PCA clusters reveal a distinct group of 8 wells with elevated PFOS/PFOA.

3. European River Ecological Status Assessment¶

Problem. An EU-funded research consortium needs to assess ecological status across 170+ countries using GEMStat freshwater data, aligned with the Water Framework Directive's "good ecological status" benchmarks.

AquaScope Workflow

# Collect GEMStat global freshwater quality data
aquascope collect --source gemstat

# Run EDA to understand parameter coverage
aquascope eda --file data/raw/gemstat_latest.json

# Cluster stations by water quality signature
aquascope run --method pca_clustering \
  --file data/raw/gemstat_latest.json

# Correlate parameters to identify pollution drivers
aquascope run --method correlation_analysis \
  --file data/raw/gemstat_latest.json

Expected Outputs

Dataset profile: 50+ parameters across 12,000 stations, 42 countries.
PCA reveals 3 principal components explaining 78 % of variance (organic pollution, nutrient loading, heavy metals).
Correlation matrix identifies strong BOD–COD and TN–TP co-occurrence, pointing to agricultural runoff as a dominant driver.

4. Global Drought Early Warning System¶

Problem. A humanitarian organisation monitors drought indicators across sub-Saharan Africa. They need to combine UN SDG 6 water-stress indicators with USGS streamflow data and Taiwan's reservoir levels to build a multi-source drought severity index.

AquaScope Workflow

from aquascope.collectors import SDG6Collector, USGSCollector
from aquascope.collectors import TaiwanWRACollector
from aquascope.analysis.eda import profile_dataset
from aquascope.ai_engine.recommender import recommend
import pandas as pd

# Collect from three sources
sdg6 = SDG6Collector().collect()
usgs = USGSCollector().collect(site="09380000", days=730)
wra = TaiwanWRACollector().collect()

# Combine into a unified dataframe
frames = []
for records in [sdg6, usgs, wra]:
    frames.append(pd.DataFrame([r.model_dump() for r in records]))
df_all = pd.concat(frames, ignore_index=True)

# Profile and recommend
profile = profile_dataset(df_all)
profile.research_goal = "Drought severity index construction"
recs = recommend(profile, top_k=5)
for r in recs:
    print(f"  {r.rank}. {r.methodology.name} (score: {r.score:.2f})")

Expected Outputs

Merged dataset spanning 3 sources, 2 years, 4 parameter categories.
Top recommendations: ARIMA forecasting (for trend projection), Correlation Analysis (cross-source relationships), PCA + Clustering (drought severity grouping).
ARIMA forecast projects reservoir storage 6 months ahead with RMSE < 5 %.

5. Flood Risk Assessment for Urban Planning¶

Problem. A municipal planning department needs historical flood data to inform zoning decisions. They want to combine USGS gage-height records with Taiwan Civil IoT real-time sensor data to characterise flood frequency and identify high-risk zones.

AquaScope Workflow

# Collect USGS gage height data
aquascope collect --source usgs --days 3650

# Collect Taiwan Civil IoT real-time water level sensors
aquascope collect --source taiwan_civil_iot

# Profile both datasets
aquascope eda --file data/raw/usgs_latest.json
aquascope eda --file data/raw/civil_iot_latest.json

# Run XGBoost regression to predict flood stage
aquascope run --method xgboost_regression \
  --file data/raw/usgs_latest.json \
  --config '{"target": "gage_height"}'

# Trend analysis on peak annual flows
aquascope run --method trend_analysis \
  --file data/raw/usgs_latest.json

Expected Outputs

10-year USGS record: 3,650 daily observations across 5 stations.
Civil IoT real-time snapshot: 200+ sensors, 15-minute intervals.
XGBoost model predicts gage height with R² = 0.91, highlighting upstream precipitation and antecedent soil moisture as top features.
Mann-Kendall trend test shows statistically significant increasing peak flows at 3 of 5 stations, supporting flood risk upgrades in the zoning plan.