Skip to content

AquaScope Architecture

Overview

AquaScope follows a modular, layered architecture designed for extensibility and researcher-friendly workflows.

flowchart TB
    subgraph Interface["User Interface"]
        CLI["CLI<br/>14 commands"]
        API["Python API"]
        DASH["Streamlit Dashboard<br/>7 pages"]
    end

    subgraph Intelligence["Intelligence Layer"]
        AI["AI Engine<br/>26 methodologies"]
        PIPE["Pipelines<br/>7 auto-executable"]
        CHAL["Challenges<br/>flood / drought / WQ"]
    end

    subgraph Science["Scientific Core"]
        HYD["Hydrology<br/>FFA · Baseflow · FDC · Signatures"]
        AGRI["Agriculture<br/>FAO-56 ET₀ · Crop water · SWB"]
        ANA["Analysis<br/>EDA · QA · Copulas · Changepoints"]
        MODEL["Models<br/>Bayesian · Ensembles · LSTM · Transfer"]
        SPATIAL["Spatial<br/>DEM · Watershed · Strahler"]
    end

    subgraph Data["Data Layer"]
        COL["Collectors · 12 sources<br/>Taiwan · USA · Global · FAO"]
        SCHEMA["Unified Pydantic Schemas"]
        IO["Scientific I/O<br/>WaterML · HEC · SWMM · NetCDF · HDF5"]
    end

    subgraph Output["Output"]
        VIZ["Viz · 16 plots + Q-Q/P-P diagnostics"]
        REP["Reports · Markdown + HTML"]
        ALERT["Alerts · WHO · EPA · EU WFD"]
    end

    Interface --> Intelligence
    Intelligence --> Science
    Science --> Data
    Science --> Output
    Data --> SCHEMA

Layer Descriptions

1. Collectors (aquascope/collectors/)

Each collector inherits from BaseCollector and implements two methods: - fetch_raw(**kwargs) -> list[dict] — Fetch raw data from the API - normalise(raw) -> Sequence[WaterQualitySample | ...] — Transform into unified schema

The collect() method (from BaseCollector) chains these together: fetch → normalise.

Key design principle: Collectors are stateless. They receive configuration via constructor arguments (API keys, rate limits) and return pure data. The CachedHTTPClient handles caching and rate limiting transparently.

2. Unified Schemas (aquascope/schemas/)

All data from every source is normalised into one of: - WaterQualitySample — Point measurements (DO, pH, BOD5, etc.) - WaterLevelReading — River/reservoir water levels - ReservoirStatus — Daily reservoir operational data - SDG6Indicator — UN SDG 6 country-level metrics

Every record carries a DataSource enum identifying its origin. This means downstream code never needs to know which API the data came from.

3. Analysis (aquascope/analysis/)

  • EDA module — Auto-profiles any water dataset: statistics, distributions, outliers, correlations, completeness. Outputs an EDAReport that can be pretty-printed or fed to the recommender.
  • Quality module — Assesses data quality (duplicates, missing values, temporal gaps, unit inconsistencies) and provides automated preprocessing (imputation, outlier removal, resampling).

4. Pipelines (aquascope/pipelines/)

Auto-build and run approved methodologies. The pipeline registry maps method IDs to implementations:

from aquascope.pipelines.model_builder import run_pipeline

result = run_pipeline("correlation_analysis", df)
print(result.summary)
print(result.metrics)

Each pipeline returns a PipelineResult with summary text, structured metrics, and optional figure paths.

5. AI Engine (aquascope/ai_engine/)

  • Knowledge Base — 26 research methodologies with metadata: applicable parameters, data requirements, complexity, references, tags.
  • Recommender — Scores each methodology against a DatasetProfile using a multi-criteria rule engine. Optional LLM mode for deeper reasoning via OpenAI-compatible APIs (including local Ollama).

6. CLI (aquascope/cli.py)

Seven commands: collect, recommend, eda, quality, run, list-methods, list-sources. Each wraps the Python API with argument parsing.

Data Flow

1. Collect:   API → Collector.fetch_raw() → Collector.normalise() → Unified records
2. Store:     Records → JSON/CSV file (data/raw/)
3. Analyse:   DataFrame → EDA report + Quality assessment
4. Recommend: DatasetProfile → AI recommender → Ranked methodologies
5. Execute:   DataFrame + method_id → Pipeline → PipelineResult (metrics + figures)

Extending AquaScope