report_pipeline_2026 — Functionality Overview (March 2026)

Apps (`apps/`)

Web applications for interactive data exploration and annotation. All of them run in the conda environment SA_3.13.

Node Inspector (DB) — port 5011

Interactive visualization of time-frequency acoustic data from PostgreSQL.
Browse per-node, per-day cochleograms with perceptual layers (EdB, Fg, Bg) and LAeq overlays.
- Stack: Bokeh, HoloViews, Datashader, PostgreSQL (via sa_fetcher)
- Start: ./start_node_inspector_db_app.sh
- Key files: main.py, config.py, projects.toml, datashaded_images.py, day_analysis_functions.py, data/retrieval.py, data/processing.py, plotting/overlays.py, ui/date_picker.py

Sound Annotator — port 5012

Web app for annotating soundscape audio clips with acoustic feature labels.
Supports multi-timescale annotation, multiple concurrent annotators, project management, and clip filtering.
- Stack: FastAPI, HTMX, Jinja2, Uvicorn
- Start: ./start_sa_annotator.sh
- Key files: main.py, config.py, app/services/annotation_service.py, app/services/project_service.py, app/services/session_service.py, models.py

Template Management — port 5013

Full template-building pipeline: prototype collection → clustering → classification → export.
- Stack: FastAPI, HTMX, Jinja2, HoloViews, NumPy, PostgreSQL
- Start: ./template_management/start_template_management.sh
- Key files: main.py, config.py, services/prototype_service.py, services/template_service.py, services/visualization_service.py, services/project_state.py

TF Rasterizer Browser — port 5015 (default)

Library (not a standalone server) for interactive rasterized time-frequency visualization in the browser. Import it from notebooks or scripts with `from apps.tf_rasterizer_browser import show_tf_data`.
- Stack: Panel, HoloViews, Datashader, Bokeh, Zarr, xarray
- Key file: tf_rasterizer_in_browser.py

SA App Manager

Launcher/manager for the above apps. Reads apps_registry.toml to start/stop/check status.
- Key files: main.py, app_runner.py, config.py

Source Library (`src/`)

Shared Python modules used by apps and notebooks.

`src/sa_config.py` — Unified Configuration

Single entry point for all config. Loads and merges TOML files from config/ into one dict.
- load_config() / get_config() — load all shared config (paths, projects, analysis params, DB)
- get_project(name) — get merged project info (nodes, dates, portal, etc.)
- get_db_config() — get PostgreSQL connection config

`src/sa_data/` — Data Access

- node_data_access.py — fetch data from PostgreSQL:
  - fetch_day_data() — raw day data from DB
  - fetch_node_day_values() — processed day data with scaling and Fg derivation
  - fetch_interval_values() — fetch arbitrary time intervals
  - Includes optimized zstd decompression (monkey-patches sa_preloader)
- data_retrieval.py — DataProcessor class: transforms raw measurement data into an interpretable format (TF layers, 1D layers, Fg derivation, dequantization, timestamp conversion)

`src/node_day_analysis/` — Node-Day Selection

- node_day_selection.py:
  - get_available_node_days() — query DB for all available node-day combinations
  - select_day_node_combinations() — filter by nodes, portals, date range
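
The filtering step could look roughly like the sketch below. The `NodeDay` record, its field names, and the `select_node_days` signature are hypothetical stand-ins; the real select_day_node_combinations() may take different arguments and return a different structure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NodeDay:
    """One available (node, day) combination; field names are illustrative."""
    node: str
    portal: str
    day: date


def select_node_days(available, nodes=None, portals=None, start=None, end=None):
    """Keep entries matching every filter that is given; None means 'no filter'."""
    selected = []
    for item in available:
        if nodes is not None and item.node not in nodes:
            continue
        if portals is not None and item.portal not in portals:
            continue
        if start is not None and item.day < start:
            continue
        if end is not None and item.day > end:  # end date is inclusive
            continue
        selected.append(item)
    return selected
```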

`src/Analysis/` — Acoustic Analysis

- harmonic_analysis.py — ridge-based harmonic analysis:
  - compute_ridge_props() — summarise ridges (frequency, energy, duration, concurrency, salience)
  - find_harmonic_candidates() — generate candidate harmonic complexes from ridges
  - classify_ridges_fast() — classify ridges into harmonic complexes vs. isolated tones
- peak_mask_functions.py — peak/ridge detection in spectrograms:
  - detect_narrow_events() — detect tonal and pulsed events at multiple timescales
  - find_peaks_simple() / find_peaks_above_bg() — column/row peak detection
  - find_ridges_in_peakMask() — trace ridges through peak masks (Numba-accelerated)
  - mask_lenghts(), mov_av() — supporting functions
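
To make the idea of "peaks above background" concrete, here is a minimal pure-Python sketch for a single spectrogram column. It is not the implementation in peak_mask_functions.py: the moving-average background, the `margin_db` threshold, and both function names are assumptions chosen for illustration.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def peaks_above_background(column, window=5, margin_db=6.0):
    """Indices of strict local maxima exceeding the smoothed background by margin_db."""
    background = moving_average(column, window)
    peaks = []
    for i in range(1, len(column) - 1):
        is_local_max = column[i] > column[i - 1] and column[i] > column[i + 1]
        if is_local_max and column[i] - background[i] >= margin_db:
            peaks.append(i)
    return peaks
```

Note that this simple background estimate includes the peak itself, which raises the local background slightly; a median filter or a separately estimated Bg layer would be less sensitive to that.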

`src/Templates/` — Template Pipeline

- prototype_collection.py:
  - sample_random_seconds_from_node_days() — sample random seconds from DB for prototype building
  - sample_random_seconds() — random sampling from a single day's data
  - select_hourly_weighted_prototype_indices() — hourly-weighted prototype selection
- template_pipelines.py:
  - fit_templates() — match data columns to templates via cosine similarity
  - reconstruct_from_templates() — reconstruct layers using best-matching templates + scaling
  - prototype_collection_pipeline() — end-to-end prototype collection from task list
  - template_fitting_pipeline() — end-to-end template fitting across node-days
- template_matching.py:
  - find_best_templates_l1_offset_approx() — fast approximate L1 matching with offset
  - find_best_templates_l1_offset_exact() — exact L1 matching (slower, mathematically exact)
  - match_templates_to_layers() — match templates across multiple layers, compute residuals
- template_analysis.py:
  - build_match_info_contiguous() — segment template match sequences into contiguous runs
  - summarize_match_info() — per-class statistics (duration, longest segment, factor extremes)
  - create_class_match_summary() — aggregate summary across multiple node-days
  - create_aggregated_template_heatmap() — time-windowed template prevalence heatmap
  - template_fit_and_count_loop() — batch fit + count templates across task list
- template_ordering.py:
  - similarity_order_for_templates() — hierarchical clustering order (cosine similarity)
  - prune_templates() — remove templates with too few members
  - create_valid_order() — order: valid → no_data → artifacts
  - reorder_by_grouped_sorting() — sort within groups by prevalence counts
- template_visualization.py:
  - plot_templates() — visualize template matrix with optional prevalence curves
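
Two of the steps listed above, matching columns to templates by cosine similarity and segmenting the resulting match sequence into contiguous runs, can be sketched in a few lines. This is an illustrative pure-Python version with hypothetical function names and return shapes; the real fit_templates() and build_match_info_contiguous() presumably operate on NumPy arrays and return richer information.

```python
import math
from itertools import groupby


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_template_per_column(columns, templates):
    """For each data column, return the index of the most similar template."""
    return [
        max(range(len(templates)), key=lambda k: cosine_similarity(col, templates[k]))
        for col in columns
    ]


def contiguous_runs(matches):
    """Collapse a match sequence into (template_index, start, length) runs."""
    runs, start = [], 0
    for template_index, group in groupby(matches):
        length = len(list(group))
        runs.append((template_index, start, length))
        start += length
    return runs
```

Because cosine similarity ignores overall magnitude, a separate scaling factor is needed to reconstruct levels, which is consistent with the "templates + scaling" wording for reconstruct_from_templates().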

`src/Visualization/` — Visualization Helpers

- im_info_builder.py:
  - create_im_info() — convert values dict into im_info format for tf_rasterizer_browser

Pipeline Package (`sa_pipeline/`)

Installable package for data retrieval and batch downloading.

- data_retriever/ — retrieve soundscape data from PostgreSQL, save to Zarr/pickle
  - retriever.py — DataRetriever class (core logic)
  - cli.py — CLI interface via Typer
- batch_downloader/ — scheduled batch downloading from SA portals
  - scheduler.py — PreloaderScheduler with network resilience
- common/ — shared utilities
  - timezone_utils.py — localize_and_convert_to_local()
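
A timezone helper of this kind typically attaches a source timezone to naive timestamps and then converts them to local time. The sketch below assumes that stored timestamps are naive UTC; that assumption, and the function name `utc_naive_to_local`, are illustrative and not taken from timezone_utils.py.

```python
from datetime import datetime, timezone, tzinfo


def utc_naive_to_local(timestamp: datetime, local_tz: tzinfo) -> datetime:
    """Interpret a naive timestamp as UTC, then express it in local_tz."""
    if timestamp.tzinfo is not None:
        raise ValueError("expected a naive timestamp")
    # Assumption: naive values are stored in UTC.
    return timestamp.replace(tzinfo=timezone.utc).astimezone(local_tz)
```

Passing a `zoneinfo.ZoneInfo` instance such as `ZoneInfo("Europe/Amsterdam")` as `local_tz` makes the conversion follow daylight saving time, whereas a fixed-offset `timezone` does not.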

CLI entry points in scripts/: data_retriever, batch_downloader, data_retriever_from_postgress.py

Configuration (`config/`)

All TOML-based, loaded by src/sa_config.py:
- path_config.toml — filesystem paths (data dirs, workspace, cache, SSD)
- project_config.toml — project/node definitions, date ranges, portal info
- analysis_config.toml — analysis parameters (clustering, template building, layers)
- data_retriever.toml — data retrieval settings
- batch_downloader.toml — batch download jobs and schedule
- acoustic_annotations_default.toml / acoustic_annotations_urban_NL.toml — annotation label definitions

Notebooks (`notebooks/`)

Development and analysis notebooks:
- start_with_me.ipynb — getting started / orientation
- Clip_download+processing.ipynb — audio clip downloading and processing
- day_overview_development.ipynb — day-level analysis development
- Template_development.ipynb — template building development
- template_experiment.ipynb / _v2 / _layered — template matching experiments
- template_finding_and_application.ipynb — end-to-end template workflow
- connect_annotations_to_templates.ipynb — link annotations to template classes
- read_annotations.ipynb — read and explore annotation data
- harmonic_sieve.ipynb / harmonic_tracker.ipynb / sieve.ipynb — harmonic/tonal analysis
- ridge_anaysis_dev.ipynb — ridge analysis development
- Weather_Collection_Demo.ipynb — weather data collection demo

Subprojects (git submodules)

`sa_scheduler/`

SA Projects report runner. Provides DB config (config.toml) and scheduling infrastructure.

`sa_projects_testspace/`

Test workspace for SA project templates and configuration testing.

Data Directories

- zarr_data/ — Zarr-format acoustic data
- zst_data/ — Zstandard-compressed data
- weather_store/ — weather data (parquet: node-to-station index)
- logs/ — application and scheduler logs
- archive/ — archived/old files

Other Files

- pyproject.toml — package config (sa-pipeline v0.1.0)
- toc_src.md — table of contents for src modules
- test_fib_runnable — compiled test binary (Fibonacci, likely a build-system test)
