Sound Annotator

A web-based application for annotating soundscape audio clips with acoustic features, built with Python, FastAPI, and HTMX.

Features

Project Management: Create and manage multiple annotation projects
Acoustic Annotations: Multi-timescale annotation system with categorized labels
Audio Playback: Configurable playback duration with server-side trimming
Visual Analysis: Display clip visualizations and acoustic features
Location Mapping: View node locations on maps
Advanced Filtering: Filter clips by node, date, time, and weekday
Auto-hide Header: Maximizes screen space with mouse-activated navigation
Local-First: All data stored in filesystem (no database required)
HTMX-Powered: Dynamic UI updates without page reloads

Requirements

Python 3.8+
FastAPI
Uvicorn
Additional Python packages: jinja2, python-multipart, tomli, tomli-w, zstandard, numpy
ffmpeg (optional, for audio trimming)

Installation

# Install Python dependencies
pip install fastapi uvicorn jinja2 python-multipart tomli tomli-w zstandard numpy

# (Optional) Install ffmpeg for audio trimming
# macOS:
brew install ffmpeg

# Ubuntu/Debian:
sudo apt-get install ffmpeg

Quick Start

# Start the server
uvicorn main:app --reload --port 8000

Open your browser and navigate to http://localhost:8000

Project Structure

sa_annotator/
├── main.py                              # FastAPI application
├── config.py                            # Project configuration management
├── models.py                            # Data models
├── utils.py                             # Utility functions
├── app/                                 # Application package
│   ├── dependencies.py                  # Dependency injection
│   ├── services/                        # Business logic layer
│   │   ├── annotation_service.py        # Annotation operations
│   │   ├── project_service.py           # Project management
│   │   └── session_service.py           # Session management
│   └── routers/                         # Future: Route handlers
├── annotation_library/                  # Annotation templates
│   ├── acoustic_annotations_default.toml
│   └── acoustic_annotations_urban_NL.toml
├── templates/                           # HTML templates
│   ├── base.html
│   ├── index.html
│   ├── module1_projects.html
│   ├── module3_browser.html
│   ├── clip_detail.html
│   ├── acoustic_annotation.html
│   └── ...
└── README.md

Usage Guide

Creating a Project

Navigate to Annotation Projects
Provide:
- Project name
- Audio directory path (containing .mp3 or .ogg files)
- Features directory path (containing .pkl feature files)
- Optional constraints (TOML format)
Click Create Project

Audio File Naming Convention

Audio files must follow this pattern:

ProjectName_NodeID_DateTime_Duration.mp3

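Example: Gilze_Rijen_62_2026-01-26T04:43:20_60s.mp3 (project Gilze_Rijen, node 62, recorded 2026-01-26 at 04:43:20, 60 seconds long). Note that the project name may itself contain underscores.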
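As a rough illustration of how such a name can be taken apart, here is a minimal sketch. The function name parse_clip_filename and the ClipName type are hypothetical (not part of this codebase), and the sketch assumes the last three underscore-separated fields are always node ID, ISO timestamp, and a duration ending in "s".

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ClipName(NamedTuple):
    """Hypothetical container for the fields encoded in a clip file name."""
    project: str
    node_id: str
    recorded_at: datetime
    duration_s: int


def parse_clip_filename(filename: str) -> ClipName:
    """Split 'ProjectName_NodeID_DateTime_Duration.ext' into its fields."""
    stem = PurePosixPath(filename).stem
    # Split from the right so underscores inside the project name survive.
    project, node_id, timestamp, duration = stem.rsplit("_", 3)
    if not duration.endswith("s"):
        raise ValueError(f"unexpected duration field: {duration!r}")
    return ClipName(
        project=project,
        node_id=node_id,
        recorded_at=datetime.fromisoformat(timestamp),
        duration_s=int(duration[:-1]),
    )
```

Splitting from the right is a design choice for names like Gilze_Rijen, where a left-to-right split would cut the project name in two.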

Browser Interface

The browser page is divided into two main areas:

Left Sidebar (1/5 width):
- Map: Node location visualization
- Player: Audio playback with clip information and navigation
- Filters: Filter clips by node, date, time, weekday (collapsible)
- Available Clips: Filtered clip list with count (collapsible)

Main Area (4/5 width):
- Clip visualization image (if available)
- Acoustic annotations interface
- Acoustic features (collapsible)

Acoustic Annotations System

Annotations are organized by timescale and category:

Timescales:
- < 1 Second: Short, discrete sounds (e.g., footsteps, clicks)
- < 1 Minute: Brief events (e.g., conversation, door activity)
- > 1 Minute: Extended events (e.g., traffic patterns, wind)
- Other Descriptors: Continuous characteristics (e.g., urban ambience)

Categories (default):
- Human
- Mechanical
- Natural
- Transport

Multi-Session Annotation

Multiple annotators can work on the same project simultaneously:

Set Session ID: On the Projects page, enter a unique session ID (e.g., your name)
Continue Session: Select from existing sessions to resume previous work
Independent Annotations: Each session's annotations are stored separately
Session Indicator: Browser page shows current active session

All annotations are tagged with the session ID, allowing multiple perspectives on the same clips.

Annotating Clips

Select labels: Click available labels to add them (they turn green with ✓)
Deselect labels: Click selected labels to remove them
Add custom labels:
- Expand "Add Annotation" in the left sidebar
- Type the label name
- Select a timescale (< 1 Second, < 1 Minute, etc.)
- Select a category (Human, Mechanical, Natural, Transport)
- Click "Add & Select"
- The new label is automatically selected and saved to the project config

Navigation

Click clips in the sidebar list to load them
Use Previous/Next buttons in the player section
Navigation respects active filters
First clip loads automatically on page open

Filtering

Apply filters to narrow down clips:
- Node ID: Specific node number
- Date From/To: Date range
- Time From/To: Time of day range
- Weekday: Specific day of week

Click Apply to update the clip list. The clip count updates automatically.
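The filter semantics can be sketched as follows. This is an illustration only: ClipFilter is a hypothetical name, bounds are assumed to be inclusive, unset fields are assumed to match everything, and weekdays are assumed to follow Python's convention (0 = Monday). The application's actual rules may differ.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Optional


@dataclass
class ClipFilter:
    """Hypothetical filter; a field left as None does not restrict the result."""
    node_id: Optional[str] = None
    date_from: Optional[date] = None
    date_to: Optional[date] = None
    time_from: Optional[time] = None
    time_to: Optional[time] = None
    weekday: Optional[int] = None  # 0 = Monday ... 6 = Sunday

    def matches(self, node_id: str, recorded_at: datetime) -> bool:
        """Return True when a clip passes every filter that is set."""
        if self.node_id is not None and node_id != self.node_id:
            return False
        if self.date_from is not None and recorded_at.date() < self.date_from:
            return False
        if self.date_to is not None and recorded_at.date() > self.date_to:
            return False
        if self.time_from is not None and recorded_at.time() < self.time_from:
            return False
        if self.time_to is not None and recorded_at.time() > self.time_to:
            return False
        if self.weekday is not None and recorded_at.weekday() != self.weekday:
            return False
        return True
```

For example, ClipFilter(node_id="62", time_to=time(6, 0)) would keep only early-morning clips from node 62.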

Auto-hide Header

The top navigation bar auto-hides to maximize screen space:
- Move mouse to top of screen to reveal
- Header stays visible while hovering
- Hides automatically when mouse moves away

Data Storage

Project Directory Structure

project_directory/
├── project.toml                    # Project configuration
├── acoustic_annotations.toml       # Project-specific labels (optional)
├── acoustic_annotations.ndjson     # Annotation data
└── clips.csv                       # Clip metadata

Acoustic Annotation Format (NDJSON)

Annotations are stored as one JSON object per line. The record below is pretty-printed for readability; in the file it occupies a single line:

{
  "clip_id": "Gilze_Rijen_62_2026-01-26T04:43:20_60s",
  "session_id": "alice",
  "second": [
    {"label": "cough", "category": "human"},
    {"label": "footsteps", "category": "human"}
  ],
  "minute": [
    {"label": "conversation", "category": "human"},
    {"label": "door activity", "category": "human"}
  ],
  "hour": [
    {"label": "traffic hum", "category": "transport"}
  ],
  "other": [
    {"label": "urban ambience", "category": "human"}
  ]
}

Each annotation includes:
- clip_id: Unique clip identifier
- session_id: Annotator session identifier (for multi-user workflows)
- second, minute, hour, other: Arrays of label objects, one array per timescale (< 1 Second, < 1 Minute, > 1 Minute, Other Descriptors)
- Each label: {"label": "...", "category": "..."}
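A minimal sketch of writing and reading this format with the standard library is shown here. The helper names (dump_record, iter_records, labels_for_session) are hypothetical, and the sketch assumes that a later record for the same clip and session replaces an earlier one, which this README does not state.

```python
import json
from typing import Dict, Iterable, Iterator, List

TIMESCALES = ("second", "minute", "hour", "other")


def dump_record(record: dict) -> str:
    """Serialize one annotation record as a single NDJSON line (no newline)."""
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def labels_for_session(lines: Iterable[str], session_id: str) -> Dict[str, List[str]]:
    """Map clip_id to a flat list of labels annotated in one session."""
    result: Dict[str, List[str]] = {}
    for record in iter_records(lines):
        if record.get("session_id") != session_id:
            continue
        # Assumption: a later record for the same clip replaces the earlier one.
        result[record["clip_id"]] = [
            item["label"]
            for scale in TIMESCALES
            for item in record.get(scale, [])
        ]
    return result
```

To read a real file, pass the open file object as lines, for example labels_for_session(open(path, encoding="utf-8"), "alice").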

Project Configuration (TOML)

project.toml:

[metadata]
name = "my_project"
created = "2026-01-26T10:00:00Z"

[paths]
audio_dir = "/path/to/audio"
features_dir = "/path/to/features"

playback_duration = 30

Acoustic Annotations Configuration

acoustic_annotations_default.toml defines the default label set:

[second.human]
labels = ["footsteps", "handclap", "cough", "laugh", "scream", "whistle"]

[second.mechanical]
labels = ["click", "knock", "hammer", "drill burst", "door slam"]

[second.natural]
labels = ["birdsong", "bird call", "insect chirp", "leaf rustle"]

[second.transport]
labels = ["car horn", "bicycle bell", "brake squeal", "tire screech"]

# ... more timescales (minute, hour, other)

Each project can have its own acoustic_annotations.toml that extends or overrides the defaults.
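The exact merge rule is not spelled out here. The sketch below shows one plausible reading, in which project labels are appended to the defaults for the same timescale and category, duplicates are dropped, and new categories are added. The function name merge_label_configs is hypothetical, and the inputs are assumed to be the dictionaries produced by parsing the two TOML files.

```python
from typing import Dict, List

# timescale -> category -> {"labels": [...]}
LabelConfig = Dict[str, Dict[str, Dict[str, List[str]]]]


def merge_label_configs(defaults: LabelConfig, project: LabelConfig) -> LabelConfig:
    """Combine default and project label sets without mutating either input."""
    merged: LabelConfig = {}
    for source in (defaults, project):
        for timescale, categories in source.items():
            for category, entry in categories.items():
                labels = (
                    merged.setdefault(timescale, {})
                    .setdefault(category, {"labels": []})["labels"]
                )
                for label in entry.get("labels", []):
                    if label not in labels:  # keep first occurrence only
                        labels.append(label)
    return merged
```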

Customization

Adding Custom Categories

Edit acoustic_annotations_default.toml or create a project-specific acoustic_annotations.toml:

[second.custom_category]
labels = ["label1", "label2", "label3"]

The system automatically discovers all categories in the configuration.
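Category discovery of this kind can be sketched by walking the parsed configuration. The function name discover_categories is hypothetical and the sample config is abbreviated. The import falls back to the tomli backport listed in Requirements when the standard-library tomllib (Python 3.11+) is unavailable.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

CONFIG = """
[second.human]
labels = ["footsteps", "cough"]

[second.custom_category]
labels = ["label1", "label2", "label3"]

[minute.human]
labels = ["conversation"]
"""


def discover_categories(config: dict) -> list:
    """Return every category name found under any timescale, in first-seen order."""
    seen = []
    for categories in config.values():
        if not isinstance(categories, dict):
            continue  # skip top-level scalars, if any
        for name in categories:
            if name not in seen:
                seen.append(name)
    return seen


categories = discover_categories(tomllib.loads(CONFIG))
```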

Adding Custom Timescales

Add new timescale sections:

[microsecond]
[microsecond.human]
labels = [...]

Playback Duration

Configure in project settings (stored as playback_duration in project.toml):
- Default: 30 seconds
- Audio trimmed server-side using ffmpeg
- Reduces bandwidth and loading time
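Server-side trimming could look roughly like the sketch below. The function names are hypothetical, and the choice of flags (-t to limit duration, -c copy to avoid re-encoding) is an assumption; this README does not say which ffmpeg options the application actually passes.

```python
import subprocess
from pathlib import Path
from typing import List


def build_trim_command(src: Path, dst: Path, seconds: int = 30) -> List[str]:
    """Build an ffmpeg command that keeps the first `seconds` of `src`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return [
        "ffmpeg",
        "-y",                # overwrite dst if it already exists
        "-i", str(src),      # input file
        "-t", str(seconds),  # limit output duration
        "-c", "copy",        # copy the audio stream without re-encoding
        str(dst),
    ]


def trim_clip(src: Path, dst: Path, seconds: int = 30) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_trim_command(src, dst, seconds), check=True)
```

Keeping command construction separate from execution makes the flags easy to inspect without ffmpeg installed.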

Technical Details

Technology Stack

Backend: FastAPI (Python)
Frontend: HTMX + Vanilla JavaScript
Templates: Jinja2
Styling: Inline CSS (no framework)
Audio: HTML5 Audio API
Configuration: TOML

Architecture

Service Layer: Clean separation of business logic
- AnnotationService: All annotation operations
- ProjectService: Project and clip management
- SessionService: Multi-user session handling
Dependency Injection: FastAPI Depends() for state management
MVC Pattern: Models, utilities, templates clearly separated

Key Features

HTMX: Dynamic updates without page reloads
Out-of-band swaps: Update multiple page sections simultaneously
Sticky sidebar: Keeps navigation accessible while scrolling
Responsive layout: Grid-based (1:4 ratio)
Collapsible sections: Filters and clips list
Dynamic categories: Reads from configuration files
Template System: Multiple annotation templates (default, urban_NL, etc.)
Multi-session: Multiple annotators can work on same project

Development

Running in Development Mode

uvicorn main:app --reload --port 8000

The --reload flag enables auto-restart on code changes.

Adding New Features

Add business logic to appropriate service in app/services/
Create template in templates/
Add route in main.py (use dependency injection)
Update navigation in base.html

Code Architecture Guide

For detailed information about the refactored architecture, see:
- REFACTORING_PHASE1.md - Complete guide to service layer and migration patterns

Best practices:
- Use services for business logic (not direct utils calls)
- Use dependency injection (Depends()) for project manager and session
- Keep routes thin - logic belongs in services
- Services are testable - they don't depend on FastAPI
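To illustrate the last point, here is a minimal sketch of a service that can be exercised without FastAPI. LabelToggleService is a hypothetical stand-in with in-memory storage; the real AnnotationService interface and persistence are not shown in this README and may differ.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Label = Tuple[str, str, str]  # (timescale, category, label)


class LabelToggleService:
    """Hypothetical in-memory service with no FastAPI imports."""

    def __init__(self) -> None:
        # (session_id, clip_id) -> selected labels
        self._selected: Dict[Tuple[str, str], Set[Label]] = defaultdict(set)

    def toggle(self, session_id: str, clip_id: str, label: Label) -> bool:
        """Select the label if absent, deselect it if present; return the new state."""
        current = self._selected[(session_id, clip_id)]
        if label in current:
            current.remove(label)
            return False
        current.add(label)
        return True

    def selected(self, session_id: str, clip_id: str) -> Set[Label]:
        """Return a copy of the labels selected for one clip in one session."""
        return set(self._selected.get((session_id, clip_id), set()))
```

A route would then only translate the request into a toggle() call and render the result, which keeps it thin.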

Troubleshooting

Audio won't play

Check that the audio files exist in the configured audio directory
Verify that file names follow the pattern described under Audio File Naming Convention
Check browser console for errors

Clips don't load

Verify that clips.csv exists in the project directory
Check that constraint filters aren't too restrictive
Review server logs for errors

Map doesn't show

Map images should be in maps/ directory (relative to audio directory)
Format: map_node_{NodeID}.png
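A quick way to check which map images are missing is sketched below. The helper names are hypothetical, and the sketch assumes that "relative to audio directory" means a maps/ folder directly inside the audio directory.

```python
from pathlib import Path
from typing import Iterable, List, Union


def map_image_path(audio_dir: Union[str, Path], node_id: Union[int, str]) -> Path:
    """Expected location of the map image for one node."""
    return Path(audio_dir) / "maps" / f"map_node_{node_id}.png"


def missing_maps(audio_dir: Union[str, Path], node_ids: Iterable[Union[int, str]]) -> List:
    """Return the node IDs whose map image does not exist on disk."""
    return [n for n in node_ids if not map_image_path(audio_dir, n).is_file()]
```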

License

[Specify your license]