Sound Annotator

README.md · last modified 2026-03-18 11:36

A web-based application for annotating soundscape audio clips with acoustic features, built with Python, FastAPI, and HTMX.

Features

Requirements

Installation

# Install Python dependencies
pip install fastapi uvicorn jinja2 python-multipart tomli tomli-w

# (Optional) Install ffmpeg for audio trimming
# macOS:
brew install ffmpeg

# Ubuntu/Debian:
sudo apt-get install ffmpeg

Quick Start

# Start the server
uvicorn main:app --reload --port 8000

Open your browser and navigate to http://localhost:8000

Project Structure

sa_annotator/
├── main.py                              # FastAPI application
├── config.py                            # Project configuration management
├── models.py                            # Data models
├── utils.py                             # Utility functions
├── app/                                 # Application package
│   ├── dependencies.py                  # Dependency injection
│   ├── services/                        # Business logic layer
│   │   ├── annotation_service.py        # Annotation operations
│   │   ├── project_service.py           # Project management
│   │   └── session_service.py           # Session management
│   └── routers/                         # Future: Route handlers
├── annotation_library/                  # Annotation templates
│   ├── acoustic_annotations_default.toml
│   └── acoustic_annotations_urban_NL.toml
├── templates/                           # HTML templates
│   ├── base.html
│   ├── index.html
│   ├── module1_projects.html
│   ├── module3_browser.html
│   ├── clip_detail.html
│   ├── acoustic_annotation.html
│   └── ...
└── README.md

Usage Guide

Creating a Project

  1. Navigate to Annotation Projects
  2. Provide:
     - Project name
     - Audio directory path (containing .mp3 or .ogg files)
     - Features directory path (containing .pkl feature files)
     - Optional constraints (TOML format)
  3. Click Create Project

Audio File Naming Convention

Audio files must follow this pattern:

ProjectName_NodeID_DateTime_Duration.mp3

Example: Gilze_Rijen_62_2026-01-26T04:43:20_60s.mp3
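
The naming convention can be parsed mechanically. The sketch below is illustrative only: the ClipName class and parse_clip_filename function are hypothetical names, not part of the application. It assumes the last three underscore-separated fields are always node ID, ISO timestamp, and duration in seconds, so it splits from the right and leaves any underscores in the project name (as in Gilze_Rijen) intact.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ClipName:
    """Hypothetical container for the four filename fields."""
    project: str
    node_id: str
    recorded_at: datetime
    duration_s: int


def parse_clip_filename(filename: str) -> ClipName:
    """Split a clip filename into its four underscore-separated fields."""
    stem = PurePosixPath(filename).stem  # drop the extension
    # Split from the right so underscores inside the project name survive.
    parts = stem.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected clip filename: {filename!r}")
    project, node_id, timestamp, duration = parts
    if not duration.endswith("s"):
        raise ValueError(f"duration should look like '60s': {duration!r}")
    return ClipName(
        project=project,
        node_id=node_id,
        recorded_at=datetime.fromisoformat(timestamp),
        duration_s=int(duration[:-1]),
    )


clip = parse_clip_filename("Gilze_Rijen_62_2026-01-26T04:43:20_60s.mp3")
print(clip.project, clip.node_id, clip.recorded_at.isoformat(), clip.duration_s)
```

The node ID is kept as a string here because the README does not say whether IDs are always numeric.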

Browser Interface

The browser page is divided into two main areas:

Left Sidebar (1/5 width):
- Map: Node location visualization
- Player: Audio playback with clip information and navigation
- Filters: Filter clips by node, date, time, weekday (collapsible)
- Available Clips: Filtered clip list with count (collapsible)

Main Area (4/5 width):
- Clip visualization image (if available)
- Acoustic annotations interface
- Acoustic features (collapsible)

Acoustic Annotations System

Annotations are organized by timescale and category:

Timescales:
- < 1 Second: Short, discrete sounds (e.g., footsteps, clicks)
- < 1 Minute: Brief events (e.g., conversation, door activity)
- > 1 Minute: Extended events (e.g., traffic patterns, wind)
- Other Descriptors: Continuous characteristics (e.g., urban ambience)

Categories (default):
- Human
- Mechanical
- Natural
- Transport

Multi-Session Annotation

Multiple annotators can work on the same project simultaneously:

  1. Set Session ID: On the Projects page, enter a unique session ID (e.g., your name)
  2. Continue Session: Select from existing sessions to resume previous work
  3. Independent Annotations: Each session's annotations are stored separately
  4. Session Indicator: Browser page shows current active session

All annotations are tagged with the session ID, allowing multiple perspectives on the same clips.

Annotating Clips

  1. Select labels: Click available labels to add them (they turn green with ✓)
  2. Deselect labels: Click selected labels to remove them
  3. Add custom labels:
     - Expand "Add Annotation" in left sidebar
     - Type label name
     - Select timescale (< 1 Second, < 1 Minute, etc.)
     - Select category (Human, Mechanical, Natural, Transport)
     - Click "Add & Select"
     - New label is automatically selected and saved to project config

Navigation

Filtering

Apply filters to narrow down clips:
- Node ID: Specific node number
- Date From/To: Date range
- Time From/To: Time of day range
- Weekday: Specific day of week

Click Apply to update the clip list. The clip count updates automatically.
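
As a rough illustration of how these filters combine, the sketch below treats every filter as optional and keeps a clip only if it passes all the filters that are set. The Clip class and filter_clips function are hypothetical and are not taken from the application's code. The sketch also assumes that time ranges do not wrap past midnight.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Clip:
    """Hypothetical clip record with just the fields the filters need."""
    clip_id: str
    node_id: str
    recorded_at: datetime


def filter_clips(
    clips: Iterable[Clip],
    node_id: Optional[str] = None,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    time_from: Optional[time] = None,
    time_to: Optional[time] = None,
    weekday: Optional[int] = None,  # 0 = Monday ... 6 = Sunday
) -> List[Clip]:
    """Return clips matching every filter that is set (None means 'any')."""
    selected = []
    for clip in clips:
        when = clip.recorded_at
        if node_id is not None and clip.node_id != node_id:
            continue
        if date_from is not None and when.date() < date_from:
            continue
        if date_to is not None and when.date() > date_to:
            continue
        if time_from is not None and when.time() < time_from:
            continue
        if time_to is not None and when.time() > time_to:
            continue
        if weekday is not None and when.weekday() != weekday:
            continue
        selected.append(clip)
    return selected


clips = [
    Clip("a", "62", datetime(2026, 1, 26, 4, 43, 20)),  # Monday, early morning
    Clip("b", "62", datetime(2026, 1, 27, 14, 0, 0)),   # Tuesday, afternoon
    Clip("c", "17", datetime(2026, 1, 26, 9, 30, 0)),   # Monday, morning
]
mondays_at_62 = filter_clips(clips, node_id="62", weekday=0)
daytime = filter_clips(clips, time_from=time(9, 0), time_to=time(15, 0))
print([c.clip_id for c in mondays_at_62], [c.clip_id for c in daytime])
```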

Auto-hide Header

The top navigation bar auto-hides to maximize screen space:
- Move the mouse to the top of the screen to reveal it
- The header stays visible while you hover over it
- It hides automatically when the mouse moves away

Data Storage

Project Directory Structure

project_directory/
├── project.toml                    # Project configuration
├── acoustic_annotations.toml       # Project-specific labels (optional)
├── acoustic_annotations.ndjson     # Annotation data
└── clips.csv                       # Clip metadata

Acoustic Annotation Format (NDJSON)

Annotations are stored as one JSON object per line. The record below is pretty-printed for readability; in the file it occupies a single line:

{
  "clip_id": "Gilze_Rijen_62_2026-01-26T04:43:20_60s",
  "session_id": "alice",
  "second": [
    {"label": "cough", "category": "human"},
    {"label": "footsteps", "category": "human"}
  ],
  "minute": [
    {"label": "conversation", "category": "human"},
    {"label": "door activity", "category": "human"}
  ],
  "hour": [
    {"label": "traffic hum", "category": "transport"}
  ],
  "other": [
    {"label": "urban ambience", "category": "human"}
  ]
}

Each annotation includes:
- clip_id: Unique clip identifier
- session_id: Annotator session identifier (for multi-user workflows)
- second, minute, hour, other: Arrays of label objects
- Each label: {"label": "...", "category": "..."}
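
A minimal sketch of writing and reading this format follows. It uses only the Python standard library. The helper names append_annotation and read_annotations are hypothetical, as is the second session "bob", and the application's own service layer may handle the file differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator, Optional


def append_annotation(path: Path, record: Dict) -> None:
    """Append one annotation as a single JSON line."""
    line = json.dumps(record, ensure_ascii=False)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def read_annotations(path: Path, session_id: Optional[str] = None) -> Iterator[Dict]:
    """Yield annotation records, optionally restricted to one session."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if session_id is None or record.get("session_id") == session_id:
                yield record


with tempfile.TemporaryDirectory() as tmp:
    ndjson_path = Path(tmp) / "acoustic_annotations.ndjson"
    for session in ("alice", "bob"):
        append_annotation(ndjson_path, {
            "clip_id": "Gilze_Rijen_62_2026-01-26T04:43:20_60s",
            "session_id": session,
            "second": [{"label": "cough", "category": "human"}],
            "minute": [],
            "hour": [],
            "other": [],
        })
    line_count = len(ndjson_path.read_text(encoding="utf-8").splitlines())
    alice_records = list(read_annotations(ndjson_path, session_id="alice"))

print(line_count, len(alice_records))
```

Filtering on session_id at read time is one way to keep each annotator's work separate while sharing a single file.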

Project Configuration (TOML)

project.toml:

[metadata]
name = "my_project"
created = "2026-01-26T10:00:00Z"

[paths]
audio_dir = "/path/to/audio"
features_dir = "/path/to/features"

playback_duration = 30

Acoustic Annotations Configuration

acoustic_annotations_default.toml defines the default label set:

[second.human]
labels = ["footsteps", "handclap", "cough", "laugh", "scream", "whistle"]

[second.mechanical]
labels = ["click", "knock", "hammer", "drill burst", "door slam"]

[second.natural]
labels = ["birdsong", "bird call", "insect chirp", "leaf rustle"]

[second.transport]
labels = ["car horn", "bicycle bell", "brake squeal", "tire screech"]

# ... more timescales (minute, hour, other)

Each project can have its own acoustic_annotations.toml file that extends or overrides the defaults.
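
This README does not spell out the exact merge rule, so the following is only one plausible interpretation: project labels are added to the defaults per timescale and category, without duplicates, and new categories or timescales are created as needed. The function name merge_label_configs and the label "sneeze" are hypothetical. The dictionaries have the shape a TOML parser would return for the files above.

```python
from typing import Dict, List

LabelConfig = Dict[str, Dict[str, Dict[str, List[str]]]]


def merge_label_configs(defaults: LabelConfig, project: LabelConfig) -> LabelConfig:
    """Union project labels into a copy of the defaults, keeping first-seen order."""
    merged: LabelConfig = {
        timescale: {cat: {"labels": list(entry["labels"])} for cat, entry in cats.items()}
        for timescale, cats in defaults.items()
    }
    for timescale, cats in project.items():
        target = merged.setdefault(timescale, {})
        for cat, entry in cats.items():
            labels = target.setdefault(cat, {"labels": []})["labels"]
            for label in entry.get("labels", []):
                if label not in labels:
                    labels.append(label)
    return merged


defaults = {"second": {"human": {"labels": ["footsteps", "cough"]}}}
project = {
    "second": {
        "human": {"labels": ["cough", "sneeze"]},          # "sneeze" is a made-up label
        "custom_category": {"labels": ["label1"]},
    }
}
merged = merge_label_configs(defaults, project)
print(merged["second"]["human"]["labels"])
```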

Customization

Adding Custom Categories

Edit acoustic_annotations_default.toml or create a project-specific config:

[second.custom_category]
labels = ["label1", "label2", "label3"]

The system automatically discovers all categories in the configuration.
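
Category discovery of this kind could look roughly like the sketch below. It assumes the timescale.category table layout shown above. The function name discover_categories is hypothetical. tomllib is in the standard library from Python 3.11, and the tomli package from the installation step offers the same loads function on older versions.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older interpreters: pip install tomli
    import tomli as tomllib

CONFIG_TEXT = """
[second.human]
labels = ["footsteps", "cough"]

[second.custom_category]
labels = ["label1", "label2", "label3"]

[minute.human]
labels = ["conversation", "door activity"]
"""


def discover_categories(config: dict) -> list:
    """Collect every category name that appears under any timescale."""
    found = []
    for categories in config.values():
        if not isinstance(categories, dict):
            continue  # skip stray top-level scalars
        for name in categories:
            if name not in found:
                found.append(name)
    return found


config = tomllib.loads(CONFIG_TEXT)
categories = discover_categories(config)
print(categories)
```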

Adding Custom Timescales

Add new timescale sections:

[microsecond]
[microsecond.human]
labels = [...]

Playback Duration

Configure in project settings:
- Default: 30 seconds
- Audio trimmed server-side using ffmpeg
- Reduces bandwidth and loading time
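
Server-side trimming can be sketched as a thin wrapper around the ffmpeg command line. The function names below are hypothetical. Stream copy (-c copy) is an assumption chosen to avoid re-encoding, and falling back to the untrimmed file when ffmpeg is missing is a guess at sensible behaviour given that ffmpeg is listed as optional.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_trim_command(source: Path, target: Path, seconds: int = 30) -> List[str]:
    """Build an ffmpeg command that keeps only the first `seconds` of audio."""
    return [
        "ffmpeg",
        "-y",                # overwrite the target if it already exists
        "-i", str(source),   # input clip
        "-t", str(seconds),  # stop writing after this many seconds
        "-c", "copy",        # copy the stream instead of re-encoding
        str(target),
    ]


def trim_clip(source: Path, target: Path, seconds: int = 30) -> Path:
    """Trim with ffmpeg when available; otherwise return the full clip."""
    if shutil.which("ffmpeg") is None:
        return source  # assumption: serve the untrimmed file without ffmpeg
    subprocess.run(
        build_trim_command(source, target, seconds),
        check=True,
        capture_output=True,
    )
    return target


command = build_trim_command(Path("clip.mp3"), Path("clip_30s.mp3"))
print(" ".join(command))
```

Building the command as a list rather than a shell string avoids quoting problems with the colons in clip filenames.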

Technical Details

Technology Stack

Architecture

Key Features

Development

Running in Development Mode

uvicorn main:app --reload --port 8000

The --reload flag enables auto-restart on code changes.

Adding New Features

  1. Add business logic to appropriate service in app/services/
  2. Create template in templates/
  3. Add route in main.py (use dependency injection)
  4. Update navigation in base.html

Code Architecture Guide

For detailed information about the refactored architecture, see:
- REFACTORING_PHASE1.md - Complete guide to service layer and migration patterns

Best practices:
- Use services for business logic (not direct utils calls)
- Use dependency injection (Depends()) for project manager and session
- Keep routes thin - logic belongs in services
- Services are testable - they don't depend on FastAPI
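
To illustrate the last point, here is a hypothetical service written as plain Python with no FastAPI imports, exercised directly with assertions. The class and method names are invented for this sketch and do not reflect the contents of app/services/. A route in main.py would stay thin by obtaining such a service through Depends() and calling a single method.

```python
from typing import Dict, List, Tuple


class InMemoryAnnotationService:
    """Hypothetical service: toggles labels for a (clip, session) pair."""

    def __init__(self) -> None:
        self._selected: Dict[Tuple[str, str], List[str]] = {}

    def toggle_label(self, clip_id: str, session_id: str, label: str) -> List[str]:
        """Select the label if absent, deselect it if present."""
        labels = self._selected.setdefault((clip_id, session_id), [])
        if label in labels:
            labels.remove(label)
        else:
            labels.append(label)
        return list(labels)  # return a copy so callers cannot mutate state


service = InMemoryAnnotationService()
after_select = service.toggle_label("clip-1", "alice", "cough")
after_second = service.toggle_label("clip-1", "alice", "footsteps")
after_deselect = service.toggle_label("clip-1", "alice", "cough")
other_session = service.toggle_label("clip-1", "bob", "cough")
print(after_deselect, other_session)
```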

Troubleshooting

Audio won't play

Clips don't load

Map doesn't show

License

[Specify your license]