README.md · last modified 2026-03-18 11:36
A web-based application for annotating soundscape audio clips with acoustic features, built with Python, FastAPI, and HTMX.
# Install Python dependencies
pip install fastapi uvicorn jinja2 python-multipart tomli tomli-w
# (Optional) Install ffmpeg for audio trimming
# macOS:
brew install ffmpeg
# Ubuntu/Debian:
sudo apt-get install ffmpeg
# Start the server
uvicorn main:app --reload --port 8000
Open your browser and navigate to http://localhost:8000
sa_annotator/
├── main.py # FastAPI application
├── config.py # Project configuration management
├── models.py # Data models
├── utils.py # Utility functions
├── app/ # Application package
│ ├── dependencies.py # Dependency injection
│ ├── services/ # Business logic layer
│ │ ├── annotation_service.py # Annotation operations
│ │ ├── project_service.py # Project management
│ │ └── session_service.py # Session management
│ └── routers/ # Future: Route handlers
├── annotation_library/ # Annotation templates
│ ├── acoustic_annotations_default.toml
│ └── acoustic_annotations_urban_NL.toml
├── templates/ # HTML templates
│ ├── base.html
│ ├── index.html
│ ├── module1_projects.html
│ ├── module3_browser.html
│ ├── clip_detail.html
│ ├── acoustic_annotation.html
│ └── ...
└── README.md
Audio files must follow this pattern:
ProjectName_NodeID_DateTime_Duration.mp3
Example: Gilze_Rijen_62_2026-01-26T04:43:20_60s.mp3
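The naming pattern above can be checked and split into its parts with a regular expression. This is a minimal sketch, not the application's own parser: `CLIP_NAME_RE`, `ClipInfo`, and `parse_clip_filename` are hypothetical names, and it assumes the node ID is an integer, the timestamp is ISO 8601 without a timezone, and the duration is whole seconds followed by `s`.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout: ProjectName_NodeID_DateTime_Duration.mp3
# ProjectName may itself contain underscores (e.g. "Gilze_Rijen").
CLIP_NAME_RE = re.compile(
    r"^(?P<project>.+)_(?P<node_id>\d+)_"
    r"(?P<recorded_at>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})_"
    r"(?P<duration>\d+)s\.mp3$"
)


class ClipInfo(NamedTuple):
    project: str
    node_id: int
    recorded_at: datetime
    duration_s: int


def parse_clip_filename(filename: str) -> Optional[ClipInfo]:
    """Return the parsed parts, or None if the name does not fit the pattern."""
    match = CLIP_NAME_RE.match(filename)
    if match is None:
        return None
    try:
        recorded_at = datetime.fromisoformat(match["recorded_at"])
    except ValueError:
        # Digits in the right places but not a real date/time (e.g. month 13).
        return None
    return ClipInfo(
        project=match["project"],
        node_id=int(match["node_id"]),
        recorded_at=recorded_at,
        duration_s=int(match["duration"]),
    )


print(parse_clip_filename("Gilze_Rijen_62_2026-01-26T04:43:20_60s.mp3"))
```

Returning `None` for names that do not fit lets a caller skip stray files in the audio directory instead of failing on them.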
The browser page is divided into two main areas:
Left Sidebar (1/5 width):
- Map: Node location visualization
- Player: Audio playback with clip information and navigation
- Filters: Filter clips by node, date, time, weekday (collapsible)
- Available Clips: Filtered clip list with count (collapsible)
Main Area (4/5 width):
- Clip visualization image (if available)
- Acoustic annotations interface
- Acoustic features (collapsible)
Annotations are organized by timescale and category:
Timescales:
- < 1 Second: Short, discrete sounds (e.g., footsteps, clicks)
- < 1 Minute: Brief events (e.g., conversation, door activity)
- > 1 Minute: Extended events (e.g., traffic patterns, wind)
- Other Descriptors: Continuous characteristics (e.g., urban ambience)
Categories (default):
- Human
- Mechanical
- Natural
- Transport
Multiple annotators can work on the same project simultaneously.
All annotations are tagged with the session ID, allowing multiple perspectives on the same clips.
Apply filters to narrow down clips:
- Node ID: Specific node number
- Date From/To: Date range
- Time From/To: Time of day range
- Weekday: Specific day of week
Click Apply to update the clip list. The clip count updates automatically.
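The filter logic can be pictured as a set of optional criteria that must all hold for a clip to stay in the list. The sketch below is illustrative only: `Clip`, `ClipFilter`, `matches`, and `apply_filters` are hypothetical names, ranges are assumed to be inclusive, and a time range is assumed not to wrap past midnight.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Iterable, List, Optional


@dataclass
class Clip:
    clip_id: str
    node_id: int
    recorded_at: datetime


@dataclass
class ClipFilter:
    # None means "do not filter on this field".
    node_id: Optional[int] = None
    date_from: Optional[date] = None
    date_to: Optional[date] = None
    time_from: Optional[time] = None
    time_to: Optional[time] = None
    weekday: Optional[int] = None  # 0 = Monday ... 6 = Sunday


def matches(clip: Clip, f: ClipFilter) -> bool:
    """True when the clip satisfies every criterion that is set."""
    ts = clip.recorded_at
    checks = [
        f.node_id is None or clip.node_id == f.node_id,
        f.date_from is None or ts.date() >= f.date_from,
        f.date_to is None or ts.date() <= f.date_to,
        f.time_from is None or ts.time() >= f.time_from,
        f.time_to is None or ts.time() <= f.time_to,
        f.weekday is None or ts.weekday() == f.weekday,
    ]
    return all(checks)


def apply_filters(clips: Iterable[Clip], f: ClipFilter) -> List[Clip]:
    return [clip for clip in clips if matches(clip, f)]


clips = [
    Clip("a", 62, datetime(2026, 1, 26, 4, 43, 20)),
    Clip("b", 62, datetime(2026, 1, 27, 14, 0, 0)),
]
filtered = apply_filters(clips, ClipFilter(node_id=62, date_to=date(2026, 1, 26)))
print(len(filtered), [clip.clip_id for clip in filtered])
```

The length of the returned list is what a clip count would display after the filters are applied.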
The top navigation bar auto-hides to maximize screen space:
- Move mouse to top of screen to reveal
- Header stays visible while hovering
- Hides automatically when mouse moves away
project_directory/
├── project.toml # Project configuration
├── acoustic_annotations.toml # Project-specific labels (optional)
├── acoustic_annotations.ndjson # Annotation data
└── clips.csv # Clip metadata
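Annotations are stored as one JSON object per line (NDJSON). The example below is pretty-printed for readability; in the file, each record occupies a single line: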
{
"clip_id": "Gilze_Rijen_62_2026-01-26T04:43:20_60s",
"session_id": "alice",
"second": [
{"label": "cough", "category": "human"},
{"label": "footsteps", "category": "human"}
],
"minute": [
{"label": "conversation", "category": "human"},
{"label": "door activity", "category": "human"}
],
"hour": [
{"label": "traffic hum", "category": "transport"}
],
"other": [
{"label": "urban ambience", "category": "human"}
]
}
Each annotation includes:
- clip_id: Unique clip identifier
- session_id: Annotator session identifier (for multi-user workflows)
- second, minute, hour, other: Arrays of label objects
- Each label: {"label": "...", "category": "..."}
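Because each record sits on its own line, the file can be appended to and read back without loading a single large JSON document. The following sketch shows one way to do that with the standard library; `append_annotation`, `read_annotations`, and `labels_by_session` are hypothetical helpers, not functions from this codebase, and an in-memory buffer stands in for `acoustic_annotations.ndjson`.

```python
import io
import json
from collections import defaultdict
from typing import IO, Dict, Iterable, Iterator, List

TIMESCALES = ("second", "minute", "hour", "other")


def append_annotation(stream: IO[str], record: dict) -> None:
    """Write one record as a single line of JSON."""
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_annotations(stream: IO[str]) -> Iterator[dict]:
    """Yield one record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def labels_by_session(records: Iterable[dict], clip_id: str) -> Dict[str, List[str]]:
    """Collect the labels each session assigned to one clip."""
    result: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if record["clip_id"] != clip_id:
            continue
        for timescale in TIMESCALES:
            for item in record.get(timescale, []):
                result[record["session_id"]].append(item["label"])
    return dict(result)


# In-memory stand-in for the annotation file.
buffer = io.StringIO()
clip = "Gilze_Rijen_62_2026-01-26T04:43:20_60s"
append_annotation(buffer, {
    "clip_id": clip, "session_id": "alice",
    "second": [{"label": "cough", "category": "human"}],
    "minute": [], "hour": [], "other": [],
})
buffer.seek(0)
print(labels_by_session(read_annotations(buffer), clip))
```

Grouping by `session_id` is what makes it possible to compare how different annotators labelled the same clip.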
project.toml:
[metadata]
name = "my_project"
created = "2026-01-26T10:00:00Z"
[paths]
audio_dir = "/path/to/audio"
features_dir = "/path/to/features"
playback_duration = 30
acoustic_annotations_default.toml defines the default label set:
[second.human]
labels = ["footsteps", "handclap", "cough", "laugh", "scream", "whistle"]
[second.mechanical]
labels = ["click", "knock", "hammer", "drill burst", "door slam"]
[second.natural]
labels = ["birdsong", "bird call", "insect chirp", "leaf rustle"]
[second.transport]
labels = ["car horn", "bicycle bell", "brake squeal", "tire screech"]
# ... more timescales (minute, hour, other)
Each project can have its own acoustic_annotations.toml that extends or overrides the defaults.
Edit acoustic_annotations_default.toml or create a project-specific config:
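One plausible reading of "extends or overrides" is sketched below. The exact merge rule is not documented here, so this is an assumption: a project section such as `[second.human]` replaces the default section of the same name wholesale, and sections that exist only in the project file are added. `merge_label_config` is a hypothetical helper.

```python
import copy

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport listed in the install step


def merge_label_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with project sections layered on top (assumed rule)."""
    merged = copy.deepcopy(defaults)
    for timescale, categories in overrides.items():
        target = merged.setdefault(timescale, {})
        for category, section in categories.items():
            # Assumption: a project section replaces the default one entirely.
            target[category] = copy.deepcopy(section)
    return merged


defaults = tomllib.loads("""
[second.human]
labels = ["footsteps", "cough"]

[second.natural]
labels = ["birdsong"]
""")

project = tomllib.loads("""
[second.human]
labels = ["cough", "whistle"]

[second.custom_category]
labels = ["label1"]
""")

merged = merge_label_config(defaults, project)
print(sorted(merged["second"]))
print(merged["second"]["human"]["labels"])
```

Working on a deep copy keeps the default label set untouched, so the same defaults can be reused for several projects.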
[second.custom_category]
labels = ["label1", "label2", "label3"]
The system automatically discovers all categories in the configuration.
Add new timescale sections:
[microsecond]
[microsecond.human]
labels = [...]
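Discovery of timescales and categories can be understood as a walk over the parsed TOML tables: every top-level table is a timescale, and every table nested under it is a category. The sketch below shows that idea under the assumption that each category table holds a `labels` list; `discover_label_sets` is a hypothetical name, and the `microsecond` labels are placeholders.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport listed in the install step


def discover_label_sets(config: dict) -> dict:
    """Return {timescale: {category: [labels]}} for every table found."""
    discovered = {}
    for timescale, categories in config.items():
        if not isinstance(categories, dict):
            continue  # ignore stray top-level keys that are not tables
        discovered[timescale] = {
            category: list(section.get("labels", []))
            for category, section in categories.items()
            if isinstance(section, dict)
        }
    return discovered


config = tomllib.loads("""
[second.human]
labels = ["footsteps", "cough"]

[second.custom_category]
labels = ["label1", "label2", "label3"]

[microsecond]
[microsecond.human]
labels = ["label1"]
""")

for timescale, categories in discover_label_sets(config).items():
    print(timescale, sorted(categories))
```

Nothing in the function names a specific timescale or category, which is why a new section in the TOML file needs no code change in this sketch.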
Configure in project settings:
- Default: 30 seconds
- Audio trimmed server-side using ffmpeg
- Reduces bandwidth and loading time
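Server-side trimming can be done by shelling out to ffmpeg. The sketch below shows one possible invocation; how the application actually calls ffmpeg is not documented here, so `build_trim_command` and `trim_clip` are hypothetical helpers, and the choice of `-c copy` (no re-encoding) is an assumption.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_trim_command(source: Path, target: Path, duration_s: int = 30) -> List[str]:
    """Build an ffmpeg command that keeps the first duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return [
        "ffmpeg",
        "-y",                    # overwrite the target if it already exists
        "-i", str(source),       # input file
        "-t", str(duration_s),   # stop writing after this many seconds
        "-c", "copy",            # copy the audio stream without re-encoding
        str(target),
    ]


def trim_clip(source: Path, target: Path, duration_s: int = 30) -> bool:
    """Trim the clip; return False when ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        return False  # ffmpeg is optional, so the caller can serve the full file
    subprocess.run(
        build_trim_command(source, target, duration_s),
        check=True,
        capture_output=True,
    )
    return True


print(" ".join(build_trim_command(Path("clip.mp3"), Path("clip_30s.mp3"))))
```

Passing the command as a list rather than a shell string avoids quoting problems with file names such as the timestamped clip names used in this project.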
- AnnotationService: All annotation operations
- ProjectService: Project and clip management
- SessionService: Multi-user session handling
uvicorn main:app --reload --port 8000
The --reload flag enables auto-restart on code changes.
- app/services/
- templates/
- main.py (use dependency injection)
- base.html
For detailed information about the refactored architecture, see:
- REFACTORING_PHASE1.md - Complete guide to service layer and migration patterns
Best practices:
- Use services for business logic (not direct utils calls)
- Use dependency injection (Depends()) for project manager and session
- Keep routes thin - logic belongs in services
- Services are testable - they don't depend on FastAPI
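The last point can be illustrated with a service written in plain Python. The class below is a hypothetical stand-in, not the real `AnnotationService`: its methods and in-memory storage are assumptions made for the example. Because it imports nothing from FastAPI, it can be exercised directly in a test.

```python
from typing import Dict, List, Tuple


class AnnotationServiceSketch:
    """Hypothetical service: business logic only, no FastAPI imports."""

    def __init__(self) -> None:
        # Keyed by (clip_id, session_id) so each annotator has one record per clip.
        self._records: Dict[Tuple[str, str], dict] = {}

    def save(self, clip_id: str, session_id: str, labels: Dict[str, List[dict]]) -> dict:
        if not session_id:
            raise ValueError("session_id is required")
        record = {"clip_id": clip_id, "session_id": session_id, **labels}
        self._records[(clip_id, session_id)] = record
        return record

    def for_clip(self, clip_id: str) -> List[dict]:
        return [r for (cid, _), r in self._records.items() if cid == clip_id]


# A thin route in main.py would only receive the service and delegate, e.g.:
#   def save_annotation(..., service = Depends(get_annotation_service)):
#       return service.save(clip_id, session_id, labels)
# get_annotation_service is a hypothetical provider, not a documented function.

service = AnnotationServiceSketch()
service.save("clip_1", "alice", {"second": [{"label": "cough", "category": "human"}]})
service.save("clip_1", "bob", {"second": []})
print(len(service.for_clip("clip_1")))
```

Keeping validation such as the `session_id` check inside the service means the route stays thin and the rule is covered by ordinary unit tests.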
- clips.csv exists in project directory
- maps/ directory (relative to audio directory)
- map_node_{NodeID}.png
[Specify your license]