Architecture

The sync engine has three layers:

Data Layer (src/lastfm/) - Fetch scrobbles from Last.fm API with pagination and diversity targeting
Processing Layer (src/recency/, src/search/) - Weight tracks by recency/plays, search and score YouTube Music matches
Sync Layer (src/playlist/, src/ytm/) - Maintain playlists with minimal API calls using diff-based sync
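As a rough illustration of what "diff-based sync" in the Sync Layer can mean, the sketch below computes which video IDs to add and which to remove so that the current playlist matches the desired one. The name `diff_playlist` is hypothetical and not taken from the codebase. The sketch assumes both lists are already de-duplicated and it ignores track order.

```python
from __future__ import annotations


def diff_playlist(current: list[str], desired: list[str]) -> tuple[list[str], list[str]]:
    """Return (to_add, to_remove) video IDs, each in its original order."""
    current_set, desired_set = set(current), set(desired)
    # IDs that the desired state has but the playlist does not
    to_add = [vid for vid in desired if vid not in current_set]
    # IDs that the playlist has but the desired state no longer wants
    to_remove = [vid for vid in current if vid not in desired_set]
    return to_add, to_remove
```

When both lists come back empty, no write call is needed at all, which is where the saving in API calls comes from.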

Cache-First Design

All track resolution follows a three-tier priority:

Manual overrides (config/search_overrides.json) - user-specified fixes, checked first
Search cache (cache/.search_cache.json) - previously successful searches (30-day TTL)
YouTube Music API - only queried if both above miss; result is cached

This minimizes API calls and ensures consistent results across runs.
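A minimal sketch of a TTL-aware cache lookup, to show how an expired entry falls through to the API tier. The entry layout (`video_id`, `ts`) and the name `cache_lookup` are assumptions for illustration, as is the choice to apply the same 30-day TTL to negative results.

```python
from __future__ import annotations

import time

TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day TTL, as stated above


def cache_lookup(cache: dict, key: str, now: float | None = None) -> tuple[bool, str | None]:
    """Return (hit, video_id); a hit with video_id None is a cached negative result."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None:
        return False, None  # never searched before
    if now - entry["ts"] > TTL_SECONDS:
        return False, None  # expired, so the caller should query the API again
    return True, entry["video_id"]
```

Returning a separate `hit` flag lets the caller tell "cached as not found" apart from "not in the cache".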

Key Patterns

Atomic writes - all cache saves use temp file + rename
File locking - fcntl.flock() prevents concurrent cache corruption
Negative caching - stores null results to avoid repeated failed searches
Template-based sync - PlaylistCache stores desired state; skips sync if unchanged
Rate limit handling - sleep between searches, retry with exponential backoff
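The first two patterns combine naturally. The sketch below shows one way to pair `fcntl.flock()` with a temp file and rename. The function name `save_cache_atomic` and the sidecar `.lock` file are assumptions, not the project's actual layout, and `fcntl` is only available on Unix-like systems.

```python
import fcntl
import json
import os
import tempfile


def save_cache_atomic(path: str, data: dict) -> None:
    """Write JSON to path so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    lock_path = path + ".lock"  # hypothetical sidecar lock file
    with open(lock_path, "w") as lock_file:
        # Exclusive lock; released when lock_file is closed
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        # The temp file lives in the same directory so the rename stays on one filesystem
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                json.dump(data, tmp)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, path)  # atomic rename over the old file
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
```

If the process dies mid-write, the old cache file is still intact, and only a stray temp file is left behind.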

RuntimeContext

RuntimeContext (src/context.py) is a dependency-injection dataclass created once per run in _build_context(). It holds all shared state:

| Field | Type | Description |
| --- | --- | --- |
| `settings` | `Settings` | Parsed configuration |
| `ytm` | `YTMusic` | Authenticated client (playlist operations) |
| `ytm_search` | `YTMusic` | Search client (anonymous if `USE_ANON_SEARCH=true`, otherwise same as `ytm`) |
| `search_cache` | `SearchCache` | Track → video ID cache |
| `search_overrides` | `SearchOverrides` | Manual overrides + blacklist |
| `playlist_cache` | `PlaylistCache` | Desired playlist state |
| `tag_cache` | `TagCache` | Last.fm tag cache |
| `tag_overrides` | `TagOverrides` | Manual tag fixes |

The dual YTM client pattern keeps search queries out of the user's YouTube search history when anonymous search is enabled, while still using authenticated credentials for playlist operations.
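The dual-client wiring might look roughly like the sketch below. `FakeYTMusic` is a stand-in so the example runs without credentials, and `Clients`, `build_clients`, and the `browser.json` path are hypothetical names. The real `RuntimeContext` carries more fields, as listed in the table.

```python
from dataclasses import dataclass
from typing import Optional


class FakeYTMusic:
    """Stand-in for the real YTMusic client; auth=None means anonymous."""

    def __init__(self, auth: Optional[str] = None):
        self.auth = auth


@dataclass
class Clients:
    ytm: FakeYTMusic          # authenticated, used for playlist operations
    ytm_search: FakeYTMusic   # anonymous or shared, used for searches


def build_clients(auth_file: str, use_anon_search: bool) -> Clients:
    ytm = FakeYTMusic(auth_file)
    # Anonymous search gets its own client; otherwise both names share one object
    ytm_search = FakeYTMusic() if use_anon_search else ytm
    return Clients(ytm=ytm, ytm_search=ytm_search)
```

For example, `build_clients("browser.json", use_anon_search=True)` yields two distinct clients, while `use_anon_search=False` makes `ytm_search` the same object as `ytm`.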

Metrics

Both search and playlist operations track API usage for end-of-run logging:

Search metrics (src/search/metrics.py): total queries, songs searched, early terminations, session duration, early termination rate, queries per song, search rate (songs/sec)

Playlist metrics (src/playlist/metrics.py): per-operation counts (get_playlist, add_playlist_items, remove_playlist_items, get_song), total queries, session duration, query rate
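The derived search metrics can be computed from the raw counters. The sketch below is one plausible shape, not the contents of `src/search/metrics.py`: field names are hypothetical, and it assumes the early termination rate is measured per song searched.

```python
from dataclasses import dataclass


@dataclass
class SearchMetrics:
    total_queries: int = 0
    songs_searched: int = 0
    early_terminations: int = 0
    duration_sec: float = 0.0

    @property
    def queries_per_song(self) -> float:
        return self.total_queries / self.songs_searched if self.songs_searched else 0.0

    @property
    def early_termination_rate(self) -> float:
        # Assumption: the fraction of searched songs that ended early
        return self.early_terminations / self.songs_searched if self.songs_searched else 0.0

    @property
    def search_rate(self) -> float:
        # Songs per second over the whole session
        return self.songs_searched / self.duration_sec if self.duration_sec else 0.0
```

Guarding each division keeps end-of-run logging safe for runs that were served entirely from cache and issued no searches.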

Failure & Run Logs

Failure log (cache/.last_failure.json): Written by _save_failure_log() when a sync fails. Contains timestamp, error message, traceback, sync type, and an auto-generated hint (e.g., "Authentication expired" for 401 errors, "Rate limited" for 403). The web dashboard reads this to show failure banners with actionable advice.
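A sketch of how such a failure record and its hint could be assembled. The key names, the `hint_for` and `build_failure_record` helpers, and the approach of matching status codes in the error message are all assumptions. Only the two hint strings and the listed fields come from the description above.

```python
from __future__ import annotations

import traceback
from datetime import datetime, timezone

# Status code fragment -> hint text (the two examples given above)
HINTS = {"401": "Authentication expired", "403": "Rate limited"}


def hint_for(message: str) -> str | None:
    """Return the first hint whose status code appears in the error message."""
    for code, hint in HINTS.items():
        if code in message:
            return hint
    return None


def build_failure_record(exc: Exception, sync_type: str) -> dict:
    message = str(exc)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "error": message,
        "traceback": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "sync_type": sync_type,
        "hint": hint_for(message),
    }
```

The resulting dict is JSON-serializable, so it can be written directly to the failure log file.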

Run log (cache/.last_run_log.json): Written by _save_run_log() after every successful sync. Stores minimal per-track data (artist, title, source) - the web dashboard enriches this at display time by pulling video IDs and metadata from the search cache. Source values: override, cache, search, blacklisted, not_found.

Main Sync Flow

run() (src/main.py) orchestrates the full workflow:

Build context - authenticate YTM, initialize caches and overrides via _build_context()
Fetch scrobbles - call fetch_recent_with_diversity() for diversity-targeted pagination
Weight & dedupe - collapse_recency_weighted() (if enabled) or dedupe_keep_latest()
Resolve to video IDs - resolve_tracks_to_video_ids() with three-tier priority
Backfill - if fewer tracks than LIMIT, fetch more scrobbles and resolve (up to BACKFILL_PASSES)
Reorder - if backfill happened with recency weighting, recalculate scores over the full scrobble set and reorder
Sync main playlist - sync_playlist() (existing) or create_playlist_with_items() (new), skipped if template unchanged
Sync weekly playlist - update_weekly_playlist() creates/updates a weekly snapshot
Finalize - clear failure log, save run log, log metrics, fire webhook

Backfill Algorithm

When the initial resolve yields fewer video IDs than LIMIT, backfill kicks in:

current_pass = 1
while len(video_ids) < target_count and current_pass <= BACKFILL_PASSES:
    shortage = target_count - len(video_ids)
    additional_limit = len(recents) + shortage * 2
    # Fetch deeper into history, dedupe against seen_track_keys
    # Resolve new tracks, append unique video IDs
    current_pass += 1

Multi-pass: up to BACKFILL_PASSES (default 3) attempts
Fetch expansion: requests shortage * 2 extra scrobbles to account for duplicates and misses
Deduplication: seen_track_keys set prevents re-resolving tracks across passes
Post-backfill reorder: when recency weighting is enabled, the entire scrobble set is re-scored and the playlist reordered by final composite scores
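To make the commented steps in the loop concrete, here is a runnable sketch with the fetch and resolve steps passed in as callables. The `backfill` signature, the `fetch(limit)` contract (return the newest `limit` scrobbles as `(artist, title)` pairs), and the choice to replace `recents` with the deeper fetch on each pass are assumptions. The reorder step is left out.

```python
def backfill(video_ids, recents, target_count, fetch, resolve, max_passes=3):
    """Top up video_ids by fetching deeper into history, at most max_passes times."""
    video_ids = list(video_ids)  # do not mutate the caller's list
    seen_track_keys = {(a.lower(), t.lower()) for a, t in recents}
    current_pass = 1
    while len(video_ids) < target_count and current_pass <= max_passes:
        shortage = target_count - len(video_ids)
        # Over-fetch by 2x the shortage to absorb duplicates and misses
        additional_limit = len(recents) + shortage * 2
        deeper = fetch(additional_limit)
        new_tracks = []
        for artist, title in deeper:
            key = (artist.lower(), title.lower())
            if key not in seen_track_keys:  # skip tracks handled in earlier passes
                seen_track_keys.add(key)
                new_tracks.append((artist, title))
        recents = deeper
        for vid in resolve(new_tracks):
            if vid not in video_ids and len(video_ids) < target_count:
                video_ids.append(vid)
        current_pass += 1
    return video_ids
```

Because `recents` grows each pass, the fetch window keeps widening even when a pass resolves nothing, so later passes still see new tracks.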

Invalid Video ID Recovery

When the YTM API rejects video IDs during sync (400/409 errors):

InvalidVideoIDsError is raised with the bad IDs
_evict_from_cache() removes them from the search cache
The full track list is re-resolved (evicted tracks get fresh searches)
Sync is retried with the corrected video IDs
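The four steps above can be sketched as a small retry wrapper. `InvalidVideoIDsError` is redefined here so the example is self-contained. `sync_with_recovery`, the `resolve(tracks, cache)` and `sync(video_ids)` callables, the flat dict cache, and the single-retry default are illustrative assumptions.

```python
class InvalidVideoIDsError(Exception):
    """Raised by the sync step when the API rejects some video IDs."""

    def __init__(self, bad_ids):
        super().__init__(f"{len(bad_ids)} invalid video IDs")
        self.bad_ids = list(bad_ids)


def sync_with_recovery(tracks, cache, resolve, sync, max_retries=1):
    """Sync; on rejected IDs, evict them from the cache, re-resolve, and retry."""
    video_ids = resolve(tracks, cache)
    for attempt in range(max_retries + 1):
        try:
            sync(video_ids)
            return video_ids
        except InvalidVideoIDsError as err:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            bad = set(err.bad_ids)
            # Evict every cache entry that points at a rejected ID
            for key in [k for k, v in cache.items() if v in bad]:
                del cache[key]
            # Evicted tracks now miss the cache and get fresh searches
            video_ids = resolve(tracks, cache)
```

Bounding the retries matters: if a fresh search keeps returning the same rejected ID, the loop would otherwise never end.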

Track Resolution Pipeline

resolve_tracks_to_video_ids() (src/search/resolver.py) implements the three-tier priority, preceded by a blacklist check, in a single pass over all tracks:

Blacklist check - search_overrides.is_blacklisted() → skip with reason logged
Override lookup - search_overrides.get() → use fixed video ID
Cache lookup - search_cache.get() → use cached result (including negative NOT_FOUND sentinel)
API search - find_on_ytm() → cache result, sleep SLEEP_BETWEEN_SEARCHES

Returns (video_ids, misses, track_to_vid, run_log_mappings) for downstream use.
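A simplified sketch of that single pass, using plain sets and dicts in place of the override and cache objects. The name `resolve_tracks`, the sentinel value, and the three-element return are assumptions, since the real function returns four values. The sleep between API searches is omitted to keep the example fast.

```python
NOT_FOUND = "NOT_FOUND"  # hypothetical value for the negative-cache sentinel


def resolve_tracks(tracks, blacklist, overrides, cache, search):
    """Resolve track keys to video IDs: blacklist, override, cache, then search."""
    video_ids, misses, sources = [], [], {}
    for key in tracks:
        if key in blacklist:
            sources[key] = "blacklisted"
            continue
        if key in overrides:
            vid, source = overrides[key], "override"
        elif key in cache:
            vid, source = cache[key], "cache"  # may be the NOT_FOUND sentinel
        else:
            vid = search(key) or NOT_FOUND
            cache[key] = vid  # cache hits and misses alike
            source = "search"
        if vid == NOT_FOUND:
            misses.append(key)
            sources[key] = "not_found"
        else:
            video_ids.append(vid)
            sources[key] = source
    return video_ids, misses, sources
```

Running it twice over the same tracks shows the effect of negative caching: the second run issues no searches, even for tracks that were not found.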

Diversity-Targeted Fetching

fetch_recent_with_diversity() (src/lastfm/fetch.py) targets unique tracks, not raw scrobble count:

Fetches pages of 200 scrobbles at a time
Counts unique (artist.lower(), track.lower()) pairs after each page
Stops as soon as any of these holds: unique tracks ≥ target_unique, total scrobbles ≥ max_raw_limit, or max_consecutive_empty consecutive pages yield no new unique tracks
"Now playing" tracks (no timestamp) are filtered out during parsing
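The stopping rules can be sketched as a paging loop. `fetch_unique`, the `get_page(page)` callable, the scrobble dict shape (`artist`, `track`, `ts`), and the default of 3 for `max_consecutive_empty` are assumptions for illustration. The real function talks to the Last.fm API.

```python
def fetch_unique(get_page, target_unique, max_raw_limit, max_consecutive_empty=3):
    """Page through scrobbles until enough unique tracks are collected."""
    scrobbles, seen = [], set()
    empty_streak = 0
    page = 1
    while True:
        batch = get_page(page)
        if not batch:
            break  # history exhausted
        # Drop "now playing" entries, which carry no timestamp
        batch = [s for s in batch if s.get("ts") is not None]
        before = len(seen)
        for s in batch:
            seen.add((s["artist"].lower(), s["track"].lower()))
        scrobbles.extend(batch)
        # Count consecutive pages that contributed no new unique track
        empty_streak = empty_streak + 1 if len(seen) == before else 0
        if (
            len(seen) >= target_unique
            or len(scrobbles) >= max_raw_limit
            or empty_streak >= max_consecutive_empty
        ):
            break
        page += 1
    return scrobbles, len(seen)
```

The empty-page counter protects against a listener who loops a handful of tracks: without it, the loop would page all the way to `max_raw_limit` chasing a target it cannot reach.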

Tag fetching uses a separate fetch_track_tags() function that tries track.getTopTags first, falling back to artist.getTopTags if no track-level tags meet the minimum count threshold.
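The fallback logic, stripped of the API calls, might look like this. `pick_tags`, the `(name, count)` tuple shape, and the default threshold of 10 are hypothetical. Whether the threshold is also applied to artist-level tags is an assumption here.

```python
def pick_tags(track_tags, artist_tags, min_count=10):
    """Prefer track-level tags; fall back to artist tags if none are strong enough."""
    strong = [name for name, count in track_tags if count >= min_count]
    if strong:
        return strong
    # No track-level tag met the threshold, so use the artist's tags instead
    return [name for name, count in artist_tags if count >= min_count]
```

In the real flow the artist tags would only be requested when the track-level result comes up empty, which saves one API call per well-tagged track.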

Recency Weighting

collapse_recency_weighted() (src/recency/weighting.py) aggregates scrobbles into unique WeightedTrack objects:

Per-track aggregation:

Groups by (artist.lower(), track.lower())
Tracks play count and most recent timestamp per track

Scoring formula:

\[\text{score} = w_{\text{play}} \times \frac{\text{plays}}{\text{max_plays}} + (1 - w_{\text{play}}) \times 0.5^{\text{age_hours} / \text{half_life}}\]

play_weight (\(w_{\text{play}}\)): default 0.7 (70% plays, 30% recency)
half_life_hours: default 24.0 - a track's recency score halves every 24 hours
Sort key: (-score, -ts, -plays), so ties on score go to the more recently played track, then to the higher play count, which keeps the ordering deterministic

Debug output: logs top 50 tracks with per-track breakdown (play count, normalized score, age, recency score, final composite).
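As a worked example of the scoring formula and sort key above, the sketch below uses the documented defaults. The function names and the dict-based track shape are hypothetical stand-ins for `WeightedTrack`.

```python
def composite_score(plays, max_plays, age_hours, play_weight=0.7, half_life_hours=24.0):
    """Blend the normalized play count with an exponentially decaying recency term."""
    play_score = plays / max_plays
    recency_score = 0.5 ** (age_hours / half_life_hours)
    return play_weight * play_score + (1 - play_weight) * recency_score


def order_tracks(tracks):
    """Sort by score, then most recent timestamp, then play count, all descending."""
    return sorted(tracks, key=lambda t: (-t["score"], -t["ts"], -t["plays"]))
```

With the defaults, a track with 5 of a maximum 10 plays, last heard 24 hours ago, scores 0.7 × 0.5 + 0.3 × 0.5 = 0.5. A track played once and last heard 48 hours ago scores 0.7 × 0.1 + 0.3 × 0.25 = 0.145.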