Architecture¶
The sync engine has three layers:
- Data Layer (
src/lastfm/) - Fetch scrobbles from Last.fm API with pagination and diversity targeting - Processing Layer (
src/recency/,src/search/) - Weight tracks by recency/plays, search and score YouTube Music matches - Sync Layer (
src/playlist/,src/ytm/) - Maintain playlists with minimal API calls using diff-based sync
Cache-First Design¶
All track resolution follows a three-tier priority:
- Manual overrides (
config/search_overrides.json) - user-specified fixes, checked first - Search cache (
cache/.search_cache.json) - previously successful searches (30-day TTL) - YouTube Music API - only queried if both above miss; result is cached
This minimizes API calls and ensures consistent results across runs.
Key Patterns¶
- Atomic writes - all cache saves use temp file + rename
- File locking -
fcntl.flock()prevents concurrent cache corruption - Negative caching - stores
nullresults to avoid repeated failed searches - Template-based sync -
PlaylistCachestores desired state; skips sync if unchanged - Rate limit handling - sleep between searches, retry with exponential backoff
RuntimeContext¶
RuntimeContext (src/context.py) is a dependency-injection dataclass created once per run in _build_context(). It holds all shared state:
| Field | Type | Description |
|---|---|---|
settings |
Settings |
Parsed configuration |
ytm |
YTMusic |
Authenticated client (playlist operations) |
ytm_search |
YTMusic |
Search client (anonymous if USE_ANON_SEARCH=true, otherwise same as ytm) |
search_cache |
SearchCache |
Track → video ID cache |
search_overrides |
SearchOverrides |
Manual overrides + blacklist |
playlist_cache |
PlaylistCache |
Desired playlist state |
tag_cache |
TagCache |
Last.fm tag cache |
tag_overrides |
TagOverrides |
Manual tag fixes |
The dual YTM client pattern keeps search queries out of the user's YouTube search history when anonymous search is enabled, while still using authenticated credentials for playlist operations.
Metrics¶
Both search and playlist operations track API usage for end-of-run logging:
Search metrics (src/search/metrics.py): total queries, songs searched, early terminations, session duration, early termination rate, queries per song, search rate (songs/sec)
Playlist metrics (src/playlist/metrics.py): per-operation counts (get_playlist, add_playlist_items, remove_playlist_items, get_song), total queries, session duration, query rate
Failure & Run Logs¶
Failure log (cache/.last_failure.json): Written by _save_failure_log() when a sync fails. Contains timestamp, error message, traceback, sync type, and an auto-generated hint (e.g., "Authentication expired" for 401 errors, "Rate limited" for 403). The web dashboard reads this to show failure banners with actionable advice.
Run log (cache/.last_run_log.json): Written by _save_run_log() after every successful sync. Stores minimal per-track data (artist, title, source) - the web dashboard enriches this at display time by pulling video IDs and metadata from the search cache. Source values: override, cache, search, blacklisted, not_found.
Main Sync Flow¶
run() (src/main.py) orchestrates the full workflow:
- Build context - authenticate YTM, initialize caches and overrides via
_build_context() - Fetch scrobbles - call
fetch_recent_with_diversity()for diversity-targeted pagination - Weight & dedupe -
collapse_recency_weighted()(if enabled) ordedupe_keep_latest() - Resolve to video IDs -
resolve_tracks_to_video_ids()with three-tier priority - Backfill - if fewer tracks than
LIMIT, fetch more scrobbles and resolve (up toBACKFILL_PASSES) - Reorder - if backfill happened with recency weighting, recalculate scores over the full scrobble set and reorder
- Sync main playlist -
sync_playlist()(existing) orcreate_playlist_with_items()(new), skipped if template unchanged - Sync weekly playlist -
update_weekly_playlist()creates/updates a weekly snapshot - Finalize - clear failure log, save run log, log metrics, fire webhook
Backfill Algorithm¶
When the initial resolve yields fewer video IDs than LIMIT, backfill kicks in:
while len(video_ids) < target_count and current_pass <= BACKFILL_PASSES:
shortage = target_count - len(video_ids)
additional_limit = len(recents) + shortage * 2
# Fetch deeper into history, dedupe against seen_track_keys
# Resolve new tracks, append unique video IDs
- Multi-pass: up to
BACKFILL_PASSES(default 3) attempts - Fetch expansion: requests
shortage * 2extra scrobbles to account for duplicates and misses - Deduplication:
seen_track_keysset prevents re-resolving tracks across passes - Post-backfill reorder: when recency weighting is enabled, the entire scrobble set is re-scored and the playlist reordered by final composite scores
Invalid Video ID Recovery¶
When the YTM API rejects video IDs during sync (400/409 errors):
InvalidVideoIDsErroris raised with the bad IDs_evict_from_cache()removes them from the search cache- The full track list is re-resolved (evicted tracks get fresh searches)
- Sync is retried with the corrected video IDs
Track Resolution Pipeline¶
resolve_tracks_to_video_ids() (src/search/resolver.py) implements the three-tier priority in a single pass over all tracks:
- Blacklist check -
search_overrides.is_blacklisted()→ skip with reason logged - Override lookup -
search_overrides.get()→ use fixed video ID - Cache lookup -
search_cache.get()→ use cached result (including negativeNOT_FOUNDsentinel) - API search -
find_on_ytm()→ cache result, sleepSLEEP_BETWEEN_SEARCHES
Returns (video_ids, misses, track_to_vid, run_log_mappings) for downstream use.
Diversity-Targeted Fetching¶
fetch_recent_with_diversity() (src/lastfm/fetch.py) targets unique tracks, not raw scrobble count:
- Fetches pages of 200 scrobbles at a time
- Counts unique
(artist.lower(), track.lower())pairs after each page - Stops when any of: unique tracks ≥
target_unique, total scrobbles ≥max_raw_limit, ormax_consecutive_emptypages with no new unique tracks - "Now playing" tracks (no timestamp) are filtered out during parsing
Tag fetching uses a separate fetch_track_tags() function that tries track.getTopTags first, falling back to artist.getTopTags if no track-level tags meet the minimum count threshold.
Recency Weighting¶
collapse_recency_weighted() (src/recency/weighting.py) aggregates scrobbles into unique WeightedTrack objects:
Per-track aggregation:
- Groups by
(artist.lower(), track.lower()) - Tracks play count and most recent timestamp per track
Scoring formula:
play_weight(\(w_{\text{play}}\)): default0.7(70% plays, 30% recency)half_life_hours: default24.0- a track's recency score halves every 24 hours- Sorting priority:
(-score, -ts, -plays)for stable ordering
Debug output: logs top 50 tracks with per-track breakdown (play count, normalized score, age, recency score, final composite).