#1555 - Use plural ALBUMARTISTS / ARTISTS as authoritative source of individual contributors#1557
#1555 - Use plural ALBUMARTISTS / ARTISTS as authoritative source of individual contributors#1557Rouzax wants to merge 2 commits intoLMS-Community:public/9.2from
Conversation
2cd9ea5 to
f2dd005
Compare
…and MBID counts differ Slim::Schema::Contributor::add splits the artist name tag and the MusicBrainz ID tag independently and zips them positionally. When the lists differ in length (e.g. a joined-display "Artist A & Artist B" with 2 MBIDs, or "Artist A ft. Artist B" with 1 name and 2 MBIDs), the first MBID was bound to a joined-display name, poisoning the contributors row. Subsequent tracks whose MBID matched that row then inherited the wrong display name (see also the earlier report in LMS-Community#1052). Guard against the mismatch and drop the ambiguous MBIDs so the row is created without an MBID binding. Aligned Picard-convention tags are unaffected. Already-poisoned libraries need a wipecache. This is a defensive stop-gap, not a structural fix. Issue LMS-Community#1555 remains open: the proper resolution is to read the plural ARTISTS / ALBUMARTISTS Picard tags as the indexing source of truth, which will be a separate follow-up. Signed-off-by: Rouzax <GitHub@mgdn.nl>
…ve source of individual contributors When Picard-style plural tags (ALBUMARTISTS, ARTISTS) are present as arrayrefs on the scanned $attributes hash, use them as the authoritative list of individual contributors rather than parsing the joined singular ALBUMARTIST / ARTIST display string with splitList. The singular tag remains the display string on the track / album row. This is the structural follow-up to the stop-gap in 0dbbb98. With aligned plural tags now feeding Slim::Schema::Contributor->add as arrayrefs, the positional MBID zip is no longer a guess: it is an index provided by the tagger. The length-mismatch guard from the stop-gap stays as the safety net for the legacy single-string path. - Slim/Schema.pm _preCheckAttributes: defer ARTISTS, ALBUMARTISTS, TRACKARTISTS so they reach _mergeAndCreateContributors via the newTrack path (previously they were left in the regular attributes hash and lost before contributor processing). - Slim/Schema.pm _mergeAndCreateContributors: prefer $attributes->{$tag.'S'} as the contributor source when it is a non-empty arrayref, for the ARTIST / ALBUMARTIST / TRACKARTIST roles only. Other roles (COMPOSER, BAND, CONDUCTOR, ...) are unchanged. Trim plural names, drop empty-name slots and their aligned MBID slot, tolerate arrayref $contributor in the whitespace-strip step, and loosen the early skip so a track with plural but no singular still produces rows. Also accept an aligned plural sort arrayref (e.g. user-scripted multi-value ALBUMARTISTSORT/ARTISTSORT from %_albumartists_sort% / %_artists_sort%) and pass it positionally to Contributor::add; otherwise fall back to sort-by-name rather than polluting the first contributor with the joined singular sort string. - Slim/Schema.pm TRACKARTIST compilation transform (bug 2317 / 2638): also migrate the plural ARTISTS tag to TRACKARTISTS, symmetric with the existing ARTIST and MUSICBRAINZ_ARTIST_ID migration. Picard tag mapping reference: https://picard-docs.musicbrainz.org/en/latest/appendices/tag_mapping.html Backwards compatibility: files without plural tags, or with plural tags as scalars, fall through to the existing splitList path. No schema change. No format-parser change. Already-poisoned libraries from pre-stop-gap scans need a wipecache. Users who tuned splitList to work around the absence of plural-tag reading may find it is no longer needed, but it is not removed. Known limitation for MP3: Audio::Scan returns only the first value of multi-value ID3v2.4 TXXX text frames (which is how Picard writes plural tags into MP3 files). Until that is addressed, MP3 files see the poisoning gone but do not get the full multi-contributor benefit. FLAC, Ogg, Opus and MP4 are unaffected and gain the full benefit. Signed-off-by: Rouzax <GitHub@mgdn.nl>
f2dd005 to
0bfb154
Compare
| # Bug 17322, strip leading/trailing spaces from name. splitTag | ||
| # already trims arrayref elements, so the regex strip is scalar | ||
| # only. | ||
| unless (ref $contributor) { |
There was a problem hiding this comment.
I don't like how some variables can be either a scalar or a list ref. Can't we have separate variables for them? And if they were list refs, wouldn't we have to run the cleanup on all of its items?
I see this pattern in other places, like line 3135.
There was a problem hiding this comment.
Thanks, both fair points. Taking them separately:
Scalar-or-arrayref variables. Agreed that it's awkward, but the polymorphism predates this PR: $contributor = $attributes->{$tag} has always been able to be either shape depending on the tag reader, and the ref(...) eq 'ARRAY' ? join(...) : $scalar pattern (e.g. the debug log at the old line 3135 you pointed at) is legacy. Splitting into typed variables ($contributorScalar vs $contributorList, same for $brainzID / $sortBy) is a worthwhile cleanup but it touches the whole _mergeAndCreateContributors loop and the Contributor->add call shape. I'd prefer to keep this PR scoped to #1555 and open a follow-up refactor issue for the variable split. Happy to file that and link it from here if you agree.
Cleanup on arrayref items. This one is a real gap and I'll fix it in this PR. The existing unless (ref $contributor) guard relies on the older "splitTag already trims arrayref elements" invariant (see the comment just above it). The new plural-tag path I added trims explicitly at the build step, so arrayrefs from that source are clean, but an arrayref coming through the legacy path isn't guaranteed to be. I'll replace the scalar-only strip with a shape-aware version:
if (ref $contributor eq 'ARRAY') {
for my $n (@$contributor) {
next unless defined $n;
$n =~ s/^\s+//;
$n =~ s/\s+$//;
}
} else {
$contributor =~ s/^ +//;
$contributor =~ s/ +$//;
}That way the cleanup runs regardless of which path populated $contributor.
Re: the line 3135 reference, that one is a debug-log formatter rather than a cleanup site, so I don't think it needs a behavioural change, but I'll fold it into the follow-up refactor along with the other polymorphism sites.
| my @mbids = ref($pluralMbid) eq 'ARRAY' ? @$pluralMbid : (); | ||
| my @sorts = ref($pluralSort) eq 'ARRAY' ? @$pluralSort : (); |
There was a problem hiding this comment.
Why would you de-reference these two variables and not just use them directly?
There was a problem hiding this comment.
The dereference is there to normalise the shape once so the rest of the loop can treat it as a plain list. Two concrete reasons:
$attributes->{"MUSICBRAINZ_${tag}_ID"}/...SORTcan arrive as a scalar (single MBID from a legacy singular tag), an arrayref (Picard multi-value), or undef, independently of whether the plural names tag is an arrayref. Flattening to@mbids/@sortscollapses all the "not a usable list" cases to an empty array, so$mbids[$i]at line 3189 is safe regardless of input shape.- The boolean checks
if @mbids(3189) and@mbids ? ... : $brainzID(3194) mean "has at least one element". Using$pluralMbiddirectly has the wrong truthiness: an empty arrayref is truthy, soif ($pluralMbid)would wrongly enter the branch for[]. I'd needref($pluralMbid) eq 'ARRAY' && @$pluralMbidat every use site.
If you'd rather keep arrayrefs throughout for consistency with the rest of the function I'm happy to switch, but the guards at every use site would get noisier.
There was a problem hiding this comment.
PS: I'm signing out for the day and will be away the coming days so won't be able to respond directly
|
Per my comment on your issue #1555, I believe there is more to consider than just the scanner (if I am understanding your proposal here. I added my thoughts in reply to your forum post here (for the broader audience to pick apart): |
|
Pausing this PR while a design question plays out on the forum. Summary below; full discussion and feedback is on the forum thread referenced in #1555. What I got wrong: PR body design note 2 claimed "skin-visible names are unchanged". I verified the DB contents after the scanner changes but not the rendered output. Tested the live JSON API against fixture 01 ( Why the regression exists (verified in code, not speculated):
There is no existing column on What is already fine: Classic Skin templates ( Proposed redesign (open for feedback):
This supports the two use cases raised on the forum: the common "Bill Evans Trio with Scott LaFaro & Paul Motian" migration case where the joined credit is the display string and the three individuals are the indexable contributors; and the "Prince" vs "Prince and the Revolution" case where the user deliberately diverges index from display. Design questions are on the forum thread (linked from #1555). Looking for feedback on column naming, storage policy (always-store vs store-only-when-different), and scope before writing code. Status of the two PRs:
|
|
See my comments on the forum: https://forums.lyrion.org/forum/user-forums/ripping-encoding-transcoding-tagging/1816764-lms-metadata-scaan-overrides-local-tags?p=1819451#post1819451 I'm not trying to pour cold water on this, just making sure we've thought everything through. |
Please do, I thought I had thought about everything but @mikeysas already pointed out a big gap I missed. |
Summary
Structural fix for #1555. Reads the Picard-convention plural tags
ALBUMARTISTSandARTISTSas arrayrefs and uses them as the authoritative individual-contributor list, so the name <-> MBID pairing stops depending onsplitListheuristics over joined display strings. The singularALBUMARTIST/ARTISTtag remains the album / track display string.Stacks on the stop-gap PR #1556 (the length-mismatch guard in
Contributor::add). The guard remains in place as the safety net for the legacy single-string fall-back path and for files where the plural tag is absent or scalar.Contributes to #1555 (does not close it; closing waits until both PRs land). Also addresses the long-standing symptom reported in the closed #1052.
References:
This is a behavioural change to the scanner. Design feedback welcome before merge. Specifically: the plural-detection logic lives inside the contributor loop and keys off
$tag . 'S'(only firing for ARTIST / ALBUMARTIST / TRACKARTIST). An alternative is a pre-loop expansion pass that unwraps the plural arrayref onto the singular slot. I picked the in-loop form because it keeps the TRACKARTIST compilation transform the only place that mutates$attributespre-loop. Happy to refactor either way.Scope of benefit (read this first)
Confirmed against the Picard docs:
ARTISTS(plural track artists) is written by default by Picard (since Picard 2.x, as a standard tag across Vorbis / ID3v2 TXXX / MP4 freeform / ASF). So the track-level ARTIST role gets the full plural-path benefit for any Picard-tagged library without user configuration. This is the primary user-visible win, and it covers the "Artist A ft. Artist B" track-credit case that prompted the follow-up report on Scanner binds unaligned MusicBrainz album-artist IDs to joined-display contributor, poisoning later albums #1555.ALBUMARTISTSis NOT a standard Picard tag. It is not in Picard's tag mapping. Users who want it must add a small Picard script ($setmulti(albumartists,%_albumartists%)) or use another tagger / workflow that writes it (Jellyfin- and Navidrome-friendly Mp3tag templates, beets with the mb plugin, etc.). For vanilla-Picard libraries the ALBUMARTIST role falls back to the legacy singular path, which is still protected from MBID poisoning by #1555 - Stop-gap: drop MusicBrainz IDs when artist name and MBID counts differ #1556's length-mismatch guard but does not get the per-individual split.We cannot recover the per-individual ALBUMARTIST split from LMS's side when the plural tag is absent: Picard's join phrases are configurable, and guessing from the singular joined string is exactly what
splitListalready does best-effort. This PR is forward-compatible: as soon as a user's library carries the tag, the fix kicks in.Design notes
$attributes->{$tag.'S'}is a non-empty arrayref with at least one non-blank entry. A scalar plural value falls through to the legacy singular path. Single-element plural arrayref counts (Picard signal "this is one authoritative individual").ALBUMARTIST/ARTISTtag remains the album / track display string. Skin-visible names are unchanged.%_albumartists_sort%and%_artists_sort%do carry per-individual sort data from MusicBrainz, and users can script those onto the existingALBUMARTISTSORT/ARTISTSORTtags as multi-value content (e.g.$setmulti(albumartistsort,%_albumartists_sort%)). When the plural path is used, this PR checks whether the sort tag is an arrayref whose length matches the filtered plural names and, if so, uses it positionally for per-individual sort (e.g.van Buuren, Arminfor Armin). Otherwise it passessortBy => undef, which falls back to sort-by-name. Passing the joined singular sort string would pollute only the first contributor's sort with the whole joined value and leave the rest falling back to name; the symmetric name-fallback is strictly better.Contributor::add's existingif ($mbid)guard handles empty slots. Note: LMS's existingSlim::Formats::sanitizeTagValuesrejects anyMUSICBRAINZ_*IDtag containing a non-UUID element, including empty strings, and strips the whole tag. That stripping is pre-existing and orthogonal to this PR; a separate sanitizer relaxation would be needed to make empty slots end-to-end functional.Contributor::add.ARTISTStoTRACKARTISTS, symmetric with the existing singular and MBID migrations.splitListsplitting, byte-for-byte. #1555 - Stop-gap: drop MusicBrainz IDs when artist name and MBID counts differ #1556's length-mismatch guard is the safety net there.Patch surface
Slim/Schema.pm: three hunks (_preCheckAttributesdeferred-list addition;_mergeAndCreateContributorsplural-aware loop with sort-tag alignment; TRACKARTIST transform extension).Changelog9.html: one bullet under the in-progress release.No DB schema change. No format-parser change. No change to
Slim/Schema/Contributor.pm(the stop-gap from #1556 stays intact).Backwards compatibility
wipecacheto benefit on already-scanned libraries.splitListto work around the absence of plural-tag reading may find it is no longer needed. It is not removed; they can keep or clear it.Verification
Manual end-to-end verification against a hand-built fixture set (silent Opus / FLAC / MP3 / MP4 files tagged with mutagen). Pre-fix vs post-fix scan dumped from the SQLite library DB. Diff captured.
Fixture matrix (Opus, 11 scenarios)
ft.track-artist (the follow-up case)Multi-track album
3-track album, all tracks share
ALBUMARTISTS=[Armin, KI/KI]plural; track 1 has single ARTIST=Armin, track 2 single ARTIST=KI/KI, track 3 plural ARTISTS=[Armin, KI/KI]. Album-level resolves to two ALBUMARTIST rows aggregated across tracks; per-track ARTIST / TRACKARTIST rows correctly reflect each track's own credits. No joined rows at any level.Various Artists compilation
3-track VA compilation with canonical MB Various Artists MBID (
89ad4ac3-39f7-470e-963a-56509c546377),COMPILATION=1. Each track has its own ARTIST + plural ARTISTS + MBIDs; track 2 is a two-artist collab. Album row resolves to "Various Artists" primary;contributor_albumaggregates all individual track artists as ARTIST rows; per-track ARTIST rows correctly reflect each track's credits.Sort-tag alignment
Verified in two modes by inspecting
contributors.namesortafter rescan:ARMIN VAN BUUREN,KI KI). The joinedALBUMARTISTSORTstring does not leak into the first contributor.%_albumartists_sort%ontoALBUMARTISTSORTas multi-value, per the Picard convention described above): each contributor gets its own per-individual sort from the aligned slot (VAN BUUREN ARMINfor Armin, not name-fallbackARMIN VAN BUUREN).Cross-format
Same happy-path tags written to FLAC, MP4 (M4A), and MP3:
----:com.apple.iTunes:ARTISTSfreeform)WM/ARTISTS)Audio::Scan's WMA reader, may or may not surface multi-value. Flagging for a reviewer with a WMA test library.CI / smoketest
bash t/00_smoketest.shpasses (server starts, jsonrpc responds with version 9.2.0).Known limitations
MP3
Picard writes plural tags into MP3 files as ID3v2.4 multi-value text frames (one TXXX frame per descriptor, multiple values null-separated within the frame's text field, per the spec). The vendored
Audio::Scanmodule (CPAN/arch/<perl>/Audio/Scan.{pm,so}, sourced fromslimserver-vendor) returns only the first value of multi-value TXXX frames, truncating at the first null. By the timeMP3.pmsees the tag, the second and subsequent values are already lost.Effect with this PR:
Armin van Buuren & KI/KIjoined contributor row with first MBID bound (poisoned).Armin van Buurenrow alone with first MBID; KI/KI is missing entirely from the MP3 album.This is an improvement (no more poisoning) but is not the full multi-contributor benefit. Two paths to a complete MP3 fix:
Audio::Scanupstream (XS / C source inslimserver-vendor) to fully support ID3v2.4 multi-value text frames. Cleanest, most correct, but requires upstream coordination and rebuilds across all platforms.Slim/Formats/MP3.pmforALBUMARTISTS/ARTISTS/MUSICBRAINZ_ALBUMARTISTID/MUSICBRAINZ_ARTISTIDonly. ~50-100 lines usingpack/unpackon the raw frame bytes; opens the file a second time per scan. Separate small follow-up PR.Recommend shipping this PR as-is and addressing MP3 in a follow-up. Happy to do either.
Default-Picard ALBUMARTIST libraries
As noted in "Scope of benefit", vanilla Picard does not write
ALBUMARTISTS, so default-Picard libraries see the ARTIST-role win but retain the legacy ALBUMARTIST singular-path behaviour (now safe under #1556's guard). Users who want the full album-level split should add the Picard script referenced above, or switch to a workflow that writesALBUMARTISTS.FLAC, Ogg, Opus, and MP4 / M4A are unaffected by the
Audio::Scanlimitation and gain the full benefit whenever the plural tag is present.Test plan
bash t/00_smoketest.sh).namesortinspection).