Mutagen using forward slash as delimiter


I'm using mutagen to collect information about given MP3 files. It works, but there is one problem: when a song has multiple artists, a forward slash is used as the delimiter. So the TPE1 tag may return the following when the song is, e.g., a collaboration between AC/DC and Ministry:

['Ministry/AC/DC']

This is problematic when trying to isolate separate artists from the tag. Splitting on / won't work because this will give three results: Ministry, AC and DC. This is my code:

import re
import mutagen

class MusicData:
    def __init__(self, root, filepath):
        self.fullpath = root + '\\' + filepath
        
        self.prepath = re.sub(r'\\[^\\]*$', '', self.fullpath) + '\\'
        self.filename = self.fullpath.replace(self.prepath, '')

        file = mutagen.File(self.fullpath)
        self.duration = file.info.length
        
        self.title = self.extractKey(file, 'TIT2')[0]
        self.artist = self.extractKey(file, 'TPE1')[0]
        self.album = self.extractKey(file, 'TALB')[0]
        self.year = self.extractKey(file, 'TDRC')[0]
        self.genre = self.extractKey(file, 'TCON')[0]
        self.publisher = self.extractKey(file, 'TPUB')[0]
        self.key = self.extractKey(file, 'TKEY')[0]
        self.bpm = self.extractKey(file, 'TBPM')[0]
    
    def extractKey(self, file, key):
        if(key in file):
            if(type(file.tags[key].text[0]) == mutagen.id3._specs.ID3TimeStamp):
                return [str(file.tags[key].text[0])]
            else:
                return file.tags[key].text
        else:
            return [""]

The documentation on mutagen is very brief and is making me none the wiser. How do I properly collect the artists from a given file using mutagen?

Dmitry On

I believe the problem comes from a difference between versions of the ID3 tag standard.

ID3v2.3.0 defines the TPE1 frame as:

The 'Lead artist(s)/Lead performer(s)/Soloist(s)/Performing group' is used for the main artist(s). They are separated with the "/" character.

ID3v2.4.0 states that:

All text information frames supports multiple strings, stored as a null separated list, where null is represented by the termination code for the character encoding. All text frame identifiers begin with "T".

Mutagen works with both versions of frames, but it seems that for v2.3.0 there is no way to specify the separator when opening a file. So the frame is always read as a single value, regardless of any forward slashes inside it.
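To make the difference concrete, here is a toy sketch of how the body of a text frame could be decoded under each scheme. It is not mutagen's parser: the name decode_text_frame is made up for illustration, and only the single-byte encodings $00 (ISO-8859-1) and $03 (UTF-8, which exists in v2.4 only) are handled.

```python
from typing import List


def decode_text_frame(payload: bytes) -> List[str]:
    """Decode the body of an ID3 text frame into its list of strings.

    Hypothetical helper for illustration only: the first byte is the text
    encoding, the rest is the text, with values separated by a null
    terminator (the ID3v2.4 convention).
    """
    encodings = {0x00: "latin-1", 0x03: "utf-8"}
    if not payload or payload[0] not in encodings:
        raise ValueError("this sketch only handles encoding bytes $00 and $03")

    text = payload[1:].decode(encodings[payload[0]])
    # Drop one trailing terminator so it does not produce an extra empty value.
    if text.endswith("\x00"):
        text = text[:-1]
    return text.split("\x00")


# v2.4 style: two values separated by a null terminator.
print(decode_text_frame(b"\x03Ministry\x00AC/DC"))  # ['Ministry', 'AC/DC']
# v2.3 style: one string; at the byte level the "/" is ordinary text.
print(decode_text_frame(b"\x00Ministry/AC/DC"))     # ['Ministry/AC/DC']
```

In the v2.3 case the slash carries meaning only by convention, which is why a reader cannot tell a separator from a slash that belongs to a name.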

I would propose converting the frames to v2.4.0 with a little text processing:

from typing import List


SLASH = '/'


def split_with_respect(source: str) -> List[str]:
    """
    Split the source string into artists with respect to AC/DC
    """

    result = []

    def append(word: str) -> None:
        if result and result[-1] == 'AC' and word == 'DC':
            result[-1] = 'AC/DC'
        else:
            result.append(word)

    start = 0
    while True:
        index = source.find(SLASH, start)
        if index == -1:
            append(source[start:])
            break

        append(source[start:index])
        start = index + 1

    return result
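The function above special-cases AC/DC only. If other artist names containing a slash turn up, the same idea could be generalized along these lines. This is only a sketch: the function split_artists and its known_names parameter are hypothetical, and the list of names has to be maintained by hand, since nothing in the v2.3 tag itself says which slashes belong to a name.

```python
from typing import Iterable, List


def split_artists(source: str, known_names: Iterable[str] = ("AC/DC",)) -> List[str]:
    """Split a '/'-joined artist string, keeping known slash-containing names whole."""
    # Try longer names first so a longer known name wins over a shorter one.
    names = sorted(known_names, key=len, reverse=True)
    parts = source.split("/")
    result: List[str] = []
    i = 0
    while i < len(parts):
        for name in names:
            # Number of '/'-separated pieces this known name spans.
            width = name.count("/") + 1
            if "/".join(parts[i:i + width]) == name:
                result.append(name)
                i += width
                break
        else:
            # No known name starts at this position: keep the piece as-is.
            result.append(parts[i])
            i += 1
    return result


print(split_artists("Ministry/AC/DC"))  # ['Ministry', 'AC/DC']
print(split_artists("AC/DC/Ministry"))  # ['AC/DC', 'Ministry']
```

It could be dropped in wherever split_with_respect is called, since it takes the same string and returns the same kind of list.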

The following function then rewrites the file:

from mutagen.id3 import ID3, TPE1, Encoding


def convert(file_name: str) -> None:

    file = ID3(file_name)

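    frame = file.get('TPE1')
    if frame is None:
        # The file has no TPE1 frame, so there is nothing to convert
        return

    main_artist_values = frame.text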

    if len(main_artist_values) != 1:
        # Have a correct delimiter and already multiple values
        return

    source = main_artist_values[0]
    artists = split_with_respect(source)

    if len(artists) == 1:
        # No changes
        return

    # Rewrite the tag; save() writes ID3v2.4 by default
    file.add(TPE1(encoding=Encoding.UTF8, text=artists))
    file.save()

Reading the tag afterwards produces multiple values:

>>> file = ID3(file_name)
>>> print(file.get('TPE1').text)
['Thunderstruck', 'AC/DC']

Python 3.11.1, Mutagen 1.47.0.