Batch File to Copy mp3 Files With "European" Characters in the Titles

I have a playlist of slightly more than 10,000 mp3 files. The music library has a total of about 40,000 tracks. I decided to write a batch file to copy only the files on the playlist to a directory on a different drive. I used Notepad++ to modify the text playlist file so I would get the path/file names correct. I simply needed to add quotes around the path/file names, prefix the lines with “copy ” and suffix the lines with the destination drive/directory. I did all that, made sure the batch file was saved as UTF-8, and executed it.

After a few minutes the batch file completed. When I checked the destination directory I noticed that about 70 files had not been copied. I used ‘Beyond Compare’ to compare the original playlist file against a playlist file I made from the files that did copy over. What I noticed was that the files that did not copy over had what I will call ‘European’ characters in the filename, for example “Dov' é L'Amore.mp3” and “José Feliciano - Feliciano! - 01 - California Dreamin'.mp3”. Other files with exclamation marks did not copy either.

I reran the file substituting ‘xcopy’ for ‘copy’ – same result. On to Robocopy – same result. At this point I decided to try to copy one of the problem files using Robocopy at a command prompt to see what errors it reported. Surprise, surprise: it copied over, as did the others. So Robocopy typed at the command prompt will copy the files, but not when run from a batch file saved as UTF-8?

As a last resort I decided to try PowerShell. As I am inexperienced in using it, I asked ChatGPT to write a script for me, and this is what it returned:

# Source and destination paths
$sourcePath = "F:\Directory\Music\Cher - Bob’s Cher Mix"
$destinationPath = "X:\BoboFMDrive"

# File name with special characters
$fileName = "Cher - My Cher Mix - 02 - Dov' é L'Amore.mp3"

# Full path of the source file
$sourceFile = Join-Path -Path $sourcePath -ChildPath $fileName

# Full path of the destination file
$destinationFile = Join-Path -Path $destinationPath -ChildPath $fileName

# Copy the file to the destination
Copy-Item -Path $sourceFile -Destination $destinationFile -Force

Write-Host "File copied successfully!"

And it worked! However, I am looking for a solution that lets me easily edit a text-based file with many lines/strings, as it would be onerous to have to create a script for each file. Does anyone have any thoughts on a solution? I ended up just using ‘Beyond Compare’ and copied over the dropped files manually, but would like to find a better/easier solution for the future.

dodrg On BEST ANSWER

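The problem is the codepage. By default, the Windows console does not use UTF-8 but a legacy local codepage, and cmd.exe interprets the bytes of a batch file according to whichever codepage is active.
The codepage number for UTF-8 is 65001.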

Commandline Test:

Prepare:

Create some filenames using different codepages:

D:\Test>  chcp
Active Codepage: 850.

D:\Test>  echo . >"Dov' é L'Amore_ansi.mp3"
D:\Test>  chcp 65001
Active Codepage: 65001

D:\Test>  echo . >"Dov' é L'Amore_utf8.mp3"

Check for Differences:

D:\Test>  chcp 850
Active Codepage: 850.

D:\Test>  dir
11.08.2023  17:08    <DIR>          .
11.08.2023  17:08    <DIR>          ..
11.08.2023  16:56                 4 Dov' é L'Amore_ansi.mp3
11.08.2023  16:58                 4 Dov' é L'Amore_utf8.mp3

D:\Test>  chcp 65001
Active Codepage: 65001

D:\Test>  dir
11.08.2023  17:08    <DIR>          .
11.08.2023  17:08    <DIR>          ..
11.08.2023  16:56                 4 Dov' é L'Amore_ansi.mp3
11.08.2023  16:58                 4 Dov' é L'Amore_utf8.mp3

D:\Test>  

As you can see, there is no difference. Evidently Windows converts the name from the active character set (the filesystem stores file names as Unicode) before it is written to the filesystem.

Result:

Therefore you have no problems on the command line or in a batch as long as no file content has to be interpreted.

File Test:

Prepare:

With Windows Notepad you can choose the file encoding in the Save as ... dialog.

Create three files with the text Dov' é L'Amore.
Save them encoded as

  • ANSI (ansi.txt)
  • UTF-8 (utf8.txt)
  • UTF-8 with BOM (utf8_boom.txt)

Check for Differences:

D:\Test>  chcp 850
Active Codepage: 850.

D:\Test>  type ansi.txt
Dov' Ú L'Amore

D:\Test>  type utf8.txt
Dov' ├® L'Amore

D:\Test>  type utf8_boom.txt
´╗┐Dov' ├® L'Amore

D:\Test>  

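Please note the Ú in the ansi.txt content! Notepad stored the é as the single byte 0xE9, which codepage 850 displays as Ú.
This is the difference between

  • the console ("DOS") codepage: 850 = DOS Latin 1 (the OEM codepage) and
  • the ANSI codepage of the Windows GUI: 1252 = Windows-1252

As a GUI app, Notepad.exe saved "ANSI" using the character set Windows-1252.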

D:\Test>  chcp 1252
Active Codepage: 1252.

D:\Test>  type ansi.txt
Dov' é L'Amore

D:\Test>  type utf8.txt
Dov' é L'Amore

D:\Test>  type utf8_boom.txt
Dov' é L'Amore

D:\Test>  

D:\Test>  chcp 65001
Active Codepage: 65001.

D:\Test>  type ansi.txt
Dov' � L'Amore

D:\Test>  type utf8.txt
Dov' é L'Amore

D:\Test>  type utf8_boom.txt
 Dov' é L'Amore

D:\Test>  

(Compare the output of utf8_boom.txt: the space before the text is the BOM.)

In contrast to file names in the filesystem, the content of a file is just bytes, so its encoding has to match the active codepage.
If the two get out of sync, the file names read from the file will differ from the ones found in the filesystem.
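
One way to spot such a mismatch before copying anything is a dry run that only checks whether each playlist entry resolves to an existing file. The following is a minimal sketch, not a tested tool: the script name checklist.cmd is hypothetical, and it assumes the playlist holds one full path per line without surrounding quotes.

```batch
@echo off
rem Dry run: report playlist entries that do not resolve to an existing file.
rem Hypothetical usage: checklist.cmd playlist.txt
if "%~1"=="" (
    echo Usage: %~nx0 playlist-file
    exit /b 1
)

rem "delims=" keeps each line whole, including spaces inside the path.
rem The playlist is decoded with whatever codepage is active when this runs.
for /f "usebackq delims=" %%F in ("%~1") do (
    if not exist "%%F" echo MISSING: %%F
)
```

Running it once after chcp 1252 and once after chcp 65001 should show which codepage matches the encoding of the playlist: under the matching one, the names with é are expected to be found, under the other they should be reported as MISSING.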


Result:
  • Working without UTF-8 is possible, but you can run into conflicts as soon as you move text between GUI programs and the command line, because the ANSI character set of the GUI may differ from the codepage of the console.
  • If you expect characters that are not part of your local ANSI codepage, e.g. because some file names originate from a different culture, then:
    • Save the playlist in UTF-8.
    • Change the codepage during batch processing so the playlist file is interpreted as expected.

The practical part:

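For scripts that involve a UTF-8 text file, temporarily change the codepage to UTF-8. Note that setlocal / endlocal only scope environment variables, not the console codepage. To limit the change to the runtime of the batch, save the active codepage first and restore it at the end:

@echo off
setlocal
  rem   Remember the active codepage (the number after the colon in the chcp output)
  for /f "tokens=2 delims=:." %%C in ('chcp') do set "OLDCP=%%C"
  chcp 65001 >nul

  rem   Your script ....
  type utf8.txt

  rem   Restore the previous codepage
  chcp %OLDCP% >nul
endlocal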

As seen above, storing the file as UTF-8 with or without a BOM makes no difference to the displayed characters, but the BOM adds three extra bytes at the start of the file. It is therefore better to store UTF-8 without a BOM, as those bytes can confuse programs, especially when exchanging files with other operating systems.
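
Applied to the original question, the same idea could look like the sketch below. Instead of one copy line per track, it reads the playlist itself, so only the text file has to be edited. It makes several assumptions: the playlist is saved as UTF-8 without a BOM, it holds one full path per line without quotes, and the file name playlist_utf8.txt is hypothetical (X:\BoboFMDrive is the destination from the question). It has not been run against the asker's library.

```batch
@echo off
setlocal
rem Assumed names: adjust PLAYLIST and DEST to your setup.
set "PLAYLIST=playlist_utf8.txt"
set "DEST=X:\BoboFMDrive"

rem Save the active codepage, then switch to UTF-8 for reading the playlist.
for /f "tokens=2 delims=:." %%C in ('chcp') do set "OLDCP=%%C"
chcp 65001 >nul

rem One full path per line; "delims=" keeps spaces inside the path.
rem Delayed expansion stays off, so "!" in a file name is left untouched.
for /f "usebackq delims=" %%F in ("%PLAYLIST%") do (
    copy "%%F" "%DEST%\" >nul || echo FAILED: %%F
)

rem Restore the codepage that was active before.
chcp %OLDCP% >nul
endlocal
```

Two caveats: for /f skips lines that start with a semicolon by default, which should not matter for paths beginning with a drive letter, and if the playlist does carry a BOM, it would presumably end up glued to the first path and make that one copy fail.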