I have a playlist of slightly more than 10,000 mp3 files. The music library has a total of about 40,000 tracks. I decided to write a batch file to copy out only the files on the playlist to a directory on a different drive. I used Notepad++ to modify the text playlist file so I would get the path/file names correct. I simply needed to add quotes around the path/file names, prefix the lines with “copy ” and suffix the lines with the destination drive/directory. Did all that, made sure the batch file was saved as UTF-8, and executed it.
After a few minutes the batch file completed. When I checked the destination directory I noticed that about 70 files had not copied over. I used ‘Beyond Compare’ on the original playlist file against a playlist file I made from the files that did copy over. What I noticed was that the files that did not copy over had what I will call ‘European’ characters in the filename. So like “Dov' é L'Amore.mp3” and “José Feliciano - Feliciano! - 01 - California Dreamin'.mp3.” Other files with exclamation marks did not copy either.
I reran the file substituting ‘xcopy’ instead of ‘copy’ – same result. On to Robocopy - same result. At this point I decided to try and copy over one of the problem files using Robocopy at a command prompt to see what errors it reported. Surprise, surprise - it copied over, as did the others. So Robocopy at the command level will copy the files, but not in a batch file saved as UTF-8??
As a last resort I decided to try using Powershell. But as I am inexperienced in using it, I asked ChatGPT to write a script for me, and this is what it returned.
# Source and destination paths
$sourcePath = "F:\Directory\Music\Cher - Bob’s Cher Mix"
$destinationPath = "X:\BoboFMDrive"
# File name with special characters
$fileName = "Cher - My Cher Mix - 02 - Dov' é L'Amore.mp3"
# Full path of the source file
$sourceFile = Join-Path -Path $sourcePath -ChildPath $fileName
# Full path of the destination file
$destinationFile = Join-Path -Path $destinationPath -ChildPath $fileName
# Copy the file to the destination
Copy-Item -Path $sourceFile -Destination $destinationFile -Force
Write-Host "File copied successfully!"
And it worked! However, I am looking for a solution that will let me easily edit a text-based file with many lines/strings, as it would be onerous to have to create a script for each file. Does anyone have any thoughts on a solution? I ended up just using ‘Beyond Compare’ and copied over the dropped files manually, but would like to find a better/easier solution for the future.
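The problem is the code page. By default, Windows does not use UTF-8. The console uses the local OEM code page (for example 850), while GUI applications use the local ANSI code page (for example Windows-1252).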
The code page of UTF-8 is 65001.

Commandline Test:
Prepare:
Create some filenames using different codepages:
Check for Differences:
As you can see, there is no difference. Obviously Windows internally converts the character set in use before the filename is written to the filesystem.
Result:
Therefore you have no problems when using the command line or a batch file that does not evaluate the content of a file.
File Test:
Prepare:
Using the Notepad.exe of Windows you can choose the file encoding during the action Save as ....
Create three files with the text Dov' é L'Amore.
Save them encoded as ANSI, UTF-8, and UTF-8 with BOM.
Check for Differences:
Please note the Ú in the ansi.txt content!
This is the difference between 850 = Latin1 and 1252 = Windows-1252.
As a GUI app, Notepad.exe saved "ANSI" using the character set "Windows-1252".
(Note/compare the space before the text in utf8_boom.txt's content.)
In contrast to the filesystem, within a file the encoding in conjunction with the code page is relevant.
If it gets out of sync the processed filenames will differ from the ones found in the filesystem.
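The mismatch described in the file test can be reproduced outside the console. The following is only a minimal sketch in Python (not part of the original test), assuming the console code page is 850 and that Notepad's "ANSI" means Windows-1252; the helper name read_as is made up for illustration. It encodes the sample title three ways and then decodes each byte sequence the way a console running code page 850 would.

```python
# Sample title used in the file test.
title = "Dov' é L'Amore"

# The bytes each "Save as" encoding would write to disk.
ansi_bytes = title.encode("cp1252")    # Notepad "ANSI" (assumed Windows-1252)
utf8_bytes = title.encode("utf-8")     # UTF-8 without BOM
bom_bytes = title.encode("utf-8-sig")  # UTF-8 with BOM (EF BB BF prefix)


def read_as(data: bytes, codepage: str) -> str:
    """Decode raw file bytes the way a program using `codepage` would."""
    return data.decode(codepage, errors="replace")


# Show the raw bytes and what a code page 850 console makes of them.
for label, data in [("ansi", ansi_bytes), ("utf8", utf8_bytes), ("utf8_bom", bom_bytes)]:
    print(f"{label:9} {data.hex(' ')}")
    print(f"{'':9} as cp850: {read_as(data, 'cp850')}")

# Decoding with the encoding that was used for saving restores the title.
print(read_as(utf8_bytes, "utf-8") == title)
```

The ANSI bytes come back with Ú in place of é, and the UTF-8 bytes turn the single é into two unrelated characters, so neither string matches the filename stored in the filesystem.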
Result:
The practical part:
For scripts involving a UTF-8 text file, temporarily change the code page to UTF-8 (chcp 65001). To limit the change to the runtime of the batch, the code should be enclosed by setlocal/endlocal.
As the file test shows, storing UTF-8 with or without BOM (byte order mark) makes no difference for the displayed characters, but the BOM adds three extra bytes at the start of the file. So it is better to store UTF-8 without BOM, as these bytes can confuse programs, especially when exchanging files with other operating systems.
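If changing the code page is not an option, the playlist can also be processed by a script that declares the file encoding itself. This swaps in a different technique from the batch approach above. The sketch below is in Python and rests on assumptions not stated in the question: the playlist is a plain text file with one full path per line, lines starting with # are comments (as in M3U playlists), and the function name copy_playlist is hypothetical.

```python
import shutil
from pathlib import Path


def copy_playlist(playlist: Path, destination: Path) -> list[str]:
    """Copy every file listed in a UTF-8 playlist to `destination`.

    Returns the playlist entries that could not be found, so dropped
    files can be reviewed afterwards.
    """
    destination.mkdir(parents=True, exist_ok=True)
    missing = []
    # "utf-8-sig" reads UTF-8 with or without a BOM and drops the BOM if present.
    with playlist.open(encoding="utf-8-sig") as lines:
        for line in lines:
            entry = line.strip()
            # Skip blank lines and comment lines (assumed M3U-style "#").
            if not entry or entry.startswith("#"):
                continue
            source = Path(entry)
            if source.is_file():
                shutil.copy2(source, destination / source.name)
            else:
                missing.append(entry)
    return missing
```

A call such as copy_playlist(Path("playlist.txt"), Path(r"X:\BoboFMDrive")) would then copy the listed files and return the entries it could not find. Because the encoding is named explicitly when the file is opened, the result does not depend on the active console code page.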