strtr acting weird - removing diacritics from a string


I'm having a hard time removing diacritics from a $string. My code is:

<?php
$string = "Příliš žluťoučký kůň úpěl ďábelské ódy.";
$without_diacritics = strTr($string, "říšžťčýůúěďó", "risztcyuuedo");
echo $without_diacritics; 

The expected output is Prilis zlutoucky kun upel dabelske ody.

Instead, I get this very weird result:

Puiszliuc uuluueoudoks� ku�u� s�pd�l d�scbelsks� s�dy.

I thought it might be a problem with multibyte characters, but I read that strtr() is multibyte safe. Is that assumption wrong? What am I missing?


Accepted answer by Ja͢ck:

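The problem is that strtr() works with bytes, not characters. In UTF-8 every accented letter in your input translation string takes two bytes, so that string (24 bytes) is twice as long as the output translation string (12 bytes); strtr() maps the first 12 bytes one by one and ignores the rest. A translation array, which replaces whole substrings, works better in this case: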

$string = "Příliš žluťoučký kůň úpěl ďábelské ódy.";

echo strtr($string, [
  'ř' => 'r',
  'í' => 'i',
  'š' => 's',
  'ž' => 'z',
  'ť' => 't',
  'č' => 'c',
  'ý' => 'y',
  'ů' => 'u',
  'ú' => 'u',
  'ě' => 'e',
  'ď' => 'd',
  'ó' => 'o'
]);

Output:

Prilis zlutoucky kuň upel dábelské ody.
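If you would rather keep the two-string call from the question, one option is to split both strings into characters and build the array from them. The sketch below uses a hypothetical helper, strtr_chars() (not a PHP built-in), and also adds ň, á and é, which neither the question's strings nor the array above map (hence kuň and dábelské in the output). It assumes the script file is saved as UTF-8, and it uses preg_split() with the /u modifier rather than mb_str_split() so it does not depend on the mbstring extension.

```php
<?php
// Hypothetical helper (not part of PHP): pairs up the characters of two
// UTF-8 strings and passes the result to the array form of strtr().
function strtr_chars(string $string, string $from, string $to): string
{
    // The /u modifier makes preg_split() split on character boundaries, not bytes.
    $fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $toChars   = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);

    if (count($fromChars) !== count($toChars)) {
        throw new InvalidArgumentException('$from and $to must contain the same number of characters');
    }

    // e.g. ['ř' => 'r', 'í' => 'i', ...]
    return strtr($string, array_combine($fromChars, $toChars));
}

$string = "Příliš žluťoučký kůň úpěl ďábelské ódy.";
// Same mapping as in the question, plus ň, á and é.
echo strtr_chars($string, "říšžťčýůúěďóňáé", "risztcyuuedonae"), "\n";
// Prilis zlutoucky kun upel dabelske ody.
```

Because the array form matches whole substrings, each multibyte letter is replaced as a unit and no stray bytes are left behind.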


Answer by Darren:

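A simple, tried-and-tested solution (based on an existing answer) uses iconv() to convert the string "from your given encoding to ASCII characters". Appending //TRANSLIT to the target encoding tells iconv() to approximate characters that ASCII cannot represent with similar-looking ones, so ř becomes r, á becomes a, and so on.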

$input = 'Příliš žluťoučký kůň úpěl ďábelské ódy.';
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
echo $input;

Explanation

The issue you're facing comes from the encoding of the string (and of the source file): it is UTF-8, and strtr() isn't multibyte aware, as @ChrisForrence pointed out in his comment.

Each of those accented characters is more than one byte long, so strtr() maps individual bytes instead of whole characters and the result comes out garbled.
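A quick way to confirm this, assuming the script file is saved as UTF-8, is to look at the byte lengths and at what the question's call does to a single letter. The PHP manual states that when the two translation strings differ in length, the extra characters of the longer one are ignored; the lead byte 0xC5 then appears several times among the first 12 bytes, and the last mapping for it ('u') wins.

```php
<?php
$from = "říšžťčýůúěďó";
$to   = "risztcyuuedo";

// strlen() counts bytes: each of these 12 letters is 2 bytes in UTF-8.
echo strlen($from), " vs ", strlen($to), "\n"; // 24 vs 12

// "ř" is the two bytes 0xC5 0x99; strtr() translates each byte separately.
echo bin2hex("ř"), "\n";                 // c599
echo strtr("ř", $from, $to), "\n";       // ui
echo strtr("Příliš", $from, $to), "\n";  // Puiszliuc, matching the question's output
```

Bytes that have no mapping, such as the second byte of ů, are copied through on their own, which is why invalid-UTF-8 replacement marks (�) show up in the question's output.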