Strange characters in (invalid) JSON string from POST request (encoding issues)


I am trying to read the body of a POST request using the following line:

$data = file_get_contents('php://input');

The JSON string might look like this: {"test" : "test one \xe0 "}

The problem is that json_decode($data) returns null. When I var_dump() $data, I see sequences such as \xe0 and \xe7a.

The data is sent as UTF-8. I also tried utf8_decode($data), with no luck. Could someone explain what I am missing or how to solve this issue?

I need to convert the invalid JSON from:

$data = '{"test" : "test one \xe0 "}';

to:

$data = '{"test" : "test one à "}';

mickmackusa (best answer)

Mutating a JSON string with string functions should always be done with caution, because a false-positive replacement can easily damage the payload. That said, here is a script that attempts to correct your invalid JSON string.

Code:

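$json = '{"test" : "test one \xe0, \x270B"}';

// Rewrite each \x<hex> escape as a zero-padded \u<hex> escape, which JSON accepts.
// Note: + is greedy, so any hex digits that follow the escape are consumed too
// (e.g. \xe7a becomes \u0e7a).
$json = preg_replace_callback(
    '/\\\\x([[:xdigit:]]+)/',
    fn($m) => sprintf('\u%04s', $m[1]),
    $json
);

echo "\n" . var_export(json_validate($json), true); // json_validate() requires PHP 8.3+
echo "\n$json\n";
var_export(json_decode($json));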

Output:

true
{"test" : "test one \u00e0, \u270B"}
(object) array(
   'test' => 'test one à, ✋',
)

If this has known flaws, please leave a comment below and I'll endeavor to overcome the issue when I have time.

A related answer of mine: Replace all hex sequences with ascii characters
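
One limitation worth illustrating: the + quantifier consumes every hex digit after \x, so the \xe7a from the question would become \u0e7a rather than ç followed by a, and a properly escaped backslash such as \\x41 would be rewritten as well. Below is a minimal sketch of a stricter variant. It assumes the sender only ever emits two-digit \xNN escapes for Latin-1 code points; fixHexEscapes is a hypothetical helper name, not part of the script above.

```php
<?php
// Hypothetical helper: rewrites \xNN as \u00NN, but only when the
// backslash is not itself part of an escaped backslash (\\).
function fixHexEscapes(string $json): string
{
    return preg_replace_callback(
        // Alternative 1: an escaped backslash, matched only so it can be skipped.
        // Alternative 2: \x followed by exactly two hex digits (captured).
        '/\\\\\\\\|\\\\x([[:xdigit:]]{2})/',
        fn(array $m) => isset($m[1]) ? '\u00' . $m[1] : $m[0],
        $json
    );
}

// The JSON text here is: {"a": "\xe7a", "b": "C:\\x41"}
$json = '{"a": "\xe7a", "b": "C:\\\\x41"}';
$decoded = json_decode(fixHexEscapes($json), true);

echo $decoded['a'], "\n"; // ça
echo $decoded['b'], "\n"; // C:\x41
```

Matching \\ as its own alternative lets the regex engine step over escaped backslashes, so a literal backslash followed by x in the payload is left intact. If four-digit sequences such as \x270B can really occur, the two-digit assumption does not hold and the input is ambiguous.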

Olivier

A way to fix your JSON is to replace the invalid \xNN sequences with valid \u00NN sequences:

$data = '{"test" : "test one \xe0 "}';
$val = json_decode(str_replace('\x', '\u00', $data));
echo $val->test;

Output:

test one à
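
To see why the raw payload decodes to null, and to avoid silently getting null again if the repair is incomplete, the same replacement can be combined with strict decoding. This is a sketch only: decodeRepaired is a hypothetical helper name, and JSON_THROW_ON_ERROR needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: applies the \x -> \u00 replacement, then decodes strictly.
function decodeRepaired(string $data)
{
    $repaired = str_replace('\x', '\u00', $data);
    // JSON_THROW_ON_ERROR raises JsonException instead of returning null.
    return json_decode($repaired, false, 512, JSON_THROW_ON_ERROR);
}

$data = '{"test" : "test one \xe0 "}';

try {
    json_decode($data, false, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    // The raw payload is rejected because \x is not a valid JSON escape.
    echo 'Raw payload rejected: ', $e->getMessage(), "\n";
}

echo decodeRepaired($data)->test, "\n";
```

The mapping is only correct when each \xNN stands for a Latin-1 code point, as \xe0 does for à. If the sender had escaped raw UTF-8 bytes instead (for example \xc3\xa0 for à), the replacement would produce two wrong characters.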