Replacing carriage returns (CR) or line feeds (LF) in JSON


I'm running into an issue with JSON, which I admittedly know little about:

I have UTF-8 encoded text, formatted with line feeds and/or carriage returns, which I'm packaging in FME Workbench to send to the API of a software called Desk Alerts. The initial text is read from a text file and looks something like this:

The enhanced alarm included the following information and instructions:

 
********************************************************************************
 blah blah blah blah
 
 additional blah blah info blah
 
 third line of blah blah blah
******************************************************************************

Note that there are line feeds at the end of each of those lines. The real messages are usually quite verbose, and the final message is built from several files, so preserving the individual lines in some form is important.

The schema for the API is relatively simple - it looks something like this:

{
  "body": "@Value(Body)",
  "title": "@Value(Subject)",
  "templateId": 4,
  "skinId": 1,
  "isEveryone": true,
  "recipients": []
}

...where the @Value() placeholders are dynamic variables containing text like the example above.

When I send these requests, I get the following error:

{
  "status": "Error",
  "errorMessage": "Request model invalid",
  "data": {
    "validationErrors": [
      "[queryParam] : The queryParam field is required.",
      "[$.body] : '0x0A' is invalid within a JSON string. The string should be correctly escaped. 

...which, thanks to some Googling, I know means "we don't like \n". But all the recommendations I have come across so far boil down to using \\n instead of \n so that the text is "properly escaped".

So I used a regex search for [\n] and replaced it with \\n. JSON now accepts it, but it gives me messages that look like this:

The enhanced alarm included the following information and instructions:\n \n \n ********************************************************************************\n \n THE FIRE TROUBLE CIRCUIT AT CLINICAL CENTER B BLDG IS REPORTING A MALFUNCTION!\n \n ********************************************************************************\n 3\n \n \tTHIS INDICATES A MALFUNCTION OF THE FIRE SYSTEM\n \tMONDAY THRU FRIDAY, 8:00 TO 4:30 PM, PLEASE NOTIFY PHYSICAL PLANT\n \tMAINTENANCE DISPATCH AT 3-1760.\n \tALL OTHER TIMES, NOTIFY THE CAMPUS TELEPHONE OPERATOR.\n \n ******************************************************************************\n \n

As you may have picked up from the message, I have the same problem with tabs.

Am I doing this wrong? Any advice would be appreciated.

Answer by AmigoJack

Let's look at RFC 8259, section 7 ("Strings"), which defines what Strings in JSON are:

A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

  • 0x0A (line feed = LF) is a control character
  • 0x09 (horizontal tabulator = HT) is a control character
  • 0x0D (carriage return = CR) is a control character
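To see this in practice, here is a small sketch using Python's standard json module as a stand-in for a strict JSON parser (an assumption for illustration; the API in the question is not necessarily Python-based). With its default strict=True, it rejects each of these raw control characters inside a String, but accepts the escaped form:

```python
import json

# Raw control characters that RFC 8259 forbids inside a String
samples = {"LF": "\n", "HT": "\t", "CR": "\r"}

for name, ch in samples.items():
    raw_doc = '{"body": "before' + ch + 'after"}'
    try:
        json.loads(raw_doc)  # strict=True (the default) rejects control characters in Strings
        print(name, "raw: accepted")
    except json.JSONDecodeError:
        print(name, "raw: rejected")

# The same line feed written as the two-character escape \n is valid JSON
escaped_doc = '{"body": "before\\nafter"}'
print(json.loads(escaped_doc)["body"] == "before\nafter")  # True: decodes to a real LF
```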

Your JSON is invalid and the error message is correct. You have to escape every character that is not allowed:

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point.
...
Alternatively, there are two-character sequence escape representations of some popular characters.

Examples:

Character  Name                       Code-point escape  Popular alternative
0x09       horizontal tabulator (HT)  \u0009             \t
0x0A       line feed (LF)             \u000A             \n
0x0D       carriage return (CR)       \u000D             \r
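Both escape forms in each row decode to the same character. A quick check, again using Python's json module purely for illustration (hex digits in \uXXXX escapes are case-insensitive, so \u000A and \u000a are equivalent):

```python
import json

# (code-point escape, short escape, expected character) for each row of the table
rows = [
    ("\\u0009", "\\t", "\t"),  # horizontal tabulator (HT)
    ("\\u000A", "\\n", "\n"),  # line feed (LF)
    ("\\u000D", "\\r", "\r"),  # carriage return (CR)
]

for long_form, short_form, expected in rows:
    # Wrap each escape in quotation marks so it forms a complete JSON String
    assert json.loads('"' + long_form + '"') == expected
    assert json.loads('"' + short_form + '"') == expected

print("both escape forms decode to the same character")
```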

Line feeds, carriage returns and horizontal tabulators are not the only characters that must be escaped: inside a String you have to escape every control character (0x00 to 0x1F inclusive) as well as \ and ". The most common ones have short escape sequences known from many other programming languages (such as \n and \t), but escaping them by their Unicode code point (as \u000A and \u0009) is equally valid and should be easy to program yourself.
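A minimal sketch of such an escaper in Python, assuming you want to build the String literal yourself. The helper name escape_json_string is hypothetical. It uses the two-character escapes RFC 8259 defines (\", \\, \b, \f, \n, \r, \t) and falls back to the six-character \uXXXX form for every other control character. In practice a JSON library's serializer (in Python, json.dumps) already does this for you.

```python
import json

# Two-character escapes defined by RFC 8259; every other control character
# (U+0000 through U+001F) falls back to the six-character \uXXXX form.
SHORT_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_json_string(text: str) -> str:
    """Return text as a quoted JSON String literal (hypothetical helper)."""
    parts = []
    for ch in text:
        if ch in SHORT_ESCAPES:
            parts.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters: reverse solidus, 'u', four hex digits
            parts.append("\\u%04X" % ord(ch))
        else:
            parts.append(ch)  # any other character may appear literally
    return '"' + "".join(parts) + '"'


sample = 'line 1\n\tindented "quoted" \\ path\r\nbell:\x07'
literal = escape_json_string(sample)
print(literal)
assert json.loads(literal) == sample  # round-trips through a strict parser
```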

Raw line breaks inside a String literal, added for human readability, make the JSON invalid too. Line breaks between tokens are ordinary whitespace and allowed, but a String literal itself must stay on one (possibly very long) line. Most JSON parsers will not care about the length of a String, but they are allowed to impose limits (RFC 8259, section 9):

An implementation may set limits on the length and character contents of strings.
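A pretty-printing serializer shows the difference between the two kinds of line breaks. In this Python sketch the payload mirrors the schema from the question, but the body and title values are made up. json.dumps(..., indent=2) spreads the object over several lines, while the line feed and tab inside "body" are written as \n and \t, so that String stays on a single line:

```python
import json

# Payload shaped like the API schema in the question; the values are made up
payload = {
    "body": "first line\n\tindented second line",  # real LF and tab characters
    "title": "Example subject",
    "templateId": 4,
    "skinId": 1,
    "isEveryone": True,
    "recipients": [],
}

# indent=2 inserts line breaks between tokens for readability; the LF and tab
# inside "body" are written as the escapes \n and \t, so the String stays on one line
pretty = json.dumps(payload, indent=2)
print(pretty)

# The pretty-printed text parses back to the same data
assert json.loads(pretty) == payload
```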