ExtendScript regex not behaving as expected


I'm writing a script for Adobe InDesign that reads data from a CSV file exported from Excel.

Some cells contain commas and quotes, so instead of split(",") I tried splitting with a regex. The regex I'm trying to use is:

/,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/g

A line from the CSV file may look like:

[SP],"Example 3/8"" Wrench, (2) Screwdrivers","Example 3/8"" Wrench, (2) Screwdrivers"

Which should be 3 cells:

Cell 1: [SP]
Cell 2: Example 3/8" Wrench, (2) Screwdrivers
Cell 3: Example 3/8" Wrench, (2) Screwdrivers

The following code works in JavaScript but not in ExtendScript:

var csvData = csvLine.split(/,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/g);

I also tried using https://regex101.com/ as a sanity check, and it identifies commas where I would expect.
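
For reference, here is a minimal sketch of the behavior expected from a standards-compliant engine such as Node.js or a browser (not ExtendScript). The split leaves each quoted cell wrapped in its quotes, so the sketch assumes a small cleanup step. The `unquote` helper is hypothetical and not part of the original script.

```javascript
// Assumes a standards-compliant JavaScript engine, not ExtendScript.
var csvLine = '[SP],"Example 3/8"" Wrench, (2) Screwdrivers","Example 3/8"" Wrench, (2) Screwdrivers"';

// Split on commas that are followed by an even number of quotes up to the end of the line.
var parts = csvLine.split(/,(?=(?:[^"]*"[^"]*")*[^"]*$)/);

// Hypothetical helper: strip one pair of outer quotes, then collapse doubled quotes.
function unquote(cell) {
    if (cell.length >= 2 && cell.charAt(0) === '"' && cell.charAt(cell.length - 1) === '"') {
        cell = cell.substring(1, cell.length - 1);
    }
    return cell.replace(/""/g, '"');
}

var csvData = [];
for (var i = 0; i < parts.length; i++) {
    csvData.push(unquote(parts[i]));
}
```

With this input, csvData should hold the three cells listed above.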

With ExtendScript I got:

Cell 1: [SP]
Cell 2: Example 3/8"" Wrench
Cell 3: Example 3/8" Wrench, (2) Screwdrivers

The second cell is truncated at its embedded comma, but the third cell is not.

Is this a bug in ExtendScript, or am I doing something wrong? Is there a different regex that would work in ExtendScript?

Answer by Yuri Khristich:

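This does look like another ExtendScript glitch.

Here is a 'stupid' but working solution. It masks the commas and doubled quotes inside the cells with placeholders, splits on the remaining commas, and then restores the placeholders: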

var s = '[SP],"Example 3/8"" Wrench, (2) Screwdrivers","Example 3/8"" Wrench, (2) Screwdrivers"';

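// Mask in-cell commas (followed by a space in this data) and doubled quotes,
// drop the remaining quote characters, then split on the bare commas.
var a = s
    .replace(/, /g,'<comma and space>')
    .replace(/""/g,'<inch>')
    .replace(/"/g,'')
    .split(',');

// Restore the masked characters in each cell.
for (var i=0; i<a.length; i++) {
    a[i] = a[i]
        .replace(/<comma and space>/g,', ')
        .replace(/<inch>/g,'"');
}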

alert(a.join('\n'));


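It looks pretty awful, but on the other hand you can see exactly what the code does, which is good for maintenance. Note that it relies on commas inside a cell always being followed by a space, and on the separator commas never being followed by one.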
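
If the data cannot be trusted to follow that spacing pattern, another option is to skip regex entirely and walk the line character by character. The following is only a sketch, not something from the original post: `parseCsvLine` is a hypothetical helper, it handles a single line (no line breaks inside quoted cells), and it sticks to ES3-style syntax (`var`, `charAt`) on the assumption that this is what ExtendScript supports.

```javascript
// Hypothetical helper: splits one CSV line into cells without any regex,
// so it does not depend on how the engine handles lookahead.
function parseCsvLine(line) {
    var cells = [];
    var cell = '';
    var inQuotes = false;

    for (var i = 0; i < line.length; i++) {
        var ch = line.charAt(i);

        if (inQuotes) {
            if (ch === '"') {
                if (line.charAt(i + 1) === '"') {
                    cell += '"';      // a doubled quote inside a quoted cell is a literal quote
                    i++;              // skip the second quote of the pair
                } else {
                    inQuotes = false; // closing quote of the cell
                }
            } else {
                cell += ch;           // commas inside quotes are kept as data
            }
        } else if (ch === '"') {
            inQuotes = true;          // opening quote of a quoted cell
        } else if (ch === ',') {
            cells.push(cell);         // a comma outside quotes ends the cell
            cell = '';
        } else {
            cell += ch;
        }
    }

    cells.push(cell);                 // last cell has no trailing comma
    return cells;
}

var s = '[SP],"Example 3/8"" Wrench, (2) Screwdrivers","Example 3/8"" Wrench, (2) Screwdrivers"';
var cells = parseCsvLine(s);
```

For the sample line this should give three cells, with the doubled quote collapsed to a single inch mark. In InDesign the result could be inspected with alert(cells.join('\n')), as in the code above.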