What is the Miller command for separating emails into their own rows, while also copying down other column data?

144 Views Asked by At

I have a very large csv file (213,265 rows) with many columns.

In one of those columns I have some emails seperated by commas. A trimmed down version of the csv file looks like this:

enter image description here

I would like to use Miller to seperate out those emails into their own rows, but also copy down ALL the other columns in the spreadsheet (many of which aren't shown here in this simple example).

Following on with this example, I would like to end with something like this. But keep in mind the real spreadsheet has many other columns before and after the email column:

enter image description here

Is that possible to do with Miller (or another similar tool)? What would the command look like?

1

There are 1 best solutions below

2
aborruso On BEST ANSWER

The verb is nest. Starting from

company,address,email
anna,123 fake,"[email protected],[email protected],[email protected]"

and running

mlr --csv nest --explode --values --across-records --nested-fs "," -f email input.csv

you will have

+---------+----------+-----------------+
| company | address  | email           |
+---------+----------+-----------------+
| anna    | 123 fake | [email protected]    |
| anna    | 123 fake | [email protected] |
| anna    | 123 fake | [email protected]       |
+---------+----------+-----------------+

If you have a "bad" CSV, you could have some problems and you should try to clean it. A generic clean command could be this one:

mlr --csv -N clean-whitespace then remove-empty-columns then skip-trivial-records then cat -n sample.csv | mlr --csv nest --explode --values --across-records --nested-fs "," -f Email >output.csv

It removes empty rows, empty columns and wrong white spaces.