Read csv/tsv file line by line and split result in Golang


I want to read a TSV file in Golang, line by line, and process one column of the data.

Here is what my data looks like:

foo bar baz
1 2 10
2 3 50

Here is C++ code that works well for my purpose:

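#include <fstream>
#include <iostream>
#include <string>
int main() {
    std::ifstream iFile("input.tsv");
    std::string str;
    iFile >> str >> str >> str; // Skip the header line (three words)
    int x, y, w, sum = 0;
    while (iFile >> x >> y >> w) {
        sum += w; // A simple process on the third column
    }
    std::cout << sum << '\n';
}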

But I can't find similar code for Go. Here is my code for reading a TSV file in Go:

file, _ := os.Open("input.tsv")
defer file.Close()
csvReader := csv.NewReader(file)
csvReader.Comma = '\t'
for {
    rec, err := csvReader.Read()
    if err == io.EOF {
        break // stop at the end of the file
    }
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%+v\n", rec[0])
}

The result will be:

1   2   10
2   3   50

But I can't split each record into x, y, and w. Each line of output is a single string, but I'd prefer a list so I can easily access w (the third item).
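A record returned by `csv.Reader.Read` is already a `[]string`, so once the reader's delimiter matches the file, x, y and w are simply `rec[0]`, `rec[1]` and `rec[2]`. A minimal sketch of turning one record into integers, assuming all three columns hold base-10 integers; `parseRow` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseRow converts one record (as returned by csv.Reader.Read) into three
// integers. It assumes every field is a base-10 integer.
func parseRow(rec []string) (x, y, w int, err error) {
	if len(rec) != 3 {
		return 0, 0, 0, fmt.Errorf("expected 3 fields, got %d: %q", len(rec), rec)
	}
	if x, err = strconv.Atoi(rec[0]); err != nil {
		return 0, 0, 0, err
	}
	if y, err = strconv.Atoi(rec[1]); err != nil {
		return 0, 0, 0, err
	}
	if w, err = strconv.Atoi(rec[2]); err != nil {
		return 0, 0, 0, err
	}
	return x, y, w, nil
}

func main() {
	x, y, w, err := parseRow([]string{"1", "2", "10"})
	fmt.Println(x, y, w, err) // 1 2 10 <nil>
}
```

A record that arrived as one unsplit string (the space-separated case) fails the length check instead of panicking on `rec[2]`.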

P.S.: The problem is solved. As @JimB showed here, the code works.

When the TSV file is correctly tab-separated, each record (printed with %#v) is:

[]string{"1", "2", "10"}
[]string{"2", "3", "50"}

But if the file uses multiple spaces instead of tabs, each line comes back as a single field:

[]string{"1   2   10"}
[]string{"2   3   50"}

Which is not what we want.

Answer by Jadefox10200

If I understand your question correctly, you want to load the CSV, turn it into a nested slice ([][]string), loop through it, and access field [2] of each inner slice. I think you're already pretty close, but this is the code I've used:

// filePath is the path of the master file
f, err := os.Open(filePath)
if err != nil {
    log.Fatalf("ERROR: error opening file: %v", err)
}
defer f.Close()

// pass the file to the reader; a TSV file needs a tab delimiter
r := csv.NewReader(f)
r.Comma = '\t'

recs, err := r.ReadAll()
if err != nil {
    fmt.Printf("Failed to read all records: %s\n", err)
    return
}

for _, rec := range recs {
    // Do something with the w field (the third column)
    w := rec[2]
    fmt.Println(w)
}

You can also read the file line by line with a callback using the gocsv package (github.com/gocarina/gocsv). I usually don't do this; I simply unmarshal the whole file and then iterate over it, but this is how I would do it line by line:

package main

import (
    "fmt"
    "os"

    "github.com/gocarina/gocsv"
)

func main() {
    file, err := os.Open("csvTest.csv")
    if err != nil {
        panic(err)
    }
    defer file.Close()

    // CSV READ: the callback is called once for each parsed row
    err = gocsv.UnmarshalToCallback(file, func(r Job_entry) {
        fmt.Println(r.Title)
    })
    if err != nil {
        panic(err)
    }
}

type Job_entry struct {
    Job_Id   int64  `csv:"-"`        // ignored by gocsv (e.g. 49914)
    Title    string `csv:"TITLE"`    // e.g. BOOKLET
    Language string `csv:"LANGUAGE"` // e.g. ENGLISH
    Quantity int64  `csv:"QTY"`      // e.g. 684
}

Using a file such as:

ID,TITLE,LANGUAGE,QTY
EN.BKLT,Booklet,ENGLISH,400
EN.FORM,A form,ENGLISH,300
JA.BKLT,Booklet,JAPANESE,300

However, I usually use gocsv to unmarshal the whole file at once, like this:

package main

import (
    "fmt"
    "os"

    "github.com/gocarina/gocsv"
)

func main() {
    file, err := os.Open("csvTest.csv")
    if err != nil {
        panic(err)
    }
    defer file.Close()

    // CSV READ: unmarshal every row into the slice at once
    csvRecords := []*Job_entry{}
    if err := gocsv.Unmarshal(file, &csvRecords); err != nil {
        fmt.Printf("There was an error loading the CSV file: %v\n", err)
        return
    }

    for k, v := range csvRecords {
        fmt.Printf("Rec: %v\t%v\n", k, v.Title)
    }
}

type Job_entry struct {
    Job_Id   int64  `csv:"-"`        // ignored by gocsv (e.g. 49914)
    Title    string `csv:"TITLE"`    // e.g. BOOKLET
    Language string `csv:"LANGUAGE"` // e.g. ENGLISH
    Quantity int64  `csv:"QTY"`      // e.g. 684
}

Answer by Zach Young

Here's a complete program that will open the TSV, read it using Go's csv Reader, convert the 3rd column to an int, and accumulate the sum of the ints:

package main

import (
    "encoding/csv"
    "fmt"
    "os"
    "strconv"
)

func main() {
    f, err := os.Open("input.tsv")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    r := csv.NewReader(f)
    r.Comma = '\t'

    records, err := r.ReadAll()
    if err != nil {
        panic(err)
    }
    records = records[1:] // reslice to omit header

    sum := 0
    for _, record := range records {
        x, err := strconv.Atoi(record[2])
        if err != nil {
            panic(err)
        }
        sum += x
    }

    fmt.Println(sum)
}

I don't even begin to understand the C++ code, except that I can see the tab character is never specified. I assume it works because >> skips any leading whitespace (spaces, tabs, and newlines alike) before reading each value. To me it looks short, and magical, compared to the Go code.