PHP How to handle/parse csv files that have missing columns

686 Views Asked by At

I have many csv files generated by a third party, for which I have no say or control.

So each day I must import these csv data to mysql.

Some tables have correct matching number of columns to header.

Others do not.

Even when I did a prepared statement, it still did not import.

I tried to create a repair csv function, to add extra columns to each row, if their count of columns was less than the count of header columns.

As part of this project I am using the composer package league csv.

https://csv.thephpleague.com/

But here is my function code:

public function repaircsv(string $filepath) {
    
    // make sure incoming file exists
    if (!file_exists($filepath)) {
        
        // return nothing
        return;
    }
    
    // setup variables
    $tempfile = pathinfo($filepath,PATHINFO_DIRNAME).'temp.csv';
    $counter = 0;
    $colcount = 0;
    $myline = '';
    
    // check if temp file exists if it does delete it
    if (file_exists($tempfile)) {
        
        // delete the temp file
        unlink($tempfile);
        
    }

    // C:\Users\admin\vendor\league\csv
    require('C:\Users\admin\vendor\league\csv\autoload.php');
    
    // step one get header column count
    $csv = Reader::createFromPath($filepath);
    
    // set the header offset
    $csv->setHeaderOffset(0);

    //returns the CSV header record
    $header = $csv->getHeader();

    // get the header column count
    $header_count = count($header);
    
    // check if greater than zero and not null
    if ($header_count < 1 || empty($header_count)) {
        
        // return nothing
        return $header_count;
        
    }
    
    // loop thru csv file
    // now read file line by line skipping line 1
    $file = fopen($filepath, 'r');
    $temp = fopen($tempfile, 'w');

    // loop thru each line
    while (($line = fgetcsv($file)) !== FALSE) {
        
        // if first row just straight append
        if ($counter = 0) {
            
            // append line to temp file
            fputcsv($temp, $line);
            
        }
        
        // if all other rows compare column count to header column count
        if ($counter > 0) {
            
            // get column count for normal rows
            $colcount = count($line);
            
            // compare to header column count
            $coldif = $header_count - $colcount;
            
            // loop til difference is zero
            while ($colcount != $header_count) {
            
                // add to line extra comma
                $line .= ',';
                
                // get new column count
                $colcount = count($line);
                
            }
            
            // append to temp file
            fputcsv($temp, $line);
            
            // show each line
            $myline .= 'Line: ['.$line.']<br/><br/>';
            
        }
        
        // increment counter
        $counter++;

    }
    
    // check file size of temp file
    $fs = filesize($tempfile);
    
    // if below 200 ignore and do not copy
    if ($fs > 200) {
    
        // copy temp to original filename
        copy($tempfile,$filepath);
        
    }
    
    return $myline;
}

The logic is to copy the original csv file to a new temp csv file and add extra commas to rows of data that have missing columns.

Thank you for any help.

Edit: So the various csv's contain private data, so I can not share them.

But let us for example say i download multiple csvs for different data daily.

Each csv has a header row, and data.

If the number of columns in each row isn't 100% the same number of columns as in the header, it errors out.

If there are any special characters, it errors out.

There are 1000's of rows of data.

The code above is my first attempt to try to fix rows that have missing columns.

Here is an example FirstName, LastName, Email Steve,Jobs ,Johnson,[email protected]

Just a very small example.

I have no control of how the csvs are created, I do control the download process and import process.

Which then i use the csv data to update mysql tables.

I have tried the load data infile but that errors out too.

So I need to fix the csv files after they are downloaded.

Any ideas?

1

There are 1 best solutions below

0
Miroslav Ćurčić On

Do not mix array and string, instead of $line .= ','; do $Line[]= '';

Also fix:

$myline .= 'Line: ['.implode(',', $line).']<br/><br/>';

Suggestion, you can replace your while loop with:

$line = array_pad($line, $header_count, ''); // append missing items
$line = array_slice($line, 0, $header_count); // remove eventual excess items