How to import an arff file to a pandas df and later convert it to arff again


I want to preprocess a dataset stored in an ARFF file with scikit-learn, and later use the preprocessed dataset with a python-weka-wrapper3 model. So I need a function that loads the ARFF file into a DataFrame (or converts it to CSV), and another that writes the edited DataFrame back to ARFF (or converts a CSV to ARFF).

Some people recommend liac-arff (https://github.com/renatopp/liac-arff), but I don't know how to do this with that library.

So if anyone knows a function or some well-explained Python 3 code for this, I'd appreciate it.

In my case I tried with this function:

def arff2csv(arff_path, csv_path=None):
    """Copy the @data section of an ARFF file to a CSV file with a header row."""
    if csv_path is None:
        csv_path = arff_path[:-4] + 'csv'  # *.arff -> *.csv
    attributes = []
    in_data = False
    with open(arff_path, 'r') as fr, open(csv_path, 'w') as fw:
        for line in fr:
            # ARFF keywords are case-insensitive (@DATA, @data, @Attribute, ...)
            keyword = line.strip().lower()
            if in_data:
                fw.write(line)
            elif keyword.startswith('@data'):
                # Header is complete: write the column names, then copy the rows
                fw.write(','.join(attributes) + '\n')
                in_data = True
            elif keyword.startswith('@attribute'):
                # @attribute attribute_tag numeric
                attributes.append(line.split()[1])
    print("Converted {} to {}.".format(arff_path, csv_path))
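Note that the function above keeps only the attribute names, while the attribute types are needed to write a valid ARFF file again later. As a rough sketch using only the standard library, the header could be parsed so that the types are preserved. The name `parse_arff` is hypothetical, and the sketch assumes a dense (non-sparse) ARFF file without escaped quotes inside attribute names.

```python
import csv
import re

# Matches "@attribute <name> <type>", where <name> may be quoted.
_ATTR_RE = re.compile(r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.+)", re.IGNORECASE)


def parse_arff(text):
    """Return (relation, attributes, rows) parsed from ARFF text.

    Hypothetical helper: handles dense ARFF only. All data values are
    returned as strings; no type conversion is done.
    """
    relation = None
    attributes = []  # list of (name, type_string) pairs
    data_lines = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('%'):
            continue  # skip blank lines and ARFF comments
        if in_data:
            data_lines.append(line)
            continue
        lower = line.lower()
        if lower.startswith('@relation'):
            parts = line.split(None, 1)
            relation = parts[1].strip('\'"') if len(parts) > 1 else ''
        elif lower.startswith('@attribute'):
            match = _ATTR_RE.match(line)
            if match is None:
                raise ValueError('Cannot parse attribute line: ' + line)
            name = match.group(1).strip('\'"')
            attributes.append((name, match.group(2).strip()))
        elif lower.startswith('@data'):
            in_data = True
    # ARFF data rows are comma-separated; values may be single-quoted.
    rows = list(csv.reader(data_lines, quotechar="'", skipinitialspace=True))
    return relation, attributes, rows
```

With pandas, the result could then be loaded with `pandas.DataFrame(rows, columns=[name for name, _ in attributes])`; the values arrive as strings, so numeric columns still need converting, for example with `pandas.to_numeric`.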

There are 2 best solutions below

fracpete On

If you want to stay within the scikit-learn ecosystem, you could have a look at the sklearn-weka-plugin library, which uses python-weka-wrapper3 under the hood.

BTW python-weka-wrapper3 can create datasets directly from numpy matrices. Examples: [1], [2]

Vikram On

I found this answer on Analytics Vidhya, which solved my problem of loading the ARFF data format into a pandas DataFrame.

https://discuss.analyticsvidhya.com/t/loading-arff-type-files-in-python/27419/2

from scipy.io import arff
import pandas

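# loadarff returns a (data, meta) tuple; data is a NumPy structured array
data, meta = arff.loadarff('data_file_name.arff')
df = pandas.DataFrame(data)
# Nominal attributes are loaded as bytes objects and may need decoding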
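The snippet above only covers loading. For the way back, here is a minimal sketch of writing rows out as ARFF text using only the standard library. The name `to_arff` is hypothetical, and the attribute types are assumed to be supplied by the caller (for example, taken from the original file's header), since they cannot always be recovered from the values alone.

```python
def to_arff(relation, attributes, rows):
    """Return ARFF text for a dense dataset.

    Hypothetical helper. `attributes` is a list of (name, type) pairs,
    where type is an ARFF type string such as 'numeric' or a nominal
    specification like '{a,b}'. None and NaN are written as '?', the
    ARFF marker for a missing value.
    """
    def is_missing(value):
        # NaN is the only value that is not equal to itself.
        return value is None or value != value

    def quote(value):
        text = str(value)
        # Quote anything containing characters with a meaning in ARFF.
        if text == '' or any(ch in text for ch in " ,'\"%{}\t"):
            escaped = text.replace("\\", "\\\\").replace("'", "\\'")
            return "'" + escaped + "'"
        return text

    lines = ['@relation ' + quote(relation), '']
    for name, type_spec in attributes:
        lines.append('@attribute {} {}'.format(quote(name), type_spec))
    lines += ['', '@data']
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError('Row length does not match attribute count')
        lines.append(','.join('?' if is_missing(v) else quote(v) for v in row))
    return '\n'.join(lines) + '\n'
```

With a DataFrame `df`, the rows could be passed as `df.values.tolist()` and the returned text written to a `.arff` file with a plain `open(path, 'w')`.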