I am trying to convert this dataset: COCOMO81 to arff.
Before converting to .arff, I am trying to convert it to .csv
I am following this LINK to do this.
I got that dataset from promise site. I copied the entire page to notepad as cocomo81.txt and now I am trying to convert that cocomo81.txt file to .csv using python. (I intend to convert the .csv file to .arff later using weka)
However, when I run
import pandas as pd
read_file = pd.read_csv(r"cocomo81.txt")
I get THIS ParserError.
To fix this, I followed this solution and modified my command to
read_file = pd.read_csv(r"cocomo81.txt",on_bad_lines='warn')
I got a bunch of warnings - you can see what it looks like here
and then I ran
read_file.to_csv(r'.\cocomo81csv.csv',index=None)
But it seems that the fix for ParserError didn't work in my case because my cocomo81csv.csv file looks like THIS in Excel.
Can someone please help me understand where I am going wrong and how can I use datasets from the promise repository in .arff format?
Looks like it's a csv file with comments as the first lines. The comment lines are indicated by
%characters, but also@(?), and the actual csv data starts at line 230.You should skip the first rows and manually set the column names, try something like this: