Weka unable to read arff file


Hi everyone,

I am trying to open an ARFF file in Weka, but the following error appears:

java.io.IOException: Unable to determine structure as arff (Reason: java.io.IOException: } expected at end of enumeration, read Token[EOL], line 6)

Here is part of my ARFF file for reference:

@RELATION AB_NYC_2019-2k

@ATTRIBUTE id numeric
@ATTRIBUTE host_id numeric
@ATTRIBUTE host_name {,Christopher+SamanthaAndMason,Dee,"Porfirio Firo & Maria","Robert Bob"}
@ATTRIBUTE neighbourhood_group {,Bronx,Brooklyn,"DeeDre & Mama Shelley",Manhattan,Queens,"Staten Island"}
@ATTRIBUTE neighbourhood {,Allerton,Arrochar,Arverne,Astoria,"Battery Park City","Bay Ridge",Bayside,Bedford-Stuyvesant}
@ATTRIBUTE latitude numeric
@ATTRIBUTE longitude numeric
@ATTRIBUTE room_type {"Entire home/apt","Private room","Shared room"}
@ATTRIBUTE price numeric
@ATTRIBUTE minimum_nights numeric
@ATTRIBUTE number_of_reviews numeric
@ATTRIBUTE last_review {,1/1/2013,1/1/2016,1/1/2017,1/1/2018,1/1/2019,1/10/2019}
@ATTRIBUTE reviews_per_month numeric
@ATTRIBUTE calculated_host_listings_count numeric
@ATTRIBUTE availability_365 numeric

@DATA
2539,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,10/19/2018,0.21,6,365
2595,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,5/21/2019,0.38,2,355
3647,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365

This is my file

Can anyone help me? Thanks in advance.

Answer by fracpete:

It looks like you generated the ARFF file manually, which introduced some errors:

  • Weka uses ? to denote missing values in the data section, not empty cells
  • Duplicate values in nominal value lists are not allowed (Dee appears twice in host_name)
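
If you ever do need to emit ARFF text yourself, the two rules above can be enforced in code. The following is a minimal, dependency-free Java sketch and not part of Weka: the class ArffRules and its methods are hypothetical names, it assumes no cell contains an embedded comma, and it also quotes values containing spaces (such as Private room), which the ARFF format requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helpers illustrating the ARFF rules; not a Weka API.
public class ArffRules {
    // Wraps a value in single quotes unless it only contains a conservative
    // set of safe characters (letters, digits, _ . / + -).
    public static String quote(String v) {
        if (v.matches("[A-Za-z0-9_./+-]+")) {
            return v;
        }
        return "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Turns one comma-separated row into an ARFF data line:
    // empty cells become "?" (Weka's missing-value marker), other cells are quoted if needed.
    // Assumption: no cell contains an embedded comma.
    public static String dataLine(String csvRow) {
        List<String> out = new ArrayList<>();
        for (String cell : csvRow.split(",", -1)) { // -1 keeps trailing empty cells
            String v = cell.trim();
            out.add(v.isEmpty() ? "?" : quote(v));
        }
        return String.join(",", out);
    }

    // Builds a nominal value list "{a,b,...}" in first-seen order,
    // dropping duplicates and empty labels (missing values are written as "?" instead).
    public static String nominalSpec(List<String> values) {
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String v : values) {
            String t = v.trim();
            if (!t.isEmpty()) {
                unique.add(t);
            }
        }
        List<String> quoted = new ArrayList<>();
        for (String v : unique) {
            quoted.add(quote(v));
        }
        return "{" + String.join(",", quoted) + "}";
    }

    public static void main(String[] args) {
        // prints 3647,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,'Private room',150,3,0,?,?,1,365
        System.out.println(dataLine("3647,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365"));
        // prints {Dee,'Robert Bob'}
        System.out.println(nominalSpec(Arrays.asList("", "Dee", "Robert Bob", "Dee")));
    }
}
```

Deduplicating with a LinkedHashSet keeps the original label order, so the indices of the remaining nominal values stay predictable.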

I recommend using Weka's Java API to generate datasets rather than crafting ARFF text files by hand.

Here is what I would do:

  • Convert your dataset to a CSV file by replacing the ARFF header (@DATA and everything above it) with this CSV header row:
    id,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365

  • Install the Common CSV Weka package, as it handles CSV files better than Weka's native CSVLoader, and restart Weka so the package becomes available
  • Load the CSV using the Common CSV file format, ticking the Invoke options dialog check box before clicking OK
  • In the options dialog, set the following custom options:
    • dateFormat: M/d/yyyy
    • dateRange: 12 (the 1-based index of the last_review column)
  • Click OK to load the dataset
  • Apply the StringToNominal filter with first-last as the attribute range (the default is last) to convert the string attributes into nominal ones.
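
The first step (swapping the ARFF header for a CSV header row) can also be scripted. Below is a minimal Java sketch with hypothetical class and method names: it collects the @ATTRIBUTE names as the CSV header row and copies the (trimmed) lines after @DATA, assuming unquoted attribute names. It also includes a small check that a value matches the M/d/yyyy pattern used for dateFormat; using java.time for that check is an assumption for illustration, not what the Common CSV package does internally.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical ARFF-to-CSV converter; assumes unquoted attribute names.
public class ArffToCsv {
    // Header row = @ATTRIBUTE names in order; data rows = trimmed lines after @DATA.
    // Blank lines and '%' comment lines are skipped.
    public static List<String> convert(List<String> arffLines) {
        List<String> names = new ArrayList<>();
        List<String> rows = new ArrayList<>();
        boolean inData = false;
        for (String raw : arffLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            String upper = line.toUpperCase(Locale.ROOT); // ARFF keywords are case-insensitive
            if (inData) {
                rows.add(line);
            } else if (upper.startsWith("@ATTRIBUTE")) {
                names.add(line.split("\\s+")[1]); // second token is the attribute name
            } else if (upper.startsWith("@DATA")) {
                inData = true;
            }
        }
        List<String> csv = new ArrayList<>();
        csv.add(String.join(",", names));
        csv.addAll(rows);
        return csv;
    }

    // True if the value parses with the M/d/yyyy pattern (e.g. 10/19/2018).
    public static boolean isMdyyyy(String value) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ofPattern("M/d/yyyy"));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> arff = Arrays.asList(
            "@RELATION AB_NYC_2019-2k",
            "@ATTRIBUTE id numeric",
            "@ATTRIBUTE host_id numeric",
            "@ATTRIBUTE last_review {,1/1/2013,10/19/2018}",
            "@DATA",
            "2539,2787,10/19/2018",
            "3647,4632,");
        // prints: id,host_id,last_review / 2539,2787,10/19/2018 / 3647,4632,
        convert(arff).forEach(System.out::println);
        System.out.println(isMdyyyy("10/19/2018")); // true
    }
}
```

Empty cells are left as-is in the CSV output, since the loader, not the conversion script, decides how to treat them.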