Why does read.table cause multiple rows to be placed in one row?


I used the following code to load in some data.

movies <- read.table("movies.dat", header=FALSE, sep="\n")

Most of the data is loaded in well, like this:

 58 58::Postman, The (Postino, Il) (1994)::Comedy|Drama|Romance   
 59 59::Confessional, The (Confessionnal, Le) (1995)::Drama|Mystery  
 60 60::Indian in the Cupboard, The (1995)::Adventure|Children|Fantasy 

The first number in each row is the row number within R; the rest is a single string in one column. But some rows appear like this:

111 114::Margarets Museum (1995)::Drama       
    115::Happiness Is in the Field (Bonheur est dans le pré, Le) (1995)::Comedy       
    116::Anne Frank Remembered (1995)::Documentary       
    117::Young Poisoners Handbook, The (1995)::Crime|Drama 

Again, the leading 111 is the row number. Row 111 contains four lines of the file instead of just one. I checked the source .dat file and could not find any difference in formatting that would cause this. In the original .dat file every row number also corresponds to the id number (the second number), but in R several lines end up under a single row number.

Does anyone know what the problem is and how I can get one row per row number again?

EDIT: By the way, if someone wants to reproduce, here is the dataset I used (MovieLens) http://grouplens.org/datasets/movielens/


Best answer

Sorry, apparently I misused the separator (I have just started with R). Following Ilir's suggestion, I used readLines() instead of read.table() to read movies.dat, and that solved it.

movies <- readLines("movies.dat")
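A likely reason this helps, judging from the apostrophes missing from "Margarets" and "Poisoners" in the output above: by default read.table() treats both " and ' as quote characters, so an apostrophe in a title can open a quoted string that only closes at the next apostrophe, several lines later. readLines() does no quote processing. The sketch below tries to reproduce this on a small made-up sample (the four lines and the temporary file are assumptions for illustration, not the real movies.dat) and shows that passing quote = "" to read.table() is an alternative fix.

```r
# Hypothetical sample in the MovieLens "id::title::genres" layout.
# Lines 1 and 3 each contain an apostrophe.
sample_lines <- c(
  "1::Margaret's Museum (1995)::Drama",
  "2::Anne Frank Remembered (1995)::Documentary",
  "3::Young Poisoner's Handbook, The (1995)::Crime|Drama",
  "4::Toy Story (1995)::Animation"
)
tmp <- tempfile(fileext = ".dat")
writeLines(sample_lines, tmp)

# Default quote = "\"'": the apostrophe on line 1 is read as an opening
# quote, so the lines up to the next apostrophe are expected to be merged.
merged <- read.table(tmp, header = FALSE, sep = "\n",
                     stringsAsFactors = FALSE)
print(nrow(merged))  # expected to be fewer rows than lines in the file

# quote = "" switches quote handling off; comment.char = "" also keeps
# any "#" in a title from being read as the start of a comment.
fixed <- read.table(tmp, header = FALSE, sep = "\n", quote = "",
                    comment.char = "", stringsAsFactors = FALSE)
print(nrow(fixed))   # one row per line of the file
```

If the row counts differ on your data as well, the apostrophes are the cause, and either readLines() or quote = "" should give one row per line.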

Then I put the result in a data frame:

dataframe <- data.frame(movies)
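This gives a data frame with a single column holding the whole line. If separate columns are wanted, the lines can be split on the "::" delimiter visible in the data. The following is only a minimal sketch: it uses two lines copied from the question rather than the full file, the names parts, fields and movies_df are made up for illustration, and it assumes every line has exactly three "::"-separated fields.

```r
# Two lines copied from the question, standing in for readLines("movies.dat").
movies <- c(
  "58::Postman, The (Postino, Il) (1994)::Comedy|Drama|Romance",
  "60::Indian in the Cupboard, The (1995)::Adventure|Children|Fantasy"
)

# Split each line on the literal string "::" (fixed = TRUE: no regex).
parts <- strsplit(movies, "::", fixed = TRUE)

# Stack the pieces into a character matrix, one row per movie.
# Assumes each line yields exactly three pieces.
fields <- do.call(rbind, parts)

movies_df <- data.frame(
  id     = as.integer(fields[, 1]),
  title  = fields[, 2],
  genres = fields[, 3],
  stringsAsFactors = FALSE
)
print(movies_df)
```

The genres column still holds the "|"-separated genre list as one string; splitting that further would follow the same pattern with strsplit(..., "|", fixed = TRUE).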

Thanks to Ilir!