I used the following code to load in some data.
movies <- read.table("movies.dat", header=FALSE, sep="\n")
Most of the data is loaded in well, like this:
58 58::Postman, The (Postino, Il) (1994)::Comedy|Drama|Romance
59 59::Confessional, The (Confessionnal, Le) (1995)::Drama|Mystery
60 60::Indian in the Cupboard, The (1995)::Adventure|Children|Fantasy
The first number in each row is the row number within R; the rest is a single string in one column. But some rows appear like this:
111 114::Margarets Museum (1995)::Drama
115::Happiness Is in the Field (Bonheur est dans le pré, Le) (1995)::Comedy
116::Anne Frank Remembered (1995)::Documentary
117::Young Poisoners Handbook, The (1995)::Crime|Drama
Again, the leading 111 is the row number. Row 111 holds four lines of the file instead of just one. I checked the source .dat file and could not find any difference in formatting that would cause this. In the original .dat file every line number also matches the id (the second number shown above), but in R several lines end up under a single row number.
Does anyone know what the problem is and how I can get one row per row number again?
EDIT: By the way, if someone wants to reproduce, here is the dataset I used (MovieLens) http://grouplens.org/datasets/movielens/
Sorry, apparently I misused the separator (I have just started with R). Ilir's suggestion of using readLines() solved it: I used it instead of read.table to read movies.dat, then put the result in a data frame.
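For anyone reproducing this, here is a minimal sketch of that approach, not the exact code from the thread. It assumes the fields are separated by "::" as in the rows shown above; the column names id, title and genres and the temporary sample file are my own choices for illustration. The sample titles include apostrophes, on the assumption that the real file has them. This matters because read.table by default treats both " and ' as quote characters, so an apostrophe in a title can swallow the following lines until the next apostrophe. That is a likely cause of the merged rows, though it was not confirmed here.

```r
# Write a few sample lines to a temporary file so the sketch is self-contained.
# With the real data, point `path` at movies.dat instead.
path <- tempfile(fileext = ".dat")
writeLines(c(
  "114::Margaret's Museum (1995)::Drama",
  "116::Anne Frank Remembered (1995)::Documentary",
  "117::Young Poisoner's Handbook, The (1995)::Crime|Drama"
), path)

# readLines() returns one string per line and does no quote processing.
lines <- readLines(path)

# Split each line on the literal "::" separator (fixed = TRUE: not a regex).
parts <- strsplit(lines, "::", fixed = TRUE)

# Bind the pieces into a character matrix, then convert to a data frame.
movies <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(movies) <- c("id", "title", "genres")  # illustrative column names
movies$id <- as.integer(movies$id)

# Alternative that keeps read.table: switch off quote and comment handling
# so that apostrophes and '#' characters are read literally.
raw <- read.table(path, header = FALSE, sep = "\n", quote = "",
                  comment.char = "", stringsAsFactors = FALSE)
```

Either way there should be one row per line of the file. The readLines() version also gives separate columns, while the read.table() variant still keeps each line as a single string.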
Thanks to Ilir!