What is the difference between the read.csv and read_csv functions in R when working with NULL values?


I am having an issue in R when importing my csv file. For some reason, when I use the read.csv function, my null values do not show up as null after I read the csv file into a data frame. Does anyone know why the read.csv function does not show the null values but the read_csv function preserves them? NOTE: I'm a beginner and have been working with large datasets and R for about 5 weeks.

Using read.csv:

# Import my csv file (no outer parentheses, so the whole data frame is not printed)
jan_data <- read.csv('202301-divvy-tripdata.csv')

# Check for NULL values in the entire data frame
missing_values_df <- is.na(jan_data)

# Print the logical matrix indicating NULL values
print(missing_values_df)

# Count the number of NULL values in each column
print(colSums(missing_values_df))

The output:

           ride_id      rideable_type         started_at           ended_at start_station_name 
                 0                  0                  0                  0                  0 
  start_station_id   end_station_name     end_station_id          start_lat          start_lng 
                 0                  0                  0                  0                  0 
           end_lat            end_lng      member_casual 
               127                127                  0 

Using readr::read_csv():

# read_csv comes from the readr package
library(readr)
jan_data <- read_csv('202301-divvy-tripdata.csv')

#Check for NULL values in the entire data frame
missing_values_df <- is.na(jan_data)

#Print the logical matrix indicating NULL values
print(missing_values_df)

#Count the number of NULL values in each column
print(colSums(missing_values_df))

Output:

           ride_id      rideable_type         started_at           ended_at start_station_name 
                 0                  0                  0                  0              26721 
  start_station_id   end_station_name     end_station_id          start_lat          start_lng 
             26721              27840              27840                  0                  0 
           end_lat            end_lng      member_casual 
               127                127                  0 

Answer by G. Grothendieck:

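The two functions differ in how they treat empty fields in character columns, because the default of the na.strings= argument in read.csv is not the same as the default of the na= argument in read_csv.

By default na.strings="NA" in read.csv, so an empty field in a character column is read as a zero-length string (""), which is.na() does not count. In read_csv the default is na=c("", "NA"), so an empty field results in an NA (not a NULL). Numeric columns are not affected, because read.csv always treats blank fields in numeric columns as missing, which is consistent with end_lat and end_lng showing 127 in both outputs. Try the code below, where the second field on the last row is empty.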

cat("A,B,C\na,b,c\nd,,e\n", file = "test.csv")
readr::read_csv("test.csv")
## ... snip ...
## # A tibble: 2 × 3
##   A     B     C    
##   <chr> <chr> <chr>
## 1 a     b     c    
## 2 d     <NA>  e

read.csv("test.csv")
##   A B C
## 1 a b c
## 2 d   e
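
If the goal is for read.csv to report the same missing values as read_csv, one option is to pass both "" and "NA" in na.strings=. The following is a minimal base R sketch of that idea, not the only way to do it. The temporary file and the names csv_path, default_df and na_df are made up for illustration; the data is the same three-column example as above.

```r
# Write a small example CSV (hypothetical temp file) whose last row
# has an empty second field.
csv_path <- tempfile(fileext = ".csv")
cat("A,B,C\na,b,c\nd,,e\n", file = csv_path)

# Default na.strings = "NA": the empty character field is read as "",
# so is.na() does not flag it.
default_df <- read.csv(csv_path)
print(colSums(is.na(default_df)))

# Treat both "" and "NA" as missing, which mirrors the read_csv default.
na_df <- read.csv(csv_path, na.strings = c("", "NA"))
print(colSums(is.na(na_df)))

# For a data frame that is already loaded with the defaults,
# empty strings can be counted directly instead.
print(colSums(default_df == "", na.rm = TRUE))
```

With this example file, the second and third counts should each report one value for column B, while the first reports none. Applied to the question's file, the same na.strings= argument should bring the read.csv counts in line with the read_csv ones, assuming the empty station fields are truly empty rather than whitespace.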