I have a data frame in R with box office numbers listed like $121.5M and $0.014M, and I'd like to convert them to plain numbers. I'm thinking of stripping the $ and M and then using basic multiplication. Is there a better way to do this?
Converting Movie Box Office to Numbers
Asked by Phillip Black
There are 3 best solutions below
This removes the $ and translates K and M to e3 and e6. There is an example very similar to this in the gsubfn vignette.
library(gsubfn)
x <- c("$1.21M", "$100K") # input
ch <- gsubfn("[KM$]", list(K = "e3", M = "e6", "$" = ""), x)
as.numeric(ch)
## [1] 1210000 100000
The as.numeric line can be omitted if you don't need to convert it to numeric.
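If you would rather not depend on gsubfn, the same translation can be sketched in base R with a few fixed-string substitutions. This is only a rough equivalent, assuming the input uses a single uppercase K or M suffix as in the example above; the variable name `result` is introduced here for illustration.

```r
x <- c("$1.21M", "$100K")                  # same input as above

ch <- gsub("$", "", x, fixed = TRUE)       # drop the dollar sign (fixed = TRUE: "$" is literal, not a regex anchor)
ch <- sub("K", "e3", ch, fixed = TRUE)     # thousands -> scientific-notation exponent
ch <- sub("M", "e6", ch, fixed = TRUE)     # millions  -> scientific-notation exponent

result <- as.numeric(ch)                   # "1.21e6" and "100e3" parse as ordinary numbers
print(result)
```

The idea is the same as in the gsubfn version: once the suffix has become an exponent, as.numeric does the multiplication for you.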
The function extract_numeric from the tidyr package strips all non-numeric characters from a string and returns a number. With your example:
library(tidyr)
dat <- data.frame(revenue = c("$121.5M", "$0.014M"))
dat$revenue2 <- extract_numeric(dat$revenue)*1000000
dat
  revenue  revenue2
1 $121.5M 121500000
2 $0.014M     14000
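Note that more recent tidyr releases mark extract_numeric() as deprecated and point to readr::parse_number() instead. A minimal sketch of the same conversion with that function, assuming the readr package is installed and every value is in millions:

```r
library(readr)

dat <- data.frame(revenue = c("$121.5M", "$0.014M"))

# parse_number() drops the non-numeric characters around the first number it finds,
# so "$121.5M" becomes 121.5; the scaling to millions is still done by hand
dat$revenue2 <- parse_number(dat$revenue) * 1e6
print(dat)
```

As with extract_numeric, the multiplier is hard-coded, so this only works when every value carries the same suffix.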
You could do this either by matching the non-numeric elements ([^0-9.]*) and replacing them with '', or by specifically matching the $ and M ([$M]) and replacing them with ''.
Update
If you have a vector with more than one abbreviation, create another vector with the corresponding numbers and set its names to the abbreviations. Use that as an index to replace each abbreviation, and multiply it by the numeric part of the vector.
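The code for this answer was not preserved, so here is a rough base R sketch of what the text describes. The vectors `x` and `v1`, the multiplier vector `mult`, and the K suffix are assumptions made for illustration. The first pattern uses + rather than * so that it never matches the empty string; for this purpose the effect is the same.

```r
x <- c("$121.5M", "$0.014M")               # values from the question, all in millions

# Option 1: remove everything that is not a digit or a decimal point
n1 <- as.numeric(gsub("[^0-9.]+", "", x)) * 1e6

# Option 2: remove only the $ and the M
n2 <- as.numeric(gsub("[$M]", "", x)) * 1e6

# Update: a hypothetical vector with more than one abbreviation
v1 <- c("$121.5M", "$0.014M", "$100K")

mult   <- c(K = 1e3, M = 1e6)              # multipliers, named by abbreviation
suffix <- gsub("[^A-Z]", "", v1)           # keep only the letter: "M" "M" "K"
num    <- as.numeric(gsub("[^0-9.]+", "", v1))

# Index the named vector by suffix, then multiply by the numeric part
res <- num * unname(mult[suffix])
print(res)
```

A value with no suffix, or with one that is not in `mult`, comes out as NA here, so add an entry for every abbreviation you expect.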