I'm calling an R function from a Python script to apply SMOTE to a dummy dataset. The majority class is 0 (90%) and the minority class is 1 (10%). Calling the R function directly gives me the proper output, but calling the same function from Python gives me NA_character_ for the generated rows. Below is the R function:
# file r_test.r
library(performanceEstimation)

rtest <- function(r_df, over_val, under_val) {
  set.seed(0)
  new_df <- smote(y ~ ., r_df, perc.over = over_val, perc.under = under_val, k = 5)
  # table() is not auto-printed inside a function, so print it explicitly
  print(table(new_df$y))
  return(new_df)
}
Below is the Python code that calls this function:
import os
import numpy as np
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from sklearn.datasets import make_classification

def function2(r_df, over_val, under_val):
    r = ro.r
    r.source(path)
    p = r.rtest(r_df, over_val, under_val)
    return p

path = os.path.join(os.getcwd(), "r_test.r")
X, y = make_classification(n_classes=2,
                           class_sep=2,
                           weights=[0.90, 0.10],
                           n_informative=4,
                           n_redundant=1,
                           flip_y=0,
                           n_features=5,
                           n_clusters_per_class=1,
                           n_samples=100,
                           random_state=10)
df = pd.DataFrame(X, columns=["x1", "x2", "x3", "x4", "x5"])
df['y'] = y
df['y'].value_counts()
Output -
0 90
1 10
Name: y, dtype: int64
base = importr('base')

with localconverter(ro.default_converter + pandas2ri.converter):
    r_from_pd_df = ro.conversion.py2rpy(df)

with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(function2(r_from_pd_df, 5, 2))
pd_from_r_df['y'].value_counts()
Output -
0 100
NA_character_ 50
1 10
Name: y, dtype: int64
The number of NA_character_ values is exactly the number of minority-class samples this smote function should generate. What mistake am I making in the code above, and how can I get 1s instead of NA_character_? Note: I'm completely new to R. If there is a problem in the R code, please point it out with a complete example.
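The three counts in that output can be reproduced by hand. This is a minimal sketch, assuming performanceEstimation::smote() treats whole-number perc.over and perc.under values as multipliers: perc.over new minority cases per original minority case, then perc.under majority cases sampled per new case. That assumption matches the 100 / 50 / 10 split printed above, but fractional values may be rounded differently inside R. The function name expected_smote_counts is hypothetical.

```python
def expected_smote_counts(n_minority: int, perc_over: int, perc_under: int) -> dict:
    """Predict class counts after smote() under the assumed multiplier rule.

    Only whole-number perc_over / perc_under are handled here; R may round
    fractional values differently.
    """
    # perc_over synthetic minority rows per original minority row (assumption)
    n_synthetic = perc_over * n_minority
    # perc_under majority rows sampled per synthetic row (assumption)
    n_majority = perc_under * n_synthetic
    return {
        "majority": n_majority,
        "original_minority": n_minority,
        "synthetic_minority": n_synthetic,
        "total_minority": n_minority + n_synthetic,
    }


# Same settings as function2(r_from_pd_df, 5, 2) with 10 minority rows
counts = expected_smote_counts(n_minority=10, perc_over=5, perc_under=2)
print(counts)
# {'majority': 100, 'original_minority': 10, 'synthetic_minority': 50, 'total_minority': 60}
```

The synthetic_minority value (50) is the row count that shows up as NA_character_ in the Python output, which is why the NAs line up exactly with the rows smote() generated.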
Try converting that y column to a factor first. Some other implementations (like themis::smote()) will give you a nice, informative error if the types don't match.

Walk-through with reticulate (Python from R):

Let's modify that function for a better match with the examples in ?smote, i.e. turn the response into a factor:

Created on 2023-09-30 with reprex v2.0.2
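The same "make the response a factor" fix can also be applied on the Python side before the data frame crosses into R. This is a minimal sketch, assuming rpy2's pandas2ri converter maps a pandas category column to an R factor; as far as I know it only does so when the categories are strings, hence the astype(str). The helper with_factor_target and the random demo frame are hypothetical; the demo only mimics the 90/10 class split from the question.

```python
import numpy as np
import pandas as pd


def with_factor_target(df: pd.DataFrame, target: str = "y") -> pd.DataFrame:
    """Return a copy of df whose target column is a string-labelled Categorical.

    A pandas 'category' column with string categories is what pandas2ri
    is assumed to convert into an R factor.
    """
    out = df.copy()
    labels = out[target].astype(str)  # 0/1 integers -> "0"/"1" strings
    # Sorted string labels give a deterministic level order ("0" before "1")
    out[target] = pd.Categorical(labels, categories=sorted(labels.unique()))
    return out


# Hypothetical stand-in for the make_classification data: 90 zeros, 10 ones
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 5)), columns=["x1", "x2", "x3", "x4", "x5"])
demo["y"] = np.array([0] * 90 + [1] * 10)

prepared = with_factor_target(demo)
print(prepared["y"].dtype)                    # category
print(prepared["y"].cat.categories.tolist())  # ['0', '1']
```

With the question's code, the idea would be to pass with_factor_target(df) to ro.conversion.py2rpy inside the same localconverter block. Alternatively, the conversion can be done in R at the top of rtest, e.g. r_df$y <- factor(r_df$y), before smote() is called. Either way, y comes back from R as a categorical column with levels "0" and "1".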