I am working on a data set that has 21 attributes: 16 are categorical, 3 are ordinal factors and 2 are date/time (the target variable). The number of rows is 14512.
What I want to achieve: This data set is about daily office incidents closed by different teams, and we are trying to predict the time an incident will take to close given certain predictor variables.
I am using RStudio for the analysis.
Work done: I decided to use kNN, so I converted all predictors to binary dummy variables and the target variable to a categorical variable with classes A, B and C.
Issue: When I apply the knn function, for example:
RPS_test_pred <- knn(train = RPS_train, test = RPS_test, cl = RPS_train_labels, k = 1121)
with k set to 1121 (as we have 14513 rows in the data set, with training and test data split 70:30),
RStudio crashes and closes, reporting that a fatal error occurred.
Please suggest another way to handle this data, or another modelling technique better suited to this type of data, with an example.
In the past I have worked with datasets containing many ordinal and categorical variables and have found success in doing some transformations to make them numerical. Here are some examples from work with housing price data.
Ordinal Variables: I would start by changing your ordinal variables into numerical values based on their relative order.
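As a rough illustration of the ordinal conversion described above, here is a minimal sketch in R. The data frame `incidents`, the column `priority` and the level order are all hypothetical and only stand in for one of your three ordinal factors; you would substitute your own level order.

```r
# Hypothetical data: an ordinal attribute stored as text labels.
incidents <- data.frame(
  priority = c("Low", "High", "Medium", "Low", "Critical"),
  stringsAsFactors = FALSE
)

# Assumed ordering of the levels, from lowest to highest.
priority_levels <- c("Low", "Medium", "High", "Critical")

# Convert to an ordered factor, then take the integer position of each
# level so that 1 is the lowest level and 4 is the highest.
incidents$priority_num <- as.integer(
  factor(incidents$priority, levels = priority_levels, ordered = TRUE)
)

print(incidents)
```

Any label that is not listed in `priority_levels` becomes NA, which is a handy check that no level was missed.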
Categorical Variables: It has worked for me to use group rankings based on the mean of the response variable you are looking at (Sale Price in my case).
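The group ranking idea could look something like the sketch below. The column names `team` and `hours_to_close` are made up for illustration, with `hours_to_close` standing in for a numeric version of your time target; this is one way to do it in base R, not necessarily how the linked code does it.

```r
# Hypothetical data: one categorical predictor and a numeric response.
incidents <- data.frame(
  team = c("A", "B", "A", "C", "B", "C"),
  hours_to_close = c(2, 10, 4, 6, 12, 8),
  stringsAsFactors = FALSE
)

# Mean of the response within each category.
team_means <- tapply(incidents$hours_to_close, incidents$team, mean)

# Rank the categories by their mean response (1 = lowest mean).
team_rank <- rank(team_means, ties.method = "min")

# Replace each category with its rank, looked up by category name.
incidents$team_rank <- unname(team_rank[incidents$team])

print(incidents)
```

If you go this route, it is probably safer to compute the group means on the training split only and then apply the same mapping to the test split, so the test response does not leak into the predictors.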
More examples can be found in the code space here: https://www.kaggle.com/skirmer/fun-with-real-estate-data/code