Converting categorical data to numeric data in the required columns so that linear regression can be applied



Some of the columns in the given dataset contain categorical data. I have to convert this data to numeric values so that I can apply simple linear regression to predict the score. The column names are city, batting_team, bowling_team, batsman, non_striker and bowler. I have to use these features to predict the score by applying simple linear regression. I used OneHotEncoder to change the data type, but I'm not able to write working code for it. Please help me.


Answer by Ibrat Usmonov

It would be easier to help you precisely if you had shared your code, dataset and output. In the meantime, here are some steps you can take:

  1. Check whether the categorical data actually holds numbers stored as text (for example "120" rather than "Mumbai").
  2. If so, you can cast the column to a numeric type directly, like this:
df['price'] = df['price'].astype("int")

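This converts the column to integers, but it only works when every value is a whole number stored as text; a value such as a city or team name raises a ValueError. Hope you find it useful.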
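Step 1 (checking whether a column can be converted at all) can be sketched with `pd.to_numeric(..., errors="coerce")`, which turns unparseable values into NaN instead of raising. This is a minimal sketch under stated assumptions: the helper `can_cast_to_numeric` and the sample rows are hypothetical, and it converts with `pd.to_numeric` rather than `astype("int")` so that decimal strings such as "1.5" also work.

```python
import pandas as pd


def can_cast_to_numeric(series: pd.Series) -> bool:
    """Hypothetical helper: True if every value in the column parses as a number."""
    # errors="coerce" turns unparseable strings (e.g. city or team names) into NaN
    converted = pd.to_numeric(series, errors="coerce")
    return bool(converted.notna().all())


# Hypothetical sample: "price" holds numbers stored as text, "city" holds real categories
df = pd.DataFrame({"price": ["120", "85", "300"], "city": ["Mumbai", "Delhi", "Chennai"]})

for col in df.columns:
    if can_cast_to_numeric(df[col]):
        # Safe to convert: every value is a number written as text
        df[col] = pd.to_numeric(df[col])

print(df.dtypes)  # price is now numeric; city is still text and needs encoding
```

Columns that fail the check, like `city` here, are genuine categories and still need to be encoded before they can be used in a regression.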

Answer by Anna Andreeva Rogotulka

To convert a string column to numeric form for regression with scikit-learn (sklearn), you typically need to encode the categorical data.

OneHotEncoder converts a categorical string column into multiple binary (0/1) columns, one per category. It is best suited to columns with relatively few unique values, because each unique value adds a new column.

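LabelEncoder assigns a unique integer to each unique category in your string column. Note that scikit-learn intends it for encoding target labels (y); used on a feature, the integer codes impose an arbitrary order that linear regression treats as meaningful, so one-hot encoding is usually the safer choice for features such as team names.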

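import pandas as pd
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# `data` is the DataFrame loaded from your dataset
# Option 1: LabelEncoder gives each team an integer code (0, 1, 2, ...)
label_encoder = LabelEncoder()
data["bowling_team_code"] = label_encoder.fit_transform(data["bowling_team"])

# Option 2: OneHotEncoder gives one 0/1 column per team (sparse_output needs scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False, drop="first")
encoded = encoder.fit_transform(data[["bowling_team"]])
encoded_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out(), index=data.index)
data = pd.concat([data.drop(columns="bowling_team"), encoded_df], axis=1)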
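Putting the pieces together for all six columns listed in the question, one possible approach is to one-hot encode the text columns inside a scikit-learn `Pipeline` and fit `LinearRegression` on the result. This is a sketch under assumptions: the toy rows are hypothetical, `score` is an assumed name for the numeric target column, and `build_model` is an illustrative helper. Keeping the encoder inside the pipeline means the same category-to-column mapping is reused at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Categorical feature columns named in the question
CATEGORICAL = ["city", "batting_team", "bowling_team", "batsman", "non_striker", "bowler"]


def build_model() -> Pipeline:
    """Hypothetical helper: one-hot encode the categorical columns, then fit a linear model."""
    # handle_unknown="ignore" encodes categories unseen during fit (e.g. a new batsman)
    # as all-zero columns instead of raising an error at prediction time
    preprocess = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)]
    )
    return Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])


# Hypothetical toy rows; "score" is an assumed name for the target column
data = pd.DataFrame({
    "city": ["Mumbai", "Delhi", "Mumbai", "Chennai"],
    "batting_team": ["Team A", "Team B", "Team C", "Team C"],
    "bowling_team": ["Team B", "Team A", "Team A", "Team B"],
    "batsman": ["Player 1", "Player 2", "Player 3", "Player 3"],
    "non_striker": ["Player 2", "Player 1", "Player 4", "Player 4"],
    "bowler": ["Player 5", "Player 6", "Player 7", "Player 6"],
    "score": [180, 165, 172, 190],
})

model = build_model()
model.fit(data[CATEGORICAL], data["score"])
preds = model.predict(data[CATEGORICAL].head(2))
print(preds)
```

Because `batsman`, `non_striker` and `bowler` may have many unique values, the encoded matrix can get wide; when it is mostly zeros, `ColumnTransformer` keeps it sparse, and `LinearRegression` accepts sparse input.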