How to preprocess address, latitude and longitude features in tabular data for Pytorch


I have cleaned my data down to the six columns shown below, which are my input data. I split the label off into a separate variable, Y.

My main problem: I don't know how to preprocess this data so that it makes a good input for a model.

My dataset X looks like this:

desc | tipo | address | region | latitude | longitude
Galpón | Industria | Subdivisión de la Finca Denominada Violeta S/N | Región de Arica y Parinacota-Arica | -19.423411 | -11.371551
  • desc - string
  • tipo - string
  • address - string
  • region - string
  • latitude - string
  • longitude - string

My dataset Y looks like this:

CIO
169379

What I tried

I followed this tutorial, which helped me understand tabular data a little, but its data is completely different from mine and I don't know whether the approach fits my case. My code encodes every column with a LabelEncoder, which obviously is not appropriate for latitude and longitude.

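from sklearn.preprocessing import LabelEncoder

# df is the DataFrame holding X (the six columns above).
for col in df.columns:
    # Fill missing values: "NA" for string columns, 0 for numeric ones.
    if df.dtypes[col] == "object":
        df[col] = df[col].fillna("NA")
    else:
        df[col] = df[col].fillna(0)
    # Every column is label-encoded, including latitude and longitude.
    df[col] = LabelEncoder().fit_transform(df[col])

# Mark every column as categorical.
for col in df.columns:
    df[col] = df[col].astype('category')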
The author also used categorical embeddings, and I don't know whether those work properly with my kind of data either.
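To make the categorical-embedding idea concrete, the sketch below shows how it is commonly wired up in PyTorch, assuming the categorical columns are already integer-encoded and the numeric columns (such as latitude and longitude) are floats. The class name `TabularNet`, the layer sizes, the single regression-style output, and the toy tensors are all hypothetical choices for illustration, not the tutorial author's code.

```python
import torch
import torch.nn as nn

class TabularNet(nn.Module):
    """Hypothetical model: one embedding per categorical column plus numeric inputs."""

    def __init__(self, cardinalities, n_numeric, emb_dim=4, hidden=16):
        super().__init__()
        # One embedding table per categorical column, sized by its number of categories.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n_categories, emb_dim) for n_categories in cardinalities]
        )
        in_features = emb_dim * len(cardinalities) + n_numeric
        self.mlp = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single output, assuming Y is a numeric target
        )

    def forward(self, x_cat, x_num):
        # Look up each categorical column in its own embedding table.
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        # Concatenate the embeddings with the numeric features along the feature axis.
        features = torch.cat(embedded + [x_num], dim=1)
        return self.mlp(features)

# Toy batch of two rows: three categorical codes and two scaled coordinates per row.
x_cat = torch.tensor([[0, 1, 0], [1, 0, 0]], dtype=torch.long)
x_num = torch.tensor([[-1.0, -1.0], [1.0, 1.0]])

model = TabularNet(cardinalities=[2, 2, 1], n_numeric=2)
pred = model(x_cat, x_num)
```

The point of this design is that only the categorical codes pass through `nn.Embedding`, while continuous features such as coordinates bypass the embeddings and enter the network directly as floats.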
