KeyError: "['Building Age', 'Floor', 'Number of Floors'] not in index"

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import category_encoders as ce

# Read the data
transactions_master_df = pd.read_csv('my_data.csv')

# Calculate the average house price for each district
avg_price_per_district = transactions_master_df.groupby('District')['Price'].mean().reset_index()
avg_price_per_district.rename(columns={'Price': 'AvgPrice'}, inplace=True)

# Print the average price for each district alongside the district name
print(avg_price_per_district)

# Merge the average price information with the original DataFrame
transactions_master_df = pd.merge(transactions_master_df, avg_price_per_district, on='District', how='left')

# Binary encode the 'District' feature
encoder = ce.BinaryEncoder(cols=['District'], base=6)
transactions_encoded = encoder.fit_transform(transactions_master_df)

# Additional feature columns to add to the encoded DataFrame
additional_features = ['Building Age', 'Floor', 'Number of Floors', 'Elevator', 
                      'number of bathrooms', 'Otopark', 'steeped alley', 
                      'material used and luxuriness', 'view', 
                      'prestige of that district and its vicinity']

# Check if additional features are present in the transactions_encoded DataFrame
for feature in additional_features:
    if feature not in transactions_encoded.columns:
        print(f"Warning: {feature} column not found in transactions_encoded DataFrame.")

# Concatenate additional features to the encoded DataFrame
final_features = pd.concat([transactions_encoded[['District_0', 'District_1', 'District_2', 'SquareMeter']], 
                            transactions_encoded[additional_features]], axis=1)

# Ensure 'final_features' contains the necessary columns for training
print(final_features.head())


Hello. In this code I'm building a model for my housing prices dataset. I first encode some of my non-numeric features, and then, when I concatenate the rest of my features to build the final_features variable, I get the following error:

final_features = pd.concat([transactions_encoded[['District_0', 'District_1', 'District_2', 'SquareMeter']], 
---> 38                             transactions_encoded[additional_features]], axis=1)
KeyError: "['Building Age', 'Floor', 'Number of Floors'] not in index"

Strangely, these features exist in my dataset, so I don't understand why I get this error.

There is 1 best solution below

tousif Noor

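The KeyError means that the labels 'Building Age', 'Floor', and 'Number of Floors' do not exactly match any column name in your DataFrame. The most likely cause is extra spaces in those column names (for example, leading or trailing whitespace in the CSV header), so the names look right when printed but differ from the strings in additional_features. Printing transactions_encoded.columns.tolist() shows the names with their quotes and should reveal the mismatch; stripping the whitespace from the column names right after reading the file should then resolve the error.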
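
A minimal sketch of that diagnosis and fix, assuming the problem really is whitespace in the header. The inline CSV text is a hypothetical stand-in for my_data.csv (the real file's contents are not shown in the question), and only a few of the columns are reproduced:

```python
import io
import pandas as pd

# Hypothetical stand-in for my_data.csv: note the space after each comma
# in the header row, which becomes part of the column name.
csv_text = (
    "District, Building Age, Floor, Number of Floors,SquareMeter\n"
    "A,10,2,5,90\n"
    "B,3,1,4,120\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Listing the names as quoted strings makes stray whitespace visible.
print(df.columns.tolist())

wanted = ['Building Age', 'Floor', 'Number of Floors']

# Before cleaning, selecting the columns fails just as in the question.
try:
    df[wanted]
except KeyError as err:
    print("KeyError before cleaning:", err)

# Strip leading/trailing whitespace from every column label.
df.columns = df.columns.str.strip()

# The same selection now succeeds.
subset = df[wanted]
print(subset.head())
```

In the original script, the equivalent step would be to strip transactions_master_df.columns immediately after pd.read_csv, before the groupby and the encoder run. If the spaces only follow the delimiter, passing skipinitialspace=True to pd.read_csv may also be enough, but it does not remove trailing spaces.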