I have written some code for a linear regression model to predict house prices. I'm witting exactly the same as a tutorial video; when I write random_state=42 it works without any error, but when I change the random_state to any other number it give this error.
Here is the code:
from sklearn.model_selection import train_test_split  
X = data.drop('SalesPrice', axis = 1)
y = data['SalesPrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)
print("Actual value of the house: ", y_test[0])
print("Model prediction value: ", predictions[0])
and this is the error:
KeyError                                  Traceback (most recent call last)
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\_libs\index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\_libs\index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:2606, in pandas._libs.hashtable.Int64HashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:2630, in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
Cell In[66], line 3
      1 predictions = lr.predict(X_test)
----> 3 print("Actual value of the house: ", y_test[0])
      4 print("Model prediction value: ", predictions[0])
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\series.py:1007, in Series.__getitem__(self, key)
   1004     return self._values[key]
   1006 elif key_is_scalar:
-> 1007     return self._get_value(key)
   1009 if is_hashable(key):
   1010     # Otherwise index.get_value will raise InvalidIndexError
   1011     try:
   1012         # For labels that don't resolve as scalars like tuples and frozensets
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\series.py:1116, in Series._get_value(self, label, takeable)
   1113     return self._values[label]
   1115 # Similar to Index.get_value, but we do not fall back to positional
-> 1116 loc = self.index.get_loc(label)
   1118 if is_integer(loc):
   1119     return self._values[loc]
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3655, in Index.get_loc(self, key)
   3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     #  InvalidIndexError. Otherwise we fall through and re-raise
   3659     #  the TypeError.
   3660     self._check_indexing_error(key)
KeyError: 0
				
                        
As the traceback mentions, the error originates from
print("Actual value of the house: ", y_test[0]).y_test[0]will only work when randomly 20% of the data also has the0th index in it aftertrain_test_split. That's why it works for some values ofrandom_stateand not for most.Generally you want to use either:
TLDR: Replace
y_test[0]in your print stament