How do I calculate conditional probabilities from a data frame?


I have two data frames. The first one holds terms and their likelihood for each class:

index Positif Negatif
moga 0.046053 0.007463
terang 0.006579 0.029851
........ ........ ........
kualitas 0.013158 0.014925

and the second one holds the stemmed tokens of each document:

stemmed
['min', 'tol', 'cilegon', 'serang', 'minim', 'lampu', 'terang', 'jalan', 'bayar', 'tol', 'mahal', 'emang', 'bayar', 'tol', 'doang', 'tah', 'tidak', 'plus', 'terang', 'jalan', 'tidak', 'lubang']
['moga', 'selesai', 'guna', 'jalan', 'tol', 'nyaman', 'lancar', 'jalan']

I want to calculate the conditional probability of the tokens in the stemmed column for each class (positive and negative).
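Assuming the usual Naive Bayes setup, the quantity the code below is meant to compute can be written as follows. The symbols are introduced here for illustration only: d is one row of the stemmed column, V is the set of terms in the likelihood table, and P(c) is the class prior. Tokens outside V are skipped, and a repeated token contributes once per occurrence.

```latex
\[
\operatorname{score}(c \mid d) \;=\; P(c) \prod_{\substack{t \in d \\ t \in V}} P(t \mid c),
\qquad c \in \{\text{Positif}, \text{Negatif}\}
\]
\[
\hat{c}(d) \;=\; \arg\max_{c}\; \operatorname{score}(c \mid d)
\]
```

These scores are unnormalised, so they are only comparable between the two classes of the same document.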

# Posterior calculation
import pandas as pd

vocab = set(LLHood_test.index)  # terms that have a likelihood
res = []
for tokens in df_test['stemmed']:
  # start at 1.0 (the multiplicative identity) so that a zero likelihood
  # is not mistaken for "not yet set"
  temp_res = {'Positif': 1.0, 'Negatif': 1.0}
  for key in tokens:
    if key in vocab:  # tokens without a likelihood are skipped
      row = LLHood_test.loc[key]
      temp_res['Positif'] *= row['Positif']
      temp_res['Negatif'] *= row['Negatif']

  # multiply by each class prior
  temp_res['Positif'] *= PriorPositif
  temp_res['Negatif'] *= PriorNegatif

  # add the predicted class: whichever of Positif and Negatif is larger
  temp_res['Prediksi'] = 'Positif' if temp_res['Positif'] > temp_res['Negatif'] else 'Negatif'
  res.append(temp_res)

hasil_test = pd.DataFrame.from_dict(res)

I tried this code, but the results are not the same as the ones I calculated manually in Excel.
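One way to narrow down such a mismatch is to trace a single document and compare each factor with the spreadsheet. The following is a minimal sketch, not a confirmed diagnosis. It uses only the three likelihood rows shown above, the priors of 0.5 are hypothetical (the question does not give them), and `as_token_list` and `trace_document` are helper names invented for this example. It also covers one possible cause: if `stemmed` was loaded from a CSV, each cell may be a string rather than a list, and looping over a string yields single characters instead of tokens.

```python
import ast

import pandas as pd

# Only the three likelihood rows shown in the question; the real table has more.
LLHood_test = pd.DataFrame(
    {
        "Positif": [0.046053, 0.006579, 0.013158],
        "Negatif": [0.007463, 0.029851, 0.014925],
    },
    index=["moga", "terang", "kualitas"],
)

# Hypothetical priors, not given in the question.
PRIORS = {"Positif": 0.5, "Negatif": 0.5}


def as_token_list(cell):
    """Return a list of tokens; parse the cell if it is a stringified list."""
    return ast.literal_eval(cell) if isinstance(cell, str) else list(cell)


def trace_document(cell, likelihood, priors):
    """Return (table of matched terms, unnormalised class scores) for one document."""
    tokens = as_token_list(cell)
    # Keep every occurrence of a known term, so repeated tokens count repeatedly.
    matched = [t for t in tokens if t in likelihood.index]
    steps = likelihood.loc[matched]  # one row per matched occurrence
    scores = {c: priors[c] * steps[c].prod() for c in priors}
    return steps, scores


# Second example row from the question, written as a string as it would
# appear after a CSV round trip.
doc = "['moga', 'selesai', 'guna', 'jalan', 'tol', 'nyaman', 'lancar', 'jalan']"
steps, scores = trace_document(doc, LLHood_test, PRIORS)
print(steps)   # the factors that enter the product
print(scores)  # prior times the product of those factors, per class
```

The `steps` table lists exactly which likelihoods were multiplied, which makes it easier to see whether the spreadsheet skips unknown tokens and counts repeated tokens in the same way.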
