I have these two DataFrames. The first one holds the terms and their likelihood for each class:
| index | Positif | Negatif |
|---|---|---|
| moga | 0.046053 | 0.007463 |
| terang | 0.006579 | 0.029851 |
| ........ | ........ | ........ |
| kualitas | 0.013158 | 0.014925 |

and the second one holds the stemmed tokens of each document:

| stemmed |
|---|
| ['min', 'tol', 'cilegon', 'serang', 'minim', 'lampu', 'terang', 'jalan', 'bayar', 'tol', 'mahal', 'emang', 'bayar', 'tol', 'doang', 'tah', 'tidak', 'plus', 'terang', 'jalan', 'tidak', 'lubang'] |
| ['moga', 'selesai', 'guna', 'jalan', 'tol', 'nyaman', 'lancar', 'jalan'] |

I want to calculate the conditional probability of the tokens in the `stemmed` column for each class (Positif and Negatif).

    import pandas as pd

    # posterior calculation
    res = []
    temp_df_test = [df_test.loc[k] for k in range(len(df_test))]
    for idx, tr in enumerate(temp_df_test):
        temp_res = {'Positif': 0, 'Negatif': 0}
        row_training = tr['stemmed']
        for key in row_training:
            # only tokens that appear in the likelihood table are used
            if key in list(LLHood_test.index):
                row = LLHood_test.loc[key]
                temp_res['Positif'] = temp_res['Positif'] * row['Positif'] if temp_res['Positif'] != 0 else row["Positif"]
                temp_res['Negatif'] = temp_res['Negatif'] * row["Negatif"] if temp_res['Negatif'] != 0 else row["Negatif"]
        # multiply by each class prior
        temp_res.update({
            "Positif": temp_res['Positif'] * PriorPositif,
            "Negatif": temp_res['Negatif'] * PriorNegatif
        })
        # add a key (the class) holding whichever of Negatif and Positif is larger
        temp_res.update({
            "Prediksi": "Positif" if temp_res['Positif'] > temp_res['Negatif'] else "Negatif"
        })
        res.append(temp_res)
    hasil_test = pd.DataFrame.from_dict(res)

I tried this code, but the results are not the same as the ones I calculated manually in Excel.
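
For cross-checking against a manual calculation, here is a minimal sketch of the same product, computed in log space with plain dictionaries standing in for the two DataFrames. Several things here are assumptions and not part of my real data: the likelihood table contains only the three rows shown above, the priors of 0.5 are placeholders (the actual `PriorPositif` and `PriorNegatif` values are not shown here), and the names `log_scores` and `predict` are hypothetical helpers. It also assumes every likelihood is strictly positive, since `math.log(0)` raises an error.

```python
import math

# Only the three likelihood rows shown in the question; the real table is larger.
likelihood = {
    "moga": {"Positif": 0.046053, "Negatif": 0.007463},
    "terang": {"Positif": 0.006579, "Negatif": 0.029851},
    "kualitas": {"Positif": 0.013158, "Negatif": 0.014925},
}

# Placeholder priors (assumption): the question does not show the real values.
priors = {"Positif": 0.5, "Negatif": 0.5}

docs = [
    ['min', 'tol', 'cilegon', 'serang', 'minim', 'lampu', 'terang', 'jalan',
     'bayar', 'tol', 'mahal', 'emang', 'bayar', 'tol', 'doang', 'tah', 'tidak',
     'plus', 'terang', 'jalan', 'tidak', 'lubang'],
    ['moga', 'selesai', 'guna', 'jalan', 'tol', 'nyaman', 'lancar', 'jalan'],
]


def log_scores(tokens, likelihood, priors):
    """Return log P(class) + sum of log P(token | class) for each class."""
    scores = {label: math.log(p) for label, p in priors.items()}
    for token in tokens:
        row = likelihood.get(token)
        if row is None:
            continue  # tokens missing from the table are skipped, as in the loop above
        for label in scores:
            scores[label] += math.log(row[label])
    return scores


def predict(tokens, likelihood, priors):
    """Pick the class with the larger score; ties go to Negatif, as in the loop above."""
    scores = log_scores(tokens, likelihood, priors)
    return "Positif" if scores["Positif"] > scores["Negatif"] else "Negatif"


predictions = [predict(tokens, likelihood, priors) for tokens in docs]
print(predictions)  # ['Negatif', 'Positif'] with the placeholder priors above
```

Taking `math.exp` of a score gives back the plain product of prior and likelihoods, which is the number to compare with the spreadsheet. Summing logs only avoids the product shrinking toward zero for long token lists.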