How do I know if I calculated DeepSpeech Confidence levels correctly?


I transcribed a batch of audio files to .json with DeepSpeech, and I'm running the script below on those files. This part loads the transcriptions and then analyzes the confidence values:

import json
import os

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load all JSON files from the transcription directory
all_transcriptions = []
for file in os.listdir(TRANSCRIPTION_DIR):
    if file.endswith(".json"):
        with open(os.path.join(TRANSCRIPTION_DIR, file), 'r') as f:
            try:
                transcription = json.load(f)
                if 'transcripts' in transcription:
                    for transcript in transcription['transcripts']:
                        # Transcript-level confidence; the same value is reused for every word below
                        confidence = transcript['confidence']
                        if 'words' in transcript:
                            words_with_confidence = [{"word": word["word"], "confidence": confidence} for word in transcript['words']]
                            all_transcriptions.extend(words_with_confidence)
                else:
                    print(f"Warning: 'transcripts' key not found in {file}")
            except json.JSONDecodeError:
                print(f"Warning: Invalid JSON content in {file}")


# Build the DataFrame used below (pandas imported as pd)
df = pd.DataFrame(all_transcriptions)

# Confidence Analysis
plt.figure(figsize=(10, 5))
sns.histplot(df['confidence'], bins=50, kde=True)
plt.title('Distribution of Confidence Scores')
plt.xlabel('Confidence Score')
plt.ylabel('Frequency')
plt.show()

print(f"Average Confidence Score: {df['confidence'].mean()}")
print(f"Median Confidence Score: {df['confidence'].median()}")
print(f"Mode of Confidence Scores: {df['confidence'].mode().iloc[0]}")
print(f"Standard Deviation of Confidence Scores: {df['confidence'].std()}")

Resulting in these outputs:

Average Confidence Score: -15639.446837488811
Median Confidence Score: -14424.3515625
Mode of Confidence Scores: -29816.869140625
Standard Deviation of Confidence Scores: 8504.202492630297
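
One thing I'm unsure about: the loading loop copies each transcript's single confidence value onto every word, so a long transcript is counted many times in the mean, median, and mode above. A minimal sketch of a per-transcript alternative, assuming the same JSON layout as in my files (`transcripts`, `confidence`, `words`). The helper name `per_transcript_rows` and the `confidence_per_word` column are hypothetical, and dividing by the word count is only my own rough normalization, not something DeepSpeech defines:

```python
def per_transcript_rows(transcription):
    """Return one row per transcript instead of one row per word."""
    rows = []
    for transcript in transcription.get("transcripts", []):
        words = transcript.get("words", [])
        confidence = transcript["confidence"]
        rows.append({
            "confidence": confidence,
            "num_words": len(words),
            # Hypothetical length normalization, so long and short
            # transcripts can be compared on a similar scale.
            "confidence_per_word": confidence / len(words) if words else None,
        })
    return rows


# Made-up sample data in the same shape as the transcription files.
sample = {
    "transcripts": [
        {"confidence": -30.0, "words": [{"word": "a"}, {"word": "b"}, {"word": "c"}]},
        {"confidence": -8.0, "words": [{"word": "hi"}]},
    ]
}
rows = per_transcript_rows(sample)
```

Feeding these rows into a DataFrame would give statistics where each transcript counts once, which might make it easier to tell whether the large negative values simply grow with transcript length.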

This GitHub issue also indicates that negative numbers are better. However, no one has answered the question from the person who opened it, who also got negative confidence scores. A quick Google search didn't make it any clearer for me either.

Is this output correct given how the DeepSpeech model computes confidence?
