I am currently attempting to create a modem-like script in Python that uses sounddevice to communicate by sound with other instances of itself, kind of like a real modem from the old days.
I have already developed some transmit and reply functions, like a DTMF generator and a binary converter, but I have a problem with detecting certain frequencies (440 Hz + 350 Hz, i.e. the dial tone), which keeps me from moving on to listening for other sounds (DTMF, data, etc.) and replying in real time.
I am also pretty new to sounddevice and numpy; I have only used numpy code provided by other users for OpenCV. So far I have only figured out how to generate and play a chosen sine wave for a chosen amount of time. For the receiving part I mostly used ChatGPT, but its code either never responded or always returned an error, so I've decided to try and write one myself, but (at least to me) the documentation doesn't make sense yet.
If you could help me in any way with the script ChatGPT gave me, here it is:
import sounddevice as sd
import numpy as np

# Parameters
target_frequencies = [440, 350]  # Frequencies to detect (440 Hz and 350 Hz)
duration = 15                    # Duration in seconds
sample_rate = 44100              # Sample rate

# Callback function for audio input
def audio_callback(indata, frames, time, status):
    # Convert audio data to mono
    mono_data = np.mean(indata, axis=1)

    # Compute the Fast Fourier Transform (FFT)
    fft_data = np.fft.fft(mono_data)
    freqs = np.fft.fftfreq(len(fft_data), 1 / sample_rate)

    # Find the indices of the target frequencies
    target_indices = [np.argmin(np.abs(freqs - freq)) for freq in target_frequencies]

    # Check if the target frequencies are present
    if all(abs(fft_data[index]) > 10000 for index in target_indices):
        print("yo yo yo")

# Start recording
with sd.InputStream(callback=audio_callback, channels=2, samplerate=sample_rate):
    print("Listening for tones...")
    sd.sleep(int(duration * 1000))  # Record for the desired duration

print("Recording finished")
Otherwise, please at least explain to me how, for example, InputStream works and how I can detect sounds from it.
Thank you!
I hope that what ChatGPT has produced here has taught you not to trust it for programming applications.
For your application it isn't enough to detect the maximum spectral component, and it certainly isn't enough to detect any such component above a blanket, arbitrary value of 10,000. Instead you need some kind of heuristic to compare the in-band spectral energies you care about to the total energy, and if that exceeds a threshold, then your tone is considered present. (In addition, you'll want a check for total energy to distinguish "some sound" from "background noise" for your environment; I have not shown this.)
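Here is a minimal sketch of that heuristic, assuming a mono floating-point chunk; the function name, the 15 Hz band half-width, and the 0.5 ratio threshold are all illustrative values you would tune for your setup:

    import numpy as np

    SAMPLE_RATE = 44100
    TARGET_FREQUENCIES = (350.0, 440.0)   # dial-tone components
    BAND_HALF_WIDTH = 15.0                # Hz of tolerance around each target
    RATIO_THRESHOLD = 0.5                 # fraction of total energy that must be in-band

    def dial_tone_present(chunk, sample_rate=SAMPLE_RATE):
        """Return True if most of the chunk's spectral energy sits near 350 and 440 Hz."""
        # One-sided spectrum is enough for a real-valued signal
        spectrum = np.abs(np.fft.rfft(chunk)) ** 2
        freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)

        total_energy = spectrum.sum()
        if total_energy <= 0:
            return False

        # Mark the bins that fall near either target frequency
        in_band = np.zeros_like(freqs, dtype=bool)
        for f in TARGET_FREQUENCIES:
            in_band |= np.abs(freqs - f) <= BAND_HALF_WIDTH

        # Compare in-band energy to total energy
        return spectrum[in_band].sum() / total_energy > RATIO_THRESHOLD

You would still want the separate total-energy check mentioned above in front of this, so that a near-silent chunk is rejected before the ratio is even computed.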
The FFT has many tradeoffs based on the chunk size and the sampling frequency. You don't want the frequency resolution to be too coarse or you won't be able to distinguish the frequencies you want from the ones you don't. You don't want it too fine, either, or each chunk will take longer than it needs to capture (and take up more memory than it needs, as well). You don't want the chunk size to be too small or you won't resolve your lowest frequency; you don't want it too large or you'll take too long to capture a sample and won't respond as quickly as you could.
A reasonable value for frequency bucket size is 10 Hz in this case, because the greatest common factor of your two frequencies of interest is 10, and that's enough to distinguish those tones from other tones in the DTMF/POTS system.
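To make that concrete: the bin width of an FFT is the sample rate divided by the chunk length, so choosing a 10 Hz bin at 44.1 kHz fixes the chunk size for you (the numbers below are just that worked out):

    sample_rate = 44100                         # Hz
    bin_width = 10                              # Hz; gcd(350, 440) == 10
    chunk_size = sample_rate // bin_width       # 4410 samples per analysis chunk
    chunk_duration = chunk_size / sample_rate   # 0.1 s, so detection latency is about 100 ms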
Before trying it on your mic, try it on a canned file from Wikipedia:
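Something along these lines should work as an offline test; dial_tone.wav is a placeholder filename for whatever sample you download, scipy.io.wavfile is just one convenient way to read it, and it re-uses the dial_tone_present sketch from above:

    import numpy as np
    from scipy.io import wavfile

    # Placeholder filename: save the canned dial-tone sample locally first
    sample_rate, data = wavfile.read("dial_tone.wav")

    # Collapse to mono if the file is stereo, and work in floating point
    if data.ndim > 1:
        data = data.mean(axis=1)
    data = data.astype(np.float64)

    chunk_size = sample_rate // 10   # 10 Hz bins, as above

    for start in range(0, len(data) - chunk_size + 1, chunk_size):
        chunk = data[start:start + chunk_size]
        if dial_tone_present(chunk, sample_rate):
            print(f"dial tone detected at t = {start / sample_rate:.2f} s")

Once that prints detections where you expect them, swapping the file loop for an sd.InputStream callback that feeds the same chunk-sized blocks into the detector gets you back to the live-microphone case.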