Using the Alexa Voice Service REST API with cURL

I'd like to use the Alexa voice API (https://developer.amazon.com/public/solutions/alexa/alexa-voice-service/rest/speechrecognizer-requests) with curl. The speech recognizer API call is more complex than I'm used to and needs an MP3 file attached that contains the voice sample. Can anyone advise on how the following request would be structured with curl? (There's more info at the given link.)

POST /v1/avs/speechrecognizer/xxxxxxxxxxxx HTTP/1.1
Host: access-alexa-na.amazon.com
Authorization: Bearer xxxxxxxxxxxx
Content-Type: multipart/form-data; boundary=boundary_term
Transfer-Encoding: chunked

--boundary_term
Content-Disposition: form-data; name="request"
Content-Type: application/json; charset=UTF-8

{
    "messageHeader": {
        "deviceContext": [
            {
                "name": "playbackState",
                "namespace": "AudioPlayer",
                "payload": {
                    "streamId": "xxxxxxxxxxxx",
                    "offsetInMilliseconds": "xxxxxxxxxxxx",
                    "playerActivity": "xxxxxxxxxxxx"
                }
            },
            {
                ...
            },
            ...
        ]
    },
    "messageBody": {
        "profile": "alexa-close-talk",
        "locale": "en-us",
        "format": "audio/L16; rate=16000; channels=1"
    }
}

--boundary_term
Content-Disposition: form-data; name="audio"
Content-Type: audio/L16; rate=16000; channels=1

...encoded_audio_data...

--boundary_term--

There is 1 best solution below

I'm no bash expert, but this is how I was able to interact with AVS using cURL. I generate a file containing the multipart body content, which includes the binary audio data, and pass that file to cURL.

############################################################
# First we create a bunch of variables to hold data.
############################################################

# Auth token
TOKEN="Atza|IQEBLjAsAhR..."

# Boundary
BOUNDARY="BOUNDARY1234"
BOUNDARY_DASHES="--"

# Newline characters
NEWLINE='\r\n';

# Metadata headers
METADATA_CONTENT_DISPOSITION="Content-Disposition: form-data; name=\"metadata\"";
METADATA_CONTENT_TYPE="Content-Type: application/json; charset=UTF-8";

# Audio headers
AUDIO_CONTENT_TYPE="Content-Type: audio/L16; rate=16000; channels=1";
AUDIO_CONTENT_DISPOSITION="Content-Disposition: form-data; name=\"audio\"";

# Metadata JSON body
METADATA="{\
\"messageHeader\": {},\
\"messageBody\": {\
\"profile\": \"alexa-close-talk\",\
\"locale\": \"en-us\",\
\"format\": \"audio/L16; rate=16000; channels=1\"\
}\
}"

############################################################
# Then we start composing the body using the variables.
############################################################

# Compose the start of the request body.
# Each part is: boundary line, headers, one empty line, then the part content.
POST_DATA_START="\
${BOUNDARY_DASHES}${BOUNDARY}${NEWLINE}${METADATA_CONTENT_DISPOSITION}${NEWLINE}\
${METADATA_CONTENT_TYPE}\
${NEWLINE}${NEWLINE}${METADATA}${NEWLINE}${BOUNDARY_DASHES}${BOUNDARY}${NEWLINE}\
${AUDIO_CONTENT_DISPOSITION}${NEWLINE}${AUDIO_CONTENT_TYPE}${NEWLINE}${NEWLINE}"

# Compose the end of the request body
POST_DATA_END="${NEWLINE}${BOUNDARY_DASHES}${BOUNDARY}${BOUNDARY_DASHES}${NEWLINE}"

# Now we create a request body file to hold everything including the binary audio data.

# Write metadata to body file.
# printf '%b' expands the \r\n escapes and, unlike echo, adds no trailing newline.
printf '%b' "$POST_DATA_START" > multipart_body.txt

# Append binary audio data to body file
cat hello.wav >> multipart_body.txt

# Append closing boundary to body file
printf '%b' "$POST_DATA_END" >> multipart_body.txt

############################################################
# Finally we get to compose the cURL request command
# passing it the generated request body file as the multipart body.
############################################################

# Send the request with cURL and write the response to an output file
curl -X POST \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: multipart/form-data; boundary=${BOUNDARY}" \
  --data-binary @multipart_body.txt \
  https://access-alexa-na.amazon.com/v1/avs/speechrecognizer/recognize \
  > response.txt

The audio MUST be mono, sampled at 16 kHz, and encoded as signed 16-bit PCM. Otherwise AVS sends nothing back.
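
Since a wrong audio format fails silently, it can help to check the file before sending it. The following is only a rough sketch: check_wav is a hypothetical helper (not part of AVS or cURL), and it assumes a plain PCM WAV file with the canonical 44-byte header and a little-endian machine, because od reads integers in host byte order.

```shell
# Hypothetical helper: succeed only if the WAV file is PCM, mono, 16 kHz, 16-bit.
# Assumes the canonical 44-byte WAV header and a little-endian host.
check_wav() {
  file="$1"
  # Print the unsigned integer of size $2 bytes found at byte offset $1
  field() { od -An -t "u$2" -j "$1" -N "$2" "$file" | tr -d ' \n'; }
  fmt=$(field 20 2)        # audio format, 1 = PCM
  channels=$(field 22 2)   # number of channels
  rate=$(field 24 4)       # sample rate in Hz
  bits=$(field 34 2)       # bits per sample
  echo "format=$fmt channels=$channels rate=$rate bits=$bits"
  [ "$fmt" = 1 ] && [ "$channels" = 1 ] && [ "$rate" = 16000 ] && [ "$bits" = 16 ]
}
```

It could be called before building the body, for example: check_wav hello.wav || echo "hello.wav is not mono 16 kHz 16-bit PCM" >&2. If ffmpeg happens to be installed, something like ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le hello.wav should produce a file in that format.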

For more information check out my Alexa Voice Service (AVS) with cURL blog post.