Uploading Files to Box via AWS Lambda Function - Issues with Multipart Parsing


I have an AWS Lambda function written in Python that is supposed to upload files to Box.com using their API. The Lambda function is triggered by an HTTP request with a multipart/form-data payload. The function should parse this payload and upload the file to Box. However, I'm running into problems with the multipart parsing and the upload: the file does reach Box, but it arrives corrupted.

Here's the code for my Lambda function:

import os
import json
import boto3
import uuid
import urllib3
from io import BytesIO
from multipart import to_bytes, MultipartParser

ssm = boto3.client('ssm', 'us-east-2')
http = urllib3.PoolManager()


def lambda_handler(event, context):

    if event['headers']['content-type'].startswith('multipart/form-data'):
        data = event['body']
        print("event: ", event)
        print("data: ", data)
        s = data.split("\r")[0][2:]
        p = MultipartParser(BytesIO(data.encode()), s)

        parts = p.parts()
        for part in parts:
            print("Key (Name):", part.name)

            if part.name == 'file':
                blob = part.value
                print(type(blob))
                temp_file_path = '/tmp/temp_file.xlsx'
                with open(temp_file_path, 'wb') as temp_file:
                    temp_file.write(blob.encode("latin-1"))
                if os.path.exists(temp_file_path):
                    with open(temp_file_path, 'rb') as file:
                        file_content = file.read()
                        print("File content:", file_content)

                # Define your Box access token
                token_url = os.environ['TOKEN_URL']
                access_token = getToken(token_url)
                setFileContent(access_token, blob.encode("latin-1"), str(uuid.uuid4()) + ".xlsx")
        print('File uploaded to Box successfully')

def setFileContent(token, file_data, file_name):
    url = "https://upload.box.com/api/2.0/files/content"
    headers = {"Authorization": "Bearer " + token}
    attributes = {"name": file_name, "parent": {"id": "124435"}}
    fields = {
        "attributes": json.dumps(attributes),
        'file': (file_name, file_data)
    }
    response = http.request('POST', url, headers=headers, fields=fields)
    print(response.status)

There is 1 answer below.

John Rotenstein

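If I had to guess, I'd say the corruption comes from handling the file as text. An Excel file is binary, and passing its contents through str (data.encode(), then part.value, then encode("latin-1")) is not compatible with that format.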
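One related thing worth checking, and this is an assumption about your setup: when Lambda is invoked through API Gateway or a function URL, a binary request body normally arrives base64-encoded, with isBase64Encoded set to true in the event. A minimal sketch of getting the body as bytes before any parsing (body_as_bytes is a hypothetical helper name, not part of any library):

```python
import base64


def body_as_bytes(event: dict) -> bytes:
    """Return the request body as raw bytes (hypothetical helper)."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # Binary payloads are delivered base64-encoded; decode back to bytes.
        return base64.b64decode(body)
    # Plain-text body. Binary data that arrives this way may already be damaged.
    return body.encode("utf-8")
```

If isBase64Encoded is false for your uploads, the binary data may have been mangled before your handler runs, and the fix would be in the API Gateway binary media type settings rather than in the code.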

Here is a code example from "3 Ways to upload content to Box using Python" by Rui Barbosa (Box Developer Blog, Medium) that might assist:

import hashlib

# upload_session, file_path and file_size are assumed to be defined earlier,
# as in the article (upload_session comes from the Box Python SDK).
file_sha1 = hashlib.sha1()
parts = []

with open(file_path, "rb") as file_stream:
    for part_num in range(upload_session.total_parts):
        copied_length = 0
        chunk = b""
        while copied_length < upload_session.part_size:
            bytes_read = file_stream.read(upload_session.part_size - copied_length)
            if bytes_read is None:
                # The stream returns None when no bytes are ready right now
                # but more may still arrive, so try again.
                continue
            if len(bytes_read) == 0:
                # The stream is exhausted.
                break
            chunk += bytes_read
            copied_length += len(bytes_read)

        # Upload this chunk at its byte offset within the file.
        uploaded_part = upload_session.upload_part_bytes(
            chunk, part_num * upload_session.part_size, file_size
        )
        parts.append(uploaded_part)
        file_sha1.update(chunk)

Admittedly, it uses the Box Python SDK, which would need to be included in your Lambda deployment package. However, the code example might show you an alternative way to extract and handle each chunk as binary data rather than as latin-1 text.
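If you would rather keep the plain urllib3 approach, the same idea applies to the parsing step: stay in bytes from start to finish. Below is a rough sketch using only the standard library. It assumes the body is already bytes and that the form field is named file, as in the question. extract_file_part is a hypothetical helper, and it skips edge cases that a real multipart parser handles, so treat it as an illustration rather than a replacement for one.

```python
import re


def extract_file_part(body: bytes, content_type: str, field_name: str = "file") -> bytes:
    """Return the raw bytes of one field from a multipart/form-data body."""
    match = re.search(r'boundary="?([^";]+)"?', content_type)
    if match is None:
        raise ValueError("no boundary parameter in Content-Type")
    # Every delimiter is CRLF + "--" + boundary; prepend CRLF so the first one matches.
    delimiter = b"\r\n--" + match.group(1).encode("ascii")
    wanted = b'; name="' + field_name.encode("ascii") + b'"'

    for section in (b"\r\n" + body).split(delimiter)[1:]:
        if section.startswith(b"--"):
            break  # closing delimiter, no more parts
        headers, separator, payload = section.partition(b"\r\n\r\n")
        if separator and wanted in headers:
            return payload  # bytes exactly as sent, never decoded to str
    raise KeyError(field_name)
```

The returned bytes could then be passed straight to setFileContent without any encode() call.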