So I have a Django form for validating Excel files of scientific data, and it works well. It works iteratively: users can upload files as they accumulate new data that they add to their study. The DataValidationView inspects the files each time and presents the user with an error report that lists issues in their data that they must fix.
We realized recently that a number of errors (but not all) can be fixed automatically, so I've been working on a way to generate a copy of the file with a number of fixes. So we rebranded the "validation" form page as a "build a submission" page. Each time they upload a new set of files, the intention is for them to still get the error report, but also automatically receive a downloaded file with a number of fixes in it.
I learned just today that there's no way to both render a template and kick off a download at the same time, which makes sense. However, I had been planning to not let the generated file with fixes hit the disk.
Is there a way to present the template with the errors and automatically trigger the download without previously saving the file to disk?
This is my form_valid method currently (without the triggered download, but I had started to do the file creation before I realized that both downloading and rendering a template wouldn't work):
def form_valid(self, form):
    """
    Upon valid file submission, adds validation messages to the context of
    the validation page.
    """
    # This buffers errors associated with the study data
    self.validate_study()
    # This generates a dict representation of the study data with fixes and
    # removes the errors it fixed
    self.perform_fixes()
    # This sets self.results (i.e. the error report)
    self.format_validation_results_for_template()
    # HERE IS WHERE I REALIZED MY PROBLEM. I WANTED TO CREATE A STREAM HERE
    # TO START A DOWNLOAD, BUT REALIZED I CANNOT BOTH PRESENT THE ERROR REPORT
    # AND START THE DOWNLOAD FOR THE USER
    return self.render_to_response(
        self.get_context_data(
            results=self.results,
            form=form,
            submission_url=self.submission_url,
        )
    )
Before I got to that problem, I was compiling some pseudocode to stream the file... This is totally untested:
from io import BytesIO

import pandas as pd
from django.http import HttpResponse

def download_fixes(self):
    excel_file = BytesIO()
    # The context manager closes (and thereby saves) the writer on exit;
    # ExcelWriter.save() was removed in pandas 2.0, so it is not called here
    with pd.ExcelWriter(excel_file, engine='xlsxwriter') as xlwriter:
        for sheet, data in self.fixed_study_data.items():
            # one worksheet per sheet in the fixed study data
            pd.DataFrame.from_dict(data).to_excel(xlwriter, sheet_name=sheet)
    # important step, rewind the buffer or when it is read() you'll get nothing
    # but an error message when you try to open your zero length file in Excel
    excel_file.seek(0)
    # set the mime type so that the browser knows what to do with the file
    response = HttpResponse(
        excel_file.read(),
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )
    # set the file name in the Content-Disposition header
    response['Content-Disposition'] = 'attachment; filename=myfile.xlsx'
    return response
So I'm thinking either I need to:
- Save the file to disk and then figure out a way to make the results page start its download
- Somehow send the data embedded in the results template and send it back via JavaScript to be turned into a file download stream
- Save the file somehow in memory and trigger its download from the results template?
What's the best way to accomplish this?
UPDATED THOUGHTS:
I recently did a simple trick with a TSV file where I embedded the file content in the resulting template with a download button that used JavaScript to grab the innerHTML of the tags around the data and start a "download".
I thought, if I encode the data, I could likely do something similar with the Excel file content. I could base64 encode it.
I reviewed past study submissions. The largest one was 115 KB. That size is likely to grow by an order of magnitude, but for now 115 KB is the ceiling.
I googled to find a way to embed the data in the template and I got this:
import base64

with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')
ctx["image"] = image_data
return render(request, 'index.html', ctx)
I was recently playing around with base64 encoding in JavaScript for some unrelated work, which leads me to believe that embedding is doable. I could even trigger the download automatically. Does anyone have any caveats to doing it this way?
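As a minimal sketch of the embedding idea, assuming the fixed workbook already exists as bytes in memory (for example from `BytesIO.getvalue()`), the bytes could be turned into a base64 `data:` URI and placed in the template context. The names `to_data_uri`, `base64_size`, and `fake_xlsx` are made up for illustration and are not part of the existing view.

```python
import base64

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def to_data_uri(file_bytes: bytes, mime_type: str = XLSX_MIME) -> str:
    """Return a data: URI that embeds file_bytes as base64 text."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def base64_size(n_bytes: int) -> int:
    """Number of characters in the (padded) base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Hypothetical stand-in for the real in-memory workbook bytes.
fake_xlsx = b"PK\x03\x04 not a real workbook"
uri = to_data_uri(fake_xlsx)

# base64 inflates the payload by about one third, so a 115 KB file
# becomes roughly 153 KB of text inside the rendered page.
embedded_chars = base64_size(115_000)
```

In the template, such a URI could presumably be used as the `href` of a link with a `download` attribute, which a short script then clicks. The main caveat is the size overhead: every page render carries the whole file, inflated by about a third.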
Update
I have spent all day trying to implement @Chukwujiobi_Canon's suggestion, but after working through a lot of errors and things I'm inexperienced with, I'm stuck. A new tab is opened (but it's empty) and a file is downloaded, but it won't open, and there's an error in the browser console saying "Frame load interrupted".
I implemented the Django code first and I think it is working correctly. When I submit the form without the JavaScript, the browser downloads the multipart stream, and it looks as expected:
--3d6b6a416f9b5
Content-Type: application/octet-stream
Content-Range: bytes 0-9560/9561
PK?N˝Ö€]'[Content_Types].xm...
...
--3d6b6a416f9b5
Content-Type: text/html
Content-Range: bytes 0-16493/16494
<!--use Bootstrap CSS and JS 5.0.2-->
...
</html>
--3d6b6a416f9b5--
Here's the javascript:
validation_form = document.getElementById("submission-validation");

// Take over form submission
validation_form.addEventListener("submit", (event) => {
    event.preventDefault();
    submit_validation_form();
});

async function submit_validation_form() {
    // Put all of the form data into a variable (formdata)
    const formdata = new FormData(validation_form);
    try {
        // Submit the form and get a response (which can only be done inside an async function)
        let response;
        response = await fetch("{% url 'validate' %}", {
            method: "post",
            body: formdata,
        })
        let result;
        result = await response.text();
        const parsed = parseMultipartBody(result, "{{ boundary }}");
        parsed.forEach(part => {
            if (part["headers"]["content-type"] === "text/html") {
                const url = URL.createObjectURL(
                    new Blob(
                        [part["body"]],
                        {type: "text/html"}
                    )
                );
                window.open(url, "_blank");
            }
            else if (part["headers"]["content-type"] === "application/octet-stream") {
                console.log(part)
                const url = URL.createObjectURL(
                    new Blob(
                        [part["body"]],
                        {type: "application/octet-stream"}
                    )
                );
                window.location = url;
            }
        });
    } catch (e) {
        console.error(e);
    }
}

function parseMultipartBody (body, boundary) {
    return body.split(`--${boundary}`).reduce((parts, part) => {
        if (part && part !== '--') {
            const [ head, body ] = part.trim().split(/\r\n\r\n/g)
            parts.push({
                body: body,
                headers: head.split(/\r\n/g).reduce((headers, header) => {
                    const [ key, value ] = header.split(/:\s+/)
                    headers[key.toLowerCase()] = value
                    return headers
                }, {})
            })
        }
        return parts
    }, [])
}
The server console output looks fine, but so far, the outputs are non-functional.
For posterity, a guide to an HTTP 1.1 multipart/byteranges Response implemented in Django. For more information on multipart/byteranges, see RFC 7233. The format of a multipart/byteranges payload is as follows:
You get the idea. The first two parts are the same binary data split into two streams, the third is a JSON string sent in one stream, and the fourth is an HTML string sent in one stream.
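A payload of that four-part shape can be sketched in plain Python as below. This is an illustration under assumptions, not the code from this answer: `build_part` and `build_multipart_byteranges` are hypothetical helpers, the part contents are invented, and the boundary is simply the one visible in the question's output.

```python
import json


def build_part(boundary: str, content_type: str, chunk: bytes,
               start: int, total: int) -> bytes:
    """Encode one body part: delimiter line, headers, blank line, bytes."""
    end = start + len(chunk) - 1
    head = (
        f"--{boundary}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Range: bytes {start}-{end}/{total}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + chunk + b"\r\n"


def build_multipart_byteranges(boundary: str, parts) -> bytes:
    """Join the encoded parts and append the closing delimiter."""
    body = b"".join(build_part(boundary, *part) for part in parts)
    return body + f"--{boundary}--\r\n".encode("ascii")


# Invented example contents, only to show the layout.
binary = b"\x00\x01\x02\x03\x04\x05"
json_bytes = json.dumps({"status": "ok"}).encode("utf-8")
html_bytes = b"<html><body>report</body></html>"

payload = build_multipart_byteranges(
    "3d6b6a416f9b5",
    [
        # The same binary data split into two ranges...
        ("application/octet-stream", binary[:3], 0, len(binary)),
        ("application/octet-stream", binary[3:], 3, len(binary)),
        # ...then a JSON string and an HTML string, each sent whole.
        ("application/json", json_bytes, 0, len(json_bytes)),
        ("text/html", html_bytes, 0, len(html_bytes)),
    ],
)
```

In a Django view, bytes like these would presumably be returned through `HttpResponse` with a `content_type` of `multipart/byteranges; boundary=3d6b6a416f9b5`, so that the client knows which boundary to split on.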
In your case, you are sending a File together with your HTML template. See this example [stackoverflow] on parsing a multipart/byteranges on the client.
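The parsing logic itself can be sketched in Python, working on bytes rather than decoded text so that a binary part such as an .xlsx file survives intact. This is a simplified illustration, not the linked example: `parse_multipart_byteranges` and `demo` are hypothetical names, and the sketch assumes the boundary string never occurs inside a part body.

```python
def parse_multipart_byteranges(payload: bytes, boundary: str):
    """Split a multipart/byteranges body into (headers, body_bytes) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Anything before the first delimiter is preamble and is skipped.
    for chunk in payload.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter "--boundary--" reached
        if chunk.startswith(b"\r\n"):
            chunk = chunk[2:]  # drop the line break after the delimiter
        # Split only at the FIRST blank line; the body may contain more.
        head, _, body = chunk.partition(b"\r\n\r\n")
        if body.endswith(b"\r\n"):
            body = body[:-2]  # drop the line break before the next delimiter
        headers = {}
        for line in head.decode("ascii").split("\r\n"):
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, body))
    return parts


# Invented payload: a binary part that itself contains a blank line,
# followed by a small HTML part.
demo = (
    b"--B\r\nContent-Type: application/octet-stream\r\n"
    b"Content-Range: bytes 0-5/6\r\n\r\n\xff\x00\r\n\r\n\r\n"
    b"--B\r\nContent-Type: text/html\r\n"
    b"Content-Range: bytes 0-3/4\r\n\r\n<p/>\r\n"
    b"--B--\r\n"
)
parsed = parse_multipart_byteranges(demo, "B")
```

Two details matter for binary parts: the body is never decoded as text, and only the first blank line separates headers from body. A client that reads the whole response as text, or that splits on every blank line, may corrupt or truncate the file part.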