Google Storage HEAD response missing content-length

733 Views Asked by At

I am using a library (Apache Libcloud) to make requests to google storage. Inside the library, a HEAD request is made to the URL that is queried within google storage. You can find the code for it here, search for def get_object. The important line is

response = self.connection.request(object_path, method='HEAD')

According to the Google Documentation of HEAD response objects, a content-length field should be part of the response:

Response

...
expires: Wed, 30 May 2018 21:41:23 GMT
date: Wed, 30 May 2018 21:41:23 GMT
cache-control: private, max-age=0
last-modified: Wed, 30 May 2018 20:36:34 GMT
etag: "2218880ef78838266ecd7d4c1b742a0e"
x-goog-generation: 1486161811706000
x-goog-metageneration: 15
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 328
content-type: image/jpg
x-goog-hash: crc32c=HBrbzQ==
x-goog-hash: md5=OCydg52+pPG1Bwawjsl7DA==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 328  # <--- here
...

However, it is missing for some files (but not all files). I do receive a x-goog-stored-content-length entry in both cases, but the library needs content-length.

The content-length header is used a bit further down the call chain in def _headers_to_object, where I just get a KeyError due to the missing header:

    def _headers_to_object(self, object_name, container, headers):
        hash = headers['etag'].replace('"', '')
        extra = {'content_type': headers['content-type'],
                 'etag': headers['etag']}
        meta_data = {}

        if 'last-modified' in headers:
            extra['last_modified'] = headers['last-modified']

        for key, value in headers.items():
            if not key.lower().startswith(self.http_vendor_prefix + '-meta-'):
                continue

            key = key.replace(self.http_vendor_prefix + '-meta-', '')
            meta_data[key] = value

        obj = Object(name=object_name, size=headers['content-length'],  # <-- here
                     hash=hash, extra=extra,
                     meta_data=meta_data,
                     container=container,
                     driver=self)
        return obj

The question is: What could I possibly have done to the file when I uploaded it to cause google storage to not send that header? Or is this a bug within google storage (I doubt it)?

Some more info:

  • The content type is application/json
  • This file is gzip encoded.
  • It works for other gzip encoded files
  • I am using the Apache Libcloud API 3.3.0
  • I don't think it's a bug within libcloud since the documentation of HEAD specifies the content-length header, but it works if I overwrite _headers_to_object to use x-goog-stored-content-length.
  • I am unable to reproduce this with a file that I could make public at the moment to demonstrate
1

There are 1 best solutions below

0
feletodev On

I also needed this when integrating with a third-party provider that does not accept gzip encoded media url.

I ended up solving it by creating a "proxy API route" on my backend that adds a "content-type" response header, based on the "x-goog-stored-content-length" header from Google Cloud Storage.

Example (Node.js):

const fetchMediaResponse = await axios({
  method: "get",
  url: mediaUrl,
  responseType: "stream"
});

const contentType = fetchMediaResponse.headers["content-type"];
const contentLength =
  fetchMediaResponse.headers["x-goog-stored-content-length"];

res.writeHead(200, {
  "content-type": contentType,
  "content-length": contentLength
});

return fetchMediaResponse.data.pipe(res);