C++, ifstream, and char

115 Views Asked by At

I'm reading PNG files from disk and inserting the bytes into a vector of chars, instantiated with the new keyword.

Opening the file:

std::ifstream file("./images/orange.png", std::ios_base::binary);

Instantiating a vector of length 8 (#define PNG_SIGNATURE 8), where all elements are initially an empty char:

std::vector<char>* header_buff { new std::vector<char>(PNG_SIGNATURE, ' ') };

Writing bytes to the vector, where f is a reference to the file, and b is a pointer to my vector buffer:

std::vector<char> chunk_reader(std::vector<char>* b, std::ifstream &f) {
    char c { };

    for(int i { }; i < (*b).size(); ++i) {
        f.get(c);
        (*b)[i] = c;
    };

    return *b;
};

The first byte of a PNG is always going to be HEX 0x89.

According to the reference material, the std::ifstream member type is of type char.


My question is, on my implementation: MinGW GCC for Windows, I'm able to debug my program and see the first byte of the PNG as a signed char, decimal value -119. I think for ease of parsing a stream of binary, I'd want to use unsigned char, so if I were to seek to a particular byte of interest, I could check if its unsigned value is correct in the source code as 137 instead of its signed representation.

To my knowledge, and based on the posts I've read on here with similar questions, C++ leaves the signedness of a char to the implementation for flexibility. So, if I insert the gcc flag -funsigned-char, I get the behavior I expect, and can visually see in the debugger the decimal value 137 for the first byte.

I've read on here from more experienced programmers that this is a band-aid, and to keep something like this to the source code for readability, aka a reinterpreted cast from char to unsigned char, which to my knowledge makes sense since they're both one byte of information.

But, then I see posts saying reinterpret cast is hacky, and should be left alone to the use cases it was designed for in the standard library.

Can someone offer some advice as to what is the best practice in this situation?

I'd ultimately like to perform certain validations of individual bytes, like in the case of a PNG the first byte of a named chunk will have a special meaning if it's an uppercase or lowercase ASCII character - which is simple if I'm dealing with unsigned chars and can use decimal values in the source code, in a switch statement for example.

I'm very new to C++, so I'd appreciate your advice. I'm interested in the best practices and scalability of designing such a system.

3

There are 3 best solutions below

0
Clifford On

Make the vector the type you need and cast the ifstream input at the earliest opportunity.

Moreover I'd suggest that if this is binary data rather than text, you use std::uint8_t.

std::vector<std::uint8_t> chunk_reader( std::vector<std::uint8_t>& b,
                                        std::ifstream &f) 
{
    char c { };

    for(int i { }; i < b.size(); ++i) 
    {
        f.get(c);
        b[i] = static_cast<std::uint8_t>(c) ;
    }

    return b;
};
0
Super-intelligent Shade On

I know you are trying to avoid reinterpret_cast, but this is what I would do. Main advantage of this method is that you are not reading the data one byte at a time.

#include <cstdint>
#include <fstream>
#include <vector>

constexpr auto PNG_SIGNATURE = 8;

int main()
{
    std::ifstream fs{"/path/to/file", std::ios::binary};

    std::vector<std::uint8_t> buf(PNG_SIGNATURE);
    fs.read(reinterpret_cast<char*>(buf.data()), buf.size());

    // use buf

    return 0;
}

Link: https://godbolt.org/z/vTne7Gj64

1
Jerry Coffin On

Ideally, what would probably be nice here would be for the ifstream to read a stream of std::bytes, in which case your entire question kind of evaporates, because you could just read the correct number of std::bytes, and have what you want.

Unfortunately, iostreams predate the addition of std::byte by a few decades. Worse, they're based closely on C's I/O model, which worked primarily with chars since time immemorial (though with C I/O, you typically start by reading data into an int, then check for EOF, and only convert the result to char after determining that what you just read is not EOF).

As for how to deal with your specific situation. A PNG header is rather different from most things we read from files. In particular, a PNG header doesn't really contain any information of interest--it's just an 8-byte signature that gives a pretty strong indication that what comes after it is actually PNG data.

As such, I'd personally go about this almost entirely differently than you have. Situations like this (where we expect some specific data in an input file, and just need to verify that it's present) arise fairly frequently, and I find that it's useful to overload operator>> to make it relatively easy:

std::istream &operator>>(std::istream &is, unsigned char const *expected) {
    while (*expected && (*expected == is.peek()))
        is.ignore(1);
    if (*expected) {
        is.setstate(std::ios::failbit);
    }
    return is;
}

Putting that to use in your case (reading/verifying a PNG header) would look something like this:

unsigned char const *pngHeader = "\x89\x50\x4e\x47\x0D\x0A\x1A\x0A";

std::ifstream input("somefile.png");

if (!input >> pngHeader) {
    std::cerr << "somefile.png is not really a PNG File\n");
}

Looking at your question more generally, however, let's consider reading a subsequent chunk from a PNG file. A chunk format looks roughly like this:

struct PngChunk {
    std::uint32_t type;

    // length is stored implicit in data.length()
    std::vector<std::uint32_t> data;
    std::uint32_t CRC;
};

To read this, I would use a reinterpret_cast, something on this order:

std::istream &operator(std::istream &is, PngChunk &p) {
    std::uint32_t length;

    is.read(reinterpret_cast<char *>(&length);
    is.read(reinterpret_cast<char *>(&p.type));
    length = ntohl(length);
    p.type = ntohl(p.type);

    if (isIEND(length, type)) {
        is.setstate(std::ios::fail);
        return is;
    }        

    p.resize(length);
    is.read(reinterpret_cast<char *>(p.data.data());
    is.read(reinterpret_cast<char *>(&p.CRC));
    p.CRC = ntohl(p.CRC);

    if (computeCrc(p.data) != p.CRC) {
        is.setstate(std::ios::failbit);
    }
    return is;
}

Putting all this together, we'd read a PNG file something like this:

std::ifstream input("somefile.png");

if (!input >> pngHeader) {
    std::cerr << "not a png file\n";
}
std::vector<PngChunk> chunks(std::istream_iterator<PngChunk>(input), {});

At this point, we've read in all the png chunks, verified all their CRCs, etc., so they're ready to process (decompress, display, etc.)

Summary

  • Don't store data for which you have no use.
  • Reading from a file is a case where reinterpret_cast often makes sense.