C++20 changes reading into char arrays with `operator>>` - How to fix this?


EDIT: To summarize from the comments (before I close the topic):

  1. the issue has been discussed here previously:

The resolution was that the committee was aware they break code with this.

  2. I don't know what urgent issue led the LWG to replace the old version (which was in the standard from C++98 to C++17, i.e. for about 20 years) instead of doing it this way (as IMHO was done with std::gets in C++14, for a good reason):

Step 1: Only ADD the new templated version expecting a "reference to an array", from which the number of elements is deduced at compile time (which surely is a protection against buffer overruns). It would have taken effect "from day one".

Step 2: Deprecate the version that was in the standard from C++98 to C++17 and wait for community feedback on whether there are use cases valid enough to keep it. Then, maybe, remove it one standard later.

I think a valid use case is what I show here: https://godbolt.org/z/nG174vnqP

(extracted from real code but shortened to show the issue only). It also contains a little demo of why I think the old and the new version could well have co-existed. But maybe I'm wrong with that assessment.

What I currently find most annoying is that there is no way to resolve the issue in C++20 without stepping into UB-land. Especially as I think the "old" version is still available internally and the new version just forwards to it, which is highly probable, as you don't want a separate implementation for each different array length.


With the release of C++20, the operator>> overload for reading a char array now expects a char(&)[N] argument instead of a char*. The original code that compiled correctly since C++98, which looks like the following, will not work any more:

    std::size_t sz = 10;
    char *cp = new char[sz];
    ...
    std::cin >> std::setw(sz) >> cp;

To make this compile again, the code can be modified as follows (although, as said above, this steps into UB-land):

    std::cin >> std::setw(sz) >> *reinterpret_cast<char(*)[std::numeric_limits<int>::max()]>(cp);

See here: https://godbolt.org/z/svPcT4eao

Additionally, there's an issue with a common implementation of variable-length strings, whose behavior can silently change without any indication at compile time.

To answer the generally asked question in the comments why I don't use std::string:

In fact, I use std::string a lot, but I also occasionally coach people who work on projects where you don't want to add any unnecessary overhead, and some prefer string classes that don't use three pointers when a single one is sufficient. The example is extracted from one of those.

Also, I was pointed to the answer "Can't use std::cin with char* or char[] in C++20". Yes, it is about the same topic, but the even more important information in that answer is in the paper it points to: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0487r1.html

It makes clear that valid C++17 code no longer compiles, but sadly it doesn't cover the silent change, where code that was valid before is no longer valid but does not cause a compile-time error:

That is, at least as long as the

    struct vbuf {
        std::size_t sz;
        char cbuf[1];
    };

technique with over-allocation hasn't been turned into UB by an earlier C++ standard.
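To make the interaction with the C++20 overload concrete, here is a small sketch. The helpers `deduced_extent` and `max_stored` are hypothetical names used only for illustration. They mirror, as far as I understand the C++20 wording, how the array overload deduces `N` from the declared type and then stores at most `min(width, N) - 1` characters.

```cpp
#include <cstddef>

// The layout from the text: a size field followed by a one-element array
// that is over-allocated at run time.
struct vbuf {
    std::size_t sz;
    char cbuf[1];
};

// Hypothetical helper: reports the extent an array-reference parameter
// deduces, the same way a char(&)[N] overload sees its argument.
template <std::size_t N>
constexpr std::size_t deduced_extent(char (&)[N]) { return N; }

// Hypothetical helper: maximum number of characters (excluding '\0') such an
// overload would store for a given stream width; width 0 means "not set".
constexpr std::size_t max_stored(std::size_t extent, std::size_t width) {
    return ((width > 0 && width < extent) ? width : extent) - 1;
}
```

For a `vbuf v`, `deduced_extent(v.cbuf)` is 1 no matter how much memory was allocated behind the struct, so `max_stored(1, sz)` is 0 for any `sz`. The `std::setw(sz)` that bounded the read in C++17 can no longer widen it.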

In the comment below, @n.m. remarked that this was already UB in C. He may be correct (I have not checked all the C standards since C89, and I'm relatively sure it was not UB then), but at least it is a common technique (e.g. in the buffers for Linux messages, see sndmsg(3P) etc.), and therefore I think it is a safe assumption, at least for the Linux family of compilers, that this is well defined and safe.

But I will no longer make the claim that the new version causes a "silent change", because this of course does not apply if we are in UB-land.
