How to store either std::string or std::string_view in a std::variant?


I am working on a lexer. I have a Token struct, which looks like this:

#include <string_view>

struct Token {
    enum class Type { /* ... */ };

    Type type;
    std::string_view lexeme;
};

The Token's lexeme is just a view into a small piece of the full source code (which, by the way, is also a std::string_view).

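The problem is that I need to re-map special characters (for instance, the escape sequence '\n'), and the re-mapped text no longer matches the source, so it can't be a view into it. Storing them as-is isn't a nice solution.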

I've tried replacing the lexeme's type with std::variant<std::string, std::string_view>, but it quickly turned into spaghetti code: every time I want to read the lexeme (for example, to check whether the type is Bool and the lexeme is "true"), it's a big pain.

Storing the lexeme as an owning string won't solve the problem.

I'm using C++20; is there a nice solution for this?


Answer by Jan Schultke (best answer)

You could just use std::string

Firstly, a std::string could be used in a Token just as well as a std::string_view. This might not be as costly as you think, because the major C++ standard library implementations (libstdc++, libc++, and Microsoft's STL) all use SSO (small string optimization).

This means that short tokens like "const" wouldn't be allocated on the heap; the characters are stored directly inside the std::string object (up to 15 or 22 characters on typical 64-bit implementations, depending on the library). Before bothering with std::string_view and std::variant, you might want to measure whether allocations are actually a performance issue. Otherwise, this is a case of premature optimization.

If you insist on std::variant ...

User @Homer512 has provided a solid solution already. Rather than using the std::variant directly, you could create a wrapper around it which provides a string-like interface for both std::string and std::string_view.

This is easy to do, because the names and meanings of most member functions are identical for both classes. That also makes them easy to call through std::visit, since a single generic lambda works for either alternative.

#include <string>
#include <string_view>
#include <variant>

struct MaybeOwningString
{
    using variant_type = std::variant<std::string, std::string_view>;
    using size_type = std::string_view::size_type;

    variant_type v;

    // main member function which grants access to either alternative as a view
    std::string_view view() const noexcept {
        return std::visit([](const auto& str) -> std::string_view {
            return str;
        }, v);
    }

    // various helper functions which expose commonly used member functions
    bool empty() const noexcept {
        // helper functions can be implemented with std::visit, but this is verbose
        return std::visit([](const auto& str) {
            return str.empty();
        }, v);
    }

    size_type size() const noexcept {
        // helper functions can also be implemented by using view()
        return view().size();
    }

    // ...
};

Answer by Homer512

It seems to me that all you need is to encapsulate the variant so that it provides a uniform interface to both alternatives. Since it is dirt-cheap to convert a std::string to a std::string_view, and equally cheap to copy a std::string_view, you can create one method that returns a view and access the content through it.

#include <iostream>
#include <string>
#include <string_view>
#include <variant>

struct OptOwnString
{
    using variant_t = std::variant<std::string, std::string_view>;
    variant_t value;

    std::string_view view() const noexcept
    {
        /**
         * Note: noexcept since it is effectively impossible to
         * make this particular variant valueless_by_exception
         */
        return std::visit([](auto const& v) {
              return std::string_view(v); }, value);
    }
};

int main()
{
    OptOwnString owning { std::string("foo") };
    std::cout << owning.view() << '\n';
    OptOwnString borrowed { owning.view() };
    std::cout << borrowed.view() << '\n';
}