Serialization of simple structures without external libraries


I'm trying to serialize simple single-level classes like the ones below, without external libraries such as Boost and without having to implement a serializer function for every class. I have so few classes that I could easily write a serializer for each one, but for future reference I would like to have a simple solution at hand that scales well.

The requirement for each class to be serialized is that its members are only serializable types, and an array of member pointers is defined, so that at serialization the members can be iterated regardless of which class is passed.

The problem is that the compilation fails because of the missing cast where the member pointer is dereferenced, obviously:

esp32.ino: 122:35: error: 'footprint.MessageFootprint<1>::Members[i]' cannot be used as a member pointer, since it is of type 'void*'

I don't know how to store the member pointers in an iterable collection, or how to avoid the void* cast. My goal is to iterate over the class members at serialization time with a single generic serialization function.

enum SerializableDataTypes {
    SerInt,
    SerFloat,
    SerString,
    SerIntArray
};

template <int N>
struct MessageFootprint {
    SerializableDataTypes DataTypes[N];
    void* Members[N];
};

template<typename T, typename R>
void* void_cast(R(T::*m))
{
    union
    {
        R(T::*pm);
        void* p;
    };
    pm = m;
    return p;
}

class ControlMessage{};

// first structure to be serialized
class Message1 : public ControlMessage {
public:
    int prop1;
    int prop2;
};
const int Message1MemberCount = 2;
const MessageFootprint<Message1MemberCount> Message1FootPrint = { { SerInt, SerInt }, {void_cast(&Message1::prop1), void_cast(&Message1::prop2)} };

// second structure to be serialized
class Message2 : public ControlMessage {
public:
    int prop1;
    String prop2;
};
const int Message2MemberCount = 2;
const MessageFootprint<Message2MemberCount> Message2FootPrint = { { SerInt, SerInt }, {void_cast(&Message2::prop1), void_cast(&Message2::prop2)} };

template<int N>
void SerializeMessage(MessageFootprint<N> footprint, ControlMessage message) {
    for (int i = 0; i < N; i++) {
        if (footprint.DataTypes[i] == SerInt) {
            // serialization code here based on data type
            // for demonstration purposes it's only written in the serial port
            logLine(String(i));
            Serial.println(*((int*)(message.*(footprint.Members[i]))));
        }
    }
}

void main() {
    // usage example
    Message1 msg = Message1();
    msg.prop1 = 1;
    msg.prop2 = 2;
    SerializeMessage(Message1FootPrint, msg);
}

There are 2 best solutions below

HTNW On

Don't erase types; that is, don't cast your pointers to void*. If you preserve the types of the pointers through templates, you can choose the serialization functions directly from their types, so you won't have to specify the types at all. Indeed, you already have a bug: you marked the second member of Message2 as SerInt when it is a String. If you work from the actual types instead of forcing the user to duplicate them, you avoid such errors. Also, the common superclass is completely unnecessary.

#include <string>
#include <tuple>

template<typename T, typename... Parts>
struct MessageFootprint {
    std::tuple<Parts T::*...> parts;
    MessageFootprint(Parts T::*... parts) : parts(parts...) { }
};
template<typename T, typename... Parts>
MessageFootprint(Parts T::*...) -> MessageFootprint<T, Parts...>; // deduction guide

// e.g.
struct Message1 {
    int prop1;
    int prop2;
};
inline MessageFootprint footprint1(&Message1::prop1, &Message1::prop2);
// deduction guide allows type of footprint1 to be inferred from constructor arguments
// it is actually MessageFootprint<Message1, int, int>
// if you are on a C++ standard old enough to not have deduction guides,
// you will have to manually specify them
// this is still better than letting the types be erased, because now the compiler
// will complain if you get it wrong

// e.g. if I replicate your mistake
struct Message2 {
    int prop1;
    std::string prop2;
};
inline MessageFootprint<Message2, int, int> footprint2(&Message2::prop1, &Message2::prop2);
// This does not go through because    ^^^ is wrong
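
If deduction guides are not available (before C++17), one alternative to spelling out the types by hand is a small factory function, since function templates have always deduced their arguments. This is only a sketch: MakeFootprint and footprint_demo are hypothetical names, not part of the code above.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

template <typename T, typename... Parts>
struct MessageFootprint {
    std::tuple<Parts T::*...> parts;
    MessageFootprint(Parts T::*... ps) : parts(ps...) {}
};

// Hypothetical helper: T and Parts... are deduced from the member pointers,
// so the caller never has to repeat the member types.
template <typename T, typename... Parts>
MessageFootprint<T, Parts...> MakeFootprint(Parts T::*... ps) {
    return MessageFootprint<T, Parts...>(ps...);
}

struct Message2 {
    int prop1;
    std::string prop2;
};

// 'auto' picks up the deduced type, so it cannot drift from the members.
static const auto footprint2 = MakeFootprint(&Message2::prop1, &Message2::prop2);

static_assert(
    std::is_same<decltype(footprint2),
                 const MessageFootprint<Message2, int, std::string>>::value,
    "member types are deduced from the member pointers");

// Small self-check: the stored member pointers reach the right members.
inline void footprint_demo() {
    Message2 m{7, "seven"};
    assert(m.*std::get<0>(footprint2.parts) == 7);
    assert(m.*std::get<1>(footprint2.parts) == "seven");
}
```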

Serialization is probably best handled with overloading. For each Part T::* in a MessageFootprint<T, Parts...>, extract a Part& from the T and call out to an overloaded function that decides what to do based on Part:

#include <iostream>
#include <string>
#include <tuple> // std::apply (C++17)

// I have no idea what serial port communication stuff you're doing
// but this gets the point across
void SerializeAtom(int i) { std::cout << "I" << i; }
void SerializeAtom(std::string const &s) { std::cout << "S" << s.size() << "S" << s; }

template<typename T, typename... Parts>
void SerializeFootprint(MessageFootprint<T, Parts...> footprint, T const &x) {
    // calls the provided functor with the things in the tuple
    std::apply(
        // this lambda is a template with its own Parts2... template parameter pack
        // and the argument is really Parts2... parts
        // we then do a fold expression over parts
        // we need std::apply because there's no simpler way to get the actual
        // values out (std::get fails when there are duplicates)
        [&x](auto... parts) { (SerializeAtom(x.*parts), ...); },
        footprint.parts);
}
// Trying to write ^^^ before C++17 would probably be a nightmare

This system is extensible: to add a new "atomic" type, just overload SerializeAtom. No need to manage an enum or whatnot. Deserialization would mean a family of DeserializeAtom overloads that write into the given reference, and a DeserializeFootprint which would probably look exactly like SerializeFootprint.
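
To make the deserialization side concrete, here is a rough sketch of what such a pair could look like. It assumes the same "I<int>" and "S<size>S<chars>" format as SerializeAtom above, but reads and writes through std::istream/std::ostream parameters (my assumption, so that it can be tested with string streams). It does no validation of the length field, and round_trip_demo is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

template <typename T, typename... Parts>
struct MessageFootprint {
    std::tuple<Parts T::*...> parts;
    MessageFootprint(Parts T::*... ps) : parts(ps...) {}
};

// Writers using the same format as above, but with an explicit stream.
void SerializeAtom(std::ostream &out, int i) { out << 'I' << i; }
void SerializeAtom(std::ostream &out, std::string const &s) {
    out << 'S' << s.size() << 'S' << s;
}

// Readers: each overload consumes exactly what its writer produced and
// sets failbit on the stream if the expected tag is missing.
void DeserializeAtom(std::istream &in, int &i) {
    if (in.get() != 'I') { in.setstate(std::ios::failbit); return; }
    in >> i;
}
void DeserializeAtom(std::istream &in, std::string &s) {
    std::size_t n = 0;
    if (in.get() != 'S') { in.setstate(std::ios::failbit); return; }
    in >> n;
    if (in.get() != 'S') { in.setstate(std::ios::failbit); return; }
    s.resize(n);
    in.read(&s[0], static_cast<std::streamsize>(n));
}

template <typename T, typename... Parts>
void SerializeFootprint(std::ostream &out,
                        MessageFootprint<T, Parts...> const &fp, T const &x) {
    std::apply([&](auto... parts) { (SerializeAtom(out, x.*parts), ...); },
               fp.parts);
}

// Same shape as SerializeFootprint, except that x is mutable.
template <typename T, typename... Parts>
void DeserializeFootprint(std::istream &in,
                          MessageFootprint<T, Parts...> const &fp, T &x) {
    std::apply([&](auto... parts) { (DeserializeAtom(in, x.*parts), ...); },
               fp.parts);
}

struct Message2 {
    int prop1;
    std::string prop2;
};

// Writes a message to a string stream and reads it back.
inline void round_trip_demo() {
    MessageFootprint<Message2, int, std::string> fp(&Message2::prop1,
                                                    &Message2::prop2);
    Message2 a{42, "hello"};
    std::ostringstream out;
    SerializeFootprint(out, fp, a);

    std::istringstream in(out.str());
    Message2 b{};
    DeserializeFootprint(in, fp, b);
    assert(in && b.prop1 == a.prop1 && b.prop2 == a.prop2);
}
```

The comma fold evaluates left to right, so members are read back in the same order they were written.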


Thomas Matthews On

I've developed a serialization system that uses buffering.

Each object inherits from an interface that declares functions for:
1. Returning the size of the object on the stream.
2. Storing the object members to a buffer.
3. Loading the object members from a buffer.

This system is based on the fact that structs and classes can contain padding and that the class/struct is most knowledgeable about its members. For example, a multibyte integer may be Big Endian in the buffer, and the object needs to convert to Little Endian. This system also accommodates different methods for writing variable length text fields.
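
As an illustration of the byte-order point, a 32-bit integer could be stored Big Endian regardless of the host's native order by shifting bytes out one at a time. This is a sketch only; store_u32_be, load_u32_be, and byte_order_demo are hypothetical names, and the pointer-by-reference convention matches the interface in this answer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: writes v most-significant byte first and
// advances p past the 4 bytes written.
inline void store_u32_be(std::uint8_t *&p, std::uint32_t v) {
    *p++ = static_cast<std::uint8_t>(v >> 24);
    *p++ = static_cast<std::uint8_t>(v >> 16);
    *p++ = static_cast<std::uint8_t>(v >> 8);
    *p++ = static_cast<std::uint8_t>(v);
}

// Hypothetical helper: reads 4 bytes, most-significant first, and
// advances p past them. Works the same on any host byte order.
inline std::uint32_t load_u32_be(std::uint8_t *&p) {
    std::uint32_t v = (static_cast<std::uint32_t>(p[0]) << 24) |
                      (static_cast<std::uint32_t>(p[1]) << 16) |
                      (static_cast<std::uint32_t>(p[2]) << 8) |
                      static_cast<std::uint32_t>(p[3]);
    p += 4;
    return v;
}

// Small self-check of the layout and the round trip.
inline void byte_order_demo() {
    std::uint8_t buffer[4] = {};
    std::uint8_t *p = buffer;
    store_u32_be(p, 0x12345678u);
    assert(buffer[0] == 0x12 && buffer[3] == 0x78);
    p = buffer;
    assert(load_u32_be(p) == 0x12345678u);
}
```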

class Binary_Stream_Interface
{
  public:  
      // Returns the size, in uint8_t units, that the object occupies in
      // a buffer (stream), packed.
      virtual size_t    size_on_stream() const = 0; 

      // Loads the class members from a buffer, pointed to by p_buffer.
      // The p_buffer pointer will be incremented after loading the object.
      virtual void      load_from_buffer(uint8_t* &  p_buffer) = 0;

      // Stores the class members to a buffer, pointed to by p_buffer.
      // The p_buffer pointer will be incremented after storing the object.
      virtual void      store_to_buffer(uint8_t * &  p_buffer) const = 0;
};

To serialize (write) an object:
1. Call size_on_stream() to determine the buffer size needed.
2. Allocate the buffer.
3. Call store_to_buffer to store the object into the buffer.
4. Write the buffer to the stream, using std::ostream::write.
5. Delete the buffer.

Reading an object:
1. Call size_on_stream() to determine the buffer size needed.
2. Allocate the buffer.
3. Read the data from the stream into the buffer, using std::istream::read and the size needed.
4. Call the load_from_buffer() method.
5. Delete the buffer.
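
The write and read steps above could be sketched roughly like this. write_object, read_object, and the fixed-size Point type are hypothetical names for illustration, and std::vector stands in for the manual allocate/delete steps. Note that the read side relies on size_on_stream() being known before loading, which holds for a fixed-size object like this one; a variable-length object would need its length read from the stream first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

class Binary_Stream_Interface {
public:
    virtual ~Binary_Stream_Interface() = default;
    virtual std::size_t size_on_stream() const = 0;
    virtual void load_from_buffer(std::uint8_t *&p_buffer) = 0;
    virtual void store_to_buffer(std::uint8_t *&p_buffer) const = 0;
};

// Hypothetical fixed-size example type (host byte order, for brevity).
struct Point : Binary_Stream_Interface {
    std::int32_t x = 0;
    std::int32_t y = 0;

    std::size_t size_on_stream() const override { return sizeof(x) + sizeof(y); }
    void load_from_buffer(std::uint8_t *&p) override {
        std::memcpy(&x, p, sizeof(x)); p += sizeof(x);
        std::memcpy(&y, p, sizeof(y)); p += sizeof(y);
    }
    void store_to_buffer(std::uint8_t *&p) const override {
        std::memcpy(p, &x, sizeof(x)); p += sizeof(x);
        std::memcpy(p, &y, sizeof(y)); p += sizeof(y);
    }
};

// Steps 1-5 of writing: size, allocate, store, write, release.
void write_object(std::ostream &out, Binary_Stream_Interface const &obj) {
    std::vector<std::uint8_t> buffer(obj.size_on_stream());
    std::uint8_t *p = buffer.data();
    obj.store_to_buffer(p);
    out.write(reinterpret_cast<char const *>(buffer.data()),
              static_cast<std::streamsize>(buffer.size()));
}   // the vector frees the buffer here

// Steps 1-5 of reading; returns false if the stream ran short.
bool read_object(std::istream &in, Binary_Stream_Interface &obj) {
    std::vector<std::uint8_t> buffer(obj.size_on_stream());
    if (!in.read(reinterpret_cast<char *>(buffer.data()),
                 static_cast<std::streamsize>(buffer.size()))) {
        return false;
    }
    std::uint8_t *p = buffer.data();
    obj.load_from_buffer(p);
    return true;
}

// Small self-check: write a Point to a string stream and read it back.
inline void stream_round_trip_demo() {
    Point a;
    a.x = 3;
    a.y = -4;
    std::stringstream ss;
    write_object(ss, a);
    Point b;
    assert(read_object(ss, b));
    assert(b.x == 3 && b.y == -4);
}
```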

Implementation is left as an exercise for the OP.

Note: Templates can be used for common POD types and std::string to make everything more uniform.
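
A minimal sketch of such templates, assuming a 32-bit length prefix for strings and host byte order. store_pod, load_pod, store_string, load_string, and pod_demo are hypothetical names. std::memcpy is used instead of pointer casts so that unaligned buffer positions are not a problem.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Copies any trivially copyable value into the buffer and advances p.
template <typename T>
void store_pod(std::uint8_t *&p, T const &value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "store_pod only handles trivially copyable types");
    std::memcpy(p, &value, sizeof(T));
    p += sizeof(T);
}

// Copies a trivially copyable value out of the buffer and advances p.
template <typename T>
void load_pod(std::uint8_t *&p, T &value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "load_pod only handles trivially copyable types");
    std::memcpy(&value, p, sizeof(T));
    p += sizeof(T);
}

// Strings are stored as a 32-bit length followed by the raw characters.
inline void store_string(std::uint8_t *&p, std::string const &s) {
    std::uint32_t const length = static_cast<std::uint32_t>(s.size());
    store_pod(p, length);
    std::memcpy(p, s.data(), length);
    p += length;
}

inline void load_string(std::uint8_t *&p, std::string &s) {
    std::uint32_t length = 0;
    load_pod(p, length);
    s.assign(reinterpret_cast<char const *>(p), length);
    p += length;
}

// Small self-check: store a string and an integer, then load them back.
inline void pod_demo() {
    std::uint8_t buffer[32] = {};
    std::uint8_t *p = buffer;
    store_string(p, "Ada");
    store_pod(p, std::uint32_t{7});

    p = buffer;
    std::string name;
    std::uint32_t id = 0;
    load_string(p, name);
    load_pod(p, id);
    assert(name == "Ada" && id == 7);
}
```

With helpers like these, each class's store_to_buffer and load_from_buffer reduce to one call per member.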

Edit 1: Example

struct Student
: public Binary_Stream_Interface
{
     std::string name;
     unsigned int id;

     size_t size_on_stream() const
     {
         // Length prefix, then the string text, then the id.
         size_t stream_size = sizeof(unsigned int) + name.length() + sizeof(id);
         return stream_size;
     }

     void load_from_buffer(uint8_t* & p_buffer)
     {
         // Read the string size.
         unsigned int length = *((unsigned int *)(p_buffer));
         p_buffer += sizeof(length);

         // Load the string text from the buffer
         name = std::string((char *) p_buffer, length);
         p_buffer += length;

         id = *((unsigned int *) p_buffer);
         p_buffer += sizeof(id);
    }

    void store_to_buffer(uint8_t * & p_buffer) const
    {
        unsigned int length = name.length();
        *((unsigned int *) p_buffer) = length;
        p_buffer += sizeof(unsigned int);

        char * p_char_buffer = (char *) p_buffer;
        std::copy(name.begin(), name.end(), p_char_buffer);
        p_buffer += length;

        *((unsigned int *) p_buffer) = id;
        p_buffer += sizeof(unsigned int);
    }
};