Creating DataFrame in Java based on ByteArrayInputStream

350 Views Asked by Sergii Chukhno At 02 July 2020 at 04:21

I need to convert following to Spark DataFrame in Java with the saving of the structure according to the avro schema. And then I'm going to write it to s3 based on this avro structure.

GenericRecord r = new GenericData.Record(inAvroSchema);
r.put("id", "1");
r.put("cnt", 111);

Schema enumTest =
        SchemaBuilder.enumeration("name1")
                .namespace("com.name")
                .symbols("s1", "s2");

GenericData.EnumSymbol symbol = new GenericData.EnumSymbol(enumTest, "s1");

r.put("type", symbol);

ByteArrayOutputStream bao = new ByteArrayOutputStream();
GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<>(inAvroSchema);

Encoder e = EncoderFactory.get().jsonEncoder(inAvroSchema, bao);
w.write(r, e);
e.flush();

I can create the object based on JSON structure

  Object o = reader.read(null, DecoderFactory.get().jsonDecoder(inAvroSchema, new ByteArrayInputStream(bao.toByteArray())));

But maybe there is any way to create DataFrame based on ByteArrayInputStream(bao.toByteArray())?

Thanks

Original Q&A

There are 1 best solutions below

andreoss On 02 July 2020 at 04:37

No, you have to use a Data Source to read Avro data. And it's crutial for Spark to read Avro as files from filesystem, because many optimizations and features depend on it (such as compression and partitioning). You have to add spark-avro (unless you are above 2.4). Note that EnumType you are using will be String in Spark's Dataset

Also see this: Spark: Read an inputStream instead of File

Alternatively you can consider deploying a bunch of tasks with SparkContext#parallelize and reading/writing the files explicitly by DatumReader/DatumWriter.

Creating DataFrame in Java based on ByteArrayInputStream

There are 1 best solutions below

Related Questions in JAVA

Related Questions in DATAFRAME

Related Questions in APACHE-SPARK

Related Questions in BYTEARRAYINPUTSTREAM

Trending Questions

Popular # Hahtags

Popular Questions