I am testing how different storage formats affect Hive query efficiency (on Windows 10, a single desktop machine). The original data is 400 txt files of roughly equal size, 169 MB in total. I first converted them to ORC format (130 MB), then converted from ORC to Parquet (423 MB) and to SequenceFile (1.87 GB). As I understand it, both Parquet and SequenceFile have some compression features, so why do they end up taking more disk space than the original text format?
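The conversions were done roughly like this (a minimal sketch; test_txt, test_orc, test_parquet, and test_seq are placeholder table names, with test_txt being the table backed by the original text files):

    -- sketch of the conversion path: txt -> orc -> parquet / sequencefile
    CREATE TABLE test_orc STORED AS ORC AS SELECT * FROM test_txt;
    CREATE TABLE test_parquet STORED AS PARQUET AS SELECT * FROM test_orc;
    CREATE TABLE test_seq STORED AS SEQUENCEFILE AS SELECT * FROM test_orc;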
Here is some information that I think is relevant:

    txt:
        inputFormat:  org.apache.hadoop.mapred.TextInputFormat
        outputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        compressed:   false
    orc:
        inputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
        outputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
        compressed:   false
    parquet:
        inputFormat:  org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
        outputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
        compressed:   false
    sequencefile:
        inputFormat:  org.apache.hadoop.mapred.SequenceFileInputFormat
        outputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
        compressed:   false
The above information was obtained with "describe extended table_name". So what happened?
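For what it's worth, I did not explicitly set any compression-related properties. My understanding (an assumption on my part, not something I have verified end to end) is that compression would normally have to be enabled explicitly, along these lines, with SNAPPY as just an example codec:

    -- Parquet: codec is chosen via a table property (assumed example)
    CREATE TABLE test_parquet_snappy STORED AS PARQUET
        TBLPROPERTIES ("parquet.compression"="SNAPPY")
    AS SELECT * FROM test_orc;

    -- SequenceFile: compression is controlled by job-level settings (assumed example)
    SET hive.exec.compress.output=true;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
    SET io.seqfile.compression.type=BLOCK;
    CREATE TABLE test_seq_snappy STORED AS SEQUENCEFILE AS SELECT * FROM test_orc;

Is the absence of settings like these the reason for the sizes above, or is something else going on?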