What would happen if I deploy PostgreSQL with HDFS as the filesystem in a high-load scenario?

Asked by Gill Bates on 28 December 2020 at 19:36

It's a deliberately stupid question. But I'm just curious: what would happen if I mounted HDFS as a volume through its FUSE binding, launched PostgreSQL with its cluster stored on that HDFS volume, and then started writing massive amounts of data and/or doing high-intensity reads?

r4cc00n On 02 January 2021 at 01:04

First, I don't think it's a stupid question. With that said, let's establish some definitions and continue from there:

FUSE:

FUSE is a userspace filesystem framework. It consists of a kernel module, a userspace library, and a mount utility (fusermount).

HDFS (Hadoop Distributed File System):

A file system that is distributed amongst many networked computers or nodes. HDFS is fault-tolerant because it stores multiple replicas of files across those nodes; the default replication factor is 3.

So I think a short version of your question, @Gill Bates, is: does HDFS affect the performance of a Postgres DB (assuming, of course, that the Postgres cluster is stored in HDFS)?

The short answer is: it depends on your configuration, but likely yes. As mentioned above, you can think of HDFS as a file system, and Postgres stores its data in the file system, so it will be affected by whichever file system you use. Suppose you perform many read/write operations: one of the great advantages of a distributed file system such as HDFS is that it supports multiple replicas of files, which considerably reduces the common bottleneck of many clients accessing a single file, so that may help it scale better.
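One way to see how much the file system matters is to time small synchronous writes, since a database has to flush data to stable storage before it can treat it as durable. The sketch below is only an illustration under my own assumptions, not a real Postgres benchmark: the function name `fsync_write_latency` and its parameters are made up for this example, and a FUSE-mounted HDFS directory may reject or behave differently on some of these calls.

```python
import os
import tempfile
import time


def fsync_write_latency(directory, writes=50, block_size=8192):
    """Return the mean seconds per write+fsync of one block in `directory`."""
    # 8192 bytes matches PostgreSQL's default page size.
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the block to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed / writes


if __name__ == "__main__":
    # Assumption: run once against a local directory and once against the
    # mount point you want to compare (the path here is just the temp dir).
    latency = fsync_write_latency(tempfile.gettempdir())
    print(f"mean write+fsync latency: {latency * 1e6:.0f} us")
```

Running it against a local disk and then against the mounted volume gives a rough feel for the per-write cost the database would pay on each one.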

So, answering your question directly: what happens if you start writing massive amounts of data and/or do high-intensity reading?

Regardless of whether your file system is HDFS (which may help you scale better while also adding fault tolerance to your file system) or not, the parameters that most directly determine how well your DB responds under stress tests are:

Indexing
Partitioning
Checkpoints
VACUUM, ANALYZE (with FILLFACTOR)
Query definitions
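To make the list above a bit more concrete, here is a small sketch that just builds the kind of SQL each item refers to. Everything specific in it is an assumption for illustration: the table `events`, the column `created_at`, the partitioned table `events_by_range`, the helper name `tuning_statements`, and the values chosen for `fillfactor` and `checkpoint_timeout`. It only returns strings; it does not connect to a database.

```python
def tuning_statements(table="events", column="created_at", fillfactor=80):
    """Return example PostgreSQL statements for the tuning areas listed above."""
    if not 10 <= fillfactor <= 100:
        raise ValueError("table fillfactor must be between 10 and 100")
    return [
        # Indexing: a B-tree index on the column used in filters.
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_idx ON {table} ({column});",
        # Partitioning: a range-partitioned table (hypothetical columns).
        f"CREATE TABLE {table}_by_range (id bigint, {column} timestamptz NOT NULL) "
        f"PARTITION BY RANGE ({column});",
        # Checkpoints: spacing them out trades recovery time for less I/O.
        "ALTER SYSTEM SET checkpoint_timeout = '15min';",
        # FILLFACTOR: leave free space in each page for updated rows.
        f"ALTER TABLE {table} SET (fillfactor = {fillfactor});",
        # VACUUM and ANALYZE: reclaim dead rows and refresh planner statistics.
        f"VACUUM (ANALYZE) {table};",
    ]


if __name__ == "__main__":
    for statement in tuning_statements():
        print(statement)
```

The right values depend entirely on the workload, so treat these statements as starting points to experiment with rather than recommendations.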

And of course, it also depends on your stack (how well provisioned your server/host is). In my experience, these are the factors that most affect a Postgres DB (I attached some links below that may help clarify further).

Hope the above helps to clarify!
