I'm working with a ClickHouse database and facing a challenge related to table structure modification and data backfilling. We use a materialized view to populate a table, and we need to add a new column to it. The data is inserted by an external component. Here's a simplified scenario:
Current Table "some_table" Structure:
Columns: A, B
Existing Data:
Column A Column B a1 b1 a2 b2
Materialized View:
CREATE MATERIALIZED VIEW IF NOT EXISTS some_mv ON CLUSTER default TO some_table
AS
SELECT A, B
FROM some_source_table
GROUP BY A, B
- Desired Outcome:
Add a new column C.
Alter the materialized view to also update column C
Backfill existing rows and ensure new rows auto-populate this column.
Resulting Table:
Column A Column B Column C a1 b1 c1 a2 b2 c2 ... ... ... aN bN cN
- What is the recommended procedure to modify a materialized view in scenarios like this?
- What are some effective strategies to evaluate and mitigate performance concerns during such migrations, particularly in cases of large table sizes? And what would be considered "large" in this context, assuming 10 to 15 columns? (es. millions of rows? 100s of millions? billions?)
- If I need to temporarily drop the materialized view, are there strategies to ensure that I can still have the data ingested during that time once the materialized view is recreated?
- What are some rollback or mitigation strategies to consider in case of unexpected issues?
Any guidance or best practices on handling this type of task would be greatly appreciated.
Thank you!