I am using Spark SQL Java (Not scala or python). Here is a simplified example of my dataset (it has many more columns than this):
| ID | References | E_id | F_id | G_id |
|---|---|---|---|---|
| 0 | [E_id] | 0000 | | |
| 1 | [E_id, F_id, G_id] | 0010 | 1000 | 1111 |
There is a column called References which contains an array of the column names that this row of data has values for. So the first row only has an E_id value and no F_id or G_id values, while the second row has all three. For each row, I need to select the columns that every row has (like ID and References) and then also select the respective columns that are listed in that row's References.
At the moment, this is my code:
final Dataset<Row> partitionedDf = df.select(col("ID"), col("References"));
I can't hardcode any of the E_id, F_id, G_id, etc. columns into my select because the possible columns inside References are dynamic. I need to figure out a way to read References for each row and then select the columns it names as well, so they're all included in my Dataset.
One thing to note is that the columns inside References all follow a pattern (they all end with "_id"), so I'm not sure if regex or suffix matching could be used here instead of relying on what is inside the References array.
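For what it's worth, the suffix-matching idea could be sketched roughly like this: take the schema's column names from `df.columns()`, keep the fixed columns plus every name ending in `"_id"`, and pass the result to `select`. The name-filtering part is plain Java (the `SelectIdColumns` class and `columnsToSelect` helper below are just names I made up for the sketch; only the commented-out last line is Spark API):

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class SelectIdColumns {
    // Keep the columns every row has, plus every column whose name ends with "_id".
    static String[] columnsToSelect(String[] allColumns) {
        return Stream.concat(
                Stream.of("ID", "References"),
                Arrays.stream(allColumns).filter(c -> c.endsWith("_id")))
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Stand-in for df.columns() from the example dataset.
        String[] schema = {"ID", "References", "E_id", "F_id", "G_id"};
        System.out.println(Arrays.toString(columnsToSelect(schema)));
        // prints: [ID, References, E_id, F_id, G_id]
        // With Spark this would feed something like:
        // df.selectExpr(columnsToSelect(df.columns()));
    }
}
```

This selects every `_id` column that exists in the schema, so rows that don't list a given column in References would just carry nulls there, which may or may not be acceptable for my use case.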