PySpark Structured Streaming: read two streams sequentially. After getting data from the first stream, run the second


I have a use case with a Stream_1 and a Stream_2. Stream_1 only runs for about 5 seconds; I want to get some data from it and use it to filter Stream_2. Stream_2 keeps running for the whole process. How can I do that using PySpark Structured Streaming?

# The first Stream
stream_1 = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "broker01:29092") \
  .option("subscribe", "topic_1") \
  .load()

# Filter this stream and take two values: X and Y

query1 = stream_1 \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .start()
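
A streaming DataFrame cannot be collect()'ed directly, so one way to capture X and Y before the second query starts is to read topic_1 as a bounded batch query instead of (or alongside) the console sink above. This is only a sketch under assumptions: the payload is assumed to be JSON, and the field names x and y are made up for illustration.

from pyspark.sql import functions as F

# Batch read of topic_1, bounded by offsets, so the result can be collected
batch_1 = spark.read \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "broker01:29092") \
  .option("subscribe", "topic_1") \
  .option("startingOffsets", "earliest") \
  .option("endingOffsets", "latest") \
  .load()

# Parse the Kafka value payload and pull out the two values
# (assumed JSON fields "x" and "y"; adjust to the real schema)
row = batch_1 \
  .selectExpr("CAST(value AS STRING) AS json") \
  .select(F.get_json_object("json", "$.x").alias("x"),
          F.get_json_object("json", "$.y").alias("y")) \
  .limit(1) \
  .collect()[0]

x_value, y_value = row["x"], row["y"]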


# The second Stream
stream_2 = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "broker01:29092") \
  .option("subscribe", "topic_2") \
  .load()

# Filter this stream using the values X and Y from stream_1

query2 = stream_2 \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .start()
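
Once x_value and y_value exist as plain Python values (as in the batch sketch above), a filtered view of stream_2 could be written out instead of the raw stream; again just a sketch, where the "id" field name in the topic_2 payload is hypothetical.

# Apply the captured values as a filter before writing the second stream
filtered_2 = stream_2 \
  .selectExpr("CAST(value AS STRING) AS json") \
  .where(F.get_json_object("json", "$.id").isin(x_value, y_value))

# The writeStream above would then be started on filtered_2 instead of stream_2
query2_filtered = filtered_2 \
  .writeStream \
  .outputMode("append") \
  .format("console") \
  .start()

query2_filtered.awaitTermination()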