I have a Cassandra table that contains a UUID field.
Creating a Spark DataFrame from it (using PySpark 3.4.0) shows the field as {__class__=uuid.UUID, int=809582560205543685759249226656473694}, or something like that. Any idea how to get the string representation of that value?
If a UDF is necessary, the syntax for a working one would be appreciated. This is what I tried:
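    import uuid
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def udf_for_uuid(input_val):
        # Try to build a UUID from the column value; give up on bad input
        try:
            return uuid.UUID(input_val)
        except ValueError:
            return None

    uid_udf = udf(lambda z: udf_for_uuid(z), StringType())
    df = spark.createDataFrame(data, schema)
    df.withColumn('msg_id_string', uid_udf(df['msg_id']))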
Result:
msg_id (the original column) shows {__class__=uuid.UUID, int=809582560205543685759249226656473694}
msg_id_string (the added column) shows null
The msg_id value in Cassandra is 007550ad-802f-11ed-a92a-0f3d2bcd625e
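
For reference, here is a minimal sketch of the kind of conversion I am after, written as plain Python so it can be checked without a cluster. It rests on an assumption I have not confirmed: that the value reaching the UDF is either a real uuid.UUID, a string, or a dict/Row-like object carrying the 128-bit integer under a field named int, as the displayed {__class__=uuid.UUID, int=...} suggests. The name uuid_to_str is made up for illustration. Note that it returns str(...) rather than the UUID object itself, since the UDF is declared with StringType().

```python
import uuid


def uuid_to_str(value):
    """Return the canonical hyphenated UUID string for value, or None."""
    if value is None:
        return None
    if isinstance(value, uuid.UUID):
        return str(value)
    # Assumption: a map/struct form exposes the integer under the key "int".
    if isinstance(value, dict):
        int_val = value.get("int")
    else:
        int_val = getattr(value, "int", None)
    try:
        if int_val is not None:
            # int() also accepts the integer when it arrives as a string
            return str(uuid.UUID(int=int(int_val)))
        # Otherwise treat the value as a textual UUID
        return str(uuid.UUID(str(value)))
    except (ValueError, TypeError, AttributeError):
        return None
```

If the assumption holds, it could presumably be wrapped as udf(uuid_to_str, StringType()) and passed to df.withColumn('msg_id_string', ...) in the same way as the attempt above.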