I have an i/p file which is coming in a csv format where date and timestamp are coming as String in the fromat mm/dd/yy and yyyy-mm-dd-hh.mm.ss.SSSSSS.
I am writing a parquet file and converting to date and timestamp format with the below code.
if k == 'Date':
data = data.withColumn(c, to_date(data[c],'MM/dd/yyyy'))
else:
if k == 'Timestamp':
data = data.withColumn(c, to_timestamp(data[c],'yyyy-MM-dd-HH.mm.ss.SSSSSS'))
But in the o/p date is getting written in YYYY-MM-DD and timestamp is being written in the format YYYY-MM-DDTHH:MM:SS format.
Is there a way I can write the o/p as date and timestamp datatype with the format MM/dd/yyyy and yyyy-MM-dd-HH.mm.ss.SSSSSS
Need the o/p in format 07/24/2007 and 2007-07-24-14.45.09.811306 with storing them as date and timestampo datatype. But they are coming as 2007-07-24 and 2007-07-24 14:45:09 when I cast them.
I have tried the below approach in pyspark to convert the string format to desired date format and timestamp format:
Input Data:
Results:
to_datefunction.Converting the "date2" column from string format to a TimestampType format using the
to_timestampfunction. The second argument "yyyy-MM-dd-HH.mm.ss.SSSSSS" specifies the format of the input string that represents the timestamp.Reformating the "date1" column to the output format 'MM/dd/yyyy' using the date_format function.
I have tried Casting the String datatype to date datatype