How do I run crawlers for AWS Glue Job that read an excel file?


I am trying to import an Excel file with multiple sheets. Based on what I have read, Glue 2.0 can read Excel files. I tried the code below and the job succeeded, but I am lost as to how I am supposed to run crawlers for the Data Catalog: I cannot seem to find a destination to point them at.

Am I missing anything from this code?
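One thing worth noting about the applymap(str) call in the code below: Excel columns often mix types (numbers, dates, blanks), which can make spark.createDataFrame fail to infer a consistent schema, so casting every cell to a string first sidesteps that. A small pandas-only sketch of the effect (the sample data here is made up, not from the original workbook):

```python
import pandas as pd

# Made-up sample standing in for a sheet whose "note" column mixes types.
df = pd.DataFrame({"id": [1, 2], "note": ["ok", 3.5]})

# Cast every cell to str, as the Glue job does before createDataFrame;
# afterwards every column has a uniform (object/string) dtype.
df_str = df.applymap(str)
print(df_str.dtypes.tolist())
```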

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import pandas as pd

# Standard Glue job boilerplate: resolve the job name and initialise the job.
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read one sheet of the workbook with pandas (pandas can read s3:// paths
# when s3fs is available on the cluster).
excel_path = "s3://input/employee.xlsx"
df_xl_op = pd.read_excel(excel_path, sheet_name="Sheet1")

# Cast every cell to str so Spark can infer a uniform schema, then
# convert the pandas DataFrame to a Spark DataFrame.
df = df_xl_op.applymap(str)
input_df = spark.createDataFrame(df)
input_df.printSchema()

job.commit()
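For what it's worth, nothing in the job above actually writes the DataFrame anywhere; it only prints the schema, so there is no output location for a crawler to scan. A minimal sketch of persisting the result to S3 as Parquet before job.commit() (the output path below is a placeholder, not from the original post), which would give a crawler an S3 data store to catalog:

```python
# Placeholder output location: substitute your own bucket/prefix.
output_path = "s3://output/employee/"

# Write the Spark DataFrame as Parquet. A Glue crawler configured with this
# S3 path as its data store can then populate a Data Catalog table.
input_df.write.mode("overwrite").parquet(output_path)
```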