How do I run crawlers for AWS Glue Job that read an excel file?


I am trying to import an Excel file with multiple sheets. Based on what I have read, Glue 2.0 can read Excel files. I tried the code below and the job succeeded, but I am lost as to how I am supposed to run crawlers for the Data Catalog: I cannot seem to find a destination to point them at.

Am I missing anything from this code?
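One thing worth noting about the applymap(str) call in the code below: Excel columns often mix types (numbers, dates, blanks), which can make spark.createDataFrame fail to infer a consistent schema, so casting every cell to a string first sidesteps that. A small pandas-only sketch of the effect (the sample data here is made up, not from the original workbook):

```python
import pandas as pd

# Made-up sample standing in for a sheet whose "note" column mixes types.
df = pd.DataFrame({"id": [1, 2], "note": ["ok", 3.5]})

# Cast every cell to str, as the Glue job does before createDataFrame;
# afterwards every column has a uniform (object/string) dtype.
df_str = df.applymap(str)
print(df_str.dtypes.tolist())
```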

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import pandas as pd

# Standard Glue job boilerplate: resolve the job name and initialise the job.
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read one sheet of the workbook with pandas (pandas can read s3:// paths
# when s3fs is available on the cluster).
excel_path = "s3://input/employee.xlsx"
df_xl_op = pd.read_excel(excel_path, sheet_name="Sheet1")

# Cast every cell to str so Spark can infer a uniform schema, then
# convert the pandas DataFrame to a Spark DataFrame.
df = df_xl_op.applymap(str)
input_df = spark.createDataFrame(df)
input_df.printSchema()

job.commit()
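For what it's worth, nothing in the job above actually writes the DataFrame anywhere; it only prints the schema, so there is no output location for a crawler to scan. A minimal sketch of persisting the result to S3 as Parquet before job.commit() (the output path below is a placeholder, not from the original post), which would give a crawler an S3 data store to catalog:

```python
# Placeholder output location: substitute your own bucket/prefix.
output_path = "s3://output/employee/"

# Write the Spark DataFrame as Parquet. A Glue crawler configured with this
# S3 path as its data store can then populate a Data Catalog table.
input_df.write.mode("overwrite").parquet(output_path)
```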