How would you use AWS Data Pipeline, Elastic MapReduce, and Redshift to perform ETL and data warehousing?


I'm very new to data warehousing and AWS.

For school, we have to make a presentation on how data warehousing can be performed using the following three technologies:

  • Redshift
  • AWS Data Pipeline
  • Elastic MapReduce

This is my understanding thus far:

  • Redshift is the data warehouse platform where you would store your data to perform analysis and business intelligence activities.
  • AWS Data Pipeline can be used to schedule tasks and operations. Somehow it can also be used for data transformation.
  • Elastic MapReduce can also be used for data transformation.

I just don't understand how you would use these technologies together to perform data warehousing activities. Would you use Data Pipeline to schedule ETL processes on Elastic MapReduce and then transfer the data to Redshift? If so, how would you do that?

Answer by pratik

[Diagram: Data Pipeline <> Redshift <> EMR job example]

I have explained the data flow in the diagram. EMR jobs are the right tool when you need to extract insights from a large volume of data.
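A minimal sketch, assuming the raw data lands in S3: the flow could be written as a Data Pipeline definition in which an `EmrActivity` transforms the data on a transient `EmrCluster`, and a `RedshiftCopyActivity` then loads the result into a Redshift table on a daily `Schedule`. The object types and field names are Data Pipeline's own, but every id, S3 path, table name, cluster id, instance type, release label, the `etl.py` script and the streaming step string are hypothetical placeholders, and the definition has not been validated against a live account. The helper only converts the definition into the `pipelineObjects` shape that boto3's `put_pipeline_definition` accepts; the AWS calls are left commented out.

```python
"""Sketch: a Data Pipeline definition that runs an EMR transform, then loads Redshift.

All ids, S3 paths, table names, cluster ids and instance types are hypothetical.
"""
import json

# JSON pipeline-definition style: each object has "id", "name" and plain fields.
# A value of the form {"ref": "<id>"} points at another object in the pipeline.
PIPELINE = [
    {"id": "Default", "name": "Default",
     "scheduleType": "cron", "schedule": {"ref": "DailySchedule"},
     "failureAndRerunMode": "CASCADE",
     "role": "DataPipelineDefaultRole",
     "resourceRole": "DataPipelineDefaultResourceRole"},
    {"id": "DailySchedule", "name": "DailySchedule", "type": "Schedule",
     "period": "1 day", "startAt": "FIRST_ACTIVATION_DATE_TIME"},
    # Raw input and transformed output both live in S3 (hypothetical bucket).
    {"id": "RawLogs", "name": "RawLogs", "type": "S3DataNode",
     "directoryPath": "s3://example-bucket/raw/"},
    {"id": "CleanLogs", "name": "CleanLogs", "type": "S3DataNode",
     "directoryPath": "s3://example-bucket/clean/"},
    # Transient EMR cluster that Data Pipeline launches and shuts down.
    {"id": "EtlCluster", "name": "EtlCluster", "type": "EmrCluster",
     "releaseLabel": "emr-5.36.0", "masterInstanceType": "m5.xlarge",
     "coreInstanceType": "m5.xlarge", "coreInstanceCount": "2",
     "terminateAfter": "2 Hours"},
    # EMR step: hypothetical Hadoop Streaming job; arguments are comma-separated.
    {"id": "TransformStep", "name": "TransformStep", "type": "EmrActivity",
     "runsOn": {"ref": "EtlCluster"},
     "input": {"ref": "RawLogs"}, "output": {"ref": "CleanLogs"},
     "step": ("command-runner.jar,hadoop-streaming,"
              "-files,s3://example-bucket/code/etl.py,"
              "-mapper,etl.py map,-reducer,etl.py reduce,"
              "-input,#{input.directoryPath},-output,#{output.directoryPath}")},
    # Target warehouse and table (hypothetical names; keep real secrets elsewhere).
    {"id": "Warehouse", "name": "Warehouse", "type": "RedshiftDatabase",
     "clusterId": "example-cluster", "databaseName": "analytics",
     "username": "etl_user", "*password": "REPLACE_ME"},
    {"id": "EventsTable", "name": "EventsTable", "type": "RedshiftDataNode",
     "database": {"ref": "Warehouse"}, "tableName": "user_sessions"},
    # Small EC2 host that runs the COPY into Redshift.
    {"id": "LoadHost", "name": "LoadHost", "type": "Ec2Resource",
     "instanceType": "t2.micro", "terminateAfter": "1 Hour"},
    {"id": "LoadToRedshift", "name": "LoadToRedshift",
     "type": "RedshiftCopyActivity",
     "runsOn": {"ref": "LoadHost"}, "dependsOn": {"ref": "TransformStep"},
     "input": {"ref": "CleanLogs"}, "output": {"ref": "EventsTable"},
     "insertMode": "APPEND"},
]


def to_pipeline_objects(definition):
    """Convert JSON-style objects into the list shape put_pipeline_definition expects."""
    objects = []
    for obj in definition:
        fields = []
        for key, value in obj.items():
            if key in ("id", "name"):
                continue  # id and name are top-level keys, not fields
            if isinstance(value, dict):
                fields.append({"key": key, "refValue": value["ref"]})
            else:
                fields.append({"key": key, "stringValue": str(value)})
        objects.append({"id": obj["id"], "name": obj["name"], "fields": fields})
    return objects


if __name__ == "__main__":
    print(json.dumps(to_pipeline_objects(PIPELINE)[:2], indent=2))
    # Deploying needs AWS credentials and is not run here:
    # import boto3
    # client = boto3.client("datapipeline")
    # pid = client.create_pipeline(name="etl", uniqueId="etl-v1")["pipelineId"]
    # client.put_pipeline_definition(pipelineId=pid,
    #                                pipelineObjects=to_pipeline_objects(PIPELINE))
    # client.activate_pipeline(pipelineId=pid)
```

The `dependsOn` reference is what enforces the order "transform on EMR first, then load into Redshift". In a real setup the `Ec2Resource` would likely also need network settings (for example security groups) that let it reach the Redshift cluster.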

You could run SQL queries on Redshift in this case too, but EMR is useful when the operation is too complex to express as a SQL query.
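As an illustration of the kind of custom logic an EMR job can hold, here is a minimal Hadoop Streaming sketch in plain Python. It assumes a hypothetical log format of one event per line (`ISO-timestamp user_id url`), drops malformed lines, and counts sessions per user, where a new session starts after an assumed 30-minute gap. The script name `etl.py`, the input format and the timeout are all hypothetical; `run_locally` only simulates Hadoop's map, sort and reduce phases on one machine so the logic can be checked before it is run on a cluster.

```python
"""Sketch of a Hadoop Streaming mapper and reducer of the kind EMR can run.

Hypothetical input: "ISO-timestamp user_id url" per line.
Output: "user_id<TAB>session_count".
"""
import sys
from datetime import datetime, timedelta
from itertools import groupby

SESSION_GAP = timedelta(minutes=30)  # assumed session timeout


def mapper(lines):
    """Parse raw lines and emit 'user_id<TAB>timestamp', skipping malformed rows."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        ts, user, _url = parts
        try:
            datetime.fromisoformat(ts)
        except ValueError:
            continue
        yield f"{user}\t{ts}"


def reducer(lines):
    """Count sessions per user; input must be sorted so each user's lines are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for user, group in groupby(pairs, key=lambda kv: kv[0]):
        times = sorted(datetime.fromisoformat(ts) for _, ts in group)
        sessions = 1
        for prev, cur in zip(times, times[1:]):
            if cur - prev > SESSION_GAP:
                sessions += 1
        yield f"{user}\t{sessions}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # On a cluster, "etl.py map" is the -mapper and "etl.py reduce" the -reducer.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

For example, `run_locally(["2024-01-01T10:00:00 alice /home", "2024-01-01T11:00:00 alice /home"])` returns `["alice\t2"]`, because the 60-minute gap exceeds the assumed timeout. The reducer's output would then be written to S3 and loaded into Redshift.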