Hive vs Spark

Two of the most widely used tools for processing big data today are Hive and Spark. Spark has recently been gaining popularity and outperforms Hive in many cases (e.g. speed).

Hive	Spark
Data warehousing system for querying and analyzing large datasets stored in the Hadoop Distributed File System (HDFS)	Fast and flexible big data processing engine
Designed for batch processing and optimized for long-running queries over large datasets	Supports both batch and real-time processing workloads
Very stable	Steeper learning curve, especially when coming from SQL
Easier to learn, since Hive Query Language (HQL) is very similar to SQL	Multi-language support, including Python, Scala, and Java
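
To make the learning-curve and language-support rows more concrete, here is a minimal sketch of one aggregation written two ways: as SQL text of the kind Hive accepts, and with the PySpark DataFrame API. The table name `orders`, the columns `customer_id` and `amount`, and the function names are hypothetical and chosen only for illustration. The sketch assumes PySpark is installed and that the caller supplies a `SparkSession` and a DataFrame.

```python
# Hypothetical table and column names (orders, customer_id, amount), for illustration only.
HQL_QUERY = """
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
ORDER BY total_amount DESC
"""


def total_by_customer_sql(spark):
    """Run the SQL text unchanged through Spark SQL.

    Assumes `spark` is a SparkSession and that a table or temporary view
    named `orders` has already been registered.
    """
    return spark.sql(HQL_QUERY)


def total_by_customer_df(orders_df):
    """Express the same aggregation with the PySpark DataFrame API."""
    # Imported inside the function so this module loads even where PySpark is absent.
    from pyspark.sql import functions as F

    return (
        orders_df.groupBy("customer_id")
        .agg(F.sum("amount").alias("total_amount"))
        .orderBy(F.col("total_amount").desc())
    )
```

To try it, one option is to create a session with `SparkSession.builder.getOrCreate()`, build a small DataFrame with `spark.createDataFrame(...)`, register it with `createOrReplaceTempView("orders")`, and call both functions. The SQL form will look familiar to anyone who knows SQL, while the DataFrame form requires learning Spark's API but composes with ordinary Python code.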