Hive and Spark are two of the most widely used tools for processing big data today. Spark has recently been gaining popularity and outperforms Hive in many cases (e.g., speed).
| Hive | Spark |
|---|---|
| Data warehousing system for querying and analyzing large datasets stored in the Hadoop Distributed File System (HDFS) | Fast and flexible big data processing engine |
| Designed for batch processing and optimized for long-running queries over large datasets | Supports both batch and real-time processing workloads |
| Very stable | Steeper learning curve, especially coming from SQL |
| Easier to learn, since Hive Query Language (HQL) is very similar to SQL | Multi-language support, including Python, Scala, and Java |