AWS Elastic MapReduce (EMR)
AWS EMR is a managed platform for processing big data. It runs distributed frameworks such as Apache Hadoop, Hive, and Spark on clusters that AWS provisions for you.
In short, it is a tool that lets you run one or more clusters. Each cluster contains multiple nodes (compute units), each node crunches a portion of the data, and the partial results are then brought together to produce the final output. With this kind of framework we can handle BIG DATA (on the order of hundreds of terabytes).
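To make the "split, crunch, combine" idea concrete, here is a minimal sketch in plain Python. It is not EMR code: threads stand in for nodes, a word count stands in for the real job, and all function names (`split_into_chunks`, `count_words`, `merge_counts`, `distributed_word_count`) are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_nodes):
    """Deal the records round-robin into one chunk per (hypothetical) node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_words(chunk):
    """'Map' step: a node counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """'Reduce' step: combine the per-node partial counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_word_count(records, num_nodes=3):
    """Split the data, process each chunk in parallel, then merge."""
    chunks = split_into_chunks(records, num_nodes)
    # Threads play the role of cluster nodes in this toy example.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)


if __name__ == "__main__":
    lines = ["big data on EMR", "EMR runs Spark", "Spark handles big data"]
    print(distributed_word_count(lines).most_common(3))
```

The result does not depend on how many "nodes" you use, only the amount of work each one does. That is the property that lets a real cluster grow to handle larger datasets.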
AWS EMR is used in a wide range of big data tasks:
- data ingestion
- data processing
- data transformation
- data analysis
- machine learning
- Spark streaming (Real-time data)

Beyond these use cases, EMR has several platform-level features:
- Highly scalable: it is easy to add nodes to a cluster or remove them.
- Various security features, such as encryption for data in transit and at rest, and fine-grained access control.
- Integration with other AWS services: EMR can easily be used in conjunction with other AWS big data services, such as Amazon Redshift and Amazon Athena.
- Management and monitoring through tools such as Amazon CloudWatch, AWS CloudTrail, and the AWS Management Console, which help users manage and monitor their big data workflows.
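As a rough illustration of what "adding or removing nodes" looks like in practice, the sketch below builds the request payloads for two calls on the boto3 EMR client, `run_job_flow` and `modify_instance_groups`. Everything specific in it is an assumption for the example: the helper names, the cluster name, the `m5.xlarge` instance type, the `emr-6.15.0` release label, and the default role names. Check them against your own account before using anything like this.

```python
def build_cluster_request(name, core_nodes):
    """Build keyword arguments for boto3's emr.run_job_flow (values are assumptions)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # assumed EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # Keep the cluster running after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def build_resize_request(cluster_id, instance_group_id, new_count):
    """Build keyword arguments for emr.modify_instance_groups to resize one group."""
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ],
    }


def launch_cluster(request):
    """Send the request to AWS. Needs boto3, credentials, and a default region."""
    import boto3  # imported lazily so the builders above work without AWS access

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Only builds and prints the payloads; nothing is sent to AWS here.
    print(build_cluster_request("demo-cluster", core_nodes=2))
    print(build_resize_request("j-EXAMPLE", "ig-EXAMPLE", new_count=4))
```

The cluster and instance group IDs shown (`j-EXAMPLE`, `ig-EXAMPLE`) are placeholders. Real ones come back from `run_job_flow` and `list_instance_groups`.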