Details
Story
Resolution: Unresolved
Minor
Description
Background:
We intend to use Apache Hive to run calculations on the test data; however, there are most likely better-suited technologies. We should list the pros and cons of possible replacements for the computation layer.
The technologies to be researched should have the following capabilities:
- UDF (user-defined functions)
- Ability to trigger runs at least once per day (some way to integrate with AWS Data Pipeline or similar)
- Should be possible to trigger once per hour as well, if necessary in the long run
- Scalability
- Ideally low cost
- Data stored in S3 or DynamoDB
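One way to keep the research consistent across candidates is to record each finding against the same capability list. The following is a minimal sketch, not a prescribed tool: the capability keys, the `Candidate` class, and its method names are all hypothetical, and treating the soft "ideally low cost" item as a pass/fail flag is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Keys mirror the capability list above; the identifiers are hypothetical.
CAPABILITIES = [
    "udf",
    "daily_trigger",
    "hourly_trigger",
    "scalability",
    "low_cost",
    "s3_or_dynamodb",
]


@dataclass
class Candidate:
    """Research status of one candidate technology."""

    name: str
    # None means "not researched yet"; True/False are filled in during research.
    findings: Dict[str, Optional[bool]] = field(
        default_factory=lambda: {c: None for c in CAPABILITIES}
    )

    def record(self, capability: str, supported: bool) -> None:
        """Store a finding; reject capabilities that are not on the list."""
        if capability not in self.findings:
            raise KeyError(f"unknown capability: {capability}")
        self.findings[capability] = supported

    def open_questions(self) -> List[str]:
        """Capabilities that still need to be researched."""
        return [c for c, v in self.findings.items() if v is None]

    def meets_all(self) -> Optional[bool]:
        """True/False once every capability is researched, otherwise None."""
        if self.open_questions():
            return None
        return all(self.findings.values())
```

A candidate starts with every capability unresearched, so `Candidate("example-engine").meets_all()` returns `None` until each item has been recorded with `record(...)`. Any values entered this way are research inputs, not results implied by this ticket.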
AC
- Research alternatives to Apache Hive
  - Apache Spark
  - Amazon Athena
  - Presto
  - Apache Flink
  - Others?
  - Cassandra as an alternative to the whole system
- Prepare a document which includes at least the following:
  - Pros/Cons
  - Price
  - Performance comparison