[ABTEST-109] Research to find alternatives for Apache Hive Created: 28/Nov/19  Updated: 26/Aug/22

Status: Open
Project: A/B Testing
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Story Priority: Minor
Reporter: Ilgun Ilgun Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Epic Link: ABn Later
Story Points: 8
Team: AuthorX

 Description   

Background:

We intend to use Apache Hive to run calculations on the test data. However, better-suited technologies most likely exist, so we should list the pros and cons of possible replacements for the computation layer.

 

The technologies to be researched should have the following capabilities:

  • UDF (user-defined functions)
  • Ability to be triggered at least once per day (some way to integrate with AWS Data Pipeline or similar)
    • Hourly triggering should also be possible if it becomes necessary in the long run
  • Scalability
  • Ideally low cost
  • Works with data stored in S3 or DynamoDB
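
To keep the research comparable across candidates, the capabilities above could be tracked as a simple checklist. The following is only a minimal sketch: the requirement keys, the `Candidate` class, and its method names are hypothetical labels introduced for illustration, and no findings are filled in because the research has not been done yet.

```python
from dataclasses import dataclass, field

# Hypothetical short keys for the capabilities listed in this ticket.
# "hourly_trigger" is only needed "if necessary in the long run".
REQUIREMENTS = (
    "udf",
    "daily_trigger",
    "hourly_trigger",
    "scalability",
    "low_cost",
    "s3_or_dynamodb",
)


@dataclass
class Candidate:
    """One technology under evaluation."""

    name: str
    # Per requirement: True = meets it, False = does not,
    # missing or None = not yet researched.
    findings: dict = field(default_factory=dict)

    def open_questions(self):
        """Requirements that still have to be researched."""
        return [r for r in REQUIREMENTS if self.findings.get(r) is None]

    def failed(self):
        """Requirements the candidate is known not to meet."""
        return [r for r in REQUIREMENTS if self.findings.get(r) is False]


# Candidates named in this ticket; findings are intentionally left empty.
candidates = [
    Candidate(name)
    for name in ("Apache Spark", "Amazon Athena", "Presto", "Apache Flink")
]

for candidate in candidates:
    print(candidate.name, "still to research:", candidate.open_questions())
```

A candidate with an empty `open_questions()` list and an empty `failed()` list would meet every listed capability. Price and performance still need to be compared separately in the document.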

 

AC

  • Research alternatives to Apache Hive
    • Apache Spark
    • Amazon Athena
    • Presto
    • Apache Flink
    • Others?
  • Evaluate Cassandra as an alternative to the whole system
  • Prepare a document which includes at least the following:
    • Pros/Cons
    • Price
    • Performance comparison

Generated at Sun Feb 11 22:52:57 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.