[TXTREC-30] AWS is not able to analyze more than 25 documents at a time Created: 18/Jul/19  Updated: 09/Aug/19  Resolved: 08/Aug/19

Status: Closed
Project: Text Classification
Component/s: None
Affects Version/s: None
Fix Version/s: 1.0

Type: Story Priority: Neutral
Reporter: Trung Luu Assignee: Le Hai Thanh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 0d
Time Spent: 2d 3h
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2019-07-18 at 3.32.28 PM.png    
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Epic Link: Txt Classification integration
Sprint: Add-Ons 17
Story Points: 5

 Description   

documents = requests

Document size limit: 5,000 bytes (see https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html)

  • I found this error in the logs: 2019-07-18 15:18:07,129 ERROR info.magnolia.ai.text.amazon.AmazonTextClassifier : 'texts' can't contain more than 25 documents.
  • We should handle this case, or display a message such as 'reach the limitation...' in the tag column instead of leaving it empty.

 

Potential solution:

  • If the text collection has more than 25 items, split it into sub-collections of at most 25 items each.
  • Send one request per sub-collection.
  • Merge the results into a single Map and return it.
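
The steps above could be sketched roughly as follows. This is only an illustration, not the actual AmazonTextClassifier code: the class name BatchedClassifier, the helper partition, and the requestBatch function (a stand-in for the real AWS call) are all hypothetical. It also assumes the results are keyed by the input text, so duplicate texts would collapse into a single Map entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class BatchedClassifier {

    // Limit quoted in this ticket for the number of documents per request.
    static final int MAX_DOCUMENTS_PER_REQUEST = 25;

    /** Splits items into consecutive sub-lists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Calls requestBatch once per sub-list and merges all partial results.
     * requestBatch is a hypothetical stand-in for the real service call.
     */
    static <R> Map<String, R> classifyAll(
            List<String> texts,
            Function<List<String>, Map<String, R>> requestBatch) {
        Map<String, R> merged = new LinkedHashMap<>();
        for (List<String> batch : partition(texts, MAX_DOCUMENTS_PER_REQUEST)) {
            merged.putAll(requestBatch.apply(batch));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            texts.add("document " + i);
        }
        // Fake "service" that maps each text to its length.
        Map<String, Integer> result = classifyAll(texts, batch -> {
            Map<String, Integer> partial = new LinkedHashMap<>();
            for (String text : batch) {
                partial.put(text, text.length());
            }
            return partial;
        });
        int requests = partition(texts, MAX_DOCUMENTS_PER_REQUEST).size();
        // prints "3 requests, 60 results"
        System.out.println(requests + " requests, " + result.size() + " results");
    }
}
```

With 60 input texts this yields batches of 25, 25 and 10, so no single request exceeds the limit. A LinkedHashMap is used here only to keep the merged results in input order.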

 

FYI, https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectKeyPhrases.html#API_BatchDetectKeyPhrases_RequestSyntax

TextList

A list containing the text of the input documents. The list can contain a maximum of 25 documents. Each document must contain fewer than 5,000 bytes of UTF-8 encoded characters.

Type: Array of strings

Length Constraints: Minimum length of 1.

Required: Yes
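
Since the per-document limit is stated in UTF-8 bytes rather than characters, a pre-check before sending could look like the sketch below. The class DocumentSizeCheck and its methods are hypothetical names used for illustration only, and the strict "fewer than 5,000" comparison follows the sentence quoted above; treating an empty string as invalid is an additional assumption.

```java
import java.nio.charset.StandardCharsets;

public class DocumentSizeCheck {

    // Per-document limit quoted above: fewer than 5,000 bytes of UTF-8.
    static final int MAX_DOCUMENT_BYTES = 5000;

    /** Returns the size of the text once encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the text is non-empty (assumption) and below the byte limit. */
    static boolean fitsLimit(String text) {
        int bytes = utf8Length(text);
        return bytes >= 1 && bytes < MAX_DOCUMENT_BYTES;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2500; i++) {
            sb.append('\u00e9'); // 'é' takes 2 bytes in UTF-8
        }
        String text = sb.toString();
        // 2,500 characters, but 5,000 bytes: over the limit.
        System.out.println(text.length() + " chars, " + utf8Length(text)
                + " bytes, fits: " + fitsLimit(text));
    }
}
```

The example shows why counting characters is not enough: a 2,500-character text made of two-byte characters already reaches 5,000 bytes.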

 


Generated at Mon Feb 12 11:04:46 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.