[TXTREC-37] Split documents containing more than 5000 bytes into sub-documents Created: 19/Jul/19  Updated: 06/Aug/19  Resolved: 02/Aug/19

Status: Closed
Project: Text Classification
Component/s: None
Affects Version/s: None
Fix Version/s: 1.0

Type: Story Priority: Neutral
Reporter: Ilgun Ilgun Assignee: Oanh Thai Hoang
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 0d
Time Spent: 3d 0.75h
Original Estimate: Not Specified

Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Epic Link: Txt Classification integration
Sprint: Add-Ons 17
Story Points: 5

 Description   

AC

  • If the document (text) contains more than 5000 bytes, split it into sub-documents of at most 5000 bytes each
  • Merge the results of the sub-documents and return them as a single result
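
A minimal sketch of the two criteria above, not the implemented solution. It assumes the text is measured in UTF-8 bytes, that a sub-document must never end in the middle of a multi-byte character, and that per-chunk results are lists that can simply be concatenated. The names split_utf8, analyze_long_text and the analyze callable are hypothetical. The limit is a parameter because the quoted API text says "fewer than 5,000 bytes", so a stricter value such as 4999 may be needed.

```python
from typing import Callable, List


def split_utf8(text: str, max_bytes: int = 5000) -> List[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    data = text.encode("utf-8")
    chunks: List[str] = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Continuation bytes look like 0b10xxxxxx. If the cut lands on one,
        # move it back to the first byte of that character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


def analyze_long_text(
    text: str,
    analyze: Callable[[str], list],
    max_bytes: int = 5000,
) -> list:
    """Run analyze (hypothetical) on each sub-document and merge the results."""
    merged: list = []
    for chunk in split_utf8(text, max_bytes):
        merged.extend(analyze(chunk))
    return merged
```

Cutting purely by byte count can split a word or phrase across two sub-documents, so results near a boundary may differ from those of the unsplit text. Splitting at the last whitespace before the limit would be one way to reduce that, at the cost of slightly smaller chunks.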

FYI, https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectKeyPhrases.html#API_BatchDetectKeyPhrases_RequestSyntax

TextList

A list containing the text of the input documents. The list can contain a maximum of 25 documents. Each document must contain fewer than 5,000 bytes of UTF-8 encoded characters.

Type: Array of strings

Length Constraints: Minimum length of 1.

Required: Yes
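
Because TextList accepts at most 25 documents, a long text that yields more than 25 sub-documents would need several requests. A small sketch of grouping sub-documents into request-sized batches follows; batch_documents is a hypothetical helper name, and sending each batch to the API is left out.

```python
from typing import List


def batch_documents(docs: List[str], batch_size: int = 25) -> List[List[str]]:
    """Group documents into lists of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
```

Each returned list could then be passed as the TextList of one request, and the per-request results merged in the same order.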


Generated at Mon Feb 12 11:04:50 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.