  1. Text Classification
  2. TXTREC-30

AWS is not able to analyze more than 25 documents at a time

Details

    • Story
    • Resolution: Fixed
    • Neutral
    • 1.0
    • None
    • None

    Description

      documents = requests

      Document size limit: 5,000 bytes (see https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html)

      • I found this issue in the logs: 2019-07-18 15:18:07,129 ERROR info.magnolia.ai.text.amazon.AmazonTextClassifier : 'texts' can't contain more than 25 documents.
      • We should handle this case, or at least display a message such as 'reach the limitation...' in the tag column instead of leaving it empty.

       

      Potential solution:

      • If the text collection has more than 25 items, split it into sub-collections of at most 25 items each
      • Send one request per sub-collection
      • Merge the results into a single Map and return it
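
A minimal sketch of the split, request and merge steps above. It is not the actual AmazonTextClassifier code: the class name `BatchingSketch`, the `requestFn` parameter standing in for the real service call, and the choice to key the merged Map by the input text are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper for illustration; not the actual classifier code. */
final class BatchingSketch {

    /** Per-request document limit quoted in this ticket. */
    static final int MAX_DOCUMENTS_PER_REQUEST = 25;

    /** Splits items into consecutive sub-lists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Calls requestFn once per batch and merges the per-batch results.
     * requestFn is a stand-in (assumption) for the real service request.
     */
    static <R> Map<String, R> classifyAll(
            List<String> texts,
            Function<List<String>, Map<String, R>> requestFn) {
        Map<String, R> merged = new LinkedHashMap<>();
        for (List<String> batch : partition(texts, MAX_DOCUMENTS_PER_REQUEST)) {
            merged.putAll(requestFn.apply(batch));
        }
        return merged;
    }
}
```

Because the merged Map is keyed by text in this sketch, identical texts collapse into one entry. If duplicates need separate results, keying by position in the original collection would be one alternative.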

       

      FYI, https://docs.aws.amazon.com/comprehend/latest/dg/API_BatchDetectKeyPhrases.html#API_BatchDetectKeyPhrases_RequestSyntax

      TextList

      A list containing the text of the input documents. The list can contain a maximum of 25 documents. Each document must contain fewer than 5,000 bytes of UTF-8 encoded characters.

      Type: Array of strings

      Length Constraints: Minimum length of 1.

      Required: Yes
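
The constraints quoted above could be checked before a request is sent, so an oversized list or document is caught on our side rather than by the service. A rough sketch follows. The class and method names are hypothetical, and it reads "fewer than 5,000 bytes" literally, so a document of exactly 5,000 bytes is rejected; that reading is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical pre-flight check for the TextList constraints quoted above. */
final class TextListCheck {

    static final int MAX_DOCUMENTS = 25;
    // Assumption: the limit is exclusive, following the quoted wording.
    static final int MAX_DOCUMENT_BYTES = 5000;

    /** Size of the text in bytes when encoded as UTF-8 (not its char count). */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the list holds 1 to 25 documents, each under the byte limit. */
    static boolean fitsSingleRequest(List<String> texts) {
        if (texts.isEmpty() || texts.size() > MAX_DOCUMENTS) {
            return false;
        }
        for (String text : texts) {
            if (utf8Length(text) >= MAX_DOCUMENT_BYTES) {
                return false;
            }
        }
        return true;
    }
}
```

The byte length is measured on the UTF-8 encoding because non-ASCII characters take more than one byte, so `String.length()` alone would undercount.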

       

            People

              thanh.lehai Le Hai Thanh
              trung.luu Trung Luu
              Votes: 0
              Watchers: 2

                  Time Tracking

                    Estimated: Not Specified
                    Remaining: 0d
                    Logged: 2d 3h