[TXTREC-9] Aggregate all the info of a page Created: 26/Mar/19  Updated: 19/Aug/19  Resolved: 23/Jul/19

Status: Closed
Project: Text Classification
Component/s: None
Affects Version/s: None
Fix Version/s: 1.0

Type: Story Priority: Neutral
Reporter: Laura Delnevo Assignee: Le Hai Thanh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 7.5h
Time Spent: 2d 0.5h
Original Estimate: Not Specified

Issue Links:
Cloners
is cloned by TXTREC-40 Classification goes into infinite loo... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Documentation update required:
Yes
Date of First Response:
Epic Link: Txt Classification integration
Sprint: Add-Ons 15, Add-Ons 16
Story Points: 5

 Description   
  • Aggregate all the text's properties within the page node 
  • This component will sit between the text recogniser and the "trigger" (observation OR manual action)
  • Let users decide which components to tag (e.g. include title, summary and body - but exclude footer)

dev notes

Content to be tagged: ideally aggregate the content of the whole page (but Amazon imposes a limit of 5,000 characters)
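
One way to respect the character limit is to cut the aggregated text at the last whitespace before the limit. The sketch below is only an illustration in Python: the helper name `fit_to_limit` is hypothetical, and whether truncation is acceptable (rather than, say, splitting the text into several requests) is an assumption this ticket does not settle.

```python
MAX_CHARS = 5000  # limit quoted in this ticket


def fit_to_limit(text: str, limit: int = MAX_CHARS) -> str:
    """Return text unchanged if it fits, else cut at the last space before the limit."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Prefer breaking on a space so the last word is not split in half.
    head, _, _ = cut.rpartition(" ")
    # Fall back to a hard cut when there is no usable space in the window.
    return head.rstrip() if head.strip() else cut
```

Cutting on a space keeps the final word intact at the cost of dropping a few trailing characters.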



 Comments   
Comment by Le Hai Thanh [ 16/Jul/19 ]

Solution:

Introduce a configuration `text-classification -> aggregateDefinition -> properties`. A property is aggregated only if it is listed in AggregateDefinition, is a String, and is not empty. All matching properties are aggregated into one document.

 

aggregateDefinition:
  properties: [title, keywords, description, text]

 

1. Aggregate properties which are defined in AggregateDefinition.properties for a Node.

2. Aggregate Component's properties (by recursion) which are defined in AggregateDefinition.properties for a Node.

3. Aggregate Area's properties (by recursion) which are defined in AggregateDefinition.properties for a Node.
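
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual implementation: a node is modelled as a plain dict with hypothetical `properties` and `children` keys (children stand in for both areas and components), and the function name `aggregate` is made up for the example.

```python
from typing import Any


def aggregate(node: dict[str, Any], allowed: list[str]) -> list[str]:
    """Collect non-empty string values of the allowed properties, depth-first."""
    parts: list[str] = []
    props = node.get("properties", {})
    # Step 1: the node's own properties, in the order given by the definition.
    for name in allowed:
        value = props.get(name)
        # Only non-empty strings qualify; other types are skipped.
        if isinstance(value, str) and value:
            parts.append(value)
    # Steps 2 and 3: recurse into child components and areas (hypothetical "children" key).
    for child in node.get("children", []):
        parts.extend(aggregate(child, allowed))
    return parts


page = {
    "properties": {"title": "Hello", "keywords": "", "description": 42},
    "children": [
        {
            "properties": {"text": "Body text"},
            "children": [
                {"properties": {"description": "Nested", "footer": "ignored"}},
            ],
        },
    ],
}
allowed = ["title", "keywords", "description", "text"]
document = "\n".join(aggregate(page, allowed))
```

Here the empty `keywords`, the non-string `description` on the page, and the unlisted `footer` are all skipped, and the remaining values are joined into a single document.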

Reason:

  1. Many properties do not contain information suitable for tagging, so we need to limit which properties are aggregated.
  2. A piece of content can have many components/areas, so we need to aggregate all of them based on the defined properties.

Cons: The properties to aggregate must be defined explicitly.

Generated at Mon Feb 12 11:04:34 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.