[TXTREC-68] Accented characters removed from content tags when using txtrec in French Created: 03/Sep/19  Updated: 23/Aug/22

Status: Open
Project: Text Classification
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Neutral
Reporter: Julie Legendre
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: maintenance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 2019-09-03_14-07-55.png     PNG File 2019-09-03_14-12-04.png     PNG File 2019-09-03_14-12-37.png    
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Documentation update required:
Yes
Team: AuthorX

 Description   

To reproduce:

  1. Set the language to fr (optional: txtrec also works when the language is set to en, although the tags are not as good).
  2. Create a page in French with accented characters such as é, è, à.
  3. Run classification.
    The resulting tags contain no accents (see the attached screenshot of the tags).

This matters because content is tagged to make it searchable: users who type a word with its correct accents in the find bar get no results. See the attached screenshots comparing a search for an accented word with a search for the same word as it appears in the tags.
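
The mismatch described above can be illustrated with a minimal Python sketch. It does not reflect the actual txtrec or find bar implementation, which this ticket does not describe. The tag values and the helper names (strip_accents, exact_search, folded_search) are hypothetical, and the sketch assumes the search compares tags accent-sensitively.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after NFD decomposition, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_search(query, tags):
    """Case-insensitive match in which accents are significant (assumed behavior)."""
    wanted = unicodedata.normalize("NFC", query).casefold()
    return [t for t in tags if unicodedata.normalize("NFC", t).casefold() == wanted]


def folded_search(query, tags):
    """Case-insensitive match that ignores accents on both sides."""
    wanted = strip_accents(query).casefold()
    return [t for t in tags if strip_accents(t).casefold() == wanted]


# Hypothetical tags as they might look once accents have been removed.
tags = ["evenement", "reunion"]

print(exact_search("événement", tags))   # []
print(folded_search("événement", tags))  # ['evenement']
```

The sketch suggests two possible directions, neither confirmed by this ticket: keep the accents in the generated tags, or fold accents on both the query and the tags at search time.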


Generated at Mon Feb 12 11:05:08 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.