Text Classification / TXTREC-68

Accented characters removed from content tags when using txtrec in French


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Neutral

      To reproduce:

      1. Set the language to fr. (Optional: txtrec also runs with the language set to en, although the tags are less accurate.)
      2. Create a page in French that contains accented characters such as é, è and à.
      3. Run classification.
        See the screenshot: the generated tags contain no accents.

      This matters because content is tagged to make it searchable. Users type words with their correct accents in the find bar, and because the tags have lost those accents, the search returns no results. See the screenshots comparing a search for an accented word with a search for the same word spelled as it appears in the tags.
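      The report does not say how txtrec removes accents or how the find bar matches queries against tags. The sketch below assumes a common approach (Unicode NFD decomposition followed by dropping combining marks) to show why an exact-match search for "école" misses a tag stored as "ecole", and how folding both sides to the same key would let them match. The function names and sample tags are hypothetical. Folding is only a search-side workaround; keeping the original accented spelling in the tags addresses the reported problem directly.

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Hypothetical helper: decompose to NFD so "é" becomes "e" + U+0301
    # (combining acute accent), then drop every combining mark (category "Mn").
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def fold(text: str) -> str:
    # Accent- and case-insensitive comparison key; never shown to users.
    return strip_accents(text).casefold()

def matches(query: str, tag: str) -> bool:
    # Compare folded forms so "École", "école" and "ecole" are equal.
    return fold(query) == fold(tag)

# Hypothetical tags as they look after accents were stripped.
tags = ["ecole", "resume", "deja"]
query = "école"

# Exact comparison reproduces the reported symptom: no match.
print(query in tags)
# Folded comparison finds the tag.
print(any(matches(query, t) for t in tags))
```

The first print outputs False and the second True.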

        Acceptance criteria

              Assignee: Unassigned
              Reporter: Julie Legendre (jlegendre)
              AuthorX
              Votes: 0
              Watchers: 2

