Text Classification / TXTREC-68

Accented characters removed from content tags when using txtrec in French


    Details

    • Type: Bug
    • Status: Selected
    • Priority: Neutral
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Documentation update required: Yes

      Description

      To reproduce:

      1. Set the language to fr. (This step is optional: txtrec also produces tags for French content when the language is set to en, although the tags are less accurate.)
      2. Create a page in French that contains accented characters such as é, è, and à.
      3. Run classification.
        Result: the generated tags have no accents (see screenshot).

      This matters because content is tagged to make it searchable. Users type words with their correct accents in the find bar and get no results. See the screenshots comparing a search for an accented word with a search for the same word as it appears in the tags.
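The search mismatch described above can be illustrated with a short Python sketch. It assumes, hypothetically, that the tags are folded by Unicode NFD decomposition followed by removal of combining marks; this report does not say how txtrec actually normalizes text. The helpers `strip_accents`, `exact_match`, and `accent_insensitive_match` are hypothetical, and the example words are illustrative, not taken from the screenshots.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical stand-in for the accent stripping seen in the tags:
    decompose to NFD (é -> e + U+0301), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_match(query: str, tags: list[str]) -> bool:
    """Naive search: the query must equal one of the tags exactly."""
    return query in tags


def accent_insensitive_match(query: str, tags: list[str]) -> bool:
    """Fold the query and every tag the same way before comparing."""
    folded_query = strip_accents(query).casefold()
    return any(strip_accents(tag).casefold() == folded_query for tag in tags)


# Tags as they would look after classification if accents are stripped.
tags = [strip_accents(word) for word in ["été", "règle", "à propos"]]
print(tags)                                     # ['ete', 'regle', 'a propos']
print(exact_match("été", tags))                 # False: the user's accented query misses
print(accent_insensitive_match("été", tags))    # True: both sides folded identically
```

Folding the query the same way as the tags is one possible workaround on the search side; keeping the accents in the tags would let the user's exact, correctly accented query match directly.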


            People

            Assignee: Unassigned
            Reporter: Julie Legendre (jlegendre)
            Votes: 0
            Watchers: 2
