[MGNLPER-54] As an editor I want to use regular expressions in my fulltext searches Created: 18/Sep/18  Updated: 22/Sep/22  Resolved: 22/Sep/22

Status: Closed
Project: Periscope
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Neutral
Reporter: Martin Drápela Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2018-09-18-10-15-42-622.png     PNG File image-2018-09-18-10-32-41-167.png     PNG File image-2021-09-01-15-22-27-616.png     PNG File image-2021-09-01-15-25-52-250.png     PNG File image-2021-09-01-15-26-45-765.png    
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Date of First Response:
Epic Link: Periscope improvements
Team: AuthorX

 Description   

This ticket stems from a discussion on HipChat of Sep 17, 2018 and asks for implementing 

  1. either a full regex
  2. or a light/limited regex

functionality for Findbar.

If implemented, an on/off regex checkbox should appear close to the Findbar so that the user knows in which search mode they are currently working.

Consider which flavor of regex to implement: PCRE, Python, Go, ...


Antti: Posting @Martin's comment from team meeting:

Michi ... just an afterthought ... can you search using some wildcard chars, or just in a pure regex mode? Would it be nice to have a regex on/off switch box next to Findbar?

Cedric:

We don't have anything like that currently.
We could, however, introduce it to the API, such that suppliers would be aware of a regex mode. Some of them would not be able to fulfill that, though; for example, the Magnolia docs supplier couldn't, since the Confluence REST API doesn't support regex search.

Antti:

Find Bar is primarily a tool for content practitioners (digital marketers, authors, editors). In your ticket, focus on explaining what search functionality is missing from a business user's perspective.

Let's leave it to the developer team to come up with an implementation proposal. Regex may be one way but likely not the only one.

Michael:

Yeah, could be an enhancement for Periscope. I see some issues with it though:

– We would need some kind of clearly defined regex subset, because whatever we provide has to be supported by all suppliers, and regex != regex (the flavors differ).

– The Findbar is more of a general-purpose search, like macOS Spotlight or Google, and not an expert search engine like the one you have in JIRA. We have to see whether the majority of Findbar users would actually use and understand that feature.
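To make the idea of a "clearly defined regex subset" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the choice of what the subset forbids (lookarounds, backreferences, lazy or possessive quantifiers), the name check_subset, and the purely textual checks are hypothetical and not part of Periscope.

```python
import re

# Hypothetical "light" subset: literals, character classes, groups with "|",
# and the plain quantifiers ?, *, + and {m,n}. Everything below is rejected.
# The checks are crude text heuristics: they would also flag escaped
# literals such as \(\? and are only meant to illustrate the idea.
FORBIDDEN = [
    (re.compile(r"\(\?"), "group extensions such as lookarounds or flags"),
    (re.compile(r"\\[1-9]"), "backreferences"),
    (re.compile(r"[*+?}][+?]"), "lazy or possessive quantifiers"),
]


def check_subset(pattern):
    """Return a list of reasons why `pattern` falls outside the subset."""
    problems = [reason for rx, reason in FORBIDDEN if rx.search(pattern)]
    try:
        re.compile(pattern)
    except re.error as exc:
        problems.append(f"not a valid pattern: {exc}")
    return problems


print(check_subset(r"col(o|ou)r"))             # inside the subset
print(check_subset(r"(?<=\.) {2,}(?=[A-Z])"))  # uses lookarounds
```

A subset along these lines would accept the spelling-variant pattern from the description but reject the double-space pattern, which relies on lookarounds. Whether that trade-off is acceptable is exactly the open question raised above.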


My primary rationale for having some regexed search capability in the Findbar is that I expect there will be advanced editors using Magnolia who would welcome a quick and easy solution for e.g. the following use cases:

1) Check if our production texts mix spelling variants in common words such as color vs colour:

col(o|ou)r

 

2) I'd like to lint our texts by removing all unnecessary double, triple ... space chars. Find the texts which qualify for this task:

(?<=\.) {2,}(?=[A-Z])

(borrowed and recopied from the English Wikipedia)

3) We need to unify the filenames of our assets. Some of the assets are probably stored with filenames in all-uppercase letters. Find them:

([A-Z-]+\.[a-z]+)

Finds:


You could probably find other use cases. 
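The three patterns above can be tried out with a short Python sketch. It assumes Python's re flavor, and the sample texts and the helper name matching_texts are made up for illustration; a Findbar implementation might use a different engine and would search real content instead.

```python
import re

# The three patterns from the use cases above, compiled with Python's `re`.
SPELLING_VARIANTS = re.compile(r"col(o|ou)r")
EXTRA_SPACES = re.compile(r"(?<=\.) {2,}(?=[A-Z])")
UPPERCASE_FILENAME = re.compile(r"([A-Z-]+\.[a-z]+)")


def matching_texts(pattern, texts):
    """Return the texts in which `pattern` matches at least once."""
    return [text for text in texts if pattern.search(text)]


# Made-up sample content.
texts = [
    "The color of the logo",
    "Pick a colour.  Then save.",
    "Download LOGO-FINAL.png here",
]

print(matching_texts(SPELLING_VARIANTS, texts))
print(matching_texts(EXTRA_SPACES, texts))
print(matching_texts(UPPERCASE_FILENAME, texts))
```

Note that the filename pattern matches any run of uppercase letters and hyphens directly before the extension, so it would also hit the tail of a mixed-case name such as myFILE.png.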

Would the editors like to have this feature? See for example this blogpost:

https://velourfuture.com/2016/02/13/more-automated-checks-fun-with-scrivener-and-regex/

I ran several other Scrivener checks today, once again using the amazingly handy Regular Expression (RegEx) facility to find patterns on things.

 



 Comments   
Comment by Michael Mühlebach [ 18/Sep/18 ]

A full RegEx implementation will be quite hard, for two reasons. First, there is no real RegEx standard: is it POSIX basic or extended, PCRE, with lazy matching, with possessive matching, etc.? Second, as there are so many different flavors, we would have to implement it ourselves to guarantee consistency and could not rely on the implementations of the suppliers. Consistency is key for the scenarios described above.
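One concrete instance of such flavor differences, as a small Python sketch: Python's re module only accepts fixed-width lookbehind, while some other engines (.NET, for example) also accept variable-width lookbehind. The helper name compiles_in_python and the second pattern are made up for illustration.

```python
import re


def compiles_in_python(pattern):
    """Return True if Python's `re` engine accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The double-space pattern from the description uses a fixed-width lookbehind.
FIXED_WIDTH = r"(?<=\.) {2,}(?=[A-Z])"
# A made-up variant with a variable-width lookbehind ("one or more dots").
VARIABLE_WIDTH = r"(?<=\.+) {2,}(?=[A-Z])"

print(compiles_in_python(FIXED_WIDTH))     # accepted by Python's re
print(compiles_in_python(VARIABLE_WIDTH))  # rejected: look-behind requires fixed-width pattern
```

The same query string would therefore behave differently depending on which engine a supplier happens to use.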

But maybe another, more constructive thought: The scenarios above remind me strongly of functionality expected in good text editors. Things like search & replace are always very useful and powerful. Instead of implementing that in the Findbar, where replace in particular will be hard to implement, we could extend our content editor with this kind of functionality.

Comment by Cedric Reichenbach [ 18/Sep/18 ]

FWIW, mmichel mentioned that Confluence does indeed support regex. Nevertheless, we should think about cases where a supplier can't, and how it should behave.
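One possible behavior for suppliers that cannot do regex, sketched in Python. This is not the Periscope API: the Supplier class, the supports_regex flag, the in-memory suppliers and the dispatch function are all hypothetical names chosen for illustration. In this sketch, suppliers without regex support are skipped in regex mode and reported back, so the UI could tell the user which sources were left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Supplier:
    """Hypothetical stand-in for a search result supplier."""
    name: str
    supports_regex: bool
    search: Callable[[str, bool], List[str]]


def make_in_memory_supplier(name, items, supports_regex):
    """Build a toy supplier that searches a fixed list of strings."""
    def search(query, regex_mode):
        if regex_mode:
            pattern = re.compile(query)
            return [item for item in items if pattern.search(item)]
        return [item for item in items if query.lower() in item.lower()]
    return Supplier(name, supports_regex, search)


def dispatch(query, regex_mode, suppliers):
    """Query all suppliers; in regex mode, skip those that cannot do regex."""
    results = {}
    skipped = []
    for supplier in suppliers:
        if regex_mode and not supplier.supports_regex:
            skipped.append(supplier.name)
            continue
        results[supplier.name] = supplier.search(query, regex_mode)
    return results, skipped


pages = make_in_memory_supplier("pages", ["Color palette", "Colour guide", "Contact"], True)
docs = make_in_memory_supplier("docs", ["Colour settings"], False)

results, skipped = dispatch("[Cc]ol(o|ou)r", True, [pages, docs])
print(results)  # only suppliers that support regex contribute
print(skipped)  # suppliers left out in regex mode
```

Skipping and reporting is only one option; falling back to a literal search for such suppliers would be another, at the cost of mixing two kinds of results.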

Comment by Martin Drápela [ 01/Sep/21 ]

Update 2021-09-01, still valid I guess, but I agree that the implementation might be tough (regex variants and so on):

Another spelling-based example, using content on the demo instance

Let's say the editors intend to normalize EN spelling to the US variant. Searching with the AND operator in the Findbar is not currently supported:

which would be the use case. It could be regex-searched using cent(er|re)

 

So the editor has to search for each variant individually:

Searching for "center" alone

 

Searching for "centre":
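For comparison, the single-pass search with cent(er|re) mentioned in this comment could look roughly like the following Python sketch. The page paths and texts are made up and do not come from the demo instance, and the function name variants_by_page is hypothetical.

```python
import re

# The pattern from the comment above, in Python's `re` flavor.
VARIANT = re.compile(r"cent(er|re)")


def variants_by_page(pages):
    """Map each page that contains a variant to the variants found on it."""
    found = {}
    for path, text in pages.items():
        variants = sorted({match.group(0) for match in VARIANT.finditer(text)})
        if variants:
            found[path] = variants
    return found


# Made-up sample pages.
pages = {
    "/sample/about": "Visit the city centre.",
    "/sample/contact": "Call our service center.",
    "/sample/home": "Welcome",
}

print(variants_by_page(pages))
```

One query then covers both spellings and also shows which pages use which variant, instead of two separate searches whose results have to be compared by hand.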

Comment by Mikaël Geljić [ 22/Sep/22 ]

Editors don't know what regular expressions are. Closing this as we will likely prioritize other search improvements.

Generated at Mon Feb 12 10:28:16 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.