[MGNLTEST-47] New mgnl page objects may have "timing issues" - analyze and fix if required Created: 09/Oct/19  Updated: 08/Jul/20  Resolved: 13/Jan/20

Status: Closed
Project: Magnolia Test Framework
Component/s: None
Affects Version/s: None
Fix Version/s: 1.0

Type: Task Priority: Blocker
Reporter: Christoph Meier Assignee: Federico Grilli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Template:
Acceptance criteria:
Empty
Task DoR:
Empty
Date of First Response:
Epic Link: core-TF-features-bugs-improvements
Story Points: 8

 Description   

Context

While writing tests, it turned out that "we" (actually me, chm) have a lot of issues which seem related with timing. (Typically tests were failing with ElementNotVisibleException or NoSuchElementException)
While I 1st thought, that this is all related to my env. (docker on mac), some posts on slack from Rico let me think, that it could indeed be a probelm within the API.
I wrote a test to test whether the API has timing issues. (See below)


Testing the hypothesis

Test class which creates JCR content, see https://gist.github.com/Watcher24/24a57798641897470b5ff4e489415a0e

  • 2 of the methods create config:/modules/foobar
  • 2 of the methods create config:/modules/foobar/goodies/@beer
  • 2 methods using the API as “expected” in the (+/-) “most fluent way”.
  • 2 of the methods have some additional calls are meant to delay things a bit.

Test runs: 3x with 10 reps (=30 reps for each method).

Results: (✔︎/✖︎)

  • #createContent_withArtificalStops
    • 9/1, 9/1, 10/0 -> 28/2
  • #createContent_fullyFluent
    • 10/0, 9/1, 10/0 -> 29/1
  • #createMoreContent_withArtificalStops
    • 8/2, 10/0, 9/1 -> 27/3
  • #createMoreContent_fullyFluent
    • 5/5, 3/7, 3/7 -> 11/19

(Also see there for more details regarding the test results.)

Conclusion

Execution of #createMoreContent_fullyFluent fails much too much (19 failures on 30 runs).
Since #createMoreContent_withArtificalStops - which aims to do the same thing but using "artifical delays" - is much more successful, I state herby that we do have timing issues.
Anyhow ... further analysis is required.


Tasks

  • Further analyze the problem.
  • Add additional self-tests testing such timing issues.
    • Some self-tests should contain multiple calls of "events" which may need a "waiting period"
  • Add additional "waitings" inside page objects where it is required.

Basically make sure the API can be used the "fluent way" it was built for - without the need of adding "artifical delays".


Outcome

  • Created an @ImplicitWaitTimeout annotation to increase WebDriver's implicit wait timeout when needed
  • Added an ExpectedCondition called vaadinIsDone() allowing to suspend Selenium client code execution until Vaadin's client-server communication is done (possibly the major cause of intermittent failures requiring some sort of delay)
  • Hardened Grid row selection by using ExpectedConditions.refreshed(..) so to avoid StaleElementReferenceException


 Comments   
Comment by Christoph Meier [ 18/Oct/19 ]

Here is another example / a few lines of codes, where tests always fail:

        contentApp.clickRow(titleInitialVersion)
                .hitAction("Publish") 
                .hitAction("Edit page")   
                .hitAction("Edit page properties");

The "problem" is publish which will may take some time plus displays a "banner" ("Successfully bla bla ...").
afaik these banners as well as other similar types of notification interfere with the DOM in a way, that our locators typically fail, as long as such a message is present.
In some cases it is fine - actually required - to confirm the message, which disappear then.
But the banner from publishing has no button to confirm (however, there is a [x] to close it), and the message automatically disappear after a few seconds.

In the given case, I don't know how to avoid the failure (besides an artificial wait via debugger).
Note that the banner is a "result" of the preceding hitAction("Publish").
In theory we could add a long waiting period in the page object which implements #hitAction - however, this is not always required, since not all the actions lead to such banner.

Not sure what would be "ideal" here.
Probably something like:

hitAction(String actionLabel, int waitAfterward)

Not the most elegant way ... but probably a possibility?

 

Comment by Federico Grilli [ 18/Oct/19 ]

For the record, I'm reporting here parts of the discussion we had on Slack

rico
I actually thought that in some cases an overloaded hitAction(String action, int timeoutInSeconds) could be our best option for long running actions.
In the case you mention in the comment, I think you can do something like this
expect.notification().discard();

christoph
It helped a liitle bit … but publish sometimes takes a bit longer …. in this case it fails on expect.notification().discard()

rico
Given also that we don’t want test writers to mess with selenium api, otherwise we could let them use Conditions
...
OTOH, since publishing is such a common action, we could add a publish method to ContentApp and handle the waiting logic in there.

christoph
Publish is indeed a tricky case.
The volatility in the time it takes here, I guess, is on the Magnolia instance.
Then ... there is the notification adding another layer of timimg complexity.

Comment by Christoph Meier [ 18/Dec/19 ]

Time boxed for the time being to 5SP.

Comment by Federico Grilli [ 02/Jan/20 ]

Some results after the latest hardening contained in the PR. It appears that some flakiness in tests can't be completely and reliably avoided (network issues for instance).
The only way sometimes is to just retry a failing test a couple of times until it passes.

1st run: 55/60 tests passed, 5 failed
2nd run: 3/5 tests passed, 2 failed
3rd run: 1/2 passed, 1 failed (legit failure: grid doesn't scroll selection into view thus resulting in org.openqa.selenium.NoSuchElementException)

Methods prone to failure

  • info.magnolia.testframework.Grid#rowIsSelected
  • info.magnolia.testframework.ContentApp#tabIsSelected

Surefire may help us here https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html especially since version 3.0.0-M4 and https://issues.apache.org/jira/browse/SUREFIRE-1584

Comment by Christoph Meier [ 13/Jan/20 ]

Follow-up: MGNLCE-177

Generated at Mon Feb 12 07:45:02 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.