AI-based Test Automation Tools — Field Study. Challenges and obstacles
Codeless script generation
We should also remember that the ability to record tests does not magically solve all the common problems of automation. Whether you use code or recording to create tests, you still need to make sure that you automate what matters. Tests should be well designed and automated at the correct level. One should also think about test data management, including setup and teardown.
Recording tests can speed up automation by providing an initial set of raw scenarios to work from. Still, automation engineers need to define and reuse steps that are common to many tests, set up and maintain testing accounts, and select test subsets to execute at different stages of the software development lifecycle. Recording alone is not a magic bullet: engineers still have to work to keep automated tests robust and easy to maintain.
Self-healing is a double-edged sword
Self-healing is a great feature, but it does not mean that all your tests will be magically fixed every time. Yes, insignificant changes are fixed automatically, but when your UI is mature, such changes are not introduced often; I would say they become rather rare after a couple of major releases. Collecting multiple data points about each element and using them for element location definitely improves the stability of tests. However, it improves robustness (tests do not fail if, for example, the text on an element changes depending on the time of day) more than it reduces maintenance time (tests that need to be updated because of changes in the code). I found that self-healing saves far more time when the application has a brand-new UI that changes often.
Different tools handle self-healing in different ways. Some require approval of every change, at least in the beginning while the system learns, so there is still a time investment. Others just make changes and proceed without notification, which I think is dangerous. For example, in the platform I test, the currency sign and the number of digits after the decimal point are important, but if they go missing, self-healing systems will “auto-fix” such cases. Fortunately, most AI-based tools are starting to add the ability to review auto-fixed tests, accept or decline the changes, track history, and roll back to a previous version. Don’t forget that reviewing changes still takes time. I also suggest double-checking how a tool handles automatic changes before selecting one with self-healing features.
To keep “auto-healing” from swallowing small but important changes that should not be ignored, tests can always be built in a very specific way, where every sign and every comma has its own assertion. The result? Huge, slow, and hard-to-maintain automated tests. We can also make tests smaller and increase their number, but if each test requires its own time-consuming setup, having hundreds of such tests will increase total execution time. We cannot wait for hours each time we build the software.
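An explicit assertion of this kind, for the currency-sign and decimals example above, might look like the following sketch. The format string and helper name are illustrative assumptions, not from any specific tool:

```python
import re

# A minimal sketch of an explicit assertion that self-healing must not
# "fix" away: the currency sign and two decimal digits are part of the
# contract, so the test checks the exact format rather than merely that
# some text is present. The format and helper name are illustrative.

PRICE_FORMAT = re.compile(r"^\$\d{1,3}(,\d{3})*\.\d{2}$")

def assert_price_format(text):
    """Fail loudly if the sign or decimals go missing, instead of letting
    a self-healing locator silently accept the changed text."""
    if not PRICE_FORMAT.match(text):
        raise AssertionError(f"price {text!r} does not match $#,###.##")

assert_price_format("$1,234.56")  # passes: sign and decimals in place
```

A call like `assert_price_format("1,234.56")` or `assert_price_format("$1,234.5")` raises immediately, which is exactly the behavior we want a self-healing mechanism to leave alone.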
Every big change in the application, such as introducing a new element or removing tabs, will still require test updates that must be done manually. Of course, QAs will be able to do this faster, because re-recording steps is quicker than changing code. But keep in mind that this speed-up applies to any tool that provides recording, not only AI-based ones.
Self-generated tests: improved coverage, but what about quality?
AI-based automation tools can generate tests in different ways. Some generate tests by collecting information on how real users use the software in production, extracted from logs, clickstream data, or both. This probably works well for software built on the business-to-consumer model, where many users produce a lot of data for ML algorithms. With the business-to-business model, there are often fewer users, and it is harder to collect enough data to train ML models. There are also many applications and features that have no “users” at all, for example, reports generated as the result of data-processing algorithms and calculations. Another big drawback of generating tests this way is that the application or feature must already be in production with real users, so it cannot cover new functionality. In other words, test generation based on application usage can only add missing regression cases.
Another type of self-generated test is produced by link crawlers. These tests check that every link in the app works. This is definitely a useful tool, but a working link does not necessarily mean it is the right link, or that it leads to a functioning page.
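To make the limitation concrete, here is a toy sketch of the first half of what such a crawler does: extracting every link from a page, using only Python's standard library. A real crawler would then request each URL and treat any 2xx response as "working"; nothing in this process checks that the link points to the *right* content, which is exactly the gap described above.

```python
from html.parser import HTMLParser

# A toy sketch (not any vendor's crawler) of link extraction: collect
# every href on a page so each can later be requested and checked for a
# successful HTTP status. Note that a 200 response proves reachability,
# not correctness of the destination.

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return every href found in the given HTML fragment."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

For example, `extract_links('<a href="/pricing">Pricing</a>')` returns `["/pricing"]`; whether `/pricing` actually shows the pricing page is something no status-code check can tell you.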
Auto-generated tests only cover what can be easily tested. They find shallow bugs: non-working buttons, wrong values, broken links. They will never help you find a missed requirement, a logic flaw, an error message that was never shown to help the user recover from a problem, or a usability issue. Moreover, auto-generated tests might give you a false sense of security, letting major issues escape to production even after thousands of automated tests have passed. Such tests are great as a supplement to QA efforts, but not a full replacement.
Another concerning thing is that marketing materials for these tools claim that one or another form of autonomous test generation will give you 100% coverage. This raises two questions: 100% coverage of what, and, more importantly, do you even need to achieve this mystical 100%?
The use of AI and ML in testing tools is increasing daily. Almost every tool claims some AI-powered features that help improve testing. The keyword here is “help”.
Test recording was introduced a long time ago, and even adding AI and ML has not made it perfect. It is much better than it was, and it saves time, but it still requires human intervention, sometimes even for simple cases.
Fully autonomous test generation, although it sounds cool, is not mature enough to be trusted with all your testing. It can be useful for revealing gaps in coverage and finding easy bugs, but testing of complex systems cannot be done with autonomously generated tests alone. These tests will never ask “what if”.
The same goes for self-healing tests. They save QAs time and remove a lot of mechanical, mindless work, but they are not a magic bullet and are far from fully autonomous. Users still need to review changes. Moreover, QA engineers need to design tests using best test-automation practices and incorporate self-healing into the design so that actual issues are never auto-fixed (by adding explicit assertions, for example).
I may sound old-school, but a test that passes even after a change was introduced sounds more concerning than exciting to me. Isn’t the purpose of most automated tests to uncover changes that were introduced accidentally and would otherwise go to production unnoticed? Why would we want to hide these changes?
AI tools are great in areas where a lot of information must be collected and processed quickly: visual testing, video/audio quality, log parsing, collecting real usage data, performance testing. Visual testing was not covered much in this article only because it does not fit my particular use case, but there are many success stories across the industry. It is impossible for a human to detect every single visual change in a UI, so using software here is justified and pays off. AI in this case helps to surface only the differences that are perceptible to end users: instead of simple pixel-to-pixel comparison, these tools use computer-vision algorithms, which keeps the signal-to-noise ratio high.
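A toy illustration of why naive pixel-to-pixel comparison is noisy (this is deliberately simplistic and is not any vendor's algorithm; real tools use far more sophisticated computer vision): images are represented here as plain 2D lists of grayscale values, and a strict comparison flags a one-level anti-aliasing shift that no user would ever notice, while even a crude tolerance suppresses it.

```python
# A toy sketch of strict vs. tolerance-based pixel comparison, using
# 2D lists of grayscale values as stand-ins for screenshots. Purely
# illustrative: real visual-testing tools use computer-vision models,
# not a fixed per-pixel tolerance.

def pixels_differ(a, b, tolerance=0):
    """Count pixels whose values differ by more than `tolerance`."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > tolerance
    )

baseline = [[200, 200], [200, 200]]
rerender = [[200, 201], [200, 200]]  # imperceptible 1-level shift

strict = pixels_differ(baseline, rerender)                 # → 1 (false alarm)
tolerant = pixels_differ(baseline, rerender, tolerance=2)  # → 0 (ignored)
```

The strict check fails the build over an invisible rendering artifact; the perceptual goal is to report only differences a user would see, which is the signal-to-noise problem computer-vision approaches address.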
To have robust, stable, effective, lightweight, trustworthy, and easy-to-maintain automated tests, one should not only select the right tool but also invest in test design. Someone still needs to decide what should be tested, when, and how to do it most efficiently. AI tools’ test-design abilities are very limited, almost non-existent: mechanically clicking everything clickable and extracting scenarios from logs or clickstream data can hardly count as a test design technique. AI can automatically generate a lot of tests that can safely be skipped and that bring no business value or quality improvement. Running thousands of tests is time- or resource-consuming, so it is still QA’s job to find the right balance between increasing coverage and keeping execution fast.
Tests often require complex data setup, especially in software platforms and large applications, and none of the tools I have seen offer features that solve this task. Data seeding, maintenance, and cleanup still have to be done with external tools or workarounds. I found this among the most challenging tasks when testing software platforms, solutions, and software built on a business-to-business model.
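The kind of seeding-and-cleanup workaround referred to above often ends up as a small fixture like the following sketch. It is illustrative only: `store` stands in for whatever seeding mechanism a team has (an admin API client, a database connection), and the names are assumptions.

```python
from contextlib import contextmanager

# A minimal sketch of the data seeding / cleanup that current AI tools
# leave to the team. `store` is a stand-in for any seeding mechanism
# (admin API, DB connection); the names here are illustrative.

@contextmanager
def seeded_account(store, account):
    """Seed a test account before the test and always clean it up
    afterwards, even when the test body fails."""
    store.create(account)
    try:
        yield account
    finally:
        store.delete(account)
```

A test then wraps its body in `with seeded_account(store, "acct-1"):` and gets guaranteed cleanup; pytest fixtures or xUnit setup/teardown hooks serve the same purpose, but the point stands that this plumbing lives outside the AI tool.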
Finally, to have effective automation the software itself should be designed with testing in mind, and no tool will be effective if the application is untestable.
To summarize, AI-based testing tools in their current state cannot fully replace QAs. They can be used in a supporting role to reduce the time spent on mechanical, boring tasks. They are great as a supplement to the QA team, automating time-consuming activities such as checking that every link in the application works, or tasks that are impossible for a human, such as spotting every visual difference between the current and previous build. AI-based tools can also collect data about user behavior to generate production-like test scenarios, and some provide the ability to generate tests from existing test documentation.
Existing AI-based testing tools are mainly aimed at web or mobile apps. Nevertheless, these tools continue to evolve and decrease human involvement in checking while at the same time increasing human productivity and giving them more time to concentrate on testing.
Will we lose our jobs? Well, any process automation results in the disappearance of one or another job function. Do we need to adapt? That is definitely so, but we already did it many times before, didn’t we?
AI tools comparison table
The table below compares the five most popular and, in my opinion, most promising AI-based test automation tools. A “+” indicates that the tool has, or claims to have, a feature or capability, but does not indicate the maturity of that feature. The information for TestCraft, Testim, and Mabl is based on hands-on experience and research; the information about Appvance and Functionize is based on research only.