How to Use Image Recognition for Mobile App and Game Testing


Dear Testdroiders,

Many of you are using image recognition or other types of visual/object character recognition implementations for testing. This is especially handy for mobile games, where all of the graphical content is rendered with OpenGL ES or comes directly from UI engines, and is therefore difficult for test automation scripts to recognize.

In this post, I’ll walk you through a basic example of how to use image recognition for mobile game testing, and what sort of assets and test script you need for it.


In this example, we’ll be using Hill Climb Racing for Android (downloaded directly from Google Play). For testing, we use a set of real Android devices (from different OEMs, with different OS versions, different form factors, etc.) and compare the results. We’ll also be using the Appium test automation framework with our own image recognition/object character recognition feature implemented for Testdroid. And to keep things easy and straightforward, we’ll use server-side Appium execution so that only minimal configuration is required on the test automation side.

File Structure

[Screenshot: file structure of the test package]

When it comes to putting together the zip file with all required test assets, the basic file structure is as illustrated in the picture above. The three core files of the test package – pom.xml (for Maven, including the build-specific settings), a properties file (with the Testdroid and Appium specific URL information) and a shell script (for execution) – form the actual configuration for the test infrastructure. The image files (under queryimages/hc) are .png files of certain visual elements from Hill Climb Racing, and they will be used to define certain areas and actions for the test script.

The Test Script and Images as Visual Identifiers

[Screenshot: test script using images as visual identifiers]
With the help of AKAZE and OpenCV, we can quickly build functions that compare the screen against the graphics content in .png files. The idea is to supply the visual assets (.png files) as they are; the test script then compares them against the screen and performs an action whenever those visual assets appear. For example, timing and delays have been problematic with some test frameworks. With this sort of implementation you don’t need to implement delays or any timing logic in your scripts; instead, the script can simply wait until a certain visual asset is shown on screen.
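The "wait for a visual asset instead of sleeping" idea above can be sketched as a small polling helper. This is an illustrative stand-in, not Testdroid's implementation: the `Supplier` here replaces a real OpenCV/AKAZE template lookup, and the names are ours.

```java
import java.util.function.Supplier;

// Sketch of the "wait until the asset appears" pattern: instead of fixed
// sleeps, poll a matcher until it reports a hit or retries run out.
// The matcher is a plain Supplier stand-in for a real screenshot + template match.
public class VisualWait {

    // Returns true once the matcher reports a hit, false after maxRetries misses.
    public static boolean waitForImage(Supplier<Boolean> matcher, int maxRetries) {
        for (int retries = maxRetries; retries > 0; retries--) {
            if (matcher.get()) {
                return true;
            }
            // In a real script, a fresh screenshot would be taken here before retrying.
        }
        return false;
    }
}
```

The retry loops in the functions below follow the same shape, with the screenshot capture and OpenCV comparison filled in.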

    public Point[] findImageOnScreenAndSetRotation(String image) throws Exception {
        int retries = 5;
        Point[] imgRect = null;
        while ((retries > 0) && (imgRect == null)) {
            if (retries < 5) {
                log("Find image failed, retries left: " + retries);
            }
            takeScreenshot(image + "_screenshot");

            //this will identify the rotation initially
            imgRect = findImage(image, image + "_screenshot", "notSet");
            retries = retries - 1;
        }
        assertNotNull("Image " + image + " not found on screen.", imgRect);
        return imgRect;
    }

The above function determines whether the screenshots need to be rotated and by how many degrees. It also sets the rotation for the screen, so that images are recognized in the proper orientation. Next, we perform a simple click on any identified visual asset described in a .png file:

[Screenshot: "more button" image compared against the screen content]

For example, the image with the “more button” content is compared here with the screen content, and if a matching element is spotted, the click is performed. The function is called with the right coordinates, and the time for which the button/visual element should be pressed is given as a parameter.

    public void tapImageOnScreen(String image) throws Exception {
        Point[] imgRect = findImageOnScreen(image);
        //imgRect[4] will have the center of the rectangle containing the image

        if (automationName.equals("selendroid")) {
            selendroidTapAtCoordinate((int) imgRect[4].x, (int) imgRect[4].y, 1);
        } else {
            driver.tap(1, (int) imgRect[4].x, (int) imgRect[4].y, 1);
        }
    }

    public void selendroidTapAtCoordinate(int x, int y, int secs) throws Exception {
        TouchActions actions = new TouchActions(driver);
        actions.down(x, y).perform();
        Thread.sleep(secs * 1000); //hold the press for the requested duration
        actions.up(x, y).perform();
    }

In the game-play, we use selendroidTapAtCoordinate directly. The .png name is given as a parameter, along with the time for which the key should be held down. By this point the test script has advanced to the actual game-play stage, and there are really only three possible (and clickable) items on screen. With this configuration, the ‘gas’ pedal is pressed down for 15 seconds and then released:
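The press-and-hold step can be sketched in isolation like this. The `TouchSurface` interface is our illustrative stand-in for the Selendroid `TouchActions` used by selendroidTapAtCoordinate above; the method names are assumptions, not the Appium API.

```java
// Sketch of the hold-the-'gas'-pedal step: press down at the button's
// coordinates, hold for the given number of seconds, then release.
public class HoldButton {

    // Stand-in for the down/up gestures a real TouchActions object performs.
    public interface TouchSurface {
        void down(int x, int y);
        void up(int x, int y);
    }

    // Presses at (x, y), holds for secs seconds, releases, and returns
    // the measured hold time in milliseconds.
    public static long pressAndHold(TouchSurface touch, int x, int y, int secs)
            throws InterruptedException {
        long start = System.currentTimeMillis();
        touch.down(x, y);
        Thread.sleep(secs * 1000L); // e.g. 15 s for the 'gas' pedal in the script
        touch.up(x, y);
        return System.currentTimeMillis() - start;
    }
}
```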

[Screenshot: game-play view with the ‘gas’ pedal pressed]

As you may know, once the car runs out of fuel the game-play ends and a social media sharing view with the score etc. is shown. Before that, the test script checks whether the test passed or failed – and this, too, is done based on the images shown on screen.

    public Point[] findImageOnScreenNoAssert(String image, int retries) throws Exception {
        Point[] imgRect = null;
        while ((retries > 0) && (imgRect == null)) {
            if (retries < 5) {
                log("Find image failed, retries left: " + retries);
            }
            takeScreenshot(image + "_screenshot");
            imgRect = findImage(image, image + "_screenshot");
            retries = retries - 1;
        }
        return imgRect;
    }

[Screenshot: end-of-game view used for the pass/fail check]

Sudden Notification Messages/Distractions for Test Scripts

A typical “issue” with visual image recognition on Android is the notification pop-ups that may disturb the execution of the script. However, those notifications can also be identified visually as a .png and handled in the script. For example, if the following notification message comes up, your script can simply click OK and everything keeps going:
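This dismiss-if-present pattern combines a non-asserting lookup (like findImageOnScreenNoAssert above) with a tap. The sketch below is illustrative: the `finder` and `tap` callbacks and all names are ours, standing in for the real screenshot lookup and driver tap.

```java
import java.util.Optional;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Sketch of dismissing an unexpected notification: look for the pop-up's
// "OK" button image without failing the test, tap its center if found,
// and carry on either way.
public class PopupGuard {

    // Minimal point type standing in for the matched image's center.
    public static class Point {
        public final int x, y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Returns true if the pop-up was seen and dismissed, false if nothing matched.
    public static boolean dismissIfPresent(Function<String, Optional<Point>> finder,
                                           BiConsumer<Integer, Integer> tap,
                                           String okButtonImage) {
        Optional<Point> hit = finder.apply(okButtonImage);
        if (hit.isPresent()) {
            tap.accept(hit.get().x, hit.get().y); // click OK and keep going
            return true;
        }
        return false; // no pop-up on screen; the test continues normally
    }
}
```

Calling such a guard before each major step keeps stray system dialogs from failing an otherwise healthy run.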

[Screenshot: notification pop-up with an OK button]

Reviewing the Test Results

As this sort of test can be executed simultaneously on up to hundreds of devices, you get screenshots, device logs, Appium logs, test data, etc. as results in Testdroid Cloud. Let’s take the screenshots and their comparison first. Screenshots are taken along the way as the test script advances; they are used to track the script’s progress, to drive actions, and to measure how quickly the script advances.
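Because a screenshot is captured at each step, the capture timestamps double as a coarse progress timeline. A minimal sketch of that idea, with names of our own choosing (this is not a Testdroid Cloud API):

```java
import java.util.List;

// Turns a sequence of screenshot capture times (in milliseconds) into
// per-step durations: the delta between consecutive screenshots tells
// roughly how long each step of the script took on a given device.
public class StepTimer {

    public static long[] stepDurations(List<Long> screenshotTimesMs) {
        long[] deltas = new long[Math.max(0, screenshotTimesMs.size() - 1)];
        for (int i = 1; i < screenshotTimesMs.size(); i++) {
            deltas[i - 1] = screenshotTimesMs.get(i) - screenshotTimesMs.get(i - 1);
        }
        return deltas;
    }
}
```

Comparing these per-step durations across devices highlights where a slower device lags in the flow.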

[Screenshot: screenshot comparison view in Testdroid Cloud]

Another very important part of testing is performance. This is especially true for mobile games, so we’ve integrated Gamebench into test execution in Testdroid Cloud to provide comprehensive and accurate performance results.

[Screenshot: Gamebench performance results]

Then, after execution on a variety of different devices, the logs can be very instrumental in understanding whether the app/game has any problems or potential issues, and how those can be fixed. We gather all important metrics (from the logs) under Testdroid Cloud projects, and all of those can be inspected after the test run with correct time/date stamps:

[Screenshot: device logs in a Testdroid Cloud project]

Finally, with test automation and the ability to run simultaneous tests on a variety of different devices, users get a PASS/FAIL status from each device. This is a great overview of which devices might require special attention as you iron out your app/game for the market and its users:

[Screenshot: PASS/FAIL overview across devices]

Want Access to Full Source Code?

If you want to try out how image recognition with the described setup works on a variety of real devices in the cloud, let me know and I’ll be happy to share this project and its source code with you. You can reach me via ville-veikko dot helppi at bitbar dot com.

Happy testing!

  • David Developer

    Awesome, you guys rock!

  • Android Developer

    I think the code of the image searching assumes the image is not rotated, right?
    If so, how could it find the car? In most times, it’s not horizontal…

    • Ville-Veikko Helppi

For example, in the case of Hill Climb Racing, the screen cannot be rotated. Also, for the majority of graphic content it doesn’t matter whether the screen is in landscape or portrait mode, as the picture can be found either way. Only the size of the picture varies (and that can be taken into account by the comparison algorithm).

      • Android Developer

        I didn’t talk about the screen. I talked about the car. The car is moving. It has hills to drive on, so its angle can change.

        • Ville-Veikko Helppi

Yes, of course. An image is comparable even when stretched or rotated. Some advanced algorithms can identify it even from certain patterns / parts of the UI.

          • Android Developer

            Is this what the “findImage” does?
            I wonder how it works. Did you write its code?
