
Previously, we covered the hardware infrastructure you need to set up a robust device lab. This time we turn to the configuration and software side: how to make sure your in-house test lab runs without failures and produces reproducible results on every test run. Let’s take a look at the most important mechanisms for achieving enterprise-grade robustness:

Always Run Tests on Clean Devices

One of the most common causes of flaky test results is that the test environment, i.e. your test device, is not in exactly the same state at the start of each test session. There may be different processes running, different applications or background services installed, or different amounts of free memory and storage space available. To avoid test flakiness, make sure there is a proper cleaning cycle between test sessions that uninstalls all unwanted apps, restores the device storage to the state it was in before the test, and reboots the device so that no processes from the previous session are still running. Learn more about enterprise-grade reliability of an in-house testing lab in our free ebook.
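A minimal sketch of such a cleaning cycle for Android devices, assuming the devices are reachable through `adb` and that you keep a baseline list of packages that are allowed to stay installed. The function names and the baseline list are illustrative assumptions, not part of any particular lab setup, and resetting storage contents is left out here because it depends on what your tests write.

```python
import subprocess

def parse_pm_list(output):
    """Parse `pm list packages` output, whose lines look like 'package:com.example'."""
    return [line.split(":", 1)[1].strip()
            for line in output.splitlines()
            if line.startswith("package:")]

def packages_to_remove(installed, baseline):
    """Return installed packages that are not part of the known-good baseline."""
    return sorted(set(installed) - set(baseline))

def cleanup_commands(serial, installed, baseline):
    """Build the adb commands for one cleaning cycle: uninstall extras, then reboot."""
    cmds = [["adb", "-s", serial, "uninstall", pkg]
            for pkg in packages_to_remove(installed, baseline)]
    cmds.append(["adb", "-s", serial, "reboot"])
    return cmds

def clean_device(serial, baseline, run=subprocess.run):
    """Run a full cleaning cycle on one device (requires adb on the PATH)."""
    # -3 limits the listing to third-party packages, so system apps are never touched.
    listing = run(["adb", "-s", serial, "shell", "pm", "list", "packages", "-3"],
                  capture_output=True, text=True, check=True).stdout
    for cmd in cleanup_commands(serial, parse_pm_list(listing), baseline):
        run(cmd, check=True)
```

Keeping the command construction separate from the execution makes the cycle easy to dry-run and review before it is pointed at real devices.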

Create Intelligent Retry Mechanisms

Some test execution failures have nothing to do with the application under test; they are caused by the test infrastructure. The good news is that this type of failure is easy to identify, and tests that failed for such reasons can be retried automatically as many times as needed to get a test run that is free of infrastructure-related failures. The most common causes include: the connection between the device control server and the device fails, the connection between the device and your back-end server fails, the device or the device control server runs out of storage space, or the device runs out of power in the middle of a test session. All of these can be detected during the test session and the affected tests retried automatically, so that the only remaining failures are real ones related to the application under test or the test scripts.
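The idea can be sketched in a few lines, assuming your test harness is able to raise a dedicated exception when it detects one of the infrastructure problems listed above. `InfrastructureError`, `run_with_retries` and the `max_attempts` cap are hypothetical names introduced for this illustration; the cap is an added safety assumption so that a permanently broken device does not cause endless retries.

```python
class InfrastructureError(Exception):
    """Raised when the lab itself (connection, storage, power) caused the failure."""

def run_with_retries(test, max_attempts=3):
    """Run `test`, retrying only on infrastructure failures.

    Returns (result, attempts_used). Any other exception, such as an
    assertion failure in the test itself, propagates on the first attempt.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return test(), attempts
        except InfrastructureError:
            if attempts >= max_attempts:
                raise  # give up and surface the infrastructure problem
```

The important design choice is that only failures classified as infrastructure-related are retried; retrying every failure would hide real defects in the application or the test scripts.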

Check and Automatically Reconnect the USB and Wireless Data Connections

Losing the USB or wireless data connection at some point is unfortunately very common on both Android and iOS devices. To make matters worse, the device quite often reports a live connection over USB, Wi-Fi or wireless data even though no data is moving. You need to verify automatically that both the USB connection and the Wi-Fi connection are actually transferring data and, if they are not, automatically disconnect and reconnect them to bring them back up.
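One way to check that data is really moving, rather than trusting the reported link state, is to send a small payload and verify that it comes back. The sketch below does this over `adb` for an Android device. The function names are illustrative assumptions, and the reconnect action is passed in as a callable because how you power-cycle a USB port or toggle Wi-Fi depends entirely on your lab hardware.

```python
import subprocess

def usb_link_alive(serial, run=subprocess.run, timeout=10):
    """True only if a round trip over adb returns the expected payload."""
    try:
        result = run(["adb", "-s", serial, "shell", "echo", "ping"],
                     capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # the link claims to be up but nothing came back in time
    return result.returncode == 0 and result.stdout.strip() == "ping"

def ensure_connection(is_alive, reconnect, max_reconnects=2):
    """Probe the link; if it is dead, reconnect and probe again a limited number of times."""
    if is_alive():
        return True
    for _ in range(max_reconnects):
        reconnect()
        if is_alive():
            return True
    return False
```

The same `ensure_connection` loop can be reused for the Wi-Fi side by passing a probe that fetches a small resource from a server inside your lab.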

Automate All Configuration Changes

When you have hundreds of devices and tens of device control servers in your test lab, you cannot (and should not) make changes to settings or configurations manually, because the risk of not applying a change identically on all devices or servers is simply too high. We automate everything we can using OpsWorks Chef, and we have implemented our own on-device services that take care of changing device settings and the like. Even though we have been very systematic about this, problems still crop up every now and then because some small setting was changed by hand and not replicated identically on all devices or servers. This is one area where you cannot afford to cut corners if your goal is 99.99% reliability of your in-house test lab.
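Alongside automated rollout, it can help to detect hand-made changes by comparing what each device or server reports against the single desired configuration. The following is a rough sketch of such a drift check, not a description of our Chef setup; the function name and the setting names in the usage example are made up for illustration.

```python
def find_drift(desired, actual_by_device):
    """Return {device: {setting: (expected, found)}} for every deviating setting."""
    drift = {}
    for device, actual in actual_by_device.items():
        diffs = {key: (value, actual.get(key))
                 for key, value in desired.items()
                 if actual.get(key) != value}
        if diffs:
            drift[device] = diffs
    return drift

# Hypothetical setting names and values, for illustration only.
desired = {"screen_timeout_s": 600, "auto_update": "off"}
reported = {
    "device-01": {"screen_timeout_s": 600, "auto_update": "off"},
    "device-02": {"screen_timeout_s": 30, "auto_update": "off"},
}
drift = find_drift(desired, reported)
```

Here `drift` contains an entry only for `device-02`, whose screen timeout was changed away from the desired value. A missing setting is reported with `None` as the found value.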

Use Professional Monitoring for Your Test Lab Hardware

Last but not least, set up professional monitoring for all aspects of your test lab infrastructure to ensure its robustness. The most important areas to monitor are:

Disk space – automated tests create a lot of data in the form of logs, screenshots, videos, memory dumps and network dumps, and when you are running tens or even hundreds of devices, sometimes 24/7, you may run out of disk space without warning.

Network connections, latencies, packet loss rates – These have an impact on everything in your test lab and once you start troubleshooting the strangest and most randomly appearing problems, most often the root cause is networking related.

Power – Disruptions in the power supply (even very short ones) can cause very strange problems when the system has a lot of moving parts that all depend on steady power. Use UPS-backed power whenever you can, and make sure your servers shut down and come back up in a controlled way when a longer power disruption occurs.

You can use any generic monitoring software for this. At Bitbar Testing we currently use AWS monitoring, Loggly, and New Relic to handle our monitoring.
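As a small illustration of the disk space item above, a check like the following could run on each device control server and feed whatever monitoring tool you use. This is a sketch only: the thresholds and function names are assumptions, and the 20% and 10% limits should be tuned to how quickly your tests generate data.

```python
import shutil

def classify_free_space(free, total, warn_fraction=0.20, critical_fraction=0.10):
    """Classify free disk space as 'ok', 'warning' or 'critical'."""
    free_fraction = free / total
    if free_fraction < critical_fraction:
        return "critical"
    if free_fraction < warn_fraction:
        return "warning"
    return "ok"

def check_disk(path="."):
    """Check the filesystem that contains `path` using the standard library."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return classify_free_space(usage.free, usage.total)
```

Alerting at a warning level well before the disk is full gives you time to archive or delete old logs and videos instead of discovering the problem through failed test runs.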

This blog post covered best practices on how to improve the reliability of your in-house test lab so that you can always rely on the test results and your organization will not have any testing downtime due to test infrastructure. Next, we will look into the operational aspects and best practices of running an enterprise-grade device lab.

Jouko Kaasila

Bitbar COO & Co-founder