tests: Add test infrastructure as a Python module
Review Request #3461 - Created April 28, 2016 and submitted
| Information | |
|---|---|
| Submitter | Andreas Sandberg |
| Repository | gem5 |
| Branch | default |
| Reviewers | Default |
Changeset 11472:23a195434229
---------------------------
tests: Add test infrastructure as a Python module
Implement gem5's test infrastructure as a Python module and a run
script that can be used without scons. The new implementation has
several features that were lacking from the previous test
infrastructure such as support for multiple output formats, automatic
runtime tracking, and better support for being run in a cluster
environment.
Tests consist of one or more steps (TestUnit). Units are run in two
stages, the first a run stage and then a verify stage. Units in the
verify stage are automatically skipped if any unit run stage wasn't
run. The library currently contains TestUnit implementations that run
gem5, diff stat files, and diff output files.
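The two-stage unit model described above can be sketched roughly as follows. This is an illustrative sketch, not the actual module API: the class and attribute names (`TestUnit`, `VerifyUnit`, `ran`, `skipped`) are hypothetical.

```python
# Hypothetical sketch of the run/verify two-stage unit model.
# A verify-stage unit is skipped automatically if the run-stage
# unit it depends on never ran.
class TestUnit:
    def __init__(self, name):
        self.name = name
        self.ran = False

    def run(self):
        self.ran = True


class VerifyUnit(TestUnit):
    """Verify-stage unit that checks its run-stage dependency first."""
    def __init__(self, name, run_unit):
        super().__init__(name)
        self.run_unit = run_unit
        self.skipped = False

    def run(self):
        if not self.run_unit.ran:
            # Run stage never executed: skip rather than fail.
            self.skipped = True
            return
        super().run()
```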
Existing tests are implemented by the ClassicTest class and "just
work". New tests that don't rely on the old "run gem5 once and
diff output" strategy can be implemented by subclassing the Test base
class or ClassicTest.
Test results can be output in multiple formats. The module currently
supports JUnit, text (short and verbose), and Python's pickle
format. JUnit output allows CI systems to automatically get more
information about test failures. The pickled output contains all state
necessary to reconstruct a test results object and is mainly intended
for the build system and CI systems.
Many JUnit parsers assume that test suite names look like Java
package names, while we currently use path-like names with slashes
separating components. Test names are therefore translated according
to these rules:
* '.' -> '-'
* '/' -> '.'
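Applying those two rules in order gives a translation like the following sketch (the function name is illustrative, not taken from the patch):

```python
def junit_name(test_name):
    # Replace '.' first so dots already present in the test name
    # aren't confused with the separators introduced by the second
    # step, then turn path separators into package-style dots.
    return test_name.replace('.', '-').replace('/', '.')
```

For example, `quick/se/00.hello/arm/linux/simple-timing` becomes `quick.se.00-hello.arm.linux.simple-timing`.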
The test tool, tests.py, supports the following features:
* Test listing. Example: ./tests.py list arm/quick
* Running tests. Example:
./tests.py run -o output.pickle --format pickle \
../build/ARM/gem5.opt \
quick/se/00.hello/arm/linux/simple-timing
* Displaying pickled results. Example:
./tests.py show --format summary *.pickle
Change-Id: I527164bd791237aacfc65e7d7c0b67b695c5d17c
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Overall, this looks pretty solid - much more like a "test harness" than the existing regressions. Thanks!
Given that this change is already pretty significant, are there other changes we should introduce or plan to make? For example, below I suggest clarifying test naming conventions.
Also, I can see how this change might make it easier for users to add and run their own tests, but only within the existing cumbersome tests naming/directory structure. Can we make it a little clearer how to add tests with a different naming/directory structure? Ultimately, something I'd really like is the ability to add a named group of tests that validate subsets of functionality (e.g. current "regressions" are such a group, or I'd like to have a group of tests for both cache coherence and memory consistency for different ISAs, CPU and/or GPU, and different Ruby coherence protocols).
-
tests/testing/helpers.py (Diff revision 1) -
Can you add a comment description here that describes that this is an example test for the ProcessHelper?
-
tests/testing/tests.py (Diff revision 1) -
In general, I've found the tests directory naming structure to be confusing and underdocumented. Can we add some clear comments here and make these test character names more precise?
For instance, "category" is either "quick" or "long"; it's confusing to call these test categories, since they just appear to describe the duration of the test. Maybe "duration" instead? Can you document roughly what "quick" and "long" should be?
The "mode" name is ambiguous, since gem5 code uses "mode" to describe numerous different things (e.g. full-system vs. syscall emulation mode, memory modes, system/user modes, etc.). Maybe "syscall mode" instead?
I've always found it strange to just use "benchmark" as well, since many tests come from different suites... I'm not sure what would be better here.
Finally, the name "config" should reflect that it sufficiently describes the simulated system. In general, it includes a platform name (e.g. tsunami for ALPHA or pc for X86), whether there are multiple systems, the type of the CPU cores, and the memory mode. Is there a reason that we separate out the ISA or even the OS? Maybe "system config" instead? Can you add comments with this detail?
-
tests/tests.py (Diff revision 1) -
I think this abstract filter is going to be very useful... Unfortunately, I tried running this script and changing this (these) argument, but I'm still not sure I understand how it works.
Please make sure to add notes about this argument in the comments.
-
tests/tests.py (Diff revision 1) -
Can you add a description here that is similar to the patch description? I played around with this patch, and there were some things I couldn't figure out just from the help messages (e.g. see note above).
Diff: Revision 2 (+1341)
Thanks for the updates. LGTM.
If you're willing, I still think it would be good to add comments about what might need to be changed in these files in order to add new tests.
Hi Andreas, I had a couple questions/clarifications about this patch. The idea is that this script will run the test/test suites in an ordered fashion but doesn't contain any infrastructure to launch jobs in a distributed environment, correct? You mentioned that these work in a CI environment and can be reproduced using only the .pickle files, but it seems that actually job submission is left to users.
Also, the reporting is via output files that could be scanned by some other process, but it doesn't actively push results, do I understand that correctly?
Lastly, it seems that this is intended to be used once a build or builds have finished, but wouldn't be responsible for building or updating builds, right?
It looks like a nice improvement and I just want to be sure I understand the scope and capabilities correctly.
