From: Deepika Upadhyay Date: Wed, 4 Nov 2020 14:21:27 +0000 (+0530) Subject: doc/dev/developer_guide: rearrange and improve docs X-Git-Tag: v17.1.0~3140^2~3 X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=c77d0817d55e3baace11eab353c4429d21d0bf50;p=ceph.git doc/dev/developer_guide: rearrange and improve docs * move running-tests-using-teuth.rst to doc/dev/developer_guide/tests-integration-testing-teuthology-workflow.rst * introduce developer's guide for Sentry and improve teuthology docs * add teuthology debugging guide * create testing_integration_tests subfolder for teuthology Signed-off-by: Deepika Upadhyay --- diff --git a/doc/dev/developer_guide/basic-workflow.rst b/doc/dev/developer_guide/basic-workflow.rst index 1a387a94e185..c2a3201d6b94 100644 --- a/doc/dev/developer_guide/basic-workflow.rst +++ b/doc/dev/developer_guide/basic-workflow.rst @@ -282,15 +282,15 @@ sub-directory`_ and are run via the `teuthology framework`_. .. _`teuthology framework`: https://github.com/ceph/teuthology The Ceph community has access to the `Sepia lab -`_ where :ref:`testing-integration-tests` can be -run on physical hardware. Other developers may add tags like "needs-qa" to your +`_ where `Integration Testing` _ can be +run on real hardware. Other developers may add tags like "needs-qa" to your PR. This allows PRs that need testing to be merged into a single branch and tested all at the same time. Since teuthology suites can take hours (even days in some cases) to run, this can save a lot of time. To request access to the Sepia lab, start `here `_. -Integration testing is discussed in more detail in the :ref:`testing-integration-tests` +Integration testing is discussed in more detail in the `Integration Testing` _ chapter. Code review @@ -387,3 +387,4 @@ the **ptl-tool** have the following form:: client: add timer_lock support Reviewed-by: Patrick Donnelly +.. 
_Integration Testing: ./testing-integration-tests/tests-integration-testing-teuthology-intro.rst diff --git a/doc/dev/developer_guide/index.rst b/doc/dev/developer_guide/index.rst index c33f0a575a16..c8a8600227c8 100644 --- a/doc/dev/developer_guide/index.rst +++ b/doc/dev/developer_guide/index.rst @@ -17,8 +17,7 @@ Contributing to Ceph: A Guide for Developers Issue tracker Basic workflow Tests: Unit Tests - Tests: Integration Tests - Running Tests Locally - Running Integration Tests using Teuthology - Running Tests in the Cloud + Tests: Integration Tests + Tests: Running Tests (Locally) + Tests: Running Tests in the Cloud Ceph Dashboard Developer Documentation (formerly HACKING.rst) diff --git a/doc/dev/developer_guide/running-tests-in-cloud.rst b/doc/dev/developer_guide/running-tests-in-cloud.rst index 60118aefdb86..17885c4d6645 100644 --- a/doc/dev/developer_guide/running-tests-in-cloud.rst +++ b/doc/dev/developer_guide/running-tests-in-cloud.rst @@ -2,7 +2,7 @@ Running Tests in the Cloud ========================== In this chapter, we will explain in detail how use an OpenStack -tenant as an environment for Ceph `integration testing`_. +tenant as an environment for Ceph `Integration Testing`_. Assumptions and caveat ---------------------- @@ -124,8 +124,7 @@ uploaded to http://teuthology-logs.public.ceph.com. Run a standalone test --------------------- -The standalone test explained in `Reading a standalone test`_ can be run -with the following command +The standalone test can be run with the following command .. prompt:: bash $ @@ -282,8 +281,7 @@ server list`` on the teuthology machine, but the target VM hostnames (e.g. ``target149202171058.teuthology``) are resolvable within the teuthology cluster. -.. _Integration testing: ../tests-integration-tests +.. _Integration Testing: ../testing-integration-tests/tests-integration-testing-teuthology-intro.rst .. _IRC: ../essentials/#irc .. _Mailing List: ../essentials/#mailing-list -.. 
_Reading A Standalone Test: ../testing-integration-tests/#reading-a-standalone-test .. _teuthology framework: https://github.com/ceph/teuthology diff --git a/doc/dev/developer_guide/running-tests-using-teuth.rst b/doc/dev/developer_guide/running-tests-using-teuth.rst deleted file mode 100644 index 492b7790e9e0..000000000000 --- a/doc/dev/developer_guide/running-tests-using-teuth.rst +++ /dev/null @@ -1,183 +0,0 @@ -Running Integration Tests using Teuthology -========================================== - -Getting binaries ----------------- -To run integration tests using teuthology, you need to have Ceph binaries -built for your branch. Follow these steps to initiate the build process - - -#. Push the branch to `ceph-ci`_ repository. This triggers the process of - building the binaries. - -#. To confirm that the build process has been initiated, spot the branch name - at `Shaman`_. Little after the build process has been initiated, the single - entry with your branch name would multiply, each new entry for a different - combination of distro and flavour. - -#. Wait until the packages are built and uploaded, and the repository offering - them are created. This is marked by colouring the entries for the branch - name green. Preferably, wait until each entry is coloured green. Usually, - it takes around 2-3 hours depending on the availability of the machines. - -.. note:: Branch to be pushed on ceph-ci can be any branch, it shouldn't - necessarily be a PR branch. - -.. note:: In case you are pushing master or any other standard branch, check - `Shaman`_ beforehand since it already might have builds ready for it. - -Triggering Tests ----------------- -After building is complete, proceed to trigger tests - - -#. Log in to the teuthology machine:: - - ssh @teuthology.front.sepia.ceph.com - - This would require Sepia lab access. To know how to request it, see: https://ceph.github.io/sepia/adding_users/ - -#. Next, get teuthology installed. 
Run the first set of commands in - `Running Your First Test`_ for that. After that, activate the virtual - environment in which teuthology is installed. - -#. Run the ``teuthology-suite`` command:: - - teuthology-suite -v -m smithi -c wip-devname-feature-x -s fs -p 110 --filter "cephfs-shell" - - Following are the options used in above command with their meanings - - -v verbose - -m machine name - -c branch name, the branch that was pushed on ceph-ci - -s test-suite name - -p higher the number, lower the priority of the job - --filter filter tests in given suite that needs to run, the arg to - filter should be the test you want to run - -.. note:: The priority number present in the command above is just a - placeholder. It might be highly inappropriate for the jobs you may want to - trigger. See `Testing Priority`_ section to pick a priority number. - -.. note:: Don't skip passing a priority number, the default value is 1000 - which way too high; the job probably might never run. - -#. Wait for the tests to run. ``teuthology-suite`` prints a link to the - `Pulpito`_ page created for the tests triggered. - -Other frequently used/useful options are ``-d`` (or ``--distro``), -``--distroversion``, ``--filter-out``, ``--timeout``, ``flavor``, ``-rerun``, -``-l`` (for limiting number of jobs) , ``-n`` (for how many times job would -run) and ``-e`` (for email notifications). Run ``teuthology-suite --help`` -to read description of these and every other options available. - -Testing QA changes (without re-building binaires) -------------------------------------------------- -While writing a PR you might need to test your PR repeatedly using teuthology. -If you are making non-QA changes, you need to follow the standard process of -triggering builds, waiting for it to finish and then triggering tests and -wait for the result. But if changes you made are purely changes in qa/, -you don't need rebuild the binaries. 
Instead you can test binaries built for -the ceph-ci branch and instruct ``teuthology-suite`` command to use a separate -branch for running tests. The separate branch can be passed to the command -by using ``--suite-repo`` and ``--suite-branch``. Pass the link to the GitHub -fork where your PR branch exists to the first option and pass the PR branch -name to the second option. - -For example, if you want to make changes in ``qa/`` after testing ``branch-x`` -(of which has ceph-ci branch is ``wip-username-branch-x``) by running -following command:: - - teuthology-suite -v -m smithi -c wip-username-branch-x -s fs -p 50 --filter cephfs-shell - -You can make the modifications locally, update the PR branch and then -trigger tests from your PR branch as follows:: - - teuthology-suite -v -m smithi -c wip-username-branch-x -s fs -p 50 --filter cephfs-shell --suite-repo https://github.com/username/ceph --suite-branch branch-x - -You can verify if the tests were run using this branch by looking at values -for the keys ``suite_branch``, ``suite_repo`` and ``suite_sha1`` in the job -config printed at the very beginning of the teuthology job. - -About Suites and Filters ------------------------- -See `Suites Inventory`_ for a list of suites of integration tests present -right now. Alternatively, each directory under ``qa/suites`` in Ceph -repository is an integration test suite, so looking within that directory -to decide an appropriate argument for ``-s`` also works. - -For picking an argument for ``--filter``, look within -``qa/suites///tasks`` to get keywords for filtering -tests. Each YAML file in there can trigger a bunch of tests; using the name of -the file, without the extension part of the file name, as an argument to the -``--filter`` will trigger those tests. For example, the sample command above -uses ``cephfs-shell`` since there's a file named ``cephfs-shell.yaml`` in -``qa/suites/fs/basic_functional/tasks/``. 
In case, the file name doesn't hint -what bunch of tests it would trigger, look at the contents of the file for -``modules`` attribute. For ``cephfs-shell.yaml`` the ``modules`` attribute -is ``tasks.cephfs.test_cephfs_shell`` which means it'll trigger all tests in -``qa/tasks/cephfs/test_cephfs_shell.py``. - -Killing Tests -------------- -Sometimes a teuthology job might not complete running for several minutes or -even hours after tests that were trigged have completed running and other -times wrong set of tests can be triggered is filter wasn't chosen carefully. -To save resource it's better to termniate such a job. Following is the command -to terminate a job:: - - teuthology-kill -r teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi - -Let's call the argument passed to ``-r`` as test ID. It can be found -easily in the link to the Pulpito page for the tests you triggered. For -example, for the above test ID, the link is - http://pulpito.front.sepia.ceph.com/teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi/ - -Re-running Tests ----------------- -Pass ``--rerun`` option, with test ID as an argument to it, to -``teuthology-suite`` command:: - - teuthology-suite -v -m smithi -c wip-rishabh-fs-test_cephfs_shell-fix -p 50 --rerun teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi - -The meaning of rest of the options is already covered in `Triggering Tests` -section. - -Teuthology Archives -------------------- -Once the tests have finished running, the log for the job can be obtained by -clicking on job ID at the Pulpito page for your tests. It's more convenient to -download the log and then view it rather than viewing it in an internet -browser since these logs can easily be upto size of 1 GB. 
What's much more -easier is to log in to the teuthology machine again -(``teuthology.front.sepia.ceph.com``), and access the following path:: - - /ceph/teuthology-archive///teuthology.log - -For example, for above test ID path is:: - - /ceph/teuthology-archive/teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi/4588482/teuthology.log - -This way the log remotely can be viewed remotely without having to wait too -much. - -Naming the ceph-ci branch -------------------------- -There are no hard conventions (except for the case of stable branch; see -next paragraph) for how the branch pushed on ceph-ci is named. But, to make -builds and tests easily identitifiable on Shaman and Pulpito respectively, -prepend it with your name. For example branch ``feature-x`` can be named -``wip-yourname-feature-x`` while pushing on ceph-ci. - -In case you are using one of the stable branches (e.g. nautilis, mimic, -etc.), include the name of that stable branch in your ceph-ci branch name. -For example, ``feature-x`` PR branch should be named as -``wip-feature-x-nautilus``. *This is not just a matter of convention but this, -more essentially, builds your branch in the correct environment.* - -Delete the branch from ceph-ci, once it's not required anymore. If you are -logged in at GitHub, all your branches on ceph-ci can be easily found here - -https://github.com/ceph/ceph-ci/branches. - -.. _ceph-ci: https://github.com/ceph/ceph-ci -.. _Pulpito: http://pulpito.front.sepia.ceph.com/ -.. _Running Your First Test: ../running-tests-locally/#running-your-first-test -.. _Shaman: https://shaman.ceph.com/builds/ceph/ -.. _Suites Inventory: ../tests-integration-tests/#suites-inventory -.. 
_Testing Priority: ../tests-integration-tests/#testing-priority diff --git a/doc/dev/developer_guide/testing_integration_tests/index.rst b/doc/dev/developer_guide/testing_integration_tests/index.rst new file mode 100644 index 000000000000..8cbe3855470b --- /dev/null +++ b/doc/dev/developer_guide/testing_integration_tests/index.rst @@ -0,0 +1,15 @@ +======================= +Teuthology User Guide +======================= + +.. rubric:: Contents + +.. toctree:: + :maxdepth: 1 + :glob: + + Introduction + Workflow + Debugging Tips + Sentry Notes + diff --git a/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-debugging-tips.rst b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-debugging-tips.rst new file mode 100644 index 000000000000..84d7e06a1fe0 --- /dev/null +++ b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-debugging-tips.rst @@ -0,0 +1,66 @@ +.. _tests-integration-testing-teuthology-debugging-tips: + +Analysing and Debugging A Teuthology Job +----------------------------------------- + +For scheduling an integration test please refer to, `Scheduling Test Run`_ +Here, we will be discussing how to analyse failed/dead jobs to root cause the problem and amend it. + +Triaging the cause of failure +------------------------------ + +Once a teuthology run is successfully completed, we can access the results using +pulpito dashboard for example: + +http://pulpito.front.sepia.ceph.com/ideepika-2020-11-03_04:03:28-rados-wip-yuri-testing-2020-10-28-0947-octopus-distro-basic-smithi/ which might look something + +This run has 2 job run failures. 
To triage, open the teuthology log for it using either: + +http://pulpito.front.sepia.ceph.com///teuthology.log + +or by logging in to the teuthology server over SSH:: + + ssh teuthology.front.sepia.ceph.com + +and then opening the log file, whose path has the form: + + /a///teuthology.log + +for example in our case:: + + nano /a/ideepika-2020-11-03_04:03:28-rados-wip-yuri-testing-2020-10-28-0947-octopus-distro-basic-smithi/5585704/teuthology.log + +Generally, a job failure is recorded in the teuthology log as a Traceback, which is +added to the job summary. While analysing a job failure, we generally start by searching +for the ``Traceback`` keyword and then examine the call stack and logs that might have +led to the failure. Most of the time, the traceback will also include the failing +command. + +.. note:: The teuthology logs are deleted every once in a while. If you are + unable to access the example link, please feel free to refer to any other run at + http://pulpito.front.sepia.ceph.com/ + +Reporting the Issue +------------------- + +Once the cause of failure is triaged, and it is something which might not be +related to the developer's code change, this indicates that it might be a +generic failure for the upstream branch (in our case octopus), in which case we +look for related failure keywords on https://tracker.ceph.com/. If a similar +issue has been reported via a tracker.ceph.com ticket, please add any relevant +feedback to it. Otherwise, please create a new tracker ticket for it. If you are +not familiar with the cause of failure, someone else will look at it. + +Debugging An Issue +------------------ + +If you want to work on a tracker issue, assign it to yourself, and try to +reproduce that issue. 
For this purpose you can run a job similar to the failed +job, using interactive-on-error mode in teuthology:: + + ideepika@teuthology:~/teuthology$ ./virtualenv/bin/teuthology -v --lock --block $ --interactive-on-error + +For more details on using the ``teuthology`` command, please read `detailed test config`_ + +.. _Scheduling Test Run: ../tests-integration-testing-teuthology-workflow.rst/#scheduling-test-run +.. _detailed test config: https://github.com/ceph/teuthology/blob/master/docs/detailed_test_config.rst diff --git a/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-intro.rst b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-intro.rst new file mode 100644 index 000000000000..dcbd8c79c526 --- /dev/null +++ b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-intro.rst @@ -0,0 +1,531 @@ +.. _tests-integration-testing-teuthology-intro: + +Testing - Integration Tests - Introduction +========================================== + +Ceph has two types of tests: :ref:`make check ` tests and integration tests. +When a test requires multiple machines or root access, or runs for a +long time (for example, to simulate a realistic Ceph deployment), it +is deemed to be an integration test. Integration tests are organized into +"suites", which are defined in the `ceph/qa sub-directory`_ and run with +the ``teuthology-suite`` command. + +The ``teuthology-suite`` command is part of the `teuthology framework`_. +In the sections that follow we attempt to provide a detailed introduction +to that framework from the perspective of a beginning Ceph developer. + +Teuthology consumes packages +---------------------------- + +It may take some time to understand the significance of this fact, but it +is `very` significant. It means that automated tests can be conducted on +multiple platforms using the same packages (RPM, DEB) that can be +installed on any machine running those platforms. 
+ +Teuthology has a `list of platforms that it supports +`_ (as +of September 2020 the list consisted of "RHEL/CentOS 8" and "Ubuntu 18.04"). It +expects to be provided pre-built Ceph packages for these platforms. +Teuthology deploys these platforms on machines (bare-metal or +cloud-provisioned), installs the packages on them, and deploys Ceph +clusters on them - all as called for by the test. + +The Nightlies +------------- + +A number of integration tests are run on a regular basis in the `Sepia +lab`_ against the official Ceph repositories (on the ``master`` development +branch and the stable branches). Traditionally, these tests are called "the +nightlies" because the Ceph core developers used to live and work in +the same time zone and from their perspective the tests were run overnight. + +The results of the nightlies are published at http://pulpito.ceph.com/. The +developer nick shows in the +test results URL and in the first column of the Pulpito dashboard. The +results are also reported on the `ceph-qa mailing list +`_ for analysis. + +Testing Priority +---------------- + +The ``teuthology-suite`` command includes an almost mandatory option ``-p `` +which specifies the priority of the jobs submitted to the queue. The lower +the value of ``N``, the higher the priority. The option is almost mandatory +because the default is ``1000`` which matches the priority of the nightlies. +Nightlies are often half-finished and cancelled due to the volume of testing +done so your jobs may never finish. Therefore, it is common to select a +priority less than 1000. + +Job priority should be selected based on the following recommendations: + +* **Priority < 10:** Use this if the sky is falling and some group of tests + must be run ASAP. + +* **10 <= Priority < 50:** Use this if your tests are urgent and blocking + other important development. + +* **50 <= Priority < 75:** Use this if you are testing a particular + feature/fix and running fewer than about 25 jobs. 
This range can also be + used for urgent release testing. + +* **75 <= Priority < 100:** Tech Leads will regularly schedule integration + tests with this priority to verify pull requests against master. + +* **100 <= Priority < 150:** This priority is to be used for QE validation of + point releases. + +* **150 <= Priority < 200:** Use this priority for 100 jobs or fewer of a + particular feature/fix that you'd like results on in a day or so. + +* **200 <= Priority < 1000:** Use this priority for large test runs that can + be done over the course of a week. + +In case you don't know how many jobs would be triggered by +``teuthology-suite`` command, use ``--dry-run`` to get a count first and then +issue ``teuthology-suite`` command again, this time without ``--dry-run`` and +with ``-p`` and an appropriate number as an argument to it. + +To skip the priority check, use ``--force-priority``. In order to be sensitive +to the runs of other developers who also need to do testing, please use it in +emergency only. + +Suites Inventory +---------------- + +The ``suites`` directory of the `ceph/qa sub-directory`_ contains +all the integration tests, for all the Ceph components. 
+ +`ceph-deploy `_ + install a Ceph cluster with ``ceph-deploy`` (:ref:`ceph-deploy man page `) + +`dummy `_ + get a machine, do nothing and return success (commonly used to + verify the `Integration Testing` _ infrastructure works as expected) + +`fs `_ + test CephFS mounted using FUSE + +`kcephfs `_ + test CephFS mounted using kernel + +`krbd `_ + test the RBD kernel module + +`multimds `_ + test CephFS with multiple MDSs + +`powercycle `_ + verify the Ceph cluster behaves when machines are powered off + and on again + +`rados `_ + run Ceph clusters including OSDs and MONs, under various conditions of + stress + +`rbd `_ + run RBD tests using actual Ceph clusters, with and without qemu + +`rgw `_ + run RGW tests using actual Ceph clusters + +`smoke `_ + run tests that exercise the Ceph API with an actual Ceph cluster + +`teuthology `_ + verify that teuthology can run integration tests, with and without OpenStack + +`upgrade `_ + for various versions of Ceph, verify that upgrades can happen + without disrupting an ongoing workload + +.. _`ceph-deploy man page`: ../../man/8/ceph-deploy + +teuthology-describe-tests +------------------------- + +``teuthology-describe`` was added to the `teuthology framework`_ to facilitate +documentation and better understanding of integration tests. + +The upshot is that tests can be documented by embedding ``meta:`` +annotations in the yaml files used to define the tests. The results can be +seen in the `ceph-qa-suite wiki +`_. + +Since this is a new feature, many yaml files have yet to be annotated. +Developers are encouraged to improve the documentation, in terms of both +coverage and quality. + +Please also see, `teuthology-desribe usecases`_ + +How integration tests are run +----------------------------- + +Given that - as a new Ceph developer - you will typically not have access +to the `Sepia lab`_, you may rightly ask how you can run the integration +tests in your own environment. 
+ +One option is to set up a teuthology cluster on bare metal. Though this is +a non-trivial task, it `is` possible. Here are `some notes +`_ to get you started +if you decide to go this route. + +If you have access to an OpenStack tenant, you have another option: the +`teuthology framework`_ has an OpenStack backend, which is documented `here +`__. +This OpenStack backend can build packages from a given git commit or +branch, provision VMs, install the packages and run integration tests +on those VMs. This process is controlled using a tool called +``ceph-workbench ceph-qa-suite``. This tool also automates publishing of +test results at http://teuthology-logs.public.ceph.com. + +Running integration tests on your code contributions and publishing the +results allows reviewers to verify that changes to the code base do not +cause regressions, or to analyze test failures when they do occur. + +Every teuthology cluster, whether bare-metal or cloud-provisioned, has a +so-called "teuthology machine" from which tests suites are triggered using the +``teuthology-suite`` command. + +A detailed and up-to-date description of each `teuthology-suite`_ option is +available by running the following command on the teuthology machine + +.. prompt:: bash $ + + teuthology-suite --help + +.. _teuthology-suite: http://docs.ceph.com/teuthology/docs/teuthology.suite.html + +How integration tests are defined +--------------------------------- + +Integration tests are defined by yaml files found in the ``suites`` +subdirectory of the `ceph/qa sub-directory`_ and implemented by python +code found in the ``tasks`` subdirectory. Some tests ("standalone tests") +are defined in a single yaml file, while other tests are defined by a +directory tree containing yaml files that are combined, at runtime, into a +larger yaml file. + +Reading a standalone test +------------------------- + +Let us first examine a standalone test, or "singleton". 
+ +Here is a commented example using the integration test +`rados/singleton/all/admin-socket.yaml +`_ + +.. code-block:: yaml + + roles: + - - mon.a + - osd.0 + - osd.1 + tasks: + - install: + - ceph: + - admin_socket: + osd.0: + version: + git_version: + help: + config show: + config set filestore_dump_file /tmp/foo: + perf dump: + perf schema: + +The ``roles`` array determines the composition of the cluster (how +many MONs, OSDs, etc.) on which this test is designed to run, as well +as how these roles will be distributed over the machines in the +testing cluster. In this case, there is only one element in the +top-level array: therefore, only one machine is allocated to the +test. The nested array declares that this machine shall run a MON with +id ``a`` (that is the ``mon.a`` in the list of roles) and two OSDs +(``osd.0`` and ``osd.1``). + +The body of the test is in the ``tasks`` array: each element is +evaluated in order, causing the corresponding python file found in the +``tasks`` subdirectory of the `teuthology repository`_ or +`ceph/qa sub-directory`_ to be run. "Running" in this case means calling +the ``task()`` function defined in that file. + +In this case, the `install +`_ +task comes first. It installs the Ceph packages on each machine (as +defined by the ``roles`` array). A full description of the ``install`` +task is `found in the python file +`_ +(search for "def task"). + +The ``ceph`` task, which is documented `here +`__ (again, +search for "def task"), starts OSDs and MONs (and possibly MDSs as well) +as required by the ``roles`` array. In this example, it will start one MON +(``mon.a``) and two OSDs (``osd.0`` and ``osd.1``), all on the same +machine. Control moves to the next task when the Ceph cluster reaches +``HEALTH_OK`` state. + +The next task is ``admin_socket`` (`source code +`_). +The parameter of the ``admin_socket`` task (and any other task) is a +structure which is interpreted as documented in the task. 
In this example +the parameter is a set of commands to be sent to the admin socket of +``osd.0``. The task verifies that each of them returns on success (i.e. +exit code zero). + +This test can be run with + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados/singleton/all/admin-socket.yaml fs/ext4.yaml + +Test descriptions +----------------- + +Each test has a "test description", which is similar to a directory path, +but not the same. In the case of a standalone test, like the one in +`Reading a standalone test`_, the test description is identical to the +relative path (starting from the ``suites/`` directory of the +`ceph/qa sub-directory`_) of the yaml file defining the test. + +Much more commonly, tests are defined not by a single yaml file, but by a +`directory tree of yaml files`. At runtime, the tree is walked and all yaml +files (facets) are combined into larger yaml "programs" that define the +tests. A full listing of the yaml defining the test is included at the +beginning of every test log. + +In these cases, the description of each test consists of the +subdirectory under `suites/ +`_ containing the +yaml facets, followed by an expression in curly braces (``{}``) consisting of +a list of yaml facets in order of concatenation. For instance the +test description:: + + ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml} + +signifies the concatenation of two files: + +* ceph-deploy/basic/distros/centos_7.0.yaml +* ceph-deploy/basic/tasks/ceph-deploy.yaml + +How tests are built from directories +------------------------------------ + +As noted in the previous section, most tests are not defined in a single +yaml file, but rather as a `combination` of files collected from a +directory tree within the ``suites/`` subdirectory of the `ceph/qa sub-directory`_. + +The set of all tests defined by a given subdirectory of ``suites/`` is +called an "integration test suite", or a "teuthology suite". 
+ +Combination of yaml facets is controlled by special files (``%`` and +``+``) that are placed within the directory tree and can be thought of as +operators. The ``%`` file is the "convolution" operator and ``+`` +signifies concatenation. + +Convolution operator +^^^^^^^^^^^^^^^^^^^^ + +The convolution operator, implemented as an empty file called ``%``, tells +teuthology to construct a test matrix from yaml facets found in +subdirectories below the directory containing the operator. + +For example, the `ceph-deploy suite +`_ is +defined by the ``suites/ceph-deploy/`` tree, which consists of the files and +subdirectories in the following structure + +.. code-block:: none + + qa/suites/ceph-deploy + ├── % + ├── distros + │   ├── centos_latest.yaml + │   └── ubuntu_latest.yaml + └── tasks + ├── ceph-admin-commands.yaml + └── rbd_import_export.yaml + +This is interpreted as a 2x1 matrix consisting of two tests: + +1. ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml} +2. ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml} + +i.e. the concatenation of centos_7.0.yaml and ceph-deploy.yaml and +the concatenation of ubuntu_16.04.yaml and ceph-deploy.yaml, respectively. +In human terms, this means that the task found in ``ceph-deploy.yaml`` is +intended to run on both CentOS 7.0 and Ubuntu 16.04. + +Without the file percent, the ``ceph-deploy`` tree would be interpreted as +three standalone tests: + +* ceph-deploy/basic/distros/centos_7.0.yaml +* ceph-deploy/basic/distros/ubuntu_16.04.yaml +* ceph-deploy/basic/tasks/ceph-deploy.yaml + +(which would of course be wrong in this case). + +Referring to the `ceph/qa sub-directory`_, you will notice that the +``centos_7.0.yaml`` and ``ubuntu_16.04.yaml`` files in the +``suites/ceph-deploy/basic/distros/`` directory are implemented as symlinks. +By using symlinks instead of copying, a single file can appear in multiple +suites. This eases the maintenance of the test framework as a whole. 
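The effect of the convolution operator described above can be illustrated with a small, self-contained Python sketch. It is not teuthology's actual matrix-building code: the ``convolve`` function and its dictionary input are hypothetical names introduced only for illustration, and the sketch assumes that exactly one yaml facet is picked from each subdirectory, with subdirectories taken in sorted order.

```python
from itertools import product


def convolve(suite_path, facets):
    """Return one test description per combination of yaml facets.

    `facets` maps a subdirectory name to the yaml files inside it.
    This only mimics the effect of the `%` file described in the text;
    it is a simplified assumption, not teuthology's real implementation.
    """
    # Sort the subdirectories so the order of concatenation is stable.
    subdirs = sorted(facets)
    # Build one list of "subdir/file.yaml" fragments per subdirectory.
    choices = [
        [subdir + "/" + name for name in sorted(facets[subdir])]
        for subdir in subdirs
    ]
    # The Cartesian product picks exactly one file from each subdirectory.
    return [
        suite_path + "/{" + " ".join(combo) + "}"
        for combo in product(*choices)
    ]


if __name__ == "__main__":
    descriptions = convolve(
        "ceph-deploy/basic",
        {
            "distros": ["centos_7.0.yaml", "ubuntu_16.04.yaml"],
            "tasks": ["ceph-deploy.yaml"],
        },
    )
    for description in descriptions:
        print(description)
```

With two files under ``distros`` and one under ``tasks``, the sketch yields the same two descriptions as the 2x1 matrix listed in the text. Adding a second file under ``tasks`` would double the count, which is why large suites grow so quickly.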
+ +All the tests generated from the ``suites/ceph-deploy/`` directory tree +(also known as the "ceph-deploy suite") can be run with + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite ceph-deploy + +An individual test from the `ceph-deploy suite`_ can be run by adding the +``--filter`` option + +.. prompt:: bash $ + + teuthology-suite \ + --machine-type smithi \ + --suite ceph-deploy/basic \ + --filter 'ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml}' + +.. note:: To run a standalone test like the one in `Reading a standalone + test`_, ``--suite`` alone is sufficient. If you want to run a single + test from a suite that is defined as a directory tree, ``--suite`` must + be combined with ``--filter``. This is because the ``--suite`` option + understands POSIX relative paths only. + +Concatenation operator +^^^^^^^^^^^^^^^^^^^^^^ + +For even greater flexibility in sharing yaml files between suites, the +special file plus (``+``) can be used to concatenate files within a +directory. For instance, consider the `suites/rbd/thrash +`_ +tree + +.. code-block:: none + + qa/suites/rbd/thrash + ├── % + ├── clusters + │   ├── + + │   ├── fixed-2.yaml + │   └── openstack.yaml + └── workloads + ├── rbd_api_tests_copy_on_read.yaml + ├── rbd_api_tests.yaml + └── rbd_fsx_rate_limit.yaml + +This creates two tests: + +* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml} +* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests.yaml} + +Because the ``clusters/`` subdirectory contains the special file plus +(``+``), all the other files in that subdirectory (``fixed-2.yaml`` and +``openstack.yaml`` in this case) are concatenated together +and treated as a single file. 
Without the special file plus, they would +have been convolved with the files from the workloads directory to create +a 2x2 matrix: + +* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml} +* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests.yaml} +* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests_copy_on_read.yaml} +* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests.yaml} + +The ``clusters/fixed-2.yaml`` file is shared among many suites to +define the following ``roles`` + +.. code-block:: yaml + + roles: + - [mon.a, mon.c, osd.0, osd.1, osd.2, client.0] + - [mon.b, osd.3, osd.4, osd.5, client.1] + +The ``rbd/thrash`` suite as defined above, consisting of two tests, +can be run with + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rbd/thrash + +A single test from the rbd/thrash suite can be run by adding the +``--filter`` option + +.. prompt:: bash $ + + teuthology-suite \ + --machine-type smithi \ + --suite rbd/thrash \ + --filter 'rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}' + +Filtering tests by their description +------------------------------------ + +When a few jobs fail and need to be run again, the ``--filter`` option +can be used to select tests with a matching description. For instance, if the +``rados`` suite fails the `all/peer.yaml `_ test, the following will only +run the tests that contain this file + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml + +The ``--filter-out`` option does the opposite (it matches tests that do `not` +contain a given string), and can be combined with the ``--filter`` option. + +Both ``--filter`` and ``--filter-out`` take a comma-separated list of strings +(which means the comma character is implicitly forbidden in filenames found in +the `ceph/qa sub-directory`_). For instance + +.. 
prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml,all/rest-api.yaml + +will run tests that contain either +`all/peer.yaml `_ +or +`all/rest-api.yaml `_ + +Each string is looked up anywhere in the test description and has to +be an exact match: they are not regular expressions. + +Reducing the number of tests +---------------------------- + +The ``rados`` suite generates tens or even hundreds of thousands of tests out +of a few hundred files. This happens because teuthology constructs test +matrices from subdirectories wherever it encounters a file named ``%``. For +instance, all tests in the `rados/basic suite +`_ run with +different messenger types: ``simple``, ``async`` and ``random``, because they +are combined (via the special file ``%``) with the `msgr directory +`_ + +All integration tests are required to be run before a Ceph release is +published. When merely verifying whether a contribution can be merged without +risking a trivial regression, it is enough to run a subset. The ``--subset`` +option can be used to reduce the number of tests that are triggered. For +instance + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados --subset 0/4000 + +will run as few tests as possible. The tradeoff in this case is that +not all combinations of test variations will be run together, +but no matter how small a ratio is provided in the ``--subset``, +teuthology will still ensure that all files in the suite are in at +least one test. Understanding the actual logic that drives this +requires reading the teuthology source code. + +The ``--limit`` option only runs the first ``N`` tests in the suite: +this is rarely useful, however, because there is no way to control which +test will be first. + +.. _ceph/qa sub-directory: https://github.com/ceph/ceph/tree/master/qa +.. _Sepia Lab: https://wiki.sepia.ceph.com/doku.php +.. 
_Integration Testing: ../testing_integration_tests/tests-integration-testing-teuthology-intro.rst +.. _teuthology repository: https://github.com/ceph/teuthology +.. _teuthology framework: https://github.com/ceph/teuthology +.. _teuthology-desribe usecases: https://gist.github.com/jdurgin/09711d5923b583f60afc + diff --git a/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-workflow.rst b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-workflow.rst new file mode 100644 index 000000000000..9321210c3958 --- /dev/null +++ b/doc/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-workflow.rst @@ -0,0 +1,247 @@ +.. _tests-integration-testing-teuthology-workflow: + +Integration Tests using Teuthology Workflow +=========================================== + +Scheduling Test Run +------------------- + +Getting binaries +**************** + +To run integration tests using teuthology, you need to have Ceph binaries +built for your branch. Follow these steps to initiate the build process - + +#. Push the branch to the `ceph-ci`_ repository. This triggers the process of + building the binaries. + +#. To confirm that the build process has been initiated, spot the branch name + at `Shaman`_. Shortly after the build process has been initiated, the single + entry with your branch name will multiply, each new entry for a different + combination of distro and flavour. + +#. Wait until the packages are built and uploaded, and the repositories offering + them are created. This is marked by colouring the entries for the branch + name green. Preferably, wait until each entry is coloured green. Usually, + it takes around 2-3 hours depending on the availability of the machines. + +.. note:: The branch pushed to ceph-ci can be any branch; it need not + be a PR branch. + +.. 
note:: In case you are pushing master or any other standard branch, check + `Shaman`_ beforehand since it might already have builds ready for it. + +Triggering Tests +**************** + +After building is complete, proceed to trigger tests - + +#. Log in to the teuthology machine:: + + ssh @teuthology.front.sepia.ceph.com + + This requires Sepia lab access. To learn how to request it, see: https://ceph.github.io/sepia/adding_users/ + +#. Next, get teuthology installed. Run the first set of commands in + `Running Your First Test`_ for that. After that, activate the virtual + environment in which teuthology is installed. + +#. Run the ``teuthology-suite`` command:: + + teuthology-suite -v \ + -m smithi \ + -c wip-devname-feature-x \ + -s fs \ + -p 110 \ + --filter "cephfs-shell" \ + -e foo@gmail.com \ + -R fail + + Following are the options used in the above command with their meanings - + -v verbose + -m machine type + -c branch name, the branch that was pushed on ceph-ci + -s test-suite name + -p higher the number, lower the priority of the job + --filter filter tests in given suite that need to run, the arg to + filter should be the test you want to run + -e When tests finish or time out, send an email + here. May also be specified in ~/.teuthology.yaml + as 'results_email' + -R A comma-separated list of statuses to be used + with --rerun. Supported statuses are: 'dead', + 'fail', 'pass', 'queued', 'running', 'waiting' + [default: fail,dead] + +#. Wait for the tests to run. ``teuthology-suite`` prints a link to the + `Pulpito`_ page created for the tests triggered. + +.. note:: The priority number present in the command above is just a + placeholder. It might be highly inappropriate for the jobs you may want to + trigger. See the `Testing Priority`_ section to pick a priority number. + +.. note:: Don't skip passing a priority number; the default value is 1000, + which is way too high, and the job might never run. +
+Other frequently used/useful options are ``-d`` (or ``--distro``), +``--distroversion``, ``--filter-out``, ``--timeout``, ``--flavor``, ``--rerun``, +``-l`` (for limiting number of jobs), ``-n`` (for how many times job would +run). Run ``teuthology-suite --help`` to read the description of these and every +other available option. + +Testing QA changes (without re-building binaries) +************************************************* + +While writing a PR you might need to test your PR repeatedly using teuthology. +If you are making non-QA changes, you need to follow the standard process of +triggering builds, waiting for them to finish, then triggering tests and +waiting for the results. But if the changes you made are purely changes in qa/, +you don't need to rebuild the binaries. Instead you can test binaries built for +the ceph-ci branch and instruct the ``teuthology-suite`` command to use a separate +branch for running tests. +The separate branch can be passed to the command by using ``--suite-repo`` and +``--suite-branch``. Pass the link to the GitHub fork where your PR branch exists +to the first option and pass the PR branch name to the second option. 
+ +For example, suppose you want to make changes in ``qa/`` after testing ``branch-x`` +(whose ceph-ci branch is ``wip-username-branch-x``) by running the +following command:: + + teuthology-suite -v \ + -m smithi \ + -c wip-username-branch-x \ + -s fs \ + -p 50 \ + --filter cephfs-shell + + +You can make the modifications locally, update the PR branch and then +trigger tests from your PR branch as follows:: + + teuthology-suite -v \ + -m smithi \ + -c wip-username-branch-x \ + -s fs -p 50 \ + --filter cephfs-shell \ + --suite-repo https://github.com/$username/ceph \ + --suite-branch branch-x + +You can verify if the tests were run using this branch by looking at values +for the keys ``suite_branch``, ``suite_repo`` and ``suite_sha1`` in the job +config printed at the very beginning of the teuthology job. + +About Suites and Filters +************************ + +See `Suites Inventory`_ for a list of suites of integration tests present +right now. Alternatively, each directory under ``qa/suites`` in the Ceph +repository is an integration test suite, so looking within that directory +to decide an appropriate argument for ``-s`` also works. + +For picking an argument for ``--filter``, look within +``qa/suites///tasks`` to get keywords for filtering +tests. Each YAML file in there can trigger a bunch of tests; using the name of +the file, without the extension part of the file name, as an argument to the +``--filter`` will trigger those tests. +For example, the sample command above uses ``cephfs-shell`` since there's a file +named ``cephfs-shell.yaml`` in ``qa/suites/fs/basic_functional/tasks/``. In +case the file name doesn't hint what bunch of tests it would trigger, look at +the contents of the file for the ``modules`` attribute. For ``cephfs-shell.yaml`` +the ``modules`` attribute is ``tasks.cephfs.test_cephfs_shell`` which means +it'll trigger all tests in ``qa/tasks/cephfs/test_cephfs_shell.py``. 
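The naming rules above can be sketched in Python. This is only an illustration under stated assumptions: the helper names ``filter_keyword`` and ``module_to_path`` are hypothetical and are not part of teuthology, and the paths are the ones quoted above.

```python
from pathlib import PurePosixPath


def filter_keyword(yaml_path):
    """Return the file name without its extension, for use with --filter.

    (Hypothetical helper for illustration; not part of teuthology.)
    """
    return PurePosixPath(yaml_path).stem


def module_to_path(module):
    """Map a dotted ``modules`` value to the test file under qa/.

    (Hypothetical helper for illustration; not part of teuthology.)
    """
    return "qa/" + module.replace(".", "/") + ".py"


keyword = filter_keyword("qa/suites/fs/basic_functional/tasks/cephfs-shell.yaml")
test_file = module_to_path("tasks.cephfs.test_cephfs_shell")
print(keyword)
print(test_file)
```

The first value is what would be passed to ``--filter``; the second is the file to read when the yaml file name alone does not reveal which tests it triggers.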
+ +Viewing Tests Results +--------------------- + +Pulpito Dashboard +***************** + +Once the teuthology job is scheduled, the status and results of the test run can +be checked at https://pulpito.ceph.com/. +It can be used to quickly check job logs, their status, etc. + +Teuthology Archives +******************* + +Once the tests have finished running, the log for the job can be obtained by +clicking on the job ID on the Pulpito page for your tests. It's more convenient to +download the log and then view it rather than viewing it in an internet browser +since these logs can easily be up to 1 GB in size. It is easier to +ssh into the teuthology machine again (``teuthology.front.sepia.ceph.com``), and +access the following path:: + + /ceph/teuthology-archive///teuthology.log + +For example, for the above test ID the path is:: + + /ceph/teuthology-archive/teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi/4588482/teuthology.log + +This way the log can be viewed remotely without having to wait too +much. + +.. note:: To access archives more conveniently, ``/a/`` has been symbolically + linked to ``/ceph/teuthology-archive/``. For instance, to access the previous + example, we can use something like:: + + /a/teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi/4588482/teuthology.log + +Killing Tests +------------- +Sometimes a teuthology job might not complete running for several minutes or +even hours after the tests that were triggered have completed running, and other +times the wrong set of tests can be triggered if the filter wasn't chosen carefully. +To save resources it's better to terminate such a job. Following is the command +to terminate a job:: + + teuthology-kill -r teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi + +Let's call the argument passed to ``-r`` the test ID. It can be found +easily in the link to the Pulpito page for the tests you triggered. 
For +example, for the above test ID, the link is - http://pulpito.front.sepia.ceph.com/teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi/ + +Re-running Tests +---------------- +You can pass the ``--rerun`` option, with the test ID as its argument, to the +``teuthology-suite`` command. Generally, this is useful in cases where a teuthology test +batch has some failed/dead jobs that we might want to retrigger. We can trigger +jobs based on their status using:: + + teuthology-suite -v \ + -m smithi \ + -c wip-rishabh-fs-test_cephfs_shell-fix \ + -p 50 \ + --rerun teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi \ + -R fail,dead,queued,running \ + -e $CEPH_QA_MAIL + +The meaning of the rest of the options is already covered in the `Triggering Tests`_ +section. + +Naming the ceph-ci branch +------------------------- +There are no hard conventions (except for the case of stable branches; see +next paragraph) for how the branch pushed on ceph-ci is named. But, to make +builds and tests easily identifiable on Shaman and Pulpito respectively, +prepend it with your name. For example, branch ``feature-x`` can be named +``wip-yourname-feature-x`` while pushing on ceph-ci. + +In case you are using one of the stable branches (e.g. nautilus, mimic, +etc.), include the name of that stable branch in your ceph-ci branch name. +For example, the ``feature-x`` PR branch should be named +``wip-feature-x-nautilus``. *This is not just a matter of convention but this, +more essentially, builds your branch in the correct environment.* + +Delete the branch from ceph-ci once it is no longer required. If you are +logged in at GitHub, all your branches on ceph-ci can be easily found here - +https://github.com/ceph/ceph-ci/branches. + +.. _ceph-ci: https://github.com/ceph/ceph-ci +.. _Pulpito: http://pulpito.front.sepia.ceph.com/ +.. _Running Your First Test: ../running-tests-locally/#running-your-first-test +.. _Shaman: https://shaman.ceph.com/builds/ceph/ +.. 
_Suites Inventory: ../tests-integration-testing-teuthology-intro.rst/#suites-inventory +.. _Testing Priority: ../tests-integration-testing-teuthology-intro.rst/#testing-priority +.. _Triggering Tests: ../tests-integration-testing-teuthology-workflow.rst/#triggering-tests diff --git a/doc/dev/developer_guide/testing_integration_tests/tests-sentry-developers-guide.rst b/doc/dev/developer_guide/testing_integration_tests/tests-sentry-developers-guide.rst new file mode 100644 index 000000000000..94dfae39aa6b --- /dev/null +++ b/doc/dev/developer_guide/testing_integration_tests/tests-sentry-developers-guide.rst @@ -0,0 +1,6 @@ +.. _tests-sentry-developers-guide: + +Sentry Notes +============ + +To be updated. Feel free to contribute. diff --git a/doc/dev/developer_guide/tests-integration-tests.rst b/doc/dev/developer_guide/tests-integration-tests.rst deleted file mode 100644 index 065903204022..000000000000 --- a/doc/dev/developer_guide/tests-integration-tests.rst +++ /dev/null @@ -1,528 +0,0 @@ -.. _testing-integration-tests: - -Testing - Integration Tests -=========================== - -Ceph has two types of tests: :ref:`make check ` tests and integration tests. -When a test requires multiple machines, root access or lasts for a -longer time (for example, to simulate a realistic Ceph deployment), it -is deemed to be an integration test. Integration tests are organized into -"suites", which are defined in the `ceph/qa sub-directory`_ and run with -the ``teuthology-suite`` command. - -The ``teuthology-suite`` command is part of the `teuthology framework`_. -In the sections that follow we attempt to provide a detailed introduction -to that framework from the perspective of a beginning Ceph developer. - -Teuthology consumes packages ----------------------------- - -It may take some time to understand the significance of this fact, but it -is `very` significant. 
It means that automated tests can be conducted on -multiple platforms using the same packages (RPM, DEB) that can be -installed on any machine running those platforms. - -Teuthology has a `list of platforms that it supports -`_ (as -of September 2020 the list consisted of "RHEL/CentOS 8" and "Ubuntu 18.04"). It -expects to be provided pre-built Ceph packages for these platforms. -Teuthology deploys these platforms on machines (bare-metal or -cloud-provisioned), installs the packages on them, and deploys Ceph -clusters on them - all as called for by the test. - -The Nightlies -------------- - -A number of integration tests are run on a regular basis in the `Sepia -lab`_ against the official Ceph repositories (on the ``master`` development -branch and the stable branches). Traditionally, these tests are called "the -nightlies" because the Ceph core developers used to live and work in -the same time zone and from their perspective the tests were run overnight. - -The results of the nightlies are published at http://pulpito.ceph.com/. The -developer nick shows in the -test results URL and in the first column of the Pulpito dashboard. The -results are also reported on the `ceph-qa mailing list -`_ for analysis. - -Testing Priority ----------------- - -The ``teuthology-suite`` command includes an almost mandatory option ``-p `` -which specifies the priority of the jobs submitted to the queue. The lower -the value of ``N``, the higher the priority. The option is almost mandatory -because the default is ``1000`` which matches the priority of the nightlies. -Nightlies are often half-finished and cancelled due to the volume of testing -done so your jobs may never finish. Therefore, it is common to select a -priority less than 1000. - -Job priority should be selected based on the following recommendations: - -* **Priority < 10:** Use this if the sky is falling and some group of tests - must be run ASAP. 
- -* **10 <= Priority < 50:** Use this if your tests are urgent and blocking - other important development. - -* **50 <= Priority < 75:** Use this if you are testing a particular - feature/fix and running fewer than about 25 jobs. This range can also be - used for urgent release testing. - -* **75 <= Priority < 100:** Tech Leads will regularly schedule integration - tests with this priority to verify pull requests against master. - -* **100 <= Priority < 150:** This priority is to be used for QE validation of - point releases. - -* **150 <= Priority < 200:** Use this priority for 100 jobs or fewer of a - particular feature/fix that you'd like results on in a day or so. - -* **200 <= Priority < 1000:** Use this priority for large test runs that can - be done over the course of a week. - -In case you don't know how many jobs would be triggered by -``teuthology-suite`` command, use ``--dry-run`` to get a count first and then -issue ``teuthology-suite`` command again, this time without ``--dry-run`` and -with ``-p`` and an appropriate number as an argument to it. - -To skip the priority check, use ``--force-priority``. In order to be sensitive -to the runs of other developers who also need to do testing, please use it in -emergency only. - -Suites Inventory ----------------- - -The ``suites`` directory of the `ceph/qa sub-directory`_ contains -all the integration tests, for all the Ceph components. 
- -`ceph-deploy `_ - install a Ceph cluster with ``ceph-deploy`` (:ref:`ceph-deploy man page `) - -`dummy `_ - get a machine, do nothing and return success (commonly used to - verify the :ref:`testing-integration-tests` infrastructure works as expected) - -`fs `_ - test CephFS mounted using FUSE - -`kcephfs `_ - test CephFS mounted using kernel - -`krbd `_ - test the RBD kernel module - -`multimds `_ - test CephFS with multiple MDSs - -`powercycle `_ - verify the Ceph cluster behaves when machines are powered off - and on again - -`rados `_ - run Ceph clusters including OSDs and MONs, under various conditions of - stress - -`rbd `_ - run RBD tests using actual Ceph clusters, with and without qemu - -`rgw `_ - run RGW tests using actual Ceph clusters - -`smoke `_ - run tests that exercise the Ceph API with an actual Ceph cluster - -`teuthology `_ - verify that teuthology can run integration tests, with and without OpenStack - -`upgrade `_ - for various versions of Ceph, verify that upgrades can happen - without disrupting an ongoing workload - -.. _`ceph-deploy man page`: ../../man/8/ceph-deploy - -teuthology-describe-tests -------------------------- - -In February 2016, a new feature called ``teuthology-describe-tests`` was -added to the `teuthology framework`_ to facilitate documentation and better -understanding of integration tests (`feature announcement -`_). - -The upshot is that tests can be documented by embedding ``meta:`` -annotations in the yaml files used to define the tests. The results can be -seen in the `ceph-qa-suite wiki -`_. - -Since this is a new feature, many yaml files have yet to be annotated. -Developers are encouraged to improve the documentation, in terms of both -coverage and quality. - -How integration tests are run ------------------------------ - -Given that - as a new Ceph developer - you will typically not have access -to the `Sepia lab`_, you may rightly ask how you can run the integration -tests in your own environment. 
- -One option is to set up a teuthology cluster on bare metal. Though this is -a non-trivial task, it `is` possible. Here are `some notes -`_ to get you started -if you decide to go this route. - -If you have access to an OpenStack tenant, you have another option: the -`teuthology framework`_ has an OpenStack backend, which is documented `here -`__. -This OpenStack backend can build packages from a given git commit or -branch, provision VMs, install the packages and run integration tests -on those VMs. This process is controlled using a tool called -``ceph-workbench ceph-qa-suite``. This tool also automates publishing of -test results at http://teuthology-logs.public.ceph.com. - -Running integration tests on your code contributions and publishing the -results allows reviewers to verify that changes to the code base do not -cause regressions, or to analyze test failures when they do occur. - -Every teuthology cluster, whether bare-metal or cloud-provisioned, has a -so-called "teuthology machine" from which tests suites are triggered using the -``teuthology-suite`` command. - -A detailed and up-to-date description of each `teuthology-suite`_ option is -available by running the following command on the teuthology machine - -.. prompt:: bash $ - - teuthology-suite --help - -.. _teuthology-suite: http://docs.ceph.com/teuthology/docs/teuthology.suite.html - -How integration tests are defined ---------------------------------- - -Integration tests are defined by yaml files found in the ``suites`` -subdirectory of the `ceph/qa sub-directory`_ and implemented by python -code found in the ``tasks`` subdirectory. Some tests ("standalone tests") -are defined in a single yaml file, while other tests are defined by a -directory tree containing yaml files that are combined, at runtime, into a -larger yaml file. - -Reading a standalone test -------------------------- - -Let us first examine a standalone test, or "singleton". 
- -Here is a commented example using the integration test -`rados/singleton/all/admin-socket.yaml -`_ - -.. code-block:: yaml - - roles: - - - mon.a - - osd.0 - - osd.1 - tasks: - - install: - - ceph: - - admin_socket: - osd.0: - version: - git_version: - help: - config show: - config set filestore_dump_file /tmp/foo: - perf dump: - perf schema: - -The ``roles`` array determines the composition of the cluster (how -many MONs, OSDs, etc.) on which this test is designed to run, as well -as how these roles will be distributed over the machines in the -testing cluster. In this case, there is only one element in the -top-level array: therefore, only one machine is allocated to the -test. The nested array declares that this machine shall run a MON with -id ``a`` (that is the ``mon.a`` in the list of roles) and two OSDs -(``osd.0`` and ``osd.1``). - -The body of the test is in the ``tasks`` array: each element is -evaluated in order, causing the corresponding python file found in the -``tasks`` subdirectory of the `teuthology repository`_ or -`ceph/qa sub-directory`_ to be run. "Running" in this case means calling -the ``task()`` function defined in that file. - -In this case, the `install -`_ -task comes first. It installs the Ceph packages on each machine (as -defined by the ``roles`` array). A full description of the ``install`` -task is `found in the python file -`_ -(search for "def task"). - -The ``ceph`` task, which is documented `here -`__ (again, -search for "def task"), starts OSDs and MONs (and possibly MDSs as well) -as required by the ``roles`` array. In this example, it will start one MON -(``mon.a``) and two OSDs (``osd.0`` and ``osd.1``), all on the same -machine. Control moves to the next task when the Ceph cluster reaches -``HEALTH_OK`` state. - -The next task is ``admin_socket`` (`source code -`_). -The parameter of the ``admin_socket`` task (and any other task) is a -structure which is interpreted as documented in the task. 
In this example -the parameter is a set of commands to be sent to the admin socket of -``osd.0``. The task verifies that each of them returns on success (i.e. -exit code zero). - -This test can be run with - -.. prompt:: bash $ - - teuthology-suite --machine-type smithi --suite rados/singleton/all/admin-socket.yaml fs/ext4.yaml - -Test descriptions ------------------ - -Each test has a "test description", which is similar to a directory path, -but not the same. In the case of a standalone test, like the one in -`Reading a standalone test`_, the test description is identical to the -relative path (starting from the ``suites/`` directory of the -`ceph/qa sub-directory`_) of the yaml file defining the test. - -Much more commonly, tests are defined not by a single yaml file, but by a -`directory tree of yaml files`. At runtime, the tree is walked and all yaml -files (facets) are combined into larger yaml "programs" that define the -tests. A full listing of the yaml defining the test is included at the -beginning of every test log. - -In these cases, the description of each test consists of the -subdirectory under `suites/ -`_ containing the -yaml facets, followed by an expression in curly braces (``{}``) consisting of -a list of yaml facets in order of concatenation. For instance the -test description:: - - ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml} - -signifies the concatenation of two files: - -* ceph-deploy/basic/distros/centos_7.0.yaml -* ceph-deploy/basic/tasks/ceph-deploy.yaml - -How tests are built from directories ------------------------------------- - -As noted in the previous section, most tests are not defined in a single -yaml file, but rather as a `combination` of files collected from a -directory tree within the ``suites/`` subdirectory of the `ceph/qa sub-directory`_. - -The set of all tests defined by a given subdirectory of ``suites/`` is -called an "integration test suite", or a "teuthology suite". 
- -Combination of yaml facets is controlled by special files (``%`` and -``+``) that are placed within the directory tree and can be thought of as -operators. The ``%`` file is the "convolution" operator and ``+`` -signifies concatenation. - -Convolution operator -^^^^^^^^^^^^^^^^^^^^ - -The convolution operator, implemented as an empty file called ``%``, tells -teuthology to construct a test matrix from yaml facets found in -subdirectories below the directory containing the operator. - -For example, the `ceph-deploy suite -`_ is -defined by the ``suites/ceph-deploy/`` tree, which consists of the files and -subdirectories in the following structure - -.. code-block:: none - - qa/suites/ceph-deploy - ├── % - ├── distros - │   ├── centos_latest.yaml - │   └── ubuntu_latest.yaml - └── tasks - ├── ceph-admin-commands.yaml - └── rbd_import_export.yaml - -This is interpreted as a 2x1 matrix consisting of two tests: - -1. ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml} -2. ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml} - -i.e. the concatenation of centos_7.0.yaml and ceph-deploy.yaml and -the concatenation of ubuntu_16.04.yaml and ceph-deploy.yaml, respectively. -In human terms, this means that the task found in ``ceph-deploy.yaml`` is -intended to run on both CentOS 7.0 and Ubuntu 16.04. - -Without the file percent, the ``ceph-deploy`` tree would be interpreted as -three standalone tests: - -* ceph-deploy/basic/distros/centos_7.0.yaml -* ceph-deploy/basic/distros/ubuntu_16.04.yaml -* ceph-deploy/basic/tasks/ceph-deploy.yaml - -(which would of course be wrong in this case). - -Referring to the `ceph/qa sub-directory`_, you will notice that the -``centos_7.0.yaml`` and ``ubuntu_16.04.yaml`` files in the -``suites/ceph-deploy/basic/distros/`` directory are implemented as symlinks. -By using symlinks instead of copying, a single file can appear in multiple -suites. This eases the maintenance of the test framework as a whole. 
-
-All the tests generated from the ``suites/ceph-deploy/`` directory tree
-(also known as the "ceph-deploy suite") can be run with
-
-.. prompt:: bash $
-
-   teuthology-suite --machine-type smithi --suite ceph-deploy
-
-An individual test from the `ceph-deploy suite`_ can be run by adding the
-``--filter`` option
-
-.. prompt:: bash $
-
-   teuthology-suite \
-      --machine-type smithi \
-      --suite ceph-deploy/basic \
-      --filter 'ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml}'
-
-.. note:: To run a standalone test like the one in `Reading a standalone
-   test`_, ``--suite`` alone is sufficient. If you want to run a single
-   test from a suite that is defined as a directory tree, ``--suite`` must
-   be combined with ``--filter``. This is because the ``--suite`` option
-   understands POSIX relative paths only.
-
-Concatenation operator
-^^^^^^^^^^^^^^^^^^^^^^
-
-For even greater flexibility in sharing yaml files between suites, the
-special file plus (``+``) can be used to concatenate files within a
-directory. For instance, consider the `suites/rbd/thrash
-`_
-tree
-
-.. code-block:: none
-
-   qa/suites/rbd/thrash
-   ├── %
-   ├── clusters
-   │   ├── +
-   │   ├── fixed-2.yaml
-   │   └── openstack.yaml
-   └── workloads
-       ├── rbd_api_tests_copy_on_read.yaml
-       ├── rbd_api_tests.yaml
-       └── rbd_fsx_rate_limit.yaml
-
-This creates two tests:
-
-* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
-* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests.yaml}
-
-Because the ``clusters/`` subdirectory contains the special file plus
-(``+``), all the other files in that subdirectory (``fixed-2.yaml`` and
-``openstack.yaml`` in this case) are concatenated together
-and treated as a single file. Without the special file plus, they would
-have been convolved with the files from the workloads directory to create
-a 2x2 matrix:
-
-* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
-* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests.yaml}
-* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests_copy_on_read.yaml}
-* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests.yaml}
-
-The ``clusters/fixed-2.yaml`` file is shared among many suites to
-define the following ``roles``
-
-.. code-block:: yaml
-
-   roles:
-   - [mon.a, mon.c, osd.0, osd.1, osd.2, client.0]
-   - [mon.b, osd.3, osd.4, osd.5, client.1]
-
-The ``rbd/thrash`` suite as defined above, consisting of two tests,
-can be run with
-
-.. prompt:: bash $
-
-   teuthology-suite --machine-type smithi --suite rbd/thrash
-
-A single test from the rbd/thrash suite can be run by adding the
-``--filter`` option
-
-.. prompt:: bash $
-
-   teuthology-suite \
-      --machine-type smithi \
-      --suite rbd/thrash \
-      --filter 'rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}'
-
-Filtering tests by their description
-------------------------------------
-
-When a few jobs fail and need to be run again, the ``--filter`` option
-can be used to select tests with a matching description. For instance, if the
-``rados`` suite fails the `all/peer.yaml `_ test, the following will only
-run the tests that contain this file
-
-.. prompt:: bash $
-
-   teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml
-
-The ``--filter-out`` option does the opposite (it matches tests that do `not`
-contain a given string), and can be combined with the ``--filter`` option.
-
-Both ``--filter`` and ``--filter-out`` take a comma-separated list of strings
-(which means the comma character is implicitly forbidden in filenames found in
-the `ceph/qa sub-directory`_). For instance
-
-.. prompt:: bash $
-
-   teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml,all/rest-api.yaml
-
-will run tests that contain either
-`all/peer.yaml `_
-or
-`all/rest-api.yaml `_
-
-Each string is looked up anywhere in the test description and has to
-be an exact match: they are not regular expressions.
-
-Reducing the number of tests
-----------------------------
-
-The ``rados`` suite generates tens or even hundreds of thousands of tests out
-of a few hundred files. This happens because teuthology constructs test
-matrices from subdirectories wherever it encounters a file named ``%``. For
-instance, all tests in the `rados/basic suite
-`_ run with
-different messenger types: ``simple``, ``async`` and ``random``, because they
-are combined (via the special file ``%``) with the `msgr directory
-`_
-
-All integration tests are required to be run before a Ceph release is
-published. When merely verifying whether a contribution can be merged without
-risking a trivial regression, it is enough to run a subset. The ``--subset``
-option can be used to reduce the number of tests that are triggered. For
-instance
-
-.. prompt:: bash $
-
-   teuthology-suite --machine-type smithi --suite rados --subset 0/4000
-
-will run as few tests as possible. The tradeoff in this case is that
-not all combinations of test variations will be run together,
-but no matter how small a ratio is provided in the ``--subset``,
-teuthology will still ensure that all files in the suite are in at
-least one test. Understanding the actual logic that drives this
-requires reading the teuthology source code.
-
-The ``--limit`` option only runs the first ``N`` tests in the suite:
-this is rarely useful, however, because there is no way to control which
-test will be first.
-
-.. _ceph/qa sub-directory: https://github.com/ceph/ceph/tree/master/qa
-.. _Sepia Lab: https://wiki.sepia.ceph.com/doku.php
-.. _teuthology repository: https://github.com/ceph/teuthology
-.. _teuthology framework: https://github.com/ceph/teuthology
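
The way the special files ``%`` and ``+`` shape the ``rbd/thrash`` test list in the "Concatenation operator" section can be illustrated with a small Python sketch. This is not teuthology's implementation: the ``suite`` dictionary, ``facet_choices`` and ``describe_tests`` are hypothetical names invented for illustration, and only the two workload files that appear in the listed tests are modelled.

```python
from itertools import product

# Hypothetical in-memory model of qa/suites/rbd/thrash (an assumption for
# illustration): each subdirectory maps to its entries, and the special
# file "+" is listed like any other entry.
suite = {
    "clusters": ["+", "fixed-2.yaml", "openstack.yaml"],
    "workloads": ["rbd_api_tests_copy_on_read.yaml", "rbd_api_tests.yaml"],
}


def facet_choices(subdir, entries):
    """Return the alternatives one subdirectory contributes to the matrix."""
    files = [subdir + "/" + name for name in entries if name != "+"]
    if "+" in entries:
        # "+" present: all files are concatenated and act as a single choice.
        return [" ".join(files)]
    # No "+": every file is a separate choice.
    return files


def describe_tests(suite_name, subdirs):
    """Convolve the subdirectories, as the special file "%" requests."""
    choices = [facet_choices(name, entries) for name, entries in subdirs.items()]
    return [
        suite_name + "/{" + " ".join(combo) + "}"
        for combo in product(*choices)
    ]


if __name__ == "__main__":
    for description in describe_tests("rbd/thrash", suite):
        print(description)
```

With ``+`` present in ``clusters``, the sketch prints the two descriptions listed in that section. Removing ``"+"`` from the ``clusters`` entry makes each cluster file a separate choice and yields the four-entry 2x2 matrix instead.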