git.apps.os.sepia.ceph.com Git - teuthology.git/log
teuthology.git
3 years agosupervisor: Add missing continue statement 1689/head
Zack Cerza [Wed, 20 Oct 2021 20:34:39 +0000 (14:34 -0600)]
supervisor: Add missing continue statement

I missed this in #1688 😭

Signed-off-by: Zack Cerza <zack@redhat.com>
3 years agoMerge pull request #1688 from ceph/supervisor-unlock-safety
Zack Cerza [Wed, 20 Oct 2021 20:29:41 +0000 (14:29 -0600)]
Merge pull request #1688 from ceph/supervisor-unlock-safety

supervisor: Don't unlock nodes w/ bad description

3 years agosupervisor: Don't unlock nodes w/ bad description 1688/head
Zack Cerza [Wed, 20 Oct 2021 18:51:30 +0000 (12:51 -0600)]
supervisor: Don't unlock nodes w/ bad description

Very rarely, we enter a situation where nodes get used by two jobs
simultaneously. We can break this cycle if jobs refuse to unlock a
node that is locked by a different job.

This will not entirely prevent the problem, but it will keep it from
perpetuating itself.

Signed-off-by: Zack Cerza <zack@redhat.com>
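
The safety check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the supervisor's actual code: the function names and the shape of the `statuses` dict (node name mapped to a dict with a `description` key) are assumptions made for the example.

```python
def should_unlock(node_status, expected_description):
    """Return True only if the node's lock description matches this job's.

    Hypothetical helper: `node_status` stands in for whatever the lock
    server reports about a node.
    """
    return node_status.get("description") == expected_description


def nodes_safe_to_unlock(statuses, expected_description):
    """Return the names of nodes that this job is allowed to unlock."""
    safe = []
    for name, status in statuses.items():
        if not should_unlock(status, expected_description):
            # The node appears to be locked by a different job:
            # skip it instead of unlocking it.
            continue
        safe.append(name)
    return safe
```

Skipping a mismatched node does not stop two jobs from sharing it in the first place, but it keeps one job from releasing a node that another job still holds.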
3 years agoMerge PR #1685 into master
Patrick Donnelly [Fri, 15 Oct 2021 13:51:28 +0000 (09:51 -0400)]
Merge PR #1685 into master

* refs/pull/1685/head:
qa: remove extra debug message

Reviewed-by: Sage Weil <sage@redhat.com>
3 years agoMerge PR #1686 into master
Patrick Donnelly [Fri, 15 Oct 2021 13:50:41 +0000 (09:50 -0400)]
Merge PR #1686 into master

* refs/pull/1686/head:
tasks/kernel: fix missing arg to install_kernel

Reviewed-by: Sage Weil <sage@redhat.com>
3 years agotasks/kernel: fix missing arg to install_kernel i52944 1686/head
Patrick Donnelly [Fri, 15 Oct 2021 02:39:11 +0000 (22:39 -0400)]
tasks/kernel: fix missing arg to install_kernel

Fixes: https://tracker.ceph.com/issues/52944
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years agoqa: remove extra debug message 1685/head
Patrick Donnelly [Fri, 15 Oct 2021 02:29:46 +0000 (22:29 -0400)]
qa: remove extra debug message

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years agoMerge PR #1682 into master
Patrick Donnelly [Thu, 14 Oct 2021 13:26:52 +0000 (09:26 -0400)]
Merge PR #1682 into master

* refs/pull/1682/head:
nuke: ignore internal directory listing when scanning for stale mounts in debugfs directory

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
3 years agoMerge pull request #1668 from ideepika/wip-flavor
Ilya Dryomov [Wed, 13 Oct 2021 15:52:50 +0000 (17:52 +0200)]
Merge pull request #1668 from ideepika/wip-flavor

suite: kernel_flavor >> flavor

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
3 years agoremove gitbuilder tests 1668/head
Deepika Upadhyay [Thu, 16 Sep 2021 12:49:43 +0000 (18:19 +0530)]
remove gitbuilder tests

gitbuilder has been replaced by the shaman project and is no longer used.
The same functional testing is done by TestShamanProject, so there is no
need to maintain tests for obsolete code.

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years agos/basic/default we no longer need mapping of basic to default
Deepika Upadhyay [Mon, 13 Sep 2021 20:59:07 +0000 (02:29 +0530)]
s/basic/default we no longer need mapping of basic to default

finally we are substituting basic as default when trying to do URI search;
we can remove this mangling and directly make use of the default flavor.
we do not use flavor 'basic' for determining kernel flavor now.

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years agosuite: kernel_flavor >> flavor
Deepika Upadhyay [Fri, 23 Jul 2021 14:21:49 +0000 (14:21 +0000)]
suite: kernel_flavor >> flavor

since addressing comment:
FIXME: ceph flavor and kernel flavor are separate things
remove basic -> default(flavor) mapping & update s/basic/default in docs

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years agosuite: add flavor in base build_config
Deepika Upadhyay [Fri, 23 Jul 2021 14:13:05 +0000 (14:13 +0000)]
suite: add flavor in base build_config

build flavor remains common for all jobs and is something that is
good to look into while our jobs are being scheduled

This will be useful when changing from the default flavor
to one of the available ones (crimson/jaeger) by using the --flavor arg

teuthology/suite/test: update unit tests to test build flavor

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years agosuite: update available flavor for teuthology-suite
Deepika Upadhyay [Fri, 23 Jul 2021 14:09:36 +0000 (14:09 +0000)]
suite: update available flavor for teuthology-suite

we now have new build flavors crimson and jaeger, update them in docs

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years agoMerge pull request #1683 from ceph/kernel-hwe
Sage Weil [Fri, 8 Oct 2021 15:51:20 +0000 (10:51 -0500)]
Merge pull request #1683 from ceph/kernel-hwe

tasks/kernel: add 'hwe' config flag for ubuntu distro hwe kernel

3 years agotasks/kernel: add 'hwe' config flag for ubuntu distro hwe kernel 1683/head
Sage Weil [Thu, 7 Oct 2021 15:03:22 +0000 (10:03 -0500)]
tasks/kernel: add 'hwe' config flag for ubuntu distro hwe kernel

The hwe kernel supports nvme_loop, but the non-hwe kernel does not.

I don't want to futz with the 'distro' moniker (although that is another
valid approach) because only some tests need hwe, and I can imagine a
situation where we want to run tests on both kernels.  Note that this
flag doesn't rule out adding support for something like
'-k distro-hwe' later.

Signed-off-by: Sage Weil <sage@newdream.net>
3 years agonuke: ignore internal directory listing when scanning for stale mounts in debugfs... 1682/head
Venky Shankar [Wed, 6 Oct 2021 11:12:59 +0000 (07:12 -0400)]
nuke: ignore internal directory listing when scanning for stale mounts in debugfs directory

kclient patchset:

    https://patchwork.kernel.org/project/ceph-devel/list/?series=556049

introduces meta directory to add debugging entries. This needs to be filtered
when scanning ceph debugfs directory.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
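
As a rough sketch of the filtering described above, the scan can simply drop known internal entries before treating the rest as candidate mounts. The function name is hypothetical, and the entry name `meta` is an assumption taken from the phrase "meta directory" in the message; the real nuke code may differ.

```python
def candidate_mount_entries(entries, internal=("meta",)):
    """Drop internal debugfs entries so only per-mount directories remain.

    `entries` is a list of names as a directory listing might return them;
    `internal` holds names assumed to be internal (hypothetical default).
    """
    return [entry for entry in entries if entry not in internal]
```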
4 years agoMerge pull request #1673 from kamoltat/wip-ksirivad-fix-dic-update
Zack Cerza [Fri, 24 Sep 2021 20:23:24 +0000 (14:23 -0600)]
Merge pull request #1673 from kamoltat/wip-ksirivad-fix-dic-update

Fix dictionary update in try_push_job_info()

4 years agoMerge pull request #1675 from amathuria/wip-amathuri-fix-unlock
Zack Cerza [Mon, 20 Sep 2021 17:20:16 +0000 (11:20 -0600)]
Merge pull request #1675 from amathuria/wip-amathuri-fix-unlock

lock/ops.py: Fix retry mechanism for unlock_one

4 years agolock/ops.py: Fix retry mechanism for unlock_one 1675/head
Aishwarya Mathuria [Wed, 15 Sep 2021 18:10:57 +0000 (23:40 +0530)]
lock/ops.py: Fix retry mechanism for unlock_one

In unlock_one, we currently have a retry mechanism that is only triggered on a particular exception. With this change, we retry the request to unlock no matter what the cause of failure.

Fixes: https://tracker.ceph.com/issues/50921
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
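
The change in retry behaviour can be illustrated with a generic helper that retries on any exception rather than on one specific type. This is a sketch under stated assumptions: `retry_call` and its parameters are hypothetical names, not the API used in lock/ops.py.

```python
import time


def retry_call(func, tries=3, sleep=0.0):
    """Call func() until it succeeds or `tries` attempts have failed.

    Any exception triggers a retry, whatever its cause; the last
    exception is re-raised once the attempts are exhausted.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_exc = None
    for _ in range(tries):
        try:
            return func()
        except Exception as exc:  # deliberately broad: retry on any failure
            last_exc = exc
            time.sleep(sleep)
    raise last_exc
```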
4 years agoFix dictionary update in try_push_job_info() 1673/head
Kamoltat [Mon, 13 Sep 2021 21:14:41 +0000 (21:14 +0000)]
Fix dictionary update in try_push_job_info()

The function `try_push_job_info()` does not update the `job_info`
dictionary properly. We want to update `job_info` with `extra_info`;
however, in lines 498 and 499 we assign `job_info` a copy of
`extra_info` and then update it with `job_config`, which is incorrect.
Instead, we should assign `job_info` a copy of `job_config` and update
it with `extra_info`.
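
The ordering matters because `dict.update()` lets the argument's values win on key conflicts. A minimal sketch of the corrected order, using a hypothetical helper name and made-up keys:

```python
def merge_job_info(job_config, extra_info):
    """Start from a copy of job_config, then overlay extra_info."""
    job_info = dict(job_config)   # shallow copy; job_config is left untouched
    job_info.update(extra_info)   # values from extra_info take precedence
    return job_info
```

With the two steps swapped, any key present in both dictionaries would keep the `job_config` value and the extra information would be lost.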

4 years agoMerge pull request #1677 from ceph/no-fudge
Zack Cerza [Thu, 16 Sep 2021 22:38:21 +0000 (16:38 -0600)]
Merge pull request #1677 from ceph/no-fudge

tests: Drop fudge

4 years agotests: Drop fudge 1677/head
Zack Cerza [Thu, 16 Sep 2021 19:08:05 +0000 (13:08 -0600)]
tests: Drop fudge

Use mock instead.

Signed-off-by: Zack Cerza <zack@redhat.com>
4 years agoMerge pull request #1674 from ceph/wip-51944
Josh Durgin [Thu, 16 Sep 2021 00:37:24 +0000 (17:37 -0700)]
Merge pull request #1674 from ceph/wip-51944

supervisor: To preserve logs, delay nuking

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: David Galloway <dgallowa@redhat.com>
4 years agoMerge pull request #1676 from ceph/gh-actions
Zack Cerza [Wed, 15 Sep 2021 23:22:34 +0000 (17:22 -0600)]
Merge pull request #1676 from ceph/gh-actions

Add GitHub Actions definition for CI

4 years agotest_worker: Don't access the network! 1676/head
Zack Cerza [Wed, 15 Sep 2021 21:23:34 +0000 (15:23 -0600)]
test_worker: Don't access the network!

Signed-off-by: Zack Cerza <zack@redhat.com>
4 years agoAdd GitHub Actions definition for CI
Zack Cerza [Wed, 15 Sep 2021 20:21:31 +0000 (14:21 -0600)]
Add GitHub Actions definition for CI

Signed-off-by: Zack Cerza <zack@redhat.com>
4 years agosupervisor: To preserve logs, delay nuking 1674/head
Zack Cerza [Wed, 15 Sep 2021 17:46:33 +0000 (11:46 -0600)]
supervisor: To preserve logs, delay nuking

The previous behavior was causing machines to get nuked before any
attempt to fetch logs. If a machine took longer than 60s to become
available, collecting logs would fail. Since we also nuke after this
step, don't bother here.

Fixes: https://tracker.ceph.com/issues/51944
Signed-off-by: Zack Cerza <zack@redhat.com>
4 years agoRemote: log hostname when reconnecting
Zack Cerza [Wed, 15 Sep 2021 17:22:07 +0000 (11:22 -0600)]
Remote: log hostname when reconnecting

Signed-off-by: Zack Cerza <zack@redhat.com>
4 years agoMerge pull request #1672 from kshtsk/wip-fix-test-worker
kyr [Mon, 6 Sep 2021 19:17:04 +0000 (21:17 +0200)]
Merge pull request #1672 from kshtsk/wip-fix-test-worker

test/test_worker: fix test_prep_job teuth_config mocking

4 years agotest/test_worker: fix test_prep_job teuth_config mocking 1672/head
Kyr Shatskyy [Mon, 6 Sep 2021 12:36:55 +0000 (14:36 +0200)]
test/test_worker: fix test_prep_job teuth_config mocking

When home directory contains .teuthology.yaml the test_prep_job
can pick unexpected values, like teuthology_path which makes
tests fail.

This fix overrides teuthology config to avoid unexpected behaviour.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoMerge pull request #1671 from tchaikov/wip-tox-locale
Kefu Chai [Sat, 28 Aug 2021 12:18:14 +0000 (20:18 +0800)]
Merge pull request #1671 from tchaikov/wip-tox-locale

tox.ini: set a UTF-8 locale for decoding non-latin chars

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agobootstrap,tox.ini: set a UTF-8 locale for decoding non-latin chars 1671/head
Kefu Chai [Sat, 28 Aug 2021 01:10:12 +0000 (09:10 +0800)]
bootstrap,tox.ini: set a UTF-8 locale for decoding non-latin chars

Python3 uses locale.getdefaultlocale() to get the locale, which is used
to determine the encoding of filenames.

see https://docs.python.org/3/library/locale.html

there is a chance that the specified locale cannot handle filenames
containing non-latin-1 characters; in that case, pip just fails
to unpack that file, like:

me/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 550, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 249, in unpack_url
    unpack_file(file.path, location, file.content_type)
  File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 256, in unpack_file
    untar_file(filename, location)
  File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
    with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)

in this change, UTF-8 is used, and the language part is also changed to
en_US, so the output should be readable to anyone who can read English.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
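
The failure mode in the traceback above comes down to whether a filename can be encoded with the locale's codec. The sketch below shows this with plain `str.encode`; the filenames are made up for illustration and the helper name is hypothetical.

```python
def can_encode(filename, encoding):
    """Return True if `filename` can be represented in `encoding`."""
    try:
        filename.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical filenames: one with a Latin-1 character, one without any.
latin_name = "caf\u00e9.tar.gz"
cjk_name = "\u4e2d\u6587.tar.gz"
```

Here `latin_name` cannot be encoded as ASCII and `cjk_name` cannot be encoded as Latin-1, while UTF-8 handles both, which is why a UTF-8 locale avoids the error.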
4 years agoMerge pull request #1669 from ceph/rh-user-sub
Vasu Kulkarni [Tue, 24 Aug 2021 17:13:53 +0000 (13:13 -0400)]
Merge pull request #1669 from ceph/rh-user-sub

task/internal/redhat: remove rh stage creds

4 years agoremove rh stage creds rh-user-sub 1669/head
rakeshgm [Tue, 17 Aug 2021 14:11:23 +0000 (19:41 +0530)]
remove rh stage creds

use stage creds from config file

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
4 years agoMerge pull request #1667 from kshtsk/wip-bump-ansible
Kefu Chai [Sun, 8 Aug 2021 04:03:27 +0000 (12:03 +0800)]
Merge pull request #1667 from kshtsk/wip-bump-ansible

requirements: bump ansible

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agobootstrap: drop hardcode LC_ALL=C 1667/head
Kyr Shatskyy [Fri, 6 Aug 2021 14:20:02 +0000 (16:20 +0200)]
bootstrap: drop hardcode LC_ALL=C

Trying to resolve UnicodeEncodeError while trying to install bootstrap:

ERROR: Exception:
Traceback (most recent call last):
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
    status = self.run(options, args)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
    return func(self, options, args)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 316, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
    collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 472, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 341, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
    if not criterion.candidates:
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 140, in __bool__
    return any(self)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 128, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 32, in _iter_built
    candidate = func()
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 209, in _make_candidate_from_link
    version=version,
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 301, in __init__
    version=version,
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
    dist = self._prepare_distribution()
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in _prepare_distribution
    self._ireq, parallel_builds=True
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 552, in _prepare_linked_requirement
    self.download_dir, hashes
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 249, in unpack_url
    unpack_file(file.path, location, file.content_type)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 256, in unpack_file
    untar_file(filename, location)
  File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
    with open(path, "wb") as destfp:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 112: ordinal not in range(128)

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agobootstrap: do not use 'which'
Kyr Shatskyy [Fri, 6 Aug 2021 14:07:48 +0000 (16:07 +0200)]
bootstrap: do not use 'which'

Since the 'which' command can be absent on some systems by default,
it is easier to test for a tool's presence by just trying to print its
version.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agorequirements: bump ansible
Kyr Shatskyy [Fri, 6 Aug 2021 13:14:48 +0000 (15:14 +0200)]
requirements: bump ansible

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoMerge pull request #1662 from ceph/wip-badone-ceph-ansible-ubutu-upgrade-test
Brad Hubbard [Tue, 3 Aug 2021 20:25:08 +0000 (06:25 +1000)]
Merge pull request #1662 from ceph/wip-badone-ceph-ansible-ubutu-upgrade-test

ceph_ansible: Remove --system-site-packages

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
4 years agoceph_ansible: Remove --system-site-packages 1662/head
Brad Hubbard [Tue, 27 Jul 2021 04:04:04 +0000 (14:04 +1000)]
ceph_ansible: Remove --system-site-packages

Upgrading ansible is problematic as it conflicts with the installed
package on Ubuntu 20.04 so don't try to use system packages.

Add LANG environment variable to 'pip install ansible' command to work
around pip failing due to file names in ansible package with exotic
characters.

Fixes: https://tracker.ceph.com/issues/51856
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
4 years agoMerge pull request #1661 from tchaikov/wip-rtd-docs
Kefu Chai [Tue, 27 Jul 2021 08:49:29 +0000 (16:49 +0800)]
Merge pull request #1661 from tchaikov/wip-rtd-docs

.readthedocs: add .readthedocs.yml

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years ago.readthedocs: add .readthedocs.yml 1661/head
Kefu Chai [Sun, 25 Jul 2021 04:33:57 +0000 (12:33 +0800)]
.readthedocs: add .readthedocs.yml

to address the missing document of
https://docs.ceph.com/projects/teuthology/en/latest/commands/list.html,

they assume teuthology cli tools in $PATH when building sphinx document.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #1659 from kshtsk/wip-exclude-image
Josh Durgin [Fri, 16 Jul 2021 17:49:50 +0000 (10:49 -0700)]
Merge pull request #1659 from kshtsk/wip-exclude-image

openstack: add exclude_image regex parameter

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #1592 from kshtsk/wip-podman
Josh Durgin [Thu, 15 Jul 2021 20:17:27 +0000 (13:17 -0700)]
Merge pull request #1592 from kshtsk/wip-podman

add docker-compose scripts for development setups

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #1660 from kshtsk/wip-job-threshold
kyr [Thu, 15 Jul 2021 19:56:38 +0000 (21:56 +0200)]
Merge pull request #1660 from kshtsk/wip-job-threshold

suite: rename disable-num-jobs-check to job-threshold

4 years agosuite: rename disable-num-jobs-check to job-threshold 1660/head
Kyr Shatskyy [Thu, 15 Jul 2021 10:32:21 +0000 (12:32 +0200)]
suite: rename disable-num-jobs-check to job-threshold

Rename --disable-num-jobs-check to --job-threshold:
- for a shorter, easier-to-recall name;
- to allow changing the threshold value via a parameter;
- to allow defining a default threshold value in the teuthology config.

Use `--job-threshold 0` to disable job threshold check.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
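
A threshold check with these semantics might look like the sketch below. The function name is hypothetical, and the default of 500 is taken from the earlier commit that introduced the check; the real option handling in teuthology-suite may differ.

```python
def within_job_threshold(num_jobs, threshold=500):
    """Return True if a run with `num_jobs` jobs may be scheduled.

    A threshold of 0 disables the check entirely.
    """
    if threshold == 0:
        return True
    return num_jobs <= threshold
```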
4 years agodocker-compose: add some usage notes to the file 1592/head
Kyr Shatskyy [Thu, 15 Jul 2021 11:10:05 +0000 (13:10 +0200)]
docker-compose: add some usage notes to the file

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoopenstack: add exclude_image regex parameter 1659/head
Kyr Shatskyy [Wed, 14 Jul 2021 14:48:45 +0000 (16:48 +0200)]
openstack: add exclude_image regex parameter

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoMerge pull request #1657 from ceph/wip-max-jobs
Neha Ojha [Mon, 12 Jul 2021 21:13:53 +0000 (14:13 -0700)]
Merge pull request #1657 from ceph/wip-max-jobs

teuthology/suite/run.py, scripts/suite.py: disallow scheduling too many jobs

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoteuthology/suite/run.py, scripts/suite.py: disallow scheduling too many jobs 1657/head
Neha Ojha [Thu, 1 Jul 2021 18:28:05 +0000 (18:28 +0000)]
teuthology/suite/run.py, scripts/suite.py: disallow scheduling too many jobs

Add check_num_jobs() to prevent users from accidentally scheduling too many jobs, like
in rfriedma-2021-06-26_19:32:15-rados-wip-ronenf-scrubs-config-distro-basic-smithi.
JOBS_TO_SCHEDULE_THRESHOLD, set to 500 (most runs have fewer jobs than this),
will disallow users from scheduling more than 500 jobs. Users can schedule
more than 500 jobs by disabling this check using the --disable-num-jobs-check flag.

Signed-off-by: Neha Ojha <nojha@redhat.com>
4 years agoMerge pull request #1654 from ceph/wip-focal
David Galloway [Wed, 7 Jul 2021 15:42:13 +0000 (11:42 -0400)]
Merge pull request #1654 from ceph/wip-focal

Update distro maps

4 years agotests: Update latest distro versions 1654/head
David Galloway [Wed, 7 Jul 2021 14:22:55 +0000 (10:22 -0400)]
tests: Update latest distro versions

Signed-off-by: David Galloway <dgallowa@redhat.com>
4 years agoMerge pull request #1653 from ceph/nuke-9
Josh Durgin [Mon, 21 Jun 2021 21:46:12 +0000 (14:46 -0700)]
Merge pull request #1653 from ceph/nuke-9

nuke: kill -9 the teuthology process

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoorchestra/opsys: Add CentOS 9 Stream
David Galloway [Thu, 10 Jun 2021 15:02:59 +0000 (11:02 -0400)]
orchestra/opsys: Add CentOS 9 Stream

Signed-off-by: David Galloway <dgallowa@redhat.com>
4 years agoorchestra/opsys: Update latest distro versions
David Galloway [Thu, 10 Jun 2021 15:02:26 +0000 (11:02 -0400)]
orchestra/opsys: Update latest distro versions

Signed-off-by: David Galloway <dgallowa@redhat.com>
4 years agonuke: kill -9 the teuthology process 1653/head
Sage Weil [Wed, 9 Jun 2021 19:51:37 +0000 (14:51 -0500)]
nuke: kill -9 the teuthology process

If the process has been kill -STOPped, then we'll unlock the machines, but
the process will stick around and we'll try to nuke it again later,
zapping the machines after they're being used by some other job, leading
to failures.  (Usually this manifests as an error when the other job stops
where it has trouble gzipping the logs.)

Use -9 to make sure even STOPped processes are killed.

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #1625 from ideepika/wip-machine-type
kyr [Thu, 27 May 2021 11:01:42 +0000 (13:01 +0200)]
Merge pull request #1625 from ideepika/wip-machine-type

teuthology-suite: add default machine type(smithi)

4 years agoteuthology-suite: pick _machine_type /etc/teuthology.yml if not specified explicitly 1625/head
Deepika Upadhyay [Fri, 5 Mar 2021 07:04:33 +0000 (07:04 +0000)]
teuthology-suite: pick _machine_type /etc/teuthology.yml if not specified explicitly

Right now, users always have to pass --machine-type when scheduling a
run; when it is not specified, the command fails with a
no-machine-type-specified error.
Instead of failing, we can use `default_machine_type`, which in our case
should pick smithi, as specified in /etc/teuthology.yml

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
4 years agoMerge pull request #1645 from ceph/exec-all-hosts
Josh Durgin [Tue, 25 May 2021 14:22:32 +0000 (07:22 -0700)]
Merge pull request #1645 from ceph/exec-all-hosts

tasks/exec: add all-roles, all-hosts keys

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge PR #1571 into master
Patrick Donnelly [Fri, 21 May 2021 23:44:06 +0000 (16:44 -0700)]
Merge PR #1571 into master

* refs/pull/1571/head:
rpm: retry installing packages if mirrors are temporarily unreachable

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoMerge pull request #1649 from badone/wip-tracker-50922-container-build-wrong-arch
Brad Hubbard [Fri, 21 May 2021 22:05:22 +0000 (08:05 +1000)]
Merge pull request #1649 from badone/wip-tracker-50922-container-build-wrong-arch

Restrict build_complete check to x86_64

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoRestrict build_complete check to x86_64 1649/head
Brad Hubbard [Fri, 21 May 2021 05:22:26 +0000 (15:22 +1000)]
Restrict build_complete check to x86_64

Without this restriction a failed arm64 build will result in the
container build reporting failure.

Fixes: https://tracker.ceph.com/issues/50922
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
4 years agoMerge pull request #1648 from sunilkumarn417/add-install
Vasu Kulkarni [Wed, 19 May 2021 16:27:49 +0000 (09:27 -0700)]
Merge pull request #1648 from sunilkumarn417/add-install

Include install task to get all RPM pkgs

4 years agoInclude install task to get all RPM pkgs 1648/head
sunilkumarn417 [Wed, 19 May 2021 16:09:32 +0000 (21:39 +0530)]
Include install task to get all RPM pkgs

Signed-off-by: sunilkumarn417 <sunnagar@redhat.com>
4 years agoMerge pull request #1646 from sunilkumarn417/container-tool-setup
Josh Durgin [Tue, 18 May 2021 15:59:44 +0000 (08:59 -0700)]
Merge pull request #1646 from sunilkumarn417/container-tool-setup

task/internal/redhat.py: added container tool login support

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: rakeshgm <rakeshgm@redhat.com>
4 years agotask/internal/redhat.py: added container tool login support, 1646/head
sunilkumarn417 [Fri, 14 May 2021 09:08:31 +0000 (14:38 +0530)]
task/internal/redhat.py: added container tool login support,
this task is essential to access monitoring images from
Red Hat registry source.

Signed-off-by: sunilkumarn417 <sunnagar@redhat.com>
4 years agoMerge pull request #1647 from ceph/no-journald-to-syslog
Sage Weil [Mon, 17 May 2021 16:19:07 +0000 (11:19 -0500)]
Merge pull request #1647 from ceph/no-journald-to-syslog

task/internal/syslog: avoid failing runs when ceph daemon logs go to syslog misc.log

4 years agotask/internal/syslog: ignore misc.log 1647/head
Sage Weil [Fri, 14 May 2021 15:31:25 +0000 (10:31 -0500)]
task/internal/syslog: ignore misc.log

These regexes are all intended for kernel errors.  Ceph daemon
logs may leak into misc.log (*shakes fist at systemd-journald*)
and cause false positives (i.e., test failures).

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #1564 from ceph/wip-lemme-kill-runs
kyr [Sat, 15 May 2021 12:36:36 +0000 (14:36 +0200)]
Merge pull request #1564 from ceph/wip-lemme-kill-runs

kill.py: Allow deleting runs where bootstrap is failing

4 years agotasks/exec: add all-roles, all-hosts keys 1645/head
Sage Weil [Mon, 10 May 2021 16:13:25 +0000 (11:13 -0500)]
tasks/exec: add all-roles, all-hosts keys

'all' is ambiguous!

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agodocker-compose: add healthcheck for paddles postgres
Kyr Shatskyy [Fri, 29 Jan 2021 19:49:57 +0000 (20:49 +0100)]
docker-compose: add healthcheck for paddles postgres

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoadd docker-compose for starting paddles and pulpito
Kyr Shatskyy [Wed, 23 Dec 2020 10:42:15 +0000 (11:42 +0100)]
add docker-compose for starting paddles and pulpito

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agodocs: fix openSUSE qcow image links
Kyr Shatskyy [Fri, 11 Dec 2020 17:56:47 +0000 (18:56 +0100)]
docs: fix openSUSE qcow image links

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoMerge pull request #1644 from tchaikov/wip-pip
Kefu Chai [Fri, 30 Apr 2021 09:55:20 +0000 (17:55 +0800)]
Merge pull request #1644 from tchaikov/wip-pip

bootstrap: do not pass '--use-feature=2020-resolver' to pip

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
4 years agoMerge PR #1626 into master
Patrick Donnelly [Thu, 29 Apr 2021 23:55:15 +0000 (16:55 -0700)]
Merge PR #1626 into master

* refs/pull/1626/head:
orchestra: move methods for shell commands from remote.Remote

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agobootstrap: do not pass '--use-feature=2020-resolver' to pip 1644/head
Kefu Chai [Thu, 29 Apr 2021 14:43:14 +0000 (22:43 +0800)]
bootstrap: do not pass '--use-feature=2020-resolver' to pip

this reverts f2607ee8ce149f2951c5fd62c259fc4fa3ddcb5a

to silence the warning from pip:

WARNING: --use-feature=2020-resolver no longer has any effect, since it
is now the default dependency resolver in pip. This will become an error
in pip 21.0.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #1642 from jdurgin/wip-retry-paddles-reads
Josh Durgin [Thu, 29 Apr 2021 02:30:22 +0000 (19:30 -0700)]
Merge pull request #1642 from jdurgin/wip-retry-paddles-reads

lock/query: make robust against paddles errors

Reviewed-by: Sage Weil <sage@redhat.com>
4 years agolock/query: make robust against paddles errors 1642/head
Josh Durgin [Tue, 20 Apr 2021 05:49:43 +0000 (01:49 -0400)]
lock/query: make robust against paddles errors

Retry paddles requests, and for get_status() return an empty dict
rather than None so callers behave.

get_status() failing in particular has caused the dispatcher and jobs
to fail several times over the past few weeks. With this change, we
should be able to run multiple paddles workers again, since all the
common callers will retry on error.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
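
The pattern described here, retrying and then falling back to an empty dict, can be sketched as follows. The `fetch` callable stands in for the paddles request and is an assumption for the example, as are the function name and the `tries` parameter.

```python
def get_status_safe(fetch, name, tries=3):
    """Query a node's status, retrying on errors.

    Returns an empty dict instead of None when every attempt fails,
    so callers can keep using dict methods such as .get().
    """
    for _ in range(tries):
        try:
            result = fetch(name)
        except Exception:
            # Transient server error: try again.
            continue
        if result is not None:
            return result
    return {}
```

A caller can then write `get_status_safe(fetch, name).get("locked")` without first checking for `None`.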
4 years agoMerge pull request #1641 from tchaikov/always-be-happy
Kefu Chai [Mon, 12 Apr 2021 12:29:33 +0000 (20:29 +0800)]
Merge pull request #1641 from tchaikov/always-be-happy

task/internal: do not fail the script if systemd-sysusers core file not found

Reviewed-by: Sage Weil <sage@redhat.com>
4 years agotask/internal: do not fail the script if systemd-sysusers core file not found 1641/head
Kefu Chai [Mon, 12 Apr 2021 05:16:10 +0000 (13:16 +0800)]
task/internal: do not fail the script if systemd-sysusers core file not found

in 79f373c1769ea4f9d744cf33c5b0a0e026922d0f, we started to filter out
the systemd-sysusers core files. but the script fails if no such file
is found, like:

2021-04-12T02:58:51.065 ERROR:teuthology.run_tasks:Manager failed: internal.coredump
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/run_tasks.py", line 176, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/task/internal/__init__.py", line 479, in coredump
    wait=False,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi165 with status 1: "sudo sysctl -w kernel.core_pattern=core && sudo bash -c 'for f in `find /home/ubuntu/cephtest/archive/coredump
-type f`; do file $f | grep -q systemd-sysusers && rm $f ; done' && rmdir --ignore-fail-on-non-empty -- /home/ubuntu/cephtest/archive/coredump"

in this change, we ensure that the script never fails by adding `|| true`.
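
The general shell behaviour behind this fix can be reproduced with a small sketch. It assumes `sh` and `grep` are available; the loop below only mimics the shape of the failing script (a loop ending in `grep -q ... && ...`) and is not the teuthology command itself.

```python
import subprocess

# The exit status of a `for` loop is that of the last command it ran.
# When the final `grep -q` matches nothing, the `&&` list returns 1.
LOOP = "for f in a b; do echo $f | grep -q no-such-name && echo match; done"


def exit_status(script):
    """Run `script` under sh and return its exit status."""
    return subprocess.run(["sh", "-c", script]).returncode


failing = exit_status(LOOP)              # non-zero: grep found no match
masked = exit_status(LOOP + " || true")  # 0: `|| true` masks the failure
```

Appending `|| true` trades error detection for robustness, which is acceptable here because a missing systemd-sysusers core file is the normal case.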

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agotask/internal: split embedded shell into lines
Kefu Chai [Mon, 12 Apr 2021 05:15:24 +0000 (13:15 +0800)]
task/internal: split embedded shell into lines

for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoorchestra: move methods for shell commands from remote.Remote 1626/head
Rishabh Dave [Thu, 4 Mar 2021 11:08:37 +0000 (16:38 +0530)]
orchestra: move methods for shell commands from remote.Remote

Move methods that issue commands via shell and that don't necessarily
need to depend on SSH from class Remote to a different class. This
enables applications like vstart_runner.py (in Ceph repo) to reuse these
methods for running tests locally without necessarily depending on SSH
and without duplicating them in vstart_runner.py.
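
One way to picture this split is a base class whose helpers rely only on a `run()` method, so that any transport can supply it. The class and method names below (`ShellCommands`, `LocalRunner`, `sh`) are hypothetical and chosen for illustration; they are not taken from this commit.

```python
import subprocess


class ShellCommands:
    """Shell helpers that depend only on self.run(), not on SSH."""

    def run(self, args):
        raise NotImplementedError("subclasses provide the transport")

    def sh(self, script):
        """Run a shell snippet and return its standard output."""
        return self.run(["sh", "-c", script])


class LocalRunner(ShellCommands):
    """Runs commands on the local machine instead of over SSH."""

    def run(self, args):
        proc = subprocess.run(args, capture_output=True, text=True, check=True)
        return proc.stdout
```

A class that runs commands over SSH would subclass `ShellCommands` in the same way and inherit `sh()` unchanged, which is the reuse the commit message describes.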

Signed-off-by: Rishabh Dave <ridave@redhat.com>
4 years agoMerge PR #1634 into master
Patrick Donnelly [Wed, 31 Mar 2021 18:06:08 +0000 (11:06 -0700)]
Merge PR #1634 into master

* refs/pull/1634/head:
orchestra/remote: extend mktemp() to accept data

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoorchestra/remote: extend mktemp() to accept data 1634/head
Rishabh Dave [Fri, 26 Mar 2021 09:26:11 +0000 (14:56 +0530)]
orchestra/remote: extend mktemp() to accept data

Extend remote.Remote.mktemp() to accept data as a parameter and write
the data to the temporary file after it is created.
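
A local analogue of this behaviour, using only the standard library, might look like the sketch below. `mktemp_with_data` is a hypothetical helper; the real `remote.Remote.mktemp()` operates on a remote host and its exact signature is not shown in this log.

```python
import os
import tempfile


def mktemp_with_data(data=None, suffix=""):
    """Create a temporary file, optionally write `data` to it, return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w") as f:
        if data is not None:
            f.write(data)
    return path
```

Accepting the data at creation time saves callers a separate write step after obtaining the path.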

Signed-off-by: Rishabh Dave <ridave@redhat.com>
4 years agoMerge pull request #1636 from ideepika/fix-interactive-error
Josh Durgin [Mon, 29 Mar 2021 21:58:24 +0000 (14:58 -0700)]
Merge pull request #1636 from ideepika/fix-interactive-error

check ctx.archive is present or not in yaml config

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agocheck ctx.archive is present or not in yaml config 1636/head
Deepika Upadhyay [Mon, 29 Mar 2021 14:46:51 +0000 (20:16 +0530)]
check ctx.archive is present or not in yaml config

this is specifically for interactive-on-error mode, where we usually do
not specify archive_path; without this check, that fails
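
The kind of guard described here can be sketched as follows. `archive_subdir` and the use of `SimpleNamespace` as a stand-in for the job context are assumptions made for illustration; the actual check in teuthology may be placed and written differently.

```python
import os
from types import SimpleNamespace


def archive_subdir(ctx, name):
    """Return a path under ctx.archive, or None when no archive is configured."""
    if getattr(ctx, "archive", None) is None:
        return None
    return os.path.join(ctx.archive, name)


# Interactive-on-error style context with no archive path set.
no_archive = SimpleNamespace(archive=None)
skipped = archive_subdir(no_archive, "coredump")  # None, nothing to archive
```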

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
4 years agoMerge pull request #1633 from jdurgin/wip-retry-paddles-writes
Josh Durgin [Thu, 25 Mar 2021 17:05:33 +0000 (10:05 -0700)]
Merge pull request #1633 from jdurgin/wip-retry-paddles-writes

report, lock.ops: retry write requests to paddles

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agoreport, lock.ops: retry write requests to paddles 1633/head
Josh Durgin [Sun, 21 Mar 2021 22:28:52 +0000 (18:28 -0400)]
report, lock.ops: retry write requests to paddles

For more contended cases of updating job status and machine keys,
where we've seen 500 errors from DB conflicts, use random intervals
for the retries.
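
Retrying at random intervals keeps contending writers from colliding again in lockstep. A minimal sketch of the idea, with hypothetical names and defaults (`retry_with_jitter`, `tries`, `min_wait`, `max_wait`) rather than the ones used in teuthology:

```python
import random
import time


def retry_with_jitter(op, tries=5, min_wait=0.0, max_wait=1.0, sleep=time.sleep):
    """Call op() until it succeeds, sleeping a random interval between tries.

    `tries` must be at least 1. The last error is re-raised if all tries fail.
    """
    last_error = None
    for attempt in range(tries):
        try:
            return op()
        except Exception as exc:
            last_error = exc
            if attempt < tries - 1:
                # A random wait de-synchronises clients that failed together.
                sleep(random.uniform(min_wait, max_wait))
    raise last_error
```

The `sleep` parameter exists only so the waits can be observed or skipped in tests.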

This is the teuthology half of fixing:
https://tracker.ceph.com/issues/49864

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #1632 from ceph/revert-nuke
Sage Weil [Sun, 21 Mar 2021 18:16:35 +0000 (13:16 -0500)]
Merge pull request #1632 from ceph/revert-nuke

Revert "Merge pull request #1631 from jdurgin/wip-nuke-poweroff"

4 years agoRevert "Merge pull request #1631 from jdurgin/wip-nuke-poweroff" 1632/head
Sage Weil [Sun, 21 Mar 2021 16:39:13 +0000 (11:39 -0500)]
Revert "Merge pull request #1631 from jdurgin/wip-nuke-poweroff"

This reverts commit c48eb744081d22bc82d7d099d4edb67ae02551e0, reversing
changes made to b96569170f15eae4604f361990ea65737b28dff1.

This is causing log gzipping to fail because the logs already exist as .gz files.
My guess is that the logs are left over from a previous run, but I'm not sure how
that would happen.

In any case, the merge of this PR corresponds exactly to when we started seeing
the log gzip failures.

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #1631 from jdurgin/wip-nuke-poweroff
Josh Durgin [Fri, 19 Mar 2021 22:50:18 +0000 (15:50 -0700)]
Merge pull request #1631 from jdurgin/wip-nuke-poweroff

nuke: don't power-off machines when not rebooting

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agonuke: don't power-off machines when not rebooting 1631/head
Josh Durgin [Fri, 19 Mar 2021 21:01:20 +0000 (21:01 +0000)]
nuke: don't power-off machines when not rebooting

This ensures jobs that time out can still have their logs gathered.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #1628 from ceph/ignore-systemd-sysusers-core
Josh Durgin [Sat, 13 Mar 2021 03:30:42 +0000 (19:30 -0800)]
Merge pull request #1628 from ceph/ignore-systemd-sysusers-core

task/internal: ignore systemd-sysusers core file

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agotask/internal: ignore systemd-sysusers core file 1628/head
Sage Weil [Fri, 12 Mar 2021 17:58:47 +0000 (11:58 -0600)]
task/internal: ignore systemd-sysusers core file

This is related to dnsmasq.  When installing the kubic podman 3.0.1
packages,

  Running scriptlet: dnsmasq-2.79-13.el8_3.1.x86_64            14/16
/var/tmp/rpm-tmp.6MFp00: line 5:  9079 Segmentation fault      (core dumped) systemd-sysusers -  &> /dev/null <<SYSTEMD_INLINE_EOF
u dnsmasq - "Dnsmasq DHCP and DNS server" /var/lib/dnsmasq
SYSTEMD_INLINE_EOF

  Installing       : dnsmasq-2.79-13.el8_3.1.x86_64            14/16
warning: group dnsmasq does not exist - using root
warning: group dnsmasq does not exist - using root
warning: group dnsmasq does not exist - using root

  Running scriptlet: dnsmasq-2.79-13.el8_3.1.x86_64            14/16
/var/tmp/rpm-tmp.pfCGxn: line 3:  9089 Segmentation fault      (core dumped) systemd-sysusers &> /dev/null

  Installing       : podman-3.0.1-2.el8.3.2.x86_64             15/16
  Installing       : podman-plugins-3.0.1-2.el8.3.2.x86_64     16/16
  Running scriptlet: container-selinux-2:2.145.0-1.el8.noarch  16/16
  Running scriptlet: podman-plugins-3.0.1-2.el8.3.2.x86_64     16/16
/var/tmp/rpm-tmp.bFfmjl: line 6: 11098 Segmentation fault      (core dumped) /usr/bin/systemd-sysusers
warning: %triggerin(systemd-239-18.el8.x86_64) scriptlet failed, exit status 139

Error in <unknown> scriptlet in rpm package podman-plugins
  Verifying        : dnsmasq-2.79-13.el8_3.1.x86_64             1/16

Nothing to do with us.

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #1573 from smithfarm/wip-45570
kyr [Fri, 12 Mar 2021 09:20:23 +0000 (10:20 +0100)]
Merge pull request #1573 from smithfarm/wip-45570

orchestra/console: raise RuntimeError when fail to power on

4 years agoMerge pull request #1627 from ceph/wip-debug-levels
Josh Durgin [Thu, 11 Mar 2021 16:48:34 +0000 (08:48 -0800)]
Merge pull request #1627 from ceph/wip-debug-levels

suite/placeholder.py: lower osd specific debug levels

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agosuite/placeholder.py: lower osd specific debug levels 1627/head
Neha Ojha [Wed, 10 Mar 2021 23:33:55 +0000 (23:33 +0000)]
suite/placeholder.py: lower osd specific debug levels

Signed-off-by: Neha Ojha <nojha@redhat.com>
4 years agoMerge pull request #1620 from ceph/wip-badone-ceph-ansible-tracker-49485
Brad Hubbard [Tue, 9 Mar 2021 22:20:21 +0000 (08:20 +1000)]
Merge pull request #1620 from ceph/wip-badone-ceph-ansible-tracker-49485

ceph_ansible: Satisfy 'six' dependency

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
4 years agoselinux: fix typo
Sage Weil [Sat, 27 Feb 2021 20:13:30 +0000 (14:13 -0600)]
selinux: fix typo

Signed-off-by: Sage Weil <sage@newdream.net>