git-server-git.apps.pok.os.sepia.ceph.com Git - teuthology.git/log
teuthology.git
3 days ago Merge pull request #2141 from ceph/container-manifest main
Zack Cerza [Thu, 16 Apr 2026 20:56:07 +0000 (14:56 -0600)]
Merge pull request #2141 from ceph/container-manifest

workflows/dev_container: Fix multi-arch images

3 weeks ago Merge PR #2159 into main
Patrick Donnelly [Wed, 25 Mar 2026 13:54:10 +0000 (09:54 -0400)]
Merge PR #2159 into main

* refs/pull/2159/head:
teuthology/suite: log postmerge filtering

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
3 weeks ago Merge pull request #2160 from Adarsha1999/openstack-rocky10-user-data
kyr [Wed, 25 Mar 2026 10:51:50 +0000 (11:51 +0100)]
Merge pull request #2160 from Adarsha1999/openstack-rocky10-user-data

openstack: Add cloud-init user-data for Rocky Linux 10 and 10.1

3 weeks ago openstack: Add cloud-init user-data for Rocky Linux 10 and 10.1 2160/head
Adarsha Dinda [Tue, 24 Mar 2026 17:51:23 +0000 (23:21 +0530)]
openstack: Add cloud-init user-data for Rocky Linux 10 and 10.1

4 weeks ago teuthology/suite: log postmerge filtering 2159/head
Patrick Donnelly [Mon, 23 Mar 2026 13:10:12 +0000 (09:10 -0400)]
teuthology/suite: log postmerge filtering

Otherwise it's hard to discern what caused a job to be dropped. The
"postmerge" script itself may be empty but the other filtering still
runs.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 weeks ago Merge pull request #2158 from ceph/utf8-fix
Zack Cerza [Wed, 18 Mar 2026 15:53:57 +0000 (09:53 -0600)]
Merge pull request #2158 from ceph/utf8-fix

bootstrap: Support nonstandard locales

4 weeks ago bootstrap: Support nonstandard locales 2158/head
Zack Cerza [Tue, 17 Mar 2026 20:32:38 +0000 (14:32 -0600)]
bootstrap: Support nonstandard locales

Some newer systems use e.g. LANG=C.utf8, which is breaking bootstrap since it expects to see e.g. C.utf-8. Instead of trying to parse the values, simply split on '.' and append 'utf-8'.

Signed-off-by: Zack Cerza <zack@cerza.org>
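
The "split on '.' and append 'utf-8'" approach from the commit message above can be illustrated with a minimal Python sketch. The function name `normalize_locale` is hypothetical, and the real change lives in the bootstrap script and may look quite different; this only shows the idea.

```python
def normalize_locale(value: str) -> str:
    """Force the encoding suffix of a locale string to 'utf-8'.

    Hypothetical helper for illustration; not taken from teuthology.
    """
    # Keep only the part before the first '.', e.g. 'C' from 'C.utf8'.
    base = value.split(".", 1)[0]
    # Append the spelling that the commit message says bootstrap expects.
    return f"{base}.utf-8"


print(normalize_locale("C.utf8"))
```

Because the suffix is discarded rather than parsed, spellings such as `utf8`, `UTF-8` and `utf-8` all end up in the same form.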
4 weeks ago Merge pull request #2155 from tchaikov/seed
Zack Cerza [Tue, 17 Mar 2026 18:04:10 +0000 (12:04 -0600)]
Merge pull request #2155 from tchaikov/seed

schedule: fix first-in-suite option help

4 weeks ago schedule: fix first-in-suite option help 2155/head
Kefu Chai [Tue, 17 Mar 2026 08:15:38 +0000 (16:15 +0800)]
schedule: fix first-in-suite option help

The schedule command validates --seed, --subset and
--no-nested-subset together with --first-in-suite, and the suite
runner passes them that way when writing the rerun memo.

Update the help text to match the implemented behavior.

Reported-by: T K Chandra Hasan <t.k.chandra.hasan@ibm.com>
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
6 weeks ago Merge pull request #2150 from anshuman-agarwala/python-13-bump
Zack Cerza [Fri, 6 Mar 2026 23:04:56 +0000 (16:04 -0700)]
Merge pull request #2150 from anshuman-agarwala/python-13-bump

requirements: bumped packages for python3.13

7 weeks ago Bumped packages for python3-13 2150/head
Anshuman [Mon, 23 Feb 2026 04:46:55 +0000 (10:16 +0530)]
Bumped packages for python3-13

7 weeks ago Merge pull request #2148 from ceph/no-ref-sha1
Zack Cerza [Wed, 25 Feb 2026 21:12:59 +0000 (14:12 -0700)]
Merge pull request #2148 from ceph/no-ref-sha1

rpm.py: Do not sed ref/sha1

8 weeks ago rpm.py: Do not sed ref/sha1 2148/head
David Galloway [Thu, 19 Feb 2026 15:28:01 +0000 (10:28 -0500)]
rpm.py: Do not sed ref/sha1

This is a relic from gitbuilder days and causes issues if a dev puts "ref" in the branch name.

Signed-off-by: David Galloway <david.galloway@ibm.com>
2 months ago Merge pull request #2146 from ceph/reboot-7min
David Galloway [Tue, 17 Feb 2026 00:16:53 +0000 (19:16 -0500)]
Merge pull request #2146 from ceph/reboot-7min

fog: Try ipmi power-cycle if stuck in a reimage reboot hang

2 months ago Merge pull request #2147 from batrick/kernel-fix
David Galloway [Tue, 17 Feb 2026 00:16:28 +0000 (19:16 -0500)]
Merge pull request #2147 from batrick/kernel-fix

teuthology/task/kernel: always hard reboot

2 months ago teuthology/task/kernel: always hard reboot kernel-fix 2147/head
Patrick Donnelly [Thu, 12 Feb 2026 00:33:45 +0000 (19:33 -0500)]
teuthology/task/kernel: always hard reboot

On the new trial machines, the `shutdown -r now` routine
is hanging somewhere before reboot. The cause of this is unknown; it's
been very resistant to debugging. So, just sync file systems, remount
RO, and then do a hard reboot.

Fixes: https://tracker.ceph.com/issues/74717
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
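
The "sync file systems, remount RO, and then do a hard reboot" sequence from the commit message above could look roughly like the sketch below. It assumes the Linux magic SysRq interface is the mechanism; the commit message does not say how the real task does it, and the function name `hard_reboot_commands` is hypothetical.

```python
from typing import List


def hard_reboot_commands() -> List[str]:
    """Return shell commands for: sync, remount read-only, forced reboot.

    Assumption: uses /proc/sysrq-trigger; the actual teuthology task
    may use a different mechanism.
    """
    return [
        "sync",                          # flush dirty pages to disk
        "echo u > /proc/sysrq-trigger",  # SysRq 'u': remount filesystems read-only
        "echo b > /proc/sysrq-trigger",  # SysRq 'b': reboot immediately
    ]


for cmd in hard_reboot_commands():
    print(cmd)
```

The commands are only built and printed here; running them requires root and will reboot the machine.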
2 months ago fog: Try ipmitool off&on if stuck in a reimage reboot hang 2146/head
David Galloway [Thu, 12 Feb 2026 20:44:05 +0000 (15:44 -0500)]
fog: Try ipmitool off&on if stuck in a reimage reboot hang

Fixes: https://tracker.ceph.com/issues/74717
Signed-off-by: David Galloway <david.galloway@ibm.com>
2 months ago Merge pull request #2143 from kshtsk/wip-supervisor-kill-job
Zack Cerza [Wed, 11 Feb 2026 16:21:15 +0000 (09:21 -0700)]
Merge pull request #2143 from kshtsk/wip-supervisor-kill-job

dispatcher/supervisor: fix kill_job call

2 months ago dispatcher/test: fix KeyError: 'owner' 2143/head
Kyr Shatskyy [Wed, 11 Feb 2026 14:15:21 +0000 (15:15 +0100)]
dispatcher/test: fix KeyError: 'owner'

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
2 months ago lock/ops: missing f-prefix in f-string
Kyr Shatskyy [Wed, 11 Feb 2026 13:30:32 +0000 (14:30 +0100)]
lock/ops: missing f-prefix in f-string

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
2 months ago kill: let pids be None when call kill_process
Kyr Shatskyy [Wed, 11 Feb 2026 13:26:09 +0000 (14:26 +0100)]
kill: let pids be None when call kill_process

If we run teuthology-kill with the -j option, it may not know the job's
pid. Let it pass None as the pids argument to kill_process so that
kill_process can figure it out on its own.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
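
The "pass None and let the callee figure it out" pattern from the commit message above can be sketched as follows. The helper `resolve_pids` and its `lookup` callback are hypothetical names for illustration; the real `kill_process` signature is not shown in this log.

```python
from typing import Callable, List, Optional


def resolve_pids(pids: Optional[List[int]],
                 lookup: Callable[[], List[int]]) -> List[int]:
    """Use the caller's pids when given; otherwise fall back to a lookup.

    Hypothetical helper; `lookup` stands in for whatever discovery
    the real code performs.
    """
    if pids is None:
        # The caller did not know the pid, so discover it here.
        return lookup()
    # An explicit list (even an empty one) is respected as-is.
    return list(pids)


print(resolve_pids(None, lambda: [4242]))
```

Testing for `None` rather than truthiness keeps "the caller does not know" distinct from "the caller knows there is nothing to kill".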
2 months ago dispatcher/supervisor: fix kill_job call
Kyr Shatskyy [Wed, 11 Feb 2026 12:47:10 +0000 (13:47 +0100)]
dispatcher/supervisor: fix kill_job call

Fixes: bf0242c5599c861d855d13925a565cf437b7a41b
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
2 months ago workflows/dev_container: Fix multi-arch images 2141/head
Zack Cerza [Mon, 9 Feb 2026 18:33:40 +0000 (11:33 -0700)]
workflows/dev_container: Fix multi-arch images

Signed-off-by: Zack Cerza <zack@cerza.org>
2 months ago Merge pull request #2113 from batrick/egrep 1.2.3
kyr [Sat, 24 Jan 2026 10:43:09 +0000 (11:43 +0100)]
Merge pull request #2113 from batrick/egrep

teuthology/task: use grep switch instead of egrep

2 months ago Merge pull request #2137 from kshtsk/wip-rocky-9.7
David Galloway [Fri, 23 Jan 2026 19:32:16 +0000 (14:32 -0500)]
Merge pull request #2137 from kshtsk/wip-rocky-9.7

orchestra: rocky 9.6 and 10.0 are gone

2 months ago orchestra: rocky 9.6 and 10.0 are gone 2137/head
Kyr Shatskyy [Wed, 21 Jan 2026 20:37:14 +0000 (21:37 +0100)]
orchestra: rocky 9.6 and 10.0 are gone

Welcome Rocky and Alma Linux 9.7 and 10.1

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
2 months ago Merge pull request #2134 from ceph/kill-pid
David Galloway [Wed, 21 Jan 2026 13:45:18 +0000 (08:45 -0500)]
Merge pull request #2134 from ceph/kill-pid

run: Send PID to paddles

2 months ago Merge pull request #2036 from ceph/reimage-unlock
Dan Mick [Tue, 20 Jan 2026 23:41:41 +0000 (15:41 -0800)]
Merge pull request #2036 from ceph/reimage-unlock

lock.ops.unlock_one_safe: Invert run-match logic

2 months ago run: Send PID to paddles 2134/head
Zack Cerza [Tue, 20 Jan 2026 22:54:27 +0000 (15:54 -0700)]
run: Send PID to paddles

This is a follow-up to bf0242c5599c861d855d13925a565cf437b7a41b

Signed-off-by: Zack Cerza <zack@cerza.org>
3 months ago Merge pull request #2131 from ceph/fis-lis
David Galloway [Fri, 16 Jan 2026 00:57:55 +0000 (19:57 -0500)]
Merge pull request #2131 from ceph/fis-lis

supervisor: Avoid prematurely pushing some jobs

3 months ago Merge pull request #2130 from ceph/kill-multi-supervisor
David Galloway [Fri, 16 Jan 2026 00:57:43 +0000 (19:57 -0500)]
Merge pull request #2130 from ceph/kill-multi-supervisor

kill: Handle supervisor procs when killing runs

3 months ago supervisor: Avoid prematurely pushing some jobs fis-lis 2131/head
Zack Cerza [Fri, 16 Jan 2026 00:38:35 +0000 (17:38 -0700)]
supervisor: Avoid prematurely pushing some jobs

This is a follow-up to ff615aae541032c647e78d3959d368f595c93e31; it caused us to
submit the first-in-suite and last-in-suite jobs to paddles. Those present as
having 'unknown' status, which will be confusing to users.

Signed-off-by: Zack Cerza <zack@cerza.org>
3 months ago kill: Handle supervisor procs when killing runs kill-multi-supervisor 2130/head
Zack Cerza [Thu, 15 Jan 2026 19:20:16 +0000 (12:20 -0700)]
kill: Handle supervisor procs when killing runs

This is a follow-up to ff615aae541032c647e78d3959d368f595c93e31, which only
handled killing individual jobs. Since we're using the results server for all
run and job metadata, we can drop all mentions of the archive. This change
is necessary since we've restricted access to the archive from the teuthology
machine for normal users, to avoid resource contention.

Signed-off-by: Zack Cerza <zack@cerza.org>
3 months ago Merge pull request #2129 from ceph/maas-fixes
Zack Cerza [Thu, 15 Jan 2026 17:58:08 +0000 (10:58 -0700)]
Merge pull request #2129 from ceph/maas-fixes

maas: handle nodes with unexpected status

3 months ago maas: handle nodes with unexpected status 2129/head
Zack Cerza [Thu, 15 Jan 2026 01:04:58 +0000 (18:04 -0700)]
maas: handle nodes with unexpected status

Oftentimes, we just need to release and re-allocate before provisioning.

Signed-off-by: Zack Cerza <zack@cerza.org>
3 months ago Merge pull request #2117 from deepssin/uefi_fix
Dan Mick [Wed, 14 Jan 2026 17:28:20 +0000 (09:28 -0800)]
Merge pull request #2117 from deepssin/uefi_fix

Fix kernel boot on UEFI systems

3 months ago Fix kernel boot on UEFI systems 2117/head
deepssin [Thu, 18 Dec 2025 13:52:21 +0000 (13:52 +0000)]
Fix kernel boot on UEFI systems

- Add _update_uefi_grub_config() to sync UEFI GRUB config
- Fixes issue where systems reboot into old kernel on UEFI

Signed-off-by: deepssin <deepssin@redhat.com>
3 months ago Merge pull request #2105 from vamahaja/maas-integration
Zack Cerza [Tue, 13 Jan 2026 21:54:36 +0000 (14:54 -0700)]
Merge pull request #2105 from vamahaja/maas-integration

[Lib] Add MAAS (Metal-as-a-Service) provisioner

3 months ago maas: Correct image names 2105/head
Zack Cerza [Thu, 8 Jan 2026 20:27:24 +0000 (13:27 -0700)]
maas: Correct image names

Signed-off-by: Zack Cerza <zack@cerza.org>
3 months ago provisioner: Add maas provisioner
Ceph Teuthology [Tue, 4 Nov 2025 16:19:08 +0000 (21:49 +0530)]
provisioner: Add maas provisioner

Signed-off-by: Vaibhav Mahajan <vaibhavsm04@gmail.com>
3 months ago Merge pull request #2126 from kshtsk/wip-defaults-https
kyr [Tue, 13 Jan 2026 13:47:55 +0000 (14:47 +0100)]
Merge pull request #2126 from kshtsk/wip-defaults-https

config: use https by default

3 months ago config: use https by default 2126/head
Kyrylo Shatskyy [Tue, 13 Jan 2026 11:41:00 +0000 (12:41 +0100)]
config: use https by default

Port 80 (http) is not permitted for the new lab's resources for security
reasons, so default to https now.

Signed-off-by: Kyrylo Shatskyy <kyrylo.shatskyy@clyso.com>
3 months ago Merge pull request #2125 from ceph/kill-supervisor
David Galloway [Fri, 9 Jan 2026 21:06:45 +0000 (16:06 -0500)]
Merge pull request #2125 from ceph/kill-supervisor

kill: Look for, and kill, the supervisor process

3 months ago kill: Look for, and kill, the supervisor process 2125/head
Zack Cerza [Fri, 9 Jan 2026 00:36:30 +0000 (17:36 -0700)]
kill: Look for, and kill, the supervisor process

Useful if one wants to kill a job that is still waiting for its nodes to be
provisioned.

Signed-off-by: Zack Cerza <zack@cerza.org>
4 months ago Merge pull request #2116 from tchaikov/wip-drop-distutils
Kefu Chai [Fri, 19 Dec 2025 01:36:29 +0000 (09:36 +0800)]
Merge pull request #2116 from tchaikov/wip-drop-distutils

teuthology: remove dependency on distutils

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 months ago teuthology/task: use grep switch instead of egrep 2113/head
Patrick Donnelly [Mon, 1 Dec 2025 15:29:52 +0000 (10:29 -0500)]
teuthology/task: use grep switch instead of egrep

Resolves:

    2025-11-30T03:49:48.951 DEBUG:teuthology.task.internal.syslog:Checking ubuntu@smithi155.front.sepia.ceph.com
    2025-11-30T03:49:48.952 DEBUG:teuthology.orchestra.run.smithi155:> egrep --binary-files=text '\bBUG\b|\bINFO\b|\bDEADLOCK\b' /home/ubuntu/cephtest/archive/syslog/kern.log | grep -v 'task .* blocked for more than .* seconds' | grep -v 'lockdep is turned off' | grep -v 'trying to register non-static key' | grep -v 'DEBUG: fsize' | grep -v CRON | grep -v 'BUG: bad unlock balance detected' | grep -v 'inconsistent lock state' | grep -v '*** DEADLOCK ***' | grep -v 'INFO: possible irq lock inversion dependency detected' | grep -v 'INFO: NMI handler (perf_event_nmi_handler) took too long to run' | grep -v 'INFO: recovery required on readonly' | grep -v 'ceph-create-keys: INFO' | grep -v INFO:ceph-create-keys | grep -v 'Loaded datasource DataSourceOpenStack' | grep -v 'container-storage-setup: INFO: Volume group backing root filesystem could not be determined' | egrep -v '\bsalt-master\b|\bsalt-minion\b|\bsalt-api\b' | grep -v ceph-crash | egrep -v '\btcmu-runner\b.*\bINFO\b' | head -n 1
    2025-11-30T03:49:48.983 INFO:teuthology.orchestra.run.smithi155.stderr:egrep: warning: egrep is obsolescent; using grep -E
    2025-11-30T03:49:48.983 INFO:teuthology.orchestra.run.smithi155.stderr:egrep: warning: egrep is obsolescent; using grep -E
    2025-11-30T03:49:48.983 INFO:teuthology.orchestra.run.smithi155.stderr:egrep: warning: egrep is obsolescent; using grep -E

Fixes: https://tracker.ceph.com/issues/74259
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
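
The substitution described by the commit title above (`grep -E` in place of `egrep`) can be sketched as a small command builder. The helper `grep_extended` is hypothetical and is not taken from the teuthology task code; it only shows how the warning-free form of the command is spelled.

```python
import shlex


def grep_extended(pattern: str, *options: str, invert: bool = False) -> str:
    """Build a shell-quoted 'grep -E' command (hypothetical helper)."""
    args = ["grep", "-E"]          # -E selects extended regular expressions
    if invert:
        args.append("-v")          # -v prints only non-matching lines
    args.extend(options)
    args.append(pattern)
    # Quote every argument so the pattern survives the remote shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(grep_extended(r"\bBUG\b|\bINFO\b|\bDEADLOCK\b", "--binary-files=text"))
print(grep_extended(r"\bsalt-master\b|\bsalt-minion\b|\bsalt-api\b", invert=True))
```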
4 months ago teuthology: implement strtobool and use it 2116/head
Kefu Chai [Thu, 18 Dec 2025 09:27:35 +0000 (17:27 +0800)]
teuthology: implement strtobool and use it

The distutils module was deprecated in Python 3.10 and removed in
Python 3.12. This commit replaces the deprecated distutils.util.strtobool
imports with strtobool in the teuthology.util module.

Changes:
- Add strtobool.py to teuthology/util
- Replace distutils.util.strtobool with
teuthology.util.strtobool.strtobool

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
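
A drop-in replacement for `distutils.util.strtobool`, as described in the commit message above, might look like the following. This is a minimal sketch that mirrors the documented distutils behavior (it returns 1 or 0 and raises `ValueError` otherwise); it is not the actual contents of `teuthology/util/strtobool.py`.

```python
_TRUE_VALUES = {"y", "yes", "t", "true", "on", "1"}
_FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def strtobool(val: str) -> int:
    """Convert a string truth value to 1 or 0, like distutils did."""
    val = val.lower()
    if val in _TRUE_VALUES:
        return 1
    if val in _FALSE_VALUES:
        return 0
    # Anything else is rejected rather than silently treated as false.
    raise ValueError(f"invalid truth value {val!r}")


print(strtobool("Yes"), strtobool("off"))
```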
4 months ago teuthology/task/install: implement LooseVersion and use it
Kefu Chai [Thu, 18 Dec 2025 09:05:58 +0000 (17:05 +0800)]
teuthology/task/install: implement LooseVersion and use it

The distutils module was deprecated in Python 3.10 and removed in
Python 3.12. This commit replaces the deprecated distutils.version
imports with a homebrew LooseVersion implementation.

Changes:
- implement LooseVersion which is able to parse versions like
  '10.2.2-63-g8542898-1trusty'.
- Replace distutils.version.LooseVersion with
  teuthology.util.version.LooseVersion

Fixes:
```
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 81, in <module>
    from teuthology.orchestra.remote import RemoteShell
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/orchestra/remote.py", line 6, in <module>
    import teuthology.lock.util
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/lock/util.py", line 6, in <module>
    import teuthology.provision.downburst
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/provision/__init__.py", line 4, in <module>
    import teuthology.exporter
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/exporter.py", line 11, in <module>
    import teuthology.dispatcher
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/dispatcher/__init__.py", line 22, in <module>
    from teuthology.dispatcher import supervisor
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/dispatcher/supervisor.py", line 18, in <module>
    from teuthology.task import internal
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/task/internal/__init__.py", line 27, in <module>
    from teuthology.task.internal.redhat import (setup_cdn_repo, setup_base_repo,            # noqa
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/task/internal/redhat.py", line 13, in <module>
    from teuthology.task.install.redhat import set_deb_repo
  File "/tmp/tmp.xwxq8FOScf/teuthology/teuthology/task/install/__init__.py", line 14, in <module>
    from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'
```

Related: https://peps.python.org/pep-0632/

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
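
To show what a homebrew `LooseVersion` capable of parsing '10.2.2-63-g8542898-1trusty' could look like, here is a minimal sketch. It borrows the component-splitting idea from the old distutils class; the rule that numbers sort before strings is an assumption made here so that mixed comparisons do not raise, and it may not match `teuthology.util.version.LooseVersion`.

```python
import re
from functools import total_ordering

# Split into runs of digits, runs of lowercase letters, or single dots.
_COMPONENT_RE = re.compile(r"(\d+|[a-z]+|\.)")


@total_ordering
class LooseVersion:
    """Illustrative loose version parser (not the teuthology class)."""

    def __init__(self, vstring: str):
        self.vstring = vstring
        parts = [p for p in _COMPONENT_RE.split(vstring) if p and p != "."]
        # Numeric components become ints so that 10 > 2 compares correctly.
        self.version = [int(p) if p.isdecimal() else p for p in parts]

    def _key(self):
        # Assumption: ints sort before strings when the types differ.
        return [(0, p, "") if isinstance(p, int) else (1, 0, p)
                for p in self.version]

    @staticmethod
    def _coerce(other):
        return LooseVersion(other) if isinstance(other, str) else other

    def __eq__(self, other):
        return self._key() == self._coerce(other)._key()

    def __lt__(self, other):
        return self._key() < self._coerce(other)._key()


print(LooseVersion("10.2.2-63-g8542898-1trusty").version)
```

Comparing component lists instead of raw strings is what makes `LooseVersion("10.2.10") > LooseVersion("10.2.2")` hold, whereas plain string comparison would get it wrong.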
4 months ago lock.ops.unlock_one_safe: Invert run-match logic 2036/head
Zack Cerza [Wed, 19 Mar 2025 18:35:11 +0000 (12:35 -0600)]
lock.ops.unlock_one_safe: Invert run-match logic

When unlock_one_safe is called with run_name, the caller means to express
"unlock this node if it belongs to this run".
When it is called with run_name and job_id, it means "unlock this node if it
belongs to this job in this run".
We had inverted the logic, causing leaks on reimage failures.

Signed-off-by: Zack Cerza <zack@cerza.org>
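
The intended run-match rule from the commit message above can be written as a small predicate. The function and parameter names here are hypothetical; how `unlock_one_safe` actually learns which run and job hold a node is not shown in this log.

```python
from typing import Optional


def belongs_to(node_run: str, node_job: Optional[str],
               run_name: str, job_id: Optional[str] = None) -> bool:
    """Return True when the node may be unlocked on behalf of the caller.

    Hypothetical sketch: `node_run` and `node_job` stand for whatever
    the lock server records about the node's current owner.
    """
    if node_run != run_name:
        # The node belongs to a different run: leave it locked.
        return False
    if job_id is None:
        # Only a run was given: any job in that run matches.
        return True
    # Run and job were given: both must match.
    return node_job == job_id


print(belongs_to("run-a", "17", "run-a"))
print(belongs_to("run-a", "17", "run-a", job_id="18"))
```

With the logic inverted, a node owned by the calling run would be skipped instead of unlocked, which is consistent with the leaks on reimage failures that the commit describes.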
4 months ago Merge pull request #2110 from Matan-B/wip-matanb-pexec-logs
Matan Breizman [Thu, 20 Nov 2025 08:48:57 +0000 (10:48 +0200)]
Merge pull request #2110 from Matan-B/wip-matanb-pexec-logs

teuthology/task/pexec.py: add logs to command executed

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
5 months ago teuthology/task/pexec.py: add logs to command executed wip-matanb-pexec-logs 2110/head
Matan Breizman [Wed, 19 Nov 2025 13:49:36 +0000 (15:49 +0200)]
teuthology/task/pexec.py: add logs to command executed

The current output by pexec is:
```
INFO:teuthology.run_tasks:Running task pexec...
INFO:teuthology.task.pexec:Executing custom commands...
INFO:teuthology.task.pexec:Running commands on host ubuntu@smithi012.front.sepia.ceph.com
DEBUG:teuthology.orchestra.run.smithi012:> TESTDIR=/home/ubuntu/cephtest bash -s
```

The output should include the actual command executed, similar to
exec.py:

```
INFO:teuthology.run_tasks:Running task exec...
INFO:teuthology.task.exec:Executing custom commands...
INFO:teuthology.task.exec:Running commands on role client.0 host ubuntu@smithi168.front.sepia.ceph.com
DEBUG:teuthology.orchestra.run.smithi168:> sudo TESTDIR=/home/ubuntu/cephtest bash -c 'sudo ceph osd pool create low_tier 4'
```

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
5 months ago Merge pull request #2107 from deepssin/fix-gevent-loopexit-dispatcher-crash
Zack Cerza [Tue, 18 Nov 2025 00:04:46 +0000 (17:04 -0700)]
Merge pull request #2107 from deepssin/fix-gevent-loopexit-dispatcher-crash

Fix dispatcher crash on gevent LoopExit exceptions