git.apps.os.sepia.ceph.com Git - teuthology.git/log
Zack Cerza [Wed, 27 Dec 2023 17:44:55 +0000 (10:44 -0700)]
Merge pull request #1906 from ceph/kill-unbound
kill.kill_processes: Fix possibly-unbound variables
Zack Cerza [Wed, 20 Dec 2023 23:19:10 +0000 (16:19 -0700)]
kill.kill_processes: Fix possibly-unbound variables
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 20 Dec 2023 22:41:10 +0000 (15:41 -0700)]
Merge pull request #1903 from ceph/wip-package-queries
suite: Improve package query caching
Zack Cerza [Wed, 20 Dec 2023 22:39:42 +0000 (15:39 -0700)]
Merge pull request #1900 from ceph/systemd
Add systemd units for exporter and dispatcher
Zack Cerza [Wed, 29 Nov 2023 23:34:51 +0000 (16:34 -0700)]
run.util.find_git_parents: Drop refresh()
This takes a long time, and can time out. The mirror is updated every ten
minutes automatically.
Signed-off-by: Zack Cerza <zack@redhat.com>
kyr [Sun, 10 Dec 2023 17:25:23 +0000 (18:25 +0100)]
Merge pull request #1896 from ceph/dependabot/pip/urllib3-1.26.18
build(deps): bump urllib3 from 1.26.6 to 1.26.18
dependabot[bot] [Sun, 10 Dec 2023 16:21:41 +0000 (16:21 +0000)]
build(deps): bump urllib3 from 1.26.6 to 1.26.18
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.6 to 1.26.18.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.26.6...1.26.18)
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Zack Cerza [Wed, 29 Nov 2023 18:55:28 +0000 (11:55 -0700)]
run: Fix some pyright errors
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 29 Nov 2023 00:27:04 +0000 (17:27 -0700)]
orchestra.opsys: Add some newer OS codenames
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 29 Nov 2023 00:25:13 +0000 (17:25 -0700)]
tests: Remove some gitbuilder-related tests
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 22 Nov 2023 01:50:01 +0000 (18:50 -0700)]
Make logs slightly quieter during scheduling
Particularly in non-verbose mode.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 22 Nov 2023 01:54:08 +0000 (18:54 -0700)]
repo_utils.ls_remote: Memoize
Signed-off-by: Zack Cerza <zack@redhat.com>
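A minimal sketch of what memoizing a remote lookup can look like, assuming `functools.lru_cache` as the mechanism (the commit does not say which one it uses). The `ls_remote` below is a hypothetical stand-in that records calls instead of contacting a git server; the URL and return value are made up for illustration.

```python
import functools

calls = []  # records each simulated remote lookup


@functools.lru_cache(maxsize=None)
def ls_remote(url: str, ref: str) -> str:
    """Hypothetical stand-in for a slow `git ls-remote` query."""
    calls.append((url, ref))
    return f"sha-for-{ref}"


# The second call with identical arguments is served from the cache.
first = ls_remote("https://example.invalid/repo.git", "main")
second = ls_remote("https://example.invalid/repo.git", "main")
```

A cached result can go stale if the remote ref moves during a long-running process, so this trade-off suits short-lived scheduling runs best.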
Zack Cerza [Wed, 22 Nov 2023 01:25:56 +0000 (18:25 -0700)]
suite: Improve package query caching
We had our own "system" for caching, but it had the unfortunate characteristic
of being a big bowl of spaghetti. While eating said pasta I also noticed we
had two competing "distro defaults" concepts - so that let me delete even more
code. Yum!
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 29 Nov 2023 17:23:58 +0000 (10:23 -0700)]
Merge pull request #1899 from ceph/kill-proc-perms
Dan Mick [Tue, 28 Nov 2023 23:28:51 +0000 (15:28 -0800)]
Merge pull request #1902 from ceph/dispatcher-quiet
dispatcher: Don't spam the journal
Zack Cerza [Mon, 27 Nov 2023 23:25:30 +0000 (16:25 -0700)]
Merge pull request #1792 from VallariAg/unittest-xml-scanner
orch/run: Add unit test xml scanner
Vallari Agrawal [Fri, 27 Oct 2023 08:58:18 +0000 (14:28 +0530)]
util/scanner: add UnitTestScanner.num_of_total_failures
In UnitTestScanner's final error message, add total count of failures
before the first error occurrence, like "(total x failed) <message>".
Another minor change: add "..." if the failure reason is more than 200 chars.
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
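The message format described above can be sketched as follows. This is an illustration only: `format_failure` and `MAX_REASON_CHARS` are hypothetical names, not the scanner's actual API.

```python
MAX_REASON_CHARS = 200  # truncation threshold named in the commit message


def format_failure(total_failed: int, reason: str) -> str:
    """Prefix the first failure reason with the total failure count."""
    if len(reason) > MAX_REASON_CHARS:
        # Keep the first 200 characters and mark the cut with "...".
        reason = reason[:MAX_REASON_CHARS] + "..."
    return f"(total {total_failed} failed) {reason}"
```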
Vallari Agrawal [Tue, 19 Sep 2023 15:04:28 +0000 (20:34 +0530)]
add utils/tests/test_scanner.py
and test_run_unit_test in test_remote.py
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
Vallari Agrawal [Sat, 1 Oct 2022 11:16:52 +0000 (16:46 +0530)]
add Scanner, UnitTestScanner, ValgrindScanner
1. add 'run_unit_test' to Remote
2. create util/scanner.py
3. new exception: UnitTestError
4. add `lxml` dependency in setup.cfg
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
Zack Cerza [Fri, 10 Nov 2023 22:24:21 +0000 (15:24 -0700)]
kill: Don't unlock nodes if killing procs fails
... so that we don't unlock nodes while their jobs are running.
Signed-off-by: Zack Cerza <zack@redhat.com>
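A rough sketch of the ordering this commit describes, assuming the two steps are passed in as callables. `kill_and_unlock` and its parameters are hypothetical names used only to show the control flow.

```python
def kill_and_unlock(kill_processes, unlock_nodes) -> bool:
    """Unlock nodes only after process cleanup succeeds."""
    try:
        kill_processes()
    except Exception as exc:
        # Jobs may still be running, so leave the nodes locked.
        print(f"not unlocking nodes: {exc}")
        return False
    unlock_nodes()
    return True
```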
Zack Cerza [Fri, 17 Nov 2023 20:48:38 +0000 (13:48 -0700)]
supervisor: Drop job output
It gets logged to its own file in the job archive.
Signed-off-by: Zack Cerza <zack@redhat.com>
Dan Mick [Fri, 17 Nov 2023 20:43:33 +0000 (12:43 -0800)]
Merge pull request #1901 from ceph/fog-debug-quieter
fog: Drop request debug logging
Zack Cerza [Fri, 17 Nov 2023 20:34:21 +0000 (13:34 -0700)]
dispatcher: Drop supervisor output
It gets logged to its own file in the job archive.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 17 Nov 2023 20:22:58 +0000 (13:22 -0700)]
fog: Drop request debug logging
It's too noisy.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 15 Nov 2023 20:03:25 +0000 (13:03 -0700)]
Add systemd units for exporter and dispatcher
These are copies of what is currently in use in sepia.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 26 Oct 2023 19:05:51 +0000 (13:05 -0600)]
Merge pull request #1892 from ceph/devstack-simplified
Add containers/teuthology-dev
Zack Cerza [Tue, 26 Sep 2023 21:32:25 +0000 (14:32 -0700)]
Add containers/teuthology-dev
This is nearly identical to docs/docker-compose/teuthology, but with
some changes to better work with ceph-devstack. The bits in
docs/docker-compose should be able to be adapted easily to work with
this container.
Signed-off-by: Zack Cerza <zack@redhat.com>
Patrick Donnelly [Tue, 17 Oct 2023 19:38:56 +0000 (15:38 -0400)]
Merge PR #1895 into main
* refs/pull/1895/head:
install/bin/stdin-killer: macOS (Darwin) compatibility
Reviewed-by: Zack Cerza <zack@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zack Cerza [Tue, 17 Oct 2023 16:16:46 +0000 (10:16 -0600)]
Merge pull request #1894 from VallariAg/fix-readthedocs-builds
fix readthedocs PR builds
Vallari Agrawal [Tue, 17 Oct 2023 09:16:42 +0000 (14:46 +0530)]
readthedocs: fix 'The configuration key "build.image" is deprecated'
Builds are failing because readthedocs has fully removed support for the
deprecated "build.image" key; use "build.os" instead.
ref: https://blog.readthedocs.com/use-build-os-config/
error: https://readthedocs.org/projects/teuthology/builds/22250705/
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
Leonid Usov [Tue, 17 Oct 2023 10:37:27 +0000 (13:37 +0300)]
install/bin/stdin-killer: macOS (Darwin) compatibility
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Josh Durgin [Wed, 11 Oct 2023 16:42:32 +0000 (09:42 -0700)]
Merge pull request #1887 from ceph/paramiko-eoferror
orchestra: Tolerate EOFError during connect
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Zack Cerza [Tue, 5 Sep 2023 23:29:55 +0000 (17:29 -0600)]
Merge pull request #1888 from ceph/keyscan-tweak
Zack Cerza [Tue, 5 Sep 2023 18:42:46 +0000 (12:42 -0600)]
misc._ssh_keyscan: Sort keys before returning any
ssh-keyscan's output is unsorted, so this function wasn't deterministic.
Signed-off-by: Zack Cerza <zack@redhat.com>
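To illustrate why sorting matters here: if a function keeps one key per host and the scan output arrives in arbitrary order, the chosen key can differ between runs. The sketch below assumes "host keytype key" lines and a first-match-per-host policy; `collect_keys` is a hypothetical helper, not the real `misc._ssh_keyscan`.

```python
def collect_keys(keyscan_output: str) -> dict:
    """Pick one key per host, deterministically."""
    keys = {}
    # Sorting first makes the result independent of output order.
    for line in sorted(keyscan_output.splitlines()):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, _, key = line.partition(" ")
        if key:
            keys.setdefault(host, key)  # first sorted entry wins
    return keys
```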
Zack Cerza [Thu, 31 Aug 2023 18:10:09 +0000 (11:10 -0700)]
orchestra: Move connection exception handling
... to inside the retry loop. Also, add an increment to the safe_while
instance we use.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 31 Aug 2023 17:34:41 +0000 (10:34 -0700)]
orchestra: Treat EOFError as SSHException
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 30 Aug 2023 17:34:16 +0000 (11:34 -0600)]
Merge pull request #1886 from ceph/update-paramiko
Kamoltat (Junior) Sirivadhna [Wed, 30 Aug 2023 15:59:27 +0000 (11:59 -0400)]
Merge pull request #1884 from kamoltat/wip-ksirivad-fix-62445
teuthology/scrape: Fix bad backtrace parsing in Teuthology.log
Reviewed-by: Zack Cerza <zcerza@redhat.com>
Kamoltat (Junior) Sirivadhna [Wed, 30 Aug 2023 15:44:58 +0000 (11:44 -0400)]
Merge pull request #1885 from ceph/fix-docs-build
Fix docs build
Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
Zack Cerza [Mon, 28 Aug 2023 20:07:19 +0000 (14:07 -0600)]
Update paramiko
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 23 Aug 2023 19:24:56 +0000 (13:24 -0600)]
tox: Avoid buggy sphinx versions
See https://github.com/ceph/teuthology/pull/1884
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 23 Aug 2023 19:21:43 +0000 (13:21 -0600)]
setup.cfg: Drop license_file
It's deprecated in favor of `license_files`, but the default value is
sufficient.
Signed-off-by: Zack Cerza <zack@redhat.com>
Kamoltat Sirivadhna [Thu, 17 Aug 2023 16:28:00 +0000 (12:28 -0400)]
teuthology/scrape: Fix bad backtrace parsing in Teuthology.log
Problem:
- A confusing warning message stated that
the backtrace is malformed
- We kept adding to the backtrace buffer
even when we exceeded `MAX_BT_LINES`
Solution:
- Correct the warning message to be
"Ignoring backtrace that exceeds MAX_BT_LINES"
- Reset the buffer once we exceed MAX_BT_LINES
- Add some cases where we detect the start/end of a backtrace.
Fixes: https://tracker.ceph.com/issues/62445
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
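The buffer handling described above can be sketched roughly like this. The `BEGIN`/`END` markers, the small `MAX_BT_LINES` value, and the `collect_backtraces` function are all assumptions made for the demonstration; the real scraper detects backtrace boundaries from log content.

```python
MAX_BT_LINES = 4  # deliberately small for the demo; the real limit may differ


def collect_backtraces(lines, start="BEGIN", end="END"):
    """Collect backtraces, dropping any that exceed MAX_BT_LINES."""
    backtraces, warnings = [], []
    buf, in_bt = [], False
    for line in lines:
        if line == start:
            buf, in_bt = [], True
        elif line == end and in_bt:
            backtraces.append(buf)
            buf, in_bt = [], False
        elif in_bt:
            buf.append(line)
            if len(buf) > MAX_BT_LINES:
                warnings.append("Ignoring backtrace that exceeds MAX_BT_LINES")
                # Reset the buffer instead of letting it keep growing.
                buf, in_bt = [], False
    return backtraces, warnings
```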
Dan Mick [Tue, 15 Aug 2023 19:40:36 +0000 (12:40 -0700)]
Merge pull request #1883 from ceph/nuke-desc-typeerror
nuke: Avoid a TypeError w/ null node description
Zack Cerza [Tue, 15 Aug 2023 18:05:38 +0000 (12:05 -0600)]
nuke: Avoid a TypeError w/ null node description
This avoids a `TypeError: argument of type 'NoneType' is not iterable`
when nuking a node whose description is None.
ex: https://sentry.ceph.com/share/issue/91172146663f4c71a6cbfe43725b2e07/
Signed-off-by: Zack Cerza <zack@redhat.com>
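For context, this error comes from using `in` against `None`. One common guard, shown here as a sketch and not necessarily the fix the commit applied, is to fall back to an empty string; `description_contains` is a hypothetical helper.

```python
def description_contains(description, needle: str) -> bool:
    """Substring check that tolerates a missing (None) description."""
    return needle in (description or "")


# Without the guard, the membership test itself raises TypeError.
try:
    "job" in None
    raised = False
except TypeError:
    raised = True
```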
Dan Mick [Mon, 14 Aug 2023 20:34:08 +0000 (13:34 -0700)]
Merge pull request #1882 from ceph/sentry-reimage-taskname
supervisor.reimage(): Improve Sentry reporting
Zack Cerza [Mon, 14 Aug 2023 18:48:47 +0000 (12:48 -0600)]
supervisor.reimage(): Improve Sentry reporting
Set the `task` tag value to 'reimage' when reporting reimage failures to
Sentry, to make searching for them in its UI easier.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 4 Aug 2023 17:56:20 +0000 (11:56 -0600)]
Merge pull request #1881 from ceph/stdin-killer-setpgrp
stdin-killer: do not setpgrp if already leader
Patrick Donnelly [Fri, 4 Aug 2023 13:17:28 +0000 (09:17 -0400)]
stdin-killer: do not setpgrp if already leader
Fixes failure like:
2023-08-03T19:40:10.942 INFO:teuthology.orchestra.run.smithi100.stderr:Traceback (most recent call last):
2023-08-03T19:40:10.942 INFO:teuthology.orchestra.run.smithi100.stderr: File "/usr/bin/stdin-killer", line 213, in <module>
2023-08-03T19:40:10.943 INFO:teuthology.orchestra.run.smithi100.stderr: os.setpgrp()
2023-08-03T19:40:10.943 INFO:teuthology.orchestra.run.smithi100.stderr:PermissionError: [Errno 1] Operation not permitted
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
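A small sketch of the guard implied by the commit title, for POSIX systems only. A session leader is also a process group leader and cannot call `setpgrp()` successfully, so the sketch checks leadership first; `ensure_own_process_group` is a hypothetical name, and the actual check in stdin-killer may be written differently.

```python
import os


def ensure_own_process_group() -> int:
    """Become a process group leader unless we already are one."""
    if os.getpgrp() != os.getpid():
        # Not a leader yet, so creating a new group is permitted.
        os.setpgrp()
    return os.getpgrp()
```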
Zack Cerza [Wed, 2 Aug 2023 20:25:13 +0000 (14:25 -0600)]
Merge pull request #1880 from ceph/wip-62286
Zack Cerza [Wed, 2 Aug 2023 18:03:21 +0000 (12:03 -0600)]
PhysicalConsole: Tolerate invalid UTF-8 characters
... in pexpect.spawn() calls.
Fixes: https://tracker.ceph.com/issues/62286
Signed-off-by: Zack Cerza <zack@redhat.com>
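The underlying issue can be shown with the standard library alone: strict UTF-8 decoding raises on invalid bytes, while a lenient error handler substitutes a replacement character. To my knowledge `pexpect.spawn()` accepts a `codec_errors` argument for the same purpose, but which handler this commit chose is not stated here, so `"replace"` below is an assumption.

```python
raw = b"login: \xff\xfe ok"  # console output containing invalid UTF-8

try:
    raw.decode("utf-8")  # strict decoding is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A lenient handler swaps each bad byte for U+FFFD instead of raising.
text = raw.decode("utf-8", errors="replace")
```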
Zack Cerza [Wed, 2 Aug 2023 17:04:21 +0000 (11:04 -0600)]
PhysicalConsole.check_status(): Use log.exception
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 2 Aug 2023 16:33:38 +0000 (10:33 -0600)]
Merge pull request #1845 from NitzanMordhai/wip-nitzan-correct-typo-osd-default-pool-size
Dan Mick [Tue, 1 Aug 2023 00:51:49 +0000 (17:51 -0700)]
Merge pull request #1877 from ceph/sentry-ae
supervisor: Fix an AttributeError in reimage()
Zack Cerza [Mon, 31 Jul 2023 23:31:43 +0000 (17:31 -0600)]
supervisor: Fix an AttributeError in reimage()
Signed-off-by: Zack Cerza <zack@redhat.com>
Dan Mick [Fri, 28 Jul 2023 23:08:05 +0000 (16:08 -0700)]
Merge pull request #1859 from ceph/quiet-urllib3
Turn down logging for urllib3.util.retry
Dan Mick [Fri, 28 Jul 2023 21:52:06 +0000 (14:52 -0700)]
Merge pull request #1875 from ceph/reimage-errs-sentry
Report reimage failures to Sentry
Dan Mick [Fri, 28 Jul 2023 21:02:07 +0000 (14:02 -0700)]
Merge pull request #1876 from ceph/afa-sort
task.ansible.FailureAnalyzer: Sort failure items
Zack Cerza [Fri, 28 Jul 2023 19:22:52 +0000 (13:22 -0600)]
task.ansible.FailureAnalyzer: Sort failure items
To reduce unnecessary duplication in e.g. Sentry.
Signed-off-by: Zack Cerza <zack@redhat.com>
Dan Mick [Thu, 27 Jul 2023 21:56:28 +0000 (14:56 -0700)]
Merge pull request #1874 from ceph/fix-fog-timeout
fog: Fix a connection timeout bug
Dan Mick [Thu, 27 Jul 2023 21:55:36 +0000 (14:55 -0700)]
Merge pull request #1873 from ceph/console-log
orchestra.console: Scope loggers to shortname
Zack Cerza [Thu, 27 Jul 2023 17:49:16 +0000 (11:49 -0600)]
supervisor.reimage: Report failures to Sentry
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 27 Jul 2023 17:25:23 +0000 (11:25 -0600)]
Move Sentry reporting logic to utils
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 27 Jul 2023 17:42:46 +0000 (11:42 -0600)]
FOG._wait_for_ready(): Catch ConnectionErrors
Instead of just ConnectionResetErrors, which inherit from
ConnectionError
Signed-off-by: Zack Cerza <zack@redhat.com>
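Because `ConnectionResetError`, `ConnectionRefusedError`, and `ConnectionAbortedError` all subclass the built-in `ConnectionError`, a single `except ConnectionError` covers them. A minimal sketch, with `attempt` as a hypothetical wrapper rather than the real `_wait_for_ready()`:

```python
def attempt(action):
    """Run action, reporting any connection-level failure as retryable."""
    try:
        return action()
    except ConnectionError as exc:
        # Covers reset, refused, and aborted connections alike.
        return f"retry after {type(exc).__name__}"


def reset():
    raise ConnectionResetError("peer reset")


def refused():
    raise ConnectionRefusedError("nobody listening")
```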
Zack Cerza [Thu, 27 Jul 2023 17:41:11 +0000 (11:41 -0600)]
remote: Raise ConnectionError when appropriate
Instead of just Exception.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 27 Jul 2023 16:24:25 +0000 (10:24 -0600)]
orchestra.console: Scope loggers to shortname
This will make reading console debug logging easier.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Mon, 24 Jul 2023 19:38:11 +0000 (13:38 -0600)]
Merge pull request #1871 from ceph/sentry-ansible
Zack Cerza [Mon, 24 Jul 2023 17:22:51 +0000 (11:22 -0600)]
exceptions.AnsibleFailedError: Add fingerprint()
This will cause Sentry to group events by their failure reasons, rather
than lumping all AnsibleFailedErrors together
Signed-off-by: Zack Cerza <zack@redhat.com>
Dan Mick [Sat, 22 Jul 2023 03:42:43 +0000 (20:42 -0700)]
Merge pull request #1869 from dmick/wip-pexpect
orchestra/console: log output from pexpect commands
Dan Mick [Thu, 20 Jul 2023 02:45:11 +0000 (19:45 -0700)]
orchestra/console: log output from pexpect commands
In case ipmitool notices and reports anything unusual, try to display
whatever it says.
Signed-off-by: Dan Mick <dmick@redhat.com>
Dan Mick [Sat, 22 Jul 2023 00:35:03 +0000 (17:35 -0700)]
Merge pull request #1870 from ceph/pyyaml-fix
Pin PyYAML to fix CI breakage
Zack Cerza [Fri, 21 Jul 2023 19:53:29 +0000 (13:53 -0600)]
Update pip-tools
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 21 Jul 2023 19:49:52 +0000 (13:49 -0600)]
Exclude PyYAML 5.4.0,5.4.1
See https://github.com/yaml/pyyaml/issues/601
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 18 Jul 2023 17:48:15 +0000 (11:48 -0600)]
Merge pull request #1866 from ceph/aa-fix
Zack Cerza [Fri, 14 Jul 2023 20:58:51 +0000 (14:58 -0600)]
Merge pull request #1867 from ceph/keyscan-timout
Zack Cerza [Fri, 14 Jul 2023 20:27:32 +0000 (14:27 -0600)]
misc.ssh_keyscan: Always retry, and retry more
We started seeing reimage failures with errors like:
"teuthology.exceptions.MaxWhileTries: 'ssh_keyscan $host' reached
maximum tries (6) after waiting for 5 seconds"
Let's be quite a bit more generous.
Signed-off-by: Zack Cerza <zack@redhat.com>
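The retry budget in that error message (tries multiplied by sleep) can be pictured with a plain loop. This is a generic sketch, not teuthology's `safe_while`; `retry`, `MaxTries`, and the injectable `sleeper` are hypothetical names.

```python
import time


class MaxTries(Exception):
    """Raised when every attempt has been used up."""


def retry(action, tries=6, sleep=1.0, sleeper=time.sleep):
    """Call action until it returns a truthy value or tries run out."""
    for attempt_no in range(1, tries + 1):
        result = action()
        if result:
            return result
        if attempt_no < tries:
            sleeper(sleep)  # wait before the next attempt
    raise MaxTries(f"reached maximum tries ({tries})")
```

Raising `tries` or `sleep` widens the window in which a slowly booting host can start answering.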
Zack Cerza [Fri, 14 Jul 2023 18:01:35 +0000 (12:01 -0600)]
TestFailureAnalyzer: Add tests for dropped items
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 14 Jul 2023 17:50:24 +0000 (11:50 -0600)]
ansible.FailureAnalyzer: Drop malformed records
If host_obj is the wrong type, we won't be able to extract anything
useful from it. In these cases, we'll end up using the raw string as we
used to do.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 14 Jul 2023 17:38:54 +0000 (11:38 -0600)]
Ansible._handle_failure: YAMLErrors are special
Return to treating them differently, but also continue to catch other
exceptions here.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 14 Jul 2023 17:37:33 +0000 (11:37 -0600)]
ansible.FailureAnalyzer: Look for SSH errors
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 14 Jul 2023 17:17:19 +0000 (11:17 -0600)]
ansible.FailureAnalyzer: items -> values
Signed-off-by: Zack Cerza <zack@redhat.com>
Dan Mick [Thu, 13 Jul 2023 23:33:37 +0000 (16:33 -0700)]
Merge pull request #1865 from ceph/ansible-fail-tolerate-exceptions
Ansible._handle_failure: Catch all Exceptions
Zack Cerza [Thu, 13 Jul 2023 22:58:05 +0000 (16:58 -0600)]
Ansible._handle_failure: Catch all Exceptions
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 13 Jul 2023 00:58:34 +0000 (18:58 -0600)]
Merge pull request #1864 from ceph/analyze-ansible
Zack Cerza [Wed, 5 Jul 2023 21:12:05 +0000 (15:12 -0600)]
ansible: Try to summarize failure logs
The failure logs we capture are sometimes helpful, but are often too
long, too complex, and too noisy to understand. Sentry also struggles to
associate related failures because of the presence of unique data such
as timestamps and URLs.
While I don't see a quick and generic solution to this, there are
several common failure modes that can easily be summarized. This commit
begins that work by looking for errors caused by network outages.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 5 Jul 2023 23:07:32 +0000 (17:07 -0600)]
test_ansible: Use mock_open()
This provides a more complete interface than what we were constructing.
Signed-off-by: Zack Cerza <zack@redhat.com>
Dan Mick [Sat, 1 Jul 2023 00:05:18 +0000 (17:05 -0700)]
Merge pull request #1863 from ceph/fog-wfr-eoferror
FOG._wait_for_ready: Tolerate EOFError
Zack Cerza [Fri, 30 Jun 2023 22:11:14 +0000 (16:11 -0600)]
FOG._wait_for_ready: Tolerate EOFError
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 30 Jun 2023 19:19:08 +0000 (13:19 -0600)]
Merge pull request #1862 from ceph/retry-sentinel-connreset
Zack Cerza [Fri, 30 Jun 2023 18:29:52 +0000 (12:29 -0600)]
Remote.reconnect(): Use a default timeout of 30s
And rewrite with safe_while.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 29 Jun 2023 19:08:37 +0000 (13:08 -0600)]
FOG._wait_for_ready: Tolerate ConnectionResetError
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 29 Jun 2023 19:02:47 +0000 (13:02 -0600)]
contextutil: Remove leftover print statement
Looks like this was missed during PR submission/review in
8f8d05852
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 28 Jun 2023 17:56:02 +0000 (11:56 -0600)]
Merge pull request #1858 from ceph/exporter-restart
Dan Mick [Tue, 27 Jun 2023 22:10:08 +0000 (15:10 -0700)]
Merge pull request #1860 from ceph/disp-rc
dispatcher: Return the highest of the jobs' RCs
Zack Cerza [Tue, 11 Oct 2022 19:10:53 +0000 (13:10 -0600)]
dispatcher: Return the highest of the jobs' RCs
This is so that ceph-devstack can report job failures
Signed-off-by: Zack Cerza <zack@redhat.com>
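Returning the highest return code means any non-zero job result makes the dispatcher's own exit status non-zero. A sketch under the assumption that return codes are collected as non-negative integers; `overall_return_code` is a hypothetical name.

```python
def overall_return_code(job_return_codes) -> int:
    """Return the highest job return code, or 0 when no jobs ran."""
    return max(job_return_codes, default=0)
```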
Dan Mick [Mon, 26 Jun 2023 23:09:43 +0000 (16:09 -0700)]
Merge pull request #1855 from ceph/ssh-ux
Improve error message when there is no SSH key
Zack Cerza [Mon, 26 Jun 2023 22:54:04 +0000 (16:54 -0600)]
Turn down logging for urllib3.util.retry
This quiets messages like: "Converted retries value: 10 ->
Retry(total=10, connect=None, read=None, redirect=None, status=None)"
Signed-off-by: Zack Cerza <zack@redhat.com>
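Quieting a single noisy logger is usually done by raising its level. The quoted message is emitted at debug level, so any level above `DEBUG` hides it; the choice of `INFO` below is an assumption, since the commit does not state the level it set.

```python
import logging

# Raise the threshold for this one logger so its debug chatter is dropped,
# while other loggers keep their own levels.
retry_logger = logging.getLogger("urllib3.util.retry")
retry_logger.setLevel(logging.INFO)
```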
Zack Cerza [Tue, 13 Jun 2023 23:49:48 +0000 (17:49 -0600)]
exporter: Restart every 24h
A design limitation of prometheus-client's multiprocessing mode is that
each process creates files to store its own metrics; the exporter then
has to read each file, even if the process which created it is dead.
This results in request latency growing over time, to the point of
multiple seconds when the file count gets into the thousands. This
eventually results in prometheus failing to fetch, leaving gaps in our
data.
We can work around this by restarting at a regular interval; 24h seems
like a fine place to start.
Signed-off-by: Zack Cerza <zack@redhat.com>
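One way to picture the workaround is an uptime check that the service's main loop consults before exiting and letting its supervisor start a fresh process. How the exporter actually triggers its restart is not described here, so `should_restart` and `RESTART_INTERVAL` are hypothetical.

```python
RESTART_INTERVAL = 24 * 60 * 60  # 24 hours, in seconds


def should_restart(started_at: float, now: float,
                   interval: float = RESTART_INTERVAL) -> bool:
    """Report whether the process has been up for at least one interval."""
    return now - started_at >= interval
```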
Dan Mick [Fri, 16 Jun 2023 22:48:04 +0000 (15:48 -0700)]
Merge pull request #1856 from ceph/fog-debug
fog: Add more debug logging
Zack Cerza [Fri, 16 Jun 2023 16:24:29 +0000 (10:24 -0600)]
FOG._wait_for_ready(): Use instance logger
Signed-off-by: Zack Cerza <zack@redhat.com>