Ionut Balutoiu [Fri, 17 Feb 2023 11:38:43 +0000 (13:38 +0200)]
utils: fix ssh_exec function
We need to use `return` inside the function to properly return the
exit code of the function. Using `exit` will abruptly exit the
entire script, which is not what we want.
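The difference can be illustrated with a minimal shell sketch. The
function names below are hypothetical stand-ins, not the actual
`ssh_exec` code, which is not shown in this log.

```shell
# Hypothetical stand-ins for ssh_exec; they only forward a command's status.
run_with_return() {
    "$@"
    return $?   # hand the status back to the caller; the script keeps going
}

run_with_exit() {
    "$@"
    exit $?     # terminate the whole shell that called the function
}

status=0
run_with_return false || status=$?
echo "return: status=$status, script still running"

# Call the exit variant in a subshell so that this demo itself survives.
sub_status=0
( run_with_exit false; echo "never printed" ) || sub_status=$?
echo "exit: subshell ended with status=$sub_status"
```

The second `echo` inside the subshell never runs, because `exit` ends
the subshell as soon as the function is called.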
* Rely on the default `$ErrorActionPreference` value (which is `Continue`).
* Add a new `SanitizeName` function, used to sanitize the names of the log files.
The function code existed before, but it was duplicated in the script.
* General PowerShell code cleanup.
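As a rough illustration of what such a name sanitizer might do, here is
a shell sketch. The function name `sanitize_name`, the allowed character
set and the sample input are assumptions; the actual `SanitizeName`
implementation is not shown in this log and may apply different rules.

```shell
# Hypothetical sanitizer: the character set is an assumption, not the
# rule used by the actual SanitizeName function.
sanitize_name() {
    # Replace every character outside [A-Za-z0-9._-] with an underscore.
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Made-up log name containing characters that are awkward in file names.
log_name=$(sanitize_name "Some Provider/Operational")
echo "${log_name}.txt"
```

Keeping this logic in a single helper avoids repeating the same
replacement in every place that builds a log file name.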
Lucian Petrut [Wed, 15 Feb 2023 14:41:26 +0000 (16:41 +0200)]
windows: fix log collection
The "Get-WinEvent" command used to retrieve Windows event log
messages can fail if the specified log has no entries.
We're using the "SilentlyContinue" action to avoid erroring out
in such cases.
However, the script still terminates abruptly while collecting
logs. For this reason, we'll use the "Ignore" error action instead.
We'd rather not have test failures just because we failed to
retrieve some Windows event log entries.
Unlike "SilentlyContinue", "Ignore" doesn't populate the global
$Error variable, which may be used when running the script
remotely.
While at it, we're adding some log messages at the end of the
"run_tests" and "collect-event-logs.ps1" scripts.
Stefan Chivu [Wed, 15 Feb 2023 08:42:08 +0000 (08:42 +0000)]
ceph-windows: Fixed txt event log dump
The event log collection script was throwing an error if no event
logs could be found for the filters applied. Therefore, the
Get-WinEvent call in DumpEventLogTxt has been modified to silently
continue if such a case arises.
Signed-off-by: Stefan Chivu <schivu@cloudbasesolutions.com>
Lucian Petrut [Wed, 8 Feb 2023 11:19:12 +0000 (13:19 +0200)]
increase vm memory for windows test jobs
Windows test jobs use two VMs: a Windows VM that takes 8GB of RAM
and a Linux one that currently uses 32GB of RAM.
We're using memstore with 5GB per OSD. It seems that the Linux VM
is running out of memory, which is why the OSDs can get
terminated while running the tests.
We'll go ahead and increase the Linux VM memory to 64GB.
Ionut Balutoiu [Fri, 3 Feb 2023 08:44:40 +0000 (10:44 +0200)]
ceph-windows: Build Ceph on Windows inside libvirt VM
Make sure that the Ceph on Windows build is done inside an Ubuntu
libvirt VM. This ensures that the build is done in a clean environment
on each job run.
After the Windows build is done, the Ubuntu VM is rebuilt to ensure
that Ceph vstart will have a clean environment as well.
Stefan Chivu [Fri, 27 Jan 2023 14:08:54 +0000 (16:08 +0200)]
ceph-windows: Collect more artifacts
This commit changes the run_tests script in order to include more
logs and overall information as build artifacts, such as Windows
client logs, Windows event logs, the ceph.conf on the client, the
wnbd version and the status of the ceph cluster.
Signed-off-by: Stefan Chivu <schivu@cloudbasesolutions.com>
Stefan Chivu [Tue, 31 Jan 2023 12:08:02 +0000 (14:08 +0200)]
ceph-windows: Added script for collecting Windows event logs
To collect more information for the build artifacts, the
collect-event-logs.ps1 script is run to extract the event logs
from the Windows client machine.
It will dump all the event logs as evtx and then convert them to
txt in order to be accessible on all platforms.
If the -IncludeEvtxFiles flag is used, the evtx files can be kept.
By default they are deleted.
Also, if the -CleanupEventLog flag is used, then the machine's
events will get cleared after the dump. By default they are kept.
The dumped event log files are written to the directory passed
through the mandatory -LogDirectory parameter.
Signed-off-by: Stefan Chivu <schivu@cloudbasesolutions.com>
Lucian Petrut [Wed, 25 Jan 2023 12:52:45 +0000 (14:52 +0200)]
ceph-windows: update test timeouts
The Python rbd-wnbd tests time out on stable branches. The reason
is that if the test script is missing, we're fetching it from
the main branch and then running the tests individually.
The issue is that while on the main branch the entire suite uses
a 30m timeout, on stable branches we're executing each individual
test with a 5m timeout, which isn't enough for some of the tests
(e.g. the FIO one).
For this reason, we're going to increase the timeout.
While at it, we're pinning the commit id when fetching the test
script on older branches. That will allow us to move it or
refactor it.
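A per-test timeout of the kind described above can be sketched with
the coreutils `timeout` command. The wrapper name and the durations
are illustrative assumptions, not what the CI scripts actually use.

```shell
# Hypothetical wrapper: the first argument is the time limit, the rest is
# the test command to run.
run_test_with_timeout() {
    limit=$1
    shift
    # GNU timeout sends SIGTERM once the limit expires and then exits
    # with status 124.
    timeout "$limit" "$@"
}

rc=0
run_test_with_timeout 1s sleep 3 || rc=$?
if [ "$rc" -eq 124 ]; then
    echo "test timed out"
fi
```

A test that needs longer than its limit fails this way even if it
would otherwise pass, which is why a limit that is too short for the
slowest test makes the job fail.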
Due to a regression introduced in tox 4.3.1, we have to pin
the version used in the CI in order to avoid the following error:
```
tox.report.HandledError: replace failed in centos8-filestore-create.commands with MatchRecursionError('circular chain between set env VAGRANT_UP_FLAGS, CEPH_DEV_BRANCH, CEPH_DEV_SHA1')
```
Due to a regression introduced in tox 4.3.1, multiple environment variable substitution
is broken.
```
tox.report.HandledError: replace failed in centos-non_container-update.commands with MatchRecursionError('circular chain between set env DOCKER_HUB_USERNAME, DOCKER_HUB_PASSWORD, INVENTORY')
```
Let's pin to the last stable tox version where it used to work fine (4.2.8).
Lucian Petrut [Thu, 24 Nov 2022 08:01:52 +0000 (10:01 +0200)]
ceph-windows: re-enable rbd-wnbd stamp test
We had to disable one of the rbd-wnbd tests while investigating
the reason why disks started coming up as read-only.
This was actually caused by the wnbd bus type, which changed from
virtual to SAS. The default Windows policy (offlineShared) doesn't
automatically bring such disks online, SAS being considered a
shared bus.
While at it, we're also enabling the new *Fs* tests, which use
a Windows partition instead of the raw block devices.
Ionut Balutoiu [Mon, 31 Oct 2022 16:16:56 +0000 (18:16 +0200)]
ceph-windows: Add more Windows testing
Run the `test_rbd_wnbd.py` script from the upstream Ceph QA Windows
workunit scripts as part of `script/ceph-windows/run_tests`.
The new test cases don't take long to execute (unless something is
wrong with Ceph on Windows or Linux), so the jobs' execution time
won't increase noticeably.
This adds more Windows tests to the Jenkins jobs:
* `ceph-windows-test`
* `ceph-windows-pull-requests`
* Bump Python to `3.11`.
* Add `fio` as part of Windows CI image setup.
* Add `prettytable` pip dependency. We'll use this for the Python
`test_rbd_wnbd.py` script.
```
ERROR! the playbook: /home/jenkins-build/build/workspace/ceph-volume-prs-lvm-centos8-bluestore-dmcrypt/tests/functional/collect-logs.yml could not be found
```
Since `collect-logs.yml` is maintained in the ceph-ansible repository,
when `collect_ceph_logs` is called from a job other than
ceph-ansible, the path is incorrect.
The idea is to make ceph-volume jobs override this path.
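One way such an override could look is sketched below. The
`COLLECT_LOGS_PLAYBOOK` variable, the helper name and the override
path are hypothetical; the default only mirrors the relative path seen
in the error above.

```shell
# Hypothetical variable and helper names; not the actual job code.
collect_logs_playbook() {
    # Use the caller's override when set, otherwise fall back to the
    # path that is only valid inside a ceph-ansible checkout.
    printf '%s\n' "${COLLECT_LOGS_PLAYBOOK:-tests/functional/collect-logs.yml}"
}

collect_logs_playbook    # no override: prints the default path

# A job other than ceph-ansible points at its own copy (made-up path).
COLLECT_LOGS_PLAYBOOK=/path/to/ceph-ansible/tests/functional/collect-logs.yml
collect_logs_playbook    # prints the overridden path
```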
update ceph-volume nightly jobs.
drop nautilus testing as it has been EOL for a while.
drop testing against el7
drop testing against xenial (replace with focal)
test against quincy and pacific
The previous commit installed the gperftools-devel package and set the
cmake ALLOCATION variable to tcmalloc but this isn't used at all during
the RPM package build via mock.
We also need those changes in the RPM spec file.
This is a temporary solution until the upstream nfs-ganesha RPM spec file
supports such a change.