Dan Mick [Sat, 9 Dec 2023 00:53:41 +0000 (16:53 -0800)]
ceph-setup, ceph-tag: pin ansible to < 9
This is a temporary hack until we have time to properly test
changing include to include_tasks, required by ansible 9 (include
has been deprecated for some time, and now it's officially gone)
Zack Cerza [Mon, 13 Nov 2023 19:12:39 +0000 (12:12 -0700)]
Remove teuthology-pull-requests
This job represents a small subset of what we test with GitHub Actions. It's
also unreliable, for reasons including - but not limited to - the fact that
it's not possible to request an agent with a particular OS version.
Dan Mick [Thu, 26 Oct 2023 00:20:44 +0000 (17:20 -0700)]
ceph-build-pull-requests: rather than ansible, pin ansible-core
Last change pinned ansible to 8.4.0 to work around a bug. The
buggy code is actually in ansible-core, so pin that instead.
See https://github.com/ceph/ceph-build/pull/2173
This step is CPU intensive, but it's very hard to tell what was going on
on the worker node at the time and what made it slow. As kernel builds
are pretty infrequent, just bump the timeout drastically.
Ilya Dryomov [Wed, 11 Oct 2023 10:57:15 +0000 (12:57 +0200)]
kernel: unbreak RPM builds
A bunch of kbuild changes went into 6.6-rc1. In particular, the
location of rpmbuild structure has changed and BuildRequires clauses
are now applied universally so we need dwarves to be installed.
ceph-dev-new-setup: enable debug options for dev builds
Note: it has been this way since at least 35e1a715. It's difficult to tell when
or even if ceph was ever properly built with debugging configurations for QA as
there are corresponding changes in ceph with the switch to cmake which makes
this challenging to evaluate.
It's likely that it was wrongly assumed that cmake would set the build type to
Debug because the ".git" directory would be present. This is not the case
because the "make-dist" script (executed below) creates a git tarball that is
used for the actual untar/build. See also:
https://github.com/ceph/ceph/pull/53800
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
windows: add Ceph dir to Windows Defender exclusion list
We're getting occasional false positives from Windows Defender.
For this reason, we'll add the Ceph dir to the exclusion list.
ERROR - Test exception: [WinError 225] Operation did not complete
successfully because the file contains a virus or potentially unwanted
software. Total exceptions: 22
Windows Defender Antivirus has taken action to protect this machine from
malware or other potentially unwanted software.
For more information please see the following:
https://go.microsoft.com/fwlink/?linkid=37020&name=Trojan:Win32/Bearfoos.A!ml&threatid=2147731250&enterprise=0
Name: Trojan:Win32/Bearfoos.A!ml
ID: 2147731250
Severity: Severe
Category: Trojan
Path: file:_C:\ceph\rbd-wnbd.exe
Detection Origin: Local machine
Detection Type: Concrete
Detection Source: Real-Time Protection
User: NT AUTHORITY\SYSTEM
Process Name: C:\Program Files\Python311\python.exe
Action: Quarantine
Action Status: No additional actions required
Error Code: 0x80508023
Error description: The program could not find the malware and other
potentially unwanted software on this device.
Laura Flores [Wed, 16 Aug 2023 20:42:50 +0000 (15:42 -0500)]
scripts: lower concurrency for many-core machines
adami nodes have 96 processing units (nproc). Based on the old script
logic, we were compiling 90 build jobs with 96 processing units. This
combination (96/90) ends up causing many instances of memory
overconsumption on adami nodes.
With this new logic, we take into account that jobs take up more memory
than we expect. This will make it so adami nodes will compile 67 build
jobs on 96 processing units, which will hopefully avoid so many instances
of memory overconsumption. (Total memory on adami nodes is generally
~270036 MiB, so 270036 / 4000 = 67)
braggi nodes do not have this problem; they have 48 processing units.
It has been working so far to compile 48 build jobs with 48 processing
units on the braggi nodes, and this new logic will not change that or
any other node with nproc <= 50.
Fixes: https://tracker.ceph.com/issues/57296
Signed-off-by: Laura Flores <lflores@ibm.com>
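A minimal sketch of the arithmetic described above, not the actual script.
It assumes the job count is capped by both nproc and total memory divided
by a per-job estimate. The 3000 MiB figure for the old logic is inferred
from 270036 / 3000 = 90, and the nproc > 50 cutoff is an assumption based
on the note that nodes with nproc <= 50 are unaffected. The function name
is hypothetical.
```python
def build_jobs(nproc: int, total_mem_mib: int) -> int:
    """Pick a build job count bounded by CPU count and available memory."""
    # Assumption: many-core machines budget 4000 MiB per job, others keep
    # the older, inferred estimate of 3000 MiB per job.
    mem_per_job_mib = 4000 if nproc > 50 else 3000
    max_by_memory = total_mem_mib // mem_per_job_mib
    # Never return fewer than one job, even on a very small machine.
    return max(1, min(nproc, max_by_memory))


# adami: 96 processing units, ~270036 MiB of memory
adami_jobs = build_jobs(nproc=96, total_mem_mib=270036)
print(adami_jobs)  # 67
```
The braggi memory size is not stated here, so that case is left out of the
example; with nproc = 48 the result stays at 48 as long as memory allows
at least 48 jobs at the per-job estimate.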
Dan Mick [Wed, 9 Aug 2023 02:05:25 +0000 (19:05 -0700)]
Standardize on "label" name for parameter
The Jenkins UI (or at least the Python binding) uses "label", not
"labels" (for reconfig only, because it uses XML, and the XML contains
the tag 'label'. Sadly that's not true for create.)
Dan Mick [Wed, 19 Jul 2023 20:31:31 +0000 (13:31 -0700)]
ansible: use 'inventory_hostname' to look up jenkins labels
inventory_hostname is as it appears in the inventory file, and keeping
that consistent is easier than dealing with short vs long names
Note: this requires a change to ceph-sepia-secrets to standardize on
the inventory form of the name
Looks like this job has never worked.
The file should be bind-mounted in /code, not in /
There's no need to pass arguments; the entrypoint automatically
scans /code and runs flake8 against any files present.
Lucian Petrut [Wed, 2 Aug 2023 08:01:58 +0000 (08:01 +0000)]
windows: log available disk space
Some Windows test jobs have failed after the OSDs ran out of disk
space. Those jobs use Linux VMs with 128 GB of storage and run
3 OSDs, each having 15 GB.
In order to get a better picture, we'll log the disk space usage,
including the vstart directory contents.
Dan Mick [Tue, 25 Jul 2023 02:35:26 +0000 (19:35 -0700)]
ansible: remove vault_password_file; jenkins-build doesn't have one
Jenkins also uses ansible for some builds, and this config item breaks
the run, because although the ansible secret isn't needed, if you configure
it, ansible requires it. <sigh>
Dan Mick [Fri, 21 Jul 2023 02:43:05 +0000 (19:43 -0700)]
jenkins-job-builder: don't delete jobs named 'preserve-*'
This is to allow job development without losing the jobs and past builds
as other ceph-build PRs are pushed. It's assumed that if you create a job
with 'preserve-' in its title that you will clean it up when you're done.
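A rough sketch of the filtering idea, not the job's actual code. It assumes
the cleanup step compares the jobs that exist on the server with the jobs
defined in the repo, and that "preserve-*" is matched as a name prefix.
The function name and the example job names are hypothetical.
```python
from fnmatch import fnmatchcase

PRESERVE_PATTERN = "preserve-*"


def jobs_to_delete(existing, defined):
    """Return existing jobs that are undefined and not marked as preserved."""
    defined = set(defined)
    return sorted(
        name
        for name in existing
        if name not in defined and not fnmatchcase(name, PRESERVE_PATTERN)
    )


existing_jobs = ["ceph-setup", "old-job", "preserve-my-experiment"]
defined_jobs = ["ceph-setup"]
print(jobs_to_delete(existing_jobs, defined_jobs))  # ['old-job']
```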
Laura Flores [Mon, 17 Jul 2023 21:44:44 +0000 (16:44 -0500)]
ceph-dev-new-setup/config/definitions: increase timeout for cloning ci repo
Might help address failures like
```
fatal: fetch-pack: invalid index-pack output
```
and
```
error: rev-list died of signal 15
error: github.com:ceph/ceph-ci.git did not send all necessary objects
```
Dan Mick [Sat, 15 Jul 2023 01:07:32 +0000 (18:07 -0700)]
examples/builder.yml, library/jenkins_node: support node update
Original implementation would only permit nodes to be created if they
did not exist. Add support to update certain attributes of the node
with a run against extant nodes. Only permit a few attribute updates.
This is primarily motivated by the desire to support maintaining node
labels with ansible.
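The update rule can be sketched as follows. This is an illustration only,
not the jenkins_node module itself: the allowlist contents (other than
labels, which the message names as the motivation), the attribute names,
and the choice to raise on a disallowed change are all assumptions.
```python
# Hypothetical allowlist; only label maintenance is named in the message.
UPDATABLE_ATTRIBUTES = frozenset({"label", "description", "executors"})


def plan_node_update(current: dict, desired: dict) -> dict:
    """Return the attribute changes to apply to an existing node."""
    changes = {}
    for key, value in desired.items():
        if current.get(key) == value:
            continue  # already in the desired state, nothing to do
        if key not in UPDATABLE_ATTRIBUTES:
            raise ValueError(f"attribute {key!r} cannot be updated on an existing node")
        changes[key] = value
    return changes


current_node = {"name": "builder01", "label": "x86_64 jammy", "executors": 1}
desired_node = {"name": "builder01", "label": "x86_64 jammy huge", "executors": 1}
print(plan_node_update(current_node, desired_node))  # {'label': 'x86_64 jammy huge'}
```
An empty result means the node already matches, so a repeated run against
an extant node makes no changes.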
Dan Mick [Thu, 8 Jun 2023 09:05:24 +0000 (02:05 -0700)]
files/ssh/hostkeys/github.com.pub: update the github host key
github reissued its host key in March 2023 (see
https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/).
Record here for use in setting up jenkins builders.
Dan Mick [Thu, 8 Jun 2023 09:03:39 +0000 (02:03 -0700)]
ansible/examples/builder.yml: kill any rogue agent/slave.jar procs
Some time ago slave.jar was renamed agent.jar, and there were
builders running the old version, sometimes as root (which caused
problems when the job would check out git workspaces as root
that could then not be removed by a job running as jenkins-build).
Clean up the crufty procs, if any.