Lucian Petrut [Thu, 30 Mar 2023 12:14:10 +0000 (12:14 +0000)]
include: move ALLPERMS definition to compat.h
The Windows CI job started to fail as some libcephfs tests that use
ALLPERMS have been moved [1] to a separate file which doesn't have
the ALLPERMS definition.
/ceph/src/test/libcephfs/suidsgid.cc:240:36: error: ‘ALLPERMS’ was
not declared in this scope
240 | ASSERT_EQ(stx.stx_mode & (mode_t)ALLPERMS, before_mode);
We'll move this definition to compat.h so that we won't have to
redefine it in each file that uses it.
Note that we're moving the Windows "fs_compat.h" include up,
ensuring that the constants used by ALLPERMS are defined.
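For context: ALLPERMS is a BSD extension, absent from Windows headers, covering the setuid/setgid/sticky bits plus all user/group/other permission bits (octal 07777). As a quick illustration of the expected value, here is the same bitmask expressed with Python's stat constants (an illustration only, not the compat.h code itself):

    import stat

    # Conventional BSD definition of ALLPERMS: setuid/setgid/sticky
    # plus rwx for user, group and other, i.e. octal 07777.
    ALLPERMS = (stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX
                | stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)
    assert ALLPERMS == 0o7777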
Venky Shankar [Fri, 31 Mar 2023 04:02:37 +0000 (09:32 +0530)]
Merge PR #49460 into main
* refs/pull/49460/head:
qa: fix issue with fn unable to fetch port and ip
qa: fix helper function _check_nfs_cluster_status()
qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid'
qa: fix cluster creation failure in test_nfs.py
qa: test export creation at filepath and symlink
qa: added test case test_nfs_export_with_invalid_path
mgr/nfs: disallow non-existent paths when creating export
mgr/nfs/tests: mock check_cephfs_path
mgr/nfs/utils: add helper func to check cephfs path
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Laura Flores [Thu, 30 Mar 2023 15:38:14 +0000 (10:38 -0500)]
qa/crontab: check older builds on teuthology/nop when necessary
Today's scheduled run failed since the newest build of main
had failed. Adding `-n 10` to the command makes the run
start at the newest build and backtrack through up to 10
older builds if necessary.
A higher number than that is not necessary: the suite
failing to run will signal to us that more than the last
10 main builds are broken in Shaman.
2023-02-24T20:49:10.323 INFO:tasks.cephfs_test_runner: info_output = json.loads(self._nfs_cmd('cluster', 'info', self.cluster_id))['test']['backend'][0]
2023-02-24T20:49:10.323 INFO:tasks.cephfs_test_runner:IndexError: list index out of range
dparmar18 [Tue, 21 Feb 2023 18:08:42 +0000 (23:38 +0530)]
qa: fix helper function _check_nfs_cluster_status()
The comment in the code says to wait two minutes because cluster
creation takes time, but the loop actually waits thirteen minutes.
Waiting that long is not required; a minute here is more than
enough. Also switched to using safe_while().
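A minimal sketch of the safe_while() pattern from teuthology.contextutil (the status check below is a hypothetical stand-in for the real helper):

    from teuthology.contextutil import safe_while

    def _nfs_cluster_ready():
        # Hypothetical check; the real test inspects the output of
        # `ceph nfs cluster info`.
        return True

    # Poll for up to ~1 minute (12 tries x 5s sleep) instead of a
    # fixed 13-minute wait; safe_while raises MaxWhileTries if the
    # loop never breaks.
    with safe_while(sleep=5, tries=12) as proceed:
        while proceed():
            if _nfs_cluster_ready():
                break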
dparmar18 [Mon, 13 Feb 2023 14:32:06 +0000 (20:02 +0530)]
qa: fix cluster creation failure in test_nfs.py
Also adds a function _nfs_complete_cmd() that returns the process object so that stdout/stderr
can be used for evaluation (_nfs_cmd() uses raw_cluster_cmd(), which returns just stdout,
so it became difficult to track cluster creation errors in _test_create_cluster()).
It takes some time to update the cluster data, therefore running the command set
(check nfs server status -> nfs cluster create test -> check cluster status) in
a loop (a maximum of six iterations with a sleep of 5 secs at each iteration) fixes the issue.
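A rough sketch of that retry loop (the function and its arguments are hypothetical; the real test drives the ceph CLI through teuthology helpers):

    import time

    def create_cluster_with_retry(run_cmd, cluster_id, tries=6, delay=5):
        # Cluster data takes a while to settle, so re-run the command
        # set up to `tries` times, sleeping `delay` secs between tries.
        for _ in range(tries):
            proc = run_cmd('nfs', 'cluster', 'create', cluster_id)
            if proc.returncode == 0:
                return proc
            time.sleep(delay)
        raise RuntimeError('nfs cluster create failed after retries')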
Venky Shankar [Thu, 30 Mar 2023 10:43:48 +0000 (16:13 +0530)]
Merge PR #47649 into main
* refs/pull/47649/head:
mds: adjust MDSRank::command_tag_path invocation of enqueue_scrub()
doc/scrub: documented stray evaluation using recursive scrub
qa: added testcases
mds: make `scrub status` print flag `scrub_mdsdir`
mds: add scrub_mdsdir to ScrubHeader
mds: do not dump multiple JSON obj
mds: evaluate strays while performing scrub on root path
mds: remove inode from scrub_stack if being purged
mds: do not scrub inode if it is purging
Venky Shankar [Thu, 30 Mar 2023 09:18:26 +0000 (14:48 +0530)]
Merge PR #50053 into main
* refs/pull/50053/head:
libcephfs: move ClearSetuid to suidsgid.cc
libcephfs: add test cases for dropping the suid/sgid in write/truncate
libcephfs: add test cases for dropping the suid/sgid in fallocate
libcephfs: fix ClearSetuid incorrectly using SETATTR_MODE mask
client: switch to clear_suid_sgid for ftruncate
client: switch to clear_suid_sgid for _write()
mds/client: clear the suid/sgid in fallocate path
client: allow unprivileged users to clear suid/sgid
Patrick Donnelly [Wed, 29 Mar 2023 20:15:47 +0000 (16:15 -0400)]
Merge PR #49773 into main
* refs/pull/49773/head:
mds: add config to decide whether to mark dentry bad
qa: add missing scan_links step for data scan recovery
qa/tasks/cephfs: test damage to dentry's first is caught
qa/tasks/cephfs: use rank_asok and allow specifying rank
qa/tasks: allow specifying timeout command prefix to ceph
mds: provide test configs for creating first corruption
mds: catch damage to dentry's first field
mds: add debugging for pre_cow_old_inode
mds: cleanup code
Dhairya Parmar [Wed, 29 Mar 2023 17:50:50 +0000 (23:20 +0530)]
mgr/nfs/utils: add helper func to check cephfs path
This helper instantiates CephfsClient. That instantiation was
initially planned for the ExportMgr class in export.py, but
a make check failure, where the main python thread
experienced a deadlock, was after several attempts traced
to the instantiation of CephfsClient in ExportMgr. To
achieve singleton behavior instead, a function has been
added inside this helper that restricts instantiation
using functools' lru_cache.
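A minimal sketch of that lru_cache singleton pattern (assuming CephfsClient comes from mgr_util; the function name here is hypothetical):

    from functools import lru_cache

    from mgr_util import CephfsClient  # assumed import path

    @lru_cache(maxsize=1)
    def _get_cephfs_client(mgr):
        # A single-slot cache means CephfsClient is instantiated once;
        # every later call returns the same instance.
        return CephfsClient(mgr)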
Rishabh Dave [Mon, 27 Mar 2023 12:41:51 +0000 (18:11 +0530)]
qa/workunit/fs: print commands for making debugging easier
Print the commands and their arguments as they are being executed for
kernel_untar_tar.sh so that it's easier to debug when a teuthology
failure is caused by it.
Dongsheng Yang [Wed, 15 Mar 2023 06:54:39 +0000 (06:54 +0000)]
librbd: fix wrong attribute for rbd_quiesce_complete api
When we used the rbd_quiesce_complete API, we got an error:
/usr/bin/ld: undefined reference to `rbd_quiesce_complete'
We then found that the rbd_quiesce_complete symbol in
librbd.so is LOCAL. After some investigation, we found that
the attribute on the rbd_quiesce_complete API is CEPH_RADOS_API
rather than the expected CEPH_RBD_API.
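A quick way to check whether the symbol is exported, sketched with Python's ctypes (the soname is an assumption, and this is a diagnostic illustration, not part of the fix):

    import ctypes

    # With default (exported) visibility this attribute lookup succeeds;
    # if the symbol was compiled as LOCAL, ctypes raises AttributeError,
    # the runtime analogue of the linker's "undefined reference".
    librbd = ctypes.CDLL('librbd.so.1')
    print(librbd.rbd_quiesce_complete)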
Fixes: https://tracker.ceph.com/issues/59208
Signed-off-by: Dongsheng Yang <dongsheng.yang.linux@gmail.com>
Mohan Sharma [Tue, 27 Dec 2022 06:01:04 +0000 (11:31 +0530)]
ceph-volume: fix drive-group issue
The drive-group expects the batch_args to be a string,
however in the current version it is passed as a list
of one element; passing the first item of the list solves the issue.
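In Python terms the fix amounts to unwrapping the one-element list before handing it on (the values here are made up for illustration):

    # batch_args arrives as a one-element list, e.g.:
    batch_args = ['--osds-per-device 2']

    # The drive-group code expects the plain string, so unwrap it:
    if isinstance(batch_args, list):
        batch_args = batch_args[0]
    assert isinstance(batch_args, str)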
Xiubo Li [Tue, 21 Mar 2023 01:51:49 +0000 (09:51 +0800)]
qa: enable kclient test for newop test
The kclient has already fixed this. This will only enable the upstream
kclient with the testing branch; the downstream ones may not include
the fix yet, so skip them for now.
Nautilus only supports the v1 syntax, and for kclient there
is no need to do the upgrade.
Fixes: https://tracker.ceph.com/issues/57591
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Avan Thakkar [Tue, 28 Mar 2023 13:32:47 +0000 (19:02 +0530)]
exporter: use only counter dump/schema commands for extracting counters
Fixes: https://tracker.ceph.com/issues/59191
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
The Ceph exporter no longer requires the output of perf dump/schema, as the ``counter dump`` command
returns both labeled and unlabeled perf counters, which the exporter can fetch and export.
Removed the ``exporter_get_labeled_counters`` config option, as the exporter will now export
all the counters, labeled or unlabeled.
The fix also adds support for renaming the rgw multi-site metrics and
adding labels to them, similar to what is done in the prometheus module.
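For illustration, ``counter dump`` returns a single JSON document holding both labeled and unlabeled counters; a minimal sketch of fetching it over the admin socket (the daemon name is an assumption for the example):

    import json
    import subprocess

    # Ask a daemon's admin socket for all perf counters in one call.
    out = subprocess.check_output(
        ['ceph', 'daemon', 'osd.0', 'counter', 'dump'])
    counters = json.loads(out)
    print(sorted(counters))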
Adam King [Mon, 30 Jan 2023 16:27:09 +0000 (11:27 -0500)]
qa/cephadm: add check that iscsi daemon /etc/hosts matches host /etc/hosts
To make sure we aren't being affected by any podman-introduced
changes to the /etc/hosts file, and to test that we're properly
mounting /etc/hosts in our daemon containers.
Adam King [Sat, 21 Jan 2023 23:44:22 +0000 (18:44 -0500)]
cephadm: mount host /etc/hosts for containers in podman deployments
Podman messes with the /etc/hosts file in certain versions. There
was already a past issue with it placing the container name
there, fixed by https://github.com/ceph/ceph/pull/42242. This time
it is adding an entry for "host.containers.internal" (seems to be
podman 4.1 onward currently). Iscsi figures out the FQDN for a
host by running a command
which resolves to "host.containers.internal" when run in
the container with the podman-modified /etc/hosts.
There is also an issue with the grafana dashboard when
this entry is present.
Passing --no-hosts resolves this, but I think in the past
we avoided that because we didn't want to break deployments
where host name resolution was handled using /etc/hosts.
That's why we had the workaround previously linked. This
time I'm not sure such a workaround exists. The approach here
is to mount a copy of the host's version of /etc/hosts
into the iscsi container. That copy won't have the extra
entry podman adds, but will have any user-created entries in
case they were actually using it for host name resolution.
If the /etc/hosts file isn't present for whatever reason, we
assume the user isn't using /etc/hosts for hostname
resolution, and just go back to passing --no-hosts.
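A compact sketch of that fallback logic (the function and the mount-argument shape are hypothetical, not cephadm's actual helpers):

    import os
    import shutil

    def hosts_mount_args(data_dir):
        # Podman only injects its extra entry into the container's view,
        # so a copy of the host's /etc/hosts is clean; bind-mount that.
        # With no /etc/hosts on the host, assume it isn't used for name
        # resolution and fall back to --no-hosts.
        if os.path.exists('/etc/hosts'):
            copy = os.path.join(data_dir, 'etc-hosts')
            shutil.copyfile('/etc/hosts', copy)
            return ['-v', f'{copy}:/etc/hosts:ro']
        return ['--no-hosts']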
Fixes: https://tracker.ceph.com/issues/58532
Fixes: https://tracker.ceph.com/issues/57018
Signed-off-by: Adam King <adking@redhat.com>