Ronen Friedman [Tue, 8 Oct 2024 13:25:56 +0000 (08:25 -0500)]
qa/standalone/scrub: remove TEST_recovery_scrub_2
That test no longer matches the actual requirements and
implementation of scrubbing.
It was already deactivated in
https://github.com/ceph/ceph/pull/59590. Here it is
fully removed, mainly for the sake of backporting.
JonBailey1993 [Wed, 9 Oct 2024 10:28:42 +0000 (11:28 +0100)]
common/io_exerciser: Modify is_locked_by_me call in ceph_test_rados_io_sequence
is_locked_by_me() is a function of ceph::mutex that is only available in debug builds. By using the ceph_mutex_is_locked_by_me macro, we can neatly ensure that this check is only performed in debug mode, so compilation is no longer affected when building in release mode.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
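A minimal sketch of the pattern, using stand-in names rather than the real ceph::mutex, ceph_assert, and ceph_mutex_is_locked_by_me definitions (gated on NDEBUG here for simplicity): the ownership check compiles to a no-op in release builds, so code that asserts lock ownership builds in both modes.

    #include <cassert>
    #include <mutex>

    #ifdef NDEBUG
    // Release: plain std::mutex, no ownership tracking; the check compiles away.
    using example_mutex = std::mutex;
    #define example_mutex_is_locked_by_me(m) true
    #else
    // Debug: a wrapper that tracks the owning thread (stand-in for the debug mutex).
    #include <thread>
    struct example_mutex {
      std::mutex m;
      std::thread::id owner{};
      void lock()   { m.lock(); owner = std::this_thread::get_id(); }
      void unlock() { owner = {}; m.unlock(); }
      bool is_locked_by_me() const { return owner == std::this_thread::get_id(); }
    };
    #define example_mutex_is_locked_by_me(m) ((m).is_locked_by_me())
    #endif

    example_mutex lock;

    void must_hold_lock() {
      // Compiles in both build modes; only checks ownership in debug builds.
      assert(example_mutex_is_locked_by_me(lock));
    }

    int main() {
      std::lock_guard<example_mutex> g(lock);
      must_hold_lock();
    }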
* refs/pull/60037/head:
test/common: add death test for double !recursive lock
common/test: do not test exception raised from recursive lock
test/common: fix invalid vim mode
common,osdc: remove obsolete ceph::mutex_debugging
common: assert debug mutex lock is not held if !recursive
Manually revert the commit (by restoring the file by hand to the state
it was in prior to d7c144c), returning governance.rst to the state it was
in before I added the Executive Council Responsibilities document. This
document cannot be edited at will; changes must be voted on by the
Leadership Team.
Aashish Sharma [Thu, 3 Oct 2024 08:28:14 +0000 (13:58 +0530)]
mgr/dashboard: fix gateways section error: "404 - Not Found RGW Daemon not found: None"
A case was missed here where we do have a default realm created but no default_zonegroup; in that case, the existing behavior should prevail, and that was not being handled. If a default_realm is created but no default_zonegroup exists, we should continue getting the keys from daemon_name = next(iter(daemon_keys)).
mgr/dashboard: show non default realm sync status in rgw overview page
Currently, we only show the sync status of the default realm in the rgw
overview page. This PR shows the sync status of non-default realms
as well. Multisite sync status can be viewed for any of the active daemons
running in a default or non-default realm.
Yuval Lifshitz [Tue, 1 Oct 2024 15:19:46 +0000 (15:19 +0000)]
common: missing std include with GCC 14
In file included from src/rgw/driver/posix/bucket_cache.h:19,
from src/test/rgw/test_posix_bucket_cache.cc:4:
src/common/cohort_lru.h: In member function 'void cohort::lru::TreeX<T, TTree, CLT, CEQ, K, LK>::lock()':
src/common/cohort_lru.h:334:14: error: 'for_each' is not a member of 'std'
334 | std::for_each(locks.begin(), locks.end(),
| ^~~~~~~~
src/common/cohort_lru.h: In member function 'void cohort::lru::TreeX<T, TTree, CLT, CEQ, K, LK>::unlock()':
/home/yuvalif/ceph5/src/common/cohort_lru.h:339:14: error: 'for_each' is not a member of 'std'
339 | std::for_each(locks.begin(), locks.end(),
| ^~~~~~~~
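The fix is to include <algorithm> explicitly where std::for_each is used, since GCC 14 headers no longer pull it in transitively. A simplified sketch of the affected pattern (not the actual TreeX code):

    #include <algorithm>  // std::for_each -- the missing include
    #include <mutex>
    #include <vector>

    void lock_all(std::vector<std::mutex*>& locks) {
      std::for_each(locks.begin(), locks.end(),
                    [](std::mutex* m) { m->lock(); });
    }

    void unlock_all(std::vector<std::mutex*>& locks) {
      std::for_each(locks.begin(), locks.end(),
                    [](std::mutex* m) { m->unlock(); });
    }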
Aashish Sharma [Fri, 4 Oct 2024 10:54:02 +0000 (16:24 +0530)]
mgr/dashboard: increase timeout to detect replication user in the secondary cluster
Increase the timeout for detecting the replication user in the secondary cluster in the rgw multisite automation wizard. Currently it is set to 2 minutes; increase it to 5 minutes.
When you import a realm token to the secondary cluster, we wait for the replication/system user created in the primary cluster to be present in the secondary cluster, and when we find that user we set the credentials in the secondary cluster using `ceph dashboard set-rgw-credentials`. The timeout for this is set to 2 minutes, and sometimes it takes more than 2 minutes for the user to be replicated to the secondary cluster.
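The wait itself is a simple poll-until-deadline loop; a generic sketch of the shape (names are illustrative, the real check queries the secondary cluster for the replicated user from the dashboard backend):

    #include <chrono>
    #include <functional>
    #include <thread>

    // Poll a readiness check until it succeeds or the deadline passes.
    bool wait_for(std::function<bool()> ready,
                  std::chrono::seconds timeout,
                  std::chrono::seconds interval = std::chrono::seconds(5)) {
      auto deadline = std::chrono::steady_clock::now() + timeout;
      while (std::chrono::steady_clock::now() < deadline) {
        if (ready()) {
          return true;
        }
        std::this_thread::sleep_for(interval);
      }
      return false;
    }

    // Raising the timeout from 2 to 5 minutes only changes the caller, e.g.:
    //   wait_for(replication_user_exists, std::chrono::minutes(5));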
When libcephfs aio tests (src/test/client) are run
with objectcacher disabled (ceph_test_client --client_oc=false),
the TestClient.LlreadvLlwritev fails and core dumps. The client
hits the assert 'caps_ref[c]<0'.
This patch fixes that. There is no need to give up the cap_ref
and take it again between multiple reads caused by short reads.
In some cases, get_caps used to fail in C_Read_Sync_NonBlocking::finish,
causing cap_ref to go negative when put_cap_ref is finally done in
C_Read_Finish::finish_io.
Cause:
In the aio path, the client_lock was not being held
in the internal callback that runs after the IO is done, where
it is expected to be taken, leading to corruption.
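An illustrative sketch of the locking requirement (stand-in types, not the real libcephfs code): the completion callback that drops the cap reference must run with the client lock held, matching what get_caps/put_cap_ref expect.

    #include <mutex>

    struct Client {
      std::mutex client_lock;
      int cap_refs = 0;

      void put_cap_ref() {
        // Precondition in the real code: client_lock is held by the caller.
        --cap_refs;
      }

      // Internal callback invoked once the async read completes.
      void on_aio_read_finished() {
        std::scoped_lock l{client_lock};  // the lock that was missing in the buggy path
        put_cap_ref();                    // now balanced against the single get_caps()
      }
    };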
Zac Dover [Fri, 4 Oct 2024 13:21:32 +0000 (23:21 +1000)]
doc/governance: add exec council responsibilities
Add the Ceph Executive Council's responsibilities to the
doc/governance.rst document. It was decided during the weekly CLT
meeting on 30 Sep 2024 to add this to the ceph/ceph git repository.
mgr/smb: fix condition for smb earmark when cluster_id doesn't match
This commit resolves an issue where accessing `earmark.split('.')[2]` would cause a
"list index out of range" error when the earmark is set to just "smb" without additional scopes.
The fix introduces a parsing function to safely handle earmarks, ensuring proper behavior
even when no cluster ID or additional scopes are present.
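The actual fix lives in the Python mgr/smb module; as a language-agnostic illustration, a bounds-checked parse of an earmark of the form "smb" or "smb.cluster.<cluster_id>" might look like this (names hypothetical):

    #include <optional>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Earmark {
      std::string scope;                   // e.g. "smb"
      std::optional<std::string> cluster_id;
    };

    Earmark parse_earmark(const std::string& earmark) {
      // Split on '.' instead of indexing blindly into the result.
      std::vector<std::string> parts;
      std::stringstream ss{earmark};
      for (std::string part; std::getline(ss, part, '.'); ) {
        parts.push_back(part);
      }
      Earmark result;
      if (!parts.empty()) {
        result.scope = parts[0];
      }
      if (parts.size() > 2) {              // a bare "smb" earmark has no cluster id
        result.cluster_id = parts[2];
      }
      return result;
    }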
Ilya Dryomov [Thu, 3 Oct 2024 15:54:07 +0000 (17:54 +0200)]
librbd/crypto/LoadRequest: clone format for migration source image
Migration source and migration target images naturally have the same
encryption format, but the user shouldn't need to specify it
for an image that they can't even immediately see -- the migration source
image gets moved to the RBD trash to avoid mistaken usage while
migration is in progress.
Formats must also be cloned if the image is under migration, so
rename m_is_current_format_cloned to m_is_current_format_assumed to
avoid potential confusion with clone() being called in two places but
m_is_current_format_cloned being set in only one place.
John Mulligan [Tue, 1 Oct 2024 15:27:44 +0000 (11:27 -0400)]
cephadm: use a shared smb.conf for clustered smb container sets
Use a shared smb.conf when deploying CTDB-enabled containers. There was
a problem updating configs on the CTDB-enabled clusters, and the issue
was that the configwatch sidecar was not using CTDB; rather, it had a
"default" copy of smb.conf that enabled only registry config, but not
CTDB. Examining the cluster, this problem was found to be general to all
sidecars that are sambacc based (not starting smbd, winbindd,
etc.) as well as the smbmetrics sidecar.
Fixes: https://tracker.ceph.com/issues/68322
Signed-off-by: John Mulligan <jmulligan@redhat.com>
RGW: Cloud Restore CLI and its corresponding responses for the user.
* For the first and for repeated requests, 202 Accepted will be the response code.
* For CloudRestored status, 200 OK will be the response code.
* For conflicting requests, 409 Conflict will be the response code (the status mapping is sketched after this entry).
Also fixed the storage class update while listing objects.
Earlier, while restoring an object temporarily, list-objects (s3api) and
radosgw-admin bucket list did not show the updated storage class. With this
fixed, they now show the cloudtier storage class.
* It allows read-through for cloud-tiered objects via restore_obj_from_cloud.
* New tier config options: the user needs to set allow_read_through to true and
read_through_restore_days to more than 1 for this feature to work; also,
objects with retain_head_object will be available for this feature.
* The first GET request will fail with a "restoring in progress" error; objects
are downloaded asynchronously.
* The object restores are temporary.
* Tested `aws s3api get-object`, `aws s3api head-object` and `aws s3 cp`.
In addition, send timeout errors for the first read-through request.
Also addressed lint warnings and other cleanup (review comments).
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
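A hedged sketch of the response-code mapping described above; the enum and function names are illustrative, not the actual RGW symbols.

    // Map a (hypothetical) restore status to the HTTP response code.
    enum class RestoreStatus {
      None,           // no restore yet: the request is accepted
      InProgress,     // repeated request while restoring: also accepted
      CloudRestored,  // restore finished
      Conflict        // request conflicts with the object's current state
    };

    int restore_http_status(RestoreStatus s) {
      switch (s) {
      case RestoreStatus::None:
      case RestoreStatus::InProgress:
        return 202;  // Accepted
      case RestoreStatus::CloudRestored:
        return 200;  // OK
      case RestoreStatus::Conflict:
        return 409;  // Conflict
      }
      return 500;    // unreachable fallback
    }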
Soumya Koduri [Thu, 3 Oct 2024 02:33:20 +0000 (08:03 +0530)]
rgw/cloudtier: Restore object from cloud endpoint
1) Add functionality to restore cloud-transitioned objects on demand.
The current commit includes the following:
* Given <bucket, object>, fetch the object from the cloud endpoint.
* If <days> is provided and > 0, the restore is marked temporary, with an expiry date.
* Without <days>, it is marked as a permanent restore.
2) Use the ObjectExpirer/delete_at attr to delete temporary objects.
For temporarily restored objects, set the delete_at attr to the expiration time
(see the sketch after this entry).
This adds those objects to the ObjectExpirer list. The LC worker thread is used to
scan that list and delete expired objects. Delete here means deleting the
restored object data and resetting the HEAD object to the cloud-transitioned object it
was before the restore.
In addition, the below changes are done:
* If temporary, the object is still marked RGWObj::CloudTiered and its mtime is set to the
transition time.
* If permanent, the object is marked RGWObj::Main and its mtime is set to the restore time (now()).
* The rgw_restore_debug_interval option was added to configure the restore days (similar to rgw_lc_debug_interval).
There is an issue in the ObjectExpirer code where, if an object is added
to the ObjectExpirer list and is then re-written, it is not deleted from the expirer list,
and hence the new object may get deleted. Fixed the same, and also addressed
minor review comments.
3) Design doc added.
4) ObjCategory should be set to CloudTiered only for cloud-transitioned
objects and temporarily restored objects. Permanent copies are to be
treated as regular objects.
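An illustrative sketch of the temporary-vs-permanent decision described in item 2 (these are not the real RGW types; names are hypothetical): a temporary restore carries a delete_at timestamp, a permanent restore carries none.

    #include <ctime>
    #include <optional>

    struct RestoreState {
      std::optional<std::time_t> delete_at;  // set only for temporary restores
    };

    RestoreState make_restore_state(std::optional<int> days) {
      RestoreState st;
      if (days && *days > 0) {
        // Temporary: expires after the requested number of days; the object
        // then goes onto the expirer list and is cleaned up by the LC worker.
        st.delete_at = std::time(nullptr) + *days * 24 * 60 * 60;
      }
      // Without <days> (or days == 0) the restore is permanent: no delete_at.
      return st;
    }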
Dan Mick [Wed, 26 Jun 2024 02:07:41 +0000 (19:07 -0700)]
Add Containerfile and build.sh to build it.
The intent is to replace ceph-container.git, at first for CI containers
only, and eventually production containers as well.
There is code present for production containers, including
a separate "make-manifest-list.py" to scan for and glue the two
arch-specific containers into a 'manifest-list' 'fat' container,
but that code is not yet fully tested.
This code will not be used until a corresponding change to the
Jenkins jobs in ceph-build.git is pushed.
Note that this tooling does not authenticate to the container repo;
it is assumed that will be done elsewhere. Authentication is
verified by pushing a minimal image to the requested repo.
Patrick Donnelly [Mon, 30 Sep 2024 13:53:09 +0000 (09:53 -0400)]
common/test: do not test exception raised from recursive lock
The C++ standard does not require that implementations raise std::system_error
when double-locking a non-recursive lock. Our implementation of debug_mutex
now detects this error with a ceph_assert, so there is no exception for the test to catch.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
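A minimal sketch of the resulting test shape, using stand-in types rather than the real debug mutex or ceph_assert: since a same-thread double lock is undefined behavior for std::mutex and the debug wrapper now aborts, the scenario is covered with a GTest death test instead of EXPECT_THROW.

    #include <cstdlib>
    #include <mutex>
    #include <thread>
    #include <gtest/gtest.h>

    // Stand-in for a non-recursive debug mutex that aborts on recursive locking.
    struct debug_mutex {
      std::mutex m;
      std::thread::id owner{};
      void lock() {
        if (owner == std::this_thread::get_id()) {
          std::abort();  // where the real code would ceph_assert
        }
        m.lock();
        owner = std::this_thread::get_id();
      }
      void unlock() {
        owner = {};
        m.unlock();
      }
    };

    TEST(DebugMutexDeathTest, DoubleLockAborts) {
      EXPECT_DEATH({
        debug_mutex m;
        m.lock();
        m.lock();  // second lock from the same thread must abort, not throw
      }, "");
    }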