Manually revert the commit by restoring governance.rst by hand to the
state it was in prior to d7c144c, before I added the Executive Council
Responsibilities document. This document cannot be edited at will, but
must be voted on by the Leadership Team.
Zac Dover [Fri, 4 Oct 2024 13:21:32 +0000 (23:21 +1000)]
doc/governance: add exec council responsibilities
Add the Ceph Executive Council's responsibilities to the
doc/governance.rst document. It was decided during the weekly CLT
meeting on 30 Sep 2024 to add this to the ceph/ceph git repository.
mgr/smb: fix condition for smb earmark when cluster_id doesn't match
This commit resolves an issue where accessing `earmark.split('.')[2]` would cause a
"list index out of range" error when the earmark is set to just "smb" without additional scopes.
The fix introduces a parsing function to safely handle earmarks, ensuring proper behavior
even when no cluster ID or additional scopes are present.
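A minimal sketch of what such a parsing function could look like, not the code from this commit. The names `ParsedEarmark` and `parse_earmark` are hypothetical, and the three-part layout (scope, sub-scope, cluster ID) is an assumption inferred from the `earmark.split('.')[2]` access described above.

```python
from typing import NamedTuple, Optional


class ParsedEarmark(NamedTuple):
    """Hypothetical container for the dot-separated parts of an earmark."""
    scope: str
    subscope: Optional[str]
    cluster_id: Optional[str]


def parse_earmark(earmark: str) -> Optional[ParsedEarmark]:
    """Split an earmark without ever indexing past the end of the list."""
    if not earmark:
        return None
    parts = earmark.split('.')
    scope = parts[0]
    # Missing components become None instead of raising IndexError.
    subscope = parts[1] if len(parts) > 1 else None
    cluster_id = parts[2] if len(parts) > 2 else None
    return ParsedEarmark(scope, subscope, cluster_id)
```

With this shape, a caller can compare `parse_earmark(value).cluster_id` against the expected cluster ID and treat `None` as "no cluster ID present".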
John Mulligan [Tue, 1 Oct 2024 15:27:44 +0000 (11:27 -0400)]
cephadm: use a shared smb.conf for clustered smb container sets
Use a shared smb.conf when deploying ctdb enabled containers. There was
a problem updating configs on the ctdb enabled clusters and the issue
was that the configwatch sidecar was not using CTDB, rather it had a
"default" copy of smb.conf that enabled only registry config, but not
CTDB. Examining the cluster this problem was found to be general to all
sidecars that are either sambacc based (not starting smbd, winbindd,
etc) and the smbmetrics sidecar.
Fixes: https://tracker.ceph.com/issues/68322
Signed-off-by: John Mulligan <jmulligan@redhat.com>
RGW: Cloud Restore cli and its corresponding response for user.
* For the first request and any repeated request, the response code is 202 Accepted.
* For CloudRestored status, the response code is 200 OK.
* For conflicting requests, the response code is 409 Conflict.
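The response codes listed above can be sketched as a small mapping. This is an illustration only: the `RestoreState` enum and the `restore_response` function are hypothetical names, not identifiers from RGW.

```python
from enum import Enum
from http import HTTPStatus


class RestoreState(Enum):
    """Hypothetical restore states, named only for this sketch."""
    NEW = "new"                  # first restore request for the object
    IN_PROGRESS = "in_progress"  # repeated request while restore is running
    RESTORED = "restored"        # object already has CloudRestored status
    CONFLICT = "conflict"        # request conflicts with an ongoing restore


def restore_response(state: RestoreState) -> HTTPStatus:
    """Map a restore state to the HTTP status described in the commit."""
    if state is RestoreState.RESTORED:
        return HTTPStatus.OK
    if state is RestoreState.CONFLICT:
        return HTTPStatus.CONFLICT
    # First and repeated requests are both acknowledged as accepted.
    return HTTPStatus.ACCEPTED
```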
Also fixed the storage class update while listing objects.
Previously, while an object was temporarily restored, list-objects (s3api) and
radosgw-admin bucket list did not show the updated storage class. With this
fix, they now show the cloudtier storage class.
* It allows read-through for cloud-tiered objects via restore_obj_from_cloud.
* New tier config options: the user needs to set allow_read_through to true and
read_through_restore_days to more than 1 for this feature to work. Objects
with retain_head_object will also be available for this feature.
* The first GET request fails with a restore-in-progress error; objects
are downloaded asynchronously.
* The restored objects are temporary.
* Tested with `aws s3api get-object`, `aws s3api head-object` and `aws s3 cp`.
In addition, send timeout errors for the first read-through request.
Also addressed lint warnings and other cleanup (review comments).
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
Soumya Koduri [Thu, 3 Oct 2024 02:33:20 +0000 (08:03 +0530)]
rgw/cloudtier: Restore object from cloud endpoint
1) Add functionality to restore cloud-transitioned objects on demand.
The current commit includes the following:
* Given <bucket,object>, fetch the object from the cloud endpoint.
* If days is provided and > 0, the restore is marked temporary with an expiry date.
* Without <days>, it is marked as a permanent restore.
2) Use ObjectExpirer/delete_at attr to delete temp objects
For temporarily restored objects, set the delete_at attr to the expiration time.
This adds those objects to the ObjectExpirer list. Use the LC worker thread to
scan that list and delete expired objects. Delete here means deleting the
restored object data and resetting the HEAD object to a cloud-transitioned
object, as it was before the restore.
In addition, the following changes are made:
* If temporary, the object is still marked RGWObj::CloudTiered and mtime is set to the
transition time.
* If permanent, the object is marked RGWObj::Main and mtime is set to the restore time (now()).
* rgw_restore_debug_interval option added to configure restore days (similar to rgw_lc_debug_interval)
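The temporary versus permanent rules above can be sketched as follows. This is a hedged illustration, not the RGW implementation: `RestorePlan` and `plan_restore` are hypothetical names, the expiry is assumed to be the restore time plus the requested days, and a `days` value of 0 is assumed to mean a permanent restore.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RestorePlan:
    """Hypothetical summary of the metadata a restore would set."""
    category: str                   # "CloudTiered" or "Main"
    mtime: datetime
    delete_at: Optional[datetime]   # only set for temporary restores


def plan_restore(transition_time: datetime, now: datetime,
                 days: Optional[int] = None) -> RestorePlan:
    if days is not None and days > 0:
        # Temporary: keep the cloud-tiered category and the transition
        # mtime, and schedule the restored data for deletion.
        return RestorePlan("CloudTiered", transition_time,
                           now + timedelta(days=days))
    # Permanent: treated as a regular object restored at `now`.
    return RestorePlan("Main", now, None)


transitioned = datetime(2024, 9, 1, tzinfo=timezone.utc)
restored = datetime(2024, 10, 3, tzinfo=timezone.utc)
temporary = plan_restore(transitioned, restored, days=7)
permanent = plan_restore(transitioned, restored)
```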
There was an issue in the ObjectExpirer code wherein an object that was added
to the ObjectExpirer list and then rewritten was not deleted from the expirer list,
so the new object could get deleted. Fixed this and also addressed
minor review comments.
3) Design doc added
4) ObjCategory should be set to CloudTiered only for cloud-transitioned
objects and temporarily restored objects. Permanent copies are to be
treated as regular objects.
Dan Mick [Wed, 26 Jun 2024 02:07:41 +0000 (19:07 -0700)]
Add Containerfile and build.sh to build it.
The intent is to replace ceph-container.git, at first for ci containers
only, and eventually production containers as well.
There is code present for production containers, including
a separate "make-manifest-list.py" to scan for and glue the two
arch-specific containers into a 'manifest-list' 'fat' container,
but that code is not yet fully tested.
This code will not be used until a corresponding change to the
Jenkins jobs in ceph-build.git is pushed.
Note that this tooling does not authenticate to the container repo;
it is assumed that will be done elsewhere. Authentication is
verified by pushing a minimal image to the requested repo.
mgr/dashboard: Allow adding all listeners under a subsystem
Issue:
- Currently a user cannot add all listeners under a subsystem
- This results in an error: `Failure adding nqn.2001-07.com.ceph:1725013182540 listener at 10.70.44.140:4420: Gateway's host name must match current host (dhcp47-54)`
Reason:
- The gateway address used while creating a listener is now random in the nvmeof client
- The gateway logs of each node show that no gRPC request is received for adding a listener on the respective node; the request goes instead to the node that is chosen by default in the nvmeof client.
- But the nvmeof backend checks that the current gateway matches the one sent in the request for adding a listener (ref: https://github.com/ceph/ceph-nvmeof/blob/devel/control/grpc.py#L2104)
Fix:
- Use `traddr` from the listener API to set the current gateway address
- Since `traddr` gives only the IP address, without the port, extract the full address from `NvmeofGatewaysConfig.get_gateways_config()`
- This ensures correct path usage
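A rough sketch of the lookup described in the fix, under stated assumptions: the helper `find_gateway_address` is hypothetical, and the gateway configuration is assumed to be reducible to a flat list of `host:port` strings. The real return shape of `NvmeofGatewaysConfig.get_gateways_config()` is not shown in this commit message, and the port 5500 below is only a placeholder.

```python
from typing import Iterable, Optional


def find_gateway_address(traddr: str,
                         gateway_addrs: Iterable[str]) -> Optional[str]:
    """Return the full 'host:port' gateway address whose host equals traddr."""
    for addr in gateway_addrs:
        # Split on the last colon so only the port is separated from the host.
        host, _, _port = addr.rpartition(':')
        if host.strip('[]') == traddr:
            return addr
    return None


# Placeholder addresses for illustration only.
gateways = ["10.70.44.139:5500", "10.70.44.140:5500"]
match = find_gateway_address("10.70.44.140", gateways)
```

Returning `None` when nothing matches lets the caller fall back to its default gateway or report an error, rather than sending the request to an arbitrary node.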
This effort was started a long time ago by Mike Perez.[1] I have now completed the remaining steps to achieve the Passing level of the OpenSSF Best Practices Badge.[2]
It should be used as an opportunity to implement best practices in the Ceph community. For example, the Passing level was achieved without meeting the optional compliance with Dynamic Analysis or Security Scanning.
- when no group is set a bad request is made `?gw_group=null`
- this causes a dashboard exception to be raised as well
- avoiding sending any request when no group is present
- fixes casing of "hostname" in gateway column
doc/rados: edit "Placement Groups Never Get Clean"
Make grammar improvements (and correct a verb disagreement) in the
section "Placement Groups Never Get Clean" in
doc/rados/troubleshooting/troubleshooting-pg.rst.
* the steps performed by the Windows CI job
* artifact structure
* frequently asked questions
The document is meant to assist the Ceph developers in investigating
CI failures. This is especially important as the Windows CI job runs
integration tests that would otherwise only be executed by
Teuthology, thus helping catch potential regressions quickly.
Note that the identified regressions are not necessarily
Windows-specific; they usually affect Linux builds as well.
Prachi Goel [Sun, 11 Aug 2024 17:05:59 +0000 (22:35 +0530)]
mgr/dashboard: rbd table actions enhancements
Fixes: https://tracker.ceph.com/issues/67198
The changes are:
Delete option moved to last in list.
Delete is disabled in case image is secondary.
Move to trash should be second last option.
Tooltip is added for ‘Move to Trash’ and ‘Delete’
Tooltip is added for Copy option
Subheading is added for Copy Form.
Tooltip is added for ‘Flatten’ option.
Delete tooltip content has been updated in case image is primary.
Leonid Chernin [Thu, 26 Sep 2024 13:47:00 +0000 (13:47 +0000)]
mon/nvmeofgw*: fix tracking gateways in DELETING state
1. Ignore subsystems of GWs in the DELETING state when calculating the number of namespaces
2. Always call the tracking function in the monitor's tick, not just when the
beacon is active
Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
instead of calling ioctx.get_last_version() after a rados operation,
callers now pass version_t* as an output parameter. in the null_yield
case, that version is assigned to ioctx.get_last_version() as normal. in
the async case, we get the version out of librados::async_operate()'s
return value
in bucket incremental sync, if a generation other than the current generation is requested,
we mark it with -EAGAIN and retry in error repo in RGWDataSyncSingleEntry(). within this block,
we check if the requested generation is less than the current gen, write it to error repo,
set -EAGAIN, and then write it to error repo once again in the outer function. don't duplicate
the error repo entry addition for this condition.
librados/asio: add version_t to completion signatures
IoCtx::aio_operate() doesn't update IoCtx::get_last_version(). to make
the resulting version_t available to the caller, we have to read it out
of the AioCompletionImpl and return it to the caller
qa: avoid a non-standard shell construct in rbd/iscsi_client.t
dash, which is used as /bin/sh on Ubuntu, interprets "2&> /dev/null" as
an instruction to launch iscsiadm in the background. While that is
mostly compensated for by the following sleep, stderr isn't redirected to
/dev/null either -- the output gets polluted and the test fails.
... since it's not available on Ubuntu. In this case mpathconf just
sets a couple of default values and defines an empty blacklist section,
so it's easy enough to replicate.
Patrick Donnelly [Thu, 26 Sep 2024 15:24:58 +0000 (11:24 -0400)]
Merge PR #58936 into main
* refs/pull/58936/head:
mds: do not duplicate journaler write heads
mds: use Journaler getters
osdc: properly acquire locks for getters
osdc: add print method for Journaler::Header
mds: do not trim segments after open file table commit
mds: delay expiry if LogSegment is ahead of committed oft seq
mds: do not write journal head twice on trim
mds: simplify and explain expiry finisher ctx
mds: add mds_lock asserts for journal flush
mds: skip second wait_for_safe
mds: trim only to the LogSegment created for flush
mds: allow passing explicit seq to trim to
mds: quiet unhelpful debug message
mds: add C_IO_Wrapper completion debugging
mds: add dout for new segment