Nitzan Mordechai [Tue, 13 Jun 2023 09:39:38 +0000 (09:39 +0000)]
test: add encode decode test for pg_pool_t
Add unit tests for pg_pool_t to make sure that an encode/decode/encode
round trip produces the same pg_pool_t struct.
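A minimal sketch of the encode/decode/encode check. It assumes a hypothetical `PoolInfo` struct with three fields in place of pg_pool_t, and a plain byte vector in place of ceph::bufferlist. The real test works on pg_pool_t and Ceph's own encoding helpers, which are not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for pg_pool_t with only a few fields.
struct PoolInfo {
  uint32_t size = 0;
  uint32_t min_size = 0;
  std::string name;

  bool operator==(const PoolInfo& o) const {
    return size == o.size && min_size == o.min_size && name == o.name;
  }
};

// Append a uint32_t in little-endian byte order.
static void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read a little-endian uint32_t at `pos` and advance `pos`.
static uint32_t get_u32(const std::vector<uint8_t>& in, std::size_t& pos) {
  if (pos + 4 > in.size()) throw std::runtime_error("truncated buffer");
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
  pos += 4;
  return v;
}

std::vector<uint8_t> encode(const PoolInfo& p) {
  std::vector<uint8_t> out;
  put_u32(out, p.size);
  put_u32(out, p.min_size);
  put_u32(out, static_cast<uint32_t>(p.name.size()));  // length prefix
  out.insert(out.end(), p.name.begin(), p.name.end());
  return out;
}

PoolInfo decode(const std::vector<uint8_t>& in) {
  std::size_t pos = 0;
  PoolInfo p;
  p.size = get_u32(in, pos);
  p.min_size = get_u32(in, pos);
  const uint32_t len = get_u32(in, pos);
  if (pos + len > in.size()) throw std::runtime_error("truncated buffer");
  p.name.assign(reinterpret_cast<const char*>(in.data() + pos), len);
  return p;
}

// encode -> decode -> encode must reproduce both the struct and the bytes.
bool round_trip_ok(const PoolInfo& original) {
  const std::vector<uint8_t> first = encode(original);
  const PoolInfo decoded = decode(first);
  const std::vector<uint8_t> second = encode(decoded);
  return decoded == original && first == second;
}
```

The check compares the bytes as well as the struct. This catches fields that decode correctly but are re-encoded differently.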
Nitzan Mordechai [Tue, 13 Jun 2023 04:52:12 +0000 (04:52 +0000)]
osd_type: encode new version for stretch CRUSH buckets
We are implementing a new version of the encoding scheme for stretch
CRUSH buckets in our OSD system.
However, we have run into limitations when extending the encoding
version for non-stretch pools: for backward compatibility, we are
currently restricted to version 29.
To address this, the new scheme skips version 30 while maintaining
backward compatibility: we specifically check for version 30 and keep
the same behavior for older clients.
New clients encode and decode using version 31.
To accommodate stretch pools, we use std::optional. The first byte of
the encoding serves as a boolean indicator of whether the optional data is present.
If the first byte is true, we encode and decode four uint32_t members: peering_crush_*.
These changes ensure compatibility with older clients while enabling the
enhanced encoding scheme for new clients, specifically for stretch pools.
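The following sketch illustrates the versioning idea. It rests on several assumptions:

- `Pool`, `StretchInfo`, and the field names are hypothetical stand-ins for pg_pool_t and its peering_crush_* members.
- The version is written as a single leading byte.
- `peer_supports_v31` stands in for whatever feature check the real code performs.

In this sketch, versions below 31 (including the skipped 30) are decoded as carrying no stretch data. How the real decoder treats version 30 is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the four peering_crush_* members.
struct StretchInfo {
  uint32_t bucket_count = 0;
  uint32_t bucket_target = 0;
  uint32_t bucket_barrier = 0;
  uint32_t mandatory_member = 0;

  bool operator==(const StretchInfo& o) const {
    return bucket_count == o.bucket_count && bucket_target == o.bucket_target &&
           bucket_barrier == o.bucket_barrier &&
           mandatory_member == o.mandatory_member;
  }
};

// Hypothetical, trimmed-down pool: the optional is set only for stretch pools.
struct Pool {
  uint32_t size = 0;
  std::optional<StretchInfo> stretch;
};

constexpr uint8_t kLegacyVersion = 29;   // layout older clients understand
constexpr uint8_t kStretchVersion = 31;  // 30 is deliberately never emitted

static void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

static uint32_t get_u32(const std::vector<uint8_t>& in, std::size_t& pos) {
  if (pos + 4 > in.size()) throw std::runtime_error("truncated buffer");
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
  pos += 4;
  return v;
}

std::vector<uint8_t> encode(const Pool& p, bool peer_supports_v31) {
  std::vector<uint8_t> out;
  out.push_back(peer_supports_v31 ? kStretchVersion : kLegacyVersion);
  put_u32(out, p.size);
  if (peer_supports_v31) {
    // Presence byte: 1 if the four stretch fields follow, 0 otherwise.
    out.push_back(p.stretch ? 1 : 0);
    if (p.stretch) {
      put_u32(out, p.stretch->bucket_count);
      put_u32(out, p.stretch->bucket_target);
      put_u32(out, p.stretch->bucket_barrier);
      put_u32(out, p.stretch->mandatory_member);
    }
  }
  // Older peers get the version-29 layout and never see stretch data.
  return out;
}

Pool decode(const std::vector<uint8_t>& in) {
  if (in.empty()) throw std::runtime_error("empty buffer");
  std::size_t pos = 0;
  const uint8_t struct_v = in[pos++];
  Pool p;
  p.size = get_u32(in, pos);
  if (struct_v >= kStretchVersion) {
    if (pos >= in.size()) throw std::runtime_error("truncated buffer");
    if (in[pos++] != 0) {
      StretchInfo s;
      s.bucket_count = get_u32(in, pos);
      s.bucket_target = get_u32(in, pos);
      s.bucket_barrier = get_u32(in, pos);
      s.mandatory_member = get_u32(in, pos);
      p.stretch = s;
    }
  }
  // struct_v < 31 (including 30): no stretch data is read in this sketch.
  return p;
}
```

A non-stretch pool encoded for a new peer costs only the single presence byte. A stretch pool adds 16 bytes for the four fields.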
It seems that with Grafana 10.4.0 the domain parameter is taken into
account when building the final URL (earlier versions didn't seem to
behave this way). This change sets the domain to the hostname where the
Grafana daemon is running instead of '*.lab'. serve_from_sub_path is
removed because it's not needed, and when added it causes some undesirable
redirections that could break monitoring HA.
This commit adds SSL support to the ceph-exporter deployment
made by cephadm. When `secure_monitoring_stack` is set to `True`,
the `ceph-exporter` container is restarted with SSL enabled.
Leonid Chernin [Tue, 17 Oct 2023 13:25:07 +0000 (13:25 +0000)]
mon: add NVMe-oF gateway monitor and HA
- gateway submodule
Fixes: https://tracker.ceph.com/issues/64777
This PR adds high availability support for the nvmeof Ceph service. High availability means that even if a certain GW is down, the initiator still has another available path and can continue I/O through another GW. It is achieved by running an nvmeof service consisting of at least 2 nvmeof GWs in the Ceph cluster. The host (initiator) sees every GW as a separate path to the nvme namespaces (volumes).
The implementation consists of the following main modules:
- NVMeofGWMon - a PaxosService. It is a monitor that tracks the status of the running nvmeof services and takes action when services fail and when they are restored.
- NVMeofGwMonitorClient - an agent that runs as part of each nvmeof GW. It sends beacons to the monitor to signal that the GW is alive. As part of the beacon, the client also sends information about the service, which the monitor uses to make decisions and perform some operations.
- MNVMeofGwBeacon - a structure used by the client and the monitor to send and receive the beacons.
- MNVMeofGwMap - the map tracking the status of the nvmeof GWs. It also defines the new role of every GW, so when GWs go down or are restored, the map reflects the resulting role of each GW. The map is distributed to the NVMeofGwMonitorClient on each GW, which updates its GW with the required changes.
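The following is a heavily simplified sketch of the monitor-side failover logic described above. Everything in it is hypothetical and illustrative:

- the `GwMonitorSketch` class;
- the assumption of one namespace group per GW;
- the tick-based grace period;
- the failback rule.

It is not the NVMeofGWMon/MNVMeofGwMap implementation and does not model Paxos or map distribution.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical model: each GW is the preferred server of one namespace
// group named after it; serving_ records which GW currently serves a group.
class GwMonitorSketch {
 public:
  explicit GwMonitorSketch(long grace) : grace_(grace) {}

  // Like "nvme-gw create": start tracking a GW and its group.
  void create(const std::string& gw) {
    gws_[gw] = Gw{};
    serving_[gw] = gw;
  }

  // Like "nvme-gw delete": stop tracking the GW and its group.
  void remove(const std::string& gw) {
    gws_.erase(gw);
    serving_.erase(gw);
  }

  // A beacon from the GW's monitor client marks it alive.
  void beacon(const std::string& gw, long now) {
    auto it = gws_.find(gw);
    if (it != gws_.end()) {
      it->second.last_beacon = now;
      it->second.available = true;
    }
  }

  // Periodic check: expire silent GWs, then recompute who serves each group.
  void tick(long now) {
    for (auto& entry : gws_) {
      if (now - entry.second.last_beacon > grace_) entry.second.available = false;
    }
    for (auto& entry : serving_) {
      const std::string& group = entry.first;
      std::string& gw = entry.second;
      if (is_up(group)) {
        gw = group;  // preferred GW is up: fail back to it
      } else if (!is_up(gw)) {
        gw = pick_available().value_or("");  // fail over, or no path at all
      }
    }
  }

  // Which GW serves `group` ("" if none or unknown).
  std::string serving(const std::string& group) const {
    auto it = serving_.find(group);
    return it == serving_.end() ? "" : it->second;
  }

 private:
  struct Gw {
    long last_beacon = 0;
    bool available = false;  // becomes true on the first beacon
  };

  bool is_up(const std::string& gw) const {
    auto it = gws_.find(gw);
    return it != gws_.end() && it->second.available;
  }

  std::optional<std::string> pick_available() const {
    for (const auto& entry : gws_) {
      if (entry.second.available) return entry.first;
    }
    return std::nullopt;
  }

  long grace_;
  std::map<std::string, Gw> gws_;
  std::map<std::string, std::string> serving_;  // group -> serving GW
};
```

In this model, a GW that stops beaconing for longer than the grace period loses its group to another available GW. Its group moves back once its beacons resume.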
It also adds 3 new mon commands:
- nvme-gw create
- nvme-gw delete
- nvme-gw show
These commands are used by cephadm to notify the monitor that a new GW has been deployed. The monitor then updates the map accordingly and tracks this GW until it is deleted.
Signed-off-by: Leonid Chernin <lechernin@gmail.com>
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
mgr/cephadm: introducing new cmd to generate self-signed certs
This new Cephadm command introduces the ability to generate self-signed
certificates for external modules, signed by Cephadm as the root CA.
This feature is essential for implementing mTLS. Previously, if the
user did not provide a certificate and key, the dashboard would
generate its own. With this update, the dashboard calls Cephadm
to generate self-signed certificates, enabling secure mTLS
communication with other backend applications. The Prometheus module
also uses this new functionality to generate self-signed
certificates.
mgr/cephadm: introducing cert_mgr new class to centralize certs mgmt
cert_mgr will be solely responsible for managing all certificates
generated and maintained by cephadm. In addition, cephadm now provides
a new command to generate certificates for external modules.
librbd/migration: close source image in OpenSourceImageRequest
Currently, on errors in FormatInterface::open(), RawFormat disposes
of src_image_ctx, but QCOWFormat doesn't, which is a leak. Rather than
having each format do it internally, do it in OpenSourceImageRequest.
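The following is a minimal sketch of the ownership change. It assumes hypothetical `SrcImageCtx`, `Format`, and `open_source_image()` types. The request owns the context through std::unique_ptr, so a failing open() releases it in one place regardless of the format.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the source ImageCtx; counts live instances so
// that a leak would be observable.
struct SrcImageCtx {
  static inline int live = 0;
  SrcImageCtx() { ++live; }
  ~SrcImageCtx() { --live; }
};

// Hypothetical format interface: open() reports errors but never disposes
// of the context it was given.
struct Format {
  virtual ~Format() = default;
  virtual int open(SrcImageCtx* ctx) = 0;
};

struct FailingFormat : Format {
  int open(SrcImageCtx*) override { return -22; }  // e.g. -EINVAL
};

struct WorkingFormat : Format {
  int open(SrcImageCtx*) override { return 0; }
};

// The request owns the context, so cleanup on error happens here, once,
// for every format.
int open_source_image(Format& format, std::unique_ptr<SrcImageCtx>& out) {
  auto ctx = std::make_unique<SrcImageCtx>();
  int r = format.open(ctx.get());
  if (r < 0) {
    return r;  // ctx is destroyed when it goes out of scope
  }
  out = std::move(ctx);
  return 0;
}
```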
librbd/migration: don't instantiate NativeFormat, handle it via dispatch
Trying to shoehorn NativeFormat under FormatInterface doesn't really
work. It fundamentally doesn't fit in:
- Unlike for RawFormat and QCOWFormat, src_image_ctx for NativeFormat
is not a dummy -- it's an ImageCtx for a real RBD image. Pre-creating
it in OpenSourceImageRequest with the expectation that placeholder
values would be overridden later forces NativeFormat to reach into
ImageCtx guts, duplicating the logic in the constructor. This also
necessitates calling snap_set() in a separate step, since snap_id
isn't known at the time ImageCtx is created.
- Unlike for RawFormat and QCOWFormat, get_image_size() and
get_snapshots() implementations for NativeFormat are dummy.
- read() and list_snaps() implementations for NativeFormat are
inconsistent: read() passes through io::ImageDispatch layer, but
list_snaps() doesn't. Both could simply pass through, meaning that in
essence these are also dummy.
All of this is with today's code. Additional complications arise with
planned support for migrating from external clusters, where src_image_ctx
would require more invasive patching to "move" it to an IoCtx belonging to
an external cluster's CephContext, as well as with other work.
With the above in mind, NativeFormat actually consists of:
1. Code that parses the "type: native" source spec
2. Code that patches ImageCtx, working around the fact that it's
pre-created in OpenSourceImageRequest
3. A bunch of dummy implementations for FormatInterface
With this change, (1) is wrapped into a static method that also creates
ImageCtx after all required parameters are known and (2) and (3) go away
entirely. NativeFormat no longer implements FormatInterface and doesn't
get instantiated at all.
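The following sketches the "parse first, then construct" approach. It assumes a hypothetical flattened `SourceSpec` map instead of JSON, and a hypothetical `ImageCtxSketch` whose constructor receives every parameter, including the snapshot ID. No later patching or separate snap_set() step is needed. The spec keys, `NativeFormatSketch`, and the `kNoSnap` marker are illustrative only.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical flattened source spec; the real spec is a JSON object.
using SourceSpec = std::map<std::string, std::string>;

// Hypothetical "head" marker used when the spec names no snapshot.
constexpr uint64_t kNoSnap = UINT64_MAX;

// Hypothetical image context: the constructor receives every parameter,
// so nothing is patched afterwards.
struct ImageCtxSketch {
  ImageCtxSketch(std::string pool, std::string image, uint64_t snap)
      : pool_name(std::move(pool)), image_name(std::move(image)), snap_id(snap) {}
  std::string pool_name;
  std::string image_name;
  uint64_t snap_id;
};

struct NativeFormatSketch {
  // Parse a "type: native" spec and create the context only once all the
  // parameters are known. Returns std::nullopt for an unusable spec.
  static std::optional<ImageCtxSketch> create_image_ctx(const SourceSpec& spec) {
    auto type = spec.find("type");
    auto pool = spec.find("pool_name");
    auto image = spec.find("image_name");
    if (type == spec.end() || type->second != "native" ||
        pool == spec.end() || image == spec.end()) {
      return std::nullopt;
    }
    uint64_t snap_id = kNoSnap;
    auto snap = spec.find("snap_id");
    if (snap != spec.end()) {
      snap_id = std::stoull(snap->second);  // throws on a malformed number
    }
    return ImageCtxSketch(pool->second, image->second, snap_id);
  }
};
```

Because the factory is static and nothing is instantiated beyond the context itself, there is no NativeFormat object left to carry dummy FormatInterface methods.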
In preparation for not instantiating NativeFormat and losing a copy of
the source spec JSON object in m_json_object, refactor the parsing code
to use only const methods (which std::map's operator[] isn't) and local
variables where possible.
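std::map::operator[] inserts a default-constructed value when the key is missing. It is therefore non-const and cannot be called through a const reference, while find() can. Below is a minimal sketch of a const-friendly lookup. It uses a hypothetical `get_string()` helper and a map of strings standing in for the JSON object.

```cpp
#include <cerrno>
#include <map>
#include <string>

// Hypothetical helper: read-only lookup through a const reference.
// spec[key] would not compile here, because std::map::operator[] may insert.
int get_string(const std::map<std::string, std::string>& spec,
               const std::string& key, std::string* value) {
  auto it = spec.find(key);
  if (it == spec.end()) {
    return -ENOENT;  // key missing; the map is left untouched
  }
  *value = it->second;
  return 0;
}
```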
librbd/migration/NativeFormat: do pool lookup instead of creating io_ctx
A Rados instance is sufficient to map the pool name to the pool ID,
no need to involve an IoCtx instance as well. While at it, report
distinctive errors for a non-existing pool and an invalid JSON value
for pool_name key cases.
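The following sketches the two distinct error paths. It assumes a hypothetical `PoolDirectory` map in place of the cluster's pool table, and a `std::variant` in place of a JSON value. In librados, the name-to-ID mapping is available from the Rados handle itself (Rados::pool_lookup()), which is why no IoCtx is needed. The helper name and the exact error mapping below are illustrative.

```cpp
#include <cerrno>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-in for a JSON value: either a string or a number.
using JsonValue = std::variant<std::string, int64_t>;

// Hypothetical stand-in for the cluster's pool name -> ID mapping.
using PoolDirectory = std::map<std::string, int64_t>;

// Returns the pool ID, or a distinct negative errno:
//   -EINVAL if the "pool_name" value is not a string,
//   -ENOENT if no pool with that name exists.
int64_t lookup_pool_id(const PoolDirectory& pools, const JsonValue& pool_name) {
  const std::string* name = std::get_if<std::string>(&pool_name);
  if (name == nullptr) {
    return -EINVAL;
  }
  auto it = pools.find(*name);
  if (it == pools.end()) {
    return -ENOENT;
  }
  return it->second;
}
```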
librbd/migration: make SourceSpecBuilder::parse_source_spec() static
In preparation for divorcing NativeFormat from FormatInterface and
changing when/how src_image_ctx is created, make parse_source_spec()
independent of src_image_ctx. The "invalid source-spec JSON" error is
duplicated by the "failed to parse migration source-spec" error, so
just drop the former to avoid having to pass CephContext to
parse_source_spec().
Client::mds_check_access expects target_path without a leading '/',
since it eventually calls MDSCapMatch::match_path, which also expects
the target_path passed to it to have no leading '/'.
Only a single leading '/' was being removed, but the constructed
absolute path did have a leading '//', so removing all leading '/'
characters is necessary.
Otherwise, clients cannot access a particular path even though they
have rw permission on that path.
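Below is a minimal sketch of the fix using std::string. The `strip_leading_slashes()` helper is hypothetical; the client code has its own path handling. find_first_not_of('/') skips every leading slash, so both "/a" and "//a" become "a".

```cpp
#include <string>

// Strip every leading '/', not just the first one, so that a constructed
// absolute path such as "//volumes/a" becomes "volumes/a".
std::string strip_leading_slashes(const std::string& path) {
  const std::string::size_type pos = path.find_first_not_of('/');
  if (pos == std::string::npos) {
    return "";  // empty path, or a path made only of '/'
  }
  return path.substr(pos);
}
```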
Add missing spaces, don't use the word stream when reporting errors
on POSIX file operations (open() and lseek64()) and fix a cut-and-paste
typo in RawSnapshot.
This is a rework of the POSIXDriver. It refactors the actual POSIX
parts out into a set of classes that provide access to the underlying
directories, files, symlinks, etc. These primitives are used to build up full
support for Bucket, Object, Multipart, and VersionedObjects.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>