Adam C. Emerson [Wed, 19 Jan 2022 21:49:05 +0000 (16:49 -0500)]
rgw: Report empty endpoints as error instead of crashing
Fixes: https://tracker.ceph.com/issues/53941
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3c4a64ca040d3a0e0ddf762c391575498dc2a77f)
Fixes: https://tracker.ceph.com/issues/53973
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Mykola Golub [Fri, 14 Jan 2022 18:21:29 +0000 (18:21 +0000)]
cls/journal: skip disconnected clients when finding min_commit_position
When a new journal client is registered, all already registered
clients are checked, and the minimum commit position among them is
used as the position for the new client. Thus we may expect that,
starting from the registered position, all journal entries will be
available (not trimmed) for the new client.
But when looking for the min commit position, the client_register
function did not take into account that a registered client might
be in the disconnected state, in which case journal entries
might already have been trimmed for that client.
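The selection rule described above can be sketched in Python as follows. This is only an illustration, not the cls/journal code: the `Client` class, its fields, and the use of a single integer as a commit position are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Client:
    # Hypothetical, simplified stand-in for a registered journal client.
    id: str
    commit_position: int  # assumption: one integer instead of a full position
    disconnected: bool = False


def min_commit_position(clients: Iterable[Client]) -> Optional[int]:
    """Return the smallest commit position among connected clients only."""
    positions = [c.commit_position for c in clients if not c.disconnected]
    return min(positions) if positions else None


clients = [
    Client("mirror-a", commit_position=40),
    # Disconnected: entries at or after position 10 may already be trimmed.
    Client("mirror-b", commit_position=10, disconnected=True),
    Client("local", commit_position=25),
]
print(min_commit_position(clients))  # 25
```

Without the `disconnected` filter, the new client would be registered at position 10, where entries may no longer exist.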
Kamoltat [Wed, 12 Jan 2022 02:41:01 +0000 (02:41 +0000)]
pybind/mgr/progress: enforced try and except on accessing event dictionary
There is a race condition in which an event gets
deleted while the progress module iterates through
the ``events`` dictionary. Without a ``try and except``,
this raises an unhandled exception and crashes
the module.
This commit enforces ``try and except``
in every part of the code that accesses
the ``events`` dictionary.
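A minimal sketch of the guarded access pattern, assuming a plain dict keyed by event id. The function name `collect_progress` and the `"progress"` field are hypothetical and are not taken from the progress module.

```python
from typing import Dict, List


def collect_progress(events: Dict[str, dict], event_ids: List[str]) -> Dict[str, float]:
    """Read progress for each id, skipping events that vanished meanwhile."""
    out: Dict[str, float] = {}
    for ev_id in event_ids:
        try:
            out[ev_id] = events[ev_id]["progress"]
        except KeyError:
            # The event was deleted after the ids were collected; skip it
            # instead of letting the exception propagate.
            continue
    return out


events = {"a": {"progress": 0.5}, "b": {"progress": 0.9}}
ids = list(events)  # snapshot of the keys taken before the deletion
del events["b"]     # simulate another thread removing the event
print(collect_progress(events, ids))  # {'a': 0.5}
```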
Casey Bodley [Wed, 10 Mar 2021 21:12:13 +0000 (16:12 -0500)]
rgw: allow rgw_data_notify_interval_msec=0 to disable notifications
the data changes log for multisite will occasionally broadcast recent
changes to other zones, which they can use to prioritize sync of some
of the most recent changes. they'll eventually see all changes as they
replay the data changes log, though, so notifications aren't required
for successful sync. the ability to turn them off is useful for testing
Casey Bodley [Thu, 13 Jan 2022 20:56:11 +0000 (15:56 -0500)]
rgw/dbstore: hide dbstore_log.h from rgw_main.cc
dbstore_log.h sets global dout_subsys/dout_prefix macros, and was
leaking into rgw_main.cc through the common/dbstore.h. this caused all
of rgw_main's log output to start with the wrong prefix "rgw dbstore: "
Casey Bodley [Tue, 23 Nov 2021 20:44:03 +0000 (15:44 -0500)]
rgw/multisite: metadata sync only retries on errors
in 866d66b8749b28ec626a8d0adba3d14fdd8abead, metadata sync was fixed to
retry on error codes other than EAGAIN/ECANCELED. but this change caused
us to retry on success as well, which means we send 10 GET requests for
each piece of metadata, and write it to rados 10 times
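The intended control flow can be sketched as a loop that stops on the first success. This is a hedged illustration rather than the rgw coroutine code: `sync_with_retry` and `fetch_and_store` are hypothetical names, and the convention that negative return values signal errors is an assumption.

```python
import errno
from typing import Callable

MAX_ATTEMPTS = 10  # matches the 10 requests mentioned above


def sync_with_retry(fetch_and_store: Callable[[], int],
                    max_attempts: int = MAX_ATTEMPTS) -> int:
    """Call fetch_and_store until it succeeds or attempts run out."""
    ret = 0
    for _ in range(max_attempts):
        ret = fetch_and_store()
        if ret >= 0:
            break  # success: do not fetch and write the entry again
        # negative return code: treat as an error and try again
    return ret


calls = []


def flaky() -> int:
    # Fails twice with EIO, then succeeds.
    calls.append(1)
    return -errno.EIO if len(calls) < 3 else 0


result = sync_with_retry(flaky)
print(result, len(calls))  # 0 3
```

Omitting the `break` reproduces the regression described above: every entry is fetched and written `max_attempts` times even when the first attempt succeeds.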
Ernesto Puerta [Fri, 14 Jan 2022 11:56:55 +0000 (12:56 +0100)]
Merge pull request #44507 from votdev/issue_53813_nfs_page_not_found
mgr/dashboard: NFS pages show 'Page not found'
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Yaarit Hatuka [Wed, 12 Jan 2022 05:01:48 +0000 (05:01 +0000)]
mgr/telemetry: verify there are new collections when nagging due to a major upgrade
When adding a new collection we define whether to nag the user about it.
We may add many collections and nag about none of them. However, in case
of a major upgrade, we wish to notify the user about these new
collections. This commit verifies there are indeed new collections when
nagging due to a major upgrade.
Yaarit Hatuka [Wed, 12 Jan 2022 04:36:27 +0000 (04:36 +0000)]
mgr/telemetry: improve output of `ceph telemetry collection ls`
STATUS column now indicates whether a collection is being reported, and
the reasons why it's not (either the user is not opted-in to this
collection, or its channel is off).
Also, removed the ENROLLED and DEFAULT columns due to potential
confusion they may cause.
In case a user is not opted-in to certain collections, a message will
appear above the table with the missing collections:
New collections are available:
['basic_base', 'basic_mds_metadata', 'crash_base', 'device_base',
'ident_base', 'perf_perf']
Run `ceph telemetry on` to opt-in to these collections.
Yaarit Hatuka [Tue, 7 Dec 2021 18:30:56 +0000 (18:30 +0000)]
mgr/telemetry: add command to list all collections
List all collections, their current enrollment state, status, default,
and description, with:
$ ceph telemetry collection ls
NAME                ENROLLED  STATUS  DEFAULT  DESC
basic_base          TRUE      ON      ON       Basic information about the cluster (capacity, number and type of daemons, version, etc.)
basic_mds_metadata  TRUE      ON      ON       MDS metadata
crash_base          TRUE      ON      ON       Information about daemon crashes (daemon type and version, backtrace, etc.)
device_base         TRUE      ON      ON       Information about device health metrics
ident_base          TRUE      OFF     OFF      User-provided identifying information about the cluster
perf_perf           TRUE      OFF     OFF      Information about performance counters of the cluster
Please note:
NAME:
=====
Collection name; prefix indicates the channel the collection belongs to.
ENROLLED:
=========
Signifies the collections that were available in the module when the
user last opted-in to telemetry. Please note: Even if a collection is
'enrolled', its metrics will be reported only if its channel is enabled.
STATUS:
=======
Indicates whether the collection metrics are reported; this is
determined by the status (enabled / disabled) of the channel the
collection belongs to, along with the enrollment status of the
collection.
DEFAULT:
========
The default status (enabled / disabled) of the channel the collection
belongs to.
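The relationship between the columns can be sketched as below. This is a minimal illustration under stated assumptions, not the telemetry module's code: `channel_of` and `collection_status` are hypothetical helpers, and the channel is assumed to be the part of the collection name before the first underscore.

```python
def channel_of(collection_name: str) -> str:
    """Derive the channel from the collection name prefix (assumed format)."""
    return collection_name.split("_", 1)[0]


def collection_status(enrolled: bool, channel_enabled: bool) -> str:
    """A collection is reported only if it is enrolled and its channel is on."""
    return "ON" if enrolled and channel_enabled else "OFF"


# Channel states as they appear in the sample listing above.
channels = {"basic": True, "crash": True, "device": True,
            "ident": False, "perf": False}

for name in ("basic_mds_metadata", "ident_base", "perf_perf"):
    print(name, collection_status(True, channels[channel_of(name)]))
```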
Yaarit Hatuka [Tue, 23 Nov 2021 21:28:47 +0000 (21:28 +0000)]
mgr/telemetry: add preview-device and preview-all commands
`ceph telemetry show` will show a sample cluster report if the user is
opted-in to telemetry. The report is compiled from the collections
the user is opted-in to. To preview a report compiled from the most recent
collections available, use `ceph telemetry preview`.
The device channel is not included in the cluster report, since it is
sent to a different endpoint. Use `ceph telemetry show-device` if the
user is opted-in to telemetry and the device channel is enabled.
Otherwise, the device report can be previewed with
`ceph telemetry preview-device`.
If telemetry is on, and device channel is enabled, both reports can be
reviewed with `ceph telemetry show-all`, otherwise use
`ceph telemetry preview-all`.
Yaarit Hatuka [Tue, 23 Nov 2021 17:11:38 +0000 (17:11 +0000)]
mgr/telemetry: add command to list all channels
List all channels, their current state, default, and description, with:
$ ceph telemetry channel ls
NAME    ENABLED  DEFAULT  DESC
basic   ON       ON       Share basic cluster information (size, version)
ident   OFF      OFF      Share a user-provided description and/or contact email for the cluster
crash   ON       ON       Share metadata about Ceph daemon crashes (version, stack traces, etc)
device  ON       ON       Share device health metrics (e.g., SMART data, minus potentially identifying info like serial numbers)
perf    ON       OFF      Share perf counter metrics summed across the whole cluster
Yaarit Hatuka [Tue, 23 Nov 2021 00:12:10 +0000 (00:12 +0000)]
mgr/telemetry: add commands to enable/disable channels
Currently we enable/disable a telemetry channel via CLI with:
`ceph config set mgr mgr/telemetry/channel_basic true`
`ceph config set mgr mgr/telemetry/channel_crash false`
We can now do this with:
`ceph telemetry enable channel basic`
`ceph telemetry disable channel crash`
Yaarit Hatuka [Mon, 15 Nov 2021 16:53:59 +0000 (16:53 +0000)]
mgr/telemetry: introduce new design for adding new data
The current design requires increasing the telemetry revision each time
we add new data to the report. As a result, users need to re-opt-in to
telemetry. This new design allows for adding new data to the report,
while allowing users to keep sending only what they already opted-in to,
hence no re-opt-in is required. In case users wish to report the new
data as well, they need to re-opt-in and enable any new channels.
Also, move formatting perf histograms to a function, so we can use it
both in `show` and `preview` commands.
Fix get_report call in dashboard to use get_report_locked.