Nizamudeen A [Fri, 2 Sep 2022 10:19:31 +0000 (15:49 +0530)]
Merge pull request #47867 from MrFreezeex/quincy-ceph-mixin-backports
quincy: monitoring: ceph mixin backports
Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Anthony D Atri <anthony.datri@gmail.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Adam King [Thu, 18 Aug 2022 12:49:57 +0000 (08:49 -0400)]
qa/cephadm: specify using container host distros for workunits
Right now, the OS type and OS version for these workunit
tests are left blank on pulpito, and the tests appear to be trying to
run on Ubuntu Jammy, which is causing failures. We should
specify which distros the tests run on, and then very explicitly
tell it to start trying new distros when we can get the tests to
pass.
胡玮文 [Sun, 9 Jan 2022 15:17:38 +0000 (23:17 +0800)]
mon/MDSMonitor: remove redundant state change check
There are two sets of state change checks in prepare_beacon.
Since the last commit, many of these checks are covered by
`MDSMap::state_transition_valid`, so this commit merges them.
This fixes the bug in which standby-replay is evicted unexpectedly.
The bug was introduced in 794d13c9ff4 (mon/MDSMonitor: reject illegal want_states from MDS)
but only revealed itself after 20509bb6c82 (MDSMonitor: handle damaged from standby-replay).
The addition of unselectable prompts to these three files
completes the work begun in PR#47810 (d8064b4), which sought
to bring dashboard.rst into line with the unselectable prompt
standard introduced by Kefu Chai in 2020.
In 4a3afcf, the $PATH is set for the test, but we cannot set multiple
properties with a single `set_property()` cmake command. We fix that by
adding the installation path of jsonnet-bundler
(CMAKE_CURRENT_BINARY_DIR) to the $PATH used for every tox test.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch> Co-Authored-By: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit d46e14c71bffda1381dac7da244ab8347d035769)
Xiubo Li [Wed, 1 Jun 2022 02:32:58 +0000 (10:32 +0800)]
mds: notify the xattr_version to replica MDSes
When a client changes an xattr's value on the auth MDS, the MDS may
drop the increased xattr_version and the new value from its reply
message if no 'Xs' caps are issued to the client along with the
reply.
When the client later wants to get this xattr's value and sends
the request to a replica MDS, the replica MDS still has
the old xattr_version, so the client drops the
xattr value because xattr_version has not changed.
We need to notify the replica MDSes of the xattr_version together
with the xattrs when notifying the lock state.
Adam King [Thu, 25 Aug 2022 16:09:49 +0000 (12:09 -0400)]
mgr/orchestrator/tests: don't match exact whitespace in table output
It seems that the exact spacing may differ a bit between
python versions. Currently seeing py3 (which corresponds to py 3.6
on my system) passing these tests and py37 (which is python 3.7
obviously) failing. I think verifying against the exact whitespace
is unnecessary anyhow. As long as it isn't egregious, we don't
really need to worry about exactly what the spacing is.
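A minimal sketch of a whitespace-insensitive comparison of the kind described above. The helper name `normalize_whitespace` and the sample table are hypothetical and are not taken from the orchestrator tests.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse each run of whitespace to one space and drop blank lines."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical table output as two Python versions might render it.
expected = """
HOST   ADDR      LABELS
node1  10.0.0.1  mon
"""
actual = "HOST  ADDR      LABELS  \nnode1 10.0.0.1  mon\n"

# The comparison passes even though the column padding differs.
assert normalize_whitespace(actual) == normalize_whitespace(expected)
```

Comparing normalized text still catches missing columns or wrong values, but no longer depends on how each Python version pads the columns.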
Laura Flores [Wed, 24 Aug 2022 22:23:45 +0000 (22:23 +0000)]
src/pybind/mgr/telemetry: parse `outb` instead of `outs`
Following the merge of https://github.com/ceph/ceph/pull/47650, which
fixes the confusion between stdout and stderr in admin socket
commands, we will need to reference the out stream (outb) instead
of the error stream (outs) when we parse heap stats.
Laura Flores [Tue, 5 Jul 2022 22:06:15 +0000 (22:06 +0000)]
mgr/telemetry: add `perf_memory_metrics` collection to telemetry
This new collection includes heap stats and mempool metrics for
mon and mds daemons.
A `tell_command` function was introduced to the mgr module as a wrapper
around the `send_command` function to make it easier to run "tell"
admin socket commands.
osd, mds: fix the "heap" admin cmd printing always to error stream
Before the patch `ceph::osd_cmds::heap()` was confusing
the concepts of _stderr_ and _stdout_. This was the direct
cause of the differences in output between `ceph tell` and
`ceph daemon`.
Thanks to Laura Flores who made the extremely useful observation
noted in https://tracker.ceph.com/issues/57119#note-3.
Zac Dover [Thu, 25 Aug 2022 15:56:41 +0000 (01:56 +1000)]
doc/mgr: add prompt directives to dashboard.rst
This commit adds prompt directives (.. prompt:: bash $) to
the commands in dashboard.rst.
There are several ".. include::" directives in the dashboard.rst
file, which means that part of this page is sourced from elsewhere
than the dashboard.rst file. Because I have not yet added prompt
directives to those files, there is an inconsistency in the rendering
of this file. Most of the commands on this page have unselectable
prompts (unselectable prompts are the prompts that don't get added to
the buffer when you copy them to one of the clipboards). But the
commands on this page that come from those ".. include::" directives
do not yet have unselectable prompts.
This file is over 1600 lines long. It was perhaps not optimally wise
of me to have edited all of it in one fell swoop. It took many hours,
and carefully checking it will probably take at least one hour. I
suggest that whoever reviews this should not spend much time on it,
but should instead make a quick pass over the page and make sure that
it looks passable.
The English syntax on this page (and throughout the Dashboard
documentation) will be tightened to remove ambiguity and to improve
readability in the near future, so hold all English-language-related
comments for a future pull request.
Adam King [Wed, 24 Aug 2022 14:36:53 +0000 (10:36 -0400)]
doc/cephadm: fix example for specifying networks for rgw
count_per_host must be written with underscores rather
than dashes to work, service_id must be passed rather than
service_name, and the option for the port is called
rgw_frontend_port, not just "port".
Zac Dover [Tue, 23 Aug 2022 06:59:04 +0000 (16:59 +1000)]
doc/mgr: edit orchestrator.rst
This PR improves the English language in the "Orchestrator CLI"
section of the MGR documentation. It adds a couple of section
headers in order to signpost the information in the document
a bit more than had already been done, but it makes no major
structural changes to the presentation of the information here.
This PR was motivated by feedback from the 2022 Ceph User Survey,
in which one of the respondents wrote "better ceph orch
documentation".
The final section on this page, "Current Implementation Status",
must be verified by someone who is familiar with the current state
of "ceph orch" and a date stamp should be applied to the top of
the section so that the word "current" has a meaningful referent.
Xiubo Li [Wed, 6 Apr 2022 00:12:26 +0000 (08:12 +0800)]
ceph-fuse: add dedicated snap stag map for each directory
This fixes the fino collision bug, which occurs when the
snapid is larger than 0xffff.
From the MDS 'mds_max_snaps_per_dir' option, we can see that the maximum
number of snapshots for each directory is 4K, while ceph-fuse has
around 64K stags (0xffff - 2) that can be used to make
the fake fuse inode numbers for each directory.
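The arithmetic above can be sketched as follows. This is an illustration only: the class and function names, the 48-bit split of the fake inode number, and the sequential stag allocation are all assumptions, not the ceph-fuse implementation.

```python
from collections import defaultdict

MAX_SNAPS_PER_DIR = 4 * 1024   # "4K", from the commit message
USABLE_STAGS = 0xffff - 2      # "around 64K", from the commit message
INO_BITS = 48                  # assumed split of a 64-bit fake inode number


class StagMap:
    """Hypothetical per-directory map between snap ids and small stags."""

    def __init__(self):
        self._snap_to_stag = {}
        self._stag_to_snap = {}

    def stag_for(self, snapid):
        stag = self._snap_to_stag.get(snapid)
        if stag is None:
            if len(self._snap_to_stag) >= USABLE_STAGS:
                raise OverflowError("no free stag in this directory")
            stag = len(self._snap_to_stag) + 1   # sequential allocation (assumption)
            self._snap_to_stag[snapid] = stag
            self._stag_to_snap[stag] = snapid
        return stag

    def snap_for(self, stag):
        return self._stag_to_snap[stag]


stag_maps = defaultdict(StagMap)   # one map per directory inode


def make_fake_ino(dir_ino, ino, snapid):
    """Pack a small per-directory stag above the real inode number."""
    stag = stag_maps[dir_ino].stag_for(snapid)
    return (stag << INO_BITS) | ino


# Each directory has far more stags available than snapshots allowed.
assert USABLE_STAGS > MAX_SNAPS_PER_DIR

big_snapid = 0x12345               # larger than 0xffff
fino = make_fake_ino(dir_ino=0x100, ino=0x2000, snapid=big_snapid)
assert stag_maps[0x100].snap_for(fino >> INO_BITS) == big_snapid
```

Because stags are handed out per directory, a snap id above 0xffff still maps to a small stag, and the 4K snapshot limit per directory leaves ample headroom.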
Xiubo Li [Thu, 24 Mar 2022 02:01:57 +0000 (10:01 +0800)]
ceph-fuse: return EINVAL if get invalid fino instead of assert
The snap ids of all finos returned to libfuse from libcephfs
are recorded in the 'stag_snap_map' map and are never
erased before unmounting. So if libfuse passes an invalid fino,
ceph-fuse should return the EINVAL errno instead of crashing.
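A sketch of the "return an error instead of asserting" pattern, written in Python for brevity. The function name and the map contents are hypothetical; only the map name `stag_snap_map` and the EINVAL errno come from the commit message.

```python
import errno

# Hypothetical contents; the real map is filled as finos are handed to libfuse.
stag_snap_map = {1: 0x12345}


def snapid_for_stag(stag):
    """Return (0, snapid) for a known stag, or (-EINVAL, None) otherwise."""
    snapid = stag_snap_map.get(stag)
    if snapid is None:
        # An unknown stag means the caller passed an invalid fino:
        # report it to the caller rather than aborting the process.
        return -errno.EINVAL, None
    return 0, snapid
```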
Xiubo Li [Wed, 23 Mar 2022 02:05:32 +0000 (10:05 +0800)]
mds-client: make the fake inos option unchangeable in runtime
If the flags are empty, then can_update_at_runtime() in option.h
returns true. That means this option could be changed at
runtime, which is buggy: if this option is false, ceph-fuse
uses its own fake inos instead of libcephfs'. If it is changed
at runtime, we will hit "ino doesn't exist" assert bugs.
John Mulligan [Tue, 2 Aug 2022 13:45:59 +0000 (09:45 -0400)]
mgr/volumes: drop pre-python 3.2 version checks
Based on other conversations we believe that there is no need to support
python versions lower than Python 3.6 for pacific and later. This means
it is safe to drop the remaining version checks for python
3.2.
John Mulligan [Mon, 11 Jul 2022 20:44:00 +0000 (16:44 -0400)]
mgr/volumes: a lock to guard against races reading/writing config
Fixes: https://tracker.ceph.com/issues/55583
Use a python threading lock to avoid race conditions where the
config file is being both read and written to at the same time.
Before this change, the content of the config file being parsed could be
'corrupted' by the MetadataManager racing with itself. Along with the
previous two patches, additional logging was added to the mgr code to
produce the simplified version of the mgr log below:
```
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'[GLOBAL]\nversion = 2\ntype = clone\npath = /volumes/Park/babydino2/c9f773af-5221-49c6-846c-d65c0920ae3f\nstate = pending\n\n[source]\nvolume = cephfs\ngroup = Park\nsubvolume = Jurrasic\nsnapshot = dinodna0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b''
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'[GLOBAL]\nversion = 2\ntype = clone\npath = /volumes/Park/babydino2/c9f773af-5221-49c6-846c-d65c0920ae3f\nstate = pending\n\n[source]\nvolume = cephfs\ngroup = Park\nsubvolume = Jurrasic\nsnapshot = dinodna0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] wrote 203 bytes to config b'/volumes/Park/babydino2/.meta'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'a0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b''
[volumes ERROR volumes.module] Failed _cmd_fs_clone_cancel(clone_name:babydino2, format:json, group_name:Park, prefix:fs clone cancel, vol_name:cephfs) < "":
Traceback (most recent call last):
...
File "/usr/lib64/python3.6/configparser.py", line 1111, in _read
raise e
configparser.ParsingError: Source contains parsing errors: b'/volumes/Park/babydino2/.meta'
[line 13]: 'a0\n'
```
Looking at the above you can see that the log indicates a write to the
config file (of 203 bytes). This happens before the reader has finished
reading the file, and thus instead of getting an empty string indicating EOF, it
gets the last four bytes of the new content of the file. The lock
prevents the MetadataManager from both reading and writing the config
file at the same time.
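A minimal sketch of the locking idea, assuming an in-memory string in place of the config file. `LockedMetadata` is a hypothetical stand-in and does not reproduce the MetadataManager code.

```python
import configparser
import io
import threading


class LockedMetadata:
    """Hypothetical stand-in for a metadata manager; not the real class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stored = ""   # stands in for the contents of the config file

    def update(self, section, key, value):
        # The whole read-modify-write cycle happens under the lock, so no
        # reader or writer ever sees a partially written config.
        with self._lock:
            parser = configparser.ConfigParser()
            parser.read_string(self._stored)
            if not parser.has_section(section):
                parser.add_section(section)
            parser.set(section, key, value)
            buf = io.StringIO()
            parser.write(buf)
            self._stored = buf.getvalue()

    def get(self, section, key):
        with self._lock:
            parser = configparser.ConfigParser()
            parser.read_string(self._stored)
            return parser.get(section, key)


meta = LockedMetadata()
threads = [
    threading.Thread(target=meta.update, args=("GLOBAL", f"key{i}", str(i)))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every concurrent update survives because reads and writes never interleave.
assert all(meta.get("GLOBAL", f"key{i}") == str(i) for i in range(8))
```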
John Mulligan [Tue, 12 Jul 2022 22:33:07 +0000 (18:33 -0400)]
mgr/volumes: write volume metadata with shim class
Add a class that works a bit like a python file object so that we
can simplify the flush function. Providing a file-like object to
the ConfigParser's write function avoids unnecessary copies to
a StringIO object and makes the code easier to read.
With no more uses of StringIO, the StringIO imports are removed.
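A sketch of the file-like shim idea: `ConfigParser.write()` only needs an object with a `write()` method that accepts strings. The class name `ChunkWriter` and the list used as a sink are hypothetical; in the real code the bytes would go to the metadata file on cephfs.

```python
import configparser


class ChunkWriter:
    """Minimal file-like object: ConfigParser.write() only calls write()."""

    def __init__(self, sink):
        self._sink = sink    # any callable that accepts bytes
        self.written = 0

    def write(self, text):
        data = text.encode("utf-8")
        self._sink(data)
        self.written += len(data)
        return len(text)


parser = configparser.ConfigParser()
parser["GLOBAL"] = {"version": "2", "type": "clone"}

chunks = []                        # stands in for the destination file
writer = ChunkWriter(chunks.append)
parser.write(writer)               # no intermediate StringIO copy needed
payload = b"".join(chunks)
```

Passing the shim straight to `write()` lets each piece of output go to the destination as it is produced.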
John Mulligan [Tue, 12 Jul 2022 22:32:54 +0000 (18:32 -0400)]
mgr/volumes: read volume metadata file using read_string
The read_string method, available since Python 3.2 (we assume Python 3.6 as
our current minimum Python version), supports parsing a provided string
for ini-style configuration parameters. Refactoring the reading of the
config file from cephfs into a simple iterator function and then
providing it to the ConfigParser as a single string allows us to avoid
using StringIO and simplifies the refresh function.
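A sketch of this approach, assuming a generic `read(size)` callable in place of the cephfs file handle. The function name `iter_chunks` is hypothetical, and `io.BytesIO` is used here only to simulate the file being read.

```python
import configparser
import io


def iter_chunks(read, chunk_size=16):
    """Yield byte chunks from read() until it returns b'' (EOF)."""
    while True:
        data = read(chunk_size)
        if not data:
            return
        yield data


# io.BytesIO simulates the file; the content mirrors the log excerpt above.
source = io.BytesIO(
    b"[GLOBAL]\nversion = 2\ntype = clone\n\n[source]\nvolume = cephfs\n\n"
)

# Join the raw bytes first, then decode once, so a multi-byte character
# split across two chunks cannot cause a decoding error.
text = b"".join(iter_chunks(source.read)).decode("utf-8")

parser = configparser.ConfigParser()
parser.read_string(text)
```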
Nizamudeen A [Tue, 26 Apr 2022 10:19:09 +0000 (15:49 +0530)]
mgr/dashboard: prometheus rules internal server error
After we increase or decrease the count of the node-exporter, we get a 500
- Internal Server Error from the api/prometheus/rules endpoint. On further
debugging, it is caused by the JSON decoder, presumably because the
input to json.loads() is not JSON-formatted. To fix
this, I could either add error handling around json.loads() or
move json.loads() into the already existing try block. I went for
the second approach here.
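A sketch of the second approach, with the decoding moved inside the existing try block. The function name, the `fetch` callable, and the status codes are hypothetical and do not reflect the dashboard's actual controller code.

```python
import json


def load_rules(fetch):
    """fetch is a hypothetical callable returning the raw response body."""
    try:
        body = fetch()
        # Decoding sits inside the try block, so malformed input is handled
        # by the same except clause as a failed request.
        return {"status": 200, "rules": json.loads(body)}
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        return {"status": 502, "error": str(exc)}
```

With this shape a non-JSON body produces a controlled error response instead of an unhandled exception.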
qa: filter internal directories in 'subvolumegroup ls' command
Internal directories: '_nogroup', '_index', '_legacy', '_deleting'
1. Internal directories should be filtered in 'subvolumegroup ls' command.
2. Internal directories should not be accepted as a group name.
mgr/volumes: filter internal directories in 'subvolumegroup ls' command
Internal directories: '_nogroup', '_index', '_legacy', '_deleting'
1. Internal directories should be filtered in 'subvolumegroup ls' command.
2. Internal directories should not be accepted as a group name.
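The two rules above can be sketched as follows. The function names are hypothetical; only the four directory names come from the commit message.

```python
INTERNAL_DIRS = frozenset({"_nogroup", "_index", "_legacy", "_deleting"})


def list_groups(entries):
    """Rule 1: hide internal directories from a group listing."""
    return [name for name in entries if name not in INTERNAL_DIRS]


def validate_group_name(name):
    """Rule 2: refuse an internal directory name as a group name."""
    if name in INTERNAL_DIRS:
        raise ValueError(f"'{name}' is an internal directory, not a valid group name")
    return name
```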
Used the https://www.npmjs.com/package/@grafana/e2e npm packages and
followed
https://github.com/grafana/grafana/blob/main/contribute/style-guides/e2e.md
to understand the style of the grafana e2e testing.
In this PR I introduce the tests for the Hosts Overall
Performance and also RGW per Daemon and Overall Performance