John Mulligan [Tue, 6 Jun 2023 17:24:37 +0000 (13:24 -0400)]
cephadm: use 0o600 as the default mode for write_new
Add a constant DEFAULT_MODE of `0o600`, and make it the default of
the perms argument to write_new. This reduces a lot of code since
0o600 is the majority of the permissions used. Other cases can continue
to pass None to indicate no particular permissions are desired.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 6 Jun 2023 17:16:29 +0000 (13:16 -0400)]
cephadm: convert SNMPGateway create_daemon_conf to use write_new
While it is not entirely clear why this pattern of using os.open and
posix open flags instead of `open` directly was used, I determined (using
strace) that the only major difference between these open flags and
those used by `open` was the lack of O_TRUNC. Unlike some other cases,
this function does not use an intermediate temporary file. This means
that if the file being written already exists and the new data is
shorter than the old contents, the trailing old data is not overwritten.
I looked over the context that this function is used in and decided that
this behavior must not be intentional. Thus it should be safe
to convert this function to `write_new`.
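The effect of the missing O_TRUNC can be reproduced with a minimal sketch.
The helper names and the file name below are hypothetical and are not taken
from cephadm; the sketch only assumes a POSIX system:

```python
import os
import tempfile


def write_without_trunc(path: str, data: str) -> None:
    """Write using os.open flags that lack O_TRUNC (the old pattern)."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    with open(fd, 'w') as f:
        f.write(data)


def write_with_trunc(path: str, data: str) -> None:
    """Write using the builtin open; mode 'w' truncates the file first."""
    with open(path, 'w') as f:
        f.write(data)


def demo() -> tuple:
    """Return file contents after a short rewrite, without and with O_TRUNC."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.conf')  # hypothetical file name
        write_with_trunc(path, 'long original contents\n')
        # Shorter data, no truncation: the tail of the old data survives.
        write_without_trunc(path, 'short\n')
        with open(path) as f:
            stale = f.read()
        # Same data with truncation: only the new data remains.
        write_with_trunc(path, 'short\n')
        with open(path) as f:
            clean = f.read()
    return stale, clean
```

Calling `demo()` returns the contents after each rewrite, so the leftover
tail of the old file can be compared against the truncated result.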
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 6 Jun 2023 16:37:14 +0000 (12:37 -0400)]
cephadm: convert more temporary file writes to use write_new
Some functions are using the pattern:
```
with open(os.open(name + '.new', os.O_CREAT | os.O_WRONLY, 0o600), 'w') as f:
    f.write(...)
os.rename(name + '.new', name)
```
While it is not entirely clear why this pattern was first used,
it accomplishes the same goal as `write_new`, except that it calls
the posix open call directly. I analyzed the open flags for `write_new`
and these calls using `strace` and noted that the only significant
difference was the lack of O_TRUNC in these cases. Since the ".new"
files should not already exist, the lack of O_TRUNC ought not to make
any difference.
With this decided we can convert these instances to `write_new`.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 6 Jun 2023 16:25:34 +0000 (12:25 -0400)]
cephadm: convert _write_custom_conf_files to use write_new
We double checked the meaning of "w+" and it will open the file
read-write. Since the file is never read there's no real reason
to keep it that way, so it's OK to convert to `write_new`.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 6 Jun 2023 00:12:59 +0000 (20:12 -0400)]
cephadm: convert some functions to use write_new
Convert a lot of the basic uses of the pattern:
    with open(...) as f:
        f.write(...)
        os.fchown(f, ...)  # sometimes
        os.fchmod(f, ...)  # sometimes
        os.rename(...)  # sometimes
These are the most obvious cases to convert to `write_new`
and should largely be uncontroversial.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 6 Jun 2023 00:08:49 +0000 (20:08 -0400)]
cephadm: create functional mock for fchown
The pyfakefs library apparently doesn't have its own mock for os.fchown.
This means that code using fchown currently calls into a mock with
no effect on the fake fs. For some reason I don't fully understand,
existing test cases work because they don't always follow the pattern
of open-write-rename. Switching to `write_new`, which always does a
rename, breaks some of the assertions performed in the tests on the fake
fs. Add a mock fchown that updates the state of the fake fs so
that converting call sites to use `write_new` will continue to work.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 6 Jun 2023 00:12:10 +0000 (20:12 -0400)]
cephadm: add write_new function for robust file writes
The cephadm code has a very common pattern made of at least one of
the three following steps:
* call fchown on the open file to set ownership
* call fchmod on the open file to set permissions
* rename the file from a temp name to final name
Add the write_new function to encapsulate these common actions.
If owner is not None then fchown will be called.
If perms is not None then fchmod will be called.
An optional encoding value may be passed.
It always writes to a temporary file and then renames it, because
this ensures that there can never be a partially written file under
the final name, even in the event of a power outage or system crash.
Encapsulating this all into a function also allows us to make
changes to this approach in the future without touching every
call site using `open(..., "w")` etc.
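A minimal sketch of what such a helper could look like is shown below.
The context-manager interface, the keyword-only arguments, and the ".new"
temporary suffix are assumptions made for illustration and are not
confirmed by this commit message; only the owner, perms, and encoding
behaviors come from the description above.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator, Optional, Tuple, Union


@contextmanager
def write_new(
    destination: Union[str, Path],
    *,
    owner: Optional[Tuple[int, int]] = None,
    perms: Optional[int] = None,
    encoding: Optional[str] = None,
) -> Iterator[IO[str]]:
    """Write to a temporary file, then rename it onto `destination`."""
    destination = os.path.abspath(destination)
    tempname = f'{destination}.new'  # assumed temporary-name convention
    with open(tempname, 'w', encoding=encoding) as fh:
        yield fh  # the caller writes the file contents here
        fh.flush()
        if owner is not None:
            # owner is a (uid, gid) pair
            os.fchown(fh.fileno(), *owner)
        if perms is not None:
            os.fchmod(fh.fileno(), perms)
    # Reached only if the body raised no exception.
    os.rename(tempname, destination)
```

A call site would then read `with write_new(path, perms=0o600) as f:`
followed by `f.write(...)`, replacing the separate open, fchown, fchmod,
and rename calls.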
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Zac Dover [Mon, 5 Jun 2023 02:13:28 +0000 (12:13 +1000)]
doc/rados: edit pools.rst (2 of x)
Edit doc/operations/rados/pools.rst.
There remains confusion in this part of the document regarding pg_num
and pgp_num. pg_num and pgp_num are not explained with sufficient
clarity. A future commit will clear up this confusion. There is also
some potential confusion between the strings "pg-num" and "pgp-num" on
the one hand and "pg_num" and "pgp_num" on the other. The strings
with the hyphens are used in dummy commands, and the strings with the
underscores are used as key names. I think it possible that this could
confuse a reader, but I am open to discussion on the matter.
https://tracker.ceph.com/issues/58485
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Mark Nelson [Wed, 27 Apr 2022 15:06:22 +0000 (15:06 +0000)]
[CHERRY-PICKED FOR TESTING ONLY] crimson: Enable tcmalloc when using seastar
classic-osds have always caused significant memory fragmentation
when using the libc memory allocator due to the way that Ceph
tends to utilize memory. In recent testing, crimson-osd was found
to use 25-27GB of RAM with the stock 3GB bluestore cache settings
(osd_memory_target is only used when tcmalloc is available). Upon
further testing, it was found that the classic OSD is even worse,
using between 32 and 33GB of RAM after a 5-minute 4K sequential
write test when using libc malloc.
The good news is that it appears that crimson-osd is able to use
tcmalloc for alienstore without significant modification. Better
still, it drastically reduces memory usage. In the same test that
resulted in 25GB RSS memory usage for crimson-osd with libc malloc,
a tcmalloc linked version took around 9GB (with an 8GB
osd_memory_target). Since we do not yet (afaik) expose classic OSD
debugging in crimson it is tough to tell why we are still a little
over, but it's clear that for alienstore we are going to need to
use tcmalloc as we do in classic.
John Mulligan [Mon, 19 Sep 2022 17:46:48 +0000 (13:46 -0400)]
doc: update the cephadm download instructions
Starting with reef, cephadm is a compiled (zipapp) python application.
The cephadm script has been renamed and thus the old curl-based
download instructions will no longer work. While cephadm still has
no dependencies outside the Python stdlib, this will be changed in
future versions so it is no longer appropriate to just download the
source file of cephadm and run it either.
This change updates the `Install cephadm` section of the doc to explain
how to acquire a "compiled" version of cephadm as well as:
* moving and tweaking the note that the two installation methods are
distinct
* adding a new note linking to instructions on building cephadm
* moving the distribution-specific installations before the curl-based
installation to subtly hint that we prefer you to get it using
packages if you can
* noting cephadm's minimum required Python version and how to run it
with a particular python version.
Note from Zac Dover, June 1, 2023: This commit is a cherry-pick of
d11cf0e, which was introduced by John Mulligan in #48180. This is one of
three commits introduced in that PR, and this cherry-pick cleans up
omissions I (Zac Dover) inadvertently introduced while attempting to
rectify the merge conflicts in #51843. This should be the final
main-branch-targeting commit that cleans up #51843.
John Mulligan [Mon, 22 May 2023 18:20:19 +0000 (14:20 -0400)]
doc: add instructions for compiling cephadm
Now that cephadm is based on zipapp, add a short section to the
developer docs explaining how to build cephadm yourself.
Note: This commit is a cherry-pick of 9ad38033cc5c7f177cb8fe3bae696682687e0346, which was introduced by John
Mulligan in #48180. This is one of three commits introduced in that PR,
and this cherry-pick cleans up omissions I (Zac Dover) inadvertently
introduced while attempting to rectify the merge conflicts in #51843. I
expect that one more cherry-picked commit (specifically, d11cf0e82aab8d4cef9d423e5d463a373eaf383a, which cannot be merged easily
until d7921e88d69b4bc355da9c0327cc33e59e7d7abb has been merged into
main, for reasons that are too
Rick-and-Morty-there-should-never-be-more-than-one-dot to go into here)
will follow this one.
John Mulligan [Wed, 24 May 2023 17:42:26 +0000 (13:42 -0400)]
doc: make instructions to get an updated cephadm common
As discussed in person and over the ceph orch weekly, we want all users
to use a recent supported version of cephadm. Previously, the
instructions only had those downloading cephadm with curl using the
"add-repo" and "install" commands to get an up-to-date cephadm build.
According to ADK we've seen cases of users getting "old" distro packages
in the past. Change the instructions so that the "update cephadm" steps
are common after acquiring a "bootstrap copy" of cephadm.
Note: This commit is a cherry-pick of d7921e88d69b4bc355da9c0327cc33e59e7d7abb, which was introduced by John
Mulligan in https://github.com/ceph/ceph/pull/48180. This is one of
three commits introduced in that PR, and this cherry-pick cleans up
omissions I (Zac Dover) inadvertently introduced while attempting to
rectify the merge conflicts in https://github.com/ceph/ceph/pull/51843.
I expect that two more cherry-picked commits will follow this one.
Matan Breizman [Wed, 31 May 2023 11:06:16 +0000 (11:06 +0000)]
CMakeLists.txt: increase verbosity for selected allocator
Unless the allocator was set on the command line, we will select one based on the following order:
```
"specify memory allocator to use. currently tcmalloc, tcmalloc_minimal, \
jemalloc, and libc is supported. if not specified, will try to find tcmalloc, \
and then jemalloc. If neither of then is found. use the one in libc.")
```
With this change, cmake will explicitly report the allocator selected;
otherwise we have no way to identify the one that is being used.
John Mulligan [Mon, 19 Sep 2022 17:46:48 +0000 (13:46 -0400)]
doc: update the cephadm download instructions
Starting with reef, cephadm is a compiled (zipapp) python application.
The cephadm script has been renamed and thus the old curl-based
download instructions will no longer work. While cephadm still has
no dependencies outside the Python stdlib, this will be changed in
future versions so it is no longer appropriate to just download the
source file of cephadm and run it either.
This change updates the `Install cephadm` section of the doc to explain
how to acquire a "compiled" version of cephadm as well as:
* moving and tweaking the note that the two installation methods are
distinct
* adding a new note linking to instructions on building cephadm
* moving the distribution-specific installations before the curl-based
installation to subtly hint that we prefer you to get it using
packages if you can
* noting cephadm's minimum required Python version and how to run it
with a particular python version.
doc: make instructions to get an updated cephadm common
As discussed in person and over the ceph orch weekly, we want all users
to use a recent supported version of cephadm. Previously, the
instructions only had those downloading cephadm with curl using the
"add-repo" and "install" commands to get an up-to-date cephadm build.
According to ADK we've seen cases of users getting "old" distro packages
in the past. Change the instructions so that the "update cephadm" steps
are common after acquiring a "bootstrap copy" of cephadm.
John Mulligan [Wed, 24 May 2023 17:42:26 +0000 (13:42 -0400)]
doc: make instructions to get an updated cephadm common
As discussed in person and over the ceph orch weekly, we want all users
to use a recent supported version of cephadm. Previously, the
instructions only had those downloading cephadm with curl using the
"add-repo" and "install" commands to get an up-to-date cephadm build.
According to ADK we've seen cases of users getting "old" distro packages
in the past. Change the instructions so that the "update cephadm" steps
are common after acquiring a "bootstrap copy" of cephadm.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 19 Sep 2022 17:46:48 +0000 (13:46 -0400)]
doc: update the cephadm download instructions
Starting with reef, cephadm is a compiled (zipapp) python application.
The cephadm script has been renamed and thus the old curl-based
download instructions will no longer work. While cephadm still has
no dependencies outside the Python stdlib, this will be changed in
future versions so it is no longer appropriate to just download the
source file of cephadm and run it either.
This change updates the `Install cephadm` section of the doc to explain
how to acquire a "compiled" version of cephadm as well as:
* moving and tweaking the note that the two installation methods are
distinct
* adding a new note linking to instructions on building cephadm
* moving the distribution-specific installations before the curl-based
installation to subtly hint that we prefer you to get it using
packages if you can
* noting cephadm's minimum required Python version and how to run it
with a particular python version.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
mgr/dashboard: add 'omit_usage' query param to dashboard api 'get rbd' endpoint
Allows RBD info to be retrieved without getting associated usage info. This
can be useful for large RBDs where the process of gathering such usage info
is sometimes very slow.
Ilya Dryomov [Mon, 29 May 2023 15:40:05 +0000 (17:40 +0200)]
Revert "test: adjust rbd test case guards to handle new defaults"
This reverts commit feb2fc02404775bc262677a2d0434faec0348c53 which
appears to have caused us to lose old format coverage in the Python
bindings tests (rbd_python_api_tests_old_format.yaml).
An unset RBD_FEATURES environment variable means "old format". This
shouldn't be mucked with in any way, see require_new_format() and
create_image() methods in particular.
Ilya Dryomov [Sat, 27 May 2023 10:28:40 +0000 (12:28 +0200)]
osd/OSDCap: allow rbd.metadata_list method under rbd-read-only profile
This was missed in commit acc447d5de7b ("osd/OSDCap: rbd profile
permits use of rbd.metadata_list cls method") which adjusted only
"profile rbd" OSD cap. Listing image metadata is an essential part
of opening the image and "profile rbd-read-only" OSD cap must allow
it too.
While at it, constrain the existing grant for rbd profile from "any
object in the pool" to just "rbd_info object in the global namespace of
the pool" as this is where pool-level image metadata actually lives.
Nitzan Mordechai [Wed, 24 May 2023 12:40:35 +0000 (12:40 +0000)]
test: futex fail if more notification sent after destroy
When testing with more than one completion, we may hit an issue
with semaphores being notified after they have been destroyed.
We should wait for each completion and not destroy the semaphore
before all of them have been notified.
During multipart listing, the mtimes of the uploads were not being
loaded, resulting in the current time being returned. Fix this by
setting the correct mtime.
Fixes: https://tracker.ceph.com/issues/61251
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>