WriteLogCacheEntry gets appended to persist_log_entries before
write_data_pos is updated with the actual media offset. Because
push_back() makes a copy, the updated write_data_pos value never
makes it to media, making recovery impossible.
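The ordering problem described above can be illustrated with a small
Python sketch. It is not the librbd code: LogCacheEntry, persist_buggy
and persist_fixed are hypothetical names, and copy.copy() stands in for
the copy that push_back() makes in C++.

```python
import copy
from dataclasses import dataclass


@dataclass
class LogCacheEntry:
    # Hypothetical stand-in for WriteLogCacheEntry; only one field is modelled.
    write_data_pos: int = 0


def persist_buggy(entry, media_offset):
    persisted = []
    persisted.append(copy.copy(entry))   # the copy is taken first ...
    entry.write_data_pos = media_offset  # ... so this update misses the copy
    return persisted


def persist_fixed(entry, media_offset):
    persisted = []
    entry.write_data_pos = media_offset  # update before the copy is taken
    persisted.append(copy.copy(entry))
    return persisted


buggy = persist_buggy(LogCacheEntry(), 4096)
fixed = persist_fixed(LogCacheEntry(), 4096)
```

In the buggy ordering the persisted copy still carries the initial
offset; updating the field before appending is one way to avoid that.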
Ilya Dryomov [Thu, 13 May 2021 11:11:57 +0000 (13:11 +0200)]
librbd/cache/pwl/ssd: actually use first_{valid,free}_entry on recovery
first_valid_entry and first_free_entry pointers are read from media
but not actually used: both m_first_valid_entry and m_first_free_entry
get assigned 0 (or garbage). next_log_pos gets the same value as well
meaning that not only no recovery is attempted but the cache also gets
corrupted because DATA_RING_BUFFER_OFFSET is not applied.
Ilya Dryomov [Sat, 8 May 2021 08:24:37 +0000 (10:24 +0200)]
librbd/cache/pwl/ssd: don't count log entries
In ssd mode log entries are variable size. Attempting to count and
impose watermarks on the number of log entries is bogus because the
total number of entries it would take to fill the cache to capacity
is also variable and can't be precisely estimated.
had conflicts, but no new changes
Fixes: https://tracker.ceph.com/issues/50669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ea65553b4a9ee1349c6da8452d861afe579e99e9)
All parameters are integers and none of them are (in-)out, so don't
take them by reference. Additionally num_lanes, num_log_entries and
num_unpublished_reserves don't need to be 64-bit as their respective
fields in AbstractWriteLog are 32-bit.
Ilya Dryomov [Wed, 12 May 2021 10:19:07 +0000 (12:19 +0200)]
librbd/cache/pwl: rename m_log_pool_config_size to m_log_pool_size
trivial fix: no new changes: https://www.diffchecker.com/9btXJhCC
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 829ef952d2e408fe3676b38e7ecd26cbb04571a5)
librbd/cache/pwl/ssd/WriteLog: don't crash on split log entries
write_log_entries() will split a log entry at the end of the log; the
remainder is written to the beginning at DATA_RING_BUFFER_OFFSET. On
the read side aio_read_data_block() doesn't handle this case and just
crashes. Unless the workload in use is <= 4K, the image is rendered
unusable sooner or later.
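A read that handles a split entry has to stitch two pieces together.
The following Python sketch shows the general idea on an in-memory
buffer. read_wrapped and its parameters are hypothetical, ring_start
plays the role of DATA_RING_BUFFER_OFFSET, and this is not how
aio_read_data_block() is implemented.

```python
def read_wrapped(media: bytes, pos: int, length: int, ring_start: int) -> bytes:
    """Read `length` bytes at `pos`, continuing at `ring_start` on wrap."""
    end = len(media)
    if not ring_start <= pos < end:
        raise ValueError("pos is outside the ring buffer")
    if length > end - ring_start:
        raise ValueError("length exceeds the ring buffer size")
    first = min(length, end - pos)        # bytes available before the end
    data = media[pos:pos + first]
    if first < length:                    # entry was split: read the remainder
        data += media[ring_start:ring_start + (length - first)]
    return data


# 16-byte toy "device"; bytes 0..3 model a header, the ring starts at 4.
media = bytes(range(16))
whole = read_wrapped(media, 8, 4, ring_start=4)
split = read_wrapped(media, 14, 4, ring_start=4)
```

Here split is made of the last two bytes of the buffer followed by the
first two bytes of the ring area.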
librbd/cache/pwl: use m_bytes_allocated_cap for both rwl and ssd
Follow rwl mode and use AbstractWriteLog::m_bytes_allocated_cap
instead of m_log_pool_ring_buffer_size specific to ssd. This fixes
"bytes available" calculation in STATS output.
librbd/cache/pwl/ssd/WriteLog: decrement m_bytes_allocated when retiring
Currently if ssd cache is filled to capacity, all future I/O hangs
indefinitely because even though the cache eventually becomes clean
and retires enough entries to get back under RETIRE_HIGH_WATER, this
isn't communicated to AbstractWriteLog::check_allocation().
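The accounting issue can be sketched with a toy byte budget in Python.
ByteBudget, try_allocate and retire are hypothetical names; the real
check lives in AbstractWriteLog::check_allocation() and is more
involved.

```python
class ByteBudget:
    """Toy model of a cache that caps the number of allocated bytes."""

    def __init__(self, cap: int):
        self.cap = cap
        self.allocated = 0

    def try_allocate(self, nbytes: int) -> bool:
        # Refuse the request if it would push usage over the cap.
        if self.allocated + nbytes > self.cap:
            return False
        self.allocated += nbytes
        return True

    def retire(self, nbytes: int) -> None:
        # Without this decrement a full cache would refuse I/O forever.
        self.allocated -= nbytes


budget = ByteBudget(cap=100)
filled = budget.try_allocate(100)         # cache is now full
blocked = budget.try_allocate(1)          # refused while full
budget.retire(60)                         # retiring frees 60 bytes
resumed = budget.try_allocate(50)         # accepted again
```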
Kefu Chai [Sat, 30 Oct 2021 03:18:17 +0000 (11:18 +0800)]
admin/doc-requirements.txt: pin Sphinx at 3.5.4
* pin Sphinx at 3.5.4
* pin docutils at 0.18
at least the combination of these two versions
is known to compile.
to address the bug reported at
https://sourceforge.net/p/docutils/bugs/431/
the backtrace looks like:
/home/jenkins-build/build/workspace/ceph-pr-docs/build-doc/virtualenv/lib/python3.8/site-packages/sphinx/util/docutils.py:285:
RemovedInSphinx30Warning: function based directive support is now
deprecated. Use class based directive instead.
warnings.warn('function based directive support is now deprecated. '
Exception occurred:
File
"/home/jenkins-build/build/workspace/ceph-pr-docs/build-doc/virtualenv/lib/python3.8/site-packages/docutils/writers/html5_polyglot/__init__.py",
line 445, in section_title_tags
if (ids and self.settings.section_self_link
AttributeError: 'Values' object has no attribute 'section_self_link'
Nathan Cutler [Wed, 20 Oct 2021 10:51:02 +0000 (12:51 +0200)]
rgw/tracing: unify SO version numbers within librgw2 package
The librgw2 package contains several SO files. Two of those - librgw_op_tp.so
and librgw_rados_tp.so - had a different version number than the main librgw.
This was a violation of the openSUSE Shared Library Packaging Policy [1] but it
also seems like a "violation" of common sense.
* APIVersion:
* Moved to a separate file
* Added doctests
* Added sentinel values:
* DEFAULT = 1.0
* EXPERIMENTAL = 0.1
* NONE = 0.0
* Added to_mime_type() helper method
* Controllers.__init__:
* Added type hints
* Replaced string versions with APIVersions
* Feedback controller:
* Replaced with EXPERIMENTAL (probably it should be NONE)
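A minimal sketch of what such a version type could look like, in
Python. The sentinel values come from the list above; the
(major, minor) tuple layout and the exact MIME type string are
assumptions made for illustration, not a copy of the dashboard code.

```python
from typing import NamedTuple


class APIVersion(NamedTuple):
    """Illustrative (major, minor) API version; field layout is assumed."""
    major: int
    minor: int

    def to_mime_type(self, subtype: str = "json") -> str:
        # Assumed format for a versioned media type.
        return f"application/vnd.ceph.api.v{self.major}.{self.minor}+{subtype}"


# Sentinel values named in the commit message.
DEFAULT = APIVersion(1, 0)
EXPERIMENTAL = APIVersion(0, 1)
NONE = APIVersion(0, 0)

mime = DEFAULT.to_mime_type()
```

Because a NamedTuple compares like a tuple, the sentinels order
naturally: NONE < EXPERIMENTAL < DEFAULT.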
Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
Conflicts:
src/pybind/mgr/dashboard/controllers/__init__.py
- Remove the current changes and keep the incoming new changes
src/pybind/mgr/dashboard/controllers/crush_rule.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/controllers/docs.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/controllers/feedback.py
- Deleted the file since feedback module isn't backported to pacific
src/pybind/mgr/dashboard/controllers/host.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/openapi.yaml
- Generated a new openapi yaml file
src/pybind/mgr/dashboard/tests/__init__.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_docs.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_host.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_tools.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_versioning.py
- Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/controllers/crush_rule.py
- Removed the MethodMap decorator which updates the version of the
endpoint to 2.0 because the changes that caused that version
update were not backported to pacific
Patrick Donnelly [Tue, 14 Sep 2021 17:02:12 +0000 (13:02 -0400)]
test/libcephfs: put inodes after lookup
Otherwise, the client umount will hang due to inability to trim the
inodes looked up using the low-level interface. This results in slow-op
warnings and an eviction:
2021-09-11T17:23:31.097+0000 7f99c3522700 0 log_channel(cluster) log [WRN] : evicting unresponsive client smithi176 (9756), after 303.924 seconds
2021-09-11T17:23:31.097+0000 7f99c3522700 10 mds.0.server autoclosing stale session client.9756 172.21.15.176:0/3891214934 last renewed caps 303.924s ago
mgr/dashboard: make modified API endpoints backward compatible
Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Introducing APIVersion class to handle versioning for API endpoints and making
them backward compatible.
The test is failing on deleting a host because the agent daemon is
present on that host. It's not possible to simply delete a host. We need
to drain it first and then delete it.
where the numbers of scrubbed objects, clones, dirty and omap are always
less than the corresponding totals, if the PG contains
object(s) whose hash happens to be 0xffffffff.
in this change, if the calculated hash of the upper bound is greater
than the maximum possible number represented by uint32_t, in addition to
setting the hash of the upper bound hobj to 0xffffffff, we also set the
nspace of hobj of the upper bound to "\xff", so that the upper bound
is greater than an hobj whose hash happens to be 0xffffffff. please note,
the nspace of "\xff" is not an ascii string, so it's not likely to be
less than a real-world nspace of an hobj.
with this new *greater* upper bound, we are able to include the previously
missing hobj when listing the objects in a PG. so the scrub won't be
annoyed when the number of objects does not match.
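The effect of the larger upper bound can be shown with a toy model in
Python. It assumes objects sort by (hash, nspace, name) and that a
listing covers the half-open range [lower, upper). The real hobject_t
ordering has more fields, and upper_bound and list_range are
hypothetical helpers.

```python
MAX_HASH = 0xFFFFFFFF


def upper_bound(next_hash: int, patched: bool):
    """Clamp an out-of-range hash; optionally also raise the nspace."""
    if next_hash > MAX_HASH:
        return (MAX_HASH, "\xff" if patched else "", "")
    return (next_hash, "", "")


def list_range(objects, lower, upper):
    # Half-open range [lower, upper) over (hash, nspace, name) tuples.
    return [o for o in sorted(objects) if lower <= o < upper]


objects = [
    (0x10, "", "a"),
    (MAX_HASH, "", "b"),      # hash happens to be 0xffffffff
    (MAX_HASH, "ns", "c"),    # same hash, ordinary ASCII nspace
]
lower = (0, "", "")
old = list_range(objects, lower, upper_bound(MAX_HASH + 1, patched=False))
new = list_range(objects, lower, upper_bound(MAX_HASH + 1, patched=True))
```

In this model the old bound drops both objects whose hash is
0xffffffff, while the bound with nspace "\xff" includes them.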
Mykola Golub [Mon, 30 Aug 2021 06:58:04 +0000 (07:58 +0100)]
osd: re-cache peer_bytes on every peering state activate
peer_bytes is used for backfill reservation request and may be
reset if backfill is interrupted, and we want it set back before
continuing backfill and re-sending the reservation request.
Kamoltat [Mon, 5 Oct 2020 09:38:35 +0000 (09:38 +0000)]
mgr/progress: optimize global recovery module
Instead of fetching `pg_stats` from the python
part of the manager module, we filter out the pgs
that are in active + clean state in ActivePyModules.cc,
then pass these pgs along with `reported_epoch` and
the `total_num_pgs` of the cluster to the global recovery
module.
mgr/test_progress.py: Delay recover in test_progress
Change some of the tests in teuthology to make
them more deterministic.
Using:
`ceph osd set norecover` and
`ceph osd set nobackfill` when marking osds in
or out. This will delay the recovery and make
sure the test cases get the chance to check
that there are actually events popping up in
the progress module.
took out test_osd_cannot_recover from
tasks/mgr/test_progress.py because it is no longer
a relevant test case: recovery will get
triggered regardless of whether the pg is unmoved.
Ignoring `OSDMAP_FLAGS` in teuthology
because we are using norecover and nobackfill
to delay the recovery process. Setting these flags
creates a health warning that would otherwise fail the
teuthology test.
pybind/mgr/progress: introduce 5 second sleep interval
The current progress module checks pg stats
and osdmap only when it is notified by the cluster.
However, this is expensive in large clusters
with many pools and osds, so we
change it to check both pg stats and osdmap
only every 5 seconds.
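One generic way to limit a check to a fixed interval is a small gate
object, sketched below in Python. IntervalGate is a hypothetical
helper with an injectable clock and does not claim to match how the
progress module implements its sleep interval.

```python
import time


class IntervalGate:
    """Let work through at most once per `interval` seconds."""

    def __init__(self, interval: float = 5.0, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._last = None                 # time of the last accepted call

    def ready(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False


# Drive the gate with a fake clock so the example does not sleep.
fake_now = [0.0]
gate = IntervalGate(5.0, clock=lambda: fake_now[0])
results = []
for t in (0.0, 3.0, 5.0, 9.0, 10.0):
    fake_now[0] = t
    results.append(gate.ready())
```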
in the function _osd_in_out() we now calculate
`is_relocated` by: old_osds != new_osds such that
it does not matter if the difference between osds
is positive or negative.
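A small Python sketch of the comparison, assuming old_osds and
new_osds can be treated as sets of OSD ids (the container type is an
assumption; only the inequality itself is taken from the text above):

```python
def is_relocated(old_osds, new_osds) -> bool:
    # Any change in membership counts, whether OSDs were added or removed.
    return set(old_osds) != set(new_osds)


grew = is_relocated({1, 2}, {1, 2, 3})      # an OSD was added
shrank = is_relocated({1, 2, 3}, {1, 2})    # an OSD was removed
same = is_relocated({1, 2}, {2, 1})         # same members
```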
Cherry-pick notes:
- src/test/rgw/bucket_notification/test_bn.py changes manually applied to src/test/rgw/rgw_multi/tests_ps.py for Pacific
- conflicts in rgw_op.cc due to rename of RGWObject to Object after Pacific