The calls to remove a bucket had parameters to specify a prefix and
delimiter, which does not make sense; buckets are removed irrespective
of prefix and delimiter. Those parameters were a carry-over from
existing Swift protocol logic, so the functions and calls are adjusted
to remove them. Additionally, those same parameters were removed for
aborting incomplete multipart uploads.
Additionally, a bug is fixed in which, during bucket removal, multipart
uploads were removed only when the prefix was non-empty.
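A minimal sketch of the shape of the change, with illustrative names
only (the actual rgw_sal interfaces differ):

    #include <string>

    // Illustrative declarations; not the actual rgw_sal signatures.
    // Before: prefix/delimiter were threaded through even though bucket
    // removal ignores them.
    int remove_bucket_old(const std::string& bucket,
                          const std::string& prefix,
                          const std::string& delimiter,
                          bool delete_children);

    // After: the meaningless parameters are dropped, here and in the
    // path that aborts incomplete multipart uploads. The abort must
    // also run unconditionally, not only when a prefix is non-empty.
    int remove_bucket(const std::string& bucket, bool delete_children);
    int abort_bucket_multiparts(const std::string& bucket);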
Conflicts:
src/rgw/rgw_sal_rados.cc
src/rgw/rgw_sal.h
src/rgw/rgw_sal_rados.h
- Alterations due to Zipper 7 code refactoring
src/rgw/rgw_sal_dbstore.cc
src/rgw/rgw_sal_dbstore.h
- Did not exist before Zipper 7 code refactoring
ceph-volume: fix bug with miscalculation of required db/wal slot size for VGs with multiple PVs
Previous logic for calculating db/wal slot sizes made the assumption that there would only be
a single PV backing each db/wal VG. This wasn't the case for OSDs deployed prior to v15.2.8,
since ceph-volume previously deployed multiple SSDs in the same VG. This fix removes the
assumption and does the correct calculation in either case.
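ceph-volume itself is Python; as a language-neutral illustration, here
is a minimal C++ sketch of the per-VG arithmetic (all names are
hypothetical):

    #include <cstdint>

    // Hypothetical model of the fix: derive the per-slot size from the
    // VG's aggregate free space, however many PVs back the VG.
    struct VolumeGroup {
      uint64_t free_extents;   // summed across *all* backing PVs
      uint64_t extent_size;    // bytes per extent
    };

    uint64_t slot_size_bytes(const VolumeGroup& vg, uint64_t slots) {
      // The old logic assumed one PV per VG, so per-PV math happened
      // to match per-VG math; with several PVs in one VG it miscounted.
      return vg.free_extents / slots * vg.extent_size;
    }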
Sage Weil [Fri, 5 Nov 2021 15:39:07 +0000 (11:39 -0400)]
mgr/cephadm: allow osd spec removal
OSD specs/drivegroups are essentially templates for OSD creation but do
not map to the full lifecycle of the OSDs that they create. When a user
removes a spec, remove it immediately rather than tying its removal to
the OSDs it created.
If --force is not provided, the command fails with an error listing the
OSDs that would be left behind. If --force is passed, the service is
removed.
This leaves behind a few oddities:
- When you list services, OSDs that were created by the drivegroup may
still exist, causing the drivegroup to appear in the list as
unmanaged services.
- If you create a new drivegroup with the same name, the prior OSDs will
appear to belong to the new spec instance, regardless of whether the
spec/drivegroup parameters are the same.
AndrewSharapov [Fri, 29 Oct 2021 15:10:20 +0000 (18:10 +0300)]
mgr/cephadm: Fix duplication of the IP address list for the public network interface.
Every call of find_ip_on_host() actually duplicates the list of public
IP addresses in self.networks, while it should NOT change it. As a
result, the value of the key mgr/cephadm/host.<hostname> in the kv
store becomes very large and may crash the ceph-mgr.
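The cephadm code is Python, but the underlying aliasing pattern is
language-neutral; a minimal C++ sketch of it (hypothetical names and
addresses):

    #include <map>
    #include <string>
    #include <vector>

    std::map<std::string, std::vector<std::string>> networks;  // shared state

    // Buggy shape: grabbing a reference into the shared map and
    // appending to it, so every lookup permanently grows the stored list.
    void find_ip_on_host_buggy(const std::string& host) {
      std::vector<std::string>& ips = networks[host];  // reference, not copy
      ips.push_back("10.0.0.1");  // hypothetical candidate leaks into `networks`
    }

    // Fixed shape: work on a local copy; the shared list stays unchanged.
    void find_ip_on_host(const std::string& host) {
      std::vector<std::string> ips = networks[host];   // copy
      ips.push_back("10.0.0.1");                       // local only
    }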
fix tox test: AttributeError: 'HostCache' object has no
attribute 'update_host_networks', which was introduced in 78983ad0d0cce422da32dc4876ac186f6d32c3f5 (not yet in pacific)
Venky Shankar [Tue, 10 Aug 2021 07:04:51 +0000 (03:04 -0400)]
tasks/cephfs_mirror: optionally run in foreground
The cephfs mirror daemon thrasher needs to send SIGTERM to mirror
daemons. A mirror daemon must run in the foreground for it to
receive the signal via `daemon.signal`.
Yin Congmin [Fri, 12 Nov 2021 08:54:31 +0000 (16:54 +0800)]
qa/suites/rbd/persistent-writeback-cache: add test case
Add a test case whose cache size is 8GiB, so that problems that occur
only in scenarios above 4GiB may be found by this test. For example,
a 32-bit variable may end up with an unexpected value when it is used
in arithmetic with a 64-bit value.
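The class of bug such a test can expose, as a standalone C++
illustration:

    #include <cstdint>
    #include <iostream>

    int main() {
      const uint64_t cache_size = 8ULL << 30;  // 8 GiB, deliberately > 4 GiB
      uint32_t off32 = static_cast<uint32_t>(cache_size);  // truncates to 0
      uint64_t off64 = cache_size;                         // correct

      // Prints "0 vs 8589934592": the 32-bit variable silently lost the
      // high bits, which only shows up once sizes exceed 4 GiB.
      std::cout << off32 << " vs " << off64 << '\n';
    }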
Jianpeng Ma [Mon, 8 Nov 2021 06:33:28 +0000 (14:33 +0800)]
librbd/cache/pwl: fix reorder issue between calls to process_writeback_dirty_entries
We must keep ops ordered not only within a single invocation of
process_writeback_dirty_entries(), but also across separate invocations
of it.
mds/FSMap: assign v16.2.4 compat to pre-v16.2.5 standby daemons
With v16.2.5, the monitors store an MDS's CompatSet with its mds_info_t
in the MDSMap. If an older MDS fails and rejoins the cluster, it gets
assigned the empty CompatSet. This is problematic during upgrades as an
MDS failure may prevent the upgrade process from continuing and cause
file system unavailability.
This patch makes the mons assign a reasonable default: the CompatSet
used from v14.2.0 until v16.2.5.
Fixes: https://tracker.ceph.com/issues/53150
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 74e3f5ec5a49ce99b56c305624e9110fcb4b787c)
cmake/modules/Findpmem: always set pmem_VERSION_STRING
before this change, `pmem_VERSION_STRING` was not set if pmem could not
fulfill the specified version requirement. the intention was to check
whether the found version satisfies the requirement, but passing an
empty `pmem_VERSION_STRING` to `find_package_handle_standard_args()` as
its `VERSION_VAR` option does not fail this check. on the contrary, it
prints
-- Found pmem: pmem_pmemobj_INCLUDE_DIR;pmem_pmem_INCLUDE_DIR (Required
is at least version "1.17")
if we require pmem 1.17 while the found version is, for instance, 1.10.
if the required version is 1.7, and the found version is 1.10, the
output from cmake is:
-- Found pmem: pmem_pmemobj_INCLUDE_DIR;pmem_pmem_INCLUDE_DIR (found
suitable version "1.10", minimum required is "1.7")
in this change, the version spec is not specified when calling
`pkg_check_modules()`, so `PKG_${component}_VERSION` is always set and
we can always delegate the version checking to
`find_package_handle_standard_args()`. please note, we use the lower of
the versions returned by pkg-config if multiple components are required
and both pkg-config checks return a version.
ceph.spec.in: do not build with system pmdk by default
we need to use libpmem 1.10 in #40493. without enabling the module
stream offering libpmem 1.9.2, we only have access to libpmem 1.6.1,
and fedora 33 only has libpmem 1.9 packaged. the same applies to
openSUSE Tumbleweed and openSUSE Leap. so let's stop using the libpmem
packaged by the distro by default, until these distros include libpmem
1.10.
Igor Fedotov [Thu, 15 Jul 2021 12:10:14 +0000 (15:10 +0300)]
os/bluestore: fix improper offset calculation when repairing.
While repairing misreferenced blobs, BlueStore could improperly
calculate an offset within the blob being fixed. This could happen when
a single physical extent had been replaced by multiple ones: the
following pextent (if any) in the current blob would be treated with an
improper offset within the blob, because the offset calculation
accounted for only the last of the new pextents instead of all of them.
Fixes: https://tracker.ceph.com/issues/51682
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit ca4b6675fc3fd2f4cadad58044c97c5bb23d5938)
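A minimal sketch of the corrected accounting in the repair path
described above, under assumed names (the real BlueStore repair code
differs):

    #include <cstdint>
    #include <vector>

    struct PExtent { uint64_t offset; uint64_t length; };

    // When one physical extent is replaced by several, the intra-blob
    // offset for the next pextent must advance by the total length of
    // all replacements; the bug effectively counted only the last one.
    uint64_t next_blob_offset(uint64_t blob_off,
                              const std::vector<PExtent>& replacements) {
      for (const auto& pe : replacements) {
        blob_off += pe.length;  // accumulate every replacement, not just the last
      }
      return blob_off;
    }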
The marker was not being handled correctly as segments of the bucket
index were listed in order to shut down any incomplete multipart
uploads. This fixes the marker, so it is maintained properly across
iterations.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
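A sketch of the pagination pattern the fix restores, with hypothetical
types (the actual rgw listing API differs):

    #include <string>
    #include <vector>

    struct Listing {
      std::vector<std::string> uploads;
      std::string next_marker;  // where the next listing should resume
      bool truncated = false;
    };

    // Hypothetical listing call; stubbed so the sketch is self-contained.
    Listing list_multiparts(const std::string& bucket,
                            const std::string& marker) {
      (void)bucket; (void)marker;
      return {};
    }

    // The fixed loop carries the marker forward on every iteration;
    // dropping it would re-list the same bucket-index segment forever.
    void abort_incomplete_multiparts(const std::string& bucket) {
      std::string marker;
      for (;;) {
        Listing l = list_multiparts(bucket, marker);
        // ... abort each upload in l.uploads ...
        if (!l.truncated) break;
        marker = l.next_marker;  // the fix: maintain the marker across iterations
      }
    }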
Venky Shankar [Fri, 1 Oct 2021 08:55:40 +0000 (04:55 -0400)]
mds: skip journaling blocklisted clients when in `replay` state
When a standby MDS is transitioning to active, it passes through
`replay` state. When the MDS is in this state, there are no journal
segments available for recording journal updates. If the MDS receives
an OSDMap update in this state, journaling blocklisted clients causes
a crash since no journal segments are available. This is a bit hard
to reproduce as it requires correct timing of an OSDMap update along
with various other factors.
Note that, when the MDS reaches `reconnect` state, it will journal
the blocklisted clients anyway.
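A minimal sketch of the guard this implies, with hypothetical names:

    #include <cstdint>
    #include <set>

    enum class MDSStateRank { REPLAY, RECONNECT, ACTIVE };

    // In `replay` there are no journal segments to write to, so
    // journaling blocklisted clients would crash; `reconnect` journals
    // them anyway, so skipping here is safe.
    void maybe_journal_blocklisted(MDSStateRank state,
                                   const std::set<uint64_t>& blocklisted) {
      if (state == MDSStateRank::REPLAY) {
        return;  // no journal segments yet; defer to reconnect
      }
      (void)blocklisted;  // ... record the blocklisted clients in the journal ...
    }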
This partially fixes tracker: https://tracker.ceph.com/issues/51589
which mentions a similar crash but in `reconnect` state. However,
that crash was seen in nautilus.
A couple of minor changes include removing hardcoded function names
and carving out reusable parts into a separate function.
Adam C. Emerson [Tue, 2 Nov 2021 16:46:15 +0000 (12:46 -0400)]
rgw: Ensure buckets too old to decode a layout have layout logs
When decoding `RGWBucketInfo` data from before Pacific, we won't call
`rgw::BucketLayout::decode`, but will instead synthesize the layout
information. This leaves the `rgw::BucketLayout::logs` empty, as the
fallback to populate it only applies to old versions of
`rgw::BucketLayout`.
Add a check at the end of `RGWBucketInfo::decode` to populate it if
empty.
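The shape of the fallback, as a hedged sketch (type and member names
are assumptions):

    #include <vector>

    struct BucketLogLayout { /* ... */ };
    struct BucketLayout {
      std::vector<BucketLogLayout> logs;
    };

    // At the end of decode: pre-Pacific encodings synthesize the layout
    // with no log entries, so make sure at least one exists.
    void ensure_layout_logs(BucketLayout& layout) {
      if (layout.logs.empty()) {
        layout.logs.emplace_back();  // synthesize an initial layout log
      }
    }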
Fixes: https://tracker.ceph.com/issues/53132
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3279509127e65314c07963a3e127e926308bd76a)
Fixes: https://tracker.ceph.com/issues/53160
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
librbd/cache/pwl: cancel advance dispatch of external flush request
For an external flush request, a new syncpoint is created after the
request passes the guarded-request stage and before it is dispatched,
and dispatch then bypasses the deferred queue. But the last write
request may still be sitting in the deferred queue: it has not been
dispatched and is not associated with any syncpoint. The external flush
request would thus overtake that earlier write request, which does not
conform to the semantics of external flush requests. External flush
requests must strictly follow the order of dispatch.
An internal flush request, by contrast, is dispatched only after all
write requests associated with the previous syncpoint have persisted in
the cache; a C_Gather guarantees this.
It is therefore necessary to distinguish between external and internal
flush requests. An internal flush can and should be dispatched in
advance, bypassing the deferred queue, while the order of external
requests must be kept unchanged. So cancel the advance dispatch of
external flush requests.
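A rough sketch of the resulting dispatch rule, with assumed types:

    #include <deque>
    #include <functional>
    #include <utility>

    struct Request {
      bool internal_flush = false;  // internal flushes are ordered via C_Gather
      std::function<void()> dispatch;
    };

    std::deque<Request> deferred_queue;

    // Only internal flushes may bypass the deferred queue; an external
    // flush must queue behind any writes still waiting there so that
    // dispatch order is preserved.
    void handle_flush(Request req) {
      if (req.internal_flush || deferred_queue.empty()) {
        req.dispatch();  // safe to dispatch in advance
      } else {
        deferred_queue.push_back(std::move(req));  // keep external ordering
      }
    }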
librbd/cache/pwl: fix assert in _aio_stop() during shutdown
For wait_for_ops(next_ctx), next_ctx may run in the aio thread, and
then the code that follows runs on the aio thread as well.
remove_pool_file() calls bdev->close(), which calls _aio_stop(), which
executes aio_thread.join() and hits the assert: a thread cannot join
itself. Fix it by adding the close context to m_work_queue, so close()
runs in a work-queue thread.
At the same time, correct the order of wait_for_ops():
flush_dirty_entries(next_ctx) may call wake_up() and start_op(), so
moving wait_for_ops() behind flush_dirty_entries(next_ctx) is more
appropriate.
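The self-join hazard and the work-queue handoff, reduced to a
standalone C++ sketch (names are illustrative):

    #include <functional>
    #include <thread>
    #include <vector>

    struct WorkQueue {
      void queue(std::function<void()> fn) { items.push_back(std::move(fn)); }
      void drain() { for (auto& f : items) f(); items.clear(); }
      std::vector<std::function<void()>> items;
    };

    int main() {
      WorkQueue wq;
      std::thread aio_thread([&wq] {
        // A completion context firing on the aio thread must not call
        // close() directly: close() would join aio_thread from itself.
        wq.queue([] { /* bdev->close() runs here, off the aio thread */ });
      });
      aio_thread.join();  // joined from another thread, never from itself
      wq.drain();         // the queued close() executes safely
    }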
Yin Congmin [Fri, 27 Aug 2021 15:41:49 +0000 (15:41 +0000)]
librbd/cache/pwl/ssd: move finish_op() to the end of callback function
finish_op() in the ssd cache is not at the end of the callback in
append_op_log_entries(), and the operations after finish_op() still
need to take m_lock. So, during shutdown, wait_for_ops() decides that
all ops are over and that no thread will acquire m_lock again. The
subsequent shutdown steps then take m_lock, and _aio_stop() inside
bdev->close() waits for all aio_write() and aio_submit() calls to end
while m_lock is held, but the callback of aio_write() is itself
waiting for m_lock, causing a deadlock. Move finish_op() to the end of
the callback to fix the deadlock.
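In sketch form, with hypothetical names: the op must stay "in flight"
until its callback is done with the lock.

    #include <mutex>

    std::mutex m_lock;
    void finish_op() {}  // hypothetical: marks one in-flight op complete

    // Callback for an appended log entry: everything that needs m_lock
    // happens first; only then is the op marked finished, so shutdown
    // cannot conclude "all ops done" while the lock is still needed.
    void on_append_complete() {
      {
        std::lock_guard<std::mutex> l(m_lock);
        // ... update write-log state under the lock ...
      }
      finish_op();  // moved to the very end of the callback
    }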
Jianpeng Ma [Tue, 7 Sep 2021 02:19:55 +0000 (10:19 +0800)]
librbd/cache/pwl/ssd: Remove unused parameter.
Fix the following compiler warning:
>[38/80] Building CXX object
src/librbd/CMakeFiles/librbd_plugin_pwl_cache.dir/cache/pwl/ssd/WriteLog.cc.o
>../src/librbd/cache/pwl/ssd/WriteLog.cc:37:25: warning: unused variable
'ops_appended_together' [-Wunused-const-variable]
>const unsigned long int ops_appended_together = MAX_WRITES_PER_SYNC_POINT;
Jianpeng Ma [Tue, 7 Sep 2021 02:00:53 +0000 (10:00 +0800)]
librbd/cache/pwl/ssd: Fix a race between get_cache_bl() and remove_cache_bl()
Although get_cache_bl() takes a lock, the lock cannot protect the
underlying `list& operator= (const list& other)`, which shares the
buffers rather than deep-copying them, so it can still race with
remove_cache_bl(). So we should use copy_cache_bl() instead.
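The race, reduced to a standalone sketch (a shared_ptr stands in for
the bufferlist's shared buffers; names mirror the commit but the
semantics here are illustrative):

    #include <memory>
    #include <mutex>
    #include <vector>

    // A shared_ptr stands in for bufferlist's internally shared buffers.
    using Buffer = std::shared_ptr<std::vector<char>>;

    struct Cache {
      std::mutex lock;
      Buffer bl = std::make_shared<std::vector<char>>();

      Buffer get_cache_bl_shallow() {      // buggy shape: shares state
        std::lock_guard<std::mutex> l(lock);
        return bl;                         // caller still races with remove
      }
      Buffer copy_cache_bl() {             // fixed shape: private deep copy
        std::lock_guard<std::mutex> l(lock);
        return std::make_shared<std::vector<char>>(*bl);
      }
      void remove_cache_bl() {
        std::lock_guard<std::mutex> l(lock);
        bl->clear();                       // mutates what shallow copies share
      }
    };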
Fixes: https://tracker.ceph.com/issues/52400
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit fe72b3953735329441397f257d5dd18f6819187d)