luo rixin [Tue, 29 Dec 2020 06:39:21 +0000 (14:39 +0800)]
rgw/rgw_file: Fix the return value of read() and readlink()
Fixes: https://tracker.ceph.com/issues/49189
Signed-off-by: Dai zhiwei <daizhiwei3@huawei.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
(cherry picked from commit bfd83e8fa142873a0bdf09a4d1ad1b04127f5885)
Since inject_facts_as_vars is set to false in the ansible.cfg file, we
have to update the references to use ansible_facts[<thing>] instead of
ansible_<thing>.
We already install the dependency from the ceph-ansible
requirements.txt, and to avoid false positives (like after rebooting a
node) we can retry a failing test.
Without loading the ansible.cfg file from the ceph-ansible project, we
don't have pipelining enabled, which can provide a significant
performance improvement.
This removes the ANSIBLE_ACTION_PLUGINS, ANSIBLE_RETRY_FILES_ENABLED
and ANSIBLE_SSH_RETRIES environment variables, as they are already
covered by the ansible.cfg file.
ceph-volume/tests: update ansible ssh_args env var
The ansible ssh_args parameter is usually defined in the ansible.cfg
file. Currently this variable is overridden in tox to manage the
vagrant ssh file, but we lose all the default values.
rpm: drop use of $FIRST_ARG in ceph-immutable-object-cache
The use of $FIRST_ARG was probably required because the SUSE-specific
%service_* rpm macros were playing tricks on the shell positional
parameters. This is bad practice and error-prone, so let's assume that
no macro does that anymore, and hence that it's safe to assume the
positional parameters remain unchanged after any rpm macro call.
Kefu Chai [Sat, 1 May 2021 15:30:18 +0000 (23:30 +0800)]
common/pick_address: define in_addr_t if it is not defined
mingw does not have in_addr_t defined, see
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/ns-winsock2-in_addr,
so define it if it is not defined.
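A minimal sketch of the guard, assuming the usual POSIX definition of
in_addr_t as a 32-bit unsigned integer (the feature-test macro here is
hypothetical; the actual check may be done differently):

    #include <cstdint>

    // Sketch: mingw's winsock2.h provides struct in_addr but no
    // in_addr_t typedef, so supply the POSIX definition when missing.
    #ifndef HAVE_IN_ADDR_T          // hypothetical feature-test macro
    typedef std::uint32_t in_addr_t;
    #endif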
Conflicts:
src/common/options/global.yaml.in: global.yaml.in was introduced
in master only, so in this change src/common/options.cc is updated
instead.
Because fmt is packaged in EPEL while librados is packaged in RHEL, we
cannot have fmt as a runtime dependency of librados.
To address this issue, we should compile librados either with the
static library or with the header-only library of fmt. But the Fedora
packaging guidelines do not encourage packaging static libraries, and
it would be complicated to package both static and dynamic libraries
for fmt, so the simpler solution is to compile Ceph with the
header-only version of fmt.
In this change, we compile Ceph with the header-only version of fmt on
RHEL to address the runtime dependency issue.
* An interface library named "fmt-header-only" is introduced; it brings
in support for the header-only fmt library.
* fmt::fmt is renamed to fmt.
* An option named "WITH_FMT_HEADER_ONLY" is introduced.
* fmt::fmt is an alias of "fmt-header-only" if "WITH_FMT_HEADER_ONLY"
is "ON", and an alias of "fmt" otherwise.
Because fmt is packaged in EPEL while librados is packaged in RHEL, we
cannot have fmt as a runtime dependency of librados.
To address this issue, an option "WITH_FMT_HEADER_ONLY" is introduced
so that we can enable it when building Ceph with the header-only
version of fmt; the built packages then won't have a runtime dependency
on fmt.
cephfs-mirror: record directory path cancel in DirRegistry
When removing a directory path from mirroring, cephfs-mirror records
this state in thread-local storage. The replayer thread backs off in
the midst of mirroring the directory snapshots for this directory path.
However, the canceled state is never cleared, causing the thread to
incorrectly assume that other directory paths (which are picked up by
this thread) need backing off, and hence to mark these directory paths
as failed (to synchronize snapshots).
The fix is to store this state in the directory-specific store, which
is allocated when a thread picks up a directory path for
synchronization.
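A hypothetical sketch of the idea (names are illustrative, not the
actual cephfs-mirror types): the cancel flag lives in the per-directory
registry entry, whose lifetime matches one synchronization run, rather
than in thread-local storage that outlives it:

    #include <atomic>
    #include <string>

    // Illustrative only: per-directory state allocated when a replayer
    // thread picks up a path and destroyed when the run finishes, so a
    // cancellation cannot leak into the next path the thread handles.
    struct DirRegistry {
      std::string dir_path;
      std::atomic<bool> canceled{false};
    };
    // Instead of: thread_local bool canceled;  // leaks across paths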
cephfs-mirror: complete context when a mirror instance is not failed or blocklisted
Without this, the updater thread can start processing other queued
contexts when a mirror instance is failed or blocklisted, resulting
in unexpected behavior.
qa/standalone: fixing the timings when waiting for deep-scrub to start
initiate_and_fetch_state() initiates a scrub, then polls the published
PG state looking for 'scrubbing'. Calling flush_pg_stats() as part of
the polling process might cause the scrub and the following recovery to
be missed altogether.
Note: this polling mechanism is definitely not robust. It will be
redesigned in the future.
Ronen Friedman [Sat, 15 May 2021 19:14:38 +0000 (22:14 +0300)]
test: recovery_scrub: do not display 'repair' status on auto-repair deep-scrub
A new test: auto_repair_bluestore_tag.
Based on auto_repair_bluestore_basic. Sets auto-repair, starts a periodic
deep-scrub, then verifies that the PG state, while scrubbing, is 'scrubbing+deep'
and not 'scrubbing+deep+repair'.
Ronen Friedman [Mon, 10 May 2021 13:15:16 +0000 (16:15 +0300)]
osd/scrub: separate PG state flags from internal scrubber operation
Modify the scrubber to rely on internal flags for 'should we repair' and
'is this a deep scrub', instead of using PG_STATE_REPAIR & PG_STATE_DEEP_SCRUB.
This enables us to implement the 'fix-as-you-go deepscrub' functionality
of 'osd_scrub_auto_repair', without displaying REPAIR status to the user.
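A hedged sketch of the separation (field names are illustrative; the
actual scrubber state is richer):

    // Illustrative only: internal scrub intent, decoupled from the
    // user-visible PG_STATE_* flags.
    struct scrub_flags_t {
      bool is_deep{false};      // deep scrub, tracked internally
      bool auto_repair{false};  // repair as we go (osd_scrub_auto_repair)
                                // without raising PG_STATE_REPAIR
    };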
Adam Kupczyk [Mon, 24 May 2021 12:27:05 +0000 (14:27 +0200)]
os/bluestore/bluefs: Add test that detects bluefs inconsistency
Add a test that detects a possible scenario that causes BlueFS to have
a file containing data that has never been written. This is done by
tricking the replay log into already accepting the file metadata (size,
allocations) while the actual data stored in these allocations is not
yet synced to disk, as sketched in code below.
Scenario:
1) write to file h1 on SLOW device
2) flush h1 (and trigger h1 mark to be added to bluefs replay log)
3) write to file h2
4) fsync h2 (forces replay log to be written)
The result is:
- the bluefs log now has the stable state of h1
- the SLOW device is not yet flushed (no fdatasync())
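A rough sketch of the sequence, using simplified BlueFS-style calls
(signatures vary across releases; treat this as annotated pseudocode,
not the exact test body):

    // Assume `fs` is a BlueFS instance with DB, WAL and SLOW devices.
    BlueFS::FileWriter *h1 = nullptr, *h2 = nullptr;
    fs.open_for_write("dir.slow", "file1", &h1, false); // lands on SLOW
    h1->append(payload, payload_len);
    fs.flush(h1);  // marks h1 dirty: its new size/allocations are
                   // queued for the next replay-log write

    fs.open_for_write("dir.db", "file2", &h2, false);
    h2->append(payload2, payload2_len);
    fs.fsync(h2);  // forces the replay log out; it now records h1 as
                   // stable even though h1's data never hit the disk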
Adam Kupczyk [Mon, 24 May 2021 12:49:51 +0000 (14:49 +0200)]
os/bluestore/bluefs: Remove possibility of bluefs replay log containing files without data
It had been possible for the bluefs replay log to serialize file
metadata (size, allocations) while the actual data stored in those
allocations was not yet synced to disk.
This could happen if _flush_range(h1) allocated space for file h1 on a
device (like SLOW) that will not be used when flushing the future
replay log. Such a thing can happen when we have an h2 that wrote to
WAL and our replay log is on DB. After fsync(h2) we write the replay
log and wait for fdatasync on WAL and DB. There is no waiting on SLOW,
but h1 was dirty and has been serialized to the replay log.
The solution is to delay notifying the replay log that it has to
include h1 until fdatasync has finished.
Fixes: https://tracker.ceph.com/issues/50965
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 03ac53f7d4c83e56f664ad371ffe3bc2d40e1837)
Kefu Chai [Thu, 10 Jun 2021 12:19:09 +0000 (20:19 +0800)]
tasks/ceph_manager: ignore EACCES when waiting for quorum
mon_tick_interval is 5 seconds by default. Monitors update their
rotating keys every mon_tick_interval. Before the monitors form a
quorum, the auth requests from clients are put into the wait list.
These requests are re-enqueued once the monitors form a quorum, but
there is a small window of mon_tick_interval before the monitors are
actually able to serve the auth requests, even after they claim to be
able to serve requests. If these re-enqueued requests happen to be
served in this window, and if cephx is enabled, they are greeted with
errors like:
handle_auth_bad_method server allowed_methods [2] but i only support [2]
In the case of the ceph CLI, the error looks like:
[errno 13] RADOS permission denied (error connecting to the cluster)
So, to address this issue, the EACCES error is ignored when waiting
for a quorum.
ceph-monstore-tool: use a large enough paxos/{first,last}_committed
so the rebuilt paxos transaction won't be overwritten by the ones
created before recovery completes.
When the quorum is recovering, the leader collects the paxos
transactions from the peons. If the quorum accepts the proposal for
setting the fingerprint, the peon updates the monitor with a paxos
transaction carrying a newer "last_committed" than the one created
using update_paxos() in ceph_monstore_tool.cc; the latter
"last_committed" is always 0.
So, to avoid this extra paxos proposal obsoleting the "rebuilt" paxos
transaction, we use a large enough number for {first,last}_committed.
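A hypothetical sketch of the rebuild step (the constant is a
placeholder; the real tool writes more state than this):

    // Assume `t` is the MonitorDBStore transaction built by
    // update_paxos(). Write first/last_committed far beyond anything
    // the recovering quorum could propose, so a post-recovery paxos
    // transaction cannot carry a newer last_committed that would
    // overwrite the rebuilt state.
    constexpr version_t PAXOS_COMMITTED = 1234567890;  // "large enough"
    t->put("paxos", "first_committed", PAXOS_COMMITTED);
    t->put("paxos", "last_committed", PAXOS_COMMITTED);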
Adam C. Emerson [Wed, 14 Jul 2021 15:02:21 +0000 (11:02 -0400)]
rgw: Robust notify invalidates on cache timeout
This avoids a potential race condition in which updates are delayed.
Fixes: https://tracker.ceph.com/issues/51674
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 76247990ff38049ee32dd47d31482b9648353673)
Conflicts:
src/rgw/services/svc_notify.cc
- Skip the renaming, since this is a backport and that's mostly a
matter of futureproofing.
Backport: https://tracker.ceph.com/issues/51679
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Wed, 7 Jul 2021 22:47:00 +0000 (18:47 -0400)]
rgw: distribute() takes RGWCacheNotifyInfo
So we don't have to parse the bufferlist back out to find what object
to throw out of the cache.
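A hedged sketch of the signature change (parameter lists are
illustrative, not the exact rgw declarations):

    // Before (illustrative): callees had to decode the bufferlist to
    // learn which object the notification was about.
    int distribute(const std::string& key, bufferlist& bl,
                   optional_yield y);

    // After (illustrative): pass the structured notification, so the
    // object to invalidate is available without re-parsing.
    int distribute(const std::string& key, const RGWCacheNotifyInfo& cni,
                   optional_yield y);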
Fixes: https://tracker.ceph.com/issues/51674
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 7f952ad80114096322f202ba58279aaa4a002313)
Backport: https://tracker.ceph.com/issues/51679
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Tue, 13 Jul 2021 20:05:47 +0000 (16:05 -0400)]
rgw: Don't segfault on datalog trim
Synchronous (or yielded, basically anything other than AioCompletion)
trim would try to dereference the past-the-end iterator if we were
trimming to a point in the most recent generation.
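The underlying bug pattern, in a self-contained hedged sketch (the
types are stand-ins for the datalog's generation map):

    #include <cstdint>
    #include <map>

    struct Generation {                        // stand-in type
      void trim_to(std::uint64_t marker) { /* ... */ }
    };

    void trim(std::map<std::uint64_t, Generation>& gens,
              std::uint64_t target_gen, std::uint64_t marker) {
      auto it = gens.upper_bound(target_gen);
      // When target_gen is the most recent generation, upper_bound()
      // returns end(); dereferencing it unguarded is the crash.
      if (it != gens.end()) {
        it->second.trim_to(marker);
      }
    }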
https://tracker.ceph.com/issues/51661
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 97305f03c16db1cfaceef04a74ee510bc1fc1e80)
https://tracker.ceph.com/issues/51675
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
pacific: qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
Cherry-pick notes:
- handle differences due to renaming of rgw::sal::RGWObject to rgw::sal::Object
- handle differences due to move of test_ps_s3_metadata_on_master test from tests_ps.py to test_bn.py
Moving the attrs into s->bucket_attrs before setting them results in
setting empty attrs on the bucket. This means that reading them back
later gets empty attrs, which can cause a segfault.
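A hypothetical sketch of the ordering bug (store_bucket_attrs is an
illustrative stand-in for the actual call):

    // Buggy order: std::move leaves `attrs` empty, so the bucket is
    // stored with no attrs, and reading them back later can segfault.
    s->bucket_attrs = std::move(attrs);
    store_bucket_attrs(s->bucket, attrs);   // stores an empty map!

    // Fixed order: persist the attrs first, then move them.
    store_bucket_attrs(s->bucket, attrs);
    s->bucket_attrs = std::move(attrs);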
mgr/dashboard: remove usage of 'rgw_frontend_ssl_key'
Fixes: https://tracker.ceph.com/issues/51643
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Removing the usage of rgw_frontend_ssl_key from the rgw service form.
monitoring: remove instance label from ceph-cluster.json completely
The `instance` label is only useful if
- the exporter returns data only about its own node or instance, or
- the exporter provides an instance label itself and may then return
data about other nodes.
In this case, it's about the Prometheus mgr module, which is a single
exporter providing data about a whole cluster, not only data related
to the node (or instance) the mgr module is running on. It is
completely irrelevant which node the exporter runs on; the data
provided doesn't change. The exporter also doesn't provide `instance`
labels (which Prometheus wouldn't change anyway due to our
configuration; see the "honor_labels" setting).
(Actually there's one exception where `instance` labels are provided by
the Ceph mgr module, but that doesn't affect the Ceph Cluster
dashboard.)
Note that keeping that instance label on this particular dashboard
would enable the user to switch between the data collected from a
previously failed mgr instance and the currently running mgr instance
(on which the Prometheus mgr module runs). That would split the data,
which I don't think is a useful feature; rather, it looks broken.
Fixes: https://tracker.ceph.com/issues/51212
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 037410713f032c0a2a25243e411ae67dffcc1d1a)
mgr/dashboard: Fix Grafana Ceph Cluster health status widget
The health status widget doesn't show any status because it requires
its query to return a single result. But if a mgr instance had failed,
the query would return more than one result, provided the incident
happened in the requested time frame.
This is simply an issue of the `instant` switch being disabled for that
widget. As only one mgr instance can ever be providing data at a time,
enabling `instant` completely solves that issue.
Fixes: https://tracker.ceph.com/issues/51212
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 4270a13d6c2400162896ce5e4145729615a001e2)
mgr/dashboard: Remove hard-coded timezone from Grafana dashboards
Remove the hard-coded timezone from the Grafana dashboards to enable
the Grafana administrator to decide which timezone should be used for
dashboards. If we hard-code those values, changing the global settings
in Grafana wouldn't have an effect, and administrators can't change
the automatically imported Grafana dashboards provided by us.
Fixes: https://tracker.ceph.com/issues/51212
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 5527c1c54f4cdc8d0ba6c86b259baa555bbd9def)