git.apps.os.sepia.ceph.com Git

mgr/dashboard: include mfa_ids in rgw user-details section

Fixes: https://tracker.ceph.com/issues/53193
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Introducing mfa_ids in user details section.

(cherry picked from commit 6a2234cd6b68adaa0b855219c411eb95eb2ff9be)

Merge pull request #43251 from kotreshhr/wip-52633-pacific

pacific: mds: Add new flag to MClientSession

Reviewed-by: Kotresh HR <khiremat@redhat.com>

Merge pull request #43223 from kotreshhr/wip-52628-pacific

pacific: mgr/volumes: Fix permission during subvol creation with mode

Reviewed-by: Kotresh HR <khiremat@redhat.com>

Merge pull request #43862 from ivancich/wip-multipart-purge-fix-pacific

pacific: rgw: fix bucket purge incomplete multipart uploads

Reviewed-by: Adam Emerson <aemerson@redhat.com>

Merge pull request #43772 from ideepika/wip-ssd-cache-testing-backports-pacific

pacific: librbd/cache/pwl: persistant cache backports

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>

cmake/modules/Findpmem: always set pmem_VERSION_STRING

before this change, `pmem_VERSION_STRING` is not set if it is not able
to fulfill the specified version requirement. the intention was to check
if the version is able to satisfy the requirement. but actually, passing
an empty `pmem_VERSION_STRING` to `find_package_handle_standard_args()`
as the option of `VERSION_VAR` does not fail this check. on the
contrary, it prints

-- Found pmem: pmem_pmemobj_INCLUDE_DIR;pmem_pmem_INCLUDE_DIR (Required
is at least version "1.17")

if we requires pmem 1.17, while the found version is, for instance,
1.10.

if the required version is 1.7, and the found version is 1.10, the
output from cmake is:

-- Found pmem: pmem_pmemobj_INCLUDE_DIR;pmem_pmem_INCLUDE_DIR (found
suitable version "1.10", minimum required is "1.7")

in this change, the version spec is not specified when calling
`pkg_check_modules()`. so, `PKG_${component}_VERSION` is always set.
and we can always delegate the version checking to
`find_package_handle_standard_args()`. please note, we use the lower
version returned by pkg-config if multiple components are required and
both pkg-config settings return their versions.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit ad85af2ef8b76481adc0d1500033cd5094b8f21e)

Conflicts:
cmake/modules/Findpmem.cmake
<<<<<<< HEAD
=======
  pkg_check_modules(PKG_${component} QUIET "lib${component}")
  if(NOT pmem_VERSION_STRING OR PKG_${component}_VERSION VERSION_LESS pmem_VERSION_STRING)
    set(pmem_VERSION_STRING ${PKG_${component}_VERSION})
  endif()
  find_path(pmem_${component}_INCLUDE_DIR
    NAMES lib${component}.h
    HINTS ${PKG_${component}_INCLUDE_DIRS})
  find_library(pmem_${component}_LIBRARY
    NAMES ${component}
    HINTS ${PKG_${component}_LIBRARY_DIRS})
>>>>>>> ad85af2ef8b (cmake/modules/Findpmem: always set pmem_VERSION_STRING)

ceph.spec.in: do not build with system pmdk by default

we need to use libpmem 1.10 in #40493.

without enabling the module stream offering libpmem 1.9.2, we can only
have access to libpmem 1.6.1. and fedora 33 only has libpmem 1.9
packaged. the same applies to openSUSE Tumbleweed and openSUSE Leap. so
let's stop using libpmem packaged by distro by default, until these
distros include libpmem 1.10.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit ecb8d2cae2c063acf4e7e1bffed887d52117762f)

cmake: bump to PMDK v1.10

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 636ab08f2604efd4cac3200d5741fa15b070f072)

cmake: use src/pmdk for building pmdk if it exists

so we can build with pmdk enabled if the dist tarball
contains pmdk

Signed-off-by: Feng Hualong <hualong.feng@intel.com>
(cherry picked from commit 14c2a2e59fbdc716d35c08735d50bdadfab8300d)

Conflicts:
cmake/modules/Buildpmem.cmake
trivial fix, adopt what master has:
1. external_project args >> source_dir_args
2. INSTALL_COMMAND ""

make-dist: add pmdk to dist tarball

Signed-off-by: Feng Hualong <hualong.feng@intel.com>
(cherry picked from commit 9d958d0b9d33bfa0e27323597986e541eed23951)

rgw: fix bucket purge incomplete multipart uploads

The marker was not working correctly as segments of the bucket index
were listed to shut down any incomplete multipart uploads. This fixes
the marker, so it's maintained properly across iterations.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Merge pull request #43787 from tchaikov/pacific-pr-41353

pacific: pybind/mgr/CMakeLists.txt: exclude files not used at runtime

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>

Merge pull request #43682 from rhcs-dashboard/wip-52806-pacific

pacific: mgr/dashboard: BATCH incl.: NFS integration, Cluster Expansion Workflow, and Angular 11 upgrade

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>

Merge pull request #43703 from cfsnyder/wip-52783-pacific

pacific: rgw/sts: fix for copy object operation using sts

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43809 from cbodley/wip-qa-rgw-java-pacific

pacific: qa/rgw: pacific branch targets ceph-pacific branch of java_s3tests

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Ali Maredia <amaredia@redhat.com>

Merge pull request #43777 from cfsnyder/wip-52468-pacific

pacific: rgw: use existing s->bucket in s3 website retarget()

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43708 from cfsnyder/wip-52598-pacific

pacific: ceph-volume: util/prepare fix osd_id_available()

Merge pull request #43812 from rhcs-dashboard/wip-53153-pacific

pacific: mgr/dashboard: fix missing alert rule details

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

librbd/cache/pwl: cancel advance dispatch of external flush request

For external flush request, it new syncpoint after passing
guardedrequest and before dispatch. Then dispatch bypass deferred
queue But the last write request may still in the deferred queue.
It don't dispatch and not associated with any syncpoint. The
external flush request will bypass the previous write request in
deferred queue now. This does not conform to the semantics of
external flush requests. External flush request should strictly
follow the order of dispath.

But for internal flush request, it will be dispatched after all
write request which associated with previous syncpoint, persisted
in cache. C_gather guarantee it.

It is necessary to distinguish between external and internal
flush requests. Internal flush can and should be dispatched in
advance bypass deferred queue. At the same time, the order of
external requests needs to be kept unchanged. So cancel advance
dispatch of external flush request.

Fixes: https://tracker.ceph.com/issues/52599
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 9951868fc33f0469cd4135fadebb5a8d2b4b55dd)

librbd/cache/pwl: fix assert in _aio_stop() during shutdown

For wait_for_ops(next_ctx). this next_ctx may run in aio_thread.
Then the next program runs on the aio thread. remove_pool_file()
calls bdev->close(), then calles _aio_stop(), exec aio_thread.join(),
cause assert. Thread can't join itself. Fix it by adding close ctx
to m_work_queue, so close() can run in work queue thread.

At the same time, correct the order of wait_for_ops().
flush_dirty_entries(next_ctx) may call wake_up() and start_op().
so moving wait_for_ops() behind flush_dirty_entries(next_ctx) is more
appropriate.

Fixes: https://tracker.ceph.com/issues/52566
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 94f9873718a82d2def8f268c1581fbf97fee0f49)

librbd/cache/pwl/ssd: move finish_op() to the end of callback function

finish_op() of ssd cache is not in the end of callback function in
append_op_log_entries(), and after finish_op(), some operation also
need to get m_lock. So, during shutdown, wait_for_ops() thinks all OPs
are over, and no thread will acquire the m_lock, In the subsequent
operation of shutdown, the m_lock is obtained, and _aio_stop() in
bdev->close() waits for all aio_writes() and aio_submit() to end
when the m_lock is held, but the callback function of aio_write() is
waiting for the m_lock, causing a deadlock. Move finish_op() to the
end to fix dead lock.

Fixes: https://tracker.ceph.com/issues/52235
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit c531768838e44ed8eb28e8b27594d7e03ca3ffcf)

librbd/cache/pwl: Check the cache is clean

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 066b8a6d2ee091839b9b21ac89b8dfcebf8825cd)

librbd/cache/pwl/ssd: Remove unused parameter.

Met the following compiler warning message:
>[38/80] Building CXX object
src/librbd/CMakeFiles/librbd_plugin_pwl_cache.dir/cache/pwl/ssd/WriteLog.cc.o
>../src/librbd/cache/pwl/ssd/WriteLog.cc:37:25: warning: unused variable
'ops_appended_together' [-Wunused-const-variable]
>const unsigned long int ops_appended_together = MAX_WRITES_PER_SYNC_POINT;

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 40dad4c30c3b46ed3ac06961ebe740f6d82a6bc1)

librbd/cache/pwl/ssd: Fix a race between get_cache_bl() and remove_cache_bl()

In fact, although in get_cache_bl it use lock to protect, it can't
protect function "list& operator= (const list& other)".
So we should use copy_cache_bl.

Fixes: https://tracker.ceph.com/issues/52400
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit fe72b3953735329441397f257d5dd18f6819187d)

librbd/cache/pwl/ssd: flushed_sync_gen capture is unused

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f8fb7609b30bf3e60a64e4282250b121151d1559)

librbd/cache/pwl/ssd: ensure first_{valid,free}_entry aren't bogus

Ensure first_{valid,free}_entry are inside the expected range when
scheduling root updates and decoding the root on recovery.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b46f81fbb22e67548e9825756da625d8e9017d10)

librbd/cache/pwl/ssd: avoid corrupting m_first_free_entry

In append_op_log_entries(), new_first_free_entry is read after
append_ops() returns.  This can result in accessing freed memory
because all I/Os may complete and append_ctx callback may run
by the time new_first_free_entry is read.  Garbage value gets
written to m_first_free_entry and depending on the circumstances
it may allow AbstractWriteLog code to accept more dirty user data
than we have space for.  Luckily we usually crash before then.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d83a0f6db8ff26eeb2c817b1bd192fb357f715df)

librbd/cache/pwl/ssd: avoid corrupting first_free_entry

In append_ops(), new_first_free_entry is assigned to after aio_submit()
is called.  This can result in accessing uninitialized or freed memory
because all I/Os may complete and append_ctx callback may run before the
assignment is executed.  Garbage value gets written to first_free_entry
and we eventually crash, most likely in bufferlist manipulation code.

But worse, the corrupted first_free_entry makes it to media almost all
the time.   The result is a corrupted cache -- dirty user data is lost.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ef381d993ce29c5d0d774a6af27c3af861392ca1)

librbd/cache/pwl/ssd: remove correct m_blocks_to_log_entries entry

When retiring, m_blocks_to_log_entries doesn't remove
corresponding write_entry (should be `*it` not `entry`)
that will be retired. It leads to read error. And
there should also consider discard entries.

Fixes: https://tracker.ceph.com/issues/52579
Signed-off-by: Feng Hualong <hualong.feng@intel.com>
(cherry picked from commit 01bb75a1056d26ae43832d567087b3d67ab84261)

librbd/cache/pwl/ssd: Remove unused parameter.

Met the following compiler warning message:
>[38/80] Building CXX object
src/librbd/CMakeFiles/librbd_plugin_pwl_cache.dir/cache/pwl/ssd/WriteLog.cc.o
>../src/librbd/cache/pwl/ssd/WriteLog.cc:37:25: warning: unused variable
'ops_appended_together' [-Wunused-const-variable]
>const unsigned long int ops_appended_together = MAX_WRITES_PER_SYNC_POINT;

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 40dad4c30c3b46ed3ac06961ebe740f6d82a6bc1)

librbd/cache/pwl/ssd: Remove useless locks.

Return a reference don't need by lock protect.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit ca9eb28b4f249725e3ecae328fbf8076c648819a)

librbd/cache/pwl/ssd: flushed_sync_gen capture is unused

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f8fb7609b30bf3e60a64e4282250b121151d1559)

librbd/cache/pwl/ssd: ensure first_{valid,free}_entry aren't bogus

Ensure first_{valid,free}_entry are inside the expected range when
scheduling root updates and decoding the root on recovery.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b46f81fbb22e67548e9825756da625d8e9017d10)

librbd/cache/pwl/ssd: avoid corrupting m_first_free_entry

In append_op_log_entries(), new_first_free_entry is read after
append_ops() returns.  This can result in accessing freed memory
because all I/Os may complete and append_ctx callback may run
by the time new_first_free_entry is read.  Garbage value gets
written to m_first_free_entry and depending on the circumstances
it may allow AbstractWriteLog code to accept more dirty user data
than we have space for.  Luckily we usually crash before then.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d83a0f6db8ff26eeb2c817b1bd192fb357f715df)

librbd/cache/pwl: don't clear next_sync_point_entry prematurely

In SyncPointLogOperation::clear_earlier_sync_point(),
sync_point->log_entry->next_sync_point_entry was prematurely set to
nullptr in clear_earlier_sync_point(). It is in write op stage, but
next_sync_point_entry is used in writeback stage in
handle_flushed_sync_point().

handle_flushed_sync_point() may pass a nullptr
cause assert in m_work_queue.The solution is to move the statement
that set next_sync_point_entry to nullptr after it is used.

Fixes: https://tracker.ceph.com/issues/52465
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit a1a20041d57c9a90bc0a60d86469445ba8efb5ea)

librbd/cache/pwl: Fix pmem cache fragment issue

I/O may hang due to pmem cache fragment issue when blocks are diffrent
in size. Call pmdk API(pmemobj_defrag) to solve.

Fixes: https://tracker.ceph.com/issues/49879
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit b53392a15380b57d6111cb2926083393627f1ed7)

librbd/cache/pwl/ssd: ensure bdev and pool sizes match

m_log_pool_size should be multiple of 1M but, just in case, prevent
is_valid_io() assert in KernelDevice::aio_write().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 41b95ac987954dc2c29090926236660b8348e9d2)

librbd/cache/pwl: make pool size a multiple of 1M

In ssd mode, we need it to be a multiple of bdev block size.
Instead of munging it after opening the bdev in ssd/WriteLog.cc, let's
impose a common restriction and round rbd_persistent_cache_size down to
a 1M boundary.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1fc722bc218acfe8a52ad7be2c06347bddb42bbe)

librbd/cache/pwl/ssd: ensure bdev and pool sizes match

m_log_pool_size should be multiple of 1M but, just in case, prevent
is_valid_io() assert in KernelDevice::aio_write().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 41b95ac987954dc2c29090926236660b8348e9d2)

librbd/cache/pwl: make pool size a multiple of 1M

In ssd mode, we need it to be a multiple of bdev block size.
Instead of munging it after opening the bdev in ssd/WriteLog.cc, let's
impose a common restriction and round rbd_persistent_cache_size down to
a 1M boundary.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1fc722bc218acfe8a52ad7be2c06347bddb42bbe)

librbd/cache/pwl: supplement error handle

add return after error handling and clean bdev before return

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 2cd9881116a47794051615fcf0a95a5e879ae7e6)

librdb/cache/pwl/ssd: fix m_bytes_allocated exceeding m_bytes_allocated_cap

SSD cache mode uses m_bytes_allocated and m_bytes_allocated_cap to control
whether the allocation is successful. m_bytes_allocated may exceeding
m_bytes_allocated_cap due to the multi-thread under no lock. When adding
to m_bytes_allocated, check again to solve this issue.

On the other hand, m_bytes_allocated_cap should be less than the actual
ring buffer size. keep m_first_free_entry == m_first_valid_entry
only means the cache is empty.

Fixes: https://tracker.ceph.com/issues/50670
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit eee12082ff77afd7a69c6261acf5c1f59f1f9fed)

librbd/cache/pwl: initialize number_log_entries

Using uninitialized number_log_entries cause writesame req space
calculation error. sometimes fail in TestMockCacheSSDWriteLog.writesame.

Fixes: https://tracker.ceph.com/issues/52852
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit dd33684733014a4dedf9551d75efe591d330d864)

librbd/cache/pwl: supplement error handle

add return after error handling and clean bdev before return

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 2cd9881116a47794051615fcf0a95a5e879ae7e6)

librbd/cache/pwl: fix reorder when flush cache-data to osd.

Consider the following workload:
writeA(0, 4096)
writeB(0, 512).
pwl can makre sure writeA persist to cache before writeB.
But when flush to osd, it use async-read to read data from cache and in
the callback function they issue write to osd.
So although we by order issue aio-read(4096), aio-read(512). But we
can't make sure the return order.
If aio-read(512) firstly return, the write order to next layer is
writeB(0, 512)
writeA(0, 4096).
This is wrong from the user point.

To avoid this occur, we should firstly read all data from cache. And
then send write by order.

Fiexs: https://tracker.ceph.com/issues/52511

Tested-by: Feng Hualong <hualong.feng@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 1fc3be248097eb6087560c22374c1f924bfe0735)

librbd/cache/pwl: Fix dead lock issue when pwl initialization failed

when pwl initialization failed, 'AbstractWriteLog' will release itself
in callback, it hold guard lock and want to get lock to delete data,
which causes dead lock. This PR works by release image_cache outside
the callback function.

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 937af36e204791554708632245b4bca1d52f49a6)

librbd/cache/pwl: solve the problem of calc m_bytes_allocated when reload entries.

Currently, it will load existing entries after restart and cacl
m_bytes_allocated based on those entries. But currently there are
the following problems：
1: The allocated of write-same is not calculated for rwl & ssd cache.
2: for ssd cache, it not include the size of log-entry itself and don't
consider data alignment. This will cause less calculation and more
allocatation later. And will overwrite the data which don't flush to osd
and make data lost.

The calculation methods of ssd and rwl are different. So add new api
allocated_and_cached_data() to implement their own method.

For SSD cache, we dirtly use m_first_valid_entry & m_first_free_entry to
calc m_bytes_allocated.

trivial fix: new code from PR, nothing unrelated: https://www.diffchecker.com/S1eXatpM
Fixes:https://tracker.ceph.com/issues/52341
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit a96ca93d69d5c1f302f3141082302d4699915397)

librbd/cache/pwl/ssd: fix first_valid_entry calculation in retire_entries()

Consider one control_block which cotain multi encode(WriteLogCacheEntry):
Log1: WriteLogEntry
Log2: WriteLogEntry
Log3: Non-WriteLogEntry
For this case, currently calc method is: control_block_pos + sizeof(control_block).
But in fact, it should: control_block_pos + sizeof(control_block) +
data_length(Log1 + Log2).

Wrong first_valid_entry will persist to superblock and restart to read.
This cause read wrong position and when decode(WriteLogCacheEntry) it
will report bug.

Fixes: https://tracker.ceph.com/issues/52323
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 2d337fb122d147e32d027d1e7211cd4156a5b72b)

librbd/cache/pwl/ssd: solve competition between read and retire

SSD read is not like rwl's. SSD need aio read. Therefore,
we cannot guarantee that the data will not be retire
during the period from sending the read request to the SSD
and receiving the data to the memory, which may cause
the corresponding data on the SSD to be overwritten.

Fixes: https://tracker.ceph.com/issues/52236
Signed-off-by: Feng Hualong <hualong.feng@intel.com>
(cherry picked from commit dfdb452aa996af40f7b0ec684670ccf9a9b2d4c1)

librbd/cache/pwl/rwl: fix buf_persist and add writeback_lat perf counters

initialize buf_persist_time, then change name buf_persist_time to
buf_persist_start_time, change flush to internal_flush. add
writeback_lat perf conters. update some print formats for perf.

Fixes: https://tracker.ceph.com/issues/52090
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 8ed3080078ad5eaa51145a9481c7c2223ad38765)

librbd/cache/pwl: avoid stack overflow caused by nested shared_ptr destruction

Destruction of nested shared_ptr will cause stack overflow.
With the explicit assignment of nullptr, the deleted node
is completely disconnected from the current linked list

-------              *******               -------
|sync | <--earlier-- |sync | <--earlier-x- |sync |
|point| --later----> |point| --later----x> |point|
-------              *******               -------
   |                    |                     |
   V                    V                     V
-------              -------               -------
|log_ | ---next----> |log_ | ---next----x> |log_ |
|entry|              |entry|               |entry|
-------              -------               -------

earlier: earlier_sync_point
later:   later_sync_point
next:    next_sync_point_entry

Fixes: https://tracker.ceph.com/issues/51418
Signed-off-by: Feng Hualong <hualong.feng@intel.com>
(cherry picked from commit e706b9db5c5d79366c5167d01ad46e13f8500936)

librbd/cache/pwl/ssd: fix use-after-free on C_BlockIORequest

In setup_schedule_append() function, its first expression
will cause the req to be deleted, and subsequent use of
the variable req becomes an illegal operation. And due to
delete, rep->m_image_ctx will be empty, so it lead to
segfault in AbstractWriteLog::get_context().
So pass the `req` into `schedule_append()` function.

Fixes: https://tracker.ceph.com/issues/50951
Signed-off-by: Hualong Feng <hualong.feng@intel.com>
(cherry picked from commit 2dc3b8881290f1e12c536a232bea37547a73a45b)

librbd/cache/pwl/ssd: flushed_sync_gen capture is unused

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f8fb7609b30bf3e60a64e4282250b121151d1559)

librbd/cache/pwl/ssd: ensure first_{valid,free}_entry aren't bogus

Ensure first_{valid,free}_entry are inside the expected range when
scheduling root updates and decoding the root on recovery.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b46f81fbb22e67548e9825756da625d8e9017d10)

librbd/cache/pwl/ssd: avoid corrupting m_first_free_entry

In append_op_log_entries(), new_first_free_entry is read after
append_ops() returns.  This can result in accessing freed memory
because all I/Os may complete and append_ctx callback may run
by the time new_first_free_entry is read.  Garbage value gets
written to m_first_free_entry and depending on the circumstances
it may allow AbstractWriteLog code to accept more dirty user data
than we have space for.  Luckily we usually crash before then.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d83a0f6db8ff26eeb2c817b1bd192fb357f715df)

librbd/cache/pwl/ssd: ensure bdev and pool sizes match

m_log_pool_size should be multiple of 1M but, just in case, prevent
is_valid_io() assert in KernelDevice::aio_write().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 41b95ac987954dc2c29090926236660b8348e9d2)

librbd/cache/pwl: make pool size a multiple of 1M

In ssd mode, we need it to be a multiple of bdev block size.
Instead of munging it after opening the bdev in ssd/WriteLog.cc, let's
impose a common restriction and round rbd_persistent_cache_size down to
a 1M boundary.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1fc722bc218acfe8a52ad7be2c06347bddb42bbe)

librbd/cache/pwl: supplement error handle

add return after error handling and clean bdev before return

trivial fix:
<<<<<<< head
      on_finish->complete(-1);
      return;
=======
      on_finish->complete(r);
      return false;
>>>>>>> 2cd9881116a (librbd/cache/pwl: supplement error handle)

picked second patch
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 2cd9881116a47794051615fcf0a95a5e879ae7e6)

librbd/cache/pwl/ssd: stronger assert in aio_read_data_blocks()

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ab05cc4a8f4709db649ad51d32141e429d9dc2b3)

librbd/cache/pwl/ssd: rename aio_read_data_block() overload

Rename the overload that deals with multiple data blocks to
aio_read_data_blocks().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 00363a0f7c20a7f002b37eac93d88791e5bc2568)

librbd/cache/pwl/ssd: persist correct write_data_pos

WriteLogCacheEntry gets appended to persist_log_entries before
write_data_pos is updated with the actual media offset. Because
push_back() makes a copy, the updated write_data_pos value never
makes it to media, making recovery impossible.

Fixes: https://tracker.ceph.com/issues/50669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit fe757401ada7bfd6784b6f9ca5556e1459df7a69)

librbd/cache/pwl/ssd: set m_bytes_allocated_cap on recovery

Currently it's set only when a new cache is formatted.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit cb9b3afd87a28e96d3688e1c73900d8043fac6cc)

librbd/cache/pwl/ssd: actually use first_{valid,free}_entry on recovery

first_valid_entry and first_free_entry pointers are read from media
but not actually used: both m_first_valid_entry and m_first_free_entry
get assigned 0 (or garbage). next_log_pos gets the same value as well
meaning that not only no recovery is attempted but the cache also gets
corrupted because DATA_RING_BUFFER_OFFSET is not applied.

Fixes: https://tracker.ceph.com/issues/50669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ef020b85fb16c1730fc08eadfd1b51d3c4cd019a)

librbd/cache/pwl/ssd: don't count log entries

In ssd mode log entries are variable size. Attempting to count and
impose watermarks on the number of log entries is bogus because the
total number of entries it would take to fill the cache to capacity
is also variable and can't be precisely estimated.

had conflicts, but no new changes
Fixes: https://tracker.ceph.com/issues/50669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ea65553b4a9ee1349c6da8452d861afe579e99e9)

librbd/cache/pwl: fix AbstractWriteLog::check_allocation() signature

All parameters are integers and none of them are (in-)out, so don't
take them by reference. Additionally num_lanes, num_log_entries and
num_unpublished_reserves don't need to be 64-bit as their respective
fields in AbstractWriteLog are 32-bit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 74ecc4b76a10c53be928807b5be077f080d34724)

librbd/cache/pwl: rename m_log_pool_config_size to m_log_pool_size

trivial fix: no new changes: https://www.diffchecker.com/9btXJhCC
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 829ef952d2e408fe3676b38e7ecd26cbb04571a5)

librbd/cache/pwl: get rid of AbstractWriteLog::m_log_pool_actual_size

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 820bbecfb130ad99483f0d468f1b1c9612e54935)

librbd/cache/pwl/ssd: get rid of WriteLog::pool_size

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6cd36592f631051d2ae49b7d92f2a30cd12c9c41)

mgr/dashboard: fix missing alert rule details

Fixes: https://tracker.ceph.com/issues/53144
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit b47f9c83d87057bbb1a0c6052450088532a31f81)

Merge pull request #43793 from ifed01/wip-ifed-fix-omap-upgrade-pac

pacific: os/bluestore: fix invalid omap name conversion when upgrading to per-pg

Reviewed-by: Neha Ojha <nojha@redhat.com>

mgr/dashboard: NFS 'create export' form: fixes

* Do not allow a pseudo that is already in use by another export.
* Create mode form: prefill dropdown selectors if options > 0.
* Edit mode form: do not reset the field values that depend on other values that are being edited (unlike Create mode).
* Fix broken link: cluster service.
* Fix error message style for non-existent cephfs path.
* nfs-service.ts: lsDir: thow error if volume is not provided.
* File renaming: nfsganesha.py => nfs.py; test_ganesha.py => test_nfs.py

Fixes: https://tracker.ceph.com/issues/53083
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit d817a24e345516229bc637e2c675d12e6bfcc456)

Conflicts:
src/pybind/mgr/dashboard/tests/test_ganesha.py
- Delete file: in fact it's renamed as test_nfs.py

mgr/dashboard: python unit tests refactoring

* Controller tests: cherrypy config: authentication disabled by default; ability to pass custom config (e.g. enable authentication).
* Auth controller: add tests; test that unauthorized request fails when authentication is enabled.
* DocsTest: clear ENDPOINT_MAP so the test_gen_tags test becomes deterministic.
* pylint: disable=no-name-in-module: fix imports in tests.

Fixes: https://tracker.ceph.com/issues/53083
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit a6aeded5141ec3a959a86aa4002ccf5ac8d0a523)

Conflicts:
src/pybind/mgr/dashboard/tests/test_docs.py
- Resolved conflicts.
src/pybind/mgr/dashboard/tests/test_iscsi.py
- Resolved conflicts.

qa/rgw: pacific branch targets ceph-pacific branch of java_s3tests

this commit is applied directly to the pacific branch instead of a
backport from master, because it targets the ceph-pacific branch instead
of the ceph-master branch on master

Signed-off-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43631 from rhcs-dashboard/wip-52803-pacific

pacific: mgr/dashboard,prometheus: fix handling of server_addr

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #43544 from trociny/wip-52936-pacific

pacific: osd: handle inconsistent hash info during backfill and deep scrub gracefully

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #43457 from gregsfortytwo/wip-52868-pacific

pacific: mon: Allow specifying new tiebreaker monitors

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #43415 from trociny/wip-51909-pacific

pacific: crush: cancel upmaps with up set size != pool size

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #43740 from cfsnyder/wip-53091-pacific

pacific: rgw: add abstraction for ops log destination and add file logger

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43695 from cfsnyder/wip-52960-pacific

pacific: rgw/rgw_rados: make RGW request IDs non-deterministic

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43662 from MrFreezeex/wip-53032-pacific

pacific: rbd-mirror: fix mirror image removal

Reviewed-by: Mykola Golub <mgolub@mirantis.com>

Merge pull request #43694 from rhcs-dashboard/wip-53065-pacific

pacific: monitoring: ethernet bonding filter in Network Load.

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

PendingReleaseNotes: document OMAP upgrade bug.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit dfacf162afe55c6f15a8afb8aca18b52814e80ee)

os/bluestore: fix invalid omap name conversion when upgrading to per-pg.

Fixes: https://tracker.ceph.com/issues/53062
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit cbc97018d883333f81ab9a3cfa99d2f68a9874cd)

test/store_test: add a UT for omap format upgrade.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit ccb6fdf3b6ba700da1efea597deabe79329a4504)

os/bluestore: permit legacy omap naming scheme in mkfs.

Primarily for debug purposes...

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 99e40a1e87c44705ffdb557d57bd54eaff31cfe3)

Conflicts:
src/common/options/global.yaml.in
old way of specifying config settings

rgw/rgw_rados: make RGW request IDs non-deterministic

Use a random number vs. incremental counter for first component of request ID.

Fixes: https://tracker.ceph.com/issues/52818
Signed-off-by: Cory Snyder <csnyder@iland.com>
(cherry picked from commit bce34dd68634d241b451111dcf2e931837eb4bfd)

rgw/sts: fix for copy object operation using sts
temporary credentials

Fixes: https://tracker.ceph.com/issues/47809
Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
(cherry picked from commit c9614ba2d4395ef535151caba349662fc183e483)

ceph-volume: util/prepare fix osd_id_available()

The current check only allows to request an OSD id that exists but
marked as 'destroyed'.
With this small fix, we can now use `--osd-id` with an id that doesn't
exist at all.

Fixes: https://tracker.ceph.com/issues/50880
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 73bfa5d2b0157f92721d8bf36619fd35ee265cdd)

mgr/dashboard: improve error handling for gather_facts

Fixes: https://tracker.ceph.com/issues/53084
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 75c8d578ed106c4584c166d8ccd9153c69667d01)

mgr/dashboard: Cluster Creation Add multiple hosts at once

Add multiple hosts at once in cluster creation wizard

Fixes: https://tracker.ceph.com/issues/52759
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit caa177265c34530b1de10f9fe3664c23202d1b5f)

mgr/dashboard: gather facts should only be fetched when orch backend is cephadm

Fixes: https://tracker.ceph.com/issues/52981
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Gather facts in UI should only be fetched if there is orch available and
get_facts feature is available for that orch backend.

(cherry picked from commit 78b8af2bda39dea8aa46a42e0cea4c5fd53c1d42)

mgr/dashboard: Fix for form inside form closing issue

After the angular 11 upgrade the form in form behaviour got broken when
one tries to close that "embedded" form. This commit is to fix that
behaviour.

Fixes: https://tracker.ceph.com/issues/53020
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit b4ebca28d82eeb927c30720b97ba14838482c497)

mgr/dashboard: NFS exports: API + UI: integration with mgr/nfs; cleanups

mgr/dashboard: move NFS_GANESHA_SUPPORTED_FSALS to mgr_module.py

Importing from nfs module throws AttributeError because as a side effect the dashboard module is impersonating the nfs module.
https://gist.github.com/varshar16/61ac26426bbe5f5f562ebb14bcd0f548

mgr/dashboard: 'Create NFS export' form: list clusters from nfs module

mgr/dashboard: frontend+backend cleanups for NFS export

Removed all code and references related to daemons. UI cleanup and adopted unit-testing for
nfs-epxort create form for CEPHFS backend. Cleanup for export list/get/create/set/delete endpoints.

mgr/dashboard: rm set-ganesha ref + update docs

Remove existing set-ganesha-clusters-rados-pool-namespace references as
they are no longer required. Moreover, nfs doc in dashboard doc is
updated accordingly to the current nfs status.

mgr/dashboard: add nfs-export e2e test coverage

mgr/dashboard: 'Create NFS export' form: remove RGW user id field.

- Improve bucket typeahead behavior.
- Increase version for bucket list endpoint.
- Some refactoring.

mgr/dashboard: 'Create NFS export' form: allow RGW backend only when default realm is selected.

When RGW multisite is configured, the NFS module can only handle buckets in the default realm.

mgr/dashboard: 'Create service' form: fix NFS service creation.

After https://github.com/ceph/ceph/pull/42073, NFS pool and namespace are not customizable.

mgr/dashboard: 'Create NFS export' form: add bucket validation.

- Allow only existing buckets.
- Refactoring:
  - Moved bucket validator from bucket form to cd-validators.ts
  - Split bucket validator into 2: bucket name validator and bucket existence (that checks either existence or non-existence).

mgr/dashboard: 'Create NFS export' form: path validation refactor: allow only existing paths.

Fixes: https://tracker.ceph.com/issues/46493
Fixes: https://tracker.ceph.com/issues/51479
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit 58a6ab2147c34d5b3f14bf48f9b47b03bea8a672)

Conflicts:
doc/mgr/dashboard.rst
  - Resolved conflicts.
src/pybind/mgr/dashboard/services/cephx.py
  - Deleted as in master.

mgr/dashboard: directly use ExportMgr and NFSCluster objects

Using the objects directly provides access to other methods and helps in
avoiding repeatition.

mgr/dashboard/nfsganesha: remove tag

Since NFS v3 is no longer supported. We can remove tag.

mgr/nfs: define global constant to list supported FSALs

mgr/dashboard: directly list nfs clusters by directly importing available_cluster() method

The current dashboard api returns a list of following dictionary

{
   'pool': 'nfs-ganesha',
   'namespace': cluster_id,
   'type': 'orchestrator',
   'daemon_conf': None
}

None of these values are required for listing nfs cluster by mgr/nfs module.
Instead directly list available cluster names

mgr/dashboard: add comment to remove listing of daemons

As the configs are per cluster. There is no need to list daemons per cluster.

mgr/dashboard/controllers/nfsganesha: Add comments to update/remove status endpoint

This endpoint can be updated in suggested way or even removed. As it was
initially[1] introduced to check if dashboard pool and namespace configuration was
set.

[1] https://github.com/ceph/ceph/commit/824726393b185b8e5a8f17e66487dfde9f3c8b5c

mgr/nfs: remove fetch_cluster_obj()

There is no need to fetch NFSCluster class object. Directly
available_clusters() can be imported to list nfs clusters.

mgr/dashboard/controllers/nfsganesha: list exports based on cluster id

As mgr/nfs module lists based on cluster id.

mgr/dashboard/nfs: get and delete export by export id

Fixes: https://tracker.ceph.com/issues/46493
Signed-off-by: Varsha Rao <varao@redhat.com>
(cherry picked from commit 64178e1f5b0f26f5c17898e85c3d5b439e5f9dbf)

mgr/nfs: stick to lazy evaluation of logger messages

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit e1c738cefe84c2b6dd15a6667d2df051a0f34e31)

Merge pull request #43728 from sebastian-philipp/pacific-backport-42970-43021-43039-43010-42989-42859-43143-43141-43115-43162-

pacific: cephadm: October batch

Reviewed-by: Adam King <adking@redhat.com>

mgr/nfs: change _cmd_rgw_export_create_rgw() name

To maintain uniformity in naming, modify _cmd_rgw_export_create_rgw() name to
_cmd_nfs_export_create_rgw().

Signed-off-by: Varsha Rao <varao@redhat.com>
(cherry picked from commit 4137f88d0c4278ad1a8a7bce9c6d432d1c9e7c7c)

mgr/nfs: don't log fsal keys

FSAL keys are written to export objects. Don't log it, instead log the initial
export before generating keys and user config object mostly will not contain
export block. It's okay to log user config object content.

Signed-off-by: Varsha Rao <varao@redhat.com>
(cherry picked from commit b8c47c8d223f45092325cb8c02fa137eeebefb25)

mgr/nfs: Add more debug log messages

Fixes: https://tracker.ceph.com/issues/52274
Signed-off-by: Varsha Rao <varao@redhat.com>
(cherry picked from commit 719e7024682d64da9bb2c22b90725ff311e83860)

mgr/dashboard: consume mgr/nfs via mgr.remote()

Stop using the dashboard version of the Ganesha config classes; consume
mgr/nfs instead via remote().

mgr/nfs/export: return Export from _apply_export

Future callers will want this.

mgr/nfs: new module methods for dashboard consumption

Add some new methods that are easy for the dashboard API to consume. These
are very similar to the CLI methods but do now have the @CLICommand and
related decorators, and have slightly different interfaces (e.g., returning
the created/modified Export dict).

mgr/dashboard: remove old ganesha code (and tests)

Fixes: https://tracker.ceph.com/issues/46493
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 6e5a1eefd0b48614f7c95f86fd9fadab1b2b03fe)

Conflicts:
src/pybind/mgr/dashboard/services/ganesha.py
- Deleted as in master.