Chunsong Feng [Wed, 20 Nov 2019 01:42:11 +0000 (09:42 +0800)]
msg/async/dpdk: exit condition waiting when DPDKStack is destructed
exit() calls pthread_cond_destroy() on dpdk::eal::cond; destroying a
condition variable upon which other threads are currently blocked results
in undefined behavior. Testing against different libc versions shows this:
with libc-2.17 the process can exit, but with libc-2.27 it deadlocks. The
call stack is as follows:
Thread 3 (Thread 0xffff7e5749f0 (LWP 62213)):
#0 0x0000ffff7f3c422c in futex_wait_cancelable (private=<optimized out>, expected=0,
futex_word=0xaaaadc0e30f4 <dpdk::eal::cond+44>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>, cond=0xaaaadc0e30c8 <dpdk::eal::cond>)
at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0xaaaadc0e30c8 <dpdk::eal::cond>, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>)
at pthread_cond_wait.c:655
#3 0x0000ffff7f1f1f80 in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#4 0x0000aaaad37f5078 in dpdk::eal::<lambda()>::operator()(void) const (__closure=<optimized out>, __closure=<optimized out>)
at ./src/msg/async/dpdk/dpdk_rte.cc:136
#5 0x0000ffff7f1f7ed4 in ?? () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#6 0x0000ffff7f3be088 in start_thread (arg=0xffffe73e197f) at pthread_create.c:463
#7 0x0000ffff7efc74ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
Thread 1 (Thread 0xffff7ee3b010 (LWP 62200)):
#0 0x0000ffff7f3c3c38 in futex_wait (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_cond_destroy (cond=0xaaaadc0e30c8 <dpdk::eal::cond>) at pthread_cond_destroy.c:54
#3 0x0000ffff7ef2be34 in __run_exit_handlers (status=-6, listp=0xffff7f04a5a0 <__exit_funcs>, run_list_atexit=255,
run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#4 0x0000ffff7ef2bf6c in __GI_exit (status=<optimized out>) at exit.c:139
#5 0x0000ffff7ef176e4 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:344
#6 0x0000aaaad2939db0 in _start () at ./src/include/buffer.h:642
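The fix pattern can be sketched as follows: wake the blocked thread and join it before the static condition variable is destroyed during exit(). This is an illustrative sketch, not the actual Ceph patch; the Eal class and its members are stand-ins for dpdk::eal.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative sketch: a singleton that owns a background thread blocked on a
// condition variable, as dpdk::eal does. Destroying `cond` while the thread is
// still waiting on it is undefined behavior, so the destructor must wake the
// waiter and join the thread before the condition variable goes away.
class Eal {
  std::mutex lock;
  std::condition_variable cond;
  bool stopped = false;
  std::thread t;
public:
  Eal() : t([this] {
      std::unique_lock<std::mutex> l(lock);
      // The waiter leaves its wait once asked to stop.
      cond.wait(l, [this] { return stopped; });
    }) {}
  ~Eal() {
    {
      std::lock_guard<std::mutex> l(lock);
      stopped = true;
    }
    cond.notify_all();   // wake the waiter...
    t.join();            // ...and join it before cond is destroyed
  }
};

// Construct and destroy; without the notify+join above this would be the
// libc-2.27 deadlock shown in the backtrace.
inline bool eal_shutdown_clean() {
  { Eal e; }  // destructor runs the shutdown protocol
  return true;
}
```

The key ordering is that the waiter must be provably gone (joined) before any static destructor or __run_exit_handlers can reach pthread_cond_destroy.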
Fixes: https://tracker.ceph.com/issues/42890
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
[lint:tsc ] cypress/integration/orchestrator/01-hosts.e2e-spec.ts(29,13): error TS2339: Property 'clickHostTab' does not exist on type 'HostsPageHelper'.
Also change "host" to "hostname" to be more consistent.
Samuel Just [Thu, 14 Oct 2021 21:51:38 +0000 (14:51 -0700)]
crimson/os/seastore/segment_manager/block: open with dsync
67efc4 appears to be simply incorrect, I don't see any calls
to flush(), so we do need to open with dsync until we
implement a smarter flushing scheme.
Also, refactor open_device to remove mode param -- we always
pass the same value.
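The behavior in question can be sketched with plain POSIX calls; the path and helper names below are illustrative, not the seastore code itself.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Open a file for writing with O_DSYNC: each write() completes only after the
// data (though not necessarily all file metadata) has reached stable storage,
// so no separate flush() call is required. With the mode parameter removed,
// every caller gets the same permissions.
inline int open_device(const char* path) {
  return ::open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
}

inline bool dsync_write_ok(const char* path) {
  int fd = open_device(path);
  if (fd < 0) return false;
  const char buf[] = "payload";
  bool ok = ::write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf);
  ::close(fd);
  ::unlink(path);
  return ok;
}
```

The trade-off is latency per write versus correctness: without O_DSYNC and without explicit flushes, acknowledged writes could be lost on power failure.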
Chunsong Feng [Wed, 13 Oct 2021 03:55:08 +0000 (03:55 +0000)]
src/msg/dpdk: reserve funcs capacity to avoid reallocation
When a new element is added beyond the current vector capacity, the
vector reallocates its storage. A lambda that accesses the previous
address will then cause a segmentation fault. Therefore, reserve
sufficient capacity for funcs up front to avoid reallocation.
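The guarantee being relied on can be sketched in isolation: once capacity is reserved, push_back within that capacity never moves the storage, so previously taken addresses stay valid. This is a generic sketch, not the dpdk funcs code.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the hazard and the fix: capturing a pointer into a vector element
// and then growing the vector can invalidate that pointer on reallocation.
// Reserving capacity up front guarantees the storage is never moved while the
// size stays within the reserved capacity.
inline bool reserve_keeps_addresses_stable(std::size_t n) {
  std::vector<int> funcs;
  funcs.reserve(n);              // single allocation; no reallocation below
  funcs.push_back(0);
  const int* first = funcs.data();
  for (std::size_t i = 1; i < n; ++i)
    funcs.push_back(static_cast<int>(i));   // stays within reserved capacity
  return funcs.data() == first;  // addresses unchanged, captures remain safe
}
```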
Kefu Chai [Wed, 13 Oct 2021 23:51:31 +0000 (07:51 +0800)]
common/pick_address: refactor pick_addresses()
* consolidate the logic handling CEPH_PICK_ADDRESS_PREFER_IPV4 using
std::sort(). This might be overkill, but it helps to explain
what CEPH_PICK_ADDRESS_PREFER_IPV4 is for, and helps to dedup
the code that orders the addresses.
* let fill_in_one_address() return an optional<entity_addrvec_t>.
It is more readable this way.
* early return if the required address is not found, instead of
checking variables like ipv4_r
* rename fill_in_one_address() to get_one_address() to reflect
the change of the function's return value's type
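The two ideas above can be sketched together; the Addr/Family types and function names here are illustrative stand-ins, not the actual pick_address.cc types.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

enum class Family { IPV4, IPV6 };
struct Addr { Family family; std::string text; };

// Sketch of the PREFER_IPV4 consolidation: a stable sort moves IPv4
// addresses to the front while preserving the relative order within each
// family, instead of tracking per-family variables by hand.
inline void order_addresses(std::vector<Addr>& addrs, bool prefer_ipv4) {
  std::stable_sort(addrs.begin(), addrs.end(),
    [prefer_ipv4](const Addr& a, const Addr& b) {
      if (!prefer_ipv4) return false;          // keep original order
      return a.family == Family::IPV4 && b.family != Family::IPV4;
    });
}

// And the optional-return style: an empty optional means "required address
// not found", allowing an early return instead of checking flag variables.
inline std::optional<Addr> get_one_address(const std::vector<Addr>& addrs,
                                           Family want) {
  for (const auto& a : addrs)
    if (a.family == want) return a;
  return std::nullopt;
}
```

std::stable_sort matters here: plain std::sort could reorder addresses within the same family, changing which one is picked first.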
Laura Flores [Mon, 4 Oct 2021 04:41:10 +0000 (04:41 +0000)]
os/bluestore: update priorities and nicks of bluestore perf counters
These perf counters do not show up in telemetry unless they are set to a "useful" priority or higher. Fetching these counters in telemetry may help to diagnose problems with RocksDB / BlueFS prefetching / insufficient cache sizes.
1. The device preview disappears when going to the next step and coming back to the previous step.
2. Even when clearing the device preview, the Storage Capacity count and the drive group spec don't get cleared.
3. Expanding the cluster without selecting any devices gives a 400
error.
4. Renamed "Delete Host" to "Remove Host"
5. Generalizing most of the sub component code
Nizamudeen A [Sun, 4 Jul 2021 13:16:45 +0000 (18:46 +0530)]
mgr/dashboard: Cluster Creation Add Host Section and e2es
Add host section of the cluster creation workflow.
1. Fix bug in the modal where going forward one step on the wizard and coming back opens up the add host modal.
2. Rename Create Cluster to Expand Cluster as per the discussions
3. A skip confirmation modal to warn the user when they try to skip the
cluster creation
4. Adapted all the tests
5. Did some UI improvements like fixing and aligning the styles and
colors.
- Used routed modal for the host Addition form
- Renamed the Create to Add in Host Form
Avan Thakkar [Tue, 1 Jun 2021 12:55:15 +0000 (18:25 +0530)]
mgr/dashboard: Create Cluster Workflow welcome screen and e2e tests
A module option called CLUSTER_STATUS has two values: INSTALLED
and POST_INSTALLED. When CLUSTER_STATUS is INSTALLED, the
create-cluster wizard is shown after the initial login. After the cluster
creation is successful, this option is set to POST_INSTALLED.
Also adds the e2e tests for the Review Section.
Fixes: https://tracker.ceph.com/issues/50336
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
This solves the tracker: https://tracker.ceph.com/issues/51724
Basically it uses the 'generate_presigned_post()' boto3 API.
This was verified with an AMQP endpoint.
Zack Cerza [Tue, 12 Oct 2021 18:43:34 +0000 (12:43 -0600)]
Revert "qa: support isal ec test for aarch64"
This commit has been causing scheduled jobs to request e.g. aarch64
smithi machines, which don't exist. The dispatcher then tries to find
them forever, requiring the dispatcher to be killed and restarted. The
queue will sit idle until someone notices the problem.
The test is failing on deleting a host because the agent daemon is
present on that host. It's not possible to simply delete a host; we need
to drain it first and then delete it.
Fixes: https://tracker.ceph.com/issues/52764
Signed-off-by: Nizamudeen A <nia@redhat.com>
Xuehan Xu [Tue, 12 Oct 2021 01:55:21 +0000 (09:55 +0800)]
crimson/os/seastore: set ExtentPlacementManager::allocated_to before rolling segments
There are circumstances in which a transaction that is supposed to roll the current segment
is invalidated after it finishes writing and before it rolls the segment. If we don't set
ExtentPlacementManager::allocated_to in this situation, another transaction can try to write
to the old "allocated_to" position, which would cause an invalid write error.
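The ordering invariant can be sketched minimally; the SegmentState type and field names below are illustrative placeholders, not the seastore structures.

```cpp
#include <cstdint>

// Sketch of the ordering fix: record how far the (possibly soon-invalidated)
// transaction actually wrote *before* rolling the segment, so that a later
// transaction can never re-target the stale allocated_to position.
struct SegmentState {
  std::uint64_t allocated_to = 0;
  bool rolled = false;
};

inline void finish_write_and_roll(SegmentState& s, std::uint64_t write_end) {
  // Even if the transaction is invalidated after this point, allocated_to
  // already reflects the bytes physically written to the segment.
  s.allocated_to = write_end;
  s.rolled = true;  // only then roll to the next segment
}
```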
Joseph Sawaya [Fri, 3 Sep 2021 17:30:43 +0000 (13:30 -0400)]
mgr/rook: apply mds using placement spec and osd_pool_default_size
This commit changes the apply_mds command in the rook orchestrator
to support some placement specs and also sets the replica size according
to the osd_pool_default_size ceph option.
This commit also adds `orch apply mds` to the QA to test if the command
runs.
crimson/os/seastore: adjust segment cleaner's space counter calculation
Until now, the segment cleaner calculated available/used space assuming that only the journal
does all the writes, which is no longer true. This commit makes the segment cleaner track
all segment managers' empty/open segments and calculate the various space usages
based on that information.
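The accounting change can be sketched as an aggregation over per-manager counters; the struct and field names here are illustrative, not the SegmentCleaner interface.

```cpp
#include <cstdint>
#include <vector>

// Sketch of per-segment-manager accounting: available space is derived from
// every manager's empty segments, rather than from the journal's write
// position alone.
struct SegmentManagerInfo {
  std::uint64_t segment_size;
  std::uint64_t empty_segments;
  std::uint64_t open_segments;
};

inline std::uint64_t total_available_bytes(
    const std::vector<SegmentManagerInfo>& managers) {
  std::uint64_t avail = 0;
  for (const auto& m : managers)
    avail += m.segment_size * m.empty_segments;  // sum across all managers
  return avail;
}
```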
crimson/os/seastore: enable SegmentCleaner to hold multiple segmented devices
For now, all segmented devices are treated the same, as SEGMENTED. In the future,
there may be different kinds of segmented devices, like SEGMENTED_NVME, SEGMENTED_SSD,
and even SEGMENTED_SATA. We plan to use a dedicated segment cleaner for each kind of
those devices, and if there are multiple devices of the same kind, they share the same
segment cleaner.
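The planned sharing scheme can be sketched as a registry keyed by device kind; the enumerators mirror the names above, while CleanerRegistry and the empty Cleaner type are placeholders, not the actual SegmentCleaner API.

```cpp
#include <map>
#include <memory>

// Sketch: one segment cleaner per device *kind*; all devices of the same
// kind share the single cleaner registered for that kind.
enum class DeviceKind { SEGMENTED_NVME, SEGMENTED_SSD, SEGMENTED_SATA };
struct Cleaner {};

class CleanerRegistry {
  std::map<DeviceKind, std::shared_ptr<Cleaner>> cleaners;
public:
  std::shared_ptr<Cleaner> get(DeviceKind kind) {
    auto& c = cleaners[kind];
    if (!c) c = std::make_shared<Cleaner>();  // lazily create per kind
    return c;   // two devices of the same kind receive the same cleaner
  }
};
```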