tools/ceph_objectstore_tool: fix 'dup' unable to duplicate meta PG
We recently upgraded a Jewel cluster to Luminous. The upgrade itself
went well, but our subsequent attempt to convert all FileStore OSDs
into BlueStore ones offline failed. ceph_objectstore_tool keeps
complaining:
--------------------------------------------------------------------
dup from filestore: /var/lib/ceph/osd/ceph-20.old
to bluestore: /var/lib/ceph/osd/ceph-20
fsid d444b253-337d-4d15-9d63-86ae134ec9ac
65 collections
1/65 meta
cannot get bit count for collection meta: (61) No data available
--------------------------------------------------------------------
The root cause is that, for FileStore, Luminous rewrites the pg "bits"
as a file attribute on load if that attribute is missing. But since the
meta pg is never loaded (we skip it during OSD::load_pgs()), the
attribute never gets written for it. This breaks the dup method of
ceph_objectstore_tool, which always expects to find such an attribute
in the underlying store.
Fix this by skipping the load of the "bits" attribute in dup when the
underlying object store is FileStore.
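The workaround can be sketched roughly as follows. This is a minimal
illustration, not the code from the patch: FakeStore, StoreType,
kNoData and bits_for_dup are hypothetical stand-ins for the real
ObjectStore interfaces, and falling back to 0 bits is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the real ObjectStore API.
enum class StoreType { FileStore, BlueStore };

const int kNoData = -61;  // -ENODATA on Linux, matching "(61) No data available"

struct FakeStore {
  StoreType type;
  std::map<std::string, int> bits_attr;  // collection -> persisted "bits"

  // Returns the persisted bit count, or kNoData if it was never written.
  int collection_bits(const std::string& cid) const {
    auto it = bits_attr.find(cid);
    return it == bits_attr.end() ? kNoData : it->second;
  }
};

// Picks the bit count dup should use for one collection.
// Returns 0 on success, or a negative error code.
int bits_for_dup(const FakeStore& src, const std::string& cid, int* bits) {
  assert(bits != nullptr);
  if (src.type == StoreType::FileStore) {
    // The attribute may be missing (e.g. for meta), so do not try to load it.
    // Using 0 here is an assumption of this sketch.
    *bits = 0;
    return 0;
  }
  int r = src.collection_bits(cid);
  if (r < 0)
    return r;
  *bits = r;
  return 0;
}
```

With a FileStore source the lookup is never attempted, so a missing
attribute on meta no longer aborts the copy. Other store types still
report the error.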
Add the Luminous release date and drop the links for dev releases
(since they've been merged into 12.2.0). Also rearrange the table so
that newer releases come first, on the left.
xie xingguo [Thu, 31 Aug 2017 03:42:37 +0000 (11:42 +0800)]
os/bluestore: don't re-initialize csum-setting for existing blobs
The global checksum setting may change, e.g., from NONE to CRC32,
which can cause improper re-initialization of the csum-settings of
existing blobs (e.g., partial write/overwrite may turn out to shrink
'csum_data').
We could develop some complicated solutions but for now let's not
bother since the above scenario is rare.
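The intent can be illustrated with a small sketch. The Blob struct,
the CsumType enum and maybe_init_csum are hypothetical names invented
here, and the real BlueStore logic is more involved. The point is only
that the global setting is applied when a blob is created, never
afterwards.

```cpp
#include <cassert>  // used by the checks in the usage note

// Hypothetical subset of checksum types, for illustration only.
enum class CsumType { None, Crc32 };

// Hypothetical, heavily simplified blob.
struct Blob {
  bool is_new;    // true only while the blob is being created
  CsumType csum;  // checksum type the blob was written with
};

// Apply the global checksum setting to new blobs only.
void maybe_init_csum(Blob& b, CsumType global_setting) {
  if (!b.is_new)
    return;  // existing blob: keep the setting it was written with
  b.csum = global_setting;
}
```

For example, a blob written with CsumType::None keeps that type after
the global setting changes to CsumType::Crc32, while a newly created
blob picks up Crc32.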
Sage Weil [Thu, 31 Aug 2017 20:43:39 +0000 (16:43 -0400)]
os/bluestore: separate finisher for deferred_try_submit
Reusing finishers[0], which is used for completions back into the OSD,
is deadlock-prone: the OSD code might block trying to submit new IO or
while waiting for some other bluestore work to complete.
Fixes: http://tracker.ceph.com/issues/21207
Signed-off-by: Sage Weil <sage@redhat.com>
amitkuma [Mon, 7 Aug 2017 10:59:01 +0000 (16:29 +0530)]
rgw: Initializes uninitialized members of rgw
Fixes the coverity issues:
** 1352181 Uninitialized scalar field
2. uninit_member: Non-static class member field fh_hk.bucket is
not initialized in this constructor nor in any functions that it calls.
CID 1352181 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member field fh_hk.object is
not initialized in this constructor nor in any functions that it calls.
** 1353424 Uninitialized scalar field
CID 1353424 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
5. uninit_member: Non-static class member watch_handle is not initialized
in this constructor nor in any functions that it calls.
** 1355240 Uninitialized scalar field
CID 1355240 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member index_type is not initialized
in this constructor nor in any functions that it calls.
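One common way to silence UNINIT_CTOR reports like these is to give
each scalar member a default initializer. The sketch below uses
hypothetical structs that merely reuse the field names from the
reports. The actual rgw classes differ, and the patch may initialize
the members in constructor initializer lists instead.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <cstdint>

// Hypothetical structs; only the member names come from the reports.
struct HashKey {
  uint64_t bucket = 0;  // was uninitialized
  uint64_t object = 0;  // was uninitialized
};

struct FileHandle {
  HashKey fh_hk;  // members are now zero-initialized by default
  FileHandle() {}
};

struct Watcher {
  uint64_t watch_handle = 0;  // was uninitialized
  Watcher() {}
};

struct BucketInfo {
  int index_type = 0;  // was uninitialized; 0 is an assumed default
  BucketInfo() {}
};
```

A default-constructed FileHandle, Watcher or BucketInfo now starts
from zero values instead of indeterminate ones, so a check such as
assert(FileHandle().fh_hk.bucket == 0) holds.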
tools/rados: support for high precision time using stat2
This commit introduces a `stat2` option for the rados CLI. It is
similar to `stat`, except that it returns the mtime in high precision.
This is useful for inspecting objects when, for example, the
application used the related librados API calls (as radosgw does with
`mtime2`).
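To illustrate what "high precision" adds, the sketch below formats the
same timestamp once with whole seconds and once with nanoseconds.
The helper names are hypothetical, and the strings are not the output
format of the rados tool. They only show the information that is lost
when an mtime is truncated to seconds.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <time.h>

// Hypothetical helper: whole seconds only.
std::string format_mtime_seconds(time_t t) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%lld", static_cast<long long>(t));
  return buf;
}

// Hypothetical helper: seconds plus a zero-padded nanosecond part.
std::string format_mtime_hires(const struct timespec& ts) {
  assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%lld.%09ld",
                static_cast<long long>(ts.tv_sec),
                static_cast<long>(ts.tv_nsec));
  return buf;
}
```

Two writes that land within the same second produce identical output
from format_mtime_seconds but distinct output from format_mtime_hires.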
Zac Medico [Thu, 31 Aug 2017 03:59:32 +0000 (20:59 -0700)]
interval_set: optimize intersection_of
Iterate over all elements of the smaller set, and use find_inc to
locate elements from the larger set in logarithmic time. This greatly
improves performance when one set is much larger than the other:
[Plot: performance ratio (y axis, 0.8 to 2) versus set size ratio
(x axis, 0 to 1). The curve starts near 2 at the smallest set size
ratios, falls as the ratio grows, and flattens out slightly below 1.]
The above plot compares performance of the new intersection_size_asym
function to the existing intersection_of function. The performance of
intersection_size_asym gets worse as the set size ratio approaches 1.
For set size ratios where the performance ratio is greater than 1, the
performance of intersection_size_asym is superior. Therefore, this
patch only uses intersection_size_asym when the set size ratio is less
than or equal to 0.1 (code uses the reciprocal which is 10).
The plot was generated using benchmark results produced by the
following program:
#include <iostream>
#include <sys/timeb.h>
#include "include/interval_set.h"  // Ceph's interval_set (src/include)

int main()
{
  const int interval_count = 100000;
  const int interval_distance = 4;
  const int interval_size = 2;
  const int sample_count = 8;
  const int max_offset = interval_count * interval_distance;
  interval_set<int> a, b, intersection;

  // a is the large set: one interval every interval_distance units.
  for (int i = 0; i < max_offset; i+=interval_distance) {
    a.insert(i, interval_size);
  }

  // For each m, b holds every m-th interval of a (size ratio 1/m).
  for (int m = 1; m < 100; m++) {
    float ratio = 1 / float(m);

    for (int i = 0; i < max_offset; i+=interval_distance*m) {
      b.insert(i, interval_size);
    }

    // Accumulate the elapsed milliseconds over sample_count runs.
    struct timeb start, end;
    int ms = 0;
    for (int i = 0; i < sample_count; i++) {
      ftime(&start);
      intersection.intersection_of(a, b);
      ftime(&end);
      ms += (int) (1000.0 * (end.time - start.time)
        + (end.millitm - start.millitm));
      intersection.clear();
    }
    b.clear();

    std::cout << ratio << "\t" << ms << std::endl << std::flush;
  }
}
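The asymmetric strategy described in this commit (walk the smaller
set and look each interval up in the larger one) can be sketched as
follows. This is a simplified illustration over a plain std::map, not
the patched interval_set code: IntervalMap, intersect_asym and
prefer_asym are hypothetical names, and find_inc here only mimics the
lookup behaviour described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// start -> length; intervals are non-overlapping and non-adjacent.
using IntervalMap = std::map<int, int>;

// First interval that contains `off` or starts after it, in O(log n).
IntervalMap::const_iterator find_inc(const IntervalMap& m, int off) {
  auto it = m.lower_bound(off);  // first interval with start >= off
  if (it != m.begin()) {
    auto prev = std::prev(it);
    if (prev->first + prev->second > off)
      return prev;  // the preceding interval covers off
  }
  return it;
}

// Intersect by iterating `small` and searching `large`.
IntervalMap intersect_asym(const IntervalMap& small, const IntervalMap& large) {
  IntervalMap out;
  for (const auto& s : small) {
    assert(s.second > 0);
    const int s_begin = s.first;
    const int s_end = s.first + s.second;
    for (auto it = find_inc(large, s_begin);
         it != large.end() && it->first < s_end; ++it) {
      const int b = std::max(s_begin, it->first);
      const int e = std::min(s_end, it->first + it->second);
      if (b < e)
        out.emplace_hint(out.end(), b, e - b);  // results arrive in order
    }
  }
  return out;
}

// Threshold from the commit message: size ratio <= 0.1, i.e. factor 10.
bool prefer_asym(std::size_t small_size, std::size_t large_size) {
  return large_size >= 10 * small_size;
}
```

The cost is roughly one logarithmic lookup per interval of the smaller
set plus the overlapping intervals visited, which is why it only pays
off when the sets differ greatly in size.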
Zac Medico [Sun, 27 Aug 2017 12:25:01 +0000 (05:25 -0700)]
interval_set: optimize intersect_of for identical spans
Optimize comparisons for identical spans of intervals.
When this patch is combined with the previous map insert
optimization, a benchmark using 400000 identical
intervals shows a 7 times performance improvement
over the unpatched code.
Use the std::map insert method with hint iterator to optimize
inserts. This increases performance more than 3.5 times for
large numbers of intervals. This will help performance
especially in the PGPool::update method, where profiling data
has shown that intersection operations are a hot spot. The
following benchmark data is for 400000 intervals:
[Plot: performance ratio (y axis, 1 to 4) versus set size ratio
(x axis, 0 to 0.9). The curve starts near 1 at the smallest set size
ratios, climbs as the ratio grows, and levels off between 3.5 and 4.]
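The hinted-insert technique itself can be shown in isolation. The
sketch below is not the patched code. IntervalMap and build_sorted are
hypothetical names, and it assumes the intervals arrive sorted by
start offset, so each new key belongs immediately before end().

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// start -> length
using IntervalMap = std::map<int, int>;

// Build a map from intervals that are already sorted by start offset.
IntervalMap build_sorted(const std::vector<std::pair<int, int>>& sorted) {
  IntervalMap out;
  for (const auto& iv : sorted) {
    // Precondition of this sketch: strictly increasing start offsets.
    assert(out.empty() || out.rbegin()->first < iv.first);
    // With a correct hint, std::map::insert runs in amortized constant
    // time instead of searching the tree from the root.
    out.insert(out.end(), IntervalMap::value_type(iv.first, iv.second));
  }
  return out;
}
```

If a hint turns out to be wrong the insert is still correct. It only
falls back to the usual logarithmic cost.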
The above chart was generated using benchmark results
from the following program:
Sage Weil [Tue, 29 Aug 2017 04:01:19 +0000 (00:01 -0400)]
mon: set purged_snapdirs OSDMap flag once snapsets have all converted
This makes it easier to test whether the upgrade + conversion has
completed. In particular, mimic+ will be able to simply test for this
flag without waiting for complete PG stats to know whether it is safe to
upgrade beyond luminous.