Adam Kupczyk [Wed, 2 Mar 2016 11:31:01 +0000 (12:31 +0100)]
[MON] Fixed calculation of %USED. It now shows (space used by all replicas)/(raw space available on OSDs). Previously it showed (size of pool)/(raw space available on OSDs).
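A minimal sketch of the two formulas, for illustration only. The function names are hypothetical, and "raw space available on OSDs" is assumed here to be a single byte count passed in by the caller; the monitor code itself is not reproduced.

```python
def percent_used_old(pool_size_bytes, raw_bytes_available):
    """Old behaviour: logical pool size over raw OSD space."""
    if raw_bytes_available <= 0:
        return 0.0
    return 100.0 * pool_size_bytes / raw_bytes_available


def percent_used_new(bytes_used_all_replicas, raw_bytes_available):
    """New behaviour: space consumed by every replica over raw OSD space."""
    if raw_bytes_available <= 0:
        return 0.0
    return 100.0 * bytes_used_all_replicas / raw_bytes_available


# Hypothetical pool: 100 units of data, 3 replicas, 1000 units of raw space.
pool_size = 100
replicas = 3
raw_space = 1000
old_value = percent_used_old(pool_size, raw_space)
new_value = percent_used_new(pool_size * replicas, raw_space)
```

With these made-up numbers the old formula under-reports usage by the replication factor, which is the discrepancy the fix addresses.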
src/rgw/rgw_bucket.cc
1. Do not use the rgw_user structure, and remove the tenant parameter, as described below
2. user_id is not used, so just remove the line
3. Instead of system_obj_set_attr, use the set_attr method
Backport Change:
We do not use the rgw_user structure, and we remove the `tenant` parameter,
because the multi-tenant feature was never introduced in hammer.
The rgw multi-tenant feature was introduced in pr#6784 (https://github.com/ceph/ceph/pull/6784)
and is supported in v10.0.2 and later.
ceph.spec.in: disable lttng and babeltrace explicitly
Before this change, we did not package the tracepoint probe shared
libraries on rhel7, but the "configure" script enables them if lttng is
detected, and rpm complains about installed but unpackaged files. As
EPEL-7 now includes lttng-ust-devel and libbabeltrace-devel, we'd better
BuildRequire them and build with them unless disabled otherwise. So in
this change:
* make "lttng" an rpm build option enabled by default
* BuildRequire lttng-ust-devel and libbabeltrace-devel if the
"lttng" option is enabled
* pass --without-lttng --without-babeltrace to "configure" if the "lttng" option is disabled
hammer: monclient: avoid key renew storm on clock skew
Refreshing rotating keys too often is a symptom of a clock skew. Try to
detect it and avoid causing extra problems:
* MonClient::_check_auth_rotating:
- detect and report premature keys expiration due to a time skew
- rate limit refreshing the keys to avoid excessive RAM and CPU usage
(both by OSD in question and monitors which have to process a lot
of auth messages)
* MonClient::wait_auth_rotating: wait for valid (not expired) keys
* OSD::init(): bail out after 10 attempts to obtain the rotating keys
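The rate limiting and the bounded retry described above can be sketched as follows. This is an illustration, not the MonClient code: `RefreshRateLimiter`, `obtain_rotating_keys`, and the `fetch_keys` callback are hypothetical names, and only the limit of 10 attempts comes from the commit message.

```python
import time


class RefreshRateLimiter:
    """Allow a key refresh at most once every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_refresh = None

    def allow(self):
        now = self.clock()
        if (self.last_refresh is not None
                and now - self.last_refresh < self.min_interval):
            return False  # too soon: skip rather than flood the monitors
        self.last_refresh = now
        return True


def obtain_rotating_keys(fetch_keys, max_attempts=10):
    """Call `fetch_keys` until it reports valid keys; give up after `max_attempts`."""
    for _ in range(max_attempts):
        if fetch_keys():
            return True
    return False
```

A caller would treat a `False` result from `obtain_rotating_keys` as a reason to abort startup instead of retrying forever.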
gmt_hitset is enabled by default in the ctor of pg_pool_t. This
is intentional, because we want to remove this setting and make
gmt_hitset=true the default in the future. But it forces us to
disable it explicitly when preparing a new pool if any OSD does
not support gmt hitset.
Kefu Chai [Fri, 5 Jun 2015 13:06:48 +0000 (21:06 +0800)]
osd: use GMT time for the object name of hitsets
* bump the encoding version of pg_hit_set_info_t to 2, so we can
tell if the corresponding hit_set is named using localtime or
GMT
* bump the encoding version of pg_pool_t to 20, so we can know
if a pool is using GMT to name the hit_set archive or not, and
we can tell whether the current cluster allows OSDs that do not
support GMT mode.
* add an option named `osd_pool_use_gmt_hitset`. If enabled,
the cluster will try to use GMT mode when creating a new pool
if all the up OSDs support GMT mode. If any of the
pools in the cluster is using GMT mode, then only OSDs
supporting GMT mode are allowed to join the cluster.
Conflicts:
src/include/ceph_features.h
src/osd/ReplicatedPG.cc
src/osd/osd_types.cc
src/osd/osd_types.h
fill pg_pool_t with default settings in master branch.
test/bufferlist: do not expect !is_page_aligned() after unaligned rebuild
If the size of a bufferlist is page aligned, we allocate a page-aligned
memory chunk for it when rebuild() is called. Otherwise we just call
plain new() to allocate a new memory chunk for holding the contiguous
buffer. But we should not expect the `new` allocator to always return
unaligned memory chunks. Instead, it *could* return a page-aligned
memory chunk whenever the allocator sees fit. So the
`EXPECT_FALSE(bl.is_page_aligned())` after the `rebuild()` call is
removed.
Sage Weil [Tue, 6 Oct 2015 18:35:35 +0000 (14:35 -0400)]
osd/PG: fix generate_past_intervals
We may be only calculating older past intervals and have a valid
history.same_interval_since value, in which case the local
same_interval_since value will end at the newest old interval we had to
generate.
mon: Monitor: get rid of weighted clock skew reports
By weighting the reports we were making it really hard to get rid of a
clock skew warning once the cause had been fixed.
Instead, as soon as we get a clean bill of health, let's run a new round
as soon as possible and ascertain whether that was a transient fix or
for realsies. That should be better than the alternative of waiting for
an hour or something (for a large enough skew) for the warning to go
away - and with it, the admin's sanity ("WHAT AM I DOING WRONG???").
When in the presence of a clock skew, adjust the checking interval
according to how many rounds have gone by since the last clean check.
If a skew is detected, instead of waiting an additional 300 seconds we
will perform the check more frequently, gradually backing off the
frequency if the skew is still in place (up to a maximum of
'mon_timecheck_interval', default: 300s). This will help with transient
skews.
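One way to picture the back-off is the sketch below. It assumes a linear growth of the delay with the number of rounds since the last clean check, and a hypothetical 30-second base interval; only the 300-second cap ('mon_timecheck_interval') is taken from the text above.

```python
def next_timecheck_delay(rounds_since_clean,
                         skew_interval=30.0,   # assumed base delay while skewed
                         max_interval=300.0):  # mon_timecheck_interval default
    """Return the delay in seconds before the next time check."""
    if rounds_since_clean <= 0:
        # No skew observed since the last clean round: use the normal interval.
        return max_interval
    # Check often right after a skew is seen, then back off up to the cap.
    return min(skew_interval * rounds_since_clean, max_interval)


delays = [next_timecheck_delay(n) for n in (1, 2, 5, 20)]
```

Under these assumptions the first check after a skew happens after 30 seconds, and the delay grows each round until it reaches the 300-second maximum.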
Conflicts:
src/common/config_opts.h
Merge the change line.
src/mon/Monitor.h
handle_timecheck_leader(MonOpRequestRef op) was replaced with handle_timecheck_leader(MTimeCheck *m);
likewise for handle_timecheck_peon and handle_timecheck.
Dan Mick [Thu, 26 Nov 2015 03:20:51 +0000 (19:20 -0800)]
test/librados/test.cc: clean up EC pools' crush rules too
SetUp was adding an erasure-coded pool, which automatically adds
a new crush rule named after the pool, but only removing the
pool. Remove the crush rule as well.
Sage Weil [Thu, 10 Mar 2016 13:28:59 +0000 (08:28 -0500)]
osd/MonCommand: add/fix up 'osd [test-]reweight-by-{pg,utilization}'
- show before/after pg placement stats
- add test- variants that don't do anything
- only allow --no-increasing on the -utilization versions (where
it won't conflict with the optional pool list and confuse the
arg parsing)
Dan van der Ster [Fri, 26 Feb 2016 20:52:41 +0000 (21:52 +0100)]
osd: add sure and no-increasing options to reweight-by-*
Add a --no-increasing option to reweight-by-* which can be used to only decrease
OSD weights without increasing any. This is useful for example if you need to
urgently lower the weight of nearly full OSDs.
Also add a --yes-i-really-mean-it confirmation to reweight-by-*.
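The effect of --no-increasing can be illustrated with a small sketch. The function and the dictionary layout are hypothetical and do not mirror the monitor's implementation; the sketch only shows the rule that a weight may go down but never up.

```python
def apply_reweights(current, proposed, no_increasing=False):
    """Return the new OSD weights, optionally refusing any increase."""
    result = dict(current)
    for osd, new_weight in proposed.items():
        old_weight = current[osd]
        if no_increasing and new_weight > old_weight:
            continue  # keep the old weight; only decreases are applied
        result[osd] = new_weight
    return result


# Hypothetical weights: osd 0 is nearly full, osd 1 is underused.
current = {0: 1.0, 1: 0.8}
proposed = {0: 0.9, 1: 0.95}
restricted = apply_reweights(current, proposed, no_increasing=True)
unrestricted = apply_reweights(current, proposed)
```

Here the restricted run lowers osd 0 and leaves osd 1 untouched, while the unrestricted run applies both changes.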
Jason Dillaman [Wed, 9 Mar 2016 23:00:04 +0000 (18:00 -0500)]
librbd: complete cache reads on cache's dedicated thread
If a snapshot is created out-of-band, the next IO will result in the
cache being flushed. If pending writeback data performs a copy-on-write,
the read from the parent will be blocked.
Sage Weil [Mon, 16 Nov 2015 16:32:34 +0000 (11:32 -0500)]
osdc/Objecter: call notify completion only once
If we race with a reconnect we could get a second notify message
before the notify linger op is torn down. Ensure we only ever
call the notify completion once to prevent a segfault.
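A generic way to guarantee a completion runs only once is to clear it before invoking it. The sketch below shows that pattern with a hypothetical `NotifyOp` class; it is not the Objecter code, and it ignores locking, which a real multi-threaded implementation would need.

```python
class NotifyOp:
    """Holds a completion callback that must fire at most once."""

    def __init__(self, on_complete):
        self._on_complete = on_complete

    def handle_notify(self, result):
        # Take the callback and clear the slot before calling it, so a
        # second notify message finds nothing left to call.
        callback, self._on_complete = self._on_complete, None
        if callback is not None:
            callback(result)


received = []
op = NotifyOp(received.append)
op.handle_notify("first")
op.handle_notify("second")  # duplicate after a reconnect race: ignored
```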
Brad Hubbard [Fri, 4 Mar 2016 03:06:47 +0000 (13:06 +1000)]
tests: Add TEST_no_segfault_for_bad_keyring to test/mon/misc.sh
94da46b6e31cac206cb32fc5bd3159209ee25e8c adds
TEST_no_segfault_for_bad_keyring which requires changes to run
in hammer since test/mon/misc.sh is not written to run multiple tests in
succession in the hammer version.
Dunrong Huang [Wed, 25 Nov 2015 11:03:03 +0000 (19:03 +0800)]
auth: fix a crash issue due to CryptoHandler::create() failed
In this case (e.g. the user passes a wrong key), attempts to call CryptoKey.ckh will lead to a segfault.
This patch fixes a crash like the following:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed10e700 (LWP 25051)]
0x00007ffff59896c6 in CryptoKey::encrypt (this=0x7fffed10d4f0, cct=0x555555829c30, in=..., out=..., error=0x7fffed10d440) at auth/cephx/../Crypto.h:110
110 return ckh->encrypt(in, out, error);
(gdb) bt
at auth/cephx/../Crypto.h:110
at auth/cephx/CephxProtocol.h:464
Piotr Dałek [Thu, 3 Mar 2016 10:30:53 +0000 (11:30 +0100)]
common/obj_bencher.cc: make verify error fatal
When run without "--no-verify", all verification errors are noted,
but they are not forwarded/reported anywhere else but to cerr, which
will cause automated testing to ignore them. Make seq_read_bench and
rand_read_bench return -EIO on any verification error which will,
in turn, return it back to caller.
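The idea of turning a logged verification error into a return code can be sketched as below. `verify_objects` and its arguments are hypothetical stand-ins for the bench loop; only the use of -EIO as the failure code comes from the commit message.

```python
import errno
import sys


def verify_objects(read_back, expected):
    """Return 0 if every object matches, -errno.EIO if any object differs."""
    errors = 0
    for name, want in expected.items():
        if read_back.get(name) != want:
            errors += 1
            # Still report to stderr, but no longer rely on it alone.
            print(f"{name} is not correct!", file=sys.stderr)
    return -errno.EIO if errors else 0
```

A test harness can then check the return value instead of scraping stderr.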
hammer: tools: fix race condition in seq/rand bench (part 1)
src/common/obj_bencher.cc:601: the lock should be taken before calling completion_ret,
not after. Also note that if r < 0 the lock will be unlocked twice in a row.
As a result rados bench seq fails with assertion in Mutex::Unlock().
Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
Signed-off-by: Alexey Sheplyakov <asheplyakov@mirantis.com>
(cherry picked from commit 0c8faf7c9982c564002771c3a41362a833ace9bb)
Conflicts:
src/common/obj_bencher.cc
src/common/obj_bencher.h
Pick only the lock related part to unbreak seq bench. The failure due
to the missing (or wrong sized) objects can be easily worked around, and
the changes required to fix this problem are way too intrusive for hammer.
client: use thread local data to track fuse request
When handling an operation, libcephfs code may want to access the fuse
request for extra information. By tracking the fuse request in
thread-local data, we can avoid adding an extra parameter to the
Client::ll_foo functions.
Danny Al-Gaaf [Wed, 12 Aug 2015 16:38:38 +0000 (18:38 +0200)]
client/Client.cc: fix realloc memory leak
Fix handling of realloc. If realloc() fails it returns NULL and leaves
the original block allocated, so assigning the return value directly to
the pointer without checking the result loses the only reference to
that block and leaks it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit 4f98dab99c35663de89a06e2dfdbd874f56aed41)
Nathan Cutler [Fri, 26 Feb 2016 17:30:49 +0000 (18:30 +0100)]
packaging: lsb_release build and runtime dependency
The lsb_release executable is being run in multiple places, not least in
src/common/util.cc, which calls it via shell in the collect_sys_info() code
path.
This patch addresses this issue on SUSE- and Debian-derivatives, as well
as reinstating the dependency for RHEL/Fedora after it was dropped in 15600572265bed397fbd80bdd2b7d83a0e9bd918.
Conflicts:
ceph.spec.in
The jewel specfile has diverged considerably from hammer:
systemd, package split, etc. This is more of a hand backport
than a cherry-pick.
Loic Dachary [Mon, 1 Feb 2016 12:32:13 +0000 (19:32 +0700)]
global: do not start two daemons with a single pid-file (part 2)
Fixes the following bugs:
* the fd is open(O_WRONLY) and cannot be read from, safe_read
always fails and never removes the pid file.
* pidfile_open(g_conf) is close(STDOUT_FILENO) and there is a risk that
pidfile_open gets STDOUT_FILENO only to have it closed and redirected
to /dev/null.
* Before writing the file, ftruncate it so that overwriting a file
containing the pid 1234 with the pid 89 does not end up being
a file with 8934.
* Before reading the file, lseek back to offset 0 otherwise it
will read nothing.
* tests_pidfile was missing an argument when failing
TEST_without_pidfile and killed all processes with ceph in their name,
leading to chaos and no useful error message.
* lstat(fd) cannot possibly return a result different from the one
obtained right after the file was open, stat(path) must be used
instead.
In addition to fixing the bugs above, refactor the pidfile.cc
implementation to:
* be systematic about error reporting (using cerr when removing
the pidfile because derr is not available at that point, and derr
when creating the pidfile).
* replace pidfile_open / pidfile_write with just pidfile_write since
there never is a case when they are not used together.
More test cases are added to test_pidfile to verify the bugs above are
fixed.
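The truncate-before-write and seek-before-read points from the bug list above can be reproduced with plain file descriptors. The sketch below uses Python's `os` wrappers around the same system calls; the helper names are hypothetical and the code is not taken from pidfile.cc.

```python
import os
import tempfile


def write_pid(fd, pid):
    """Replace the file contents with `pid`."""
    os.ftruncate(fd, 0)           # drop any longer, stale pid first
    os.lseek(fd, 0, os.SEEK_SET)  # then write from the start
    os.write(fd, str(pid).encode())


def write_pid_without_truncate(fd, pid):
    """Buggy variant: overwrites in place and keeps trailing bytes."""
    os.lseek(fd, 0, os.SEEK_SET)
    os.write(fd, str(pid).encode())


def read_pid(fd):
    """Read the pid back; the offset must be reset or nothing is read."""
    os.lseek(fd, 0, os.SEEK_SET)
    return int(os.read(fd, 32).decode())


fd, path = tempfile.mkstemp()  # opened read-write, unlike O_WRONLY
try:
    write_pid(fd, 1234)
    write_pid_without_truncate(fd, 89)
    buggy = read_pid(fd)       # stale "34" is still in the file
    write_pid(fd, 89)
    fixed = read_pid(fd)
finally:
    os.close(fd)
    os.unlink(path)
```

The descriptor is opened read-write here because, as the first bullet notes, a write-only descriptor cannot be read back at all.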
Conflicts:
src/global/global_init.cc
- The `flag` argument of `global_init_prefork()` is not used, so
it was removed in master, but the cleanup commit was not
cherry-picked to hammer, hence the conflict. We can just keep it
around in hammer to minimize the code churn, although it may
stand in the way of future backports.
- s/nullptr/NULL/ as hammer does not support C++11.