The prior code caused binary omap values to be discarded. This fixes
them to use the same model as the xattr iterator, and correctly return
binary data as Python strings, e.g.:
'object_prefix': '\x15\x00\x00\x00rbd_data.449d2ae8944a'
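The example value above looks like a length-prefixed string: the first four bytes (0x15 = 21) match the length of the text that follows. A minimal sketch of decoding such a value, assuming a little-endian 32-bit length prefix; the helper name decode_length_prefixed is hypothetical and is not part of the rados bindings:

```python
import struct


def decode_length_prefixed(raw: bytes) -> str:
    """Decode a value laid out as <u32 little-endian length><payload>.

    Assumption: the omap value uses this layout, as the example suggests.
    """
    (length,) = struct.unpack_from("<I", raw, 0)
    payload = raw[4:4 + length]
    if len(payload) != length:
        raise ValueError("value is shorter than its length prefix claims")
    return payload.decode("ascii")


value = b"\x15\x00\x00\x00rbd_data.449d2ae8944a"
print(decode_length_prefixed(value))  # prints rbd_data.449d2ae8944a
```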
Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>
Fixes: #12958
Head objects are mutable, so removing them can race with an object removal
followed by a later recreation, and we might end up cleaning them up when we
don't need to.
The KeyServer class has a public method get_auth() that returns a boolean
value. This value is being checked here - fix the conditional so it triggers
when get_auth() returns false.
Only track read-after-write and write-after-write IO dependencies
via the associated write completions. All IO events after a write
completion are considered to be dependent and can be pruned down
to at most the number of concurrent IOs. This reduces the prep
time for a simple 'rbd bench-write' from over 4 hrs down to seconds.
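One possible reading of the pruning described above, as a rough sketch rather than the actual rbd-replay-prep code: keep only the most recent write completions, bounded by the number of concurrent IOs, and make each new IO depend on that bounded set. The class and method names are hypothetical.

```python
from collections import deque


class DependencyTracker:
    """Track IO dependencies only through recent write completions."""

    def __init__(self, max_concurrent_ios):
        # A deque with maxlen silently drops the oldest entry when full,
        # which keeps the dependency set bounded.
        self.recent_write_completions = deque(maxlen=max_concurrent_ios)

    def on_write_completion(self, completion_id):
        self.recent_write_completions.append(completion_id)

    def dependencies_for_new_io(self):
        # Every IO issued after these completions is treated as dependent.
        return list(self.recent_write_completions)


tracker = DependencyTracker(max_concurrent_ios=2)
for completion_id in (1, 2, 3):
    tracker.on_write_completion(completion_id)
print(tracker.dependencies_for_new_io())  # prints [2, 3]
```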
Fixes: #13378, #13384
Backport: hammer
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
John Spray [Wed, 23 Sep 2015 11:58:46 +0000 (12:58 +0100)]
qa: avoid using sudo in fsstress
This test required root in order to copy its built
binary into /usr (presumably to avoid rebuilding it).
That's not really a good thing anyway because there's
no guarantee that a binary in that path is the binary
we wanted, so just run the thing straight out of /tmp. The
build is really quick anyway.
Signed-off-by: John Spray <john.spray@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Sage Weil [Wed, 7 Oct 2015 15:49:01 +0000 (11:49 -0400)]
os/FileStore: kludge sloppy hammer temp objects into temp collection
When we are running with a mixed hammer cluster, hammer primaries
will generate temp object names that are sloppy. Make sure we still
put them into the temp collection.
Note that this isn't a problem on write because the primary (hammer)
OSD generated the transaction and explicitly specified a temp
collection; it's only transactions we do on our own with the sloppy
temp ghobject_t that trip over this.
Fixes: #13395
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 6 Oct 2015 18:35:35 +0000 (14:35 -0400)]
osd/PG: fix generate_past_intervals
We may be only calculating older past intervals and have a valid
history.same_interval_since value, in which case the local
same_interval_since value will end at the newest old interval we had to
generate.
Boris Ranto [Tue, 6 Oct 2015 01:57:40 +0000 (21:57 -0400)]
selinux: Fix man page location
The SELinux man page was previously located in two places and the man
page that was supposed to be updated when rgw selinux changes were
proposed did not get updated properly. Fixing this by moving
selinux/ceph_selinux.8 to man/ceph_selinux.8. Also, populate EXTRA_DIST
with ceph_selinux.8.
Sage Weil [Tue, 6 Oct 2015 14:54:50 +0000 (10:54 -0400)]
mon: do not remove proxied sessions
A proxied session (see handle_forward) isn't registered, so it doesn't
need remove_session. Moreover, s->con is null, so it will crash in
remove_session.
Fixes: #13379
Signed-off-by: Sage Weil <sage@redhat.com>
Nathan Cutler [Tue, 6 Oct 2015 13:07:41 +0000 (15:07 +0200)]
ceph.spec.in: remove comments regarding ceph UID/GID in SUSE
It is possible that the ceph user/group will not have fixed UID/GID in SUSE.
Instead, it is possible that the ceph package will depend on a separate package
whose sole purpose will be to create the ceph user/group if they do not exist.
Nathan Cutler [Tue, 6 Oct 2015 10:25:52 +0000 (12:25 +0200)]
ceph.spec.in: enable OBS post-build-checks to find systemd-tmpfiles
The openSUSE Build Service runs a number of "post-build checks" after the RPMs
have been generated. One of these tests the RPM scriptlets for idempotence.
Without this line in the specfile, the check fails on SLE_12 because it cannot
find the systemd-tmpfiles binary.
ceph.spec.in: Standardize systemd preun and postun scripts
Currently, the main ceph package and the ceph-radosgw behave
differently on upgrade. This commit unifies their behavior
to the following:
On package removal, disable and stop all related systemd units.
On package upgrade, do nothing unless there is a file /etc/sysconfig/ceph
containing a parameter CEPH_AUTO_RESTART_ON_UPGRADE. If the parameter is set
to "yes", restart the systemd units iff they are running.
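The upgrade rule above can be sketched as a small decision function. This is an illustration in Python, not the RPM scriptlet itself; the function names and the unit_states mapping are hypothetical, and the file is assumed to hold simple KEY=VALUE lines.

```python
def parse_sysconfig(text):
    """Parse KEY=VALUE lines, ignoring blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def units_to_restart(sysconfig_text, unit_states):
    """Return the units to restart on package upgrade.

    sysconfig_text is the content of /etc/sysconfig/ceph, or None if the
    file does not exist. unit_states maps unit name -> is it running.
    """
    if sysconfig_text is None:
        return []  # no file: do nothing
    settings = parse_sysconfig(sysconfig_text)
    if settings.get("CEPH_AUTO_RESTART_ON_UPGRADE") != "yes":
        return []  # parameter missing or not "yes": do nothing
    # Restart only the units that are currently running.
    return [unit for unit, running in unit_states.items() if running]
```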
Nathan Cutler [Fri, 2 Oct 2015 10:15:08 +0000 (12:15 +0200)]
ceph.spec.in: fix for out-of-memory errors in OBS
Add "--param ggc-min-expand=20 --param ggc-min-heapsize=32768"
to RPM_OPT_FLAGS, ensuring gcc does not add debug symbols and is
more aggressive about garbage collection.
Thanks to Berthold Gunreben for debugging this issue.
Over in the SUSE sector, we are trying to enable the SLE_12 and openSUSE_13.2
build targets. The lttng/babeltrace stuff is currently available only in
SLE_12.
John Spray [Fri, 2 Oct 2015 21:14:38 +0000 (22:14 +0100)]
mds: avoid emitting cap warnings before evicting session
In the case where a client dies, and another client immediately
tries to access a file locked by the dead client, we would
previously *sometimes* emit a "client.xyz isn't responding to
mclientcaps" warning to the cluster log, right before
evicting the stale session. This was because the timeout
for the session eviction and the timeout for the
warning message are both 60s.
Fix this by checking the stale sessions before doing the
warning message check in Locker. If a session is going
to get evicted in this tick, it will already be gone
by the time Locker thinks about emitting the warning
message.
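A simplified sketch of why the ordering matters, under the assumption that both timeouts are 60s as stated above. The Session class, the tick function, and the log strings are illustrative stand-ins, not the MDS code.

```python
SESSION_TIMEOUT = 60.0  # eviction and warning timeouts are both 60s


class Session:
    def __init__(self, name, last_seen, revoke_sent=None):
        self.name = name
        self.last_seen = last_seen      # time of last client activity
        self.revoke_sent = revoke_sent  # time a cap revoke was sent, if any


def tick(now, sessions, cluster_log):
    # Step 1: evict stale sessions first.
    live = []
    for s in sessions:
        if now - s.last_seen >= SESSION_TIMEOUT:
            cluster_log.append("evicting stale session %s" % s.name)
        else:
            live.append(s)
    # Step 2: only sessions that survived eviction can trigger the warning.
    for s in live:
        if s.revoke_sent is not None and now - s.revoke_sent >= SESSION_TIMEOUT:
            cluster_log.append("%s isn't responding to mclientcaps" % s.name)
    return live
```

With the two steps swapped, a client that died 60s ago with a pending cap revoke would produce both the warning and the eviction in the same tick.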
Fixes: #13334
Signed-off-by: John Spray <john.spray@redhat.com>
Boris Ranto [Fri, 2 Oct 2015 07:56:01 +0000 (09:56 +0200)]
ceph.spec.in: Do not always restart the daemons on upgrades
This patch minimizes the amount of daemon stop/start procedures when
upgrading ceph-selinux package. With this patch, the daemons get
restarted only if SELinux is enabled and the SELinux policy version
changed in the meantime.
Fixes: #13061
Signed-off-by: Boris Ranto <branto@redhat.com>
Sage Weil [Thu, 1 Oct 2015 19:03:22 +0000 (15:03 -0400)]
librados: expose OPERATION_FULL_TRY flag
Allow librados users to opt to receive ENOSPC or EDQUOT when they submit
an operation against a full cluster. This should only be used if the
librados app can handle those errors gracefully (librbd, for example,
cannot).
Also note that this allows savvy librados users to send delete operations;
they will get either a success or EDQUOT, depending on whether the
operation results in a net drop in space utilization.
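As a rough illustration of what "handle those errors gracefully" might look like for a caller, the sketch below wraps a submit step and treats ENOSPC and EDQUOT as an expected "cluster full" outcome. It makes no librados calls; submit is a hypothetical callable that is assumed to raise OSError with the corresponding errno.

```python
import errno


def submit_with_full_handling(submit):
    """Run submit() and map full-cluster errors to a status string.

    Assumption: submit raises OSError carrying ENOSPC or EDQUOT when the
    operation is rejected because the cluster (or a quota) is full.
    """
    try:
        submit()
    except OSError as e:
        if e.errno in (errno.ENOSPC, errno.EDQUOT):
            return "full"  # expected outcome; caller can back off or delete
        raise  # anything else is a genuine failure
    return "ok"
```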
Sage Weil [Thu, 1 Oct 2015 18:50:34 +0000 (14:50 -0400)]
osdc/Objecter: distinguish between multiple notify completions
We may send a notify to the cluster multiple times due to OSDMap
changes. In some cases, earlier notify attempts may complete with
an error, while later attempts succeed. We need to only pay
attention to the most recently sent notify's completion.
Do this by making note of the notify_id in the initial ACK (only
present when talking to newer OSDs). When we get a notify
completion, match it against our expected notify_id (if we have
one) or else discard it.
This is important because in some cases an early notify completion
may be an error while a later one succeeds.
Note that if we are talking to an old cluster we will simply not record a
notify_id and our behavior will be the same as before (we will trust any
notify completion we get).
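The matching rule described above can be sketched as follows. This is a simplified model, not the Objecter code; the class and method names are hypothetical, and it assumes that the ACK of each resend overwrites the expected notify_id.

```python
class NotifyTracker:
    """Accept only the completion that matches the latest notify_id."""

    def __init__(self):
        self.expected_notify_id = None  # stays None with older OSDs
        self.result = None

    def handle_ack(self, notify_id=None):
        # Newer OSDs include a notify_id in the initial ACK; older ones
        # do not, in which case nothing is recorded.
        if notify_id is not None:
            self.expected_notify_id = notify_id

    def handle_completion(self, notify_id, result):
        if (self.expected_notify_id is not None
                and notify_id != self.expected_notify_id):
            return False  # completion of an earlier attempt: discard
        self.result = result
        return True
```

When no notify_id was ever recorded, handle_completion accepts any completion, which mirrors the old behavior.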
Fixes: #13114
Signed-off-by: Sage Weil <sage@redhat.com>