osd: include snaps in pg_log_entry_t::dump()

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 715d8717a0e8a08fbe97a3e7d3ffd33aa9529d90)

osd: unconditionally encode snaps buffer

Previously we would only encode the updated snaps vector for CLONE ops.
This doesn't work for MODIFY ops generated by the snap trimmer, which
may also adjust the clone collections. It is also possible that other
operations may need to populate this field in the future (e.g.,
LOST_REVERT may, although it currently does not).

Fixes: #4071, and possibly #4051.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 54b6dd924fea3af982f3d729150b6449f318daf2)

osd: improve debug output on snap collections

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 8b05492ca5f1479589bb19c1ce058b0d0988b74f)

PG: check_recovery_sources must happen even if not active

missing_loc/missing_loc_sources also must be cleaned up
if a peer goes down during peering:

1) pg is in GetInfo, acting is [3,1]
2) we find object A on osd [0] in GetInfo
3) 0 goes down, no new peering interval since it is neither up nor
acting, but peer_missing[0] is removed.
4) pg goes active and tries to pull A from 0 since missing_loc did not get
cleaned up.

Backport: bobtail
Fixes: #4371
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit de22b186c497ce151217aecf17a8d35cdbf549bb)
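
The cleanup described in "PG: check_recovery_sources must happen even if not active" can be pictured with a small sketch. This is not the Ceph code: `MissingLoc` and `remove_down_source` are hypothetical names, and the map of object name to candidate OSD ids is an assumed simplification of missing_loc.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical simplification: object name -> OSD ids believed to hold a copy.
using MissingLoc = std::map<std::string, std::set<int>>;

// Forget `down_osd` as a source for every missing object, and drop
// objects that no longer have any known source.
void remove_down_source(MissingLoc& missing_loc, int down_osd) {
  for (auto it = missing_loc.begin(); it != missing_loc.end();) {
    it->second.erase(down_osd);
    if (it->second.empty())
      it = missing_loc.erase(it);  // no source left, so nothing to pull from
    else
      ++it;
  }
}
```

Running this whenever a peer goes down, whether or not the pg is active, would keep step 4 of the sequence above from selecting osd 0 as a pull source.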

HashIndex: _collection_list_partial must tolerate NULL next

Backport: bobtail
Fixes: #4379
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit ce4432adc67dc2fc06dd21ea08e59d179496bcc6)

OSD: lock not needed in ~DeletingState()

No further refs to the object can remain at this point.
Furthermore, the callbacks might lock mutexes of their
own.

Backport: bobtail
Fixes: #4378
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit e4bf1bcab159d7c5b720f5da01877c0f67c16d16)

ReplicatedPG: don't leak reservation on removal

Fixes: #4431
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 32bf131e0141faf407b5ff993f75f97516b27c12)

Conflicts:

src/osd/ReplicatedPG.cc

rgw: set up curl with CURL_NOSIGNAL

Fixes: #4425
Backport: bobtail
Apparently, libcurl needs that in order to be thread safe. Side
effect is that if libcurl is not compiled with c-ares support,
domain name lookups are not going to time out.
Issue affected keystone.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 88725316ddcfa02ff110e659f7a8131dc1ea2cfc)

osd: mark down connections from old peers

Close out any connection with an old peer. This avoids a race like:

- peer marked down
- we get map, mark down the con
- they reconnect and try to send us some stuff
- we share our map to tell them they are old and dead, but leave the con
open
...
- peer marks itself up a few times, eventually reuses the same port
- sends messages on their fresh con
- we discard because of our old con

This could cause a tight reconnect loop, but it is better than wrong
behavior.

Other possible fixes:
- make addr nonce truly unique (augment pid in nonce)
- make a smarter 'disposable' msgr state (bleh)

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 881e9d850c6762290f8be24da9e74b9dc112f1c9)

osd/PG: rename require_same_or_newer_map -> is_same_or_newer_map

This avoids confusion with the OSD method of the same name, and better
matches what the function tests (and does not do).

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ba7e815a18cad110525f228db1b3fe39e011409e)

Conflicts:

src/osd/ReplicatedPG.cc

log: drop default 'log max recent' from 100k -> 10k

Use less memory.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c021c5ccf0c063cccd7314964420405cea6406de)

Fix radosgw actually reloading after rotating logs.

The --signal argument to Debian's start-stop-daemon doesn't
make it send a signal, but defines which signal should be sent
when --stop is specified.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
(cherry picked from commit 44f1cc5bc42f9bb6d5a386037408d2de17dc5413)

common: reduce default in-memory logs for non-daemons

The default of 100000 can result in hundreds of MBs of extra memory
used. This was most obvious when using librbd with caching enabled,
since there was a dout(0) accidentally left in the ObjectCacher.

refs: #4352
backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 7c208d2f8e3f28f4055a4ae51eceae892dcef1dc)

osd: allow (some) log trim when degraded, but not during recovery

We allow some trim while degraded, although we keep more entries around to
improve the chances that a restarting OSD can do log-based recovery.

Still disallow during recovery...

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 6d89b34e5608c71b49ef33ab58340e90bd8da6e4)

osd: restructure calc_trim

No functional change, except that we log more debug, yay!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 86df164d04f6e31a0f20bbb94dbce0599c0e8b3d)

osd: allow pg log trim during (non-classic) scrub

Chunky (and deep) scrub do not care about PG log trimming. Classic scrub
still does.

Deep scrub can take a long time, so not trimming the log during that period
may eat lots of RAM; avoid that!

Might fix: #4179
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 0ba8db6b664205348d5499937759916eac0997bf)

msgr: drop messages on cons with CLOSED Pipes

Back in commit 6339c5d43974f4b495f15d199e01a141e74235f5, we tried to make
this deal with a race between a faulting pipe and new messages being
queued. The sequence is

- fault starts on pipe
- fault drops pipe_lock to unregister the pipe
- user (objecter) queues new message on the con
- submit_message reopens a Pipe (due to this bug)
- the message managed to make it out over the wire
- fault finishes faulting, calls ms_reset
- user (objecter) closes the con
- user (objecter) resends everything

It appears as though the previous patch *meant* to drop *m on the floor in
this case, which is what this patch does. And that fixes the crash I am
hitting; see #4271.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 0f42eddef5da6c1babe9ed51ceaa3212a42c2ec4)

Fix output of 'ceph osd tree --format=json'

Signed-off-by: Tyler Brekke <tyler.brekke@inktank.com>
(cherry picked from commit 9bcba944c6586ad5f007c0a30e69c6b5a886510b)

deb: Add ceph-coverage to ceph-test deb package

Teuthology uses the ceph-coverage script extensively
and expects it to be installed by the ceph task. Add
the script to the ceph-test debian package so that it
gets installed for that use case.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
(cherry picked from commit 376cca2d4d4f548ce6b00b4fc2928d2e6d41038f)

rgw: set attrs on various list bucket xml results (swift)

Fixes: #4247
The list buckets operation was missing some attrs on the different
xml result entities. This fixes it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 4384e59ad046afc9ec53a2d2f1fff6a86e645505)

formatter: add the ability to dump attrs in xml entities

xml entities may have attrs assigned to them. Add the ability
to set them. A usage example:

formatter->open_array_section_with_attrs("container",
FormatterAttrs("name", "foo", NULL));

This will generate the following xml entity:
<container name="foo">

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 7cb6ee28073824591d8132a87ea09a11c44efd66)

Conflicts:
src/common/Formatter.cc

rgw: don't iterate through all objects when in namespace

Fixes: #4363
Backport: argonaut, bobtail
When listing objects in a namespace, don't iterate through all the
objects; only go through the ones that start with the namespace
prefix.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 6669e73fa50e3908ec825ee030c31a6dbede6ac0)
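
As an illustration of the idea in "rgw: don't iterate through all objects when in namespace", here is a minimal sketch of a prefix-bounded scan over a sorted index. It is not the rgw implementation: the `std::map` index and the name `list_with_prefix` are assumptions made for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: return the keys of a sorted index that start with
// `prefix`, visiting only the matching range instead of the whole index.
std::vector<std::string> list_with_prefix(
    const std::map<std::string, int>& index, const std::string& prefix) {
  std::vector<std::string> out;
  // lower_bound() jumps straight to the first key >= prefix.
  for (auto it = index.lower_bound(prefix); it != index.end(); ++it) {
    // Keys are sorted, so the first key without the prefix ends the range.
    if (it->first.compare(0, prefix.size(), prefix) != 0)
      break;
    out.push_back(it->first);
  }
  return out;
}
```

The cost then depends on the number of keys in the namespace rather than on the size of the bucket.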

ObjectCacher: fix debug log level in split

Level 0 should never be used for this kind of debugging.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit cb3ee33532fb60665f39f6ccb1d69d67279fd5e1)

rados: remove unused "check_stdio" parameter

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit bb860e49a7faeaf552538a9492ef0ba738c99760)

rados: obey op_size for 'get'

Otherwise we try to read the whole object in one go, which doesn't bode
well for large objects (either non-optimal or simply broken).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 234becd3447a679a919af458440bc31c8bd6b84f)
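
The behavior described in "rados: obey op_size for 'get'" amounts to a chunked read loop. The sketch below assumes an in-memory string as the object; `read_range` and `get_chunked` are hypothetical stand-ins, not librados calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a ranged object read: appends up to `len`
// bytes starting at `off` to `out` and returns the number of bytes read.
std::size_t read_range(const std::string& object, std::size_t off,
                       std::size_t len, std::string* out) {
  if (off >= object.size())
    return 0;
  std::size_t n = std::min(len, object.size() - off);
  out->append(object, off, n);
  return n;
}

// Read the whole object in pieces of at most `op_size` bytes.
// Returns the number of reads that returned data.
std::size_t get_chunked(const std::string& object, std::size_t op_size,
                        std::string* out) {
  std::size_t off = 0, ops = 0;
  while (true) {
    std::size_t n = read_range(object, off, op_size, out);
    if (n == 0)
      break;  // end of object (or op_size == 0)
    ++ops;
    off += n;
  }
  return ops;
}
```

No single request exceeds `op_size`, which is the property the commit is after for large objects.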

FileJournal::wrap_read_bl: adjust pos before returning

Otherwise, we may feed an offset past the end of the journal to
check_header in read_entry and incorrectly determine that the entry is
corrupt.

Fixes: 4296
Backport: bobtail
Backport: argonaut
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 5d54ab154ca790688a6a1a2ad5f869c17a23980a)
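
A rough sketch of the position adjustment named in "FileJournal::wrap_read_bl: adjust pos before returning" follows. The `Ring` layout (a header area below `top`, data in `[top, max_size)`) and the function `wrap_read` are assumptions for illustration and do not reproduce the FileJournal code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical ring layout: valid data offsets are [top, max_size);
// offsets below `top` stand in for a header area. Requires top < max_size.
struct Ring {
  std::string buf;
  std::size_t top;
  std::size_t max_size() const { return buf.size(); }
};

// Read `len` bytes starting at `pos`, wrapping at the end of the ring.
// Returns the position following the read, always within [top, max_size).
std::size_t wrap_read(const Ring& r, std::size_t pos, std::size_t len,
                      std::string* out) {
  while (len > 0) {
    if (pos >= r.max_size())
      pos = r.top;  // wrap before reading more
    std::size_t n = std::min(len, r.max_size() - pos);
    out->append(r.buf, pos, n);
    pos += n;
    len -= n;
  }
  // Adjust before returning: a read that ends exactly at the end of the
  // ring must report `top`, not an offset past the end.
  if (pos >= r.max_size())
    pos = r.top;
  return pos;
}
```

Without the final adjustment, a caller that passes the returned offset to a header check would be looking past the end of the ring, which matches the failure the commit message describes.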

osd: leave osd_lock locked in shutdown()

No callers expect the lock to be dropped.

Fixes: #3816
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 98a763123240803741ac9f67846b8f405f1b005b)

msg: fix entity_addr_t::is_same_host() for IPv6

We weren't checking the memcmp return value properly! Aie...

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c8dd2b67b39a8c70e48441ecd1a5cc3c6200ae97)
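
The class of bug behind "msg: fix entity_addr_t::is_same_host() for IPv6" is easy to reproduce in isolation. In this sketch `Addr16` and `same_host` are hypothetical; the only fact relied on is that `memcmp` returns 0 for equal buffers.

```cpp
#include <array>
#include <cstring>

// Stand-in for a raw 16-byte IPv6 address.
using Addr16 = std::array<unsigned char, 16>;

// memcmp() returns 0 when the buffers are equal, so equality must be
// tested with `== 0`; using the result directly as a boolean inverts it.
bool same_host(const Addr16& a, const Addr16& b) {
  return std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```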

osd: requeue pg waiters at the front of the finished queue

We could have a sequence like:

- op1
- notify
- op2

in the finished queue. Op1 gets put on waiting_for_pg, the notify
creates the pg and requeues op1 (at the end), op2 is handled, and
finally op1 is handled. That breaks ordering; see #2947.

Instead, when we wake up a pg, queue the waiting messages at the front
of the dispatch queue.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 56c5a07708d52de1699585c9560cff8b4e993d0a)
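
The ordering fix in "osd: requeue pg waiters at the front of the finished queue" can be sketched with standard containers. The container types and the name `requeue_front` are assumptions, not the OSD's own data structures.

```cpp
#include <deque>
#include <list>
#include <string>

// Put the waiters back at the *front* of the finished queue, keeping
// their relative order, so they run before anything queued after them.
void requeue_front(std::deque<std::string>& finished,
                   std::list<std::string>& waiters) {
  finished.insert(finished.begin(), waiters.begin(), waiters.end());
  waiters.clear();
}
```

With op2 still queued and op1 waiting, the result is op1 followed by op2, which is the original submission order.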

osd: pull requeued requests off one at a time

Pull items off the finished queue one at a time.  In certain cases, an
event may result in new items getting added to the finished queue that
will be put at the *front* instead of the back.  See latest incarnation
of #2947.

Note that this is a significant change in behavior in that we can
theoretically starve if an event keeps resulting in new events getting
generated.  Beware!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit f1841e4189fce70ef5722d508289e516faa9af6a)
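
A minimal sketch of the one-at-a-time drain described in "osd: pull requeued requests off one at a time", assuming integer items and a caller-supplied handler; `drain` is a hypothetical name.

```cpp
#include <deque>
#include <functional>
#include <vector>

// Drain `finished` one item at a time. `handle` may push new items onto
// either end of the queue; anything it puts at the front runs next.
// Returns the order in which items were handled.
std::vector<int> drain(
    std::deque<int>& finished,
    const std::function<void(int, std::deque<int>&)>& handle) {
  std::vector<int> order;
  while (!finished.empty()) {
    int item = finished.front();
    finished.pop_front();
    order.push_back(item);
    handle(item, finished);
  }
  return order;
}
```

As the commit warns, a handler that always enqueues more work keeps this loop from ever finishing.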

mds: open mydir after replay

In certain cases, we may replay the journal and not end up with the
dirfrag for mydir open. This is fine--we just need to open it up and
fetch it below.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e51299fbce6bdc3d6ec736e949ba8643afc965ec)

mds: use inode_t::layout for dir layout policy

Remove the default_file_layout struct, which was just a ceph_file_layout,
and store it in the inode_t. Rip out all the annoying code that put this
on the heap.

To aid in this usage, add a clear_layout() function to inode_t.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

mds: parse ceph.*.layout vxattr key/value content

Use qi to parse a strictly formatted set of key/value pairs. Be picky
about whitespace. Any subset of recognized keys is allowed. Parse the
same set of keys as the ceph.*.layout.* vxattrs.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5551aa5b3b5c2e9e7006476b9cd8cc181d2c9a04)

rgw: fix multipart uploads listing

Fixes: #4177
Backport: bobtail
Listing multipart uploads had a typo, and was requiring the
wrong resource (uploadId instead of uploads).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit db99fb4417b87301a69cb37b00c35c838b77197e)

rgw: don't copy object when it's copied into itself

Fixes: #4150
Backport: bobtail

When an object is copied into itself, it will not be fully copied: tail
reference count stays the same, head part is rewritten.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 34f885be536d0ac89c10fd29b1518751d2ffc547)

PG: remove weirdness log for last_complete < log.tail

In the case of a divergent object prior to log.tail,
last_complete may end up before log.tail.

Backport: bobtail
Fixes: #4174
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit dbadb3e2921297882c5836c67ca32bb8ecdc75db)

Conflicts:

src/osd/PG.cc

Strip any trailing whitespace from rbd showmapped

More recent versions of ceph append a bit of whitespace to the line
after the name of the /dev/rbdX device; this causes the monitor check
to fail as it can't find the device name due to the whitespace.

This fix excludes any characters after the /dev/rbdN match.
(cherry picked from commit ad84ea07cac5096de38b51b8fc452c99f016b8d8)

Merge pull request #64 from dalgaaf/wip-bobtail-memleaks

cherry-pick some memleak fixes from master to bobtail

rgw/rgw_rest.cc: fix 4K memory leak

Fix 4K memory leak in case RGWClientIO::read() fails in
read_all_chunked_input().

Error from cppcheck was:
Checking src/rgw/rgw_rest.cc...
[src/rgw/rgw_rest.cc:688]: (error) Memory leak: data

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit 89df090e04ef9fc5aae29122df106b0347786fab)

SyntheticClient.cc: fix some memory leaks in the error handling

Fix some memory leaks in case of error handling due to failed
client->open() calls.

Error from cppcheck was:
[src/client/SyntheticClient.cc:1980]: (error) Memory leak: buf
[src/client/SyntheticClient.cc:2040]: (error) Memory leak: buf
[src/client/SyntheticClient.cc:2090]: (error) Memory leak: buf
(cherry picked from commit f0ba80756d1c3c313014ad7be18191981fb545be)

rgw/rgw_xml.cc: fix realloc memory leak in error case

Fix error from cppcheck:

[src/rgw/rgw_xml.cc:212]: (error) Common realloc mistake: 'buf'
nulled but not freed upon failure

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit d48cc789ea075ba2745754035640ada4131b2119)

os/FileStore.cc: fix realloc memory leak in error case

Fix error from cppcheck:

[src/os/FileStore.cc:512]: (error) Common realloc mistake: 'fiemap'
nulled but not freed upon failure

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit c92a0f552587a232f66620170660d6b2ab6fb3a5)

common/fiemap.cc: fix realloc memory leak

Fix error from cppcheck:

[src/common/fiemap.cc:73]: (error) Common realloc mistake: 'fiemap'
nulled but not freed upon failure

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit f26f1470e7af36fa1eb8dc59c8a7c62c3c3a22ba)
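
The "Common realloc mistake" that cppcheck reports in the three commits above is assigning the result of `realloc` straight back to the only pointer to the block. A minimal sketch of the usual remedy follows; `grow` is a hypothetical helper, not code from these files.

```cpp
#include <cstdlib>

// Grow *buf to new_size bytes (new_size > 0). On failure the original
// block is left untouched and still owned by the caller, not leaked.
bool grow(char** buf, std::size_t new_size) {
  void* tmp = std::realloc(*buf, new_size);
  if (tmp == nullptr)
    return false;  // *buf is still valid; the caller can free() it
  *buf = static_cast<char*>(tmp);
  return true;
}
```

Writing `buf = realloc(buf, n)` instead loses the old pointer when `realloc` returns NULL, which is the leak being fixed.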

osd/OSDCap: add unit test for parsing pools/objects with _ and -

Hunting #4122, where a user saw

2013-02-13 19:39:25.467916 7f766fdb4700 10 osd.0 10 session 0x2c8cc60 client.libvirt has caps osdcap[grant(object_prefix rbd^@children class-read),grant(pool libvirt^@pool^@test rwx)] 'allow class-read object_prefix rbd_children, allow pool libvirt-pool-test rwx'

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 2ce28ef1d7f95e71e1043912dfa269ea3b0d1599)
(cherry picked from commit a6534bc8a0247418d5263b765772d5266f99229c)

osd/OSDCap: tweak unquoted_word parsing in osd caps

Newer versions of spirit (1.49.0-3.1ubuntu1.1 in quantal, in particular)
dislike the construct with alnum and replace the - and _ with '\0' in the
resulting string.

Fixes: #4122
Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 6c504d96c1e4fbb67578fba0666ca453b939c218)
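
To make the expected result of "osd/OSDCap: tweak unquoted_word parsing in osd caps" concrete, here is a hand-written stand-in that accepts letters, digits, `_` and `-`. It does not use Spirit, and both the function name `take_unquoted_word` and the exact character set are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return the leading run of characters that are alphanumeric, '_' or '-'.
// The characters are copied unchanged, so no '\0' can end up in the word.
std::string take_unquoted_word(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (!std::isalnum(c) && c != '_' && c != '-')
      break;
    ++i;
  }
  return s.substr(0, i);
}
```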

v0.56.3

rgw: change json formatting for swift list container

Fixes: #4048
There is some difference in the way swift formats the
xml output and the json output for list container. In
xml the entity is named 'name' and in json it is named
'subdir'.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 3e4d79fe42dfc3ca70dc4d5d2aff5223f62eb34b)

librbd: unprotect any non-unprotected snapshot

Include snapshots in the UNPROTECTING state as well, which can occur
after an unprotect is interrupted.

Fixes: #4100
Backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit fe283813b44a7c45def6768ea0788a3a0635957e)

java: make CephMountTest use user.* xattr names

Changes to the xattr code in Ceph require
a few tweaks to existing test cases.
Specifically, there is now a ceph.file.layout
xattr by default and user defined xattrs
are prepended with "user."

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>

mon: fix typo in C_Stats

Broken by previous commit.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3cf3710be0b4cccc8de152a97be50d983c35116d)

mon: retry PGStats message on EAGAIN

If we get EAGAIN from a paxos restart/election/whatever, we should
restart the message instead of just blindly acking it.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
(cherry picked from commit 4837063d447afb45554f55bb6fde1c97559acd4b)

mon: handle -EAGAIN in completion contexts

We can get ECANCELED, EAGAIN, or success out of the completion contexts,
but in the EAGAIN case (meaning there was an election) we were sending
a success to the client. This resulted in client hangs and all-around
confusion when the monitor cluster was thrashing.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
(cherry picked from commit 17827769f1fe6d7c4838253fcec3b3a4ad288f41)
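
The two monitor commits above ("mon: retry PGStats message on EAGAIN" and "mon: handle -EAGAIN in completion contexts") share one decision: only a zero result earns a success reply. A sketch of that decision follows, with the caveat that `Action` and `on_commit_finished` are hypothetical and that treating -ECANCELED as "drop" is an assumption the commit text does not spell out.

```cpp
#include <cerrno>

enum class Action { Reply, Retry, Drop };

// Decide what to do with a request once its commit completes with `r`.
Action on_commit_finished(int r) {
  if (r == 0)
    return Action::Reply;  // committed: acknowledge to the client
  if (r == -EAGAIN)
    return Action::Retry;  // e.g. an election intervened: resubmit
  return Action::Drop;     // assumed handling for -ECANCELED and others
}
```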

osd: only share maps on hb connection if OSD_HBMSGS feature is set

Back in 1bc419a7affb056540ba8f9b332b6ff9380b37af we started sharing maps
with dead osds via the heartbeat connection, but old code will crash on an
unexpected message. Only do this if the OSD_HBMSGS feature is present.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 302b26ff70ee5539da3dcb2e5614e2b7e83b9dcd)

osd: tolerate unexpected messages on the heartbeat interface

We should note but not crash on unexpected messages. Announce this awesome
new "capability" via a feature bit.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit afda30aeaae0a65f83c6886658354ad2b57c4c43)

Conflicts:

src/include/ceph_features.h

Merge remote-tracking branch 'gh/wip-bobtail-osd-msgr' into bobtail

test_libcephfs: fix xattr test

Ignore the ceph.*.layout xattrs.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit b0d4dd21c7be86eb47728a4702a3c67ca44424ac)

radosgw-admin: fix cli test

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1b05b0edbac09d1d7cf0da2e536829df05e48573)

Merge remote-tracking branch 'gh/wip-bobtail-vxattrs' into bobtail

mon: enforce reweight be between 0..1

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
(cherry picked from commit 4e29c95d6f61daa838888840cef0cceedc0fcfdd)

PG: dirty_info on handle_activate_map

We need to make sure the pg epoch is persisted during
activate_map.

Backport: bobtail
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit dbce1d0dc919e221523bd44e1d0834711da1577d)

osd: flush peering queue (consume maps) prior to boot

If the osd itself is behind on many maps during boot, it will get more and
(as part of that) flush the peering wq to ensure the pgs consume them.
However, it is possible for OSD to have latest/recent maps, but pgs to be
behind, and to jump directly to boot and join. The OSD is then laggy and
unresponsive because the peering wq is way behind.

To avoid this, call consume_map() (kick the peering wq) at the end of
init and flush it to ensure we are *internally* all caught up before we
consider joining the cluster.

I'm pretty sure this is the root cause of #3905 and possibly #3995.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit af95d934b039d65d3667fc022e2ecaebba107b01)

rgw: a tool to fix clobbered bucket info in user's bucket list

This fixes bad entries in user's bucket list that may have occurred
due to issue #4039. Syntax:

$ radosgw-admin user check --uid=<uid> [--fix]

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9cb6c33f0e2281b66cc690a28e08459f2e62ca13)

Conflicts:
src/rgw/rgw_admin.cc

rgw: bucket recreation should not clobber bucket info

Fixes: #4039
User's list of buckets is getting modified even if bucket already
exists. This fix removes the newly created directory object, and
makes sure that user info's data points at the correct bucket.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9d006ec40ced9d97b590ee07ca9171f0c9bec6e9)

Conflicts:
src/rgw/rgw_op.cc
src/rgw/rgw_rados.cc

rgw: a tool to fix buckets with leaked multipart references

Checks specified bucket for the #4011 symptoms, optionally fix
the issue.

syntax:
radosgw-admin bucket check --bucket=<bucket> [--fix]

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 2d8faf8e5f15e833e6b556b0f3c4ac92e4a4151e)

Conflicts:
src/rgw/rgw_admin.cc
src/rgw/rgw_rados.h

rgw: radosgw-admin object unlink

Add a radosgw-admin option to remove object from bucket index

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 16235a7acb9543d60470170bb2a09956364626cd)

Conflicts:
src/rgw/rgw_admin.cc
src/rgw/rgw_rados.h
src/test/cli/radosgw-admin/help.t

osd: kill unused addr-based send_map()

Not used, old API, bad.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e359a862199c8a94cb238f7271ba1b0edcc0863c)

osd: share incoming maps via Connection*, not addrs

Kill a set of parallel methods that are using the old addr/inst-based
msgr APIs, and instead use Connection handles. This is much safer and gets
us closer to killing the old msgr API.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5e2fab54a4fdf2f59e2b635cbddef8a5909acb7c)

osd: pass new maps to dead osds via existing Connection

Previously we were sending these maps to dead osds via their old addrs
using a new outgoing connection and setting the flags so that the msgr
would clean up. That mechanism is possibly buggy and fragile, and we can
avoid it entirely if we just reuse the existing heartbeat Connection.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1bc419a7affb056540ba8f9b332b6ff9380b37af)

osd: requeue osdmaps on heartbeat connections for cluster connection

If we receive an OSDMap on the cluster connection, requeue it for the
cluster messenger, and process it there where we normally do. This avoids
any concerns about locking and ordering rules.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 76705ace2e9767939aa9acf5d9257c800f838854)

msgr: add get_loopback_connection() method

Return the Connection* for ourselves, so we can queue messages for
ourselves.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a7059eb3f3922cf08c1e5bb5958acc2d45952482)

qa: add layout_vxattrs.sh test script

Test virtual xattrs for file and directory layouts.

TODO: create a data pool, add it to the fs, and make sure we can use it.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 61fbe27a52d12ecd98ddeb5fc0965c4f8ee7841a)

mds: allow dir layout/policy to be removed via removexattr on ceph.dir.layout

This lets a user remove a policy that was previously set on a dir.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit db31a1f9f27416e4d531fda716e32d42a275e84f)

mds: handle ceph.*.layout.* setxattr

Allow individual fields of file or dir layouts to be set via setxattr.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ebebf72f0993d028e795c78a986e1aee542ca5e0)

mdsmap: backported is_data_pool()

This roughly corresponds to mainline commit 99d9e1d.

Signed-off-by: Sage Weil <sage@inktank.com>

mds: fix client view of dir layout when layout is removed

We weren't handling the case where the projected node has NULL for the
layout properly. Fixes the client's view when we remove the dir layout.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 09f28541e374ffac198e4d48082b064aae93cb2c)

client: note presence of dir layout in inode operator<<

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 84751489ca208964e617516e04556722008ddf67)

client: list only aggregate xattr, but allow setting subfield xattrs

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ba32ea9454d36072ec5ea3e6483dc3daf9199903)

client: implement ceph.file.* and ceph.dir.* vxattrs

Display ceph.file.* vxattrs on any regular file, and ceph.dir.* vxattrs
on any directory that has a policy set.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3f82912a891536dd7e930f98e28d9a8c18fab756)

client: move xattr namespace enforcement into internal method

This captures libcephfs users now too.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit febb96509559084357bfaabf7e4d28e494c274aa)

client: allow ceph.* xattrs

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ad7ebad70bf810fde45067f78f316f130a243b9c)

rgw_rest: Make fallback uri configurable.

Some HTTP servers, notably lighttpd, do not set SCRIPT_URI, so make the fallback
string configurable.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit b3a2e7e955547a863d29566aab62bcc480e27a65)

Conflicts:
src/rgw/rgw_rest.cc

rgw: fix setting of NULL to string

Fixes: #3777
s->env->get() returns char * and not string and can return NULL.
Also, remove some old unused code.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9019fbbe8f84f530b6a8700dfe99dfeb03e0ed3d)

OSD: check for empty command in do_command

Fixes: #3878
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit 8cf79f252a1bcea5713065390180a36f31d66dfd)

PGMap: fix -Wsign-compare warning

Fix -Wsign-compare compiler warning:

mon/PGMap.cc: In member function 'void PGMap::apply_incremental
(CephContext*, const PGMap::Incremental&)':
mon/PGMap.cc:247:30: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit b571f8ee2d22a3894120204bc5f119ff37e1de53)

mon: smooth pg stat rates over last N pgmaps

This smooths the recovery and throughput stats over the last N pgmaps,
defaulting to 2.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a7d15afb529615db56bae038b18b66e60d827a96)
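
One way to picture "mon: smooth pg stat rates over last N pgmaps" is a sliding window of per-map deltas. This is a sketch under assumptions: `RateSmoother` is a hypothetical class, and averaging total delta over total elapsed time is one plausible smoothing rule, not necessarily the one the monitor uses.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Keeps the last `n` (delta, seconds) samples and reports their
// combined rate.
class RateSmoother {
 public:
  explicit RateSmoother(std::size_t n) : n_(n) {}

  // Record `delta` units of work observed over `seconds`.
  void add(double delta, double seconds) {
    window_.emplace_back(delta, seconds);
    while (window_.size() > n_)
      window_.pop_front();  // forget samples older than the last n
  }

  // Total delta divided by total time across the window; 0 if empty.
  double rate() const {
    double delta = 0, seconds = 0;
    for (const auto& s : window_) {
      delta += s.first;
      seconds += s.second;
    }
    return seconds > 0 ? delta / seconds : 0.0;
  }

 private:
  std::size_t n_;
  std::deque<std::pair<double, double>> window_;
};
```

With a window of 2, matching the default mentioned above, each reported rate reflects the two most recent maps only.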

mon/PGMap: report IO rates

This does not appear to be very accurate; probably the stat values we're
displaying are not being calculated correctly.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3f6837e022176ec4b530219043cf12e009d1ed6e)

mon/PGMap: report recovery rates

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 208b02a748d97378f312beaa5110d8630c853ced)

mon/PGMap: include timestamp

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 76e9fe5f06411eb0e96753dcd708dd6e43ab2c02)

osd: track recovery ops in stats

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a2495f658c6d17f56ea0a2ab1043299a59a7115b)

osd_types: add recovery counts to object_sum_stats_t

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4aea19ee60fbe1106bdd71de2d172aa2941e8aab)

v0.56.2

cls_rbd, cls_rgw: use PRI*64 when printing/logging 64-bit values

caused segfaults in 32-bit build

Fixes: #3961
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e253830abac76af03c63239302691f7fac1af381)
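
For "cls_rbd, cls_rgw: use PRI*64 when printing/logging 64-bit values", the portable pattern looks like the sketch below. `format_size` is a hypothetical helper; `PRIu64` comes from `<cinttypes>`.

```cpp
#include <cinttypes>
#include <cstdio>
#include <string>

// Format a 64-bit value portably. PRIu64 expands to the right length
// modifier for the platform, so the argument size always matches the
// format string on both 32-bit and 64-bit builds.
std::string format_size(std::uint64_t size) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "size=%" PRIu64, size);
  return buf;
}
```

A hard-coded `%lu` or `%ld` reads the wrong number of bytes from the argument list where `long` is 32 bits, which can corrupt the arguments that follow.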

init-ceph: make ulimit -n be part of daemon command

ulimit -n from 'max open files' was being set only on the machine
running /etc/init.d/ceph. It needs to be added to the commands to
start the daemons, and run both locally and remotely.

Verified by examining /proc/<pid>/limits on local and remote hosts

Fixes: #3900
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Loïc Dachary <loic@dachary.org>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit 84a024b647c0ac2ee5a91bacdd4b8c966e44175c)

mon: OSDMonitor: only share osdmap with up OSDs

Try to share the map with a randomly picked OSD; if the picked OSD is
not 'up', then try to find the nearest 'up' OSD in the map by doing a
backward and a forward linear search on the map -- this would be O(n) in
the worst case scenario, as we only do a single iteration starting on the
picked position, incrementing and decrementing two different iterators
until we find an appropriate OSD or we exhaust the map.

Fixes: #3629
Backport: bobtail

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3610e72e4f9117af712f34a2e12c5e9537a5746f)
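
The outward search described in "mon: OSDMonitor: only share osdmap with up OSDs" can be sketched over a plain vector of up/down flags. `nearest_up` is a hypothetical function, indices stand in for OSD ids, and preferring the forward neighbour on a tie is an assumption.

```cpp
#include <vector>

// Starting from `start`, look outward in both directions for the nearest
// index whose entry is up. Returns -1 if no entry is up or `start` is
// out of range. Worst case visits each entry once: O(n).
int nearest_up(const std::vector<bool>& up, int start) {
  int n = static_cast<int>(up.size());
  if (start < 0 || start >= n)
    return -1;
  for (int d = 0; d < n; ++d) {
    int fwd = start + d;
    int back = start - d;
    if (fwd < n && up[fwd])
      return fwd;
    if (back >= 0 && up[back])
      return back;
  }
  return -1;
}
```

In use, `start` would be the randomly picked position and the return value the OSD that actually receives the map.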

utime: fix narrowing conversion compiler warning in sleep()

Fix compiler warning:
./include/utime.h: In member function 'void utime_t::sleep()':
./include/utime.h:139:50: warning: narrowing conversion of
'((utime_t*)this)->utime_t::tv.utime_t::<anonymous struct>::tv_sec' from
'__u32 {aka unsigned int}' to '__time_t {aka long int}' inside { } is
ill-formed in C++11 [-Wnarrowing]
./include/utime.h:139:50: warning: narrowing conversion of
'((utime_t*)this)->utime_t::tv.utime_t::<anonymous struct>::tv_nsec' from
'__u32 {aka unsigned int}' to 'long int' inside { } is
ill-formed in C++11 [-Wnarrowing]

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit 014fc6d6c1c68e2e3ad0117d08c4e46e4030d49e)
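
The warning quoted in "utime: fix narrowing conversion compiler warning in sleep()" arises when unsigned 32-bit fields are placed directly inside a braced initializer for a `timespec`. One common remedy is an explicit cast per field, sketched below. Whether the commit used this exact form is not stated, and `to_timespec` is a hypothetical helper.

```cpp
#include <cstdint>
#include <time.h>

// Build a timespec from unsigned 32-bit fields. Brace-initializing the
// struct directly from the fields is a narrowing conversion in C++11,
// so each field is converted explicitly first.
struct timespec to_timespec(std::uint32_t sec, std::uint32_t nsec) {
  struct timespec ts = { static_cast<time_t>(sec), static_cast<long>(nsec) };
  return ts;
}
```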

rgw: fix crash when missing content-type in POST object

Fixes: #3941
This fixes a crash when handling S3 POST request and content type
is not provided.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit f41010c44b3a4489525d25cd35084a168dc5f537)

ReplicatedPG: make_snap_collection when moving snap link in snap_trimmer

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 88956e3186798058a1170803f8abfc0f3cf77a07)

ReplicatedPG: correctly handle new snap collections on replica

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 9e44fca13bf1ba39dbcad29111b29f46c49d59f7)

mon: Elector: reset the acked leader when the election finishes and we lost

Failure to do so will mean that we will always ack the same leader during
an election started by another monitor. This had been working so far
because we were still acking the existing leader if he was supposed to
still be the leader; or we were acking a new potential leader; or we
would eventually fall behind on an election and start a new election
ourselves, thus resetting the previously acked leader. While this wasn't
something that mattered much until now, the timechecks code stumbled into
this tiny issue and was failing hard at completing a round because there
wouldn't be a reset before the election started -- timechecks are bound
to election epochs.

Fixes: #3854
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit c54781618569680898e77e151dd7364f22ac4aa1)

rbd: fix bench-write infinite loop

I/O was continuously submitted as long as there were few enough ops in
flight. If the number of 'threads' was high, or caching was turned on,
there would never be that many ops in flight, so the loop would continue
indefinitely. Instead, submit at most io_threads ops per offset.

Fixes: #3413
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
(cherry picked from commit d81ac8418f9e6bbc9adcc69b2e7cb98dd4db6abb)