Kefu Chai [Wed, 17 Aug 2016 06:45:18 +0000 (23:45 -0700)]
rocksdb: disable tcmalloc if disabled
rocksdb/configure.ac did not support --with-tcmalloc before, and the
updated rocksdb commit adds the '--with-tcmalloc' option, so let's
pick it up.
Ken Dreyer [Thu, 11 Aug 2016 23:11:41 +0000 (17:11 -0600)]
doc: fix by-parttypeuuid in ceph-disk(8) nroff
Commit 221efb0b893adbfd7a19df171cf967fee87afcc7 altered the rST source
for the ceph-disk man page. In Hammer, we also have to modify the nroff
sources, because static copies of the generated man pages are stored in
Git.
Fixes: http://tracker.ceph.com/issues/15867
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Nathan Cutler [Thu, 11 Aug 2016 20:58:33 +0000 (22:58 +0200)]
Merge pull request #9741 from SUSE/wip-16343-hammer
hammer: boost uuid makes valgrind complain
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
By calling the reweight_by_utilization() method, we aim for a more even
utilization among all osds. To achieve this, we decrease the weights of
osds which are currently overloaded, and try to increase the weights of osds
which are currently underloaded when possible.
However, we can't do this all at once, in order to avoid massive pg migration
between osds. Thus we introduce a max_osds limit to smooth the progress.
The problem here is that we have sorted the utilization of all osds in descending
order and we always try to decrease the weights of the most overloaded osds,
since they are most likely to encounter a nearfull/full transition soon, but
we won't increase the weights of the most underloaded (least utilized) osds
at the same time, which I think is not quite reasonable.
Actually, the best thing would probably be to iterate over the low and high osds
in parallel, and do the ones that are furthest from the average first.
Resolved by picking the lambda implementation.
NOTE: Because hammer does not support C++11, the lambda functionality from the
current master has been moved into the "Sorter" function object.
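A minimal sketch of the "furthest from the average first" selection described above, written with a function object instead of a lambda so that it stays within C++03. The names OsdUtil and pick_osds_to_reweight are hypothetical, and the body of Sorter is an assumption about what such a function object could look like; it is not the code from the commit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pair: (osd id, utilization as a fraction of capacity).
typedef std::pair<int, double> OsdUtil;

// Function object standing in for a C++11 lambda: orders osds by how far
// their utilization deviates from the average, largest deviation first.
struct Sorter {
  double average;
  explicit Sorter(double avg) : average(avg) {}
  bool operator()(const OsdUtil& l, const OsdUtil& r) const {
    return std::fabs(l.second - average) > std::fabs(r.second - average);
  }
};

// Returns the ids of at most max_osds osds to reweight. Overloaded and
// underloaded osds are considered together, furthest from the average first.
std::vector<int> pick_osds_to_reweight(std::vector<OsdUtil> osds,
                                       std::size_t max_osds) {
  std::vector<int> picked;
  if (osds.empty())
    return picked;
  double sum = 0.0;
  for (std::size_t i = 0; i < osds.size(); ++i)
    sum += osds[i].second;
  const double average = sum / osds.size();
  std::stable_sort(osds.begin(), osds.end(), Sorter(average));
  for (std::size_t i = 0; i < osds.size() && picked.size() < max_osds; ++i)
    picked.push_back(osds[i].first);
  return picked;
}
```

With a single ordering by distance from the average, the max_osds budget is shared between both ends, so a badly underloaded osd is no longer starved by the overloaded ones.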
Kefu Chai [Thu, 12 May 2016 12:28:11 +0000 (20:28 +0800)]
osd: reset session->osdmap if session is not waiting for a map anymore
we should release the osdmap reference once we are done with it,
otherwise we might need to wait very long to update that reference with
a newer osdmap ref. this appears to be an OSDMap leak: it is held by a
quiet OSD::Session forever.
the osdmap is not reset in OSD::session_notify_pg_create(), because its
only caller is wake_pg_waiters(), which will call
dispatch_session_waiting() later. and dispatch_session_waiting() will
check the session->osdmap, and will also reset the osdmap if
session->waiting_for_pg.empty().
Sage Weil [Thu, 10 Mar 2016 14:50:07 +0000 (09:50 -0500)]
log: do not repeat errors to stderr
If we get an error writing to the log, log it only once to stderr.
This avoids generating, say, 72 GB of ENOSPC errors in
teuthology.log when /var/log fills up.
Conflicts:
src/log/Log.cc (drop m_uid and m_gid which are not used in hammer;
order of do_stderr, do_syslog, do_fd conditional blocks is reversed in
hammer; drop irrelevant speed optimization code from 5bfe05aebfefdff9022f0eb990805758e0edb1dc)
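A minimal sketch of the report-once idea from the "log: do not repeat errors to stderr" commit. WriteErrorReporter and note_result are hypothetical names, and re-arming the report after a successful write is an assumption made for illustration; the commit message only states that the error is logged once.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <ostream>

// Hypothetical helper: reports a log-write failure to a stream only the
// first time it happens in a run of failures.
class WriteErrorReporter {
  bool reported_;
  std::ostream& out_;
public:
  explicit WriteErrorReporter(std::ostream& out)
    : reported_(false), out_(out) {}

  // Call with the outcome of each write attempt: 0 on success, an errno
  // value on failure. Returns true if a message was emitted.
  bool note_result(int err) {
    if (err == 0) {
      reported_ = false;  // assumption: a later failure is reported again
      return false;
    }
    if (reported_)
      return false;       // already reported, stay quiet
    reported_ = true;
    out_ << "problem writing to log: " << std::strerror(err) << std::endl;
    return true;
  }
};
```

Passing the stream in (std::cerr in practice) keeps the helper easy to exercise without touching the real stderr.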
mds: only open non-regular inode with mode FILE_MODE_PIN
ceph_atomic_open() in kernel client does lookup and open at the same
time. So it can open a symlink inode with mode CEPH_FILE_MODE_WR.
Opening a symlink inode with mode CEPH_FILE_MODE_WR triggers an assertion
in Locker::check_inode_max_size().
Multi-delete is triggered by a query parameter on POST, but there are
multiple valid ways of representing it, and Ceph should accept ANY way
that has the query parameter set, regardless of its value or the absence
of a value.
This caused the RubyGem aws-sdk-v1 to break, and has been present since
multi-delete was first added in commit 0a1f4a97da, for the bobtail
release.
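A minimal sketch of a presence-only check of the kind described above: the parameter counts as set whether it appears bare, with an empty value, or with any value. has_query_param is a hypothetical helper, not RGW's parser; it assumes the query string is passed without the leading '?' and does no URL-decoding.

```cpp
#include <string>

// Returns true if `query` contains the parameter `name` in any of the forms
// "name", "name=" or "name=value". Fields are separated by '&'.
bool has_query_param(const std::string& query, const std::string& name) {
  if (name.empty())
    return false;
  std::string::size_type pos = 0;
  while (pos <= query.size()) {
    std::string::size_type end = query.find('&', pos);
    if (end == std::string::npos)
      end = query.size();
    const std::string field = query.substr(pos, end - pos);
    // The key is everything before the first '=', or the whole field.
    const std::string key = field.substr(0, field.find('='));
    if (key == name)
      return true;
    pos = end + 1;
  }
  return false;
}
```

Comparing the whole key, rather than searching for a substring, keeps parameters such as "deleted" or "undelete" from matching "delete".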
Fixes: http://tracker.ceph.com/issues/16618
Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>
(cherry picked from commit a7016e1b67e82641f0702fda4eae799e953063e6)
shun-s [Tue, 28 Jun 2016 07:30:16 +0000 (15:30 +0800)]
ReplicatedBackend: delete one useless op->mark_started as there are two in ReplicatedBackend::sub_op_modify_impl
delete one mark_started event, as there are two identical op->mark_started calls in ReplicatedBackend::sub_op_modify_impl
Fixes: http://tracker.ceph.com/issues/16572
Signed-off-by: shun-s <song.shun3@zte.com.cn>
rgw: Set Access-Control-Allow-Origin to an asterisk if allowed in a rule
Before this patch, RGW would respond with the Origin sent by the client in the request
if a wildcard/asterisk was specified as a valid Origin.
This patch makes sure we respond with a header like this:
Access-Control-Allow-Origin: *
This way a resource can be used on different Origins by the same browser, and that browser
will treat the response as valid for any Origin.
We also keep in mind that when Authorization is sent by the client, different rules apply.
In the case of Authorization we may not respond with an asterisk, but we do have to
add the Vary header with 'Origin' as its value to let the browser know that for different
Origins it has to perform a new request.
More information: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS
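A minimal sketch of the header choice described above, covering only the case where the matching rule lists a wildcard Origin. wildcard_rule_origin_headers and the Header/Headers types are hypothetical names for illustration and do not reflect RGW's actual functions.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Header;
typedef std::vector<Header> Headers;

// Builds the CORS origin headers for a request that matched a rule with a
// wildcard ("*") Origin.
Headers wildcard_rule_origin_headers(const std::string& request_origin,
                                     bool request_has_authorization) {
  Headers h;
  if (!request_has_authorization) {
    // No Authorization: answer with the asterisk itself.
    h.push_back(Header("Access-Control-Allow-Origin", "*"));
  } else {
    // Authorization was sent: the asterisk is not allowed, so echo the
    // Origin and tell the browser the response varies by Origin.
    h.push_back(Header("Access-Control-Allow-Origin", request_origin));
    h.push_back(Header("Vary", "Origin"));
  }
  return h;
}
```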
Conflicts:
src/mon/Monitor.cc (the signature of Monitor::reply_command()
changed a little bit in master, so adapt the
commit to work with the old method)
Conflicts:
src/rgw/rgw_user.cc The "if (op_state.will_purge_keys())" block was
later changed to "always purge all associated keys" by e7b7e1afc7a81c3f97976f7442fbdc5118b532b5 - keep the hammer version
Jianpeng Ma [Tue, 14 Apr 2015 01:11:58 +0000 (09:11 +0800)]
osd: Fix endless ec pg repair when meeting an unrecoverable object.
In repair_object, if bad_peer is a replica, it doesn't add soid to
MissingLoc for an ec pool. If the ec pool has so many bad replicas
that the object can't be recovered, the later recovery never ends.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit d51806f5b330d5f112281fbb95ea6addf994324e)
Sage Weil [Mon, 24 Aug 2015 18:51:47 +0000 (14:51 -0400)]
uuid: use boost::random::random_device
The boost mt code uses uninitialized memory for extra randomness,
which is a bad idea in general but more importantly makes valgrind
unhappy. Use /dev/urandom instead.
Unfortunately this introduces a link time dependency.. meh!
Signed-off-by: Rohan Mars <code@rohanmars.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 62bfc7a1ab1587e81ed3bff0ddfbb1aa69d1c299)
Conflicts:
debian/control (trivial resolution)
src/common/Makefile.am (trivial resolution)
src/common/blkdev.cc (no get_device_by_uuid() function in hammer)
Yehuda Sadeh [Thu, 26 Mar 2015 00:35:40 +0000 (17:35 -0700)]
rgw: identify racing writes when using copy-if-newer
When copying an object from a different zone, and copy-if-newer is
specified, if the final meta write is canceled, check whether the
destination that was created is actually newer than our mtime,
otherwise retry.
Yehuda Sadeh [Wed, 23 Mar 2016 01:14:57 +0000 (18:14 -0700)]
rgw: convert plain object to versioned (with null version) when removing
Fixes #15243
When removing a plain null-versioned object (one created before bucket versioning
was enabled), we need to convert the bucket index representation to a versioned one.
This is needed so that all the versioning mechanics play together.
Conflicts:
src/rgw/rgw_rados.cc
- hammer is missing get_zone() API from which log_data can be
obtained. Needed to fall back to zone_public_config
structure in bucket_index_unlink_instance() definition.
- olh_tag string parameter added to
bucket_index_unlink_instance() definition.
src/rgw/rgw_rados.h
- olh_tag string parameter added to
bucket_index_unlink_instance() declaration.
Yehuda Sadeh [Thu, 5 May 2016 21:02:25 +0000 (14:02 -0700)]
rgw: handle stripe transition when flushing final pending_data_bl
Fixes: http://tracker.ceph.com/issues/15745
When complete_writing_data() is called, if pending_data_bl is not empty
we still need to handle stripe transition correctly. If pending_data_bl
has more data than we can allow in the current stripe, move to the next one.
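A minimal sketch of the stripe transition described above, modelling the pending data as a string. split_across_stripes, used_in_stripe and stripe_size are hypothetical names; the sketch assumes fixed-size stripes and is not the RGW implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Splits `pending` so that the first piece fits in what is left of the
// current stripe and every further piece starts a new stripe.
std::vector<std::string> split_across_stripes(const std::string& pending,
                                              std::size_t used_in_stripe,
                                              std::size_t stripe_size) {
  std::vector<std::string> pieces;
  if (stripe_size == 0)
    return pieces;
  std::size_t room =
      used_in_stripe < stripe_size ? stripe_size - used_in_stripe : 0;
  std::size_t off = 0;
  while (off < pending.size()) {
    if (room == 0)
      room = stripe_size;  // current stripe is full: move to the next one
    const std::size_t n = std::min(room, pending.size() - off);
    pieces.push_back(pending.substr(off, n));
    off += n;
    room -= n;
  }
  return pieces;
}
```

For example, 8 bytes of pending data with 2 bytes already used in a 4-byte stripe are written as pieces of 2, 4 and 2 bytes.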
Sage Weil [Fri, 6 May 2016 13:09:43 +0000 (09:09 -0400)]
osdc/Objecter: upper bound watch_check result
This way we always return a safe upper bound on the amount of time
since we did a check. Among other things, this prevents us from
returning a value of 0, which is confusing.
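A minimal sketch of one way to turn an elapsed time into a safe upper bound that is never 0. age_upper_bound_ms is a hypothetical helper taking the age in nanoseconds; it illustrates the idea only and is not claimed to be the Objecter code.

```cpp
#include <stdint.h>

// Converts an elapsed time in nanoseconds to whole milliseconds.
// Integer division truncates, so adding 1 gives a value that is strictly
// greater than the true age and therefore never 0.
uint64_t age_upper_bound_ms(uint64_t age_ns) {
  return age_ns / 1000000 + 1;
}
```

A check made 0.5 ms ago is then reported as 1 ms old rather than 0, so a caller cannot mistake it for "no time has passed".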