zqkkqz [Fri, 7 Aug 2015 02:49:45 +0000 (10:49 +0800)]
Common/Thread: pthread_attr_destroy(thread_attr) when done with it
When a thread attributes object is no longer required, it should be destroyed using the
pthread_attr_destroy() function. Destroying a thread attributes object has no effect on threads that were created using that object.
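A minimal sketch of the pattern described above, assuming a plain POSIX environment. The helper name run_with_stack_size and the thread body sketch_entry are hypothetical and are not Ceph's Thread::try_create(); the sketch only shows a stack-allocated pthread_attr_t being destroyed as soon as pthread_create() has returned.

```cpp
#include <pthread.h>
#include <cstddef>

// Trivial thread body for the sketch: stores 42 through its argument.
static void* sketch_entry(void* arg) {
  *static_cast<int*>(arg) = 42;
  return nullptr;
}

// Hypothetical helper (not Ceph code): creates a thread with an explicit
// stack size and joins it. Returns 0 on success or a pthread error code.
int run_with_stack_size(std::size_t stacksize, int* out) {
  pthread_attr_t attr;  // lives on the stack, no malloc/free needed
  int r = pthread_attr_init(&attr);
  if (r != 0)
    return r;

  if (stacksize != 0) {
    r = pthread_attr_setstacksize(&attr, stacksize);
    if (r != 0) {
      pthread_attr_destroy(&attr);
      return r;
    }
  }

  pthread_t tid;
  r = pthread_create(&tid, &attr, sketch_entry, out);
  // The attributes object is no longer required once pthread_create()
  // has returned; destroying it has no effect on the created thread.
  pthread_attr_destroy(&attr);
  if (r != 0)
    return r;

  return pthread_join(tid, nullptr);
}
```

Calling run_with_stack_size(1 << 20, &value) should leave 42 in value once the thread has been joined.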
Piotr Dałek [Fri, 17 Jul 2015 10:43:52 +0000 (12:43 +0200)]
Thread.cc: remove malloc/free pair
There's no need to malloc pthread_attr_t in Thread::try_create();
it can live on the stack, since it is freed in the same function. This
reduces pressure on the memory manager.
Boris Ranto [Fri, 15 Aug 2014 17:34:27 +0000 (19:34 +0200)]
Fix -Wno-format and -Werror=format-security options clash
This causes a build failure in the latest Fedora builds. ceph_test_librbd_fsx adds the -Wno-format cflag, but the default AM_CFLAGS already contain -Werror=format-security. Previous releases tolerated this, but the latest Fedora rawhide no longer does. ceph_test_librbd_fsx builds fine without -Wno-format on x86_64, so there is likely no need for the flag anymore.
Loic Dachary [Tue, 3 Feb 2015 15:14:23 +0000 (16:14 +0100)]
ceph.spec.in: junit always except for EPEL 6
The package was renamed a long time ago (around the Fedora 15
timeframe). The "junit4" name is only relevant for EPEL 6. For EPEL 7
and Fedora 20, the "junit" package has "Provides: junit4". And most
recently, in the junit package that ships in Fedora 21 and 22, the
package maintainer dropped the old Provides: line.
Sage Weil [Wed, 29 Apr 2015 19:34:25 +0000 (12:34 -0700)]
mon: prevent pool with snapshot state from being used as a tier
If we add a pool with snap state as a tier the snap state gets clobbered
by OSDMap::Incremental::propogate_snaps_to_tiers(), and may prevent OSDs
from starting. Disallow this.
Conflicts:
qa/workunits/cephtool/test.sh
properly co-exist with "# make sure we can't create an ec pool tier"
src/mon/OSDMonitor.cc
properly co-exist with preceding "if (tp->ec_pool())"
(The changes to both files would have applied cleanly if
https://github.com/ceph/ceph/pull/5389 had not been merged first.)
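The check that "mon: prevent pool with snapshot state from being used as a tier" describes could look roughly like the sketch below. PoolSketch, check_tier_candidate, the snaps field and the -ENOTEMPTY return value are assumptions made for illustration; only removed_snaps is a name that appears in this log, and the actual OSDMonitor code may differ.

```cpp
#include <cerrno>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the pool fields relevant to the check.
struct PoolSketch {
  std::map<uint64_t, std::string> snaps;  // assumed: pool snapshots by id
  std::set<uint64_t> removed_snaps;       // ids of removed snapshots
};

// Returns 0 if the pool may be used as a tier, or -ENOTEMPTY (an assumed
// error code) if it carries snapshot state that would be clobbered.
int check_tier_candidate(const PoolSketch& tp) {
  if (!tp.snaps.empty() || !tp.removed_snaps.empty())
    return -ENOTEMPTY;
  return 0;
}
```

A freshly created pool passes the check, while a pool with any entry in removed_snaps is rejected before its snapshot state can be overwritten.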
Conflicts:
src/test/librados/TestCase.cc
for it of type ObjectIterator:
- use it->first instead of it->get_oid()
- use it->second instead of it->get_locator()
Sage Weil [Fri, 14 Nov 2014 06:32:20 +0000 (22:32 -0800)]
osd/ReplicatedPG: allow whiteout deletion with IGNORE_CACHE flag
If the client specifies IGNORE_CACHE, allow a regular DELETE to zap a
whiteout. Expand test case to verify this works.
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 34e4d24)
Conflicts:
src/test/librados/tier.cc
replaced NObjectIterator -> ObjectIterator
replaced cache_ioctx.nobjects_begin -> cache_ioctx.objects_begin
replaced cache_ioctx.nobjects_end -> cache_ioctx.objects_end
replace it->get_oid() with it->first for it of type ObjectIterator
Sage Weil [Wed, 23 Sep 2015 14:58:01 +0000 (10:58 -0400)]
mon/Elector: do a trivial write on every election cycle
Currently we already do a small write when the *first* election in
a round happens (to update the election epoch). If the backend
happens to fail while we are already in the midst of elections,
however, we may continue to call elections without verifying we
are still writeable.
Conflicts:
src/ceph_fuse.cc
src/ceph_syn.cc
src/libcephfs.cc
src/librados/RadosClient.cc
src/mds/MDSUtility.cc
src/mon/MonClient.cc
src/test/mon/test_mon_workloadgen.cc
- different arguments to Messenger::create() in firefly
Several callers create messengers using exactly the same parameters:
- reading the ms type from cct that is also passed in
- a default entity_name_t::CLIENT
- the default features
Additionally, the nonce should be randomized and not depend on
e.g. pid, as it does in several callers now. Clients running in
containers can easily have pid collisions, leading to hangs, so
randomize the nonce in this simplified constructor rather than
duplicating that logic in every caller.
Daemons have meaningful entity_name_ts, and monitors currently depend
on using 0 as a nonce, so make this simple constructor
client-specific.
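A minimal sketch of drawing the nonce from a random source rather than the pid, as the message above argues for. make_client_nonce is a hypothetical helper and std::random_device is an assumption for illustration; the actual Messenger code may obtain its randomness differently.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: build a 64-bit nonce from the system entropy source
// instead of deriving it from the pid, so that clients in different
// containers that share a pid are very unlikely to collide.
uint64_t make_client_nonce() {
  std::random_device rd;
  uint64_t hi = rd();  // random_device yields unsigned int values
  uint64_t lo = rd();
  return (hi << 32) | (lo & 0xffffffffu);
}
```

Two clients that both run as pid 1 in separate containers would get the same pid-derived nonce, but almost certainly different values from make_client_nonce().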
Sage Weil [Mon, 19 Jan 2015 00:49:20 +0000 (16:49 -0800)]
mon: handle case where mon_globalid_prealloc > max_global_id
This triggers with the new larger mon_globalid_prealloc value. It didn't
trigger on the existing cluster I tested on because it already had a very
large max.
Sage Weil [Sun, 18 Jan 2015 18:39:25 +0000 (10:39 -0800)]
mon: change mon_globalid_prealloc to 10000 (from 100)
100 ids (i.e., 100 session authentications) can be consumed quite quickly if
the monitor is being queried by the CLI via scripts or on a large cluster,
especially if the propose interval is long (many seconds). These live in
a 64-bit value and are only "lost" if we have a mon election before they
are consumed, so there's no real risk here.
Backport: giant, firefly
Reviewed-by: Joao Eduardo Luis <joao@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 1d1215fe5f95c2bafee5b670cdae1353104636a0)
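The "no real risk" argument for the larger mon_globalid_prealloc can be checked with a little arithmetic. The sketch below uses hypothetical helper names and assumes the worst case in which every election discards one full, unused batch of ids.

```cpp
#include <cstdint>
#include <limits>

// Number of preallocation rounds a 64-bit id space can absorb
// for a given batch size.
constexpr uint64_t prealloc_rounds(uint64_t prealloc) {
  return std::numeric_limits<uint64_t>::max() / prealloc;
}

// Ids lost if each of `elections` elections discards one full batch
// (assumed worst case).
constexpr uint64_t worst_case_lost(uint64_t prealloc, uint64_t elections) {
  return prealloc * elections;
}
```

With a batch size of 10000, prealloc_rounds(10000) is 1844674407370955, so even a thousand elections that each waste a full batch (10,000,000 ids in total) consume a negligible fraction of the id space.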
Greg Farnum [Tue, 16 Jun 2015 15:13:41 +0000 (08:13 -0700)]
qa: update to newer Linux tarball
This should make newer gcc releases happier in their default configuration.
kernel.org is now distributing tarballs as .xz files so we change to that
as well when decompressing (it is supported by Ubuntu Precise so we should
be all good).
Sage Weil [Wed, 3 Jun 2015 18:57:34 +0000 (14:57 -0400)]
upstart: limit respawn to 3 in 30 mins (instead of 5 in 30s)
It may take tens of seconds to restart each time, so 5 in 30s does not stop
the crash on startup respawn loop in many cases. In particular, we'd like
to catch the case where the internal heartbeats fail.
This should be enough for all but the most sluggish of OSDs and capture
many cases of failure shortly after startup.
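The reasoning behind the new limit can be illustrated with a deliberately simplified model. respawn_limit_trips is a hypothetical function, not part of upstart; it assumes the daemon crashes at a fixed cadence and treats the limit as reached when at least `count` respawns fit in the window. Upstart's exact boundary handling may differ.

```cpp
// Simplified model: a daemon needs cycle_s seconds per restart-and-crash
// cycle. The limit trips when at least `count` respawns fit into a window
// of interval_s seconds.
bool respawn_limit_trips(int count, int interval_s, int cycle_s) {
  if (cycle_s <= 0)
    return true;  // instantaneous crash loop always trips the limit
  int respawns_in_window = interval_s / cycle_s;
  return respawns_in_window >= count;
}
```

Under this model a daemon that takes 10 seconds per cycle never trips "5 in 30s" (only 3 respawns fit), but does trip "3 in 30 mins", and so does a daemon that takes a full minute per cycle.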
Jason Dillaman [Mon, 10 Aug 2015 23:10:19 +0000 (19:10 -0400)]
WorkQueue: add/remove_work_queue methods now thread safe
These methods were not acquiring the ThreadPool lock when
manipulating the work_queue collection. This was causing
occasional crashes within librbd when opening and closing
images.
Fixes: #12662
Backport: hammer, firefly
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 3e18449b01c1ab78d1bbfc1cf111aa9bdbef7b1f)
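A minimal sketch of the locking that the WorkQueue fix describes, using std::mutex. ThreadPoolSketch and WorkQueueSketch are hypothetical stand-ins, not Ceph's ThreadPool; the point is only that every access to the work queue collection takes the same lock.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

struct WorkQueueSketch {};  // hypothetical placeholder for a work queue

class ThreadPoolSketch {
  std::mutex lock_;  // the same lock worker threads would hold while scanning
  std::vector<WorkQueueSketch*> work_queues_;

public:
  void add_work_queue(WorkQueueSketch* wq) {
    std::lock_guard<std::mutex> l(lock_);
    work_queues_.push_back(wq);
  }

  void remove_work_queue(WorkQueueSketch* wq) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = std::find(work_queues_.begin(), work_queues_.end(), wq);
    if (it != work_queues_.end())
      work_queues_.erase(it);
  }

  std::size_t num_work_queues() {
    std::lock_guard<std::mutex> l(lock_);
    return work_queues_.size();
  }
};
```

Without the lock, a worker iterating over the collection while another thread adds or removes a queue could observe a vector in the middle of a reallocation or erase.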
Samuel Just [Thu, 27 Aug 2015 18:08:33 +0000 (11:08 -0700)]
PG::handle_advance_map: on_pool_change after handling the map change
Otherwise, the is_active() checks in the hitset code can erroneously
return true, firing off repops stamped with the new epoch which then get
cleared in the map change code. The filestore callbacks then pass the
interval check and call into a destroyed repop structure.
mon: MonitorDBStore: make get_next_key() work properly
We introduced a significant bug with 2cc7aee, when we fixed issue #11786.
Although that patch would fix the problem described in #11786, we
managed to not increment the iterator upon returning the current key.
This would have the iterator iterating over the same key, forever and
ever.
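The get_next_key() bug described above comes down to returning the current key without advancing. The sketch below uses a hypothetical wrapper over std::map rather than the real MonitorDBStore iterator, whose interface may differ.

```cpp
#include <map>
#include <string>

// Hypothetical iterator wrapper over an ordered key/value store.
class KeyIteratorSketch {
  std::map<std::string, std::string>::const_iterator it_, end_;

public:
  explicit KeyIteratorSketch(const std::map<std::string, std::string>& m)
    : it_(m.begin()), end_(m.end()) {}

  bool valid() const { return it_ != end_; }

  // Returns the current key and then advances. Without the increment, a
  // caller looping on valid() would see the same key forever.
  std::string get_next_key() {
    if (!valid())
      return std::string();
    std::string key = it_->first;
    ++it_;
    return key;
  }
};
```

A loop of the form `while (it.valid()) use(it.get_next_key());` terminates only because each call moves the iterator forward.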
Sage Weil [Sun, 9 Aug 2015 14:46:10 +0000 (10:46 -0400)]
osd/PGLog: dirty_to is inclusive
There are only two callers of mark_dirty_to who do not pass max,
and they are both in the merge_log extending tail path. In that
case, we want to include the last version specified in the log
writeout. Fix the tail extending code to always specify the
last entry added, inclusive.
Josh Durgin [Mon, 24 Aug 2015 22:40:39 +0000 (15:40 -0700)]
config: skip lockdep for intentionally recursive md_config_t lock
lockdep can't handle recursive locks, resulting in false positive
reports for certain set_val_or_die() calls, like via
md_config_t::parse_argv() passed "-m".
Loic Dachary [Thu, 13 Aug 2015 11:47:24 +0000 (13:47 +0200)]
osd: trigger the cache agent after a promotion
When a proxy read happens, the object promotion is done in parallel. The
agent_choose_mode function must be called to reconsider the situation
to protect against the following scenario:
* proxy read
* agent_choose_mode finds no object exists and the agent
goes idle
* object promotion happens
* the agent does not reconsider and eviction does not happen
although it should
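The scenario above can be replayed with a heavily simplified model. AgentSketch and its callbacks are hypothetical; only agent_choose_mode is a name taken from the message, and the real function considers far more state than an object count.

```cpp
#include <cstddef>

// Hypothetical, heavily simplified model of a cache-tier agent.
struct AgentSketch {
  std::size_t num_objects = 0;
  bool idle = true;

  // Re-evaluates whether the agent has any work to do.
  void agent_choose_mode() { idle = (num_objects == 0); }

  // A proxy read is served before the object is present in the cache, so
  // the agent may still conclude that it has nothing to do.
  void on_proxy_read() { agent_choose_mode(); }

  // Promotion completes later. Re-running agent_choose_mode() here is the
  // step the fix adds; without it the agent would stay idle.
  void on_promotion_finished() {
    ++num_objects;
    agent_choose_mode();
  }
};
```

After on_proxy_read() the agent is idle; after on_promotion_finished() it is active again and eviction can be considered.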
Yehuda Sadeh [Wed, 26 Aug 2015 21:34:30 +0000 (14:34 -0700)]
rgw: init some manifest fields when handling explicit objs
Fixes: #11455
When dealing with an old manifest that has explicit objs, we also
need to set the head size and head object correctly so that
code that relies on this info doesn't break.
Jason Dillaman [Fri, 21 Aug 2015 15:32:39 +0000 (11:32 -0400)]
Objecter: pg_interval_t::is_new_interval needs pgid from previous pool
When increasing the pg_num of a pool, an assert would fail since the
calculated pgid seed would be for the pool's new pg_num value instead
of the previous pg_num value.
Fixes: #10399
Backport: infernalis, hammer, firefly
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit f20f7a23e913d09cc7fc22fb3df07f9938ddc144)
Conflicts: (hobject_t sort order not backported, trivial resolution)
src/osdc/Objecter.cc
src/osdc/Objecter.h
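Why the pgid seed depends on which pg_num is used can be seen from a small sketch modeled on Ceph's ceph_stable_mod(). The helper names stable_mod, pg_num_mask and seed_for are hypothetical, and the hash value 13 is only an example input.

```cpp
// Modeled on ceph_stable_mod(): maps x into [0, b), where bmask is the
// smallest (2^n - 1) mask that covers b.
inline int stable_mod(int x, int b, int bmask) {
  if ((x & bmask) < b)
    return x & bmask;
  return x & (bmask >> 1);
}

// Smallest (2^n - 1) mask with 2^n >= pg_num.
inline int pg_num_mask(int pg_num) {
  int mask = 1;
  while (mask < pg_num)
    mask <<= 1;
  return mask - 1;
}

// Seed of the placement group that an object hash maps to for a pg_num.
inline int seed_for(int hash, int pg_num) {
  return stable_mod(hash, pg_num, pg_num_mask(pg_num));
}
```

For hash 13, seed_for(13, 8) is 5 while seed_for(13, 16) is 13, so an interval check for the old map that uses the new pg_num looks at a different placement group than the one the object belonged to.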
Dan van der Ster [Tue, 18 Nov 2014 14:51:46 +0000 (15:51 +0100)]
ceph-disk: don't change the journal partition uuid
We observe that the new /dev/disk/by-partuuid/<journal_uuid>
symlink is not always created by udev when reusing a journal
partition. Fix by not changing the uuid of a journal partition
in this case; we can reuse the existing uuid (and
journal_symlink) instead. We also now assert that the symlink
exists before further preparing the OSD.
Fixes: #10146
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Tested-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 29eb1350b4acaeabfe1d2b19efedbce22641d8cc)
Dan van der Ster [Mon, 29 Sep 2014 11:20:10 +0000 (13:20 +0200)]
ceph-disk: set guid if reusing a journal partition
When reusing a journal partition (e.g. /dev/sda2) we should set a
new partition guid and link it correctly with the OSD. This way
the journal is symlinked by its persistent name and ceph-disk list
works correctly.
Xinze Chi [Fri, 3 Jul 2015 10:27:13 +0000 (18:27 +0800)]
mon/PGMonitor: bug fix pg monitor get crush rule
When some rules have been deleted, the index into the crush->rules array
does not always equal the pool's crush_ruleset.
Fixes: #12210
Reported-by: Ning Yao <zay11022@gmail.com>
Signed-off-by: Xinze Chi <xmdxcxz@gmail.com>
(cherry picked from commit 498793393c81c0a8e37911237969fba495a3a183)
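A minimal sketch of looking a rule up by its ruleset id rather than using the id as an array index. The flat vector layout and the name find_rule_slot are assumptions for illustration; the real crush map stores rule structures, not bare ids.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: rule_rulesets[i] holds the ruleset id of the rule in
// slot i, or -1 if that slot's rule has been deleted.
// Returns the slot whose rule belongs to `ruleset`, or -1 if there is none.
int find_rule_slot(const std::vector<int>& rule_rulesets, int ruleset) {
  for (std::size_t i = 0; i < rule_rulesets.size(); ++i) {
    if (rule_rulesets[i] == ruleset)
      return static_cast<int>(i);
  }
  return -1;
}
```

With slots {0, -1, 5}, a pool whose crush_ruleset is 5 resolves to slot 2, whereas indexing the array with 5 directly would read past its end.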
wuxingyi [Wed, 11 Mar 2015 09:34:40 +0000 (17:34 +0800)]
rgw/logrotate.conf: Rename service name
The service name for the Ceph RADOS gateway was changed to "ceph-radosgw".
The previous service name, "radosgw", would cause the reload to fail
and eventually make it impossible to write any log data to the log file.
Conflicts:
qa/workunits/cephtool/test.sh
no "# make sure we can't clobber snapshot state" tests in firefly
src/mon/OSDMonitor.cc
no tp->removed_snaps.empty() in firefly