Sage Weil [Fri, 30 Apr 2010 18:05:39 +0000 (11:05 -0700)]
mds: fix trim_dentry on dentry under unlinked dir
We can get a dentry that is trimmable (e.g. null) under a new unlinked dir,
which has no subtree. This will only happen on the auth. In that case,
having no container is harmless--it's only needed for replicas.
This fixes the following crash:
mds/MDCache.cc: In function 'void MDCache::trim_dentry(CDentry*, std::map<int, MCacheExpire*, std::less<int>, std::allocator<std::pair<const int, MCacheExpire*> > >&)':
mds/MDCache.cc:4797: FAILED assert(con)
1: (MDCache::trim(int)+0x214) [0x4ffbc4]
2: (MDS::tick()+0x4c1) [0x48f3b1]
3: (SafeTimer::EventWrapper::finish(int)+0x269) [0x683a89]
4: (Timer::timer_entry()+0x819) [0x685909]
5: (Timer::TimerThread::entry()+0xd) [0x47528d]
6: (Thread::_entry_func(void*)+0x7) [0x48a8a7]
7: /lib/libpthread.so.0 [0x7ffe62356fc7]
8: (clone()+0x6d) [0x7ffe615835ad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
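A minimal sketch of the idea behind this fix, assuming a toy model rather than the real MDCache code: the container subtree is only consulted when the dentry is a replica, so a null container on the auth is tolerated. `ToySubtree`, `ToyDentry`, and `toy_trim_dentry` are hypothetical names for illustration; the actual patch is not shown in this log.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for the MDS types; these are not the Ceph classes.
struct ToySubtree {
  int auth_rank;  // rank of the MDS that is authoritative for this subtree
};

struct ToyDentry {
  bool is_auth;                 // true if this MDS holds the auth copy
  const ToySubtree* container;  // null for a dentry under a new unlinked dir
};

// Trims a dentry. Returns true if an expire notice was queued for another MDS.
// expire_counts maps an MDS rank to the number of expire notices queued for it.
bool toy_trim_dentry(const ToyDentry& dn, std::map<int, int>& expire_counts) {
  if (dn.is_auth) {
    // Auth copy: there is nobody to notify, so a missing container is harmless.
    return false;
  }
  // Replica: the expire notice goes to the subtree's auth, so the container
  // must exist here.
  assert(dn.container != nullptr);
  ++expire_counts[dn.container->auth_rank];
  return true;
}
```

In this model, trimming an auth dentry with a null container simply returns without touching the expire map, which is the case that previously hit the assert.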
Sage Weil [Fri, 30 Apr 2010 17:46:17 +0000 (10:46 -0700)]
osd: fix pg_to_acting_osds() calculation to consider pg_temp
The OSD was using pg_to_up_acting, the client uses pg_to_acting, and their
calculations of 'acting' weren't consistent because the latter did not
consider pg_temp.
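As a rough illustration of why both paths must consult pg_temp, here is a toy map in which the acting set is the pg_temp entry when one exists and the up set otherwise. `ToyOsdMap` and its members are hypothetical; the real OSDMap computes the up set through placement logic rather than a lookup table.

```cpp
#include <map>
#include <vector>

using PgId = int;                // hypothetical simplification of a pg id
using OsdSet = std::vector<int>; // ordered list of OSD ids

struct ToyOsdMap {
  std::map<PgId, OsdSet> up;       // assumed precomputed placement result
  std::map<PgId, OsdSet> pg_temp;  // temporary overrides of the acting set

  OsdSet up_osds(PgId pg) const {
    auto it = up.find(pg);
    if (it == up.end())
      return OsdSet{};
    return it->second;
  }

  // Acting set: a non-empty pg_temp entry wins, otherwise fall back to up.
  // Every caller that wants "acting" should go through this one function so
  // the OSD and the client cannot disagree.
  OsdSet acting_osds(PgId pg) const {
    auto it = pg_temp.find(pg);
    if (it != pg_temp.end() && !it->second.empty())
      return it->second;
    return up_osds(pg);
  }
};
```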
Sage Weil [Thu, 29 Apr 2010 17:27:02 +0000 (10:27 -0700)]
mon: m->get_session() may return null if session has closed
This happens because the session close clears connection->priv. We need to
check at each site anyway, either for a null session or for
session->closed, so check for a null session.
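The check can be sketched as follows, assuming simplified stand-in types in which closing a session just resets the connection's private pointer. `ToySession`, `ToyConnection`, `ToyMessage`, and `toy_dispatch` are hypothetical and only mirror the shape of the problem, not the monitor's actual classes.

```cpp
#include <memory>

// Hypothetical stand-ins; not the Ceph messenger or monitor types.
struct ToySession {
  bool closed = false;
};

struct ToyConnection {
  std::shared_ptr<ToySession> priv;  // cleared when the session is closed
};

struct ToyMessage {
  ToyConnection* con;

  // May return null once the session has been closed.
  ToySession* get_session() const {
    return con ? con->priv.get() : nullptr;
  }
};

// Handles a message only if its session still exists.
// Returns false (and does nothing) when the session is gone.
bool toy_dispatch(const ToyMessage& m, int& handled) {
  ToySession* s = m.get_session();
  if (!s)
    return false;  // session closed: drop the message instead of dereferencing
  ++handled;
  return true;
}
```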
Sage Weil [Fri, 23 Apr 2010 18:07:16 +0000 (11:07 -0700)]
mds: skip client snap notification on unlink
This is cheating a bit, but should be harmless. Basically, we split off the
snaprealm when we unlink to keep the hierarchy vs snaprealm invariants
intact. But we don't really care if the client does so, so we skip the
client_snap notifications.
That means the client will leave unlinked inodes in the realm they were
in at the time of unlink. I'm pretty sure that won't cause problems
later.
Sage Weil [Wed, 21 Apr 2010 22:23:46 +0000 (15:23 -0700)]
filestore: default to writeahead journal, and no btrfs snaps
At least until btrfs snap deletion no longer requires a full commit (i.e.
until each commit cycle no longer does one commit for the snap creation AND
another for the old snap deletion).
Sage Weil [Tue, 20 Apr 2010 17:47:10 +0000 (10:47 -0700)]
osd: don't capture SIGINT/SIGTERM; journal and/or btrfs snaps are sufficient.
We used to do this to avoid corrupting the filestore, but since we can now
either roll forward with the writeahead journal OR roll back using btrfs
snaps, this is useless. It wasn't a full solution anyway.
Sage Weil [Tue, 20 Apr 2010 03:29:14 +0000 (20:29 -0700)]
mds: clone dentry for multiversion dir if linkage is changing...
...even if the inode itself doesn't need to be cowed. In particular, we
call pre_cow_old_inode(), so the inode frequently doesn't need to be cowed,
but the dentry does when we are, say, unlinking a directory.
Jim Schutt [Fri, 16 Apr 2010 22:25:49 +0000 (16:25 -0600)]
autoconf: Fix detection of sync_file_range.
Without this patch, on CentOS 5.4 ./configure reports that
sync_file_range is missing, but HAVE_SYNC_FILE_RANGE ends
up being defined in src/acconfig.h anyway.
Compile tested on CentOS 5.4 (which does not have sync_file_range(2)
in distro glibc) and Fedora 11 (which does).
Sage Weil [Fri, 16 Apr 2010 20:51:58 +0000 (13:51 -0700)]
mds: xlock versionlock on rename if witnesses
This ensures that we don't pipeline dentry linkage updates when there
are witnesses. That can cause problems because replicas don't see
projected dentry linkage info, and will get confused when they look at
the replica of the srcdn and it's, say, NULL and not srci.
Revert "osd: replace the ALLOW_MESSAGES_FROM macro with use of OSDCaps functions"
This reverts commit 9ceb3be9a62f7b777b23a4d65b1cc302ec834e6b.
This will be coming back in another form shortly, but apparently we have
more connections without sessions on the OSD.
Sage Weil [Fri, 9 Apr 2010 22:44:50 +0000 (15:44 -0700)]
osdmap: move pool names into first part of encoding; add version to second part
This gives the kclient access to pool names while still letting it ignore
the osd_info_t data. It also makes it possible to add more fields to the
second half later without breaking the kclient.
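A toy sketch of the layout idea, assuming a made-up wire format: the first part carries the pool names, and the second part starts with a version byte so it can grow later. A reader that only understands the first part stops after the names and never looks at the rest. `ToyMap`, `toy_encode`, `toy_decode_names`, and the `extra` field are hypothetical; this is not the OSDMap encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ToyMap {
  std::vector<std::string> pool_names;  // part 1: needed by every reader
  std::uint32_t extra = 0;              // part 2: hypothetical later addition
};

// Appends a 32-bit value in little-endian byte order.
static void put_u32(std::string& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}

// Reads a little-endian 32-bit value; throws std::out_of_range on short input.
static std::uint32_t get_u32(const std::string& in, std::size_t& pos) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    unsigned char b = static_cast<unsigned char>(in.at(pos++));
    v |= static_cast<std::uint32_t>(b) << (8 * i);
  }
  return v;
}

std::string toy_encode(const ToyMap& m) {
  std::string out;
  // Part 1: count, then length-prefixed pool names.
  put_u32(out, static_cast<std::uint32_t>(m.pool_names.size()));
  for (const std::string& name : m.pool_names) {
    put_u32(out, static_cast<std::uint32_t>(name.size()));
    out += name;
  }
  // Part 2: version byte first, so new fields can be appended later.
  out.push_back(static_cast<char>(1));
  put_u32(out, m.extra);
  return out;
}

// A reader that only understands part 1 and ignores everything after it.
std::vector<std::string> toy_decode_names(const std::string& in) {
  std::size_t pos = 0;
  std::vector<std::string> names;
  std::uint32_t count = get_u32(in, pos);
  for (std::uint32_t i = 0; i < count; ++i) {
    std::uint32_t len = get_u32(in, pos);
    names.push_back(in.substr(pos, len));
    pos += len;
  }
  return names;
}
```

Because the names-only reader never parses past part 1, appending fields to part 2 (and bumping its version byte) does not affect it.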