git-server-git.apps.pok.os.sepia.ceph.com Git - teuthology.git/commit
misc: temporary fix for "No space left on device" errors 1335/head
author Nathan Cutler <ncutler@suse.com>
Wed, 16 Oct 2019 21:04:42 +0000 (23:04 +0200)
committer Nathan Cutler <ncutler@suse.com>
Thu, 17 Oct 2019 12:19:47 +0000 (14:19 +0200)
commit b2f1bca2fbd15fde497974ead4fae0750a62299b
tree 5cf1d65ee8211e1c630c877b3d9f26cab7fcf549
parent 40070d6e92386872210079847964cef4fabd84b8
misc: temporary fix for "No space left on device" errors

41a13eca480e38cfeeba7a180b4516b90598c39b fixed a longstanding bug that the lab
was relying on. Before the bug was fixed, the get_wwn_id_map function was doing:

    try:
        r = remote.run(
            args=[
                'ls',
                '-l',
                '/dev/disk/by-id/wwn-*',
            ],
            stdout=StringIO(),
        )
        stdout = r.stdout.getvalue()
    except Exception:
        log.info('Failed to get wwn devices! Using /dev/sd* devices...')
        return dict((d, d) for d in devs)

The bug was that "remote.run" was putting single quotes around the string
"/dev/disk/by-id/wwn-*" because it wasn't enclosed in Raw(...). The single
quotes kept the shell from expanding the glob, so the command failed and
triggered the except clause 100% of the time.
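
The quoting effect can be reproduced outside teuthology. The sketch below is
not teuthology's code: build_command is a hypothetical helper, and the
standard library's shlex.quote stands in for whatever quoting remote.run
applies to arguments that are not wrapped in Raw(...).

```python
import shlex


def build_command(args, raw=()):
    """Join args into one shell command line.

    Hypothetical helper: every argument is shell-quoted unless it is
    listed in `raw`, which here plays the role of Raw(...).
    """
    return ' '.join(a if a in raw else shlex.quote(a) for a in args)


glob = '/dev/disk/by-id/wwn-*'

# Quoted: the shell passes the literal string to ls, no glob expansion.
quoted = build_command(['ls', '-l', glob])

# Raw: the shell expands the glob before ls runs.
unquoted = build_command(['ls', '-l', glob], raw={glob})

print(quoted)
print(unquoted)
```

With the glob quoted, ls is asked for a file literally named "wwn-*", which
does not exist, so the command exits non-zero.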

The fix in 41a13eca480e38cfeeba7a180b4516b90598c39b caused the command to start
succeeding, which caused execution to continue. As a result, MON stores and
OSDs started getting created on the wrong devices, and tests that were
previously succeeding started to fail due to "No space left on device".

In short, the wwn devices on today's smithis are not big enough for
/var/lib/ceph.

This commit "fixes the fix" by dropping the dead code and always returning the
value that qa/tasks/ceph.py has come to expect.
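
As a rough sketch of the resulting behavior, and not the actual diff, the
function reduces to the identity mapping that the old except clause returned.
The (remote, devs) signature is an assumption inferred from the names used in
the snippet quoted above.

```python
def get_wwn_id_map(remote, devs):
    """Map each device to itself.

    Sketch only: `remote` is kept for call compatibility (assumed
    signature) and is no longer used to look up wwn devices.
    """
    return dict((d, d) for d in devs)


# Example with made-up device names.
print(get_wwn_id_map(None, ['/dev/sdb', '/dev/sdc']))
```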

Fixes: https://tracker.ceph.com/issues/42313
Signed-off-by: Nathan Cutler <ncutler@suse.com>
teuthology/misc.py