A workaround for a powerpc saltstack bug

From PDP/Grid Wiki
Jump to navigationJump to search

With the arrival of the first Power9 systems as data movers in our storage setup we faced the many quirks and glitches that a whole new architecture brings to our configuration management. While the commit log of the salt/reclass repository gives testimony to that effort, it is necessary to point out an issue that actually lives upstream and has been open and unresolved for a year in spite of being nearly trivial.

There are several things going on here and the journey to find both cause and a decent workaround are worthwhile to share in a public space such as this wiki. The issue is transient as it should eventually be resolved and a software update will bring this to close but for the moment we have to remain aware that there is a sizable workaround in place.

First of all when you look at a Power9 system installed with CentOS7 observe the output of uname -a:

Linux hooikanon-01.nikhef.nl 4.14.0-115.el7a.0.1.ppc64le #1 SMP Sun Nov 25 20:39:39 GMT 2018 ppc64le ppc64le ppc64le GNU/Linux

Whether it is the machine hardware name, processor type or hardware platform, the common theme is 'ppc64le' meaning powerPC, 64 bit, Little-endian.

Now ask the salt system for its grains and it will report this nugget:

[root@hooikanon-01 ~]# salt-call grains.get osarch
local:
    powerpc64le

Notice that this string is objectively different from what we saw above. A deep dive in the python code reveals that this is derived from the output of

rpm --eval %{_host_cpu}

and already a very questionable choice for a method by which to derive OS information. In this case it is literally asking a package management system what it knows about the CPU, but that is not even true: that particular macro in the RPM packaging system is set at build time (i.e. when the RPM software itself was built) to whatever the output of automake told it. Automake is of course part of that venerable but dated GNU build system that still covers the majority of open source software but is by no means the only one.

So to recap: whatever Automake's idea of the host cpu was on the build machine where the rpm software was compiled now decides what the osarch grain should be on the runtime machine where the rpm command is run.

(At this moment I am not even sure what this grain would be when the machine is installed with Debian.)

The mismatch between the two strings leads to at least one problem when installing packages. Investigation of the code showed how salt needs to treat the architecture of installed packages in a special way on x86_64 systems, because these allow the installation of 32-bit software alongside the 64-bit versions. So a package like libc is different from libc.i386. So when dealing with lists of installed packages, salt will internally tack on the architecture label, but strip it off unless it needs special care (like i386). Guess what it will try to match the rpm architecture with?

It is the osarch grain. And because that does not match the architecture field in the rpm packages, none of the os-dependent software packages get their architectures stripped. The PyYAML packages is reported as PyYAML.ppc64le so when you ask if PyYAML is installed it will tell you no.

If you think that may be innocent, it really isn't because when calling on the pkg.installed state to install packages, first yum is asked to install packages (and yum will say, ok, sure, done) but then this is checked by asking whether the packages are installed and because of the above bug the answer is no except for architecture independent packages.

That is the basis for the pull request that maps powerpc64le onto ppc64le when resolving the package name.

Here we find ourselves in a tricky position. Because the pull request is not currently part of any release of saltstack, we cannot upgrade to solve the issue (yet). And because the fix is in the file /usr/lib/python2.7/site-packages/salt/utils/pkg/rpm.py we cannot override its functionality through the modules extension system of saltstack.

But what we can do is override the yumpkg.py module that ultimately uses the utils/pkg/rpm.py file, by taking the offending code from the latter and sticking it into our own version of the former. With a little bit of rerouting calls in Python we can make sure we use the correct version of resolve_name(). And for the moment this fix lives in /srv/salt/env/.../files/_modules/yumpkg.py. It should be removed once the fix comes out in a later saltstack release.

By the way, there is one more thing wrong with the get_osarch() code; it takes the output of subprocess.Popen() but does not take into account the fact that the output has a newline at the end. So a spurious newline is present at the end of the osarch value and that messes with the mapping in the fix. So our version of the fix has an extra strip() to get rid of the newline.


Diff between the original yumpkg.py and the temporary fix

--- /tmp/origyumpkg.py	2019-08-15 16:04:57.814288965 +0200
+++ /tmp/newyumpkg.py	2019-08-15 16:04:42.894251743 +0200
@@ -14,6 +14,13 @@
     automatically in place of YUM in Fedora 22 and newer.
 
 
+# NOTE: this is a local override to pull some of the salt.utils.pkg.rpm
+# functions into this module directly and modify them; there is no
+# way to override salt.utils.pkg.rpm directly.
+#
+# This concerns the resolve_name function which has a bug dealing
+# with powerpc architectures (ppc64le) and package name resolution.
+
 # Import python libs
 from __future__ import absolute_import, print_function, unicode_literals
 import contextlib
@@ -24,6 +31,7 @@
 import os
 import re
 import string
+import sys
 
 # pylint: disable=import-error,redefined-builtin
 # Import 3rd-party libs
@@ -69,7 +77,6 @@
 # Define the module's virtual name
 __virtualname__ = 'pkg'
 
-
 def __virtual__():
     
     Confine this module to yum based systems
@@ -88,6 +95,52 @@
         return __virtualname__
     return (False, "Module yumpkg: no yum based system detected")
 
+# These functions are lifted from salt.utils.pkg.rpm
+ARCHMAP = {'powerpc64le': 'ppc64le'}
+
+def resolve_name(name, arch, osarch=None):
+    
+    Resolve the package name and arch into a unique name referred to by salt.
+    For example, on a 64-bit OS, a 32-bit package will be pkgname.i386.
+    
+    if osarch is None:
+        osarch = salt.utils.pkg.rpm.get_osarch()
+
+    # fix a whitespace bug in the get_osarch function
+    osarch = osarch.strip()
+    
+    if not salt.utils.pkg.rpm.check_32(arch, osarch) and arch not in (ARCHMAP.get(osarch, osarch), 'noarch'):
+        name += '.{0}'.format(arch)
+    return name
+
+def parse_pkginfo(line, osarch=None):
+    
+    A small helper to parse an rpm/repoquery command's output. Returns a
+    pkginfo namedtuple.
+    
+    try:
+        name, epoch, version, release, arch, repoid, install_time = line.split('_|-')
+    # Handle unpack errors (should never happen with the queryformat we are
+    # using, but can't hurt to be careful).
+    except ValueError:
+        return None
+
+    name = resolve_name(name, arch, osarch)
+    if release:
+        version += '-{0}'.format(release)
+    if epoch not in ('(none)', '0'):
+        version = ':'.join((epoch, version))
+
+    if install_time not in ('(none)', '0'):
+        install_date = datetime.datetime.utcfromtimestamp(int(install_time)).isoformat() + "Z"
+        install_date_time_t = int(install_time)
+    else:
+        install_date = None
+        install_date_time_t = None
+
+    return salt.utils.pkg.rpm.pkginfo(name, version, arch, repoid, install_date, install_date_time_t)
+
+# End of additions
 
 def _strip_headers(output, *args):
     if not args:
@@ -173,6 +226,7 @@
     names are long) retrieving the name, version, etc., and return a list of
     pkginfo namedtuples.
     
+    log.debug("Using override module yumpkg")
     cur = {}
     keys = itertools.cycle(('name', 'version', 'repoid'))
     values = salt.utils.itertools.split(_strip_headers(output))
@@ -184,9 +238,10 @@
             except ValueError:
                 cur['name'] = value
                 cur['arch'] = osarch
-            cur['name'] = salt.utils.pkg.rpm.resolve_name(cur['name'],
-                                                          cur['arch'],
-                                                          osarch)
+            # use resolve_name locally instead of salt.utils.pkg.rpm.resolve_name
+            cur['name'] = resolve_name(cur['name'],
+                                       cur['arch'],
+                                       osarch)
         else:
             if key == 'version':
                 # Suppport packages with no 'Release' parameter
@@ -669,7 +724,8 @@
                                      python_shell=False,
                                      output_loglevel='trace')
         for line in output.splitlines():
-            pkginfo = salt.utils.pkg.rpm.parse_pkginfo(
+            # use local parse_pkginfo instead of salt.utils.pkg.rpm.parse_pkginfo
+            pkginfo = parse_pkginfo(
                 line,
                 osarch=__grains__['osarch']
             )