2016-06-02

Performance difference between Spacewalk's API and almost direct SQL

Imagine you want to get a list of hosts registered to your Spacewalk, ideally with the groups they belong to, and you want to do it repeatedly, so performance matters. Let's measure it.

I have Spacewalk 2.4 on a virtual system with 2 CPUs and 4 GB of RAM (virtual, really? Not ideal for performance measurement, I know) and I have created 1000 system profiles on it. There are two ways to get the data out of the server: the command-line utility spacewalk-report inventory (it has to run on the system running Spacewalk and queries the database directly) or the system API (it can be run from anywhere, but the data has to travel from the database through Spacewalk's Java stack and into XML, which is then transferred to you over the network). An API script to measure this can look like the following (note that it does not print the data it obtains):

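#!/usr/bin/env python
# Python 2 script: times 100 full passes over all systems via the API.

import xmlrpclib
import time

server = xmlrpclib.Server('http://<fqdn>/rpc/api')
key = server.auth.login('<user>', '<pass>')
for i in range(100):
    before = time.time()
    # One call returns the ID and profile name of every system.
    systems = server.system.listUserSystems(key)
    for s in systems:
        # Two more calls per system; the results are fetched but not used.
        detail = server.system.getNetwork(key, s['id'])
        groups = server.system.listGroups(key, s['id'])
    after = time.time()
    # Columns: number of systems, start time, end time, duration in seconds.
    print "%s %s %s %s" % (len(systems), before, after, after-before)
server.auth.logout(key)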
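
The spacewalk-report side of the comparison can be timed in a similar way. What follows is only a minimal sketch, not necessarily how the numbers in this post were obtained: the helper name time_command is made up for illustration, and it assumes the script runs on the Spacewalk server as a user allowed to run spacewalk-report.

```python
import subprocess
import time


def time_command(cmd, repetitions=100):
    """Run cmd repeatedly; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(repetitions):
        before = time.time()
        # Capture the output instead of printing it, so the terminal
        # does not influence the measurement.
        subprocess.check_output(cmd)
        durations.append(time.time() - before)
    return durations


# Hypothetical usage on the Spacewalk server:
#   durations = time_command(['spacewalk-report', 'inventory'])
#   print("average: %.2f s" % (sum(durations) / len(durations)))
```

Note that this also counts the start-up time of the spacewalk-report process itself in every repetition, which is probably what you want if you plan to call it from a script anyway.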

Here are my results (averages from 100 repetitions, run right after a spacewalk-service restart):

| Method | Average duration | Note |
|---|---|---|
| spacewalk-report inventory | 1.4 seconds | Needs to run directly on the Spacewalk server |
| API with system.listUserSystems() only | 0.9 seconds | Provides system ID and profile name only (the profile name is not necessarily the hostname) |
| API with system.listUserSystems() and system.getNetwork() | 23.8 seconds | Gives you IP and hostname |
| API with system.listUserSystems() and system.getDetails() | 27.5 seconds | Gives plenty of info, including hostname, but not groups |
| API with system.listUserSystems(), system.getNetwork() and system.listGroups() | 52.4 seconds | Finally, this one gathers hostname and system groups |
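
To see where the time goes, the averages can be turned into a rough cost per API call. This is back-of-the-envelope arithmetic only: it assumes all 1000 profiles are returned by system.listUserSystems() and that the extra time of each variant is spent entirely in the additional per-system calls.

```python
# Averages from the results above: seconds per full pass over 1000 profiles.
SYSTEMS = 1000
list_only = 0.9
with_network = 23.8
with_details = 27.5
with_network_and_groups = 52.4


def per_call_ms(total, baseline, calls=SYSTEMS):
    """Average cost of one additional API call, in milliseconds."""
    return (total - baseline) / float(calls) * 1000.0


print("getNetwork: %.1f ms per call" % per_call_ms(with_network, list_only))
print("getDetails: %.1f ms per call" % per_call_ms(with_details, list_only))
print("listGroups: %.1f ms per call"
      % per_call_ms(with_network_and_groups, with_network))
```

Under those assumptions each per-system call costs somewhere between 20 and 30 ms, so the total grows linearly with the number of systems, while the single system.listUserSystems() call stays cheap.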

So it depends on what you want to achieve and how often you want to run the script. Also, in the API script case, you have to keep the login (or several logins, if you need to run it for multiple organizations) somewhere. Fortunately, you can use a read-only API user for this.
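
One possible way to keep those logins out of the script itself is a small INI file with one section per organization. This is just a sketch for Python 3: the file layout, the key names (url, user, password) and the helper load_logins are my own invention for illustration, not something Spacewalk prescribes.

```python
import configparser

# Assumed file layout (hypothetical), one section per organization:
#
#   [org1]
#   url = http://<fqdn>/rpc/api
#   user = <read-only user>
#   password = <pass>


def load_logins(path):
    """Return a list of (org, url, user, password) tuples, one per section."""
    # interpolation=None keeps '%' characters in passwords intact.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as handle:
        parser.read_file(handle)
    return [
        (org, parser[org]['url'], parser[org]['user'], parser[org]['password'])
        for org in parser.sections()
    ]
```

The measuring script could then loop over the returned tuples and log in once per organization. The file should be readable only by the user running the script.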