When continuous builds go mad...

On a build server far, far away, a long-running build had been blocking our continuous integration for hours. One step in our build was waiting forever, and nobody on the team had access to the build server to kill the mad process.

As luck would have it, we did have access to the shell scripts that run during the build and deployment process. With a little shell twiddling:

#!/bin/zsh
# ¸.·´¯`·.´¯`·.¸¸.·´¯`·.¸><(((º>
# .¸¸.·´¯`·.¸><(((º>
#

function alarm_handler {
  echo "[$$] handling alarm"
  kill_subshell
  exit 1
}

function kill_subshell {
  echo "[$$] killing subshell...${PIDS:-$!}"
  kill ${PIDS:-$!}
  if [ $? -eq 0 ]
  then
    echo "[$$] longrunning subshell killed"
  else
    echo "[$$] longrunning subshell ${PIDS} *not* killed due to error"
  fi
}

function set_timer {
  SLEEP_TIME=${1:-10}
  if [ ${SLEEP_TIME} -gt 0 ]
  then
    echo "[$$] timer set to ${SLEEP_TIME} seconds"
    # start the timer in the background; it sends ALRM to this script when it expires
    (echo -n "[$$]" && sleep ${SLEEP_TIME} && kill -ALRM $$) &
    TIMER_PID=$!
  fi
}

function unset_timer {
  kill ${TIMER_PID}
  if [ $? -eq 0 ]
  then
    echo "[$$] timer successfully unset"
  fi
}

echo "[$$] running a timer test"

trap alarm_handler ALRM
set_timer ${1:-15}

# create a long running subshell ;-)
(for i in 5 10 30 ; do sleep ${i} ; echo -n "[$!] " ; date +"%H:%M:%S" ; done) & PIDS="$!"

echo "[$$] waiting for $!..."
wait $!

unset_timer

echo "[$$] finished job in time"
exit 0

...and the continuous build was happily ever after.
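
If you want to reuse the trick, the same idea can be folded into one small helper. This is only a sketch, not the script we actually ran: run_with_timeout is a made-up name, the watchdog kills the job directly instead of going through an ALRM trap, and it sticks to plain POSIX shell so it should also run outside zsh.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of the original script): runs COMMAND and
# kills it if it is still running after SECONDS seconds.
run_with_timeout() {
  timeout_secs=$1
  shift

  # start the real job in the background and remember its PID
  "$@" &
  job_pid=$!

  # watchdog: sleep, then kill the job if it is still around;
  # its output is discarded so it never holds the caller's pipes open
  ( sleep "$timeout_secs" && kill "$job_pid" ) >/dev/null 2>&1 &
  watchdog_pid=$!

  # block until the job exits on its own or is killed by the watchdog
  wait "$job_pid"
  rc=$?

  # the job is gone, so the watchdog is no longer needed
  # (its inner sleep may linger until it expires, which is harmless)
  kill "$watchdog_pid" 2>/dev/null
  wait "$watchdog_pid" 2>/dev/null

  # 0 if the job finished in time and succeeded, non-zero otherwise
  return "$rc"
}
```

Calling run_with_timeout 15 sleep 60 should come back after roughly 15 seconds with a non-zero exit status, while run_with_timeout 15 true returns 0 straight away.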


Cover photo by Brodie Vissers from Burst