Continuous Delivery in depth #2
The not so lean side
Remember issue #1 published in the summer? We are back with the next part in the series, wearing the hat of Pitfall Harry to look at some of the issues we have come across and how these have impacted our day-to-day job. We also include some tips for overcoming them.
First things first: Jenkins' pipelines are an awesome improvement over basic Jenkins functionality, allowing us to easily build complex continuous delivery flows with extreme reusability and maintainability.
Having said this, pipelines are code. And code is written by human beings. Human beings make mistakes. Such errors are reflected as software defects and execution failures.
This post will take a look at some of the defects, pitfalls and limitations of our (amazing) Jenkins’ pipelines, defining some possible workarounds.
Hung runs
The timeout step is our current last line of defence against hung runs. If a run usually averages X minutes, a run taking 3X is, with almost 100% certainty, hung. And as nobody likes zombies crawling around their house, better to dispose of them. The snag: timeout does not always manage to abort code running inside withX blocks.
As around 90% of our methods run enclosed in withX elements, we hit this limitation regularly, forcing us to abort runs by hand.
Workaround: Wait and cross your fingers, aborting manually when needed.
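Even so, we keep the guard in place. A minimal sketch of the pattern (the steps are standard pipeline steps; the stage contents and command are ours):

```groovy
// Wrap the whole unit of work in timeout() so that, when the abort does
// fire, the hung run is disposed of instead of blocking an executor forever.
timeout(time: 30, unit: 'MINUTES') {
    withEnv(['TARGET_ENV=staging']) {   // one of the withX wrappers mentioned above
        sh './deploy.sh'                // illustrative command
    }
}
```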
NonSerializableException
Jenkins serializes objects at several points so that runs can survive restarts or node failures. This is a really nice feature for overcoming temporary service loss: the state written to disk can be deserialized and interrupted actions restarted (the Checkpoints plugin presumably builds on the same mechanism). But whenever a live, non-serializable object is in scope and Jenkins tries to serialize it, BOOM!
A NonSerializableException is thrown, stopping the job run. Pretty big impact, right?
Workaround: Watch out for iterators, matchers, JsonSlurpers, or any other class that does not implement the Java Serializable interface. It can be hard to tell which class you are actually working with, since Groovy can be untyped and inner classes may be hidden (JsonSlurper uses a non-Serializable LazyMap internally), so testing will lend a hand.
@NonCPS annotation is useful to overcome this, as no serialization will happen on the annotated method. It is not a 100% solution as you cannot invoke pipeline steps inside such methods.
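A regex Matcher is a classic offender. A minimal sketch of the @NonCPS workaround (method name and pattern are illustrative):

```groovy
// The non-Serializable java.util.regex.Matcher never escapes this method,
// so Jenkins has nothing problematic left to serialize. As noted above,
// no pipeline steps may be invoked inside a @NonCPS method.
@NonCPS
String extractVersion(String text) {
    def matcher = (text =~ /version=(\S+)/)  // Matcher: not Serializable
    return matcher ? matcher[0][1] : null    // only a plain String escapes
}
```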
No testing harness
We need tests to ensure the implemented functionality is properly coded, corner cases are handled and no NonSerializableException is thrown. But – unfortunately – there is no testing harness for complex pipelines.
No tests, no fun. To ensure solid code, testing is a must.
Workaround: Testing needs to be done manually, using Jenkins’ script console, auxiliary jobs or by replaying jobs. Another option is to limit the possible scope of error and to let the run continue.
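One shape such an auxiliary job can take: a throwaway pipeline that loads the library branch under review and exercises a function directly. A hedged sketch (library name, branch and helper are hypothetical):

```groovy
// Auxiliary smoke-test job: load the feature branch of the shared library
// and assert on one of its functions before merging.
@Library('our-shared-lib@feature/my-change') _

node {
    stage('Smoke-test library') {
        def info = buildInfo.parse('app-1.2.3')  // hypothetical helper under test
        echo "parsed version: ${info.version}"
        assert info.version == '1.2.3'
    }
}
```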
Two left hands
Humans write pipeline libraries, and as there is no way to build the pipeline locally, invalid code can end up in the SCM tool. Badly written code then fails at parse time, so the affected runs won't continue.
Workaround: Eat your own dogfood. PRs for each functionality, with its own pipeline. Static code analysis and code reviews will help.
Node connection loss
Remember the serialization behind the NonSerializableException pitfall? It exists precisely to let Jenkins survive restarts and node reconnections. A problem arises when the agent is cloud-provided: such agents get a variable name (usually containing a generated hash) that uniquely identifies each one, which prevents a job from resuming on another agent, even when the original one is no longer alive.
Our environment is stacked on top of a Docker Swarm, so this happens rather frequently.
Workaround: Put up with it, or propose a change to the Docker plugin. CloudBees have solved this snag in their Jenkins Operations Center.
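Until then, one mitigation we can sketch is to request agents by label rather than pinning work to a concrete agent name, and to retry the allocation so a replacement cloud agent can pick the work up (label and commands are illustrative):

```groovy
// Address agents by label, never by their generated name, and retry the
// whole node block so a fresh Swarm-provisioned agent can take over.
retry(2) {
    node('docker-swarm-agent') {
        checkout scm
        sh './build.sh'
    }
}
```

Note that this reruns the work from scratch on the new agent; it does not resume mid-step.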
GString inception
Pipelines as code are great (have I not already said this?). They allow us to implement complex structures. One of ours was a Map<String, GString>, where each GString was composed of several lazily evaluated GStrings (who said Inception?), each carrying yet more lazily evaluated variables (AKA another GString). This structure, probably tripping over a CPS issue, led to the wrong evaluation of the top-level GString. GStrings are objects holding two arrays, strings and values; on a call or toString() invocation those arrays get blended, producing the resulting String object.
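The two-array layout can be seen in plain Groovy, outside Jenkins (a minimal demonstration, not pipeline code):

```groovy
// A GString keeps its literal fragments and its lazy values apart until
// toString() interleaves them into an ordinary String.
def name = 'world'
def g = "hello ${name}!"
assert g instanceof GString
assert g.strings.toList() == ['hello ', '!']  // literal fragments
assert g.values.toList() == [name]            // lazily evaluated parts
assert g.toString() == 'hello world!'         // arrays blended together
```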
This is probably the toughest issue we had to overcome, and it almost put our project on hold: the data structures we needed were heavily impacted by this mix of lazy evaluation and nested GStrings.
Workaround: The first immediate idea would be to NonCPS the affected code: WRONG! The returned string was still incorrect, cut at the first GString string element (from the strings array).
We ended up walking the GStringImpl object ourselves, placing its strings and values arrays back together in the proper order. This became known as the famous gStringyHackifier method.
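The original method is not reproduced in the post; a hedged sketch of the interleaving idea (name borrowed from above, implementation ours):

```groovy
// Force full evaluation by manually blending the two arrays, producing
// a plain (and safely serializable) String.
@NonCPS
String gStringyHackifier(GString g) {
    def sb = new StringBuilder()
    g.strings.eachWithIndex { String s, int i ->
        sb << s
        if (i < g.values.length) {
            sb << String.valueOf(g.values[i])  // evaluates the lazy part now
        }
    }
    return sb.toString()
}
```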
These are a few of the most remarkable issues we have had to deal with. We share them as a reference in case you run into the same ones, together with tips on how to keep on pipelining.
Automation fanatic, continuous tasks (inspection, testing, delivery) evangelist and perpetual new-knowledge addict.