Michael Nygard is a professional programmer and architect with over 15 years of experience, and the author of “Release It! Design and Deploy Production-Ready Software”.
Having encountered this book on many “Top Books a Software Engineer Must Read” lists, I thought I’d give it a try.
This will not be an in-depth review, as there are already many out there, but rather a summary of ideas that will stay with me after this first read.
Michael’s book is all about the challenges of running software systems successfully in production, and although it focuses on high-availability, large-scale distributed systems, many of the topics it discusses apply to any software system.
The book covers every element and layer of software engineering I can think of: management, delivery, testing, design, coding, front-end, back-end, deployment, etc. Rather than trying to answer all the questions, it tries to point you in the right direction.
This is certainly a book that I expect to come back to in the future, as the ideas and scenarios that will stay with me after this first read are only the ones which I can relate to, because I have encountered them in current and past projects.
Stability and Testing
The first part of the book discusses stability, and what should be done in order to achieve a high availability 24/7 system that is able to meet strict SLA requirements.
One aspect that I could relate to, and found particularly interesting, was the question of how and what exactly to test when assuring the stability of your system.
For example, the book mentions the use of a “nasty test harness”: your system should be mistreated and abused in every way you can imagine.
This is something I encountered in the past: the complexity of some projects made it infeasible to test the complete workflow of every test case.
Instead, in a reductio ad absurdum style, a certain time frame was established in which the system was punished in order to find an unstable system state.
If it is not possible to test for stability, then the search for instability could help find some weak spots in your system.
Another idea that I can relate to, and which I have witnessed, is that in an EAI / SOA / ESB scenario, where your system is forced to live and communicate with strange and alien systems, every integration point is a major risk to your system’s stability.
Unit tests, functional tests, integration tests, point-to-point tests, component tests, etc. all test the system’s behavior under normal or in-spec conditions, but do you know how your system behaves in out-of-spec conditions?
In effect, for every integration point, it is important to know what happens to your system if it receives unexpected responses, or no response at all.
The same can be said in the other direction: how do the other systems react if your system sends out strange responses, or none at all?
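To make the out-of-spec idea concrete, here is a minimal sketch of a misbehaving integration point and a client that refuses to wait for it forever. It is my own illustration, not code from the book: the names `start_silent_server` and `call_remote`, the `PING` message and the half-second timeout are all assumptions made for the example.

```python
import socket
import threading


def start_silent_server():
    """Hypothetical 'nasty' integration point: accepts a connection, then never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    stop = threading.Event()

    def serve():
        conn, _ = server.accept()
        stop.wait()  # keep the connection open without ever sending a byte
        conn.close()
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()[1], stop


def call_remote(port, timeout_seconds=0.5):
    """Client side of the integration point: bounded wait, explicit outcome."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout_seconds) as sock:
            sock.sendall(b"PING\n")
            reply = sock.recv(1024)
            if not reply:
                return "empty-response"  # remote side closed without answering
            return "ok"
    except socket.timeout:
        return "timed-out"  # remote side accepted the call but stayed silent


port, stop = start_silent_server()
result = call_remote(port)
stop.set()  # let the fake server shut down
print(result)
```

The same harness could be extended to send garbage, close the connection early or reply very slowly; the point is that each of those cases gets a defined outcome in the caller instead of a hung thread.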
Another topic which I found to be interesting was “Longevity Testing”.
The idea of longevity testing is to run production-like tests at a production-like frequency against a long-running, production-like server, in order to detect otherwise undetectable system failures, such as the dreaded OutOfMemory exception (and to avoid shameful quick-fix daily server restarts in production).
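A real longevity test runs for days against a real server, but the core check can be sketched in a few lines: repeat a workload many times and see whether memory keeps growing after a warm-up phase. The sketch below uses Python's standard `tracemalloc` module; the function `memory_growth`, the two sample workloads and the iteration counts are hypothetical choices made for illustration.

```python
import tracemalloc


def memory_growth(workload, iterations=2000, warmup=200):
    """Run workload repeatedly and return the bytes of traced memory gained after warm-up."""
    tracemalloc.start()
    try:
        for _ in range(warmup):
            workload()
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            workload()
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline


_cache = []  # hypothetical leak: entries are added and never evicted


def leaky_request():
    _cache.append(bytearray(1024))


def clean_request():
    buffer = bytearray(1024)  # released as soon as the function returns
    return len(buffer)


leak = memory_growth(leaky_request)
steady = memory_growth(clean_request)
print(f"leaky workload grew by {leak} bytes, clean workload by {steady} bytes")
```

In this toy case the leaky workload grows by roughly a kilobyte per call while the clean one stays flat, which is exactly the kind of slow drift that a short functional test never sees.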
Application Monitoring and Resource Use
Having worked in the past in a Winston Wolf role, that is, as a technical consultant, I got to know the hard way just how important it is to be able to easily monitor and diagnose your system in production environments.
Michael’s book highlights this, and points out what vital parts of the system should be monitored.
One topic that is discussed, and that I can relate to, is probably the number one output of monitoring information that a software system in production relies on: logging.
In my experience, each time logging was not enough and a real-time debug session was needed to fix a bug, a risk emerged: the system might not be telling us something important.
Simply failing to log certain information can make fixing production bugs a long and weary task: sometimes data exports, message dumps and the like are not possible, and reproducing production settings in a development environment for analysis can be difficult or even impossible.
Release It! also outlines how important it is to understand exactly how your system is using resources.
These resources range from the “big four” (CPU, memory, storage, bandwidth) to resource pools, threads, database connection pools, and any other resource.
Development and production metrics can help us understand where our system can break, and even avert future disasters.
After reading this book, one becomes resource-paranoid, which is probably a good thing.
Every time you work with a connection pool, set an isolation level, start a transaction, or even set a JVM property, you will probably find yourself wondering what happens to your system if no connection is available, if the transaction fails or if your application server dies.
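The "no connection available" question can be answered explicitly in code. Below is a minimal sketch of a bounded pool whose checkout gives up after a timeout instead of blocking forever. The class `BoundedPool`, the exception `PoolExhausted` and the string `"conn-1"` standing in for a real connection are all hypothetical; a real pool would also validate and replace broken connections.

```python
import queue
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no resource becomes free within the allowed wait."""


class BoundedPool:
    def __init__(self, resources):
        self._idle = queue.Queue()
        for resource in resources:
            self._idle.put(resource)

    @contextmanager
    def checkout(self, timeout_seconds=0.2):
        try:
            # Block for a bounded time only; an unbounded get() could hang the caller.
            resource = self._idle.get(timeout=timeout_seconds)
        except queue.Empty:
            raise PoolExhausted(f"no resource free after {timeout_seconds}s") from None
        try:
            yield resource
        finally:
            self._idle.put(resource)  # always hand the resource back


pool = BoundedPool(["conn-1"])
with pool.checkout() as first:
    try:
        with pool.checkout(timeout_seconds=0.1):
            outcome = "got a second connection"
    except PoolExhausted:
        outcome = "pool exhausted"
print(outcome)
```

Failing with a clear error after a short wait lets the caller degrade gracefully, whereas a blocked checkout quietly ties up one more request-handling thread.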
Architecture, Patterns and Anti-Patterns
Another significant topic treated in the book is system architecture, patterns and anti-patterns.
What will stick with me, which is quite obvious but in my view never mentioned enough, is the idea that the more tightly coupled the systems in your landscape are, the greater the risk of global failure.
And to this I would add: the more systems in your landscape, the higher the risk.
The first rule in simplifying an integration scenario is to eliminate systems.
If, for whatever reason (political and contractual reasons are the most frequent and the least justifiable), this is not possible, the book presents several patterns for avoiding tight coupling between systems, and describes examples of just how one strong dependency can bring down the whole system.
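I have not named the patterns above, so as one illustration of the kind of decoupling meant here, this is a minimal sketch of a circuit breaker: after repeated failures it stops calling the dependency for a while and fails fast instead. It is my own simplified version, not the book's implementation; the class and parameter names, the thresholds and the `flaky_dependency` function are assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_seconds:
                raise CircuitOpenError("dependency disabled; failing fast")
            # Cool-down has elapsed: let one trial call through (half-open state).
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def flaky_dependency():
    raise ConnectionError("remote system is down")


breaker = CircuitBreaker(failure_threshold=2, reset_after_seconds=60.0)
outcomes = []
for _ in range(4):
    try:
        breaker.call(flaky_dependency)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")
print(outcomes)
```

Once the breaker is open, the caller no longer spends time or threads on a dependency that is known to be down, which keeps one broken system from dragging the others with it.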
Conclusion
Release It! is a great book for people who have some relevant experience in taking a software system to production.
The content is broad enough to be useful in almost any software project, although readers working on high-availability distributed systems will get the most out of it.
The book has a wonderful pace, and Michael gives some great real life examples of how systems can break in both a spectacular and frightening manner.
Easy reading and a valuable source of knowledge, not to be missed!