Machine image management is an old topic, but I think one worth revisiting. When automating server provisioning, how much do you put into the machine image and how much do you do at instance start time?
Background Information
First, a bit of background. When provisioning servers in the cloud, you start from a machine image. In AWS terminology, this is called an Amazon Machine Image, or AMI. Community AMIs are available for all sorts of operating systems and software products. You can also find AWS Marketplace AMIs, where you can get pay-by-the-hour licensing for commercial products and support. An alternative is to create your own AMIs, either manually or with a tool like HashiCorp Packer.
Machine Image Customization
Once you have an image, you start your EC2 instance from that image. Instances usually need customization and configuration to get them to where you want them to be. That is where the second part of the equation comes in: boot-time configuration. There are many tools to handle configuration at boot time, but on AWS the common way is a user data script. User data is a way to pass information to your instance at startup that can be used to customize or configure the base image. The customization or configuration need not be the same on each instance started from the same image. For example, you could pass a different domain name to each instance.
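As a minimal sketch of that idea, the following Python builds a per-instance user data script from one shared template. The function name render_user_data, the path /etc/myapp/domain, and the domain values are hypothetical and chosen only for illustration. The resulting string is what you would supply as user data when launching the instance, for example through the UserData parameter of boto3's run_instances.

```python
from string import Template

# Hypothetical boot script; the path /etc/myapp/domain is illustrative only.
# Assumes the domain value comes from a trusted source, since it is
# substituted directly into a shell script.
USER_DATA_TEMPLATE = Template("""#!/bin/bash
# Runs at first boot on an instance started from the shared image.
mkdir -p /etc/myapp
echo "$domain" > /etc/myapp/domain
""")


def render_user_data(domain: str) -> str:
    """Return a user data script that configures one instance for `domain`."""
    return USER_DATA_TEMPLATE.substitute(domain=domain)


# Same image, different configuration per instance.
for name in ("a.example.com", "b.example.com"):
    print(render_user_data(name))
```

Every instance boots from the same AMI; only the rendered script differs.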
How Much to Include in Your Machine Image
The question for those deploying and managing applications is then how much logic to put into the base AMI from which all server instances are created versus how much customization and configuration to do after instance startup. Putting logic into the image is generally referred to as baking it into the AMI. All things being equal, it is more flexible to put as little into the AMI as possible. In the real world, though, configuration done at instance startup adds to the time before an instance is ready, whereas anything baked into the AMI is already in place when the instance boots.
Consider the situation where you need to install an Apache web server on an instance. You have the option to include it in the AMI, so that every instance started with that AMI has Apache installed, or you could install it at instance startup. It may only take seconds or minutes to install Apache, but you also need to configure it for your situation, say adjusting SSL certificates, domain names, or .htaccess files. You may opt to install the base version of Apache on the AMI and do the individual configuration on instance startup. When you include Apache on the AMI, you then have to rebuild your AMI if you want to switch to a newer version of Apache, or if you decide to switch to NGINX instead. Now, also imagine that these web servers are part of your company's website hosting. There are potentially dozens of web servers, and you want to be able to quickly scale the number of servers as demand increases. In this case, you may value speed of instance startup over configurability at instance startup.
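To make the trade-off concrete, here is a small sketch that models each provisioning step as either baked into the image or run at boot, and adds up the time spent at boot. The step names and durations are made-up placeholders, not measurements; substitute timings from your own builds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    seconds: int  # hypothetical duration, not a measurement
    baked: bool   # True: done once at image build; False: done on every boot


def boot_seconds(steps: list[Step]) -> int:
    """Time each new instance spends on configuration after it starts."""
    return sum(s.seconds for s in steps if not s.baked)


# Option A: install Apache in the AMI, configure it at startup.
partly_baked = [
    Step("install apache", 60, baked=True),
    Step("configure certificates and domain", 15, baked=False),
]

# Option B: bare image, do everything at startup.
boot_only = [
    Step("install apache", 60, baked=False),
    Step("configure certificates and domain", 15, baked=False),
]

print(boot_seconds(partly_baked))  # 15
print(boot_seconds(boot_only))     # 75
```

Option B pays the install cost on every scale-out event. Option A pays it once per image build, but needs a rebuild to change the Apache version.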
Conclusion
So in the end, there is no “right” answer to how much logic to bake into the AMI. It’s a series of trade-offs that depend largely on your requirements and your team’s abilities. Here are some of the questions to consider when deciding where to put your configuration and application installation logic.
How frequently do your machine images change?
Very frequently? Daily? It is still viable to include all or most of your logic in the AMI as long as you automate the process of creating images. You don’t want to build images manually if you build them frequently, and you don’t really want to do it manually anyway for a number of other reasons, including reliability. At a minimum, AMIs change whenever the OS version is updated, which you should do regularly to get the latest security patches.
How frequently do you provision/deprovision instances?
If this happens multiple times per day or per hour, then you may want to consider including as much as possible in the AMI. You get more benefit from elasticity when your instances start quickly, which saves you money in the end.
How much configuration is necessary at startup?
If you use the same base machine image for all of your different server types, e.g. web server, application server, message queue, etc., then a lot of configuration at startup is likely necessary. This should be automated. A shared base image also lends itself to doing more configuration at instance startup time.
How much business overhead is involved with certifying images?
Some enterprises have stringent security requirements around the review and use of AMIs. If your business requires that a security team review each new image, it may not make sense for you to produce images frequently. Configuration at startup may be a better way to go.