I want to build static sites to capitalize on really cheap hosting via S3. I have probably a dozen sites I’ll host this way, so any recurring per-site cost effectively gets multiplied by ten or more. I also want to know if building a CI/CD platform on Lambda is feasible. With systems like Jenkins, it’s really hard to hit 100% utilization. Lambda gives us that possibility, provided this all works.
My code for reference: lambda-hugo-builder
Building a static site with Hugo is pretty straightforward:
- Check out the code from the repo
- Run the Hugo binary
- Upload the resulting static files to the web host.
Seems simple enough. Why was this such a bugger? Well…
At first, I set up a CodeBuild job that used a custom Docker image to build and push the files to S3. I was happy with this, but there wasn’t a way to trigger CodeBuild from CodeCommit. I could use a Lambda function to trigger the build, but it felt dumb to me. If I have to invoke Lambda, I want to do the whole build process in Lambda.
Then I found out that CodePipeline can respond to events from CodeCommit. I was determined to use CodeCommit for my source control, so I decided to build a pipeline. It was really easy to set up: repo events triggered the pipeline, which built the files using the same image as before. This was great, until I found out that each pipeline costs $1 per month it’s active, plus the underlying resources it uses. I’m trying to build really cheap websites, so tacking on $1 per month for each one wasn’t going to work. For comparison, running the smallest Lambda long enough to rack up $1 in charges would take almost 500,000 seconds of execution per month. As I found out later, each build runs about 20 seconds, so that would be roughly 25,000 builds per month. There’s no way I can write that many blog posts in a month.
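The back-of-the-envelope math, using the 128 MB Lambda compute price at the time ($0.000000208 per 100 ms — my assumption here, not a figure quoted anywhere in this post):

```python
# 128 MB Lambda compute price (at the time): $0.000000208 per 100 ms
price_per_second = 0.000000208 * 10

seconds_per_dollar = 1 / price_per_second      # how long $1 of compute lasts
builds_per_dollar = seconds_per_dollar / 20    # at ~20 seconds per build

print(round(seconds_per_dollar), round(builds_per_dollar))
# → 480769 24038, i.e. "almost 500,000 seconds" and about 25,000 builds
```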
OK, so to handle this in Lambda, I need a git client. I wasn’t feeling like branching out of Python, so I found Dulwich, a pure-Python implementation of Git. Strictly speaking, Dulwich’s core is a low-level API; the library also includes a porcelain module, a high-level layer that mimics the git CLI commands.
So I wrote a Docker image that would package up my Python code so I could test the Lambda. The initial package was about 5MB. Nice and sleek. The only extra thing in there was the Hugo executable, so it made sense that it stayed small.
Once I had a reliable packaging solution, I started testing the checkout. I needed a way for my Lambda to access the code in CodeCommit. The CodeBuild job had a role with access, so I tested with that one. It would not connect. I tried creating a new role with the same policies, then one with the same permissions inline. None of them would connect. After fighting with it for a while, I gave up and created a user account in IAM with CodeCommit credentials. Using those in the clone URL, I was able to connect and clone the repo. But we can’t commit credentials to version control. That’s a big no-no.
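Embedding those credentials in the clone URL and handing it to porcelain looks roughly like this — the user, password, repo, and region values are placeholders, and URL-quoting the credentials is my own precaution against special characters:

```python
from urllib.parse import quote


def credentialed_url(user, password, repo, region="us-east-1"):
    """Build a CodeCommit HTTPS clone URL with embedded Git credentials (a sketch)."""
    return (f"https://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@git-codecommit.{region}.amazonaws.com/v1/repos/{repo}")


def clone_repo(url, target):
    # Dulwich's porcelain module mirrors the git CLI commands
    from dulwich import porcelain
    return porcelain.clone(url, target)
```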
Never commit credentials to version control, public or private.
Instead, I put the credentials in the Parameter Store in SSM, encrypting the password with a KMS key. I did all this manually, but it should be automated somehow. Side note: the version of boto3 currently bundled with Lambda does not support the get_parameter method, so I had to start including the most recent boto3 in my package. The package swelled to around 8MB. Still not terrible, but not as sleek as before.
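Reading the password back out is one call. The parameter name here is a placeholder, and I’ve made the client injectable so the function can be exercised without AWS:

```python
def get_git_password(name="/hugo-builder/git-password", ssm=None):
    """Fetch a SecureString parameter, letting SSM decrypt it with the KMS key."""
    if ssm is None:
        import boto3  # the Lambda-bundled boto3 is too old; ship a recent one
        ssm = boto3.client("ssm")
    resp = ssm.get_parameter(Name=name, WithDecryption=True)
    return resp["Parameter"]["Value"]
```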
After having credentials for the repo, I set out to run the Hugo binary. It was pretty simple to use subprocess to run it and build the files. I had to fiddle a little with the file paths, but it wasn’t difficult.
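The call itself is simple. The paths here are placeholders, but /tmp is the only writable location in a Lambda, and the leading ./ assumes the Hugo binary ships at the root of the deployment package:

```python
import subprocess


def hugo_command(source="/tmp/site", destination="/tmp/public"):
    # Hugo's --source and --destination flags keep everything under /tmp,
    # the only writable path inside a Lambda function
    return ["./hugo", "--source", source, "--destination", destination]


def run_hugo(source="/tmp/site", destination="/tmp/public"):
    subprocess.run(hugo_command(source, destination), check=True)
```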
With the Hugo build complete, I needed to push the files to the S3 bucket for hosting. I looked at doing this with raw boto3 commands, but I didn’t want to write the looping logic to get all the files in the right places. So I bundled the awscli into the package, which swelled to almost 13MB. Kind of disappointing. I also had trouble getting the aws wrapper into the package: with pip install -t, the libraries would install, but the executable was nowhere to be found. I found some suggestions about pulling it from raw GitHub links, but that seemed a little unreliable to me, so I went overkill instead. In a second step of the packaging process, I set up a virtualenv, installed the same requirements file into it (which does produce the executable), and copied the wrapper out of the virtualenv into my build.
Using subprocess again, I was able to run aws s3 sync to push the files to the hosting bucket reliably. After fighting with some of the S3 hosting fun, covered previously, the build was working as expected. Runtime is about 20 seconds, which is much faster than the few minutes the CodeBuild job took.
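With the bundled wrapper in place, the push is one more subprocess call. The bucket name and paths are placeholders, and ./aws assumes the copied wrapper sits at the package root:

```python
import subprocess


def sync_command(build_dir="/tmp/public", bucket="my-site-bucket"):
    # aws s3 sync handles the recursion and key layout for us
    return ["./aws", "s3", "sync", build_dir, f"s3://{bucket}/"]


def deploy(build_dir="/tmp/public", bucket="my-site-bucket"):
    subprocess.run(sync_command(build_dir, bucket), check=True)
```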
I spent the next few hours getting the whole deployment into CloudFormation and writing the deployment CLI command. I had to add a template for the CodeCommit repo so that I could add the trigger for the Lambda; it doesn’t appear that the trigger can be defined separately from the repo.
While working on the deployment, I realized there was no reason to deploy one of these functions with each project. Instead, I configured each repo to send CustomData with the event to the Lambda. Now the Lambda looks at the event to find which repo it should check out and which bucket to upload the files to. Even though I plan on deploying a dozen static sites in my AWS account, I can use the same Lambda function for all of them. When you’re trying to reduce costs, every MB of storage counts.
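The shape of the CustomData payload is my own convention — a JSON string configured on the repo trigger — and the handler just reads it back out of the first record:

```python
import json


def parse_event(event):
    """Pull the repo and target bucket out of a CodeCommit trigger event.

    customData is the string set on the repo trigger; encoding it as JSON
    (e.g. '{"repo": "blog", "bucket": "blog-site"}') is my convention.
    """
    record = event["Records"][0]
    config = json.loads(record["customData"])
    return config["repo"], config["bucket"]
```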
This was a fun project, and it illuminated some questions I’ve had around using Lambdas for a continuous deployment pipeline. Implementations of git in other languages may be more robust. Paired with a better version control service, and provided the build fits within the 5-minute execution limit, I think Lambda can be a solid platform for building and deploying software.
I’m rather disappointed with CodeCommit. The options for interacting with it are pretty minimal and restrictive. I hate using a user account with credentials; I’d much rather have the Lambda assume a role that has access. I’m still investigating options for this. Until then, I have to maintain user credentials and an encryption key. The key costs me $1 per month, but it’s shared across all the projects, so I’m willing to accept that cost.
I’d also like to see another way to interact with the repo besides git commands. GitHub gives us ways to pull releases or even raw files. The documentation around CodeCommit is pretty weak in general, so such a feature may exist, but it wasn’t evident to me. It seems very few professionals are using it, and nobody is really talking about it positively.
The codebase in each repo is relatively small, but if a site accumulates a lot of posts, the history could quickly dwarf the actual source files. I really wish that porcelain supported a shallow clone; I don’t need the history when building from the source files. The project is 10 years old, though, so it’s hard to say if they’ll get around to it.