Using git on AWS Lambda

or a more modern “De Hominis Dignitate Oratio”

Cătălin Buruiană
Enki Blog

--

There is a newer, more robust way of getting λ jobs run within custom environments (including git, but not only!) — AWS Layers. Check out this other article I wrote to get a sense of what layers are and how to make git work within your jobs.

Me, myself and I

I joined the Enki squad just four months after I started my first year of university at King’s College London. Needless to say, the flow of challenges entailed by both these roles had a fruitful impact on my professional and personal development.

On top of this, what truly shone was Enki’s core principle, continuous learning, which gradually rubbed off on me. It eventually became a mantra which I hold onto and will hold onto for the rest of my life.

Two years and a bit later, I’ve written and revised a plethora of content on React, npm, git, networking, Python et cetera, but also had the opportunity to show off my shoddy programming skills while revamping, brick by brick, the pipelines we use to consume content through GitHub.

The conundrum

Having dwelt on the weirdest and most fascinating quirks and caveats of many programming languages while working on Enki’s curricula, I often end up broadly considering myself able to tackle any trial ahead; especially when accompanied by my paraphernalia, mostly consisting of our beloved stackoverflow.com and as many man pages as you can fit in it.

In theory, there is no difference between theory and practice, but not in practice. — someone

xkcd: Existential Bug Reports

Alas, it is ironically funny how easily one can fail to see past one’s own field of vision and get fuelled with a feeling of self-accomplishment, only to have one’s whole world in shambles the moment one faces something one does not know jack 💩 about.

These impossible problems, esoteric situations, unsolvable issues we encounter and must deal with are existential suffering and can undoubtedly leave anxious scars on ourselves…

…until we figure the whole thing out and we become just like a dog with two tails.

I personally feed on these moments full of bliss. Hence, you can deduce my permanent “Yes I can” attitude. I need it.

We must, in fact, always strive and prosper and aim at the stars as we are not like Icarus, but engineers that keep the world spinning.

Recently going through such a rollercoaster of emotions motivated me to share my journey, together with some peculiarities of, broadly speaking, software most of us interact with at some point in our careers.

The bit where technicalities begin

Hoping I have not bored all of you yet, I will stop beating around the bush. Back to Enki and λ functions.

As many of you probably know, we open-sourced our content a few months ago and have since tried to enhance the experience for our open-source contributors (and our in-house team). This content is the heart of all our products, so we needed to make it simple to jump in and easily contribute.

We realised that a per-topic overview page, readily available, couldn’t possibly hurt anyone and would, in fact, help us from time to time. This shouldn’t take too much coding; probably a day, give or take, right?

Before going any further, I feel compelled to mention that I’m an adept of the web’s lingua franca, JavaScript, and all of the code in this post will be written in it. That said, the quirks I’m bringing up are really language-agnostic.

I already had on my hands an npm package named curriculum-tools that we frequently use internally. It allows us to test, mass-manipulate or analyse our curriculum repository locally by cloning and parsing it. This came in really handy: by the end of the day I had come up with a display format and written a script to generate these pages locally.

Thinking I would most likely forget to update the pages manually from time to time, I figured it shouldn’t be too hard to move this whole flow to a daily cron job run from somewhere. I instinctively picked AWS Lambda functions as I had some prior experience with them.

Here is when the Sisyphean labor began.

λ + git == </3

It didn’t take too much time to realise that there is no git in the Lambda running environment, and it would probably take me too much time to update our package to get the data through GitHub’s API instead of git operations. Besides, you can’t update a repository’s wiki pages via the API.

I wanted to hack around it. 🔧

I soon concluded that it’s all doable by using a binary version of git which had to be installed somehow. Much to my delight, there’s an npm package that does all of that for you — lambda-git (thank you sincerely, Tim Perry).

I confidently added a require('lambda-git')() at the top of my code and went on to check whether it worked. I started by firing a git clone command using the spawnSync method of Node’s built-in child_process module:

const { spawnSync } = require('child_process')
require('lambda-git')()

exports.handle = async function (event, context, callback) {
  const clone = spawnSync('git', ['clone', someGitRepo])
  console.log(clone.output.toString())
}

But this didn’t work as output, stderr and stdout of my git command were all null. It was like the console.log statement was fired before the spawnSync finished executing.

How can this happen?

After going a bit mad while trying some variations of the same code and googling my issue with no luck I almost conceded. I couldn’t get an error message whatsoever.

Then, it dawned on me: I should search GitHub, maybe I was using the lambda-git package wrong. And I did, luckily. There’s a special kind of happiness when you realise you were the one to make a mistake and your tooling is not broken.

The require('lambda-git')() call was, in fact, asynchronous. After awaiting that call within my handler, I managed to get an error:

,,fatal: could not create work tree dir 'sample-repository': Read-only file system

That makes sense though, you shouldn’t be able to write stuff on the system on top of which λ runs. But there is a place where you have write permission, as AWS points out in their FAQs, and that’s the /tmp directory.

const { spawnSync } = require('child_process')

exports.handle = async function (event, context, callback) {
  // Wait for lambda-git to finish setting up the git binary
  await require('lambda-git')()
  // /tmp is the writable location inside the λ environment
  const clone = spawnSync('git', ['clone', someGitRepo], {
    cwd: '/tmp'
  })
  console.log(clone.output.toString())
}
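One thing worth keeping in mind with /tmp: a λ container can be reused between invocations, so files from a previous run may still be there and a second git clone into the same folder would fail. A minimal sketch of one way around this, assuming Node 14.14 or newer for fs.rmSync (the helper names makeWorkDir and removeWorkDir are made up for this example):

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Create a uniquely named, empty directory under the writable temp dir
// (/tmp on Lambda) so a clone never collides with leftovers from an
// earlier invocation served by the same container.
function makeWorkDir (prefix = 'repo-') {
  return fs.mkdtempSync(path.join(os.tmpdir(), prefix))
}

// Remove the directory and everything in it once the job is done.
function removeWorkDir (dir) {
  fs.rmSync(dir, { recursive: true, force: true })
}
```

The directory returned by makeWorkDir() can then be passed as the cwd option of spawnSync, in place of the hard-coded '/tmp'.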

Finally, something worked. From here it was mostly smooth sailing up to one point. I managed to paste in my already working script and got to:

  • clone the needed repos in /tmp
  • generate the wiki pages in-memory
  • update the local files within the cloned repository

All I had left to do now was to:

  1. add, commit and push the changes to the remote master branch
  2. check each spawnSync call for mishaps as they wouldn’t throw errors by themselves

To the remote and back 🚀

This step called for an intense googling session and some trial-and-error, but I ended up with this flow (note that all these were run from within the repository I wanted to update):

git config --local user.email GITHUB_EMAIL
git config --local user.name GITHUB_USERNAME
git add .
git commit -m "Automatically update pages via λ job"
git remote rm origin
git remote add origin https://GITHUB_USERNAME:GITHUB_TOKEN@github.com/owner/repository.git
git push --set-upstream origin master

The reason why I added an already-authenticated remote was that git asks you for both your username and password when pushing and it would probably overcomplicate things to add a listener to handle these.

There are probably better ways of achieving the same result, but I was content with what I had so far. My goal was to minimise the time spent on this and get it out ASAP.

It’s worth noting that, as there is no global, properly installed git on the Linux distro that hosts the λ function and no standard environment variables such as $HOME are exposed, you cannot do git config --global or anything like that. You need to do everything locally.
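For the curious, here is a rough sketch of how the flow above could be driven from Node with spawnSync. It is not the exact code from my job: the function names and the option names (email, username, token, owner, repository, message) are made up for illustration, and the percent-encoding of the credentials is an extra precaution I am assuming is wanted.

```javascript
const { spawnSync } = require('child_process')

// Build the argument lists for the flow above. All option names here
// (email, username, token, owner, repository, message) are hypothetical.
function buildPushSteps ({ email, username, token, owner, repository, message }) {
  // Percent-encode the credentials so characters such as "@" or "/"
  // cannot break the URL.
  const auth = `${encodeURIComponent(username)}:${encodeURIComponent(token)}`
  const remote = `https://${auth}@github.com/${owner}/${repository}.git`
  return [
    ['config', '--local', 'user.email', email],
    ['config', '--local', 'user.name', username],
    ['add', '.'],
    ['commit', '-m', message],
    ['remote', 'rm', 'origin'],
    ['remote', 'add', 'origin', remote],
    ['push', '--set-upstream', 'origin', 'master']
  ]
}

// Run every step in order inside the cloned repository and return the
// raw spawnSync results; the caller decides what counts as a failure.
function runPushSteps (repoPath, options) {
  return buildPushSteps(options).map(args =>
    spawnSync('git', args, { cwd: repoPath })
  )
}
```

Passing the arguments as arrays means no shell is involved, so the commit message needs no extra quoting.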

Porcelain

The last step of my quest was to catch any potential errors of the job and automatically send us a report on Slack via their incoming-webhooks.
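A report like this could be posted roughly as follows. Slack’s incoming webhooks accept a JSON body with a text field; everything else here (the function names, the jobName parameter, the message wording) is an assumption made for the sake of the example rather than the code I actually shipped.

```javascript
const https = require('https')

// Build the JSON body for a Slack incoming webhook. The message layout
// and the jobName parameter are illustrative.
function buildSlackPayload (jobName, error) {
  return JSON.stringify({
    text: `λ job "${jobName}" failed: ${error.message}`
  })
}

// POST the payload to the webhook URL; resolves with the HTTP status code.
function reportToSlack (webhookUrl, jobName, error) {
  const body = buildSlackPayload(jobName, error)
  return new Promise((resolve, reject) => {
    const request = https.request(webhookUrl, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body)
      }
    }, response => {
      response.resume() // drain the response so the socket is released
      response.on('end', () => resolve(response.statusCode))
    })
    request.on('error', reject)
    request.end(body)
  })
}
```

Inside the handler, this would typically sit in the catch branch, e.g. await reportToSlack(webhookUrl, 'wiki-update', error), with webhookUrl read from wherever you keep your secrets.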

With help from their nice API and the run-of-the-mill try/catch, the task proved trivial. In order to check my spawnSync calls for errors, I resorted to:

const command = spawnSync('git', ['push', '--set-upstream', 'origin', 'master'], {
  cwd: localRepoPath
})
if (command.stderr.toString()) {
  // houston, we have a problem
}

Weirdly enough, this approach would catch errors even when there were absolutely none!

What happened was that I had stumbled upon a particularity of git. By design, Torvalds’s baby uses stderr for progress and informational messages, even when the command succeeds. This is a bit counterintuitive, but I’m sure there is a good reason behind all this.

In fact, git push is actually a “porcelain” command, meant to be readable by humans. Deep down this makes use of “plumbing” commands, which are machine-friendly and easier to parse. For scripting, it’s widely recommended to rely on plumbing commands, as their interface is meant to stay stable, while the output of porcelain commands can change over time.

However, you can get machine-friendly output from porcelain commands too, via the *drum roll* --porcelain flag.

All this amounted to:

git push --set-upstream origin master --porcelain
...
Branch 'master' set up to track remote branch 'master' from 'origin'.
Done

Similarly, in the case of git clone you can use the --quiet flag.
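A different approach, which I am sketching here rather than claiming it is what my job does, is to ignore whether stderr is empty and look at the exit status instead: git exits with a non-zero status when a command fails, and spawnSync reports a process that could not be started at all through the error property of its result. The helper name run is made up for this example.

```javascript
const { spawnSync } = require('child_process')

// Run a command and throw unless it started correctly and exited with 0.
// Anything written to stderr is returned, not treated as a failure.
function run (command, args, options = {}) {
  const result = spawnSync(command, args, { encoding: 'utf8', ...options })
  if (result.error) {
    // The process could not be spawned at all (e.g. binary not found).
    throw result.error
  }
  if (result.status !== 0) {
    throw new Error(
      `${command} ${args.join(' ')} exited with ${result.status}: ${result.stderr}`
    )
  }
  return { stdout: result.stdout, stderr: result.stderr }
}
```

With this in place, something like run('git', ['push', '--set-upstream', 'origin', 'master'], { cwd: localRepoPath }) inside a try/catch only ends up in the catch branch when the push really failed.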

Epilogue

And that was it: my script was up and running and hasn’t complained a bit ever since. Take a look at the end result on our curriculum’s wiki page: https://github.com/enkidevs/curriculum/wiki/SQL-Topic

Moreover, I’ve uploaded a prototype gist available for anyone who wants to take a closer look at the wrapped-up code and approach discussed so far: https://goo.gl/dwZoAt.

It has been a really fun ride and, although I most probably did not abide by all industry standards and good practices, I found the whole process highly educational.

I hope everyone got their bit of entertainment while reading this and that someone will find this post informative while frenetically searching “git on aws lambda”, like I did.

Cătălin,
Software Developer & Technical Content Manager

For any questions or comments you can reach me at catalin@enki.com.

If you want to get in contact with us, as a company, feel free to write to us at support@enki.com.

Joining our community can be done by filling in a typeform for our contributors Slack. Furthermore, we have an Open Source Curriculum Fellowship available for anyone feeling like contributing.

Stay tuned for some really cool stuff coming from the Enki team ❤.
