Dogfooding with Jekyll
Using the new `data_source` configuration to serve mankind
Published on: Nov 29, 2014

Yesterday, I learned that Jekyll, the well-known and powerful static-site generator, has a little-known feature that is kind of a big deal for open-data sites hosted on GitHub.

tl;dr: Jekyll lets you both consume and publish data files with the data_source configuration setting.


Working with _data

Just a few weeks ago, I was complaining to some friends about the _data folder in Jekyll. You see, using simple, flat data files to power a website is a smart thing to do, and Jekyll makes it easy to consume data from YAML, JSON, and CSV files. But Jekyll ignores folders with a leading underscore, so the _data folder is never published. That made it nearly impossible to publish the same data that powers the site.

For those of us who like to share our data and use GitHub, this meant choosing between two unattractive options, neither of which really worked well:

  1. Publish the data twice, once in the _data folder and once in a separate data folder; or
  2. Publish the data in a separate git repository, and use it as a submodule.

The obvious alternative, symlinks, doesn’t work either: because GitHub uses the --safe flag when publishing the site, symlinks are not an option for sites hosted on GitHub Pages.

This was an unfortunate state of affairs. Despite demands for data publishers to eat our own dog food, GitHub Pages and Jekyll could not deliver… But that was then. The future of _data is here!

data_source to the rescue!

It turns out the _data folder is just the default location for data files in Jekyll. In your _config.yml file, you can set a different location for your data folder using the data_source configuration setting.

So, let’s say you want to publish your data in a folder called data. To accomplish this, you simply need to rename the _data folder to data and add this line to your _config.yml file:

data_source: data

That’s it. Now the data folder is used by Jekyll to power your site and is also published for the world to see at the URL path /data! You have now opened up your dataset. It’s that easy. (If you want to see it in action, check out this basic repository at with a demo published here:
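To make the "published for the world" part concrete, here is a minimal Python sketch of how someone else might consume a file from the /data folder. It is only an illustration under assumptions: the site URL, the file name members.csv, and the helper names data_url, parse_csv, and fetch_rows are all hypothetical and not taken from any real repository.

```python
import csv
import io
from urllib.parse import urljoin
from urllib.request import urlopen


def data_url(site_url, filename, data_dir="data"):
    """Build the public URL of a file published under the data folder.

    Assumes the site was built with `data_source: data`, so the folder
    is served at <site_url>/data/.
    """
    base = site_url.rstrip("/") + "/"
    return urljoin(base, f"{data_dir}/{filename}")


def parse_csv(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(site_url, filename):
    """Download a published CSV data file and return its rows."""
    with urlopen(data_url(site_url, filename)) as response:
        return parse_csv(response.read().decode("utf-8"))
```

With a hypothetical site at https://example.github.io/myrepo, calling fetch_rows("https://example.github.io/myrepo", "members.csv") would request /data/members.csv, the very same file Jekyll reads when it builds the pages.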

Bon appetit!