17 Jun 2017, 19:28

Deploying a Static Blog with Continuous Integration

In recent years, static sites have seen a resurgence. In this post we’ll explore how to use Hugo, a static site generator, in conjunction with a remote web server and a continuous integration provider. Although the site generation software here is Hugo, it could easily be another generator such as Jekyll. The web server will be hosted on a DigitalOcean droplet running Nginx, and CircleCI will be used for the continuous integration (often abbreviated to simply “CI”). Although it is difficult to speak in specifics due to the multitude of alternatives out there, it should hopefully be fairly straightforward to deduce a similar process for other providers.

Firstly, for those of you unfamiliar with what Continuous Integration is, here is a definition:

“Continuous Integration is a DevOps software development practice where developers regularly merge their code changes into a central repository, after which automated builds and tests are run” (Amazon Web Services, 2017)

Generally, CI is leveraged when many developers are checking code into a distributed version control system, to ensure the integrity of the software before it is released to some environment. However, we can leverage another of its key tenets: the building, deployment and testing are done by a remote service, saving us time and energy. This in turn allows you to focus on writing blog posts rather than on how or where the content needs to be deployed.

Some of you may be deploying your static Hugo site (or other static blog) via a bash script or some sort of manual process. This guide will explain how to deploy your blog every single time you push that code to GitHub or another cloud-based Git provider.

Why Bother?

There are a few benefits to setting up CI with your static blog:

  • Avoid repetitive local build and production steps by offloading the work to another machine
  • The build steps always run, so you can’t forget to do them locally (e.g. for theme CSS/JS modifications)
  • Since deploying requires committing, it helps you remember to commit your code
  • Clone the code to any machine and edit posts from there without worrying about build/deployment dependencies or processes
  • Allows you to edit or create posts directly through GitHub’s user interface
  • Helps keep secrets and sensitive information out of your source code

Setting Up CircleCI and SSH Keys

You will need to register for an account with CircleCI and associate it with your GitHub account. From here you can begin to build GitHub projects from CircleCI. Check out this guide from CircleCI for more specifics regarding that process.

Before we dig into the details, it is important to explain that in order to get CircleCI talking to our remote server (in this case a DigitalOcean droplet), we must set up SSH keys. SSH keys allow us to be authenticated by the server, via public-key cryptography, whilst avoiding the use of passwords.

It is a little outside the scope of this guide to delve into setting up the keys; however, a great guide to generating SSH keys is available from GitHub here, and an explanation of how to use them in conjunction with CircleCI can be found in their docs here. Once you have generated the keys and registered them with CircleCI you can move on to the next section.

Environment Variables

It is desirable to avoid storing any secrets or sensitive information inside the blog’s source code itself. One way to do this is to use environment variables on the CI server. CircleCI provides a UI for setting environment variables in the project settings, on the left-hand panel.

For example, in this case the user login password is set as DIGITALOCEAN and the IP of the server as DIGITALOCEAN_IP. This also makes them reusable should you need to reference them more than once in your configuration scripts.

The Configuration File

Once our target server and CircleCI are set up with the right keys, we can begin to look into how to deploy from CircleCI. With CircleCI we provide a configuration file in the form of ‘circle.yml’. The config is written in YAML, a minimal, human-readable data format often used for configuration. Here is what my specific circle.yml file looks like:


dependencies:
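  # Install Hugo (v0.23), the transfer tools (sshpass, rsync) and the theme's npm packages before the build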
  pre:
    - wget https://github.com/gohugoio/hugo/releases/download/v0.23/hugo_0.23_Linux-64bit.deb
    - sudo dpkg -i hugo*.deb
    - sudo apt-get install sshpass rsync
    - cd ./themes/impurehugo/ && npm install 

deployment:
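  # Runs only for the master branch: build the theme assets, generate the site and rsync it to the droplet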
  prod:
    branch: master
    commands:
      - cd ./themes/impurehugo/ && npm run build
      - hugo -v
      - sudo sshpass -p "$DIGITALOCEAN" rsync -avz ./public $DIGITALOCEAN_IP:/var/www/html/
      
test:
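  # There is no test suite for the blog itself, so make the test phase a no-op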
  override:
    - "true"

You can see in the dependencies section that we download Hugo at the current latest release (v0.23) and install the package. We also install sshpass and rsync, which we will need to log in to the target deployment server and copy the files across.

In my case I do a little extra work with npm and gulp to do some preprocessing (JavaScript/CSS minification, image compression), so we change to the target theme folder and do an npm install there to get the Node dependencies and install gulp. This step can be skipped if you aren’t interested in these preprocessing steps.

The deployment section runs for the master branch: first we do the frontend preprocessing previously mentioned via the theme’s npm run build script (which runs gulp), then we run Hugo with the verbose flag (useful for debugging if necessary). Because Hugo only produces static assets, we can then simply move them over to our target server. Here we use rsync to copy the files across; rsync has the benefit over scp that it only transfers files that have changed since the last upload. However, you could use something like scp if you are so inclined.

Lastly, we override the test section to simply pass, as our only real concern is whether there are errors in the actual build and deployment steps.

Conclusion

This should give an overview of how to set up continuous integration with Hugo, and hopefully enough inspiration to adapt the process to other static site generators if necessary. I really welcome any improvements, requests for clarity, or other feedback you might have for me. Feel free to reach out to me on Twitter, or drop me an email.

23 Dec 2016, 16:56

Visualising Facebook's Population Data on the Web

Over the past month or so (on and off!) I’ve had the opportunity to explore two really interesting ideas. The first related to a talk I saw at FOSS4G in Bonn this year by Fabian Schindler. Fabian’s talk was on how to use GeoTIFF data on the web using a pair of libraries called geotiff.js and plotty.js. You can see a post about his talk here. The talk got me thinking about, and interested in, what sort of GeoTIFF data could be pulled into the web.

Around the same time, a news article caught my eye about Facebook having teamed up with a research institute to create a global dataset of population densities. I later managed to find the data on the Center for International Earth Science Information Network (CIESIN) website here. From there I explored ways of making use of the datasets on the web, as per the talk I’d seen at FOSS4G.

The Problems

Upon examining the raw data I noticed some immediate problems with my plan:

  • The data is relatively large for the web - a single country could be upward of 100 MB
  • The data has multiple bands that may not all be useful to end users
  • WebGL isn’t great with large textures

To take on all these problems it was necessary to preprocess the data into a more manageable and usable format.

Preprocessing the Data

Solving the listed problems was a relatively tricky endeavour, involving a multi-step process:

  • Extract a single band
  • Downsample the raster (reduce its resolution)
  • Split the raster into manageable chunks
  • Give the browser a way to know where these chunks are

For all of these processes I made use of Python and GDAL. The band extraction was a fairly straightforward process, but the downsampling and splitting were somewhat more complicated issues.
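
For reference, here is a minimal sketch of how the band extraction and downsampling might look using the GDAL Python bindings. The file names and the 25% scaling factor are illustrative assumptions rather than the exact values used in the repo:

from osgeo import gdal

# Hypothetical input file; the real data comes from the CIESIN population dataset
SRC = "population_country.tif"

# Extract just the first band into its own GeoTIFF
gdal.Translate("band1.tif", SRC, bandList=[1])

# Downsample the single-band raster to 25% of its original width and height
gdal.Translate("band1_downsampled.tif", "band1.tif", widthPct=25, heightPct=25)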

You can see the full solutions to the downsampling and band extraction problems on the GitHub repo. Splitting the data was probably the hardest problem to solve as I struggled to find any examples of this being done across the web that weren’t making call outs to the shell from within Python (something I wanted to avoid).

In order to split the data correctly, it was necessary to subdivide the raster into a grid of a given size. For this to work we needed the top-left and bottom-right coordinates of all the grid cells. After some thought on this mental puzzle, I deduced that you can create an arbitrarily sized (n by n) grid of such coordinates using the following function:

def create_tiles(minx, miny, maxx, maxy, n):
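    # Returns an n x n grid of [[upper-left], [lower-right]] coordinate pairs
    # covering the bounding box (minx, miny) to (maxx, maxy)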

    width = maxx - minx
    height = maxy - miny

    matrix = []

    for j in range(n, 0, -1):
        for i in range(0, n):

            ulx = minx + (width/n) * i 
            uly = miny + (height/n) * j 

            lrx = minx + (width/n) * (i + 1)
            lry = miny + (height/n) * (j - 1)
            matrix.append([[ulx, uly], [lrx, lry]])

    return matrix
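
As a rough illustration of how this function might be wired up to actually cut the raster, each cell’s corner coordinates can be handed to GDAL as a projection window. The file names and the 3 by 3 grid below are assumptions for the sake of the example:

from osgeo import gdal

src = gdal.Open("band1_downsampled.tif")
gt = src.GetGeoTransform()

# Work out the bounding box of the raster from its geotransform
minx = gt[0]
maxy = gt[3]
maxx = minx + gt[1] * src.RasterXSize
miny = maxy + gt[5] * src.RasterYSize  # gt[5] is negative for a north-up raster

tiles = create_tiles(minx, miny, maxx, maxy, 3)  # a 3 by 3 grid

for index, ((ulx, uly), (lrx, lry)) in enumerate(tiles):
    # projWin takes [ulx, uly, lrx, lry] in georeferenced coordinates
    gdal.Translate("tile_%d.tif" % index, src, projWin=[ulx, uly, lrx, lry])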

Splitting the tiles allows us to send the raster in chunks whilst avoiding using a tile server or any kind of dynamic backend. I created a JSON file that contained metadata for all the necessary resulting files, allowing us to determine their centroid and file location prior to requesting all of them.
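
The metadata file can be generated in the same loop. A minimal sketch follows; the field names and the tiles.json file name are assumptions for illustration, not necessarily the schema used in the repo:

import json

metadata = []
for index, ((ulx, uly), (lrx, lry)) in enumerate(tiles):
    metadata.append({
        "file": "tile_%d.tif" % index,
        # Centroid of the tile in georeferenced coordinates
        "centroid": [(ulx + lrx) / 2.0, (uly + lry) / 2.0],
        "bounds": [ulx, uly, lrx, lry],
    })

with open("tiles.json", "w") as f:
    json.dump(metadata, f)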

Displaying the Data

Displaying the data on the frontend took a little bit of trial and error. I used a combination of OpenLayers 3, plotty.js and geotiff.js to accomplish the end result. geotiff.js allows us to read the GeoTIFF data, and plotty.js allows us to create a canvas element that can be used by OpenLayers 3 to correctly place the elements.

To request and handle the asynchronous loading of the data I used the Fetch API and Promises (I’ve included polyfills for both in the demo). Once all the promises have resolved, we have all the TIFFs loaded into memory. From here a select dropdown allows us to change the colors used for presenting the data.

The end result looks a little something like this:

Pros and Cons of this Method

Pros

  • We got to a point where we can display the data on the web
  • The data can be restyled dynamically client-side
  • No dynamic backend or file server required, just static files after preprocessing

Cons

  • Tiles are actually less size-efficient than one big file, but are necessary to get the data to display
  • The downsampling factors have to be quite large to get the files down to a reasonable size
  • Even tiled, file sizes are quite large (e.g. 9 tiles at 2 MB a file == 18 MB, which is a lot for the web)

One interesting aspect of having the data client-side, as opposed to as a raw image, is that you could even go as far as doing basic visual analytics on the client. For example, you could find a way to display only the highest or lowest values in a given GeoTIFF.

Have a Go

The repo is located at: https://www.github.com/JamesMilnerUK/facebook-data-viz

There are instructions in the README for usage.

04 Sep 2016, 19:28

10kB Web Pages

In recent times there has been a lot of stir around the growth of website assets and the total transferred over the wire. It has been pointed out that the average size of a website is now larger than Doom! (Credit: mobiForge)

In response to these accusations of page bloat, we have seen an emerging trend of top websites decreasing their page weights. For example, the Financial Times has stated they are moving from a “culture of addition” to a “culture of subtraction” in order to reduce page load times. Customer satisfaction is the obvious benefit here: faster pages mean people get to the content they want more quickly. But there is also another motive at play; research has shown that slow load speeds cost money, sometimes in a big way. Not only are there costs to the provider of a heavy website, but it can also cost users, as mobile data plans are often expensive, especially in developing countries.

With all this talk of reducing page bloat, I thought I’d share something I came across over the last week or so called 10k Apart. 10k Apart is a challenge to “Build a compelling web experience that can be delivered in 10kB and works without JavaScript”. I found this an interesting proposition; with the modern emphasis on sometimes complex and generally heavyweight JavaScript frameworks, it takes a step back to the first-principle technologies of the web.

I came up with a couple of entries for the competition, both simple experiences rather than traditional websites per se. My goal was to produce something small, experiential, simple and, of course, sans JavaScript. I ended up doing two entries: the first was 10k-tiles and the second was 10k-quadtree. I won’t say too much about them and will rather let you have a play. Overall, 10k-tiles came to 4.5kB ungzipped and 10k-quadtree came in at 8.7kB; the quadtree was slightly heavier because of all the necessary divs. It was a fun experience which has made me reflect on keeping the web lean and no more complex than it needs to be. In addition, I learned more about one of my kryptonites: CSS!

As a final thought, I’ll leave you with two articles I found interesting regarding page weight. The first is Chris Zacharias’ eye-opening post about reducing page weight at YouTube and the positive (if unexpected) effects that had. In addition, there is Maciej Cegłowski’s comical post/talk ‘The Website Obesity Crisis’, which is certainly worth a delve.