Category Archives: Projects

Monolith to Dockerlith: Learnings from Migrating a Monolith to Docker

 


Like everyone’s monolith, Earnest’s is complex and was once a little bit out of control. Earnest’s monolith is involved in all major aspects of the business. It is responsible for accepting loan applications from clients, making a decision for each of those applicants, and even supporting clients after loans have been signed. These wide-ranging responsibilities mean that the monolith is really five different applications in one body of code.

Over 100 developers have contributed to its codebase since inception. The complexity of so many revisions and updates made it difficult to set up and maintain. Beyond standard npm libraries, almost a dozen different dependencies, from a database to a mock SQS queue to a HashiCorp Vault instance, needed to be set up correctly for it to work completely on a developer’s computer. Engineering teams had come to expect that getting this application set up on a new computer would take at least a week and would require the assistance of multiple developers who had been at the company long enough to acquire the necessary tribal knowledge.

As an engineering team, Earnest needed a way to ensure everyone had a consistent local environment. It needed to be quick and easy to set up. Finally, the local environment needed greater parity with the CI, staging, and production environments. To accomplish these objectives, I turned to Docker, Docker Compose, and the Go Script pattern.

Dockerlith Architecture

I started the solution by addressing the node_modules folder shared by all five applications. Every application container mounted the same node_modules folder from a Docker volume, and any of those containers could be started in any order and update the npm dependencies. It therefore became necessary to ensure that only one container could write to node_modules at a time.

While there are many ways to control the startup order of Docker containers, I chose to create a bash script that locks a file descriptor at runtime, and I executed this script in the entrypoint of each container. After this script ran, it would invoke the application’s process and the application container would be usable by the developer. An application container’s Docker Compose file looks like this:

Dockerlith Docker Compose 1
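As a rough illustration of that setup, a service definition along these lines would match the description above; the service, image, and volume names here are placeholders rather than Earnest’s actual configuration:

version: "2"

services:
  loan-application:                        # one of the five applications; name is illustrative
    image: earnest/dockerlith-base         # hypothetical shared base image with node, npm, etc.
    entrypoint: /opt/scripts/entrypoint.sh # acquires the npm-install lock before starting the app
    command: grunt serve                   # hypothetical startup task that receives control afterwards
    volumes:
      - .:/app                             # the monolith's source code
      - node_modules:/app/node_modules     # node_modules volume shared by every application container
    depends_on:
      - postgres

  postgres:
    image: postgres:9.5

volumes:
  node_modules:                            # single named volume backing node_modules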

Here is the entrypoint script for each application container.
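In outline, and with the lock path and dependency check below standing in as illustrative placeholders rather than the exact original, it works like this:

#!/usr/bin/env bash
# Sketch of the locking entrypoint described in this post; the lock path and the
# dependency check are placeholders, not the original script.
set -e

# Open a lock file inside the shared node_modules volume on file descriptor 200
# and block until this container holds the exclusive lock.
exec 200>/app/node_modules/.install.lock
flock 200

# Only the lock holder installs; every other container finds valid dependencies here.
if ! npm ls --depth=0 > /dev/null 2>&1; then
  npm install
fi

# Release the lock; a crash or daemon shutdown releases it automatically as well.
flock -u 200

# Turn control over to the application's process (the grunt startup tasks).
exec "$@"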


So: the first time a user starts up the application containers, one of them will grab the lock and install the dependencies. The rest of the containers will wait for it to finish and confirm that the dependencies are valid before turning control over to the grunt startup tasks. Dependencies are automatically checked and updated, but subsequent startups will occur quickly and without calls to “npm install” until the dependencies change.

In the event of a container shutdown, networking failure, or Docker daemon shutdown, the lock on the file descriptor is released automatically. Developers can restart the Docker containers and continue with their workflow to recover from this unexpected failure.

In addition to the container startup synchronization system, there is a Docker image that contains the correct versions of node, npm, and other programs. Docker Compose links the application containers, a Postgres container, a mock Amazon SQS queue, and other supporting containers.
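As a minimal sketch (the base tag, versions, and installed tools here are placeholders, not the actual image), pinning the toolchain in a shared base image can look like this:

# Sketch of a shared base image; the tag, versions, and installed tools are placeholders.
FROM node:6.11.0

# Pin the npm version and global tools every application container will use.
RUN npm install -g npm@3.10.10 grunt-cli

WORKDIR /app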

I implemented the Go Script pattern as the last piece of the puzzle to make setting up the application, starting it, and running the tests one-step commands. This pattern is used by almost every project at Earnest, and its implementation in this project brings it in line with the rest of Earnest’s tooling. Developers new to the project can become productive quickly, and all developers can keep their focus on high-level goals instead of low-level implementation details.
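A minimal sketch of the pattern, with hypothetical subcommand and service names, looks like this:

#!/usr/bin/env bash
# Minimal go script sketch; the subcommands and compose service names are illustrative,
# not Earnest's actual tooling.
set -e

case "$1" in
  setup) docker-compose build && docker-compose pull ;;
  start) docker-compose up -d ;;
  test)  docker-compose run --rm loan-application grunt test ;;
  *)     echo "usage: ./go {setup|start|test}"; exit 1 ;;
esac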

Accomplishing these goals was time-consuming and difficult, but worth it. Team morale improved as a longstanding pain point in the daily life of Earnest software developers was eliminated. Earnest’s software developers reported that this tool increased their efficiency by an average of 32.5% when working on the monolith. On an average workday, this tool is used around 200 times by the engineering team.

 Dockerlith Usage

This was originally a post on Earnest’s engineering blog, but it has been cross-posted to my blog, as I am the original author and did the work described in the post.

3D Bitcoin Prices

I’ve been interested in the bitcoin world since learning about it in January. I own some bitcoins, so how much they are worth matters to me enough that I constantly check the price. The problem is, I don’t like the standard graphs that bitcoin websites use to show price information. Let’s look at an example from bitcoinity.

Bitcoin original graph

Can you find the story here?

This appears to be a pretty standard-issue financial graph. Every graph I’ve seen of bitcoin prices follows the same general 2D format of time on the X axis and price/volume on the Y axis. Various elements are used to convey market information: a yellow line for a weighted average price, green and red shadows for local highs/lows, and blue rectangles for volume. Some sites use slightly different elements such as candlesticks or EMA lines, but I like comboy’s combination of elements and graphic design the most.

The problem I have with these graphs is that they separate related variables into differing elements. It’s been my experience that price levels, price volatility, and volume are linked together when it comes to building a high-level narrative of what was actually going on in the markets.

To give an example, on October 2nd the infamous Silk Road black market was shut down. BTC was the currency used on the Silk Road, and a huge number of BTC holders decided that their bitcoins were now worth considerably less money. Within minutes trading volume exploded as traders holding BTC sold them to anyone unfortunate enough to have a bid (offer to buy) posted at the time. Prices slid and then entered free fall as the bids holding the price up were pulled or eliminated. Volume hit record levels as the bitcoin community collectively tried to find a price for a bitcoin without SR.

Roughly an hour after the news broke, the bottom was found with bitcoins worth roughly 66% of pre-crash values. Two hours after the crash, the market was all the way back up to approximately 90% of its pre-crash values. It seems that the bitcoin community had once again engaged in its favourite pastime: buying cheap coins off panicky newcomers and then reselling those coins hours later to the same people at a recovered price.

Bitcoin graph

Credit to bitcoin charts

I think it is pretty hard for someone without a financial background to build a high-level narrative of events like the one I just gave out of a standard 2D graph like this. Price and volatility are connected, but volume is on its own Y axis and often clashes with the price boxes. It is hard to put data that has three dimensions into a 2D world, and this is about as good as it gets if we absolutely must jam the data into 2D.

But what is forcing us to put this three-dimensional data into a 2D format? In the end, it must only be tradition, convention, or the tools people use to visualize data. A 3D data visualization for data that has, well, three dimensions, is only natural. A cube, with width, height, and depth, can be used to unify the three key variables involved in bitcoin market data in one element and without duplicating the Y axis. Additional data can be conveyed with color. I chose to use red cubes for periods of time with the average price lower than the previous cube, and green cubes for a higher average price.

With these thoughts in mind, I decided to try a new way of displaying market data. My weapon of choice for data visualization on the web is processing.js, which I find to be reasonably powerful and very easy to share. At the beginning of the visualization, I take a moment to explain the axis and element setup. Screenshots of that process explain the graph and element setup more efficiently than words.

Graph Concept

X = Time, Y = Price, Z = Volume

Graph Concept

Every block is a uniform period of time, in this case each block contains an hour’s worth of trade data.

Graph Concept

A block is placed between its highest and lowest trade prices.

Graph Concept

Volume is an absolute value, so blocks are placed in the middle of the Z axis and grow equally on each side. I find this does a better job at conveying volume than starting blocks at z = 0.
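To make that mapping concrete, here is a rough Processing-style fragment showing how one block per period could be drawn; the data arrays and scale factors are illustrative assumptions, not the original visualization code:

// Rough sketch of the block layout described above; the data arrays and scale
// factors are placeholders, not the original code.
float[] lows, highs, avgs, volumes;                              // one entry per hourly period
float PRICE_SCALE = 2, VOLUME_SCALE = 0.05, BLOCK_SPACING = 20;

void setup() {
  size(800, 600, P3D);                                           // 3D renderer
}

void drawBlocks() {
  for (int i = 0; i < lows.length; i++) {
    float blockHeight = (highs[i] - lows[i]) * PRICE_SCALE;      // Y: price range of the period
    float blockCenter = (highs[i] + lows[i]) / 2 * PRICE_SCALE;  // block sits between its high and low
    float blockDepth  = volumes[i] * VOLUME_SCALE;               // Z: volume, centered on z = 0

    // Green when the average price rose relative to the previous block, red when it fell.
    if (i > 0 && avgs[i] >= avgs[i - 1]) {
      fill(0, 200, 0);
    } else {
      fill(200, 0, 0);
    }

    pushMatrix();
    translate(i * BLOCK_SPACING, -blockCenter, 0);               // X: time marches along the axis
    box(BLOCK_SPACING * 0.9, blockHeight, blockDepth);           // width is uniform; height and depth carry the data
    popMatrix();
  }
}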

Now that the format of one block has been explained, let’s throw it all together and re-visualize the data. The most interesting dataset I have to visualize is the SR crash, so let’s check that out. I won’t be commenting on many of the graphic design choices I made, in order to focus on why I thought cubes were the most appropriate shape to use as the central element in the graph.

New Graph

This shows the trade activity of Bitstamp (one of the major exchanges) during the crash. Here the cube proves its usefulness in conveying the story. With high volume, the price fell to a low of 87.5 before rebounding sharply. Toward the bottom we can see that volume is still high but the price volatility of those blocks is low. This means a lot of trading was done without shifting the price much, indicating that the market is reaching the point at which strong support for a given price has been found.

Just 5 minutes after the bottom, the price was skyrocketing back up. We can see a couple tall blocks, indicating the price shifted considerably upward during each 5 minute period. Fifteen minutes after the bottom, the upward trend starts to face resistance, volume is high but the price isn’t moving up. This is symbolized nicely by a wide thin block.

It took me some time to instinctively grasp the new format, but eventually the shapes began to take on their desired meaning.

Bitcoin Graph

A stable market.

These long thin boxes with alternating colors tell me the price isn’t moving much. A few sporadic large buys and sells happen, but nothing really changes.

Bitcoin Graph

A slight downtrend.

In the middle of this timespan there are seven consecutive red boxes. With the average price of each block being lower than the one before it, a downtrend has clearly occurred. The narrow blocks at the end indicate volume goes down and the price doesn’t move much – a new price has temporarily been found and agreed upon.

One major disadvantage of the format I chose is that it is hard to label price and volume for specific blocks. The visualization is a camera that runs down the timespan, and throwing up even sporadic labels quickly becomes confusing as the eye alternates between the blocks and the labels, attempting to make sense of it all and failing. I chose to display the dataset’s high and low prices, and to display volume at the halfway and end marks. These give the viewer a sense of proportion and absolute value without excessive distraction.

In combination, the graphic design choices limit the utility of the visualization in terms of mathematically studying the market data. I think this is acceptable, as there are plenty of tools for mathematical analysis of bitcoin market data. What is not so easy to find is a visualization that helps quickly tell a high level story by highlighting the key data and removing technical elements unnecessary for that process.

I readily admit this may not be the ideal solution to the problem of displaying BTC market data, but I think it shows that 3D objects can be powerful representations for datasets that have three distinct, yet related, variables. By using cubes it is possible to show price, price volatility, and volume with just one object. No longer do my eyes need to dance around separate elements in order to put together a picture of what happened during a certain period of time. The shape, color, and position of the cubes tell me what I want to know and help unify a complicated dataset with just one repeated core element.

Predicting Kickstarter

As an independent game developer, finding ways to finance projects is a core concern of mine. Kickstarter has been a viable way for games to be funded for a few years now, but it is certainly not a silver-bullet solution. During my senior year at MSU, I took a course on data visualization and became fascinated by its potential to provide real-world solutions. For my final project, I chose to data mine all the new Kickstarter game projects that came out over a few months. Using this constantly updating database, I can then compare new projects with old ones to accurately predict whether a new project will be successfully funded, and also how much funding it is likely to receive.

The point of all this is to see what the odds of a live Kickstarter game really are. I’m also keeping track of the performance of my own program, to share how accurate it is. The program shows all the new projects side by side, so viewers can see how many new projects are likely to succeed versus how many are likely to fail. Viewers can then visit the pages of each Kickstarter project and form their own conclusions about why one project is likely to succeed and another is likely to fail. I believe this knowledge is valuable to any independent game developer thinking of getting funding through Kickstarter or other crowdfunding sites.

Screenshots