Golocron – Software Development With Go

by Pavel Burmistrov

TBD.

Project Layout

Starting from a clean slate is always a pleasure when it comes to a new software project. There is no legacy yet, no technical debt, and no other sources of anxiety. This pleasant experience can be maintained throughout the project's lifecycle. One of the key contributors to the experience of developing and maintaining a project is its layout: the file and folder structure and organisation. In the world of Go this is extremely important, because a structure can either help or limit you, depending on how it's implemented.

This unit focuses on the project layout design. A layout design is how the folders and files are organised in a codebase, and thus in the repository containing the codebase. A layout sets the stage for all further design and architecture decisions for a project, such as package layout, logical and functional grouping. It also affects maintenance routines like producing build artefacts, CI and CD, rolling out and shipping to the user.

Depending on whether a layout is good or not, it may either significantly help and simplify everyday tasks, or make them needlessly complicated. A good project layout helps you leverage and benefit from technology, whereas with a bad design any change to the structure is more of a fight against it.

The importance of creating and following a good layout is clear. You create a project once, and this operation is technically quick and easy. But you have to work with the project until it's sunsetted, and this may, and should, last for years. The ease of creating a new folder hides the importance of putting thought into setting up a project in a way that is helpful, convenient and makes sense. It's better to take some time to think about what this is going to look like in a week, a month, half a year, and two years from now. If this is done before anything is created, then working on the project will be much easier.

The rest of the unit covers the following:

  • Good vs. Bad project layout
  • Layout for a library
  • Single application layout
  • Layout for a monolithic, multi-service repository
  • Versioning and Go
  • Creating a Changelog.

Now, let's figure out what makes a project layout good and bad.

Disclaimer: Terms "good" and "bad" are used as simple words to describe something that feels right, positive, helps and supports, as opposed to something that is less right, inconvenient, slows down and distracts. They're used not as judgements or labels, but as synonyms for respective broader terms.

Good and Bad Layout

You can't know if something is good until you learn what a similar bad thing looks like. A bad thing might be perceived as good until you experience a truly good thing. Take, for instance, coffee. For someone who has never had it, almost any coffee can pass for coffee, and it may even go unquestioned. But when, after trying various kinds, you take a sip of supreme coffee made well, you no longer think that coffee from a Nespresso machine is tasty.

So, what makes a project layout bad? Can it even be bad? Why should we bother about it?

Bad Layout

Have you ever supported a project with such a long list of files scattered in the root directory that it takes several scrolls to get to the README on the project's repository homepage on GitHub?

Have you ever maintained a project where half of the application is contained in the main package, and the entire main package resides at the root level of the repository?

How often do you find unnecessary artefacts in a repository after building an application or running tests?

How often have you needed to replace or move parts of a project and then fix the imports, either to break an import cycle or because the structure no longer made sense?

How often do you find a folder that exists only to contain another folder?

A positive answer to any of these or similar questions is a sign that you've encountered a bad project layout. When working with such a project, even an operation as simple as building a binary can turn into trouble, since you don't know whether the resulting binary will be accidentally included with the next commit.

So what are the characteristics of a suboptimal layout anyway? It can be nearly anything that leads to any of the following symptoms:

  • a long list of files scattered at the root or any other level of a project
  • a long list of directories scattered at the root or any other level of a project
  • files of different purposes are mixed at the root or any other level
    • code and configuration files (the app's configuration files)
    • code and CI scripts
    • code and tool's configurations
    • images or any other content
  • there is no definite place for outputting build artefacts
  • .gitignore does not exclude files that should not be committed
  • the structure does not support or reflect architecture of a project
    • having both app and server directories at the same level - does this project provide an app? Or a server? Or it's a poor project layout design?
  • the project structure leads to leakage of implementation details to the public API surface level
    • users can use parts of the implementation that are supposed to be internal
    • thus maintenance of the project is now complicated and slowed down since the public API is much broader than it should be
  • the project structure is unnecessarily complicated and contains a multi-level hierarchy that implements a taxonomy of packages

A mixture of even two of the symptoms listed above can significantly complicate project development and maintenance in the long term. Things may become even worse if a project is large and is being developed by more than one person. Suffering can be endless if it's an open-source project, and a popular one.

So you really want to invest in a good project structure that will help and support you as the application evolves. What makes a good layout?

Good Layout

Now that we have learned some symptoms of a bad project layout, it's time to see what a good one looks and feels like.

Overall, a good structure should be unnoticeable in everyday work. It exists, but as if it were invisible. Once defined, described and set, it just guides you. It limits the number of mistakes, helps to onboard new developers, supports maintenance routines, and helps in many other ways. Let's look at some of those.

A good layout suits the project well:

  • if a project is a library, the layout should reflect and support that
  • if a project is an application, then the layout should make development and maintenance easier
  • if a project uses the monolithic repository approach, it's clear where different services, libraries and other parts reside, and where to add new things.

A good layout limits the chances of misplacing something:

  • when files and directories are logically and functionally well organised, it's harder to misplace a new file
  • the structure helps in deciding whether a new file should be a part of the existing tree, or it deserves a new level.

A good layout reduces the mental load required to work with a project day by day:

  • it's clear where binaries should go
  • test coverage reports do not pollute commits
  • changing CI configuration is straightforward
  • files of different purposes are grouped in a way that makes logical and functional sense
  • it's hard to accidentally commit and push garbage files.

A good layout simplifies navigation through a project:

  • it's convenient and easy to navigate in an editor/IDE
  • the homepage looks nice and well organised
  • in a terminal window, it's easy to work with files using standard and simple console tools.

A good layout helps you to maintain a project in a healthy state:

  • implementation details are kept private, yet still available as exported within a project
  • the possibility of creating an import cycle is reduced by design
  • the structure and hierarchy represent the business logic and relations between different parts of a project.

As you can see, a good layout is more than just the opposite of the symptoms of a bad one. A good layout is the foundation for a development process focused on producing great software. If you have it, then you've got more mental energy to focus on creative solutions for hard problems. If you don't, then part of the energy is wasted.

The layout of a project depends on many factors. The most important one is the kind of project. The layout will be different for a library, a single application or service, and a monolithic repository, although there are some commonalities that make any kind of project better.

Now that we have learned what makes a good layout, it's time to see how to structure projects of different types.

Types of Layouts

The layout of a project depends on its type and purpose. The layout should suit the project well in order to support it.

A good way to describe the importance of using a good layout is an analogy with clothes. We all have things that suit us well; such clothes fit naturally, so we don't even notice their presence. When something doesn't fit well, we might not notice it either, but by the middle of the day we've accumulated extra stress; when the garment is taken off at the end of the day, it feels much better.

The same is true for a project. Its structure may be unnoticed because it's natural and convenient, whereas with other projects we sometimes can't recognise the source of anxiety. That anxiety, in fact, is a sum of many factors, and one of them might be an inconvenient structure of a project.

A project is created once, and is maintained until its end of life. This means that you really want to invest in a good structure, so that interactions with the codebase stay on the positive side and facilitate productivity.

All projects are different, and there is no hard set of rules one can follow without thinking about a particular use case. Yet we can confidently say that the majority of projects fall into one of the following three categories:

  • libraries (packages, modules, frameworks, you name it, depending on the language and environment)
  • single services/applications (web services, CLI tools, GUI applications, etc)
  • all-in-one, or monolithic repositories (multiple services/applications and libraries).

Some recommendations for different types of projects can be applied to all of them, and some are specific to a particular use case. For example, a library layout will be slightly different from a monolithic one.

By a project we mean a few related and similar things at the same time:

  • the directory containing a project
  • the repository containing a project.

In the modern world it's hard to imagine a project that is not published on some service for collaboration and synchronisation with other people. Even if a developer is working on a project alone, they still need to keep it in sync between different devices, and to make sure the code is not lost, if something happens to a computer.

Given that, the section assumes that some service is used to store code of a project. While there are many of them, for simplicity we assume it's GitHub.

The rest of the section focuses on describing guidelines; we start with the commonalities, and then consider aspects that differ. Because these are not hard rules but suggestions, they're flexible, and will differ from project to project. There are also some exceptions, which is natural for any common-sense guidelines.

Disclaimer: Neither the author, nor this work are affiliated or benefit in any way from GitHub.

Common

The ultimate goal of choosing an appropriate structure is to keep things nice and tidy which simplifies interactions with a codebase.

This includes:

  • providing required files for a project
  • keeping the number of elements at the root level at a reasonable minimum
  • keeping the root (and any other) level clean and up to date
  • grouping non-code files by their purpose
  • refraining from creating unnecessary hierarchy of folders
  • having a single entry point for routine tasks
  • storing credentials somewhere else

Let's talk about these in more detail.

Provide Files Required for a Project

A project should contain files that describe its purpose, provide required information for the programming language and tools, and simplify its maintenance.

The following files are considered required:

  • .gitignore: Provides rules that govern what's allowed to be committed. It helps keep a repository free from garbage files.
  • go.mod, go.sum: Identify a Go module.
  • README.md: Provides the description and any useful information and documentation for a project.
  • LICENCE: Provides clarity on licensing for a project, and helps to avoid potential legal inconveniences.
  • Makefile: Provides a single entry point to all maintenance and routine tasks.

The files listed above are a bare minimum, but they are required for a modern project. You set them up once, and if it's done right, they help you all the way through.
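
For reference, a minimal go.mod for a new project might look like the following sketch (the module path is hypothetical); go.sum is generated and updated by the go tool as dependencies are added:

// go.mod
module github.com/example/my-project

go 1.16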

One thing to note is that even with no code at all, a repository is likely to contain about 6 files. This is why the next suggestion is important.

Keep the Number of Root Elements at Minimum

The number of folders and files at the root level should be as small as possible while still making sense.

It's important because it allows you to stay focused on the task you intend to perform, and not be distracted by the need to wade through files, scripts and configs.

To continue our clothes analogy, it's like a wardrobe - in the morning it's better to take a nicely folded thing from a shelf. But this convenience comes at the cost of responsibility: you have to fold clothes neatly after use, and carefully choose what to put into the wardrobe in the first place.

While this suggestion sounds vague, and it is, it can help you. Each time you're about to add something at the root level, question the need by default. Usually, things are set at the root level at the beginning of a project, and rarely added afterwards. Before justifying the presence of something at the top level, think twice about whether there is a better place for the file or folder, or whether it's needed at all.

Keep Any Level Clean and Up to Date

Similarly to the root level, any level of the file tree in a project should be clean, with a reasonably minimal number of files, free from garbage, and up to date.

This is important because:

  • long file listings make navigation harder
  • poor organisation complicates maintenance of the mental model of a project in the developer's head
  • out-of-date files take up extra mental and physical space
  • accidentally committed files are misleading and confusing.

When a new developer joins a project, how much time do they need to get a firm understanding of what's going on in it? Does the structure help and guide your colleagues through the project? Does it reduce the effort of keeping the important details in their heads? Or do they have to stop for a moment each time they stumble upon a file which no one knows anything about, not even whether it's current?

A well-organised and maintained project itself helps people in their everyday jobs. Each time you add a new file, think about what else might be added to it later. When creating a directory, consider different scenarios a few months in advance. You don't want to end up with a mess a year later, and tidying up a large, live project is much more of an effort than keeping it clean every day. If the structure is maintained on a daily basis, it will pay off in the long term.

While it's important to have a neat structure, sometimes it's easy to get too serious about a hierarchy. This is what the next suggestion is about.

Don't Create Artificial Hierarchy

Keep the directory tree in a project as flat as possible while it still makes sense. In other words, don't create an artificial hierarchy just for the sake of hiding something.

This applies both to files containing actual code (we will discuss this in Unit 2) and to other files like scripts, configurations, documentation, etc.

A general piece of advice here is that any directory that contains only other directories, or only a single file, should be questioned for its purpose and existence. Does it belong here, or somewhere else? Sometimes it might not be clear at the beginning, and this is why it's important to think about how the project will evolve further down the road.

Consider a situation with scripts. One might have a root-level directory called scripts, and the directory contains several files providing tools for CI/CD process, maybe tests, some pre- and post-install steps. That's a good example.

By contrast, consider the same number of files but located under separate directories like scripts/build/build.sh, scripts/install/install.sh, and then a scripts/ci.sh that sources those files. Don't do this.

Here's another example. Say we have an http directory. From here, we've got two options. If we know for sure that there will be no other networking code in the project any time soon (meaning a couple of years), then it makes sense to keep it simply as http. On the other hand, if there is a non-zero chance of implementing things for smtp and dns, then it does make sense to put it under net. Later, everyone will be happier with the structure because it reflects the nature of things and supports the purpose of the files.
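
For instance, assuming the project does end up speaking several protocols, the grouping might look like this:

net
├── dns
├── http
└── smtp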

At this point, it becomes clear that grouping files and folders should be done based on their purpose. Even for non-code files.

Group Non-Code Files by Their Purpose

Group and keep maintenance and routine files together by their purpose, in one or a few directories.

This helps keep the root level free from clutter, and reduces the time required to find a file when looking at the listing. You simply know where all your scripts reside, so no time and energy is wasted. When a new developer joins the project, they need minimal time to get an idea of what is where. Everyone is a bit happier.

As shown previously, group all scripts in the scripts directory. If there are plenty of them, and they're specific to the project itself (like startup scripts, init system scripts, cron files, etc), then it might make sense to keep the CI/CD stuff separately. Stay lean though; having both ci/scripts and scripts is not a good sign. What else do you have for CI but scripts? If it's a couple of yaml files, let them stay under the same ci directory (see the previous point).

Nowadays many projects use Docker. If a project uses only one Dockerfile, and maybe one file for Docker Compose, then it's fine to keep them at the root. If you have multiple Docker files, and also have some scripts for building containers, then keep them grouped in a directory.

At this point, a valid concern may arise: if all those tools are in directories, it's less convenient to type commands in the terminal. This is what the next recommendation helps you with.

Provide the Makefile

Provide the Makefile for a project to automate and simplify routine and maintenance tasks.

The make tool is available on almost any platform one can imagine, even on Windows without WSL. Minimal knowledge of Makefile syntax is enough to produce a handy set of targets that simplify everyday workflows. The tool has existed for more than 44 years now, and it's likely to continue existing for another 40. And over the course of its life, the syntax hasn't changed very much. So it makes a lot of sense to invest in learning it once, and benefit from it for the rest of your career.

Just to give you an idea how it can be helpful, consider several, rather basic, examples below.

  1. Building a binary.

Without make and Makefile:

GOOS=linux GOARCH=amd64 go build -ldflags "-X main.revision=`git rev-parse --short HEAD`" -o bin/my-app ./cmd/my-app

With make:

make my-app

  2. Building a container and running an app in it.

Without make:

docker-compose build --no-cache my-app
docker-compose up -d my-app

With make:

make run-my-app

And so forth. While being basic, these examples make a huge difference in everyday experience.

To make your and your colleagues' lives easier, define all routine tasks as targets in the Makefile. It can then be used not only by developers directly, but also in your CI/CD scenarios. This makes the configurations clean, straightforward, and easy to understand and maintain.

It's important to give targets in the Makefile meaningful names. A meaningful name reflects the purpose and meaning of the operation it describes. When this is done well, instead of cryptic bash incantations your config for CI may look something like the one below. Pretty nice, isn't it?

# skipped

script:
  - make test
  - make cover
  - make build
  - make deploy

# skipped

We're almost done with common suggestions. Last in the list, but not the least in importance, is a piece of advice about storing credentials.

Store Credentials Somewhere Else

Under no circumstance should you store any sensitive data in a repository, even if it's a private one. Just don't do that.

Don't add ssh keys, files with passwords, private ssl keys, API keys or any other credentials. It's alright if a repository contains those for development purposes. Anything else is taboo. Good sleep at night is worth more than the deceptive convenience of having production credentials committed to a repository.

What can be used instead? A private bucket in object storage that contains an encrypted file with credentials might do a good job. A specialised service from your cloud provider of choice might help too. A dedicated service running in your infrastructure that is responsible for storing and providing other services with sensitive data may be an even better option. But once again, do not store any credentials in a repository along with code.


All these suggestions are supposed to help and support a team working with a project throughout its lifecycle. Considered and implemented carefully, they form a strong foundation for the project's success.

Library

We create libraries to be reused in different places, in different projects, and by other people. A library's code is read more often than it's modified, simply because it's maintained by a limited number of people, whereas the number of users can be much higher. This is especially true for an open-source library. The library's code is consumed by a computer, but it's a person who uses it and interacts with it.

The layout of a library project should take these factors into account. This means that the layout needs to be optimised for interaction with the user - a person. We seek answers in documentation, and read the code provided by a library to understand how it works. If a library is well-organised and neat, the process of using it is smooth and unobtrusive.

So how can we make this interaction more positive from the layout perspective? If the Common guidelines are taken into account and implemented, that's mostly enough. They cover the most important aspects, and can and should be applied to a library project. Additionally, the following will help you in providing a better experience for the users of a library:

  • providing a good README file
  • keeping the number of files at the top level of a manageable size
  • providing examples of using the library
  • providing benchmarking information, if applicable

The rest of the section describes these suggestions in detail.

Provide a Good README File

Provide a well-organised README file with the most important information about the library.

The README file should give answers to the following questions:

  • What is it? Provide a short description of the library. Ideally, one or two sentences.
  • What does it do? Describe the problem the library is solving.
  • Why does it exist? It should be clear why this library exists, especially if it's not the only library solving a particular problem. What does it do differently from others in its area?
  • How does it solve the problem? Give the users an idea on what's going on under the hood, and what to expect from it.

In addition to the above, the README should contain links to related resources like:

  • the GoDoc page
  • any resources that describe details of algorithms or approaches taken in the library
  • articles or other resources that might be related to the topic.

Your goal with the README file is to help your users answer the most important question about the library: "Does it help me solve my problem?". Even if a library solves a hard problem in a great way, if that isn't reflected in the documentation, it might go unnoticed, leading to fewer people using the library. A well-prepared README file helps your users answer their questions.

Keep the Top-Level List of a Manageable Size

Keep the root-level list of files short and focused. While this sounds very similar to the earlier guideline about the root level, it's worth a separate mention here.

Strive to keep the number of root elements small enough that the listing takes less than a full page height, so the user doesn't need to scroll to get to the README. It's also important when the user interacts with a library from their IDE. An endless list of files visually complicates navigation, and gives an impression of poor organisation.

Additional recommendations will be given further in the unit about designing a good layout for a package.

Provide Examples

In the README, provide clear examples of using your library.

A user should not be forced to go to the GoDoc page, or to consult the code, to see what using the library looks like. Think about possible use cases, and provide appropriate examples that show them.

Pay special attention to how you show the use of the library. Write high-quality code as if it were production code. Half, if not more, of your users will copy and paste code from the examples as it is given. So make sure the examples are not misleading, and do not contain things that you wouldn't want to find in your own production codebase.

Follow these suggestions in order to provide your users with good, high-quality examples:

  • handle errors properly instead of omitting them
  • avoid globals as much as possible
  • avoid using the init function
  • use regular imports instead of dot-imports
  • handle errors properly instead of panicking or calling log.Fatal.

This is not an exhaustive list, and more detail will be given in the unit about writing good code. But this short list is enough to make the examples better, as well as the code that your users write.
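
For illustration, here is roughly what a README example might look like when it follows these suggestions. The library, its import path and its API are entirely hypothetical:

package main

import (
    "context"
    "log"
    "time"

    "github.com/example/ratelimit"
)

func main() {
    // Configuration is passed explicitly instead of relying on globals
    // or an init function.
    limiter, err := ratelimit.New(ratelimit.Config{Rate: 100, Period: time.Second})
    if err != nil {
        // The error is handled instead of being ignored or turned into a panic.
        log.Printf("creating limiter: %v", err)
        return
    }

    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()

    if err := limiter.Wait(ctx); err != nil {
        log.Printf("waiting for a slot: %v", err)
        return
    }

    log.Println("allowed to proceed")
}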

Provide Benchmarking Results

Provide benchmarks whenever possible.

When someone uses your library, it becomes part of their application. Those who aren't interested in performance will skip the numbers, but those who do care will appreciate them. Oftentimes, performance is what differentiates one library from another, and the interested users would love to know about it.

Go provides us with everything we need to write benchmarks and share the results with our users. So help your users make a decision about the library: write and run benchmarks, and add the results to the README. Then keep them up to date.


The suggestions given above are simple, yet not always easy to follow. Take the time, make sure the project is prepared and organised. Eventually, the users of the library will find it helpful, and they will thank you for having done this for them.

Examples

Below you can find some examples of well-prepared libraries, and a few that could have been organised better.

Good Examples

Can Be Improved

  • lib/pq While the README lists some features, it doesn't answer many questions, nor does it give examples of using the library, benchmarks, or any other information about why we should use it. The file organisation could also be improved.
  • go-redis/redis As the number of Redis clients is relatively high, it would be valuable to provide users with comparisons and benchmarks that show the difference from other packages. Also, the structure of the repository could be better, as not everything needs to live at the root level of the package.

Single Application

A project (and its repository) for a single application is a further development of the guidelines given so far. Almost everything, if not everything, from the Common and Library sections can, and most likely should, be applied when working on a single service from the layout perspective. There are also a couple of recommendations that make the experience better.

One distinction between a service and a library is that there is always at least one artefact, a binary. It also means that a project has at least one main.go file, which is not needed for a library. The experience of working on a project depends, to some extent, on where those files go.

The following will make interaction with a single-application project more convenient and easier:

  • use a separate directory for entry points to the service and satellite tools/applications
  • use a special, excluded from Git, directory for outputting build artefacts.

The next couple of sections expand on these suggestions.

Use cmd Directory for Entry Points

Use the cmd directory for entry points to your service. Inside it, create a directory per entry point, named after the application. Each inner directory should normally contain only one file, main.go.

Even when a project is a single application, oftentimes it has a number of satellite CLI and development tools, or different modes. Instead of a long list of if statements, this approach allows for small and focused, yet different entry points.

Create a directory named cmd at the top level of the file tree, and move any entry points, i.e. main.go files, into appropriate directories inside cmd. These second-level directories should be named after the binaries you want to produce. Each of those directories is an isolated main package, and hence defines a separate entry point to the app.
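
A minimal sketch of such an entry point, say cmd/my-app/main.go, might look like this:

package main

import (
    "fmt"
    "os"
)

// main stays small: it only wires things together and delegates the real
// work to the project's importable packages.
func main() {
    if err := run(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}

func run() error {
    // Load configuration, set up dependencies, and start the application here.
    return nil
}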

Use bin Directory for Binaries

Use the bin directory for build artefacts. The directory must be excluded from the source control system. The directory is used in both development and CI processes.

As you remember, there should be no garbage in a repository. A binary that is built during the development process should not appear among committed content.

Usually, a binary is excluded by listing its name in the .gitignore file. While this gives an immediate result and may work, it doesn't mean there is no better option. This approach is not scalable. If someone builds the service with a custom name for the binary, the output can be accidentally committed. The same goes for CI: the process will just put files right at the root level.

A way to improve the situation is to have a separate directory in the project which is:

  • excluded from Git
  • managed (created and cleaned) automatically
  • and used by both developers and build tools.

The special directory is named bin, and its place is at the root level. The entire directory is excluded from Git.

Then, since the Common recommendations are in effect, you've got the Makefile. Given the previous tip, the targets work like this:

  • inputs are taken from the cmd directory, e.g. cmd/my-app
  • binaries are put to the bin directory, e.g. bin/my-app

Combined together, we get:

go build -ldflags "-X main.revision=`git rev-parse --short HEAD`" -o bin/my-app ./cmd/my-app

If the directory does not exist, make sure it's created as needed. When you run make clean, the target should remove everything from that directory.

All targets in the build or testing pipelines also use this path for outputting and accessing binaries. When running the service locally, you run it from this directory. Nice and tidy: bin/my-app. If typing the extra characters is a concern, define a target in the Makefile, and use make run-my-app. Having this as a target is a scalable way to do it, since everyone on the team can use it. If needed, you can also create a shell alias such as alias run-my-app='make run-my-app'.

The targets used in the CI process also use the bin directory for artefacts, and the packaging and deploy steps then look for the artefacts there. The process is structured and organised.

Having a separate directory for binaries is important because the chances of having multiple binaries are very high. One popular scenario is when a developer works on macOS and uses Docker. While the local version is built for the Mac, the Docker version is built for Linux, hence there are already two different binaries.

It's likely that sometimes there will be a need to build binaries for other systems locally as well. Without a dedicated directory, that means at least two entries in the .gitignore file. Then, if a new binary is introduced later, that's another two entries. It's already a lot to keep track of, and too many files at the root level.

Instead, define the bin directory once, add it to the .gitignore file, set up all appropriate targets in the Makefile to use the directory, and don't worry about it for the rest of the project's life.


Here is what a single-application repository may look like with the two recent suggestions applied:

├── bin
│   ├── example-agent
│   ├── example-cli
│   ├── example-devsrv
│   └── example-plugins
└── cmd
    ├── example-agent
    │   └── main.go
    ├── example-cli
    │   └── main.go
    ├── example-devsrv
    │   └── main.go
    └── example-plugins
        └── main.go

As you can see, the structure is well ordered and easy to work with.

The bin and cmd directories become even more important when a project contains code for multiple different services. In this case it's a monolithic repository, and many people, potentially several teams, are working on it. All previously given recommendations work especially well with a monorepo. That's what we will talk about in the next section.

Monolithic Repository

Warning: Do not confuse a monolithic repository with a monolithic application. They're absolutely different things.

Unless you're in one of the FAANG or similarly-sized companies, or just don't like the idea, there are probably few technical reasons, if any, against using the monorepo approach to organising a project that has multiple moving parts. And Go projects can benefit quite well from it.

Moreover, teams are likely to benefit from this technique as it has several strong supporting arguments like:

  • simplified workflows for CI/CD and testing
  • reduced burden with managing internal dependencies
  • significantly less chores with permissions/access
  • unified tooling
  • easier to package/ship and so forth.

Of course, there are some limitations and downsides, but they don't come into effect until at least a couple of hundred thousand SLOC have been committed over a couple of million commits.

A valid concern may be access control, but that's only partially true. If the security policies at a company make sense and are implemented and respected well, there is little to worry about. The experience of companies for which security is the biggest asset suggests that attitude and mindset are what matter. When everyone is accountable for what they're doing, there is no need to introduce artificial barriers.

The decision on whether to use a monorepo or not is up to you, the reader. The goal of this section is not to make you use this approach, although it makes sense more often than not. This work covers various aspects of software development, is based on experience, and provides practical advice and guidelines. Several services will be introduced in the third module (it's quite far from this point), and the materials do use a monolithic repository.

With that in mind, let's talk about how to structure a monorepo in a way that makes the experience of using it better. As there are several ways to approach organising a monorepo, it's worth getting familiar with some of them to understand what makes a good one. Finally, one reasonable approach will be introduced, and we'll learn more about it.

There is no official classification of monolithic repositories. Yet some common traits can be derived from investigating and working with different projects. Experience shows that most of the time a monorepo is of one of the following types:

  • naive
  • based on services
  • focused on entities.

Let's have a quick look at these and consider their pros and cons. Then we introduce an alternative, the structured approach.

Naive Monorepo

A naive monorepo is exactly what its name says. It's what usually happens when a team decides to use a monorepo without taking the time to learn about it and put proper thought into how to make it work:

  • a new repository is created
  • then each of the existing services is copied over to the new repo, as is
  • done
  • alternatively, everything can be moved into one of the existing repositories, but this doesn't make much of a difference.

As a result, the team gets just a rather large directory with lots of semi-independent parts, lots of duplicated scripts, files, words, whatnot. One of the immediate benefits seems to be that now something from service A can be imported by service B, but this will quickly become troublesome, especially if the original repositories did not follow suggestions such as those listed above, which is highly likely.

This approach has many downsides. Besides being messy and containing many duplications, it is the shortest path to creating dependency cycles. This often leads either to a poor hierarchy and structure created just for the sake of breaking a cycle, or even to the conclusion that the monorepo was a mistake and should be avoided. Neither of those leads to productive and efficient collaboration.

It's hard to justify going this way. Instead, plan the migration, and then execute the plan iteratively. You need to come up with a good structure that suits you well and will serve your needs for years. Don't rush: think, plan, try, and only then execute. Not the other way around.

After realising that this way is no better than before the move, the team may decide to organise the repository somehow. One obvious way is to use a service as a boundary and splitting criterion. This is how a naive monorepo may evolve into a service-based one.

Service-Based Monorepo

The service-based approach is a slight improvement on the naive one. The main difference is that some of the components duplicated among services are unified, e.g. CI and building routines, but the codebase continues using services as boundaries for packages. Put simply, each folder at the root level contains a service along with everything that's in the service's scope - data types, business and transport logic, etc. When a service needs something that's already implemented somewhere, it just imports that. New functionality is developed within the boundaries of a service.

While it might work for some time, such a repository still has exactly the same major downside as the naive one - it's too easy to end up with a dependency cycle, even more so when you try to re-use some code with business logic. Also, there isn't much order, since data, logic and utility code are spread across the entire codebase.

A few other serious downsides enter the stage at this point, caused by importing different parts of various services:

  • increased sizes of binaries
  • increased compilation times
  • not always clear what to do with tests.

As the project evolves, it might seem natural to think of grouping code based on entities it belongs to. Here is how a monorepo may transform into entity-focused.

Entity-Focused Monorepo

The entity-focused technique is organising code around a particular entity. Packages are often created for different units of the business domain of a project, such as user, photo, library and so forth. Developers add logic into appropriate packages, and then use it in services.

This approach is a bit better than the previous two. It allows working on a service's parts separately from the business logic, if implemented correctly.

Still, there are two potential problems:

  • different levels of representation and responsibility could be mixed together, such as data types, methods for accessing storage, business logic and transport details
  • a major risk of creating cycles, specifically at three levels:
    • entity - when several entities depend on each other
    • business logic - when a business process depends on the logic of several entities
    • transport and representation - when representations of several entities/processes depend on each other.

The second issue comes from the fact that it's rare for entities, business logic and their representations to be independent of each other, i.e.:

  • entities often aggregate or are parts of other entities
  • business logic for one process depends on, or is included in, another process
  • representation for one area of the domain requires other parts.

Is there a solution to these problems? How do we organise a repository in a way that reduces the risk of dependency cycles to a minimum (or better, to zero)? How do we organise it so that it's easy to re-use entities and logic in different services? How do we make developers happier, and services better organised?

The first problem is to be addressed by making better package and architectural design decisions. The former is the subject for Unit 2 in this module, the latter is the topic for Module 3. Some suggestions will be given shortly, in the very next section.

There is no ultimate solution, of course. But there is an option which, if implemented carefully and everyone respects the process, can help to achieve better efficiency and maintainability.

The Structured Monorepo

The structured approach is based on grouping code by responsibilities, and by the levels at which objects play their roles. In other words, things are put together by what they do and where they belong:

  • data layer (models)
  • database layer (repositories)
  • business processes (controllers)
  • transport representations (handlers) and so on.

By organising code this way, we avoid the problems described above, and get some additional benefits:

  • at the model level, any model can safely relate to another
  • at the business process level, any process is free to do the following, without the risk of introducing a cycle:
    • use any model or combine multiple
    • include any other business process, or be included in another business process, as a step
    • interact with database representations for the models it works with
  • similarly, at the transport level, any service can use or combine various business processes, and more.

The insightful reader might have already noticed that this has many similarities with properly designing and structuring a good single service. Moreover, if a single service has been structured this way, adding another service wouldn't require doing anything special, since the project, and hence the repo, is already prepared to accommodate as many applications as needed.

The Layout of a Structured Monorepo

Everything that has been discussed about different layouts so far comes together and applies to a monolithic repository. Taken into account, implemented and followed carefully, the practices establish the foundation for a good monorepo.

This is what a project may look like at this point:

  • the documentation is provided and up to date
  • the list of elements at any level is of a reasonable size
  • all maintenance scripts and other non-code files are organised
  • the entry points to services are located in the cmd directory
  • binaries automatically go to, and are picked up from, the bin directory
  • code that implements the project's data and logic is grouped by responsibilities and roles.
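
Putting it all together, the top of such a repository might look roughly like this; the service names api and worker are just placeholders, and the code directories mirror the layers listed earlier:

├── bin
├── cmd
│   ├── api
│   └── worker
├── controllers
├── handlers
├── models
├── repositories
└── scripts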

A few questions inevitably arise while working on a reasonably large project within one or a couple of teams:

  • Where do we put code that is meant to be used by many services?
  • Where should utility code go?
  • How to gradually and safely introduce something new or a breaking change?

There is no simple and direct answer. This is also where good planning and thinking should be done, and some exceptions made. With that in mind, let's consider the following suggestions that help to keep things in the right places:

  • use an appropriate position in the file tree to reflect the importance of a package
  • organise utility code as your own small standard library
  • keep breaking changes in a sandbox.

Use an Appropriate Position in the File Tree

Place packages appropriately in the file tree of a monolithic repository to reflect the importance and nature of a package.

What does it mean, and what properties can we use to determine the right placement for a package? There's no hard set of rules. The following heuristics can help in understanding where a package should be placed:

  • An approximate position of a package in the dependency graph. A package that is imported by many other different packages should be located closer to the root of the hierarchy. A rarely imported package is most likely an implementation detail, and should go deeper in the tree.
  • The frequency of use. The position of a package should be in direct proportion to how frequently the package is used.
  • The importance of a package. Something that is unique, and provides and implements an important piece of functionality should be placed closer to the top.
  • The level of abstraction and role. The higher the abstraction, the higher the level at which a package should be placed.

For example, a package with code for converting errors from their internal representation to an external one, and which is used by most of the packages that implement application functionality, should be placed at the top level of the structure.

Another example is the packages that are the definitions of the business logic - packages with models, controllers, and the database layer.

On the other hand, a set of middleware handlers for an API should be located deeper in the tree, as it's used only in the context of the API, and only by an instance of the API. Similarly, routines for data validation, migrations, etc. are better placed at the second or third level of the tree.

More on this to come in later units covering package design and architecture of a service.

Organise Utility Code as Your Own Standard Library

Organise, treat, and maintain all utility code as a small private extension to the standard library. If possible, consider releasing it open source.

This recommendation sounds controversial, and needs further explanation. To understand it better, we first need to clarify what differentiates utility code from other code, and then go deeper into how to apply the advice in real life.

Utility Code

Utility code is code that implements the technical details of a process, and is independent of the business logic of a project. This independence from the business logic is the crucial part that distinguishes utility code from any other.

The following traits are common for utility code:

  • it's independent from any other code besides the standard library and/or itself
  • most of the methods operate on built-in types, or on the types from the standard library
  • provides common and often used routines
  • it can be extracted as a separate library
  • it can be open sourced
  • provides methods that do not exist in the standard library.

Here are some examples of code that can be part of a private extension to the standard library:

  • managing the lifecycle of a process, including proper signal handling, graceful and forced shutdown (see the sketch after this list)
  • archiving/compressing directories
  • extended configuration and methods for the http.Client, such as custom transports, file downloading, etc
  • handling multipart uploads
  • advanced strings manipulation
  • some standard and generic data structures and algorithms, such as queues, graphs, lists, and so forth
  • concurrency patterns
  • file tree traversing utilities and so forth.
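
To make the first item more concrete, here is a minimal sketch of a process-lifecycle helper; the package name and its location are just an example:

package process

import (
    "os"
    "os/signal"
    "syscall"
)

// ShutdownSignal blocks until the process receives SIGINT or SIGTERM,
// and returns the signal that was caught, so the caller can start a
// graceful shutdown.
func ShutdownSignal() os.Signal {
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, os.Interrupt, syscall.SIGTERM)
    return <-ch
}

A service's main function can then wait on this helper before shutting down its servers and workers.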

Having clarified what utility code is and what is not, we can discuss what to do with it.

Details and Discussion

To begin with, it's worth recalling one of the most often repeated mantras in the Go community:

Prefer duplication over the wrong abstraction.

– Sandi Metz, and many gophers.

This is true, and this section should not be taken as arguing the opposite. The author is one of those who respect and follow this advice.

Nonetheless, many rules do have exceptions. It's more about finding what works. What is good for a small project or service can be a poor choice when applied to multiple services. What's good at a small scale can significantly complicate things at a larger one, and vice versa.

The definition of utility code given above implicitly prohibits building additional abstractions on top of the standard library. It can only be considered an extension.

Duplication works well for a small service, when a small team maintains a couple of medium-sized services. In other words, duplication suits well when it's used moderately and rarely, and in isolation.

Things are different in a project that is supported by one large team or several teams, when it's a monorepo, and when the number of services grows. The duplication approach is not scalable. Employing it at a larger scale leads to unnecessary duplication, a mess in the codebase, and a lack of guarantees, and makes the project prone to errors. The quality of testing decreases.

One of the biggest strengths of Go as a programming language is that there is mostly one way of accomplishing a task. This becomes almost a requirement when working with many services. There should be only one way to download a file, zip or unzip a directory, parse an authentication header, and so forth. Failing to acknowledge this can turn out to be a major source of obscure and nasty bugs at later stages of a project.

So when a project is a monorepo, and has more than one service, the following is true about utility code:

  • it will inevitably occur
  • it should be reliable and trustworthy
  • it should be tested
  • it should be maintained
  • it should be standardised.

One of the ways to provide these guarantees is to keep such code in one place, and to be conscious about and responsible for it.

In Practice

The guideline can be simply employed like this:

  • at the root level, have the lib directory as home for utility code
  • put packages inside lib
  • for packages whose names clash with ones from the standard library, add a prefix, for example x: xstrings
  • alternatively, keep package names as they are, but have an agreement on how a custom package should be named when it's imported along with a standard library package of the same name (an example is shown after the file tree below). Do not use custom names for the standard packages, ever.

This approach also works especially well when implementations of some routines differ between the platforms your project supports.

As a result, the file tree may look like this:

├── bin
├── cmd
└── lib
    ├── process
    │   ├── process.go
    │   ├── process_darwin.go
    │   ├── process_linux.go
    │   ├── process_posix_test.go
    │   ├── process_windows.go
    │   └── process_windows_test.go
    ├── files
    │   ├── files_posix.go
    │   ├── files_test.go
    │   └── files_windows.go
    ├── http
    │   ├── http.go
    │   └── http_test.go
    ├── os
    │   ├── os.go
    │   ├── os_posix_test.go
    │   └── os_windows_test.go
    └── strings
        ├── strings.go
        └── strings_test.go
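
As an example of the naming agreement mentioned above, importing such a package next to its standard library counterpart might look like this; the module path and the CollapseSpaces helper are hypothetical:

package api

import (
    "strings"

    // The agreed-upon alias for the project's own package; the standard
    // library package keeps its original name.
    xstrings "github.com/example/project/lib/strings"
)

func normalise(s string) string {
    // TrimSpace comes from the standard library, CollapseSpaces from the
    // project's extension to it.
    return xstrings.CollapseSpaces(strings.TrimSpace(s))
}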

When applied and followed carefully, this advice helps to consolidate and maintain low-level code, providing one way of accomplishing a particular task, giving more guarantees about the quality and correctness of the implementation, and reducing the maintenance cost.

Have Sandbox for Breaking Changes

Another important aspect that is different when working with a monolithic repository is the introduction of breaking changes, or of entirely new code that has not yet proved to be stable. Within a monorepo, a piece of code may be relied upon in hundreds of places, so it's better to test a change in isolation, and only then use it elsewhere. How do we go about this in a monorepo?

In a monorepo, have a special place for adding potentially unstable code. An experimental or unstable directory at the root level would be a good choice. Then, inside that directory, follow a structure similar to the one at the root level.

Details and Discussion

In a classic scenario, dependency management tools usually solve this problem. A dependency that is used in many places is updated, and the change is then gradually introduced. Go modules and/or vendoring are handy tools here, and this is one of the main reasons for their existence.

However, these tools are not available anymore for addressing this problem as all code is in a single repository. At least, not directly. It's impossible to vendor a part of a repository, or use a separate import path for a package that is part of a large module.

A solution to this problem has existed for many years, and is in use by various kinds of software, from private projects to the Linux kernel, and many established Linux distributions and software. A common name for the solution is "experimental".

What do we mean by "experimental"?

Of course, there are different stages in release processes, such as development, alpha and beta versions, release candidates and so on. Somewhere between development and beta there is usually an experimental branch, or stage. After reaching some level of stability, the project transitions to the next stage, usually testing, or beta. This is followed by many projects, yet it's not exactly what the current advice is about.

This guideline is about having experimental code included in the stable version, and made available for conscious use. If the reader has ever configured and compiled a Linux kernel, they will recall the EXPERIMENTAL label for drivers that are included in a stable release, but are still in active development, and are included for use as is, without any guarantees.

Similarly, even the most stable version of Debian, and many other Linux distributions, have an experimental section in their repositories, which contains early releases of software. Such sources are turned off by default, but the user is free to enable them.

So here's what to do in a monorepo. When introducing a breaking change or something entirely new, and such code is not guaranteed to work or work correctly (hence it's not for general use), consider this:

  • add a package for new code, if it doesn't exist yet, to the experimental or unstable directory
  • add new code to the package
  • use it, test it, change it, prove it works
  • once confirmed working:
    • for new code, move it to the stable tree
    • for changes to existing code, depending on the situation
      • move changes to the stable tree and make necessary adjustments
      • alternatively, use type aliases to redirect uses from old code to new (see the sketch after this list)
        • test and prove it works
        • move changes to the stable tree.
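
A minimal sketch of the type-alias option mentioned in the list above; the package paths and the Profile type are hypothetical:

// Package user is the stable package; existing imports keep compiling
// while callers are gradually migrated to the experimental implementation.
package user

import expuser "github.com/example/project/experimental/user"

// Profile temporarily aliases the experimental type, redirecting all uses
// of the old name to the new code.
type Profile = expuser.Profile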

An important note about the process is that the experimental/unstable tree must be kept in good order at all times. As with utility code, without discipline it's too easy to let these places become junk yards. Keep them clear of clutter by making sure everyone on the team follows the conventions and the boy scout rule: move working code to the stable tree, and eliminate unused and incorrect code.

Monorepo: Additional Chapters

The approach to maintaining a monolithic repository outlined above is a good starting point, and works quite well in most cases. At the same time, larger organisations with a larger number of teams and services may want or need more flexibility when working with a monorepo. Let's think of potential reasons for that, and how to address those needs.

Open Questions

What questions are left unanswered by the structured monorepo? Some of them include:

  • What if we need to maintain both older and newer versions of a package? For example, this could easily be the case when a large number of services work with the same model, say a core business unit. You have to update the model; you've tried to find a way to make a backward- and forward-compatible change, but it seems either impossible or too resource-demanding. Had this model been in a separate module, a new major version with a breaking change could have done the job. But with a monorepo, everything is versioned as a whole, or not versioned at all.
  • What if some packages change too often, such that it's hard for other teams to keep their services up to date with the changes? Or, how do we allow quick and frequent changes in code that other code relies upon? While this should not normally be the case in a well-designed system, it happens. There might be places in a codebase which hot execution paths depend upon, and for some reason changes to them are made quite often. If several teams are forced to perform routine updates caused by the work of another team, this can quickly become a major bottleneck, a source of anxiety, and even lead to the (rather wrong) conclusion that monorepos don't work.
  • What if infrastructure and utility code changes often enough to bother other teams with the need to keep their services up to date? Similarly to the previous question, this should not normally happen. Yet, sometimes it does, especially when the initially written low-level code is not optimal (someone prototyped quickly, and forgot to finalise the work by turning the draft into a well-made solution), and at some point it needs to be refactored. In such a case, there will be breaking changes, and having everything in a single repository dictates the need to adjust all depending services at the same time.
  • What if all of the above is the case, the number of people working on the project is large, and so is the codebase, which has aged somewhat and accumulated a fair amount of legacy?

These are just a few questions that you may have, either because it's your case, or because you know someone in a similar situation, or for any other reason. There are, of course, many other potential questions. Our goal here is not to find a silver bullet that is the ultimate answer to these and all other questions. Instead, we're trying to find something to start with, a point from which you can begin experimenting and working out your own way.

Below we consider two additional options that you can employ to help maintain a healthy monolithic repository, and to keep your teams happy and efficient, as much as possible in this context. It's advised to use them only when the structured approach is not enough: it has either already limited you in some way, or analysis shows some evidence that it might happen soon, and other changes to improve the situation have already been tried.

As with everything in life, flexibility comes at the cost of increased complexity and responsibility, and you have to keep the two in balance. Abusing the complexity or neglecting the responsibility are two of the most common reasons for failure, and this is especially often the case in software development.

Having put extra stress on the importance of being cautious with added complexity, let's look at ways of achieving additional flexibility. The first one is the use of nested Go modules; the second is a bit more radical, but still an option for situations when changes happen too often and break too much. And finally, we look at the combination of the two.

NOTE: In the context of the next section we use the term "nested modules" for Go modules that are located not at the root of a repository, potentially having siblings in neighbouring directories. We deliberately refrain from using the term "submodule" to avoid confusing it with a Git submodule. Git submodules are not considered here and are outside the scope of the book.

More Structure with Nested Modules

Consider converting the top-level packages into nested modules when you need more fine-grained control and flexibility over the packages' lifecycle.

This is possible because Go modules support nested modules, which are sometimes referred to as submodules.

As you know, a Go module is identified by the go.mod file. Each module must have this file, and its location defines the root of the module. While a single module per repository, located at the root, is the most common case and is strongly advised, it's possible to maintain multiple modules under the same repository. It's also possible to version nested modules independently using Git tags of a slightly different form.

In general, the technique works as follows:

  • some of the top-level packages are separate modules
  • each such package, i.e. module, is versioned separately
  • each module can import another module, as if it was just a package
  • following the conventions set by semantic versioning is mandatory, i.e. each breaking and incompatible change requires a new major version of such a module. This allows code that is interested in the newest version to use a separate import path, while code that is yet to be updated continues using the older version.

Now let's look at an example. Here, we have a monolithic repository called monorepo owned by the myorg account. In the repo, we've got two packages - model and controller. Everything up until now has been a single module, defined at the root of the repository. Our task is to convert the two packages into nested modules, and to version them independently. The execution plan consists of three steps for each package:

  • initialise a module
  • commit and push changes to the remote origin
  • version the new module by creating and pushing a tag, paying special attention to the name of the tag: it's module_name/vX.Y.Z, not just vX.Y.Z.

The final directory structure looks like this:

.
├── controller
│   ├── controller.go
│   └── go.mod
├── model
│   ├── go.mod
│   └── model.go
└── go.mod

Let's look at each step in detail for each of the two packages, starting with controller. For simplicity, we omit creating a separate branch and pull request:

  • first, convert the controller package into a module
cd controller
go mod init github.com/myorg/monorepo/controller
  • now commit and push the change
git add go.mod
git commit -m "Convert controller package to module"
git push
  • next, version the module by creating a tag
git tag -a controller/v0.1.0 -m "Convert controller to module"
git push --tags

# OR
git push -u origin controller/v0.1.0

cd ..

Now we handle the model package, assuming its version is somewhat higher:

  • convert it into a module:
cd model
go mod init github.com/myorg/monorepo/model
  • commit & push
git add go.mod
git commit -m "Convert model package to module"
git push
  • finally, version the module
git tag -a model/v0.4.1 -m "Convert model to module"
git push --tags

# OR
git push -u origin model/v0.4.1

cd ..

And that is it.

This approach allows for versioning packages within the monorepo independently. The biggest advantage is that you can now easily introduce a breaking change under a new major version, starting from v2.0.0. A detailed conversation about versioning Go modules is the subject of the next section, right after we finish with monolithic repositories.
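
For illustration, releasing a breaking change for the hypothetical controller module from the example above might look roughly like this; note the /v2 suffix in the module path and the module-prefixed tag:

cd controller
# bump the module path to the new major version
go mod edit -module github.com/myorg/monorepo/controller/v2
# (any internal imports of this module also need updating to the /v2 path)
git add go.mod
git commit -m "Introduce controller v2 with breaking changes"
git push
# the tag is still prefixed with the module's directory
git tag -a controller/v2.0.0 -m "Release controller v2.0.0"
git push origin controller/v2.0.0
cd ..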

All of the above is, of course, good, but what if you're larger than Google, and one monorepo is too limiting for your needs, or for some other reason the options discussed so far don't offer enough flexibility? Well, would two or more monolithic repos help you?

Multiple Monorepos

In a perfect world, and for some teams, a single monorepo works absolutely well. But what if your reality is somewhat less than ideal, and one monolithic repository is not enough? If a single repository is too limiting or restrictive, that is probably not a sign that the approach does not work. Perhaps you just need to opt for two or more monorepos, each carrying unique responsibilities.

We leave aside some obvious and legitimate cases where a large organisation has several monolithic repositories as a matter of separation of concerns. For example, one company acquires another, and the two companies under the same roof continue development in two monorepos. Legal concerns might be involved too, so this is an absolutely natural reason for multiple monorepos.

Similarly, we leave aside situations where a company starts experimenting in a new field, or a new project. For example, it's natural for a game studio to develop each game in a separate repository.

Nonetheless, there are questions that some may want to ask. What if, for example, the infrastructure-specific code is evolving too quickly? Or what if CI/CD workflows for client and service code are too different? What if services in the monorepo use such different technologies that it somehow results in incompatible workflows? Or, what if one team wants to follow GitHub Flow, and the other is willing to follow Git Flow? What if one team is simply not willing to adopt the practice?

The questions above, although sounding real, are subject to the overall engineering policy and culture at an organisation. Normally, an organisation has some sort of shared vision that everyone is aligned with, having agreed on the foundational principles, approaches and tooling. So if those sorts of questions are present within your organisation, and everyone is doing whatever they want, that's probably a topic for a separate conversation. Here we focus on the technical aspects of dealing with a situation where one monorepo is not enough. And while the author has failed to find an example of a real reason for this (besides the few mentioned earlier and the likes), it should not prevent us from exploring what's available.

Below we briefly explore several available combinations for consideration, should you need them. It's obvious that all the options mentioned, as well as anything you come up with yourself, have pros and cons, and here are a few common ones. Unless stated explicitly, the common advantage is increased flexibility, either technical or organisational/legal, which comes at the cost of exponentially increased complexity and an increased cost of maintenance. Instead of reducing entropy and chaos, these options facilitate them.

NOTE: Bear in mind that the options explored further won't work for all teams, and finding a silver bullet is not the goal of the exploration. It should be clear that no one is enforcing anything upon the reader. Some teams consist mostly of full-stack developers, while others prefer specialised engineers. Therefore, what is good for one case might not work for another.

Multiple Single-Module Monorepos

The simplest option, if applying this adjective is even possible once we're here, is to have a few monorepos that follow the structured approach described in detail earlier in the unit.

It's relatively straightforward, and enables easier separation of concerns. This option might suit cases when code needs to be split for both legal and technical reasons, and the alternatives are somewhat less optimal.

It might also be helpful when there is a strong desire or need to split utility code from business logic code. In such a situation, the monorepo with utility code becomes an official in-house extension to the standard library, and the monorepo with business logic and services is focused only on the company's business domain.

Architectural Monorepos

Another option is splitting all code into a few layers, and putting each layer into a separate monolithic repository.

A large organisation may have a huge project, or many projects, each of which has at the very least three major components: infrastructure, client-side and services. In such a case, it might make sense to organise code as follows:

  • all infrastructure and operations code lives in one monorepo
  • the second monorepo is devoted to client code
    • it's also possible to go further and split web clients from mobile, if needed
  • finally, the code that is responsible for handling and persisting data, i.e. services, goes into its own repository.

With this setup, all major components are split by their nature, and this might be beneficial for the teams, and processes.

Technology-Specific Monorepos

An organisation may have hundreds of thousands of lines of code written using one technology stack, i.e. language, and then switch to another, producing another hundreds of thousands, or millions, of lines of code. And while this is not necessarily a goal on its own, it might be beneficial to split code into monolithic repositories focused on a specific technology.

This option may work quite well when a company is transitioning from one stack to another, or works with several in parallel. Go code stands out in that it's much easier to support compared to other languages, besides maybe the case when you have no code at all. So it might be reasonable to put all Go code in a separate repository, say C++ code in another, and JavaScript in a third.

Multiple Monorepos with Nested Modules

Potentially the highest degree of flexibility, along with complexity and chaos, can be achieved by employing multiple monolithic repositories with multiple independently versioned modules in them.

It's strongly advised against using this option, but it's still available. If your organisation is ready to pay the price of this flexibility, or if there is a real need for it (which is highly unlikely), then this is technically possible, and, with very high levels of personal responsibility of each individual and collective responsibility of the engineering department as a whole, might work. It depends on you.

Closing Notes

Maintaining a monolithic repository that several teams are working with is not an easy task. Experience shows that this is not only possible, but also highly beneficial when approached consciously and in an organised way, when the vision is shared by everyone involved in the process, when the culture is strong, and when people understand the importance of keeping things in good order.

The techniques described in this section are neither new, nor unique, nor a silver bullet, nor a panacea. But they have proved to be helpful for projects of various sizes, in small and larger teams. Give them a go, experiment, adjust them to your needs, and find what works and what does not.

Versioning and Go

The unit won't be complete without considering such an important topic as dependency management, versioning and distribution in the context of the project layout. Thankfully, in Go this is addressed mostly by one of the most significant changes of the last several years, which is Go Modules.

Although Modules were released more than two years ago, many individuals and companies have only started adopting them. Partially this is due to laziness that is often masked under complaints about a lack of documentation. There is, actually, lots of high-quality content on the topic, as Russ Cox published an extensive and exhaustive set of articles on his blog while Modules were going from the proposal to the final release. Partially this is due to a generally slow adoption process when something new is introduced.

Anyway, over the course of the last year, the situation with documentation has significantly improved. One of the best resources so far is the Go Modules section on the official Go Wiki. Yet, developers are often not confident when it comes to either performing a major dependency update, or releasing a major version. Many projects are still using dep. And while there is nothing wrong with dep or the use of it, the tool is now obsolete, and it will restrict its users in many ways. So it's better to migrate to Modules, and benefit from what Go has to offer.

This section summarises the learnings and experience gathered since Modules were first mentioned. If you've already read all of the official materials mentioned above (Russ's posts and all posts in the official Go Wiki and blog), then you will probably learn little new from this section. However, there is some practical advice that might be useful. And given that the current module is called "The Style Guide", consider everything below a recommendation.

The outline:

  • Notes on Terminology
  • Prerequisites and Assumptions
  • Approaches to Versioning
  • Implementing Versioning
  • Notes on Automation
  • Conclusion.

Notes on Terminology

This section clarifies the terminology used further around branches in Git and GitHub. This is important because with the introduction of Go Modules, depending on the approach that is taken to versioning, the meaning of branches may change.

When working with Git, and especially with GitHub and the likes, there are two special kinds of branches:

  • the main branch that is considered as the most recent/stable branch
  • the default branch that is:
    • the default branch in Git
    • the default base branch for new pull requests
    • the default branch shown when the repository is visited on the web
    • in Git, versions prior to v2.28.0 defaulted to master. Starting from v2.28.0 it allows specifying an alternative name
    • on GitHub, before October 1, 2020 it was master. Starting from that day, it's main.

Knowing these two terms is important, as is understanding their role and meaning for the workflow.

The Main Branch

Historically, and until recently, the main branch has been referred to as master. From this point and further in the book, this branch is referred to as "the main branch", or simply main.

The main branch in a project usually contains the most recent version of code. It's also widely accepted in the open source community that the main branch is the bleeding edge, containing the latest features, though not necessarily stable.

The meaning of the main branch in organisations is slightly different. Often, instead of being the cutting edge, the main branch represents the most recent version that is shippable to the customer and/or deployable to production. Of course, if an organisation follows The Git Flow, then the meaning of the main branch is different. We refer to the main branch in a sense that is closest to The GitHub Flow.

It's fair to say that, for the majority of companies, the main branch is the most recent stable branch that is currently in production.

So from now on, when you see "the main branch" or main, and in your case it's still named master, don't be confused.

The Default Branch

The default branch in a repository is the one that is:

  • chosen and its contents shown when you visit the project’s homepage
  • default base branch for a new pull request.

There are a few notes about the default branch. The first is that with GitHub, we've become accustomed to seeing the most recent version of a project when we visit the repository. Very often, the default and main branches in a project are the same branch.

On the other hand, sometimes the default branch is not necessarily the same as the main branch. This becomes especially important with the introduction of Go Modules.

Your users, including you, probably like the experience that is familiar. Getting immediate access to the most recent version without needing to switch the branch when you visit a project page on GitHub saves some time and energy, and prevents unnecessary distractions. So it would be great if it were possible to keep things this simple.

As we shall see, Go Modules introduces ways that can disrupt the existing workflows under certain conditions. We will also learn how to keep things in good order, such that versions are properly and conveniently managed, and users are happy with quick and easy access to the most recent code.

Prerequisites and Assumptions

As we're going to look into some practical aspects of versioning after learning it in theory, it's good to introduce agreements about the environment in which all that works.

First, the most important thing here is the version of the programming language. It's worth a reminder that in Go only the two most recent versions are officially supported. At the time of writing, the latest stable version is v1.15, and the other is v1.14. Anything else is unsupported, and should not be in use. If you do use an older version, then it's strongly advised to update to the most recent version, for many reasons.

So the first agreement, and a prerequisite, is the version of the language. In everything that is related to versioning and Modules, the book assumes that you use at least version v1.14.

The second important thing is what is used for versioning and dependency management. Although there are alternative opinions and tools, Go Modules is the standard, de facto and de jure. Moreover, it has already proved to be a convenient and handy tool for both managing dependencies and managing the release cycle of a software project.

So the second agreement and a prerequisite is about versioning, dependencies and tooling. From the first letter to the last, the book assumes that you use Go Modules for dependency management and versioning in Go projects.

The third agreement is implied by the previous one, and is about versioning strategy. In modern times, people increment version numbers in any way they like. This kind of freedom is good to some extent. However, in this work, and this is aligned with the official approach Go takes to versioning, the Semantic Versioning model is used. In fact, the way Go manages dependencies with the minimal version selection algorithm is built upon conventions established by Semantic Versioning. Similarly, when and what kind of change triggers an increase in which part of the version is governed by Semver.

As a quick reminder, Semantic Versioning means that for a version number MAJOR.MINOR.PATCH each part is incremented as follows:

  • MAJOR when a breaking change is introduced
  • MINOR when a new feature/update is backwards compatible
  • PATCH when backwards compatible bug fixes are made
  • also, additional labels for pre-release and build metadata can be appended to the MAJOR.MINOR.PATCH format.

Semver has proven to be a meaningful and convenient approach to managing software versions. It works perfectly well with the standard, slowly increasing versions model. It also works well for one approach that has become quite popular in recent years. You have, of course, seen versions like v2020.08.12, which can be generalised as YEAR.MONTH.DAY or YEAR.RELEASE_IN_THE_YEAR.FIX. It's less flexible compared to the classic model, but still fits nicely into the semantic model.

Let's quickly recap the prerequisites:

  • Go version is at least v1.14
  • Go Modules for dependency management and versioning
  • Semantic Versioning is used to decide when and what version is incremented.

Approaches to Versioning

With modules, there are three main strategies for versioning a project. The official Go Wiki provides some guidelines on how and when to use which. In this section we review what the options are, how each works, and when to use it.

Building a firm understanding of how all this works and how it is represented, is required for working with modules efficiently. For this purpose, there is a quick review of important technical aspects of Modules.

After the review, we consider three approaches to versioning a Go project, namely:

  • Directory-Based Versioning
  • Branch-Based With Changing Default Branch
  • Branch-Based With Rolling Main Branch.

Review

Getting familiar with the terminology of Modules is helpful in becoming confident when working with it. The bare minimum includes understanding that:

  • anything that has not been tagged yet, is implicitly considered as v0.0.0-YYYYMMDDHHmmSS-first12SymbolsOfCommitHash
    • e.g. v0.0.0-20190827140505-75bee3e2ccb6
  • anything that has been tagged with v2+ but is not a module (i.e. just a tagged release), is implicitly considered as vX.Y.Z+incompatible
    • e.g. v2.2.1+incompatible
  • a correct major version must be set in the go.mod file for versioning to work for versions starting from v2
    • e.g. module github.com/awslabs/goformation/v4
  • each major version requires its own go.mod file (follows directly from the previous point)
  • an appropriate and valid tag is a must for versioning to work properly
  • the previous major version must exist in order for the next to work properly on the client
    • i.e. a client fails to use v2.0.0 if there is not a single release tagged with v1.y.z
  • backwards compatibility and introduction of breaking changes is governed by the conventions set by Semantic Versioning. I.e.:
    • only fixes to stability/correctness/security are allowed in a patch-level release vx.y.Z
    • new features that don’t break backwards compatibility are allowed in a minor-level release vx.Y.z
    • breaking changes are only allowed in a major-version release vX.y.z
  • unless a version is given explicitly, with go get/go get -u updates are performed only within the same major version, i.e. up to the latest available MINOR version.
  • go get github.com/someorg/somepkg updates the package to its latest version
  • go get -u github.com/someorg/somepkg updates the package and all its dependencies to their latest versions, and this is where the Minimal Version Selection algorithm works.
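
As a quick illustration of these version forms, a hypothetical go.mod might look like this (paths and versions are made up):

module github.com/myorg/myservice

go 1.14

require (
	github.com/someorg/somepkg v1.5.2 // a regular tagged release
	github.com/someorg/untagged v0.0.0-20190827140505-75bee3e2ccb6 // an untagged commit, i.e. a pseudo-version
	github.com/someorg/legacy v2.2.1+incompatible // a v2+ tag without a go.mod of its own
	github.com/awslabs/goformation/v4 v4.1.0 // a proper v2+ module, note the /v4 suffix
)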

Keeping these in mind and following the rules is crucial for versioning to:

  • work
  • work as expected
  • work correctly.

From now on, the assumption is made that the reader has familiarised themselves with the basics and theory of Modules.

Directory-Based Versioning

The directory-based approach is the most conservative option that is available. New major versions live in the corresponding directories, in the same branch.

What It Is

With this technique, all versions reside in the same branch, but under different, major version-specific top-level directories. When a new major version, vN+1, is about to be created, all code from the current version, vN, is copied into a new directory, vN+1. Each version is tagged.

How It Works

The first stable version is kept at the root level of a project. All new major versions are housed in directories that are created in the root directory, named after the version they represent - v2, v3, and so forth.

Essentially, this technique works as follows:

  • everything resides in the main branch
  • while the project is in active development phase, releases are tagged with v0.y.z
  • when the project reaches maturity, it’s tagged with v1.y.z
  • when there is a breaking change, a new directory v2 is created AND:
    • all the code is COPIED over to the new directory
    • also, the go.mod file
    • the module statement is changed in the go.mod file to reflect the new version, i.e:
      • module github.com/myorg/mylib/v2
    • when needed, a new version is tagged with v2.y.z.
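
A hypothetical library using this approach might end up with a layout like the one below, where the v2 directory carries its own go.mod with a module path ending in /v2:

.
├── go.mod
├── mylib.go
└── v2
    ├── go.mod
    └── mylib.go

Clients that want the new major version import github.com/myorg/mylib/v2, while existing imports of github.com/myorg/mylib keep resolving to the code at the root.
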
When To Use It

The main benefit of this option is backwards compatibility with pre-Modules versions of Go. In other words, it's recommended for the projects that have backwards compatibility with old versions of Go as one of the main goals.

The Go Team backported basic support for Modules to 1.9.7+ and 1.10.3+, and improved support in 1.11, so these versions are able to consume modules with versions v2+.

Weak Points

Although the method is the most conservative and allows for the widest range of supported versions of Go, it has some obvious and hidden downsides.

Firstly, it opens up a possibility of misusing Modules and breaking the clients of a library. The major downside of the method is that it’s too tempting to begin sharing code between the versions. If there is a need to share code between versions, pull it out into a separate module (i.e. a library).

A second downside is pollution of the root directory if the versioning policy is aggressive, and new major versions are released quite often. For example, go-redis/redis has already got v8. If they followed this path (thankfully they don't), they would have ended up with something like the listing below, in addition to all the files they already have:

v2
v3
v4
v5
v6
v7
v8

The only real strong reason for using the major directory approach is backwards compatibility with old, officially unsupported versions of Go. Unless it's a main goal of your project, using this method is not recommended.

Branch-Based Versioning

A more convenient and the intended approach to versioning is based on branches. This approach fully utilises the Go Modules functionality that is baked into versions v1.12+ (although v1.11 was the first to support modules, it was more like a preview).

What It Is

Each major version is kept in its own branch. When working on patches and minor updates for a version, a working branch is based off of the corresponding home branch for that major version. When a breaking change is introduced, a new major branch is created, and is tagged accordingly.

How It Works

Before going into detail on how this method works, a couple of things are worth noting:

  • strictly speaking, a new branch is not required, as long as a change can be introduced without breaking the existing code
  • by default, the Go Team assumes that you use the main branch to house v1.y.z.

The way this technique works can be summarised as:

  • everything resides in the main branch
  • while the project is in active development phase, releases are tagged with v0.y.z
  • when the project reaches maturity, it’s tagged with v1.y.z
  • when a breaking change is about to be introduced, a major new branch v2 is created AND:
    • the module statement is changed in the go.mod file to reflect the new version
      • i.e. module github.com/myorg/mylib/v2
    • when needed, a new version is tagged with v2.y.z
    • all changes to v2 are targeted towards the v2 branch. I.e.:
      • the base branch for any PR that touches v2 code is v2
      • effectively, the v2 branch is now a second main branch in the project, for anything related to this version
  • when there is a need to introduce a fix or backport a change to v1, it's done as it was done before:
    • the base branch for any PR that touches v1 code is the old main branch (main, master or something else).
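
A condensed sketch of the commands involved, assuming a hypothetical library github.com/myorg/mylib that is about to receive a breaking change:

# create the new major branch off of the current main branch
git checkout main && git pull
git checkout -b v2
# reflect the new major version in the module path
go mod edit -module github.com/myorg/mylib/v2
git add go.mod
git commit -m "Introduce v2"
git push -u origin v2
# once the breaking work lands in v2, tag the release from that branch
git tag -a v2.0.0 -m "Release v2.0.0"
git push origin v2.0.0
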
When To Use It

In general, this way is what the Go Team has planned as the intended use of Modules, and what Modules are designed to work best with. It has mostly advantages, including:

  • nice separation of concerns
  • clear boundaries between incompatible versions
  • ease of use
  • ease of automation.

However, these guidelines advise against using this method, due to its downsides, which are not as obvious at first glance, and in favour of the third option.

If, after reading about the rolling main branch technique, that approach does not seem appropriate, then default to this simple, major-branch-based way.

Weak Points

From a purely technical perspective, this approach has only one, rather minor, downside, which is simply that it's a new way of working. At the same time, there is a somewhat bigger question about the changed semantics and perception of the two branch terms introduced in the Notes on Terminology.

In this context, the following two questions arise:

  • what branch do I use as the main branch?
    • i.e. what is my current, most recent shippable/deployable version, or the branch containing experimental changes?
  • what branch do I use as the default branch? I.e.:
    • what is the branch to show by default when my repository is visited?
    • what is the branch new pull requests should use as the base branch?

With this technique, the answer to the first question is unclear. By the classic meaning, the bleeding edge should be the branch that has the highest vX, or even a branch that is not tagged yet. By the meaning employed by organisations, it should be the branch that is currently deployed in production. Again, that should be the most recent vX. But the community still seems to consider the branch that is home for v1 as the main branch.

A similar question is about the default branch. It’s reasonable to say that when you visit the homepage of a project, it’s natural to expect the documentation and code shown to be up to date. Therefore, the default branch for a repository should be changed each time when the major version is incremented, which means a new major branch is created. If the latest version in a project is v3, then the default branch in the repository, for convenience of the users and developers, should be set to this branch.

Besides being technically beneficial, this approach has a philosophical downside - the meaning of the main branch of a repository is changed, and now there is no permanent home for the latest and greatest changes. For other languages and ecosystems, that convention still holds.

A better option, which has the same technical advantages but does not have the methodological downsides of this one, is a slight modification: major branches with a rolling main branch.

Branch-Based With Rolling Main Branch

It would be great to have the questions left open by the previous option answered. It would also be great to keep the meaning of the main branch. As it turns out, there is a way to address all of this.

This approach is an extension to the previously described method, with one, rather significant difference.

What It Is

In essence, the main branch always represents the most recent version, whatever it is. When a new major version, vN, is introduced, the previous major version, vN-1, is kept in the appropriate major branch, vN-1, for N >= 2. All new development continues to go into the main branch. Hence, everything remains as it should be, in good order.

How It Works

A project starts with the main branch and no versions. At some point, the first version is tagged in the main branch, and it remains there. When it’s time to work on a new major version, prior to making any breaking changes, a stable version-specific branch, v1, is created. This branch is then used for all subsequent minor and patch releases: changes to v1 code are based off of and merged into this branch. From this point on, the main branch is the home for the new version, and is tagged as minor and patch releases are prepared. When it’s time to release another major version, the process is repeated, keeping a snapshot of the current stable version in v2.

This approach ensures the following:

  • the main branch remains the most recent and stable version (as you merge only code that is proven to work)
  • the main branch continues to be the source of truth
  • the default branch and the main branch are the same.

This is how it works:

  • everything resides in the main branch
  • while the project is in active development phase, releases are tagged with v0.y.z
  • when the project reaches maturity, it’s tagged with v1.y.z
  • right before working on a breaking change, create a snapshot of the stable code in v1
    • and continue using it for updates and fixes to this version
  • a new, potentially incompatible, next version is now located in the main branch, AND
    • the module statement is changed in the go.mod file to reflect the new version, i.e. module github.com/myorg/mylib/v2
    • when needed, a new version is tagged with v2.y.z
  • when it's time for the new version, right before working on a breaking change, create a snapshot of the stable code in v2
    • and continue using it for updates and fixes to this version
  • a new, potentially incompatible, next version is now housed in the main branch, AND
    • the module statement is changed in the go.mod file to reflect the new version, i.e. module github.com/myorg/mylib/v3
    • when needed, a new version is tagged with v3.y.z
  • repeat.
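
For the same hypothetical library, github.com/myorg/mylib, the rolling main branch boils down to commands like these; the key difference from the previous option is that the snapshot branch holds the old version, while main keeps moving forward:

# snapshot the current stable version before any breaking work
git checkout main && git pull
git branch v1
git push origin v1
# main now becomes the home of the next major version
go mod edit -module github.com/myorg/mylib/v2
git add go.mod
git commit -m "Start v2 development"
git push
# when the new version is ready, tag it from main
git tag -a v2.0.0 -m "Release v2.0.0"
git push origin v2.0.0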

This model offers an easy-to-understand, simple-to-work-with and clean solution. The questions about the meaning and roles of the main and default branches are answered.

When To Use It

The guideline is to prefer this method whenever possible and by default. Only use the alternatives when (not if) there is no way this approach can work.

Weak Points

As this technique is an extension of the major-branch approach, and it addresses the weak points that approach left open, it's hard to find a reasonable downside.

Of course, someone could argue that:

  • it does not support the pre-Modules versions of Go
  • there is some added process into when to do what.

However, the reality is such that:

  • the pre-Modules versions of Go are old and officially unsupported, so compatibility with them is rarely a real goal (and when it is, the directory-based approach exists for exactly that)
  • the extra process is small, and quickly becomes a habit once practised.

Summary

As we saw, for a new project it's important to make the right decision at the beginning, as it significantly simplifies things going forward. Nonetheless, even changing the approach for an existing project from one to another is surely doable, it just requires a little more effort. What is important, however, is that the method used for versioning is convenient, and is not limiting when looking into the future. Proactively preventing legacy from accumulating, or actively working on reducing the amount that exists, is always worth the effort.

The following table combines together the facts that are likely to be involved when deciding on the approach to versioning.

Name | How | Next | Default | Main | Go | When
Major Directory | dir vN | dir vN+1 | main | main | v1.9.7+, v1.10.3+, v1.11+ | Support for old versions
Major Branch, v1 in main | branch vN | branch vN+1 | latest vN | latest vN | v1.11+ | When the major branch with rolling main does not suit
Major Branch, rolling main | branch vN-1 | main | main | main | v1.12+ | By default, the preferred choice

So choose wisely, experiment, implement and move on. In general, stick to the third option. :-)

Implementing Versioning

Having reviewed the theory in the preceding sections, let's now turn to the practical side of the subject. Specifically, there are three basic questions about versioning:

  • What does the initial migration to versioning look like for a stable project?
  • What does the process look like within the same major version?
  • What does the process look like when a new major version is introduced, assuming versioning has already been implemented?

This list is, of course, not exhaustive, and barely scratches the surface of all possible cases. For more information, see the Modules page on the official Go Wiki. This section aims to provide the reader with guidelines on how to achieve versioning. If a situation you're in is more of an edge case, and is not described here, then it's likely to be covered by the Go Team.

Overview

One last pre-flight check before moving on to the practical advice. The procedures and steps described further rely on certain assumptions, and will not work if some preconditions aren't met. Here is a brief reminder of what those conditions are.

The following is considered as a given, and further is implied to be true:

  • Semantic Versioning is used, and strictly followed
  • Go Modules is the default
    • any library/project that has not been migrated to Modules yet, should be migrated before proceeding
  • No backwards compatibility with pre-Modules versions of Go, including:
    • 1.9.7+
    • 1.10.3+
    • sometimes, 1.11
  • No backwards compatibility with projects/libraries whose dependencies are managed with dep or anything else but Modules.

With this in mind, let’s move on to concrete steps towards versioning.

Initial Migration to Versioning

Initial migration to versioning may be required in several cases. For example, a library might have been already stable for a while, but has not been versioned yet. This is quite often the case for libraries developed by companies and used internally. The same may as well apply to an open-source package.

You may also want to perform the initial transition to the Go way of versioning even if your project already uses versions in its simplest form, i.e. just tags.

The steps below assume the first case. However, the procedure will also work for the second scenario, with the only difference being the starting point. The instructions assume that the current state of a library is stable, and the intent is to migrate and version it properly, without breaking potential clients that might already use the project. So the procedure is cautious and has steps that are aimed at providing the strongest guarantees.

This is how the initial migration to versioning may look for a previously non-versioned project:

  • pull the most recent version of the main branch
  • create a tag v0.1.0 in the main branch
  • push the tag
    • at this point, all clients that have a pseudo-version v0.0.0-etc will be able to get this version (if they don't use vendoring and run go get, go build or any other command that implies downloading modules)
    • if migrating a project that already uses tags, then increment the MINOR version
  • optionally, update clients explicitly, if needed, but this is superfluous due to the point above
    • at this point, we have to pin the first stable version; there is no such version as v0, so there is no way yet to introduce a breaking change, as we first need to mark the stable version
  • create a new branch off of the main branch, v1
  • checkout to v1
  • push the branch to GitHub
    • from now on, this is the new home for all further changes related to the initial version
    • mark the branch as protected to avoid deleting it by accident
  • create a tag v1.0.0 in the main branch
  • push the tag
    • at this point, clients that previously had a pseudo-version v0.0.0-etc and used autoupdates, were migrated to v0.1.0, and continue using it
    • clients that haven’t attempted updating, or use vendoring, remain on a pseudo-version v0.0.0-etc, and will continue using it
    • even if someone updates a client, v1.0.0 points to the same stable code, so nothing breaks
  • optionally, update clients explicitly, if needed
    • at this point, any client that has been migrated to v1.0.0, will not start using any other major version unless explicitly told to do so using go get with an explicit tag or latest, and import of the versioned module
  • check out the main branch again
  • branch off of the main branch to make a change to the go.mod file
  • increment the version of the module in go.mod, i.e.:
    • module github.com/myorg/mylib/v2
  • commit, push
  • open a PR targeted to the main branch
  • pass review, merge the PR
  • pull the updated main branch
  • create a tag with the incremented major version, v2.0.0
  • push the tag
    • don’t worry about clients. Unless explicitly updated, they remain on the previous stable version, v1.y.z
  • from now on, you’re free to make any breaking changes
  • from now on, follow the processes described in the sections below
  • celebrate success.
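
Condensed into commands, the migration might look roughly like this (the module path and versions are hypothetical):

git checkout main && git pull
git tag -a v0.1.0 -m "First tagged release"
git push origin v0.1.0
# mark the stable version and give it a permanent home
git checkout -b v1
git push -u origin v1
git checkout main
git tag -a v1.0.0 -m "First stable release"
git push origin v1.0.0
# prepare main for the next major version
git checkout -b bump-module-to-v2
go mod edit -module github.com/myorg/mylib/v2
git add go.mod
git commit -m "Bump module path to v2"
git push -u origin bump-module-to-v2   # open a PR, pass review, merge
git checkout main && git pull
git tag -a v2.0.0 -m "Open v2 development"
git push origin v2.0.0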

Although this process may seem a bit complicated, it's rather not, and it's required only once per module. By doing it this way, we can be sure the users of a package won't be accidentally affected.

Changes Within One Major Version

Assuming that the initial transition has been done, making a new MINOR- or PATCH-level release within the same major version is straightforward. On average, it looks like this:

  • branch off of the major version’s home branch
  • make changes
  • commit, push, repeat
  • open a PR targeted at the major version’s home branch, pass review, merge the PR
  • pull the updated major version branch
  • depending on the level of change (MINOR or PATCH), figure out the last version for that level
  • create a tag with the incremented version for the level
  • push the tag.
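
For example, a PATCH-level fix to a hypothetical v1 line, whose home branch is v1, could end with:

git checkout v1 && git pull
git describe --tags --abbrev=0        # prints the last tag, e.g. v1.3.2
git tag -a v1.3.3 -m "Release v1.3.3"
git push origin v1.3.3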

Introducing a New Major Version

Introducing a new major version involves a few more steps compared to a minor/patch release. However, on average, this should happen not too often, perhaps a couple of times a year. The frequency of this procedure depends on the aggressiveness of the release cycle.

So, assuming that the project has already used another major version before, this is what it looks like:

  • suppose the current version is v2, and a new major version, v3, should be introduced
  • before any changes, create a branch named after the current major version, based off of the most recent version of main
    • in this case, it's v2
  • push the branch to GitHub (or the service you use, obviously)
    • additionally, if the functionality of the service allows it, mark the branch as protected to guard against an accidental force-push or removal, as this branch is, from this moment, essentially the home of the previous major version for as long as that version is maintained
  • figure out the last minor version for this major version, v2
    • let's suppose it's v2.3.0
  • from the new home branch of this version, the v2 branch, create a new tag with the incremented minor version
    • in this case, it's v2.4.0
  • push the tag
  • branch off of the main branch to make a change to the go.mod file
    • this should be a regular, short-lived development branch
  • increment the version of the module in go.mod
    • i.e. module github.com/myorg/mylib/v3
  • commit, push
  • open a PR targeted at the main branch
  • pass review, merge the PR
  • pull the updated main branch
  • create a tag with the incremented major version, v3.0.0
  • push the tag
  • make coffee.

Notes on Automation

For a developer, thinking of automating a repeatable, error-prone, or just boring process is natural.

However, to better understand the process, it's worth practising it to the point when you know what the steps are, what exactly they do, and why a certain step is needed. Only after having practised it enough, when you fully understand the task, attempt automating it. Learning the process by hand gives a sense of what is going on, and helps in understanding what to do if something goes wrong, or if it does not work as expected. Automation based on solid experience is more likely to be solid and robust. Try doing this earlier, and the flaws in your knowledge leak into the implementation, making it flawed too, in one way or another. This is a general principle of automation:

Whenever the legal moves of a formal system are fully determined by algorithms, then that system can be automated.

– John Haugeland, Artificial Intelligence: The very idea (1985)

In the context of versioning, there are some steps that can potentially be automated, although not all of them seem to require it. For example, the initial transition to Modules highly depends on each particular project. Attempts to generalise it would require making lots of assumptions and guesses about the state of a project before converting it into a module. As this process is required only once in a blue moon per project, there is little to zero value in automating it beyond what the Go toolset already offers.

Potentially, the task of introducing a new major version might be a good candidate for future automation. The procedure described in the corresponding section is straightforward, and comes down to:

  • determining the current MAJOR version, vN
  • creating a major version branch for keeping the state of this version, vN
  • pushing the branch
  • determining the current MINOR version, vN.M.P
  • switching to the branch
  • tagging it with the next MINOR version, vN.M+1.0
  • pushing the tag
  • switching back to the main branch
  • branching off of it
  • increasing the MAJOR version number in the go.mod file, vN+1
  • pushing, merging, pulling the main branch again
  • tagging it with the incremented version, vN+1.0.0
  • pushing the tag.

On the other hand, creating a new major version is an important process, a significant step in the release cycle, and it is non-linear by its nature. It might include some extra steps requiring human interaction, approvals, or anything else. As long as a tool is able to reliably do the work, and properly handle potential failures, it might be useful. Otherwise, the world is already full of sub-optimal and poor programs, and it won't benefit from one more piece of software, "hacked" overnight.

Another part that can be automated is the introduction of a new MINOR or PATCH release within the same MAJOR version. This process is mostly the same as with other languages, with the only difference being that if an update is introduced to a major version that is not the current one, a tool must take into account that the new tag must be created from the corresponding major branch.

Some steps can definitely be automated though. We first need to seek out the most likely sources of errors. An obvious candidate, and the most boring thing to do, is figuring out the previous version tag and creating a new tag with a proper version. It's the simplest step, but its simplicity is deceiving, as messing it up once could cause issues for the users of a project. That's why having a command that shows the current and next versions would be helpful.

Stick to the KISS principle and the UNIX philosophy. There is a set of powerful tools at our disposal, without the need to reinvent the wheel. Some of these include:

  • make
  • git
  • awk
  • sed.

With some thought and testing, a library can be provided with a minimal set of PHONY targets in the Makefile that simplify the calculation of a version number. For instance:

  • make next-minor
    • returns the next minor version for the current version (based on the branch, existing tags and go.mod)
  • similarly, make next-patch
  • make release-minor
    • in addition to properly calculating a version, it creates and pushes a tag
  • similarly, make release-patch.
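
As a minimal sketch, a next-minor target could wrap a pipeline like the one below, assuming tags on the current branch follow the vX.Y.Z convention:

# print the most recent tag reachable from HEAD, e.g. v1.4.2,
# then bump the MINOR part, producing v1.5.0
git describe --tags --abbrev=0 \
	| awk -F. '{ sub(/^v/, "", $1); printf "v%d.%d.0\n", $1, $2 + 1 }'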

Even such a minimal set of tools is enough to simplify the everyday (in reality, weekly) maintenance of a library to the point where you don't have to worry about the process much, if at all.

Conclusion

This section has shown that choosing and implementing a good approach to versioning is important for the lifecycle of a project, regardless of its size.

A versioning strategy has an effect at various levels, ranging from which branch developers start off of and merge into every day, all the way up to what end users of the software see by default when they visit the homepage of the project's repository. A good strategy, as all good things, reduces the mental load and saves time, whereas a poor strategy confuses and requires extra steps to get to something which should be easily accessible.

A good strategy also works both short- and long-term. In the short term, it allows for simple and straightforward introduction of a new major version. In the long term, it allows maintaining multiple major versions easily for as long as you should, as a responsible developer. A good approach is also universal, as it does not depend much on a particular technology.

The third option given here, the Major Version Branch with Rolling Main Branch, looks like a good candidate. It does not actually depend on anything specific to Go, and works for other technologies as well as it does for Go. It's good short- and long-term, and it keeps the conventions that the community is accustomed to, along with the meaning of the main and default branches. It addresses the needs of versioning without raising new issues. This is a good sign.

Notes on Release Notes

One more thing to talk about before wrapping up the project layout guidelines, and a follow-up to the previous topic, is release notes. We will briefly discuss some ideas for better communicating what's changed in a project.

This is important for both internal and external audiences. For a private project in an organisation, the internal audience is the developers who use the software; the external audience is management and other potentially interested parties. When done well, and everyone knows where to look, release notes that are published regularly and reflect the work that has been done serve as a summary and description of the outcomes.

In an open-source project, the internal audience is committers and maintainers; the external audience is users of the project. For the internal audience, often working from all over the world, this helps with keeping everyone in sync with what is going on in the project. For the users, it's the only convenient way of knowing about changes, and it provides the information required for planning and deciding on when, and often how, to perform an update. Of course, in both cases, private and public projects, audiences may, and often do, overlap.

Although the reasoning above only touches on the topic, it is already sufficient to understand the importance of preparing and publishing some form of release notes. This should be done for each and every release. Needless to say, a MAJOR version must come with a comprehensive description of the changes it contains.

The main guideline of this section can be summarised as: provide your project with a changelog, informing the users of the software about the changes you've made. A few practices help in achieving this:

  • being generally disciplined in the process, and keeping things in good order
  • writing good commit messages
  • using annotated tags
  • and sticking to good practices of maintaining a changelog.

A changelog is not release notes though. A changelog is more on the technical side of a project, while release notes are more on the product, or even marketing, side. For software projects whose target audience is developers, the terms may be interchangeable, but they are not necessarily so.

As this book is more about the technical side of things, we're focusing on maintaining a good changelog, and related activities. Let's look at these a bit closer.

Write Good Commit Messages

Perhaps there are not many topics in software development with such a long history of attempts to establish good practices, and to make everyone in a team follow them, as writing good commit messages.

One of the most famous and best resources that summarise what needs to be said about writing commit messages is this article. If you haven't seen and read it yet, stop reading this, and don't come back until that article is fully read.

A quick summary on how to do better with commit messages:

  1. Separate subject from the body with a blank line
  2. Limit the subject line to 50 characters
  3. Capitalise the subject line
  4. Do not end the subject line with a period
  5. Use the imperative mood in the subject line
  6. Wrap the body at 72 characters
  7. Use the body to explain what and why vs. how.

This is important for many reasons, such as overall order, clean history, transparency and the ease of working with the commit tree. Good commit messages become even more important as they reduce the effort required to gather the list of changes for the changelog.
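
As a made-up example that follows these rules:

Add support for exporting reports to CSV

Large customers asked for a way to process reports in their own
tooling, and CSV is the lowest common denominator they all support.
The export reuses the existing report iterator, so memory usage
stays flat even for large reports.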

Use Annotated Tags

By default, create annotated Git tags.

As you know, Git offers two flavours of tags - lightweight and annotated. A lightweight tag is a pointer to a specific commit. An annotated tag, on the other hand, is a full Git object, which means it contains the following information:

  • the checksum
  • who made the tag, their email, and when it was made
  • a tagging message (pretty much a commit message, hence the same rules apply)
  • an optional GPG signature (so the tag can be verified).

The official Git documentation recommends using annotated tags.
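
Creating an annotated tag is a single command, and git show then displays the tagger, date and message along with the tagged commit:

git tag -a v1.2.3 -m "Release v1.2.3"
git show v1.2.3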

Since each tag carries all the identity information with it, this helps in figuring out what was changed, when, where and by whom. Combined with good commit messages, annotated tags provide enough information for preparing the list of changes for the changelog. The worst-case scenario would be when the information contained in the repository is not enough to create a changelog, and you have to use multiple sources, including the task tracker.

By writing good commit messages and using annotated tags, you help your future self prepare for writing a good changelog.

Maintain the Changelog

To keep the audience of a project informed about changes, maintain the changelog.

What makes a good changelog? What is important, and requires special attention? Consider the following suggestions:

  • remember that the changelog is for people
  • add an entry for each version
  • keep records in the descending order, i.e. the latest version comes first
  • always include the date of a release
  • group changes by their type
  • be consistent with the style and wording
  • provide only relevant information.

The sub-sections below go into more detail on these suggestions.

Changelog is for People

First and foremost, changelogs are written to be read by humans, not machines. Ultimately, the changelog should answer the question: will my software work if I update this dependency, or does the update break anything?

Having developed and maintained software that was relied upon in everyday work by developers, both internal and external, it’s safe to say that you should either provide a good and valuable changelog and/or release notes, or nothing at all. Anything in the middle, automated nonsense or irrelevant information, is just noise that no one is interested in.

The changelog is where engineers look when making decisions about using a library or updating it. Keep in mind that "engineers" means not only software developers that use the project in their environments; the target audience, depending on the popularity and how widespread the project is, also includes:

  • system administrators and operations engineers
  • SRE and Devops people
  • application security analysts and engineers
  • and whoever is working with them, such as support and sales teams, lead engineers, product managers, designers, etc.

Given this, make sure that the changelog provides what it is expected to provide - the information about what's changed, when, how it affects the user, and what to pay special attention to.

Log Each Version

Add each version to the changelog.

Although this is self-evident, and does not require further explanation, it's worth reminding that the user should be able to see what has been changed in each particular version of the software. If the software is a library, especially a public one, and is used by other developers in their products, this is a must.

Last Version Comes First

Always add the latest version at the top of the changelog. In other words, similar to a stack data structure, versions in the changelog are listed in reverse order.

This is important for a few reasons:

  • First, the information of interest should be accessible as quickly and easily as possible.
  • Second, in most cases, the user is interested in recent changes rather than in earlier versions of the software.
  • And finally, this saves a tiny but important bit of mental energy, as neither the user nor a maintainer has to scroll the entire list to get to the point when reading or updating the changelog.

Show The Date of Release

Provide each version with the date of release. Use the ISO-8601 format, aka YYYY-MM-DD.

This is essential for keeping track of changes, and it's hard to tell which is more important - the version number, or the date when it was released. Without either, the information is incomplete. The date of release establishes a concrete point on the timeline, allows relating to a particular version, and correlating the date with various events.

The format requirement helps avoid the confusion caused by the variety of regional formats. The ISO-8601 format is well-known, simple and easy to understand. Just stick to it.

Group Changes by Type

To make a changelog easier to read, group changes by their type. Structured and organised information is easier to understand, and it's easier to notice something important in it.

Some of the most common groups, along with the corresponding types of changes, and the relation to Semantic Versioning, are shown in the table below. For Semver, where multiple options are available, they are given in the order of preference. The table shows an approximate relation, just to visualise how types of changes and version numbers correspond.

Group        Type of a Change        Semver
-----        ----------------        ------
Added        New Features            MAJOR, or MINOR
Changed      Improvements            MINOR, or MAJOR
Fixed        Bug Fixes               PATCH, or MINOR
Security     Vulnerability Fixes     MINOR
Deprecated   Deprecations            MINOR
Removed      Removed Features/APIs   MINOR, or MAJOR

For most use cases, these groups should be enough.

However, a project may support multiple platforms (i.e. operating systems and processor architectures, more on this in Unit 2). In addition to changes common across all supported platforms, something different can be added/updated/fixed for a particular architecture. In such a case, two options are available:

  • prefix a change that is specific to a particular platform with the platform's name
    • e.g. an Added change for the Mac, or a Fixed change for Linux:
Added:
    ...
    Mac: Notarisation
    ...
Fixed:
    ...
    Linux: High memory usage when working with large files
    ...
  • alternatively, group platform-specific changes together, after the list of common changes
    • e.g. a Fixed change for Linux goes into the Fixed sub-list under the Linux group, like:
Common:
- Added
    ...
- Updated
    ...
- Fixed
    ...

Mac:
- Added
    - Notarisation
    ...

Linux:
- Fixed
    - High memory usage when working with large files
    ...

There are many different ways of grouping, and at some point it might be tempting to add one more category or grouping criterion. Resist the temptation, and ask yourself: "Is this group essential?". Most likely it's not, and the change belongs in one of the existing groups. The importance of this is the subject of the Be Consistent guideline.

One special group of changes, however, is an exception. Upcoming changes that won't land in the next release or two are a natural part of project and product development. These are usually new features available for canary testing; while they're being developed, they aren't included in a release. If this approach is used in your project, keep these changes at the very top of the changelog, before the latest version. Users and developers would benefit from knowing what's coming, and they would be able to plan accordingly, or become early adopters of new features. As a positive side effect, you can receive feedback sooner.

Be Consistent

Use consistent wording, formatting, and grouping. This is especially important when working in a team of developers. Make sure everyone follows the same visual and writing styles, and the changelog looks nice, and is coherent and consistent. For the reader, it should appear as if it was written by one person.

Specifically, ensure the following:

  • all dates are always in the ISO-8601 format, i.e. YYYY-MM-DD
  • all versions are always in the Semver format, i.e. vX.Y.Z
  • groups are either all Simple Past verbs, or plural nouns, as in the table above
  • the list of groups is relatively persistent
  • changes are articulated and communicated clearly
  • sections with versions are clickable, for ease of navigation
  • descriptions are short and concise, written, whenever possible, in Plain English, or in the plain version of the language of your choice (a small sketch follows this list).
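
A minimal sketch of a changelog skeleton that follows these conventions; the versions, dates and entries are invented:

v1.4.0 - 2024-05-02
- Added
    - Export to CSV
- Fixed
    - Crash on empty input

v1.3.1 - 2024-03-18
- Security
    - Updated TLS configuration defaults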

Provide Relevant Information

Provide only relevant information that is important for the user or the developer. In other words, keep the signal vs. noise ratio high.

For example, here is what should be included in a changelog, in addition to the main contents:

  • changes in the public API
  • increase/decrease in performance
  • any implementation-specific changes that may affect the user, especially, in a negative way
  • updates to dependencies if they contain any of the following:
    • any negative impact on performance
    • significant increase in the size of the software, aka bloated libraries
    • significant changes in licensing, including licences of transitive dependencies (yes, you're responsible for this too).

Things that are better left out of a changelog:

  • internal implementation details that don't affect the user/developer in any way
  • regular dependency updates
  • refactoring without a noticeable change in functionality/performance.

It's worth noting that sometimes changes are so important that it makes sense to provide a description of the actions required from the user. For non-technical users this is normally communicated through release notes and documentation. For developers, the changelog might be a good place to look at first. However, the home for detailed descriptions, explanations, instructions, migration steps, etc. is still the documentation. A quick summary and a link to the relevant place in the docs is a good way to start.

Conclusion

What is the most important ingredient in a relationship? Trust. A significant contributor to trust is clear, transparent and honest communication. As long as the involved parties trust each other, the relationship lasts.

There is a relationship when someone uses software; it's between the user and the developer. The user might be a single person, or a larger group; similarly, there might be a single developer, or an organisation. The important thing is whether there is trust in the relationship.

It is the features and quality of the software that form the foundation of trust. If it works, and works well, then users remain loyal. The documentation plays an important role too, and is a big part of communication between the developer and the user. So far so good.

When is trust lost, assuming there is no betrayal or lie involved? One possible reason is unmet expectations. Another is when something unexpected happens, regularly and without notice; when something changes, but neither the changes nor the reasons for them have been communicated, or they have been communicated poorly.

A changelog is a way of addressing this issue. A well-written changelog communicates what's changed clearly, transparently and honestly. This is why, if you want to build lasting relationships with your users, you need to maintain a changelog.

The ideas and suggestions in this section are based on learnings and experiences gained over many years, and from various sources. A great reference, and a source of inspiration for some of the ideas, is Keep a Changelog. Another good place to learn from is real-life examples. The list below includes some that are worth looking at:

The ultimate goal is to let users of your software know about the changes you make. If the suggestions given here don't suit you, or are too resource-demanding, focus on communicating changes to the user in the simplest possible form. While the way it's done is important, it's secondary to the fact that it's done at all. Trust is hard to gain, but easy to lose. Make sure that, while you make changes, the way those changes are communicated supports the trust you're building.

Summary

We began the unit with the excitement of starting something new. However, it's often the case that this excitement fades away quickly if the project has not been properly cared for.

The guidelines given in this unit are only the very tip of the iceberg of preparing and maintaining a project in a healthy and consistent state. We briefly discussed techniques that apply at the highest, the project level. Project design depends on decisions made at each level. In the next units, we go deeper, layer by layer, to learn more about better design choices.

Consistently following the guidelines requires effort. At times it may seem too much to pay attention to, or too meticulous. However, it pays off in the long term, and it always does. One of the best things you can do for your future self is to plan and prepare upfront so that as few changes as possible are required in places that should be set once and just work. That "just work" is, in fact, a consequence of these activities, which are often invisible when done, and very noticeable when forgotten, ignored, or neglected.

At the end of the unit it is worth summarising all of the suggestions and recommendations given so far, compiled in one list:

  1. In general and overall, strive for a good project layout.
  2. For any kind of project:
    • provide files required for the project
    • keep the number of root elements at minimum
    • keep any level clean and up to date
    • don't create artificial hierarchy
    • group non-code files by their purpose
    • provide the Makefile
    • store credentials somewhere else
  3. For a library:
    • provide a good README file
    • keep the top-level list of objects of a manageable size
    • provide examples
    • provide benchmarking information
  4. For a single application:
    • use cmd directory for entry points (main.go files)
    • use bin directory for binaries
  5. When working with a monolithic repository, bear in mind the following:
    • avoid naively throwing everything into one repository
    • service-based monorepos may work, but they are harder to maintain long-term
    • similarly, entity-focused repositories are especially prone to dependency cycles
    • prefer a structured monolithic repository, and ensure that:
      • each package is carefully placed in the tree
      • utility code is maintained as your own standard library
      • there is a sandbox for code with breaking changes
    • when even more flexibility is required/desired, consider:
      • using nested modules within a single monorepo
      • or even using multiple monolithic repositories
        • some of which are "simple", single-module monorepos
      • and finally, there is always the most flexible option with multiple monorepos composed of nested modules
  6. When versioning your Go project, remember these:
    • the main branch is usually considered to be the source of truth
    • the default branch should be the same as the source of truth
    • avoid the directory-based approach to versioning unless you absolutely must
    • similarly, there are few reasons for using the simple branch-based approach
    • use the branch-based approach with rolling main branch
  7. When working on code and then preparing a new version, remember to:
    • write good commit messages
    • use annotated tags
    • maintain the changelog, keeping in mind that
      • changelog is for people
      • there must be a record for each version
      • last version comes first in the list
      • each release must include the date it's released
      • changes should be grouped by the type
      • the style and wording should be consistent
      • and the information you add to the changelog is relevant.

An insightful reader might have already noticed that there is something common among the guidelines proposed here. The underlying principles for the suggestions in this and next units are basic and simple. Everyone knows about them, but fewer people do actually follow them in real life. As a reminder and a hint, they're listed below:

  • Do what needs to be done, and don't be lazy.
  • Whatever you do, do it as well as you can.
  • Especially, when nobody is watching.
  • Learn and then follow rules.
  • Always do your homework.
  • Never stop learning.
    • Regularly update your understanding of the things you already know.
    • Learn something new.

In the next unit, we take a step forward, and introduce guidelines on various aspects of package design in Go.

Package Layout

Alright, as we now know how to organise a project, it's time to think about the next layer in application design. In Go, applications are built of packages, which makes the package an important part of the overall architecture of a project. This unit focuses on guidelines for designing packages well.

A package is one of the core concepts in Go. A piece of code cannot exist outside of a package, so a package is created before anything else. Packages are the high-level building blocks of an application. They provide us with different abilities and tools to accomplish a task. Therefore, whether an application is well designed or not depends on how the packages are organised in the first place. As packages are vital to the ecosystem, it's important to get this part right. A good package layout mostly goes unnoticed, as it feels just right and natural, whereas a poor layout can lead to many potential issues, and is disturbing in many different ways.

The goal of this unit is to provide guidelines on how to work with packages - create, structure, and maintain, including:

  • when and why to create a package
  • what to put in it
  • how to choose a proper name
  • how to organise a package
  • supporting code for multiple platforms.

Packages in Go are different from similar concepts in other languages. As a modern language, Go approaches code organisation in a unique way that takes into account what has and has not worked well, and introduces some new ideas. As a result, we get a powerful system that makes it easier to use, write and distribute software. Failing to acknowledge the difference from other languages, and failing to learn to think about code organisation the Go way, are two of the major contributors to poor application design, issues with dependency management, leaked implementation details, lack of encapsulation, wrong abstractions, and so forth. When a package is designed in the wrong way, expect all sorts of problems.

What we want for our applications is quite the opposite: a coherent architecture, effortless dependency management, meaningful encapsulation, a narrow and minimal public API, ease of maintenance, and so on. To get there, we continue with the package. Our next step is getting a good understanding of what it is.

What is a Package

You may already know that a package in Go is defined as a collection of files in the same directory that are compiled together. A file contains variables, constants, types, and functions, and they're visible in other files within the package. Visibility of a symbol outside of the package is controlled by the first character in the name: a capital letter makes a symbol exported, and a lowercase letter marks it as unexported. A package in Go provides a way to accomplish a set of tasks.
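
As a quick illustration of visibility, here is a tiny, hypothetical package greeting: the exported Hello is visible to importers, while the unexported prefix is not.

package greeting

// Hello is exported: its name starts with a capital letter, so code
// importing the greeting package can call greeting.Hello.
func Hello(name string) string {
	return prefix + name
}

// prefix is unexported: it starts with a lowercase letter, so it is
// visible only to files within the greeting package itself.
const prefix = "Hello, "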

While sounding simple, this definition gives us all that we need to lay the foundation for a good software design. Having thought about what it really means, and how to employ it, you realise that the package paradigm in Go is meant to embrace and support many of the good practices in software engineering. The responsible engineer knows and applies the SOLID design principles.

So how does exactly the package system help us with software design? This is what packages help us to do:

  • provide namespace and set the context
  • group code by responsibilities, roles and functionality, and keep them clear (SRP), i.e. encapsulate what varies
  • facilitate extension over modification (OCP)
  • build and maintain proper abstractions (LSP)
  • keep interfaces and the public API narrow (ISP), i.e. hide implementation details and decouple
  • maintain healthy dependencies between objects (DIP)
  • distribute and reuse code (DRY)
  • keep it simple, testable and atomic (KISS)
  • and so on.

What is a Go package not? The most important thing to remember is that packages are not a means of implementing a hierarchy. The hierarchy of libraries in Go is flat. Bear this fact in mind. When you encounter net/http, it does not mean that http is a descendant of net, or a sub-package. This is simply a matter of namespacing and setting the context, and no more than that. The net package is completely independent and can be used on its own; so can the net/http package (although it does use net internally). Despite being located inside the net directory, there is no hierarchical or inheritance relation between the two. The http package is just located within the net package's folder.
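
A small, contrived sketch to illustrate the point: both packages are imported on their own, and in code they are referred to simply as net and http; the endpoint used here is made up.

package main

import (
	"fmt"
	"net"
	"net/http"
)

func main() {
	// The net package is used entirely on its own here...
	addr := net.JoinHostPort("localhost", "8080")

	// ...and http is its own, independent package; "net/" is only part
	// of its import path, not a parent package.
	resp, err := http.Get("http://" + addr + "/healthz")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}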

Building a firm understanding of the packages' nature and purpose is one of the keys to designing changeable and maintainable software. Without this understanding, anything you do, even if it seems to be right, will eventually fall apart. It does not matter how well a house is built, if it stands on a weak and shaky foundation.

One of the most common mistakes is creating packages without knowing when to and when not to do so. At one extreme, a project may have too much code, with too many different roles mixed in a single package. At the other, a project may abuse packages so that there are way too many of them. Neither of these situations helps in building great software.

You need to find balance, and knowing when it's a good time for creating a package is helpful.

When to Create a Package

A package is created to improve the design and architecture of software. This is the single and only reason for the existence of a package. What does it mean? The software design principles help understand this.

A package should be created to accomplish one or more of the following goals:

  • encapsulation of varying parts of a project
  • decoupling and setting boundaries between various parts
  • implementing abstract and extensible data and processes
  • providing reusable and reliable functionalities
  • distributing code.

That's all there is to it. If a package exists, but its existence does not support at least one of the goals above, something is wrong with the design. Here are some reasons to question a package's existence:

  • if a package does not make sense without another package (in a sense that it can't be used on its own)
  • if a package is located inside another package, and can't be used outside it, and it's not an internal package
  • if use of a package leads to an import/dependency cycle.

When planning a package, the following properties of code should guide the process:

  • roles that objects in it play
  • responsibilities that the code carries
  • functionality it provides
  • behaviours of objects.

Having clarified when to create a package, and before we take the next steps, it's good to think about the lifecycle of a package, and how its design and contents influence the maintenance process.

Keep Public API as Narrow as Possible

Keep the public API of a package as narrow as possible. Put another way, keep the number of exported objects and elements at a minimum.

To understand why this is important, we need to remember a few things. First, packages in Go are building blocks, and a unit of distribution. A useful library found on the net is often a single package. A useful piece of software is made of a set of packages. Therefore, the API surface of a project is a superset of APIs of all packages in it. This makes it very important to design each individual package mindfully and consciously.

Second, in the vast majority of cases, it is behaviour that software provides us with, not concrete features. In other words, we're more interested in accomplishing a task by making the library do something we need, rather than in what fields are available on the library's objects. A library that exposes only behaviour, or behaviour and a minimum of features, is easier to both use and maintain. Packages with lots of details exposed to the user tend to increase cognitive load: users are forced to think more than they really need to in order to use the library, and authors have to keep in mind how each change would affect the user.

Third, one characteristic of great software is that it is hard or impossible to misuse. In other words, good libraries tend to minimise the risk of error by being designed and implemented in a way that helps the user use them correctly. Such libraries also have minimal to no side effects, are often secure by default, and have modest resource consumption.

Fourth, the only thing that is constant about software is CHANGE. Change means maintenance, finding and fixing bugs, changing implementation details for the better, such as for performance or security, adding new features, and more. As people use the software, they become more dependent on it. If a library does something valuable, it may become popular, and this is where things get complicated. The more people use the library, and the longer they do so, the harder it is to introduce changes, and the higher the risk of breaking others' systems. Even a simple update might take a long time to introduce, and even longer to be fully adopted by the users.

When writing software, you need to plan and organise it in such a way as to maximise the ability to change as much as you can, as fast as you can. In the context of package planning and design, this means keeping the number of exported objects and elements to a minimum, and exposing behaviour with a reasonable amount of features. This helps to:

  • hide implementation details
  • make maintenance easier for both authors and users
  • reduce dependency of the user's software on your software
  • allow introducing changes faster.

At this point we know what a package is, when to create one, and that it's important to design it carefully. More practical advice on how to achieve most of these goals in code is discussed further in this and the following units. The primary focus of the remainder of this unit is the structure of a package. We start with a very special one, main, and then survey guidelines for regular packages.

The Main Package

Keep the main package as small and focused as possible.

The main package defines the entry point of an application. This means that the code in this package is responsible for:

  • obtaining the configuration required to set up the application from external sources
  • setting up the proper lifecycle (the topic is explained in detail in Module 2 - Foundation):
    • setting up proper error flow
    • making sure all allocated and occupied resources are always cleaned up and released
    • providing guarantees that all deferred calls are always executed
    • ensuring the service stops gracefully
  • instantiating external dependencies for a service (this and other application-specific questions are considered in Module 3):
    • clients to any external services
      • databases and storages
      • other services
    • any other connections without which the service cannot exist
    • any files required for the service to operate (incl. pid, lock, pipes)
    • the logger
  • performing pre-flight checks
  • creating an instance of the service
  • starting up the service.

The main package (and, as you'll see in a moment, the main.go file) is the right place to instantiate, create, and check anything that is required for a service or application to start up and run correctly. Think about it the following way: anything that your app can't fix at runtime must originate from the main package (and, as you'll see later, be passed as arguments to the constructors of the parts the app is made of). As a consequence, anything else should live outside of the main package.

Here are some examples of what can originate from the main function:

  • instances of any databases a service is using, unless the service can function fully without them, or the service is designed such that it can self-heal
  • any permissions in the system/platform under which the service is running, unless the service can fix or legally work around missing entitlements
  • any files and related permissions that are relied upon at later stages, such as unix sockets, log files, temporary locations, configuration files, storages for data, and so forth
  • anything else without which it does not make sense to continue execution of the process, because there is no reason to allocate memory and waste resources if the environment and circumstances you're running in do not allow operating healthily; unless, of course, you expect that and are prepared for it.

It's worth mentioning that some libraries implicitly facilitate a poor layout of the main package. There are plenty of applications out there that provide enough evidence to support this. When a project uses such a library, the recommendations from its documentation are often taken as is, and the main package (and often the entire project) is degraded by the code that uses, is used by, or is required by such libraries. So instead of simply copying code from the examples, think of a better way of organising it, so that the main package remains the entry point with as little content as possible.

The contents of the main package should, ideally, live in a single file, main.go. The most important part of the main.go file is the main function. The package, the file and the function must be free from clutter, short and focused.

The suggestions above are not hard to follow when a service is well designed and thought through. The primary goal of these is to make the instantiation process clean and straightforward, as well as explicit. Another important goal is to reduce, as much as possible, the mental load required when working on a project. The main.go file is the introduction to the story you tell in code. So make sure the reader is welcomed to follow it, and not confused.
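
To make this more tangible, here is a minimal sketch of what such a main.go might look like. All the names (loadConfig, openDatabase, newService, and so on) are hypothetical, standing in for parts that live outside of main:

package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
)

func main() {
	// Obtain configuration from external sources.
	cfg, err := loadConfig()
	if err != nil {
		log.Fatalf("load config: %v", err)
	}

	// Ensure the service stops gracefully on SIGINT/SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// Instantiate the external dependencies the service cannot exist without.
	db, err := openDatabase(ctx, cfg.DatabaseURL)
	if err != nil {
		log.Fatalf("connect to database: %v", err)
	}
	defer db.Close()

	// Create an instance of the service and start it up.
	svc := newService(cfg, db)
	if err := svc.Run(ctx); err != nil {
		log.Fatalf("service stopped with error: %v", err)
	}
}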

Package Provides Something

A package provides tools for solving a particular problem.

It's important to understand that a package in Go is not a container for arbitrary code that shares some commonalities. It's not a container in the usual meaning of the word. Instead, a package is a provider of a way for accomplishing a set of tasks.

Think about how a library makes its way into a project. There is a task to solve in the first place, and it is natural to look for an existing solution first. Assuming a successful search, at the end of the process you find a library that does what you need, or significantly simplifies the task. So the library gives, or provides, something that you can use directly or as a building block. Notice that what you're searching for is not particular contents; it's rather a way, a solution, a tool to address a particular need. This illustrates that, for us users, libraries (and not only in Go) are providers, not collections.

That insight suggests what the author of a library should be focusing on. As an author, you design a package to provide the users of the package with ways for accomplishing a particular, well-defined set of tasks.

Yet the vast majority of packages seen in private codebases are still just collections of something. It's very common, and sad, to see packages that merely contain things, either literally or by design. That's usually due to a lack of understanding of the purpose of a package, combined with a lack of design. Refrain from approaching problems in such a way.

The following example may sound corny, but it's worth repeating. A package named tools is a bad idea, and the reasons given above tell us why. Such a package is a collection of usually arbitrary utility code that is either hard to decouple from its dependencies, or has too wide, or sometimes too narrow, a use case, so that it can't be put into a more specific and focused package. Thus, reject the idea of having a package like this. Instead, think about the following:

  • how to make code less dependent on details so it can be moved into a focused package that provides something
  • or, keep the code where it's used

If the reason for creating the tools package is to share code between other packages, ask the following questions:

  • why can't I import the package that contains the code I want to use? The answer may suggest that there's a bigger and deeper problem with the design that prevents you from using it, and that should be solved first. For example, dependencies pointing in the wrong direction might signal a violation of the Dependency Inversion Principle (DIP).
  • shouldn't the code that is going to be shared belong to the type it operates on, as a method? Then both places, the method and the caller, would use the code naturally, without a separate package.
  • is there another way to address the issue?

On the other hand, a model package makes a lot of sense, as long as it provides the models that represent the data structures and components of a business process. Each model encapsulates its behaviour, and the consuming packages (those that operate on models) can describe a business process using the behaviour of a model.

Be sure the package you're about to create will provide something. That something comes in handy for the next step in planning a package.

Naming a Package

The name of a package is one of its most important attributes. It impacts any place where it's mentioned, as well as code in the package. A good package name helps write expressive and understandable software.

Name packages consciously. The process requires thinking and planning. When packages are named well, the code using them is read naturally, and understanding of what it does is much easier. However, even a single poorly-named package in a project can significantly reduce readability, especially if it's an often-used one. The importance of a good name is so high because code is read more often than it's written. That's why you want to get it right.

This section goes into detail on choosing a good name for a package.

Context and Namespace

The name of a package sets the context for its contents, and is the namespace. The package's context defines its meaning, role and responsibility.

When choosing a name for a package, make sure the name sets the right context, and is a good namespace. It's not a mere label, but a rather significant contributor to overall design, coherence, and readability of a project, no matter whether the project uses the package, or it is the package.

Consider net/http. Its context is everything that has to do with the HTTP protocol. It provides everything that we need to either talk to a server on the net, or to serve requests from clients. From the context, it's clear that there are things like requests, responses, headers, and so forth.

But it's also a namespace, because the name of a package, unlike in some languages, is used as part of an expression or statement. For example, the Request type. When you see http.Request it's clear that this type is an HTTP request, not an arbitrary request.

The context of a package is important for both its authors and users.

As the author, you're already in the package which means there is absolutely no reason to duplicate information that can be derived from the context and namespace set by the name.

It would be nonsense to write http.HTTPRequest, http.HTTPResponse, or bytes.BufferBytes. Nonetheless, way too many developers in the Go community still, and often, do exactly that.

If you think this only applies to the standard library, here is another, real-life example. Compare the two:

  • repository.UserRepository
  • and repository.User.

The first is absurdly verbose and repetitive; the second is clear, concise and sufficient. The first is legacy carried over from languages like Java or PHP, and hints at a poor design and at ignoring, or not understanding, Go conventions. The second is idiomatic and Go-like. Another example is handler.AuthHandler (too verbose, noisy) versus handler.Auth (much better).
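
A small, hypothetical sketch of how this plays out in code:

package repository

// User stores and loads users. The package name already provides the
// "repository" context, so the type name does not repeat it.
type User struct {
	// ...
}

// At the call site the full name then reads naturally:
//
//	var users repository.User
//
// as opposed to the repetitive repository.UserRepository.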

For a user, the name of a package is part of code, of every call and expression in which the package is used. Thus, a good name helps in making code easier to write, but most importantly, easier to read and understand. If the author of a package has put good thought into the name, it improves the way the code is perceived. Otherwise, each interaction with the code takes more effort than it should; if you work with such code repeatedly and constantly, you'll soon realise how much more mental energy it takes, and how much harder it is to maintain.

A bad name leads to poorly-named objects, such as types, functions, methods, etc, which leads to a poor package API, which in turn leads to poor code that uses the package. Such code is legacy from the first symbol. This is how the code you write impacts others' codebases. And similarly, this is how your codebase is impacted by someone else's code.

Because of this, take the time and think it through before submitting, as an author, or accepting, as a user or reviewer, code with a new package. Think about your future self, and those who will use and maintain the package, and the code based on it. Only merge or approve the pull request when the name and the contents of a package are balanced and play well together. Merged code is legacy, and you want it to be good.

What a Name is Made Of

Before introducing guidelines on naming a package, it's important to understand what, technically, makes up a name.

Two things contribute to the name of a package. The first is the name of the directory where it's located. The second is the package statement, the first non-comment line of any .go file. Note that the directory name can contain hyphens, whereas the value in the package statement cannot.

The value specified by the package statement is the package name. The directory containing a package can be named arbitrarily, and unrelated to the package name, though that does not mean it's a good idea. Not everything that is possible is worth doing.

While this opens up an infinite number of choices for naming, it's better to stick to the conventions in the community.

Name the Directory After the Package it Contains

Use the same name for the directory containing a package, and for the package itself.

If your package is named rpc, then the directory in which it's located must be named rpc.
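
For example, for a hypothetical rpc package within a project, the layout would be:

myproject/
    rpc/
        rpc.go    <- declares "package rpc"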

So many projects and libraries violate this simple guideline. Don't confuse your future self, your colleagues, and your users; use consistent naming between packages and directories.

Choose Short and Concise Noun

The name of a package must be a singular, short, concise and expressive noun or an acronym, made of lower-case letters.

Do        Don't
--        -----
handler   handlers
storage   dataStorages
http      http_something

This rule, like any other, has an exception. The name can be a plural noun if and only if the package provides tools that work with things best described by a plural noun. Good examples are bytes and strings. They are named so not because they contain types that implement bytes and strings, but because they provide tools for manipulating bytes and strings.

An example of the exception might be a hypothetical package files which provides tools for working with files in various ways. But even then, such a package might be questioned since its contents may belong in other packages. And the recent addition of the io/fs package proves it. The fs name stands for filesystem, and encapsulates a variety of tools for dealing with it.

So remember, a good package name is:

  • singular
  • short
  • concise
  • precise.

Avoid Common, Tools or Util

Avoid common and vague names such as common, tools, util or similar.

This may be one of the most often repeated mantras in Go, yet packages like this are still regularly found in codebases.

Even the previously considered "okay" suffix util became obsolete with the release of Go 1.16, when the Go Team deprecated the io/ioutil package in favour of the os and io packages. So there is no excuse left even for your dbutil package, should you happen to have one, because why would it exist, if:

  • it can be called db
  • if db already exists, what prevents you from adding the code to it?

Refrain From Using go and go- Prefixes

Refrain from using go and go- as prefixes or suffixes for a library, project, directory, or package name.

Does it sound controversial? Maybe. But why do we need to prefix anything with the name of the technology it's based on? This simply does not make sense.

The Go package and module system provides everything needed to guarantee uniqueness of names. It's also impossible to accidentally pull in a library named somelib written in, say, Java. So why would you need to name a Go implementation of anything go-somelib or gosomelib? Find a better and unique name instead.

There are several issues with using go as a prefix/suffix:

  • it adds zero useful information
  • if it contains a hyphen, then the library and package it contains have different names, which is confusing. The package inside a go-somelib library is somelib, not gosomelib or go-somelib.

If there is a strong need, desire or other reason to have go in the name, consider using it in the account/organisation name. The three most popular services for code collaboration and sharing (GitHub, GitLab and Bitbucket), as well as self-hosted solutions (like Gitea and Gogs), use the account as a namespace for projects. Just compare:

Do                           Don't
--                           -----
github.com/golib/orion       github.com/someone/goorion
gitlab.com/go-lib/crux       gitlab.com/someone/go-crux
bitbucket.org/someone/lyra   bitbucket.org/gosomeone/golyra

It's easy to see that the examples in the first column are shorter, cleaner, and nicer. This is what we want from names.

One exception needs to be made. It's easy to imagine a situation where an organisation has versions of the same library implemented in a number of distinct technologies. Go might not be the first of them, and the most straightforward name might have already been taken. In such a case, there are just a few options available:

  • get creative, and use a different name. Many companies do so, and experience has shown this to be one of the best options. This guarantees uniqueness.
  • or, add a suffix to the repository name, but not to the module name. For example, Twitch has twirp-ruby in addition to Go's twirp.

Good                     Acceptable
----                     ----------
github.com/myorg/thau    github.com/myorg/thau
github.com/myorg/theta   github.com/myorg/thau-go
github.com/myorg/omega   github.com/myorg/omega-cpp

With this exception, it's still possible to:

  • have a not so different name of a package/module/library
  • sort repositories by the name
  • unify naming across an organisation.

Having figured out how to name a package, let's see how to better organise it.

Structure

The next most important aspect of the package design is choosing and maintaining a healthy structure.

Similarly to the project layout, the package layout either improves the overall quality of the package, application and project, or unnecessarily complicates things, causes a variety of issues, and just makes it harder to develop and support the code.

It's important to structure packages well, as fixing it up at a later stage requires significantly more effort. Writing software is the easiest step in its lifecycle – the better you do from the beginning, the lower the cost of its maintenance.

This section discusses the following guidelines:

  • use flat structure by default
  • there are no sub-packages in Go.

Flat by Default

Most of the time, a package should have a flat structure, i.e. no structure at all.

That's it. As said earlier, do not introduce any structure for the sake of an artificial hierarchy. Several flat packages are better than some hierarchy when there is no actual need for that.

For example, when working on the model package, it does not make sense to have packages within it for each particular model.

Do      Don't
--      -----
model   model/user, model/track, model/playlist

Here is what objects would look like for the first case:

  • model.User
  • model.Track
  • model.Playlist
  • etc.

Now compare it to an alternative, where everything is in a separate package. We would then get something more or less awful, for example:

  • model/user.User
  • model/track.Track
  • model/playlist.Playlist
  • and so forth.

However, there is a good reason for placing a package in a directory that already has a package in it: the context provided by a good name.

Not a Sub-Package, But a Package in the Same Directory

There is no such a thing as a sub-package in Go. Instead, when it makes sense and is appropriate, a package can be created in a directory that already has a package in it.

If there is no hierarchical relation between packages, then what would be a reason for such a placement? As said earlier, a good name sets the proper namespace and context. Therefore, it would be great to benefit from them for other code that fits nicely alongside the package inside the directory.

One of the best examples is the net package. Have a look at which packages reside in the net directory of the standard library, alongside the net package itself:

  • http
  • mail
  • rpc
  • smtp
  • textproto
  • url.

While many developers are accustomed to thinking of them as net/http, net/url and so forth, the reality is that they're http, url and so on, respectively. What creates the feeling that the http package is a sub-package of net is its import path. But that's only an import path. The package is http. And because the context and namespace set by net are appropriate and suitable for packages that deal with various aspects of networking, it makes perfect sense to place them in that directory.

A real-life example of such placement would be a handler/middleware package. While middleware is a package in its own right, placing it under the handler directory is reasonable, because middleware has to do with handling requests, and with handlers.

Use this sparingly and cautiously because of the risk of introducing a dependency cycle. The best way to avoid it is to have no shared code between packages at different levels. Design the relations such that dependencies always point in one and the same direction. For example, the package located in the parent directory can be imported by packages located deeper, but not the other way around. In some situations, it's reasonable to put all packages at the same level, by shifting what would be the top-level package into a child directory, so each package is in a sibling directory. Note that this is rather rare.
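
A hypothetical layout for the handler/middleware example, with the dependency pointing in a single direction:

handler/
    handler.go          <- package handler
    middleware/
        middleware.go   <- package middleware; may import handler, never the reverse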

Files in a Package

The Go package system is simple and powerful. It helps you prepare the foundation for the project; it is very convenient, and allows you to have short and concise names. There is no need to duplicate the information in the name of a package that is already being communicated by the path to the package and its location.

Up until now we've discussed mostly the ephemeral structure of a package - directories and naming. Code lives in the files a package is made of. How the files are named, how many there are, and how the code is split across them are important and defining aspects of package design.

Since Go offers its own approach to packages, it's important to survey guidelines on how to put files in a package together. This section provides some advice on how to organise a package from the file perspective. And then the next unit further expands the topic.

Newcomers from other languages often don't know, nor do they learn, the Go conventions, and use approaches that are unnatural and foreign to Go. Even those who see themselves as seasoned Go developers continue doing this unconsciously. It can often be found that a Go project looks much like a PHP, Java, or C# one because the developer did not learn much about Go traditions and conventions. This leads to the perpetuation of the problems that Go was created to solve.

Very few people learn the Go way of doing things, and even fewer do actually apply it in practice. You don't even need to open up a single file in a project to notice influence from other languages, and lack of good design. Countless files and poor naming are just a few examples of what could be improved.

A good design benefits from respecting and following Go-specific conventions. Some of the guidelines are:

  • begin with a single file named after the package
  • keep type's declaration and all its methods in the same file
  • avoid creating file per object (type, function, etc)
  • prefer fewer files larger in size instead of numerous small files
  • when appropriate, move helpers into a separate file
  • when appropriate, use a separate file for documentation.

Let us look at each of the suggestions closer.

Begin With a Single File Named as the Package

Begin with a single file that is named after the package.

In fact, this is two guidelines combined together:

  • start with only one file
  • name that file after the package.

A default tendency observed in many engineers is creating a taxonomy of files right at the beginning. Those files are named arbitrarily, mostly poorly, due to behavioural patterns developed while working with other languages. This is unnecessary, and leads to a mess, not to mention it goes against the good traditions Go has established.

Instead, create just one file. Give it the same name as the package name. And then start adding some good code that you have thought about before writing.

As the package evolves, and new code is being added, it's reasonable to think about when it's the right time to add another file to the package. This leads to a deeper question – when to split code into multiple files?

Organise Files by Topic, Responsibility and Behaviour

Organise files by the topic, responsibility and behaviour implemented in them. These are the criteria for splitting a file into two.

A package is started with a single file, and everything resides in that file. At some point, a new piece of functionality is about to be added, but that piece, while relating to the main theme of the package, is slightly different from the rest of the contents of the file. What to do?

The answer depends on what is going to be added. If it's a type that one of the existing types will operate on, then it's better to keep them in the same file. It's also good to keep as many similar objects in the same file as makes sense. If you have multiple helper functions that are used as an http.HandlerFunc, then it makes complete sense to keep them in the same file. Don't create a file per function only because it's a new function.

Sometimes, though, it's better to split functionality into two files. For instance, you're working on a library, and there are two core parts to it - a server and a client. It's beneficial to keep them separate, as their responsibilities and roles are quite different. But everything that has to do with either the server or the client should live in the same file as the respective type.

Name Other Files Meaningfully

When adding files later, name them after the most important type declared in the file, or after the primary responsibility of the contents of the file.

In Go, we name files after the most important thing in them. If it's hard to identify what's most important, then name the file after the functionality it provides, or the responsibility it carries.

In other languages, a file is usually named after the class in it. While it's alright in Go, and is used often, this is not the only contributor to the name of a file.

Make sure to use a single noun. Refrain from using plural nouns or other types of words.

Avoid Creating File per Object

Avoid creating a file per object (type, function, etc).

Please don't. Go is not Java, PHP, C# or any other language where creating a file per class is either a must, or a poor tradition. Go is also not C/C++ where you used to have a header and implementation file (we will mention this further).

Instead, follow the guideline about the criteria for organising files above. Roles, responsibilities, behaviours – these are the right factors indicating whether the contents needs to be split into files or not.

A file per type leads to the situations that have been considered in the previous sections and in the unit about project layout. Too many files complicate navigation, and increase cognitive load while working with a package, and so forth.

Prefer Fewer Files of Larger Size

Prefer fewer, larger files over countless small files.

This is a natural consequence of the preceding advice. Fewer files are easier to navigate through. They don't produce clutter in the browser, the terminal window, or the IDE. They're easier to organise and keep tidy.

Keep Methods of a Type in The Same File

Keep the type's declaration and all its methods in the same file. Period.

One of the worst things one could do is split the declaration of a type across multiple files. This is bad because of the increased effort required to maintain the mental map of a package, the significantly reduced readability, and the complications in understanding what the type does, its dependencies and responsibilities.

The two biggest problems with methods of a type scattered across multiple files are:

  • understanding what the type is responsible for. You simply never know until all files in the package are checked.
  • making changes to the type, especially changing an existing field or method. Same problem as above, exacerbated by the fact that you have to check all the files in advance, as the change should be planned and thought through before you implement it. And, if implemented without the full picture, you may discover that the change does not actually make sense because of the way the type is used in another place.

So always keep everything that is part of the type declaration in the same file. Don't split the type declaration into several files. When a type is fully declared in a single place, it's easy to read and navigate through the code. The responsibilities and dependencies are easy to notice. It is also just easier to work with, literally.

As soon as you find such a type, the best thing to do is immediately gather the type declaration into a single file. It will save a lot of resources for present and not-so-far-future you, and everyone else working on the project.

There is one exception to this guideline, and we'll get back to it later. For now, it's sufficient to say that the only reason for having some parts of a declaration in a separate file is implementing platform-dependent code. Otherwise, unless the exception is in effect, please don't.
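
As a small, hypothetical illustration, here is a user.go that keeps the declaration and all the methods of a type together:

// user.go

package model

import "time"

// User represents an account in the system.
type User struct {
	ID        int64
	Name      string
	CreatedAt time.Time
}

// DisplayName returns the name to show in the UI.
func (u User) DisplayName() string {
	if u.Name == "" {
		return "anonymous"
	}
	return u.Name
}

// IsNew reports whether the user was created within the last day.
func (u User) IsNew() bool {
	return time.Since(u.CreatedAt) < 24*time.Hour
}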

Avoid Ambiguous File Names

Avoid ambiguous names for files, such as common, types, etc.

The motivation is similar to that for the name of a package. Such names carry zero useful information about the contents. Our goal is to reduce the energy required to understand what's happening in a package. Names like common complicate the task because they don't give any hint about what's inside a file named that way.

The name types does not make sense as per the guidelines given above which recommend naming a file after the most important thing in it, and grouping contents by roles. It's impossible to have a file named types in a well-designed package. If you have one, you know what to do.

Use a Separate File for Documentation When Appropriate

Use a separate file for documentation in a package which provides detailed explanations.

As you know, the top-level comment immediately preceding the package clause is the package comment. Its first sentence serves as the short summary shown in package listings, and it can optionally be followed by a larger body that serves as package-wide documentation.

Normally, the package comment is written in one of the files of a package. When you provide your users with detailed documentation, it's good to keep it separate from the rest of the code, but still within the package, so the Go documentation tooling (godoc) is able to parse it.

There is a solution to this. Simply put the documentation along with the package comment in a file named doc.go. Normally and conventionally, the only content of this file is the package and documentation comments. Avoid adding anything else here.

/*
Package mypackage provides a convenient way to do X. <- the package comment

It achieves high performance by doing X using Y and Z.

Here is an example of using it.

Here is another one.

And here are some explanations about why it is valuable, or any other useful information.
*/
package mypackage

Remember that this only makes sense for packages that have detailed documentation written in a specific way, i.e. the way godoc is able to understand. Otherwise, prefer more traditional places for documentation, such as README files, documentation sites, etc.

Before we get to discussing how to organise a file internally in light of package design, there is one topic worth mentioning. The package layout is especially important when working on, or with, cross-platform software.

Cross-Platform Code

The guidelines for designing packages would have been incomplete without considering writing software in which implementation details depend on the platform the software was created for. This brings us to the topic of cross-platform development in Go, from the package design perspective.

Writing, maintaining and deploying cross-platform code has never been as simple as it is with Go. That's not to say it's easy, but it has become much easier and simpler than it was before Go came along. The language provides built-in support for cross-compiling for multiple platforms, so we don't even need to run the target system to build a binary for it. Tooling in this area is excellent compared to other languages.

However, working on a cross-platform project is different from a single-platform one. All design and architectural decisions now are multiplied by the number of platforms you support. The need to properly test code and prove its correctness sets higher requirements for the design of services in the project.

Depending on the practices used to develop such a project, the cross-platform experience might be as smooth as single-platform development, or it might turn into a nightmare. It is important to understand some basic principles of writing cross-platform code. The principles tell you which tools to use and when. Careful planning, and a cautious and mindful approach to implementation, are also necessary to avoid many pitfalls.

This section aims to aid you with advice on working with cross-platform code from the package organisation perspective. We will consider the following aspects of developing packages for multiple platforms:

  • Basic Principles of Writing Cross-Platform Code
  • Cross-Platform Options in Go
  • Package and File Organisation.

Basic Principles of Writing Cross-Platform Code

Note: The content below is relevant to some other languages, though here we're mainly talking about Go. A reasonable amount of simplifications is in place. Pay attention, and think about what you read.

In one sentence, writing cross-platform code can be almost indistinguishable from writing for a single platform, if you've done the homework, i.e. have learned and followed good design and architecture principles and rules. The principles we're referring to are SOLID, some of the design patterns, and common sense. Otherwise, if the good practices are neglected, the chances of success are dramatically lower, or even zero.

As a general rule of thumb, all the business logic must be platform-agnostic. It is the implementation details that vary from platform to platform, while the core algorithms remain common to all of them. This is important because even two platforms significantly increase the complexity, simply because the number of potential outcomes is now squared. Any design decision should be aligned with the main goal - reducing entropy in the software.

The starting point, and an obvious way to reduce entropy, is to keep the amount of cross-platform code to the possible minimum. The less code differs between platforms, the easier it is to guarantee its correctness, and to maintain and test it.

With the two principles applied, the likelihood of a project ending up in a better shape is higher. It's still possible, though, that the project won't be able to achieve its goals due to the specifics of one of the supported platforms. If that is the case, there are essentially two potential scenarios. One is trying to keep all features on par on every platform. While this is the ideal and desired situation, it often leads to a waste of resources since not everything is equally possible across the available platforms.

The other possible scenario is a reasonable and pragmatic trade-off. What helps in staying efficient is knowing when to mock missing platform features or skip them altogether. Not every system feature has an analogue in another system.

For example, there is no direct alternative to Windows's Registry in Linux or FreeBSD. Of course, in Unix, Linux-based systems and their derivatives we have a kernel and the sysctl API or similar, but that's not exactly the same thing as the Registry (one may argue that systemd has been working hard to become a Linux counterpart of the Windows Registry). Trying to mimic its behaviour would be tedious and unjustifiably complex. In such a case, two alternatives are available – one is to abstract the business logic so that it does not depend on a particular system feature, and the other is to skip the functionality on the system that does not support it.
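To make the first alternative more concrete, here is a minimal sketch of abstracting the business logic away from a particular system facility; the interface and the names are assumptions for illustration, not part of any real API:

package settings

// Store abstracts persistent application configuration so that the business
// logic does not care whether values live in the Windows Registry, a plist
// on macOS, or a plain file on Linux.
type Store interface {
	Get(key string) (string, error)
	Set(key, value string) error
}

// isFeatureEnabled depends only on the abstraction; each platform can supply
// its own Store implementation behind a build constraint.
func isFeatureEnabled(s Store) bool {
	v, err := s.Get("feature.enabled")
	return err == nil && v == "true"
}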

To get a sense of how implementation details can vary across many platforms, have a look at the list of differences in the net package in the Go standard library.

Cross-Platform Options in Go

One of the key strengths of the Go programming language is its powerful tooling focused on developer's productivity. Fast compilation and ease of dependency management are vital for the ecosystem. But tools for writing and building multi-platform code are what make Go truly special.

Writing and maintaining cross-platform software (leaving distribution aside) is a two-stage process:

  • the code needs to be written in a way that allows changing its behaviour based on the platform it's running on
  • the code then needs to be compiled for each supported platform.

We talk about the second stage first, and then dig deeper into the first.

Terminology

Before we get too far in the discussion, it's good to agree on the terminology used in the next sections.

In this work, as well as in the community, we use the following terms with the corresponding meanings:

  • Go operating system, or simply operating system, as per the environment variable GOOS – one of the target operating systems Go supports. Expressed as one of the predefined values.
  • Go architecture, or simply architecture, as per the environment variable GOARCH – one of the target system architectures Go supports. Similarly to the operating system, it's one of the predefined values.
  • Go platform, or just platform, a combination of the two, an operating system and architecture, in this order. This term is not used in code or as a setting, but is used in discussions and documentation. When you see a use of the term "platform" in a conversation that is related to cross-platform software in Go, it means a combination of an operating system and architecture. For instance, these sentences are written on the darwin/amd64 platform.

It's worth remembering that each of GOOS and GOARCH, and therefore the platform, can be, and usually is, defined implicitly by inferring the value from the environment, if not set explicitly. When you're running go build on your Intel-based Mac without setting the environment variables, GOOS=darwin and GOARCH=amd64 are supplied to the compiler. In this case, the target platform is darwin/amd64. On a Raspberry Pi running Linux, it would be linux/arm with an additional GOARM variable. A similar effect can be achieved on any other platform by setting the variables explicitly: GOOS=linux GOARCH=arm GOARM=6.

While we're on this topic, there's no harm in repeating that when only one of the variables is set to a value different from the current system's, the other is inferred from the current system. For the rest of the book, if not mentioned explicitly, the implied value of GOARCH is amd64.
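If in doubt, the toolchain can report the implicit values for the machine you're on. The output below is an assumption for an Intel-based Mac:

$ go env GOOS GOARCH
darwin
amd64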

Okay, now we can move forward. The next section is a quick refresher of the compilation process, and then we dive into creating packages that support several platforms.

Compiling For Multiple Platforms

Even if a project has no code that depends on a platform, the binary format has to match the requirements and expectations of a target system. In Go, there are two options for compiling code for more than one platform:

  • simply building on each of the target platforms
  • cross-compiling for each of the supported platforms.

With the first option, the code for each of the supported platforms is built on that platform. In its simplest form, the build process looks no different from what we do every day – go build is go build, after all. What differs is the artefact which is specific to the platform it has been built on.

The second option offers a slightly different approach. The result is controlled by a pair of well-known environment variables that tell the compiler what it should produce. This process is also known as cross-compilation:

GOOS=freebsd GOARCH=amd64 go build -o myapp-freebsd-amd64
GOOS=linux GOARCH=arm go build -o myapp-linux-arm
GOOS=windows GOARCH=386 go build -o myapp-windows-386.exe
GOOS=darwin GOARCH=amd64 go build -o myapp-darwin-amd64

Which of the two options to use, and when? It depends on your environment, priorities, available resources and limitations. The native compilation option is the easiest from the tooling perspective. You simply run the same set of tools to test and build code, and get the results. A major downside is the need to run and maintain instances of all platforms, which might be expensive as it requires additional (potentially significant) resources, i.e. costs you time, attention and money.

On the other hand, the second option is less demanding on the build infrastructure. All code is built on the developer's machine or on a single CI/CD instance. Platform-agnostic code is tested on the most available platform, but running tests for platform-specific code still requires native instances.

So choose wisely. The guideline is to use cross-compilation for producing release artefacts, and to use a specialised, automated test environment to run tests against platform-dependent code.

It's worth noting that, in its simplest form, a cross-platform program has no differences in the implementation between supported platforms (not counting potential differences in the standard library). In such a case, all that is needed is compiling a binary for each platform. That's exactly what we've just covered.

Compiling code for multiple systems is a relatively easy step. But to build something, we have to write it first. The question is how to do it, and do it in the right way.

Developing for Multiple Platforms

When working on functionality which involves special features of an operating system, a specialised implementation is required for the varying part. Then, different implementations need to be incorporated with the platform-independent business logic of the project.

Go offers build constraints to enable conditional compilation. They allow writing specialised code for a particular platform, and guarantee that the code is included and available only when compiling for that target system. For all other platforms, which are not covered by the constraints, such code simply does not exist.

Build constraints can be expressed in two ways:

  • using the file name
  • using build tags.

When compiling, the two environment variables GOOS and GOARCH and/or the build flag -tags are used to control the process. Specifically, these settings tell the compiler when and what to include in the resulting binary.

As you will learn or recall shortly, while the two ways work in the same way under the hood (thanks to the go/build package), in everyday development each of them has its own pros and cons. They can be used independently in simple scenarios. More often, though, they're used together, as this allows more flexible behaviour; sometimes it's simply the only way to express the constraints. Let's briefly recap what each of them is, and then learn how to employ them to make the design of a package better.

NOTICE: The content below covers topics such as using build tags and file names for conditional compilation. The text was written when Go 1.16 was released, and everything below applies to Go versions up to and including 1.16. Starting from version 1.17, things will change: a proposal has been accepted, and it is going to be quite a large language change, on par with the introduction of Modules. I will keep this part of the unit updated as the new behaviour becomes available and it's clear what happens to the existing approach. For now, we focus on the existing way of managing compilation.

The File Name

We start with the method based on the file name suffix. If a source file has a suffix that is a valid operating system, architecture or a combination of the two, then the file is compiled only for that platform.

Here are some examples, for a package mypkg:

  • mypkg_linux.go is included in the binaries for Linux and any architecture (of course, for those that are supported by Go)
  • mypkg_amd64.go is included in binaries for any operating system running on the amd64 architecture (also known as x86_64, Intel 64, but these are not valid names in Go)
  • mypkg_windows_386.go is included in the binary only when targeted at x86-based 32-bit Windows
  • mypkg_posix.go is included on any platform (remember, platform = os + arch, hence on any os and any arch), as posix is neither a valid operating system nor a valid architecture identifier for the Go toolchain.

When to include a file is controlled by the environment at the build time:

  • when running go build the values are derived from the system the build is running on, i.e. your machine or the machine the command is executed on
  • or when GOOS and/or GOARCH are explicitly set.

This also works with testing. To write a platform-specific test you simply add the familiar test suffix to the end of the corresponding file name or to any file that will include such code:

  • mypkg_linux_test.go is included only when running go test on Linux, or GOOS=linux go test on any system
  • mypkg_amd64_test.go is included only when running go test on an x86-based 64-bit system, or GOARCH=amd64 go test on any other architecture
  • mypkg_windows_386_test.go is included only when running go test on an x86-based 32-bit Windows, or GOOS=windows GOARCH=386 go test on any system
  • and so on.

For all valid GOOS and GOARCH values and combinations, please visit this page.
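The same list can also be printed locally by the toolchain; the output below is shortened to a few entries:

$ go tool dist list
darwin/amd64
linux/386
linux/amd64
linux/arm
linux/arm64
windows/386
windows/amd64
...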

As you can see, the approach offers an easy way of separating code for a particular system. It's enough in simple cases, or when the code is different between all the systems your project works on. But what if more fine-grained control is needed? Or what if the code is the same for Windows and Linux, but different for macOS?

Another use case is including or excluding code at compile time based not only on the platform, or not on the platform at all, but on a custom criterion. For example, you may want to include some features in the free version of your product, and other features only in the binary shipped to paying users. You may even have multiple paid tiers with three sets of features, where each binary should include all features from the preceding tiers plus something else. Or you may want to offer different implementations of the same feature based on the tier.

The answer to these and many other questions about conditional compilation is build tags.

Build Tags

Build constraints in Go are expressed in the form of build tags. As you may already know, a build tag is a line comment that begins with // +build and lists the conditions under which a file should be included when compiling and/or testing the project.

Basics

The following rules are in effect when using build constraints:

  • multiple tags may be listed on the same line
  • multiple lines with build tags may appear, one next to another
  • tags may appear in any kind of source file (not limited to Go)
  • constraints must appear close to the top of the file
  • they may be preceded only by blank lines and other comments
  • a series of build tag lines must be followed by an empty line (to distinguish them from documentation comments)
  • all this means that in Go files, build constraints must appear only before the package clause.

There are also certain rules for writing and grouping build tags:

  • when listed on the same line, space-separated tags are interpreted as OR
  • if an option contains a comma, it's evaluated as the AND of its comma-separated terms
  • allowed symbols for a term are letters, digits, underscores and dots
  • a term can be negated when preceded with an exclamation mark !
  • when constraints are expressed as multiple lines, the overall result is the AND of individual lines
  • a special tag of // +build ignore can be used to exclude the file from consideration.

Here are some examples:

  • a simple build constraint
// +build linux,arm windows,386

It evaluates to the following: (linux AND arm) OR (windows AND 386).

  • multiple build tags
// +build linux darwin
// +build amd64

It evaluates to the following: (linux OR darwin) AND (amd64).

  • with negation
// +build linux darwin,!cgo

It evaluates to (linux) OR (darwin AND NOT cgo).
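To make the placement rules concrete, here is a minimal sketch of the top of a file that satisfies them; the package name is hypothetical:

// This file is compiled only for 64-bit Linux or macOS.
// Build tags may be preceded only by blank lines and other comments,
// and must be followed by a blank line before the package clause.

// +build linux darwin
// +build amd64

package fancy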

More Control

The use of build tags is not limited to platforms only. Other conditions can be expressed in the very same way, for even more advanced control over the compilation. In the example below, we want to offer three tiers in the service – free, silver and gold. Each subsequent tier should include the features from the preceding tier. This is how it can be achieved at the compilation level, which may also help protect your app against cracking and/or bypassing of licensing terms:

  • define a base line, with no tags, in base.go
  • define the first paid tier, or silver level, in silver.go
// +build silver
  • define the second paid tier, gold, in gold.go
// +build silver
// +build gold

Notice that we have to list both tags, silver and gold, for the second tier, as we want it to include the features from the preceding tier. That's because of the way the boolean logic is expressed using the constraints.

Passing Tags

Now, how do we tell the compiler about constraints?

Build constraints are inferred from three places, two of which you already know:

  • some environment variables, such as GOOS, GOARCH, CGO_ENABLED, etc
  • the suffix of the name of a Go source file
  • when explicitly passed as a special build flag with comma-separated values when running build, test and other commands, as in -tags tag1,tag2,tag3.

Note that the file name's suffix method is mentioned here. That's because, as said earlier, under the hood it works as a build constraint. When a file has a suffix that matches a valid supported operating system, architecture, or a combination of the two, the file is considered to have an implicit build constraint set to the values in the suffix.
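Returning to the tiers example above, the custom tags could be passed as follows; the binary names and paths are assumptions:

# Free tier: no tags, only base.go is compiled in.
go build -o bin/myapp-free ./cmd/myapp

# Silver tier: silver.go is included in addition to base.go.
go build -tags silver -o bin/myapp-silver ./cmd/myapp

# Gold tier: both tags are set, so silver.go and gold.go are included.
go build -tags silver,gold -o bin/myapp-gold ./cmd/myapp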

Using Only Build Tags

Having reminded ourselves what a build tag is, let's continue exploring options for cross-platform development.

As mentioned earlier, conditions for platform-dependent compilation can be expressed using file name suffixes. In a basic scenario, when the implementation details are different for, say, windows and linux, it's easy to express with mypkg_windows.go and mypkg_linux.go, in addition to mypkg.go.

However, what should we do when two different platforms share exactly the same implementation? One way is simply to have two separate files with the same contents. For example, app_darwin.go and app_freebsd.go. But this kind of duplication is not something we normally want, unless we have to.

A much better approach is to use build tags for expressing the conditions for the compilation process. It helps in keeping things cleaner and removes the need to duplicate code. Add a build tag to the file, and give the file a meaningful name that does not conflict with the supported operating systems, say app_unix.go:

// +build darwin freebsd

Now, whenever the code is built on one of the mentioned platforms, or cross-compiled with the help of the GOOS environment variable, the file is included in the binaries for either of those operating systems.

Using Suffixes and Tags

Finally, let's consider another possible situation where a feature's implementation is different for some platforms, but is the same for others. The suffix-based method can be used in combination with build tags.

To illustrate this situation, let's assume the low-level implementation is different for Windows on the one hand, and for Linux and macOS on the other, while being the same for Linux and macOS. Then a reasonable approach would be to put the Windows-specific code into a myapp_windows.go file, and let the suffix control the inclusion of this code. For Linux and macOS, put the implementation into myapp_posix.go and add the following build constraint at the top of the file:

// +build linux darwin

And that's it. The environment variables and/or build flags passed to go build or go test are now in control of what's included during the process.

Notes

As you can see, Go offers convenient tools for writing, building and shipping cross-platform software. Those who have written programs for more than one platform in the past agree that Go has made huge progress in making the process easier. Those with no cross-platform experience yet would probably not be surprised if their first experience happened to be with Go, but they would definitely notice the difference when facing cross-platform development in other languages.

These tools are great, but on their own they're not enough to produce great software and a good developer experience in the short and long term. To achieve a good level of efficiency we need something else, and that's what the next section is about – practical advice on cross-platform packages.

Package and File Organisation

By now, we have learned a few foundational principles and tools that guide and help us when working on a piece of cross-platform software. The next step is to look at how to combine and apply them to make software maintainable and the maintenance process as smooth as possible. The ultimate goal is to make code simple and easy to work with; we also strive to simplify the support of such software.

The section introduces practical advice on how to design packages and files better in a cross-platform project. This includes:

  • keeping platform-dependent code at a minimum
  • using simple branching in trivial cases
  • keeping platform-specific code out of main
  • using the file suffix by default
  • using build tags in mixed cases
  • striving for platform-independent tests
  • and providing cross-platform tests only when you must.

Let's talk about each of them in more detail. And we begin with repeating one of the basic principles.

Keep Platform-Dependent Code at Minimum

Keep the amount of platform-specific code at a possible minimum.

Though it sounds very simple, it's one of the hardest pieces of advice to follow, and the most often ignored. By its nature, it goes hand-in-hand with common sense in package design – keep the public API of a package as narrow as possible.

The best way to keep things simple is to have no platform-dependent code at all. But often that's not possible, and some code has to be made platform-dependent. In such a case, keep the amount of such code as small as possible.

Having learned the tools for conditional compilation, you know that it's possible to conditionally include code in the final binary depending on the constraints specified at build time. This leads to a question: what code should be pulled into files that are conditionally included? Or, in a more general form, what parts of the code are better candidates for platform-specific implementations?

Looking at the problem from a general software design perspective, our building blocks in code are:

  • types
  • methods
  • and helper functions.

Therefore, when working in a cross-platform context, these are the elements that can have varying implementations between the platforms supported by a project. However, the guideline says to keep such code to a minimum. This helps in deriving guiding principles for writing cross-platform code.

In general, when working with varying implementations for the same functionality for several platforms, bear in mind the following suggestions:

  • in the worst case, types are implemented differently for each platform, but obey the same interface/contract
  • a slightly better option is to let methods vary between platforms
  • the best option is to pull platform specifics into helper functions.

The insightful reader might have already noticed that these guidelines do support and reflect two of the basic principles introduced at the beginning of this unit, namely:

  • the business logic must be platform-agnostic
  • and keep the amount of cross-platform code at a possible minimum.

The first of the two is supported by the fact that, by default, you aim to pull platform specifics into helpers. This guarantees that methods and types do not depend on a platform, and thus less business logic leaks into platform-specific code.

The second is supported by the fact that, by aiming to keep details from leaking to the type and method levels, a smaller amount of platform-dependent code is automatically guaranteed.

One warning should be made though. The guideline does not prevent you from introducing platform-specific helper types when doing so improves the logic and design of the software. What should be avoided, however, is a platform-specific public API of a package. If a public method or type depends on a platform, decouple the public part and use private, per-platform implementations that are not exposed to the user. This drastically improves both the user and the developer experience. Being free to change the implementation without the risk of breaking users' applications is vital for productive and efficient maintenance of any software, even more so in the cross-platform world.
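As a brief, hypothetical illustration of such decoupling (the package, file and identifier names are assumptions), the public API stays identical on every platform, while a small private helper varies:

  • config.go, shared by every platform:
package config

// Dir returns the directory where the application keeps its settings.
// The exported function never changes between platforms; only the tiny
// private helper it calls does.
func Dir() string {
	return configDir()
}
  • config_windows.go:
package config

import "os"

func configDir() string {
	return os.Getenv("APPDATA") + `\MyApp`
}
  • config_posix.go, included via a build tag:
// +build linux darwin

package config

import "os"

func configDir() string {
	return os.Getenv("HOME") + "/.config/myapp"
}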

Use Simple Branching in Trivial Cases

Use simple branching in trivial cases.

The simplest form of cross-platform code is setting a variable within an if or switch statement depending on the values of runtime.GOOS, runtime.GOARCH or runtime.Version().

Such code does not require any build constraints (reminder: build constraints can be expressed as a suffix in the file name, or as a build tag). The result depends on the values of the corresponding constants or the value returned by the Version() function, which in turn depend on the system where the program is running and/or the version of Go used during the build process.

However, it's tempting to abuse this method. Beware that it is only helpful in very basic cases, like choosing the appropriate file name extension or something similar. Never implement substantially different behaviour by branching on the aforementioned values – it leads to a fragile architecture and complicates testing, debugging and maintenance overall.
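For instance, a minimal sketch of such a trivial case might look like this; the package and function names are hypothetical:

package artefact

import "runtime"

// executableName appends the ".exe" extension on Windows and leaves the name
// untouched elsewhere. No build constraints are involved; the branch exists in
// every binary and is evaluated at runtime.
func executableName(name string) string {
	if runtime.GOOS == "windows" {
		return name + ".exe"
	}
	return name
}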

This method helps to choose what path the execution will follow at runtime, but it does not imply conditional compilation. If implementations vary between platforms, and are not available on them at the same time, then builds will be broken.

Essentially, one of the most reasonable uses of this option is to determine and/or report what actual platform the code is running on.

Don't confuse runtime conditions with compile-time conditions.

No Platform-Specific Code in Main

Keep the main function, as well as the main package, free from platform-dependent code.

This guideline is a natural and logical consequence of the fact that the main package must contain the smallest possible amount of code. When initialisation steps are encapsulated in one or more separate packages, it's possible to have platform-specific initialisation wherever it is required. Other advantages are improved testability and separation of concerns.

Technically, the rules that govern conditional compilation apply to the main package and its files just as to any other package. However, if your main contains platform-specific code, it suggests a problem with the design and architecture – implementation details have leaked to the highest level of the application hierarchy. The main package must be as free from such details as possible.
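A minimal sketch of what such a main might look like; the app package, its import path and its functions are hypothetical:

package main

import (
	"log"

	"example.com/myapp/internal/app"
)

func main() {
	// Any platform-specific initialisation is hidden inside the app package,
	// behind build constraints; main looks exactly the same on every platform.
	a, err := app.New()
	if err != nil {
		log.Fatal(err)
	}

	if err := a.Run(); err != nil {
		log.Fatal(err)
	}
}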

Use File Suffix by Default

Use the file suffix to manage cross-platform code by default, i.e. when the conditions for compilation are simple.

What this suggestion means is that in simple cases, where the platforms you need to support are discrete and don't share code, the simplest possible way is to use file name suffixes for the files containing platform-specific code, be it different operating systems, architectures, or a combination of the two.

For instance, suppose you have a CLI application that must support Linux and Windows, the architecture does not matter (as in amd64 or 386), and something requires a different implementation for each of the platforms. In this case, we have the simplest condition possible – two different operating systems. The best way to express it is to use suffixes:

└── something
    ├── something.go
    ├── something_linux.go
    └── something_windows.go

Nothing else is required. Not even a single tag. Then, as discussed earlier, there are two ways to build it:

  • on the same machine, using cross compilation:
# For Linux.
GOOS=linux go build -o bin/myapp-linux ./cmd/myapp

# For Windows.
GOOS=windows go build -o bin/myapp-windows.exe ./cmd/myapp
  • on machines running the corresponding platforms, the environment variables are not needed:
# On Linux.
go build -o bin/myapp-linux ./cmd/myapp

# On Windows.
go build -o bin/myapp-windows.exe ./cmd/myapp
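For illustration, each of the platform-specific files might provide its own value of the same identifier; the constant and its values here are assumptions, consistent with the extended example later in this section:

  • something_linux.go:
package something

const (
	Feature2 = "this is a version supported by a number of unix-like systems"
)
  • something_windows.go:
package something

const (
	Feature2 = "this is a windows specific version"
)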

Use Build Tags in Mixed Cases

Use build tags and suffixes in mixed cases, i.e. when some platforms have their unique implementations, but some share code.

Let's extend the previous example with the following requirement: the app should also support macOS – darwin in Go's vocabulary – and the implementation of the platform-specific part is exactly the same as it is for Linux. This is a pretty common situation for macOS. Some parts work the same as or similarly to the linux platform, some are similar to those in freebsd, while some are unique to darwin itself.

In the case of our example, we have two options:

  • implement the new requirement separately for darwin, hence in its own file, something_darwin.go
  • or re-use the existing implementation which, for the sake of exercise, is assumed to be the same.

The former option has the rather weak advantage of not bothering with conditional compilation, relying instead on the automatic choice the compiler makes. A big disadvantage, though, is unjustified duplication, which clearly can and should be avoided. And that is exactly what the second option is about.

In order to re-use the existing implementation for darwin, and continue supporting linux, we need to make a few changes:

  • first off, stop relying on the file suffix, as there is no suffix that would include both linux and darwin, and exclude all other platforms at the same time. This means we have to rename the file
  • secondly, we need to specify a proper build tag to tell the compiler when it should include the file.

How do we choose a new name for the something_linux.go file? We can't simply append _darwin at the end, as the file would then be suffixed with darwin and included only for that platform. The suffix should not be a valid name from the list of operating systems, architectures and other identifiers recognised by Go. Here are a few examples:

  • something_darlin.go
  • something_lindar.go
  • something_linuxdarwin.go.

However, this particular case is very common. What's common between Linux, macOS, and many other Unix and Unix-like operating systems? Right, they are, though to varying degrees, compliant with POSIX. So a pretty common thing to see in the standard library and other cross-platform projects is the use of the word posix as a suffix when Linux and other operating systems from the Unix family share the same code. Going further, you will also often run into cases where something is common among all or many Unices, but different for Linux. In such a case, for example, code that is the same for FreeBSD, OpenBSD, NetBSD and macOS can be put into a file with a name ending in _unix.go. Keep this in mind, and don't reinvent the wheel.

Alright, the name problem is solved. The existing file needs to be renamed from something_linux.go to something_posix.go. The name no longer tells the compiler when to include or exclude the file, but a tag will. The code in the file should be built for Linux or macOS. That or is really important, as it's the condition. It does not make sense to express it as and, since that is an impossible combination (an operating system cannot be Linux and macOS at the same time). Here is what the resulting tag should look like:

// +build linux darwin

Also, keep in mind the rules about the placement of a build tag.

This is what our example was like before the changes:

  • directory listing:
.
├── bin
├── cmd
│   └── myapp
│       └── main.go
└── something
    ├── something.go
    ├── something_linux.go
    └── something_windows.go
  • and the content of the something_linux.go file:
package something

const (
	Feature2 = "this is a version supported by a number of unix-like systems"
)

This is what it looks like after the changes:

  • directory listing:
.
├── bin
├── cmd
│   └── myapp
│       └── main.go
└── something
    ├── something.go
    ├── something_posix.go
    └── something_windows.go
  • and the content of the something_posix.go file:
// +build linux darwin

package something

const (
	Feature2 = "this is a version supported by a number of unix-like systems"
)
  • building:
# On/For Mac.
$ GOOS=darwin go build -o bin/myapp-darwin ./cmd/myapp

# Running.
$ ./bin/myapp-darwin
common feature for all platforms
this is a version supported by a number of unix-like systems

An Advanced Example

Here is another question: what if we need to support a third feature which is unique to the Mac and unavailable on all other platforms? In this case, we should use the full power at our disposal, and also avoid exposing the difference to main. How can this be accomplished?

We can use our example to illustrate the approach. Let's assume there should be a feature that does something on darwin but does not exist/has no effect on other platforms. There are two approaches available:

  • mock the behaviour on the platforms that don't have the feature, so it can be called in a platform-independent way, i.e. make it no-op
  • or encapsulate the platform-dependent call such that it's not exposed to the platform-independent code, hence is transparent.

The latter is available and really helpful in real life when implementing unique parts nested within higher-level code that is called by platform-unaware code. For our example we'll follow the former way, but the latter is preferred whenever possible.

Also note that a simple if runtime.GOOS == "darwin" { // unique code } won't do the trick. Why? Because the code inside is compiled for all platforms, and the condition is only evaluated at runtime.

Let's introduce a third feature which should work only on the Mac, while all existing conditions must remain. In order to make this happen, we:

  • implement the new feature for darwin
  • mock it for windows and linux
  • the second step implies that we have to add a Linux-specific file. We'll use the suffix for that.

We also don't want to make a special case for the existing code shared between Linux and macOS. The fact that we're now going to support all three platforms does not mean we should have a separate platform-specific implementation just for the sake of separation. As one of the guiding principles says, the more code can be shared and/or abstracted, the better. And since the second feature introduced earlier remains the same for the two platforms, there is absolutely no point in splitting it.

Let's have a look at the project after the necessary changes have been made.

  • directory listing:
.
├── bin
├── cmd
│   └── myapp
│       └── main.go
└── something
    ├── something.go
    ├── something_darwin.go
    ├── something_linux.go
    ├── something_posix.go
    └── something_windows.go

Notice the two new files, something_darwin.go and something_linux.go.

  • the main.go file:
package main

import (
	"fmt"

	"../../something"
)

func main() {
	fmt.Println(something.Feature1)
	fmt.Println(something.Feature2)

	something.DoFeature3()
}

Notice how the code does not depend on platform here.

  • something.go
package something

const (
	Feature1 = "common feature for all platforms"
)

This piece of code is common among all supported platforms.

  • something_darwin.go
package something

import (
	"fmt"
)

func DoFeature3() {
	fmt.Println("this is a unique feature for the mac")
}

This code is compiled for and executed only when running on the Mac.

  • something_linux.go
package something

func DoFeature3() {}

Here we mock the missing functionality for Linux.

  • something_posix.go
// +build linux darwin

package something

const (
	Feature2 = "this is a version supported by a number of unix-like systems"
)

This code remains the same, as it works nicely for the two platforms, hence no change is needed. This file is included for both platforms.

  • something_windows.go
package something

const (
	Feature2 = "this is a windows specific version"
)

func DoFeature3() {}

Finally, for Windows, the new feature is mocked in this existing file.

Here are a few results of building and running:

# Build for darwin.
GOOS=darwin go build -o bin/myapp-darwin ./cmd/myapp

# Run on darwin.
$ ./bin/myapp-darwin
common feature for all platforms
this is a version supported by a number of unix-like systems
this is a unique feature for the mac


# Build for linux.
$ GOOS=linux go build -o bin/myapp-linux ./cmd/myapp

# Run on linux.
$ docker run -it --rm -v $(pwd)/bin:/root/bin/ alpine:3.13 ash
% /root/bin/myapp-linux
common feature for all platforms
this is a version supported by a number of unix-like systems

As you see, the feature works as expected on macOS, and is non-existent on Linux. Quod Erat Demonstrandum.

Strive for Platform-Independent Tests

Write platform-independent tests whenever possible.

This suggestion is a corollary to, and supports, the previous two guidelines. Maintenance of multi-platform code is not a trivial business, and has its cost. By making sure that the code is designed to be as independent as possible of the platform it runs on, and that only the really low-level parts differ, we have hopefully reduced that cost. But the code has to be tested, and since tests are also code, the same rules apply.

A platform-independent approach to tests works best when the result of an operation or behaviour must be the same across all platforms despite different low-level implementation details. In such a case, the low-level code does not necessarily need to be covered directly, as long as the tests are run on all target platforms.

To illustrate this, let's slightly update our example. We'll add a feature that has to work uniformly across all platforms, but will have three different implementations. Let's say we need to add a fourth feature which prints "this feature has different implementations but yields the same result" on all three supported operating systems (here, as in GOOS).

This is what the code looks like:

  • this is our platform-independent high-level code, something.go:
package something

import (
	"fmt"
)

const (
	Feature1 = "common feature for all platforms"
)

func RunFeature4() {
	fmt.Println(Feature4())
}

func Feature4() string {
	return doFeature4()
}

Notice the RunFeature4 function which, via Feature4, ends up calling the doFeature4 function. Where is doFeature4 declared? It's not declared in any file that is visible on all platforms simultaneously. Instead, each of the platforms has its own implementation. Here we go:

  • for darwin, something_darwin.go:
package something

import (
	"fmt"
)

var (
	feature4 = []byte(`this feature has different implementations but yields the same result`)
)

func DoFeature3() {
	fmt.Println("this is a unique feature for the mac")
}

func doFeature4() string {
	return string(feature4)
}

Here, the special implementation of doFeature4 uses the slice of bytes as the source.

  • for linux, something_linux.go:
package something

func DoFeature3() {}

func doFeature4() string {
	return "this feature has different implementations but yields the same result"
}

In the world of Linux, things are often more straightforward. We just return a string.

  • here is windows, something_windows.go:
package something

const (
	Feature2 = "this is a windows specific version"
)

var (
	feature4 = []rune(`this feature has different implementations but yields the same result`)
)

func DoFeature3() {}

func doFeature4() string {
	return string(feature4)
}

In Windows, our implementation is based on a slice of runes.

  • finally, this is how the feature is used, main.go:
package main

import (
	"fmt"

	"../../something"
)

func main() {
	fmt.Println(something.Feature1)
	fmt.Println(something.Feature2)

	something.DoFeature3()
	something.RunFeature4()
}
  • building and running:
# Darwin.
$ GOOS=darwin go build -o bin/myapp-darwin ./cmd/myapp
$ ./bin/myapp-darwin
common feature for all platforms
this is a version supported by a number of unix-like systems
this is a unique feature for the mac
this feature has different implementations but yields the same result

# Linux.
$ GOOS=linux go build -o bin/myapp-linux ./cmd/myapp
$ docker run -it --rm -v $(pwd)/bin:/root/bin/ alpine:3.13 ash
% /root/bin/myapp-linux
common feature for all platforms
this is a version supported by a number of unix-like systems
this feature has different implementations but yields the same result

The output suggests that the feature works as expected. But how can we prove it keeps working over time, especially given that software changes often, not to mention that platforms do not stand still either and are constantly updated? Of course, with tests. In this case, the outcome is, and must be, exactly the same for all platforms. So this outcome is what we want to make sure is always correct, not the implementation detail. If something happens to the implementation, a broken test will highlight it.

In cases like this, write platform-independent tests.

To illustrate this situation, here are two example tests:

  • the first checks the feature itself
  • the second tests how it works end-to-end.

Here is the something_test.go file, which is in effect on all platforms:

package something_test

import (
	"bufio"
	"os"
	"strings"
	"testing"

	"../something"
)

func TestFeature4(t *testing.T) {
	exp := "this feature has different implementations but yields the same result"
	act := something.Feature4()

	if act != exp {
		t.Errorf("expected: %s, got: %s\n", exp, act)
	}
}

func TestRunFeature4(t *testing.T) {
	stdout := os.Stdout
	r, w, err := os.Pipe()
	if err != nil {
		t.Errorf("failed to create reader and writer: %s\n", err)
		t.FailNow()
	}

	os.Stdout = w
	something.RunFeature4()

	b := bufio.NewReader(r)
	out, err := b.ReadString('\n')
	if err != nil {
		t.Errorf("failed to read from reader: %s\n", err)
		t.FailNow()
	}

	if err := w.Close(); err != nil {
		t.Errorf("failed to close writer: %s\n", err)
		t.FailNow()
	}

	os.Stdout = stdout

	exp := "this feature has different implementations but yields the same result"
	if act := strings.TrimSuffix(out, "\n"); act != exp {
		t.Errorf("expected: %q, got: %q\n", exp, act)
	}
}

Notice how neither of the two tests has any knowledge about the platform-specific code. In the first test, TestFeature4, we check that the feature is implemented correctly. The second test, TestRunFeature4, does a full check, including capturing the result from the standard output.

Here are test results:

  • on macOS:
$ go test ./something/... -v
=== RUN   TestFeature4
--- PASS: TestFeature4 (0.00s)
=== RUN   TestRunFeature4
--- PASS: TestRunFeature4 (0.00s)
PASS
ok  	_/Users/user/projects/xplatform/something	0.005s
  • and on Linux, using the golang:1.15 image:
$ docker run -it --rm -v $(pwd):/root/xplatform golang:1.15 bash
% cd /root/xplatform/
% go test ./something/... -v
=== RUN   TestFeature4
--- PASS: TestFeature4 (0.00s)
=== RUN   TestRunFeature4
--- PASS: TestRunFeature4 (0.00s)
PASS
ok  	_/root/xplatform/something 0.003s

The output is exactly the same, and tests pass in both cases. And that's what we should strive for.

Provide Cross-Platform Tests Only When You Must

Provide platform-specific tests only when you must.

If we lived in a perfect world, platform-agnostic tests would be enough. Since this is not the case, under some circumstances we have to provide our code with specialised tests. There are two main reasons for having them:

  • the feature being tested is not available on all platforms, hence there is no way to test it uniformly in accordance with the previous guideline
  • or the platform-specific implementation detail is crucial, and requires an extra cautious approach.

Note that the latter of the two reasons is rather rare for most casual and business software. A good illustration of this and the preceding guidelines can be found in the net package of Go's standard library. The vast majority of the package's tests operate on platform-independent code, and only a few parts are provided with specialised tests.

When a single or a group of platforms needs a separate set of tests, follow the same rules as with regular cross-platform code:

  • in simple cases, use file name suffixes, e.g. something_darwin_test.go
  • in cases where a number of platforms share the same implementation and tests, use a meaningful suffix and build tags, e.g. something_unix_test.go and // +build darwin freebsd.

To give an example, let's add a test to the feature that is unique for the darwin platform in the demo code shown previously. As a reminder, this is the platform-specific implementation we're going to test, something/something_darwin.go:

package something

import (
	"fmt"
)

var (
	feature4 = []byte(`this feature has different implementations but yields the same result`)
)

func DoFeature3() {
	fmt.Println("this is a unique feature for the mac")
}

func doFeature4() string {
	return string(feature4)
}

In particular, we're interested in the DoFeature3 function, as it's available only on the Mac; on all other platforms it's mocked as func DoFeature3() {}. The implementations are different, so there is no clean way to test it uniformly. Of course, one can always write if runtime.GOOS == "darwin" { ... }, but that's rather naïve. Why would we allow such a leak of details to a higher level of abstraction?

Instead, we create a file to hold the tests for the platform of interest, something_darwin_test.go in our case, and fill it with test code:

package something_test

import (
	"bufio"
	"os"
	"strings"
	"testing"

	"../something"
)

func TestDoFeature3(t *testing.T) {
	stdout := os.Stdout
	r, w, err := os.Pipe()
	if err != nil {
		t.Errorf("failed to create reader and writer: %s\n", err)
		t.FailNow()
	}

	os.Stdout = w
	something.DoFeature3()

	b := bufio.NewReader(r)
	out, err := b.ReadString('\n')
	if err != nil {
		t.Errorf("failed to read from reader: %s\n", err)
		t.FailNow()
	}

	if err := w.Close(); err != nil {
		t.Errorf("failed to close writer: %s\n", err)
		t.FailNow()
	}

	os.Stdout = stdout

	exp := "this is a unique feature for the mac"
	if act := strings.TrimSuffix(out, "\n"); act != exp {
		t.Errorf("expected: %q, got: %q\n", exp, act)
	}
}

Running the tests emits the following output:

  • on macOS
$ GOOS=darwin go test ./something/... -v
=== RUN   TestDoFeature3
--- PASS: TestDoFeature3 (0.00s)
=== RUN   TestFeature4
--- PASS: TestFeature4 (0.00s)
=== RUN   TestRunFeature4
--- PASS: TestRunFeature4 (0.00s)
PASS
ok  	_/Users/user/projects/xplatform/something	0.006s
  • and on Linux, the output is still the same as before
$ docker run -it --rm -v $(pwd):/root/xplatform golang:1.15 bash
% cd /root/xplatform/
% go test ./something/... -v
=== RUN   TestFeature4
--- PASS: TestFeature4 (0.00s)
=== RUN   TestRunFeature4
--- PASS: TestRunFeature4 (0.00s)
PASS
ok  	_/root/xplatform/something 0.003s

Organising cross-platform code and tests this way allows for:

  • better encapsulation of responsibilities
  • better isolation of changes
  • more clarity on what, when, where and how
  • and all this results in healthier software and a healthier process.

This example shows a couple of important things – that it is possible to have specialised tests if needed, and that the complexity and cost of maintenance grow much faster with each new varying detail. While being able to test the specifics of a platform is important, it's also important to use this ability wisely. The best code is unwritten code; and better test code is code that checks the outcome rather than a specific implementation detail. This supports the guideline – write platform-specific tests only when you must, and try to cover as much as possible at the platform-independent level. It will save you time and energy.

Summary

This unit went one layer deeper into the project structure, and discussed guidelines for package design. Its leitmotif is similar to that of the project layout – mindful, thoughtful and responsible choices do a better job in the short and long term, for your peers and for you.

Here is a quick recap of the suggestions in this unit:

  1. Create a package to improve the design of software, not for the sake of having a package.
  2. The main package is special, and must be kept as minimal as possible.
  3. When creating a new package, make sure to:
    • keep the public API as narrow as possible
    • provide a solution for a task
    • choose a good and meaningful name which
      • sets the right context and namespace
      • is consistent with the environment, and is not confusing
      • is a short and concise singular noun
      • is precise and focused, i.e. avoid common, util, tools, etc
      • and is unique.
  4. Organise a package properly, in accordance with Go conventions:
    • prefer a flat list of packages by default
    • there is no such thing as a sub-package in Go, organisationally speaking.
  5. Keep files in a package in a good order:
    • begin with a single file named as the package
    • organise files by topic, responsibility and behaviour
    • name other files meaningfully
    • avoid creating file per object
    • prefer fewer files of larger size
    • keep methods of a type in the same file
    • avoid ambiguous file names
    • use a separate file for documentation when appropriate.
  6. To minimise or reduce the maintenance burden when working with cross-platform code:
    • follow the basic principles of writing cross-platform code
      • the business logic must be platform-agnostic
      • the amount of platform-specific code is kept to the possible minimum
      • significant differences between platforms are compensated for by minimal mocking or no-op implementations
    • know the tools available for cross-platform development in Go
      • ways to produce binaries
      • approaches to differentiating code between supported platforms
        • using file names
        • or build tags
        • and how to combine them when needed
    • organise packages containing cross-platform code with change in mind, i.e.:
      • keep platform-dependent code at minimum
      • use simple branching in trivial cases
      • have no platform-specific code in main
      • use file suffix in simple cases
      • use build tags when more flexibility is needed
      • strive for platform-independent tests
      • and provide cross-platform tests only when you must.

These guidelines are the continuation of the project-level recommendations at a higher resolution. Applied together and in a consistent manner, they set up a solid foundation for a codebase. Even if you inherit a disorganised codebase, it is possible to improve its structure by taking small steps towards a better shape. As long as you keep improving it and continue making the right design choices, the codebase eventually reaches, or remains in, good shape. And then you have to keep going, as nothing is static and everything is always evolving.

The next step will take us one more layer further – the level of a file in a package. We've made the necessary preparations, and are ready for discussing code organisation within a file.