Monorepo: Additional Chapters

The approach to maintaining a monolithic repository outlined above is a good starting point and works quite well in most cases. At the same time, larger organisations with many teams and services may want or need more flexibility when working with a monorepo. Let's think through potential reasons for that, and how to address those needs.

Open Questions

Which questions does the structured monorepo leave unanswered? Some of them include:

  • What if we need to maintain both older and newer versions of a package? This could easily be the case when a large number of services work with the same model, say a core business entity. You have to update the model, and you have tried to find a backward- and forward-compatible change, but it seems either impossible or too resource-demanding. Had this model lived in a separate module, a new major version with a breaking change could have done the job. But with a monorepo, everything is versioned as a whole, or not versioned at all.
  • What if some packages change so often that it's hard for other teams to keep their services up to date? In other words, how do we allow quick and frequent changes in code that other code relies upon? While this should not normally be the case in a well-designed system, it happens. There might be places in a codebase that hot execution paths depend upon, and for some reason changes to them are made quite often. If several teams are forced to perform routine updates caused by the work of another team, this can quickly become a major bottleneck, a source of anxiety, and even lead to the (rather wrong) conclusion that monorepos don't work.
  • What if infrastructure and utility code changes often enough to bother other teams with the need to keep their services up to date? As with the previous question, this normally should not happen. Yet sometimes it does, especially when the initially written low-level code is not optimal (someone quickly prototyped and forgot to finalise the work by turning the draft into a well-made solution), and at some point it needs to be refactored. In such a case, there will be breaking changes, and having everything in a single repository dictates adjusting all dependent services at the same time.
  • What if all of the above is the case, and the number of people working on the project is large, as is the codebase, which has aged a little and carries a fair amount of legacy?

These are just a few questions you may have, either because this is your situation, or because you know someone in a similar one, or for any other reason. There are, of course, many other potential questions. Our goal here is not to find a silver bullet that ultimately answers these and all other questions. Instead, we're trying to find a starting point from which you can begin experimenting and working out your own way.

Below we consider two additional options that can help you maintain a healthy monolithic repository and keep your teams happy and efficient, as much as is possible in this context. It's advisable to use them only when the structured approach is not enough: either it has already limited you in some way, or analysis shows evidence that it might do so soon, and other attempts to improve the situation have been made.

As with everything in life, flexibility comes at the cost of increased complexity and responsibility, and you have to keep the latter two in balance. Abusing the complexity or neglecting the responsibilities are two of the most common reasons for failure, and this is especially often the case in software development.

Having put extra stress on the importance of being cautious with added complexity, let's look at ways of achieving additional flexibility. The first involves nested Go modules; the second is a bit more radical, but still an option for situations when changes happen too often and break too much. Finally, we look at a combination of the two.

NOTE: In the next section we use the term "nested module" for a Go module that is located somewhere other than the root of a repository, potentially with siblings in neighbouring directories. We deliberately refrain from using the term "submodule" to avoid confusion with Git submodules, which are outside the scope of this book.

More Structure with Nested Modules

Consider converting the top-level packages into nested modules when you need more fine-grained control and flexibility over each package's lifecycle.

This is possible because the Go toolchain supports nested modules, which are sometimes referred to as submodules.

As you know, a Go module is identified by its go.mod file. Each module must have this file, and its location defines the root of the module. While having a single module per repository, located at the root, is the most common case and is strongly advised, it's possible to maintain multiple modules in the same repository. It's also possible to version nested modules independently, using Git tags of a slightly different form.

In general, the technique works as follows:

  • some of the top-level packages are separate modules
  • each such package, i.e. module, is versioned separately
  • each module can import another module, as if it were just a package
  • following the conventions set by semantic versioning is mandatory, i.e. each breaking, incompatible change requires a new major version of such a module. This allows code interested in the newest version to use a separate import path, while code that is yet to be updated continues using the older version.
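To illustrate the import side, here is a sketch of the go.mod of a hypothetical consumer service. The service name, version numbers and go directive are made up for the example; the module paths follow the monorepo example used later in this section:

```go
// go.mod of a hypothetical service that depends on two nested modules,
// each pinned to its own, independently chosen version.
module github.com/myorg/billing

go 1.21

require (
	github.com/myorg/monorepo/controller v0.1.0
	github.com/myorg/monorepo/model v0.4.1
)
```

If the consumer lives in the same repository, it can additionally use replace directives pointing at the local directories during development, so that local changes are picked up without publishing a tag.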

Now let's look at an example. Here we have a monolithic repository called monorepo, owned by the myorg account. In the repo, we've got two packages - model and controller. Everything up until now has been a single module, defined at the root of the repository. Our task is to convert the two packages into nested modules and to version them independently. The execution plan consists of three steps for each package:

  • initialise a module
  • commit and push changes to the remote origin
  • version the new module by creating and pushing a tag, paying special attention to the name of the tag: it's module_name/vX.Y.Z, not just vX.Y.Z.
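The tag format is easy to get wrong, so here is a small, hypothetical Go helper that checks whether a tag follows the module_name/vX.Y.Z convention for nested modules. It is only a sketch of the rule, not part of any official tooling, and the directory pattern is deliberately simplified:

```go
package main

import (
	"fmt"
	"regexp"
)

// tagRe matches nested-module tags of the form dir/vMAJOR.MINOR.PATCH,
// e.g. "controller/v0.1.0". The directory part is simplified here to
// lowercase letters and digits; real repositories may allow more.
var tagRe = regexp.MustCompile(`^([a-z][a-z0-9]*(/[a-z][a-z0-9]*)*)/v[0-9]+\.[0-9]+\.[0-9]+$`)

// isNestedModuleTag reports whether tag versions a nested module rather
// than the root module (root-module tags are plain vX.Y.Z).
func isNestedModuleTag(tag string) bool {
	return tagRe.MatchString(tag)
}

func main() {
	for _, tag := range []string{
		"controller/v0.1.0", // valid nested-module tag
		"model/v0.4.1",      // valid nested-module tag
		"v1.0.0",            // versions the root module, not a nested one
		"controller/0.1.0",  // missing the "v" prefix
	} {
		fmt.Println(tag, isNestedModuleTag(tag))
	}
}
```

Plain vX.Y.Z tags without a directory prefix continue to version the root module, so both kinds of tags can coexist in the same repository.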

The final directory structure looks like this:

.
├── controller
│   ├── controller.go
│   └── go.mod
├── model
│   ├── go.mod
│   └── model.go
└── go.mod
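With this layout there are three go.mod files; a sketch of their module lines follows (the paths match the example, while the inline comments are just annotations). Note that once controller and model have their own go.mod files, they are no longer part of the root module:

```go
// ./go.mod - the root module stays where it was
module github.com/myorg/monorepo

// ./controller/go.mod - a nested module with its own path
module github.com/myorg/monorepo/controller

// ./model/go.mod - likewise a separate, independently versioned module
module github.com/myorg/monorepo/model
```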

Let's look at each step in detail for each of the two packages, starting with controller. For simplicity, we omit creating a separate branch and pull request:

  • first, convert the controller package into a module
cd controller
go mod init github.com/myorg/monorepo/controller
  • now commit and push the change
git add go.mod
git commit -m "Convert controller package to module"
git push
  • next, version the module by creating a tag
git tag -a controller/v0.1.0 -m "Convert controller to module"
git push --tags

# OR
git push origin controller/v0.1.0

cd ..

Now we handle the model package, assuming its version is somewhat higher:

  • convert it into a module:
cd model
go mod init github.com/myorg/monorepo/model
  • commit & push
git add go.mod
git commit -m "Convert model package to module"
git push
  • finally, version the module
git tag -a model/v0.4.1 -m "Convert model to module"
git push --tags

# OR
git push origin model/v0.4.1

cd ..

And that is it.

This approach allows packages within the monorepo to be versioned independently. The biggest advantage is that you can now easily introduce a breaking change under a new major version, starting from v2.0.0. A detailed conversation about versioning Go modules is the subject of the next section, right after we finish with monolithic repositories.
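As a hypothetical illustration of such a breaking change, suppose model moves to major version 2. In Go, major versions 2 and above must appear in the module path, so the go.mod and the consumers' import paths change, while un-migrated code keeps resolving the old path:

```go
// model/go.mod after the hypothetical breaking change
module github.com/myorg/monorepo/model/v2

// the release would then be tagged model/v2.0.0

// a consumer that has migrated imports the new path:
//     import "github.com/myorg/monorepo/model/v2"
// while code that hasn't migrated keeps importing:
//     import "github.com/myorg/monorepo/model"
```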

All of the above is, of course, good, but what if you're larger than Google, and one monorepo is too limiting for your needs, or for some other reason the options discussed so far don't offer enough flexibility? Well, would two or more monolithic repos help?

Multiple Monorepos

In a perfect world, and for some teams, a single monorepo works absolutely fine. But what if your reality is somewhat less than ideal, and one monolithic repository is not enough? If a single repository is too limiting or restrictive, that is probably not a sign that the approach does not work. Perhaps you just need to opt for two or more monorepos, each carrying unique responsibilities.

We leave aside some obvious and legitimate cases where a large organisation has several monolithic repositories as a matter of separation of concerns. For example, one company acquires another, and the two companies under the same roof continue development in two monorepos. Legal concerns might be involved too, so this is an absolutely natural reason for multiple monorepos.

Similarly, we leave aside situations where a company starts experimenting in a new field, or a new project. For example, it's natural for a game studio to develop each game in a separate repository.

Nonetheless, there are questions that some may want to ask. What if, for example, the infrastructure-specific code evolves too quickly? What if CI/CD workflows for client and service code are too different? What if services in the monorepo use technologies so different that the workflows have somehow become incompatible? What if one team wants to follow GitHub Flow, and another prefers Git Flow? What if one team is simply not willing to adopt the practice?

The questions above, although they sound real, are subject to the overall engineering policy and culture of an organisation. Normally, an organisation has some sort of shared vision that everyone is aligned with, having agreed on the foundational principles, approaches and tooling. So if those sorts of questions are present within your organisation, and everyone is doing whatever they want, that's probably a topic for a separate conversation. Here we focus on the technical aspects of dealing with the problem of one monorepo not being enough. And while the author has failed to find a real-world example of a reason for this (besides the few mentioned earlier and the likes), that should not prevent us from exploring what's available.

Below we briefly explore several available combinations to consider should you need them. Obviously, all the options mentioned, as well as anything you come up with yourself, have pros and cons, and a few are common to all of them. Unless stated otherwise, the common advantage is increased flexibility, either technical or organisational/legal, which comes at the cost of exponentially increased complexity and a higher cost of maintenance. Instead of reducing entropy and chaos, these options facilitate them.

NOTE: Bear in mind that the options explored below won't work for all teams, and finding a silver bullet is not the goal of this exploration. It should be clear that no one is enforcing anything upon the reader. Some teams consist mostly of full-stack developers, while others prefer specialised engineers. Obviously, what is good for one case might not work for another.

Multiple Single-Module Monorepos

The simplest option, if that adjective even applies at this point, is to have a few monorepos, each following the structured approach described in detail earlier in the unit.

It's relatively straightforward and enables easier separation of concerns. This option might suit cases when code needs to be split for both legal and technical reasons, and the alternatives are somewhat less optimal.

It might also be helpful when there is a strong desire or need to split utility code from business-logic code. In such a situation, the monorepo with utility code becomes an official in-house extension of the standard library, while the monorepo with business logic and services focuses only on the company's business domain.

Architectural Monorepos

Another option is splitting all code into a few layers and putting each layer into a separate monolithic repository.

A large organisation may have a huge project, or many projects, each of which has at least three major components: infrastructure, client side and services. In such a case, it might make sense to organise code as follows:

  • all infrastructure and operations code lives in one monorepo
  • the second monorepo is devoted to client code
    • it's also possible to go further and split web clients from mobile, if needed
  • finally, the code that is responsible for handling and persisting data, i.e. services, goes into its own repository.

With this setup, all major components are split by their nature, which might benefit both teams and processes.

Technology-Specific Monorepos

An organisation may have hundreds of thousands of lines of code written with one technology stack, i.e. language, and then switch to another, producing another hundreds of thousands, or millions, of lines of code. And while this is not necessarily a goal in its own right, it might be beneficial to split code into monolithic repositories focused on a specific technology.

This option may work quite well when a company is transitioning from one stack to another, or works with several in parallel. Go code stands out in that it's much easier to maintain than code in many other languages, except perhaps the case when you have no code at all. So it might be reasonable to put all Go code in one repository, C++ code in another, and JavaScript in a third.

Multiple Monorepos with Nested Modules

Potentially the highest degree of flexibility, along with complexity and chaos, can be achieved by employing multiple monolithic repositories with multiple independently versioned modules in them.

Using this option is strongly advised against, but it's still available. If your organisation is ready to pay the price of this flexibility, or if there is a real need for it (which is highly unlikely), then it is technically possible and, with very high levels of personal responsibility from each individual and collective responsibility from the engineering department as a whole, might work. It depends on you.

Closing Notes

Maintaining a monolithic repository that several teams work with is not an easy task. Experience shows that it is not only possible but also highly beneficial when approached consciously and in an organised manner, when the vision is shared by everyone involved, when the culture is strong, and when people understand the importance of keeping things in good order.

The techniques described in this section are neither new nor unique, and they are no silver bullet or panacea. But they have proved helpful for projects of various sizes, in small and larger teams. Give them a go, experiment, adjust them to your needs, and find what works and what does not.