I am currently travelling to conferences and Java User Groups with a talk called “Whoops! Where did my architecture go?”. It discusses approaches to creating and maintaining logical architectures in Java code bases, the challenges involved and the available tooling, and outlines some ideas on how one can accomplish this. A core part of the argument is a discussion of the importance of Java packages. Jens Schauder recently wrote a blog post about that topic and I felt I had some things to add. The more I thought about it, the clearer it became that the comment I envisioned would exceed the length of a reasonable reply, so I decided to write up a blog post instead. I will point to the slides of the presentation where it comes in handy. The talk is also based on a lot of sample code which we will get to a bit later. The code can be found on GitHub.
I’d like to take a step back first and briefly discuss some general concepts and ideas that are necessary to understand the approach I’ll present later. I’ll use the Java software development space as the poster child for this so that we can relate the concepts to day-to-day practice, but they apply to software systems in other languages, too.
A core principle of complex problem solving is “divide and conquer”: we split up a complex problem into smaller ones and approach those smaller ones individually. In Java software this is usually done at several artifact levels: deployment units (WARs, JARs), packages and eventually classes. As a side effect we create dependencies between these artifacts and need some means to manage them.
As a consequence we create artifacts with different (in)stability metrics (see Wikipedia for details). This metric essentially expresses the risk of a change made to an artifact. Assume we have two artifacts A and B, with A depending on B. A change in A will never affect B, whereas a change in B potentially breaks A. Thus there is value in splitting up a big artifact into two smaller ones with a defined dependency direction, as it reduces the risk of a change in (in this case) A. If a dependency from B to A is introduced, both artifacts essentially become a single one from an “effect of change” point of view, as we cannot touch one without potentially affecting the other.
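For reference, the (in)stability metric referred to above is commonly defined as

I = Ce / (Ca + Ce)

where Ce is the number of outgoing dependencies of an artifact and Ca the number of incoming ones. An artifact with I = 0 is only depended upon and thus has to stay very stable, while an artifact with I = 1 depends on others but is not depended upon itself, so it can be changed with comparatively little risk to the rest of the system.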
A practical example of this approach is the layered architecture pattern, where you split up your system into a presentation layer, a service layer and a repository layer with directed dependencies from top to bottom. The exact number and names of the layers don’t really matter. Still you can see that you can definitely change the presentation layer without any chance of breaking something in the repository layer, given you haven’t violated the dependency rules.
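As a minimal sketch of such a layering (hypothetical package and class names), each type lives in its layer’s package and only refers downwards, so a change to CustomerController cannot ripple into CustomerRepository:

// Three separate source files, one per layer package.

package demo.repository;

public class CustomerRepository {
    // data access code lives here
}

package demo.service;

import demo.repository.CustomerRepository;

public class CustomerService {

    private final CustomerRepository repository; // service -> repository

    public CustomerService(CustomerRepository repository) {
        this.repository = repository;
    }
}

package demo.presentation;

import demo.service.CustomerService;

public class CustomerController {

    private final CustomerService service; // presentation -> service

    public CustomerController(CustomerService service) {
        this.service = service;
    }
}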
Layering is reasonably well understood by developers as it decomposes the software system by a technical aspect. It’s taught at university and probably practiced in most of the development shops out there. So here’s the interesting question:
If we understand and value the benefits of the general approach of slicing code horizontally, why do we so horribly neglect that approach when it comes to vertical decomposition, i.e. business functionality?
I know this is a bold question, but it’s essentially what I’ve seen throughout most of the code reviews I’ve done in the last couple of years. In general, I’d even argue that splitting up business functionality into dedicated slices and making sure the defined allowed dependencies are not violated is much more important for the long-term maintainability of software. This is mostly due to the cost of a change correlating with the risk of a change.
Take a CRM system as an example: you’ll probably have a Core slice that contains some code that other slices will depend on. You’ll find a Customer management slice that keeps track of customers, their addresses etc. On top of that you might then build a Contract management slice that clearly needs to know about the Customer slice and uses the Core slice as well (see this slide).
So let’s assume we have defined both slices and layers as architectural concepts; how do we make sure these concepts are embodied in our Java codebase? What we usually do is create naming conventions and map our layers and slices onto package and class names. There are some great tools like Sonargraph and Structure 101 available out there that help you do exactly that. Being a happy day-to-day user of Sonargraph, I still asked myself: “How far could we actually get with plain Java (language, compiler, runtime) means only?”
We have already identified that controlling dependencies between code artifacts is key to effectively reducing the risk of change. Let’s have a look at how this is approached in general. A key aspect of dependency control is that an artifact is able to explicitly express a dependency. Interestingly, we have solutions to do exactly that for two of the three artifact levels (JARs, packages, classes) mentioned above.
Deployment units are usually managed through Gradle, Maven or Ivy. At the class level, dependency injection comes into play, as a core aspect of it is to expose dependencies through either constructor arguments or setters (see an interesting post on this on Jens Schauder’s blog again). Still, this is not entirely sufficient, because at runtime the type space in the JVM is flat and a class can potentially “ask” for a dependency it shouldn’t be able to obtain and get it injected. This is where the aforementioned tools would step in again and veto at build time. But yeah, we wanted to see how far we can get without them.
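To make that concrete, here is a sketch (hypothetical names) of a dependency that is expressed perfectly explicitly via the constructor and still violates the layering: because CustomerRepository is a public type, nothing but a build-time tool stops a presentation component from asking for it directly and skipping the service layer.

package demo.customer.presentation;

import demo.customer.repository.CustomerRepository;

public class CustomerController {

    // The repository is injected directly, bypassing the service layer.
    // The compiler (and the DI container) are perfectly happy with this,
    // as the type is public and thus visible everywhere.
    private final CustomerRepository repository;

    public CustomerController(CustomerRepository repository) {
        this.repository = repository;
    }
}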
At this point, I’d like to throw in another question:
Dear Java developer, have you ever wondered why you make your class’ properties private by default but the class itself public by default? Hint: ‘Because my IDE generates it this way’ is not a valid answer.
The observation this should lead you to is that Java developers usually just skip packages as a means to control the visibility of types. We practice information hiding with classes and properties (actually we often don’t, as most developers run “Generate getters and setters” right away) but not at the classes-in-a-package level. We just discovered that inside a JAR we essentially have to manage which type can see which other type, as the class space of public types is flat in a JVM. Essentially this means that if a class is not public (i.e. package private), we don’t have to manage its dependencies at the global level but within the package only.
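A quick sketch of what that means in practice (hypothetical names): a package-private type simply does not exist for code in other packages, so the compiler itself enforces the boundary.

package demo.customer;

// No 'public' modifier: this type is package private and only visible
// to other types inside demo.customer.
class CustomerNumberGenerator {
    long next() { return System.currentTimeMillis(); }
}

package demo.contract;

public class ContractService {
    // A type in another package cannot even express a dependency on the
    // generator; the following declaration would be rejected by the compiler:
    // private final demo.customer.CustomerNumberGenerator generator = null;
}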
The reason packages are usually not used as a means of dependency management is that designing packages is not a trivial task, even if you use an architecture tool like Sonargraph to help you. Part of the problem is that there are hardly any guidelines on how to do it, and thus various kinds of crazy patterns have evolved over the years. Arbitrarily restricting the number of classes in a package is just that: arbitrary (actually one of the few points where I disagree with Jens Schauder’s blog post). Simply grouping all exception types into an exceptions package is not really reasonable either, as it groups types by technical commonalities, which usually doesn’t provide any architectural benefit.
Assuming we’d like to map our previously defined architecture onto Java packages, you’ll probably end up with something like this. You can find actual code backing this approach in the packages-before project of this GitHub repo:
….account.domain
….account.presentation
….account.repository
….account.service
….core.domain
….customer.domain
….customer.presentation
….customer.service
….customer.repository
This is not bad in general, but there are a few things to consider here. First, having the slice first is a good idea as you can easily separate or externalize an entire chunk of functionality. Remember that in our example the account slice was not depended on by anything else, so we should be able to remove this package and all of its sub-packages to ditch the features implemented in the slice, and the app should still work with the customer features only.
Second, you see that the core slice does not have code in any of the actual layers but a domain package only. This essentially shows that we expose something that can be considered an implementation detail to the public: the layering. That might not seem too important, but it has a consequence that completely subverts our idea of reducing the number of types to manage. Let’s have a more detailed look at that:
….customer.domain
+ Customer
….customer.service
+ CustomerManagement
+ CustomerManagementImpl
+ CustomerNumberGenerator
….customer.repository
+ CustomerRepository
Assume the following: CustomerRepository simply persists Customer instances. The core part of the service implementation is generating customer numbers when a new Customer is created. This creates the need to make sure all customers are created through the service interface so that this behavior gets applied. Unfortunately, CustomerRepository needs to be a public type so that it can be referred to from the service implementation. This opens it up to be an injection candidate for virtually any other component of the system, which is exactly what we want to avoid. So what if we ditched the layer packages entirely? (The code can be found here.)
….customer
+ Customer
+ CustomerManagement
o CustomerManagementImpl
o CustomerNumberGenerator
o CustomerRepository
We end up with fewer than half the public types to manage. CustomerRepository cannot be used from anywhere else, as the Java compiler prevents us from doing so. CustomerNumberGenerator and the layering in general become an implementation detail. Architecturally this creates stronger gates between the slices (the packages, actually) and thinner ones between the layers (simple class-to-class dependencies), as illustrated in this slide (compare with the slide before to see the difference).
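In code, the reorganized customer slice could look roughly like this (a minimal sketch with hypothetical member signatures, not a verbatim copy of the sample repository). Only Customer and CustomerManagement are public; everything else is package private and thus neither visible nor injectable outside the package.

// All types live in the slice package demo.customer,
// each public type in its own source file.

package demo.customer;

public class Customer {

    private final long number;
    private final String name;

    // In this sketch the constructor is package private so that customers
    // can only be created through the slice's own service.
    Customer(long number, String name) {
        this.number = number;
        this.name = name;
    }
}

public interface CustomerManagement {
    Customer createCustomer(String name);
}

// Implementation details, invisible outside the package
class CustomerManagementImpl implements CustomerManagement {

    private final CustomerRepository repository;
    private final CustomerNumberGenerator generator;

    CustomerManagementImpl(CustomerRepository repository, CustomerNumberGenerator generator) {
        this.repository = repository;
        this.generator = generator;
    }

    @Override
    public Customer createCustomer(String name) {
        return repository.save(new Customer(generator.next(), name));
    }
}

class CustomerNumberGenerator {
    private long counter;
    long next() { return ++counter; }
}

class CustomerRepository {
    Customer save(Customer customer) {
        // persistence omitted in this sketch
        return customer;
    }
}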
The basic approach I recommend is to move the vertical slices into the focus of the package naming and to model them so that the public API of a slice is as tiny as possible in the first place. This is of course no silver bullet, as packages can grow significantly and it might make sense to extract certain types into a sub-package or the like, which then usually brings you back to needing an architecture management tool. The core idea here is to use the means of visibility control that are available in Java to write code that is not a giant potential dependency mess in the first place. Packages can actually help you achieve exactly that.