Context
Many times in my career, I have had to keep a history of one or more tables. Most of the time, the need for history arises after the main feature has been developed, and we end up with this kind of architecture:
| Post | | PostHistory | |
It looks like the most natural and painless way to add history to an existing entity. You can also find gems that do this job for you, in which case they serialize your Post into a generic history table.
Both approaches are appealing because they’re simple to implement and they don’t introduce a “risk” for the Post model. The problem is that they’re very hard to maintain.
In the first case, every time we run a migration on the Post table, we have to run it on PostHistory too. Double the work. If you have special columns (enums, serialized data of any kind), you also have to maintain the same code in both classes. Or you have to extract everything into modules shared between the classes, which makes the code less readable. If you use services, it can lead to even more complicated situations.
In the second case, if your Post model changes, restoring the serialized version from history becomes hard, if not impossible. If you want to historize only a subset of Post’s properties, it’s also inconvenient. Plus, if you have serialized columns in the Post table, you will end up with serialized data IN serialized data in the history table. Something you don’t want to deal with when tracking down bugs.
The other main problem is that if you need to use history for anything other than reading values and showing them to a human, you end up with inconsistent code that sometimes uses Post and sometimes PostHistory.
This architecture also comes with the classic problem, “when do we create a new entry?”:
- creating it before saving means the latest version is not in history, which forces you to read from two tables to get the complete history
- creating it after saving means the latest version lives in two different places (duplicated content)
Let’s take a simple example, restoring values from history:
# services/posts/historize_service.rb |
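As a rough illustration of the maintenance cost, here is a minimal sketch of such a restore. It uses plain Ruby structs as stand-ins for the two ActiveRecord models; the column names (`title`, `content`, `published`) and the `restore_from_history` helper are assumptions made for the example, not the original service.

```ruby
# Plain-Ruby stand-ins for the two ActiveRecord models; the columns are assumed.
Post = Struct.new(:id, :title, :content, :published, keyword_init: true)
PostHistory = Struct.new(:id, :post_id, :title, :content, :published, keyword_init: true)

# Every historized column has to be listed here and kept in sync by hand.
HISTORIZED_COLUMNS = %i[title content published].freeze

# Copies the historized columns of a history row back onto the post.
def restore_from_history(post, history)
  HISTORIZED_COLUMNS.each { |column| post[column] = history[column] }
  post
end

post = Post.new(id: 1, title: "Second draft", content: "Newer text", published: true)
old  = PostHistory.new(id: 7, post_id: 1, title: "First draft", content: "Older text", published: false)

restore_from_history(post, old)
puts post.title      # the old title is back
puts post.published  # the live post was unpublished as a side effect
```

Each new column means another entry in the list, and restoring an old row also drags its `published` flag along with it.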
Also consider that in this specific case, the published state is ambiguous. And if many versions in the history have different meanings, it’s hard to identify which one means what. We could move the “publish” information into the history table, but then history would carry contextual information, and we would have to maintain the whole history to ensure only one version is published at a time.
Separate “meta” and “content”
A more flexible solution is to split the data into two separate models: what is content (i.e. what we need to keep track of over time) and what’s immutable over time.
| Post | | Post::Content | |
Using this design, whatever version you use, you will always manipulate the same type of object. You don’t have to double the logic and migrations. You keep your data stable over time.
To give specific meaning to specific versions, we will rely on foreign keys. Post::Content should not know its role in outside models. Let’s keep it as small as possible and focused on data.
Your post model will look like this:
# models/post.rb |
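A database-free sketch of that shape, assuming plain Ruby structs in place of ActiveRecord models and object references in place of foreign keys. The names `current_content`, `published_content`, `published?` and `unpublished_changes?` are hypothetical and chosen for illustration only.

```ruby
# Post carries the "meta" part: identity plus references to specific versions.
Post = Struct.new(:id, :contents, :current_content, :published_content, keyword_init: true) do
  # A post is published as soon as one version is tagged as the published one.
  def published?
    !published_content.nil?
  end

  # True when the latest version is not the one visible to readers.
  def unpublished_changes?
    !current_content.equal?(published_content)
  end
end

# Post::Content carries only versioned data and knows nothing about its role.
Post::Content = Struct.new(:id, :post_id, :title, :content, keyword_init: true)

first  = Post::Content.new(id: 1, post_id: 1, title: "Hello", content: "First text")
second = Post::Content.new(id: 2, post_id: 1, title: "Hello", content: "Reworked text")

post = Post.new(id: 1, contents: [first, second],
                current_content: second, published_content: first)

puts post.published?            # true
puts post.unpublished_changes?  # true
```

The role of a version ("current", "published") lives on Post, so a content row never has to change when its role does.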
Creating a new entry in history is as simple as this:
# services/posts/historize_service.rb |
Compared to the previous version, we don’t need to modify this service every time we add (or remove) a property. It’s stable over time, and that’s all we want.
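One way to picture that stability is a sketch in which the service copies the current content without naming any attribute. Structs stand in for the ActiveRecord models, and the `historize` helper and its id generation are assumptions for the example; the original service may work differently.

```ruby
Post = Struct.new(:id, :contents, :current_content, :published_content, keyword_init: true)
Post::Content = Struct.new(:id, :post_id, :title, :content, keyword_init: true)

# Appends a copy of the current content to the post's history and makes the
# copy the new current version. No attribute is named explicitly, so adding
# a member to Post::Content does not require touching this method.
def historize(post)
  copy = post.current_content.dup
  copy.id = post.contents.map(&:id).max + 1 # stand-in for a database-generated id
  post.contents << copy
  post.current_content = copy
  copy
end

first = Post::Content.new(id: 1, post_id: 1, title: "Hello", content: "First text")
post  = Post.new(id: 1, contents: [first], current_content: first)

historize(post)
puts post.contents.size        # 2
puts post.current_content.id   # 2
```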
The simplest way to deal with versions is to consider our Post::Content objects as immutable:
# services/posts/content/update_service.rb |
This service creates a new content entry every time you update content properties (title and/or content). If you want to prevent blank submissions from creating useless history entries, you can use a form object to validate the data before calling the service.
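The append-only behaviour described above can be sketched as follows. Plain Ruby structs replace the ActiveRecord models, and `update_content` is a hypothetical helper; ids are derived from the array size purely as a stand-in for database-generated ids.

```ruby
Post = Struct.new(:id, :contents, :current_content, :published_content, keyword_init: true)
Post::Content = Struct.new(:id, :post_id, :title, :content, keyword_init: true)

# Never mutates an existing version: builds a new frozen one from the current
# attributes merged with the changes, then repoints current_content at it.
def update_content(post, changes)
  base = post.current_content ? post.current_content.to_h : { post_id: post.id }
  attributes = base.merge(changes).merge(id: post.contents.size + 1)
  content = Post::Content.new(**attributes).freeze
  post.contents << content
  post.current_content = content
end

post = Post.new(id: 1, contents: [])
update_content(post, title: "Hello", content: "First text")
update_content(post, title: "Hello again")

puts post.contents.map(&:title).inspect  # ["Hello", "Hello again"]
puts post.current_content.content        # First text
```

Freezing each version makes accidental in-place edits fail loudly instead of silently rewriting history.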
Let’s keep going with restoring an entry:
# services/posts/restore_content_from_history_service.rb |
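With immutable versions, a restore can be as small as repointing a reference. The sketch below takes that interpretation, which is an assumption: the original service might instead duplicate the old version as a new entry. Structs stand in for the models and `restore_content` is a hypothetical name.

```ruby
Post = Struct.new(:id, :contents, :current_content, :published_content, keyword_init: true)
Post::Content = Struct.new(:id, :post_id, :title, :content, keyword_init: true)

# Makes an older version current again. Nothing is copied attribute by
# attribute; only the reference (a foreign key in a real schema) changes.
def restore_content(post, content)
  unless content.post_id == post.id
    raise ArgumentError, "content #{content.id} does not belong to post #{post.id}"
  end
  post.current_content = content
end

first  = Post::Content.new(id: 1, post_id: 1, title: "Hello", content: "First text")
second = Post::Content.new(id: 2, post_id: 1, title: "Hello", content: "Reworked text")
post   = Post.new(id: 1, contents: [first, second], current_content: second)

restore_content(post, first)
puts post.current_content.content  # First text
puts post.contents.size            # 2
```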
Let’s see how publishing works now:
# services/posts/publish_service.rb |
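Publishing then comes down to tagging one version. In this sketch, structs stand in for the models, and `publish` / `unpublish` are hypothetical helpers. Because the post holds a single `published_content` reference, only one version can be published at a time by construction.

```ruby
Post = Struct.new(:id, :contents, :current_content, :published_content, keyword_init: true)
Post::Content = Struct.new(:id, :post_id, :title, :content, keyword_init: true)

# Tags a version (the current one by default) as the published one.
def publish(post, content = post.current_content)
  raise ArgumentError, "nothing to publish" if content.nil?
  raise ArgumentError, "content belongs to another post" unless content.post_id == post.id
  post.published_content = content
end

# Removes the tag; the history itself is left untouched.
def unpublish(post)
  post.published_content = nil
end

first  = Post::Content.new(id: 1, post_id: 1, title: "Hello", content: "First text")
second = Post::Content.new(id: 2, post_id: 1, title: "Hello", content: "Reworked text")
post   = Post.new(id: 1, contents: [first, second], current_content: second)

publish(post)
puts post.published_content.id  # 2
publish(post, first)            # roll the public version back without losing the draft
puts post.published_content.id  # 1
puts post.current_content.id    # 2
```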
And voila! This design is flexible enough to add many features that won’t compromise the structure.
Making it a bit more transparent
# models/post.rb |
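One plausible way to get that transparency is delegation, so callers can ask the post for its title without knowing about versions. The sketch below uses Ruby's standard `Forwardable` on a struct stand-in; in a Rails model, ActiveSupport's `delegate :title, :content, to: :current_content` would play the same role. Which attributes are delegated is an assumption here.

```ruby
require "forwardable"

Post = Struct.new(:id, :contents, :current_content, :published_content, keyword_init: true) do
  extend Forwardable
  # post.title and post.content read from whichever version is current.
  def_delegators :current_content, :title, :content
end
Post::Content = Struct.new(:id, :post_id, :title, :content, keyword_init: true)

first  = Post::Content.new(id: 1, post_id: 1, title: "Hello", content: "First text")
second = Post::Content.new(id: 2, post_id: 1, title: "Hello again", content: "Reworked text")
post   = Post.new(id: 1, contents: [first, second], current_content: second)

puts post.title    # Hello again
post.current_content = first
puts post.title    # Hello
```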
Plugging features
Adding version numbers, for example:
# models/posts/content.rb |
This could also be extracted to a module, exposed as a method like
has_versions_number scope: :post_id |
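The idea behind a version number scoped by `post_id` can be sketched without a database: each post gets its own counter, computed from the versions it already has. The `version` member, `next_version_number` and `create_content` are hypothetical names, and the array plays the role of the contents table.

```ruby
Post = Struct.new(:id, keyword_init: true)
Post::Content = Struct.new(:id, :post_id, :title, :content, :version, keyword_init: true)

# Highest existing version for this post plus one; 1 when the post has none.
def next_version_number(table, post_id)
  table.select { |row| row.post_id == post_id }.map(&:version).max.to_i + 1
end

# Inserts a new row whose version is numbered within its own post.
def create_content(table, post_id:, title:, content:)
  row = Post::Content.new(id: table.size + 1, post_id: post_id, title: title,
                          content: content, version: next_version_number(table, post_id))
  table << row
  row
end

table = []
create_content(table, post_id: 1, title: "Hello", content: "First text")
create_content(table, post_id: 2, title: "Other post", content: "Text")
create_content(table, post_id: 1, title: "Hello", content: "Reworked text")

puts table.map { |row| [row.post_id, row.version] }.inspect  # [[1, 1], [2, 1], [1, 2]]
```

With a real database, two concurrent writers could compute the same number, so a unique index on the post id and version pair would be a sensible safeguard.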
You can even disable updates at the model level to ensure no one changes the history:
# lib/active_record/immutable_model.rb |
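A plain-Ruby sketch of the principle: a record may be saved once, and any later save raises. The module, error class and `ContentRecord` below are hypothetical stand-ins. In an ActiveRecord application, a commonly used approach is to override `readonly?` so that it returns true for persisted records, which makes later saves raise `ActiveRecord::ReadOnlyRecord`; the original module may be implemented differently.

```ruby
module ImmutableModel
  ImmutableRecordError = Class.new(StandardError)

  # New records may be saved once; persisted ones are read-only.
  def readonly?
    persisted?
  end

  def persisted?
    @persisted == true
  end

  # Stand-in for a database write: refuses to touch an existing record.
  def save
    raise ImmutableRecordError, "#{self.class.name} records are immutable" if readonly?
    @persisted = true
  end
end

class ContentRecord
  include ImmutableModel
  attr_accessor :title

  def initialize(title:)
    @title = title
  end
end

record = ContentRecord.new(title: "First")
record.save                 # the initial insert is allowed
record.title = "Changed"

begin
  record.save               # a second save is rejected
rescue ImmutableModel::ImmutableRecordError => e
  puts e.message            # ContentRecord records are immutable
end
```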
Recommendations
To update Post from forms, I discourage using ActiveRecord’s nested_attributes pattern, because it’s a very horrible pattern (even if it looks very convenient). Instead, prefer a form object that hides the complexity of content versioning from your user.
Conclusion
This solution solves some problem cases but, like any other, it still has drawbacks. It costs more up front but offers more flexibility over time, especially if you want to add features based on your history. It also lets you easily “tag” and access specific versions.