
Ahmed Tarek

What Is Caching In Software Systems

Definition and Best Practices of Caching in Software Systems.


Photo by Ben White on Unsplash, modified by Ahmed Tarek

Every Software Engineer has heard about Caching. It is a term you hear from time to time, especially when you work on big Software Systems.


On these systems, you would most probably deal with huge amounts of data and many different services, and this is where Caching becomes useful.


However, from time to time I come across some wrong statements about Caching. Sometimes these statements are about wrong goals or usages of Caching, other times they are about handling Caching, and too many times they are about implementing Caching.


That’s why I decided to write this article to share with you my understanding of Caching and some best practices that have proven to be effective.


 


The Definition


Caching in Software Systems is temporarily storing a copy of an asset to be used by the system modules instead of the original asset in order to achieve some gain. This gain could take more than one form.


Four words in this definition are emphasized on purpose: temporarily, copy, asset, and gain.


I believe that every one of these words defines an important aspect that we need to keep in mind when dealing with Caching.


Temporarily ▶ It means that the copy we maintain of the asset should be deleted/cleared/replaced at some point. Otherwise, this is not Caching.


Copy ▶ It means that we don’t cache the asset itself, we cache a copy of it.


Asset ▶ I know that you were expecting to see the word “data” somewhere in the definition. However, I used the word asset on purpose because Caching is not only about data retrieved from a database or any other kind of storage system; it could be about caching in-memory objects as well.


Gain ▶ It means that Caching should not be a goal in itself. If you don’t have a well-defined gain from using Caching, then you should not use it.

In the next sections, we will discuss these aspects in detail.


 


The Gain


Before thinking of how to Cache an asset, you first need to ask yourself if you actually need to cache it or not.


There should be some gain you get from caching this asset. Otherwise, you would be just adding more complexity to your system.


The gain to get from Caching could be:

  1. Performance enhancements: like when you cache some data retrieved from a SQL database in memory objects. Accessing these objects is faster than querying the database again, so the performance would be enhanced.

  2. Lower cost: like when you cache currency conversion rates retrieved from a paid third-party Currency Converter REST service in a memory object or even a database. This would save you some of the money you would otherwise pay per request.

  3. Other advantages: These could be some business-related advantages of caching.
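To make the second gain more concrete, here is a minimal Python sketch. The PaidRateService class, its get_rate method, and the rate value are hypothetical stand-ins for a paid third-party service, not a real API. The only point is that repeated lookups reach the paid service once.

```python
class PaidRateService:
    """Hypothetical stand-in for a paid, per-request currency API."""

    def __init__(self) -> None:
        self.billed_requests = 0

    def get_rate(self, source: str, target: str) -> float:
        self.billed_requests += 1  # every call here would cost money
        return 0.5  # placeholder value, not real market data


class CachedRates:
    """Keeps a copy of each rate so that repeated lookups are free."""

    def __init__(self, service: PaidRateService) -> None:
        self._service = service
        self._rates: dict[tuple[str, str], float] = {}

    def get_rate(self, source: str, target: str) -> float:
        key = (source, target)
        if key not in self._rates:
            self._rates[key] = self._service.get_rate(source, target)
        return self._rates[key]


service = PaidRateService()
rates = CachedRates(service)
for _ in range(100):
    rates.get_rate("USD", "EUR")
print(service.billed_requests)  # 1 billed request instead of 100
```

This sketch never clears its copies, so on its own it is not yet Caching in the sense of the definition. It still needs an invalidation rule.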


The reason I am stressing this point is that I once found someone investing effort and time in caching some data retrieved from a database by storing the cache in another database.


So, my question to him was:

Is there any difference between the first database and the second one?

Unfortunately, the answer was:

No, they are almost the same.

So, as you can imagine, my next question was:

Then why are you trying to cache the data in the first place?

And, the answer was:

Actually, I don’t know any more. You made me question myself.

I could tell you more about the rest of the conversation, but I would not want to bore you to death.

Long story short, in the end we found out that he could benefit from caching if he cached the data in an in-memory object.


Therefore, my advice to you is to always ask yourself: what would I gain from caching this asset?


 


The Asset


As I said in the definition, not everything you cache would be data. Sometimes you would cache other assets like objects, and these objects could be different things. For example:

  1. Data retrieved from a SQL database.

  2. Files and folders retrieved from the operating system file system.

  3. Parameters retrieved from external DLL files.

  4. Handlers and delegates captured at the moment of expensive object creation.


All of these are examples of what you can cache.
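As a small illustration of caching an object rather than data, the Python sketch below reuses an object that is expensive to create. PricingEngine and get_engine are hypothetical names invented for this example. functools.lru_cache is part of the Python standard library, and its maxsize bound and cache_clear method keep the copy temporary.

```python
import functools


class PricingEngine:
    """Hypothetical object that is expensive to construct."""

    instances_built = 0

    def __init__(self, region: str) -> None:
        PricingEngine.instances_built += 1
        self.region = region


@functools.lru_cache(maxsize=32)
def get_engine(region: str) -> PricingEngine:
    """Build the engine once per region and hand out the cached object."""
    return PricingEngine(region)


first = get_engine("eu")
second = get_engine("eu")
print(first is second)                # True: the same cached object
print(PricingEngine.instances_built)  # 1

get_engine.cache_clear()              # discard the cached objects
third = get_engine("eu")
print(third is first)                 # False: a new object was built
```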


 


The Invalidation


As I said in the definition, Caching should be temporary. This means that at some point the copy of the asset we are keeping should be invalidated/removed/cleared. However, there is more to keep in mind.


The factors on which we decide to invalidate the cache are not always the same. They differ from one system to another depending on the business needs.


For example, you could decide to invalidate the cache:

  1. When the cached copy has been there for a certain amount of time; a week or an hour for example.

  2. When the data in the main source is changed.

  3. When an external trigger is invoked.

  4. Or a combination of some/all of these.
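The factors above can be combined in one place. The following Python sketch is only one possible shape, under my own assumptions: the source exposes a cheap version number (get_source_version), the maximum age is given in seconds, and the clock is injectable so the behavior can be demonstrated without waiting. None of these names come from a real library.

```python
import time
from typing import Any, Callable, Optional


class CachedValue:
    """Holds one cached copy and reloads it when an invalidation rule fires."""

    def __init__(self, load: Callable[[], Any],
                 get_source_version: Callable[[], int],
                 max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._get_source_version = get_source_version
        self._max_age = max_age_seconds
        self._clock = clock
        self._value: Any = None
        self._version: Optional[int] = None
        self._loaded_at: Optional[float] = None  # None means "no valid copy"

    def invalidate(self) -> None:
        """Factor 3: an external trigger discards the copy."""
        self._loaded_at = None

    def _is_valid(self) -> bool:
        if self._loaded_at is None:
            return False
        if self._clock() - self._loaded_at > self._max_age:
            return False  # factor 1: the copy is too old
        return self._version == self._get_source_version()  # factor 2

    def get(self) -> Any:
        if not self._is_valid():
            self._version = self._get_source_version()
            self._value = self._load()
            self._loaded_at = self._clock()
        return self._value


source = {"version": 1, "data": ["Ada"]}
now = [0.0]   # fake clock, in seconds
loads = []    # one entry per reload from the source


def load() -> list:
    loads.append(1)
    return list(source["data"])


cache = CachedValue(load, lambda: source["version"], 60, clock=lambda: now[0])
cache.get()
cache.get()             # served from the copy, no reload
now[0] = 61.0
cache.get()             # older than 60 seconds: reload
source["version"] = 2
cache.get()             # source changed: reload
cache.invalidate()
cache.get()             # external trigger: reload
print(len(loads))  # 4
```

Checking the source version on every read has its own cost, so this only pays off when that check is much cheaper than loading the asset.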


Every one of these factors would have its impact on your system design and implementation.

That’s why it would be wise to properly decide on these first before jumping into implementation.


 

Photo by Kelly Sikkema on Unsplash, modified by Ahmed Tarek

Design and Best Practices


To come up with a good design for a system using Caching, we need to take it step by step. This would help us spot the weaknesses and work on enhancing them.


 

Single Point of Access


Before implementing Caching into your system, there are some important preparations to apply.


So, if your system is implemented this way:


[Diagram: every service accesses the data source directly.]

Then you would have a lot of problems implementing Caching as you don’t have a single point of control over the flow of data.


Therefore, in this case, it would be better to change the design to something like this:


[Diagram: all services access the data source through a single Repository module.]

This way, you have only one module which is responsible for accessing the data source, whatever it is, and which owns the knowledge of how to deal with it.


Also, this way, you are sure that no action can be taken on the data source that would go under your radar.


 

Consistent VS Inconsistent


Now that we have introduced our Repository layer, we are ready to add Caching.


Some people might decide to update the design to be something like this:


[Diagram: the services talk to a Cache module directly.]

However, this is not good because now our services, where the business logic resides, are aware of the Cache module and how to deal with it.


Some other people might decide to update the design to be something like this:


[Diagram: the services talk to both the Cache module and the Repository module.]

This is even worse. Now our services are exposed to both the Caching and Repository modules, and the logic of deciding when to use the cache and when to use fresh data from the data source would be duplicated across the different services.


That’s why the design should be something like this:


[Diagram: the services depend only on IRepository, which is implemented by a module that combines a direct repository and a cache repository.]

So, we would have the following:

  1. A defined interface for our business-related commands and queries called IRepository.

  2. A defined interface for actions related to Caching operations called ICache.

  3. A module implementing IRepository to act as the direct repository implementation. It could be implemented specifically for a SQL database or an XML file or whatever data source. Let’s call it SQLRepository for now, as in the pseudocode later in this article.

  4. A module implementing IRepository and ICache to act as the cache of our system. It should be implemented specifically for the caching media that our system will use. Let’s call it CacheRepository for now.

  5. A module implementing both IRepository and ICache that would be internally composed of SQLRepository and CacheRepository and delegates the calls properly to them. However, it would also be responsible for checking if the data should be returned from the cache or the data source. If it is the data source, then it would also update the cache. Let’s call it Repository.

  6. One more responsibility of this module is to handle the External Cache Triggers if they exist.

  7. Then, the rest of the system services should use this module only through the IRepository interface, and should use neither the SQLRepository nor the CacheRepository directly.


Following this design would make sure:

  1. Other system services only know about IRepository.

  2. All External Cache Triggers only know about ICache.

  3. There is a consistent way of executing business-related queries and commands.

  4. Knowledge about Caching is not exposed except to the appropriate modules.

  5. There is flexibility in handling different Cache invalidation factors, even the ones depending on business rules and actions.
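The separation promised by this list can be sketched in Python with typing.Protocol. This is an illustration under assumptions, not the implementation: count_employees, on_bulk_import_finished, and FakeCachedRepository are hypothetical names, and ICache is reduced to its invalidation operation to keep the example short.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Employee:
    name: str


class IRepository(Protocol):
    def add_employee(self, employee: Employee) -> None: ...
    def get_all_employees(self) -> list[Employee]: ...


class ICache(Protocol):
    def invalidate_cache(self) -> None: ...


def count_employees(repository: IRepository) -> int:
    """Business service: sees queries and commands, nothing about caching."""
    return len(repository.get_all_employees())


def on_bulk_import_finished(cache: ICache) -> None:
    """External cache trigger: can invalidate, knows nothing about employees."""
    cache.invalidate_cache()


class FakeCachedRepository:
    """Toy module satisfying both interfaces, only to exercise the sketch."""

    def __init__(self) -> None:
        self._employees: list[Employee] = []
        self.invalidations = 0

    def add_employee(self, employee: Employee) -> None:
        self._employees.append(employee)

    def get_all_employees(self) -> list[Employee]:
        return list(self._employees)

    def invalidate_cache(self) -> None:
        self.invalidations += 1


module = FakeCachedRepository()
module.add_employee(Employee("Ada"))
print(count_employees(module))   # 1
on_bulk_import_finished(module)
print(module.invalidations)      # 1
```

The same object is passed to both functions, but each function is typed against only the interface it needs, which is what keeps Caching knowledge out of the business services.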


 

Eager VS Lazy


As we said before, Cache invalidation could be triggered due to different factors. But, whatever the factor is, there is still one matter to think about.


After invalidating the Cache, when exactly should we refresh it with fresh data?

❓ Should we do it just after the invalidation? ▶ Eager

❓ Or should we do it whenever the data is being retrieved? ▶ Lazy


Personally, I don’t have any preference here although I think most probably using the Lazy approach should be fine.


From my point of view, it depends on two things:

  1. How often would the cache be invalidated?

  2. How critical is the performance of queries?


Additionally, it would always be wise to evaluate this matter depending on your system requirements and real behavior.


Note: I came across some systems where the cache is updated once a day at a certain time. This is handled by a separate service which runs on a scheduler service. This solution would not be appropriate for all systems but for some systems it could be fine.
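The difference between the two approaches is mostly about when the loading cost is paid. Here is a minimal Python sketch of both, where RefreshingCache and the loader functions are hypothetical names made up for the illustration.

```python
from typing import Callable, Optional


class RefreshingCache:
    """Caches a list and refreshes it eagerly or lazily after invalidation."""

    def __init__(self, load: Callable[[], list], eager: bool) -> None:
        self._load = load
        self._eager = eager
        self._value: Optional[list] = None

    def invalidate(self) -> None:
        self._value = None
        if self._eager:
            self._value = self._load()  # eager: pay the loading cost now

    def get(self) -> list:
        if self._value is None:
            self._value = self._load()  # lazy: pay it on the next read
        return self._value


calls = {"eager": 0, "lazy": 0}


def make_loader(name: str) -> Callable[[], list]:
    def load() -> list:
        calls[name] += 1
        return ["Ada", "Linus"]
    return load


eager = RefreshingCache(make_loader("eager"), eager=True)
lazy = RefreshingCache(make_loader("lazy"), eager=False)

eager.invalidate()
lazy.invalidate()
print(calls)  # {'eager': 1, 'lazy': 0}

lazy.get()
print(calls)  # {'eager': 1, 'lazy': 1}
```

With the Eager approach, frequent invalidations cause loads that nobody may ever read. With the Lazy approach, the first reader after an invalidation waits for the load.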


 

Pseudocode


In this section, I will provide a basic implementation of the design discussed above. It is presented in pseudocode to focus on the main idea rather than on knowledge of a particular programming language.


Also, please note that some best practices are dropped for the same reason.



IRepository

interface IRepository
{
	AddEmployee(Employee employee): void
	GetAllEmployees(): List<Employee>
}


ICache

interface ICache
{
	LastUpdatedAt: DateTime
	InvalidateCache(): void
}


SQLRepository

SQLRepository implements IRepository
{
	AddEmployee(Employee employee): void
	{
		// add employee into the database.
	}

	GetAllEmployees(): List<Employee>
	{
		// get employees from database.
	}
}


CacheRepository

CacheRepository implements IRepository and ICache
{
	LastUpdatedAt: DateTime

	AddEmployee(Employee employee): void
	{
		// add employee into the cache storage media.
		// update LastUpdatedAt to current date and time.
	}

	GetAllEmployees(): List<Employee>
	{
		// get employees from the cache storage media.
	}

	InvalidateCache(): void
	{
		// delete all employees from the cache storage media.
		// update LastUpdatedAt to minimum value of date and time.
	}
}


Repository

Repository implements IRepository and ICache
{
	private IRepository m_SqlRepository;
	// Typed as CacheRepository because both its IRepository and ICache members are used.
	private CacheRepository m_CacheRepository;

	Repository(IRepository sqlRepository, CacheRepository cacheRepository)
	{
		m_SqlRepository = sqlRepository;
		m_CacheRepository = cacheRepository;
	}

	LastUpdatedAt: DateTime => m_CacheRepository.LastUpdatedAt

	AddEmployee(Employee employee): void
	{
		m_SqlRepository.AddEmployee(employee);

		// Write to the cache only while it is fresh. Otherwise the cache would hold
		// a partial list while LastUpdatedAt marks it as up to date.
		if(IsCacheFresh())
		{
			m_CacheRepository.AddEmployee(employee);
		}
	}

	GetAllEmployees(): List<Employee>
	{
		return SelectRepository().GetAllEmployees();
	}

	InvalidateCache(): void
	{
		m_CacheRepository.InvalidateCache();
	}

	private SelectRepository(): IRepository
	{
		// Use the cache only while it is at most 15 minutes old.
		if(IsCacheFresh())
		{
			return m_CacheRepository;
		}

		return m_SqlRepository;
	}

	private IsCacheFresh(): bool
	{
		return (DateTimeNow - m_CacheRepository.LastUpdatedAt) <= Minutes(15);
	}
}

As you can see, this pseudocode is missing some details, like how to fill in the cache when initializing the Repository module and when the cache is outdated. However, all of these are implementation details that would vary from one system to another.


Therefore, I don’t recommend that you just take the code to your production environment. The only purpose of this code is to open your mind to some basic ideas.
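For readers who prefer something they can run, here is one possible Python translation of the pseudocode. It rests on assumptions that are not part of the pseudocode: a list stands in for the SQL database, the cache is refilled lazily through a replace_all method added for this sketch, new employees are written to the cache only while it is fresh, and the clock is injected so that the 15-minute rule can be shown without waiting. It is a sketch for experimenting, not production code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class Employee:
    name: str


class SqlRepository:
    """Stand-in for the database-backed repository; a list plays the database."""

    def __init__(self) -> None:
        self._rows: list[Employee] = []
        self.reads = 0  # counts how often the "database" is queried

    def add_employee(self, employee: Employee) -> None:
        self._rows.append(employee)

    def get_all_employees(self) -> list[Employee]:
        self.reads += 1
        return list(self._rows)


class CacheRepository:
    """In-memory cache that plays both the repository and the cache roles."""

    def __init__(self, clock: Callable[[], datetime]) -> None:
        self._clock = clock
        self._employees: list[Employee] = []
        self.last_updated_at = datetime.min

    def add_employee(self, employee: Employee) -> None:
        self._employees.append(employee)
        self.last_updated_at = self._clock()

    def get_all_employees(self) -> list[Employee]:
        return list(self._employees)

    def replace_all(self, employees: list[Employee]) -> None:
        # Assumption: this refill method does not exist in the pseudocode.
        self._employees = list(employees)
        self.last_updated_at = self._clock()

    def invalidate_cache(self) -> None:
        self._employees.clear()
        self.last_updated_at = datetime.min


class Repository:
    """Composite that decides between the cache and the data source."""

    def __init__(self, sql: SqlRepository, cache: CacheRepository,
                 clock: Callable[[], datetime],
                 max_age: timedelta = timedelta(minutes=15)) -> None:
        self._sql = sql
        self._cache = cache
        self._clock = clock
        self._max_age = max_age

    def _cache_is_fresh(self) -> bool:
        return self._clock() - self._cache.last_updated_at <= self._max_age

    def add_employee(self, employee: Employee) -> None:
        self._sql.add_employee(employee)
        if self._cache_is_fresh():  # never mark a partial cache as fresh
            self._cache.add_employee(employee)

    def get_all_employees(self) -> list[Employee]:
        if not self._cache_is_fresh():  # lazy refill from the data source
            self._cache.replace_all(self._sql.get_all_employees())
        return self._cache.get_all_employees()

    def invalidate_cache(self) -> None:
        self._cache.invalidate_cache()


now = [datetime(2024, 1, 1, 9, 0)]  # mutable fake clock for the demo
clock = lambda: now[0]
sql = SqlRepository()
repo = Repository(sql, CacheRepository(clock), clock)

repo.add_employee(Employee("Ada"))
repo.get_all_employees()            # cache empty: read from the data source
repo.get_all_employees()            # cache fresh: no data source access
now[0] += timedelta(minutes=16)
repo.get_all_employees()            # cache older than 15 minutes: refill
repo.invalidate_cache()
repo.get_all_employees()            # invalidated: refill
print(sql.reads)  # 3
```

Five reads reached the data source only three times. Whether a 15-minute window and a lazy refill are acceptable depends on the invalidation factors you chose for your own system.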


 

Photo by Pietro Rampazzo on Unsplash, modified by Ahmed Tarek

Final Thoughts


In this article we went through some basic definitions and aspects of Caching in Software Systems.


We also went through a design of Caching and some best practices which I think would be very helpful. Is this all?


No, we just scratched the surface. You need to read more about Caching and its techniques, check different designs, understand them, analyze them, spot the differences, advantages, disadvantages, and finally adapt them to your own needs.


That’s it. I hope you found reading this article as interesting as I found writing it.


