
Software systems and computational methods
Reference:

The Efficiency of Distributed Caching Platforms in Modern Backend Architectures: A Comparative Analysis of Redis and Hazelcast

Zolotukhina Dar'ya

independent researcher

394062, Russia, Voronezh region, Voronezh, lane Antokolsky, 4

dar.zolott@gmail.com

DOI:

10.7256/2454-0714.2024.4.72305

EDN:

JNJVQQ

Received:

11-11-2024


Published:

05-01-2025


Abstract: The object of this study is two caching and distributed data storage systems — Redis and Hazelcast — which are widely used to accelerate data access in high-load applications. This article presents a comprehensive comparative analysis of these systems based on key aspects important for efficient caching: architectural features, memory management models, clustering approaches, fault tolerance mechanisms, and scalability. Special attention is given to investigating caching patterns and support for SQL-like queries. The aim of the work is to provide an in-depth analysis of the advantages and limitations of Redis and Hazelcast in the context of data caching, as well as to identify their strengths and weaknesses under different loads and usage scenarios. The methodology of the research includes a comparative analysis of Redis and Hazelcast based on key aspects, with results presented in the form of a comparative table. Performance testing of CRUD operations was also conducted using automated tests integrated into a Spring Boot application. The study shows that Redis, being a single-threaded system with fast read and write operations, is more efficient for simple, localized applications, while Hazelcast, which supports multi-threading and dynamic clustering, handles large data volumes and distributed tasks more effectively. The author's contribution to the research is a comprehensive comparative analysis of these systems, considering key characteristics such as performance, scalability, and fault tolerance, along with testing their performance in real-world scenarios. The novelty of the research lies in the detailed examination of Redis and Hazelcast for data caching in high-load applications, which will be valuable for the development and optimization of the infrastructure of high-performance distributed systems that require real-time data caching.


Keywords:

caching, Redis, Hazelcast, performance, CRUD operations, fault tolerance, multithreading, clustering, in-memory data storage, distributed system

This article is automatically translated.

Introduction

As data volumes and system performance requirements increase, caching becomes a key mechanism that allows faster access to data, reducing the load on the main databases. In today's world of distributed systems, it is important to have a reliable and productive caching solution that meets the needs of both small and large applications. Among the many available caching solutions, two market leaders — Redis and Hazelcast — offer a variety of capabilities for processing large amounts of data in RAM, but they differ in architecture and approaches to distributed data management.

Review of Redis and Hazelcast

Redis is one of the leading in-memory caching systems, which allows you to significantly speed up access to data by storing it in RAM. This solution has become the standard for many applications that require high speed and performance, as Redis provides low latency and high throughput. Due to its characteristics, Redis is used in a variety of tasks, from storing user sessions to caching the results of complex calculations and temporary data requiring instant access [1].

The popularity of Redis is due to its versatility and flexibility in configuration, which makes it suitable for high-load systems that need to quickly process a huge number of requests in real time. Unlike traditional storage systems, Redis is focused on instant access to information, which makes it ideal for caching, where it is required to maintain high performance and data processing speed. Combined with support for fault tolerance and scaling capabilities, Redis continues to be one of the most sought-after caching solutions for modern distributed systems.

Hazelcast is a powerful distributed data caching and management system that focuses on providing high performance and flexibility in large distributed applications. As an in-memory system, Hazelcast is designed to speed up access to data by storing it in RAM, which is especially important for applications requiring fast data processing and low latency. Unlike many traditional solutions, Hazelcast supports a multithreaded architecture and is focused on working in clusters, which allows it to provide high scalability and reliability under high load conditions [2].

Thanks to the ability to automatically manage clusters and distribute data between nodes, Hazelcast is actively used for caching and supporting temporary data storage in large systems where load balancing and minimizing response time are required. Hazelcast also features built-in support for fault tolerance and high availability, making it attractive for mission-critical applications. These qualities allow Hazelcast to confidently occupy a place among the leaders of caching and in-memory storage solutions in modern corporate systems, providing both flexibility in configuration and reliability in operation.

Comparative analysis

In this article, Redis and Hazelcast are compared across seven key categories:

1. Multithreading.

2. Caching patterns.

3. Clustering.

4. Query construction.

5. Memory.

6. Calculations.

7. Fault tolerance.

Redis is a single-threaded system built around a high-performance core with minimal memory consumption. This approach allows multiple Redis instances to run on a single machine, distributing the load across CPU cores and making fuller use of system resources. The single-threaded model keeps the architecture simple and reduces the likelihood of cluster desynchronization, that is, situations where isolated parts of the cluster operate independently, which can lead to data diverging between nodes during write operations [3]. This is especially important for ensuring data integrity.

In contrast, Hazelcast uses a multithreaded architecture with a thread pool for I/O operations. On each node of the cluster, I/O operations are distributed among three types of threads: threads that accept incoming requests, threads that read data from other nodes and clients, and threads that write data. Thanks to this, Hazelcast can efficiently handle high I/O load, scaling according to need. However, the multithreaded model is less protected from cluster desynchronization, which can make it harder to ensure integrity in a distributed cluster [4].

The key difference between Hazelcast and Redis as caching solutions is the flexibility of their approaches: Redis is strictly focused on a single caching pattern, whereas Hazelcast supports several. When Redis is used to cache data held in another store (for example, a database), the cache-aside pattern must be used, which implies additional network hops each time the application has to go to that store.

Figure 1. Cache-aside pattern
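The cache-aside flow can be illustrated with a minimal sketch. It is not taken from the article's test program: plain in-memory maps stand in for Redis and for the database, and the class and method names (`CacheAsideSketch`, `seedDatabase`, `databaseReads`) are hypothetical. The point is only that the application code itself decides when to go to the store and when to populate or invalidate the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Cache-aside sketch: the application, not the cache, talks to the backing store.
// Plain maps stand in for Redis and the database; all names are hypothetical.
class CacheAsideSketch {
    private final Map<String, String> cache = new HashMap<>();    // stand-in for Redis
    private final Map<String, String> database = new HashMap<>(); // stand-in for the primary store
    int databaseReads = 0;                                         // counts extra hops to the store

    void seedDatabase(String key, String value) {
        database.put(key, value);
    }

    Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);      // cache hit: no trip to the store
        }
        databaseReads++;                      // cache miss: extra hop to the store
        String loaded = database.get(key);
        if (loaded != null) {
            cache.put(key, loaded);           // the application populates the cache itself
        }
        return Optional.ofNullable(loaded);
    }

    void update(String key, String value) {
        database.put(key, value);             // write to the store first
        cache.remove(key);                    // then invalidate the stale cache entry
    }

    public static void main(String[] args) {
        CacheAsideSketch app = new CacheAsideSketch();
        app.seedDatabase("user:1", "Alice");
        app.get("user:1");                    // miss: loads from the store
        app.get("user:1");                    // hit: served from the cache
        System.out.println("database reads: " + app.databaseReads);
    }
}
```

Invalidating on update (rather than rewriting the cached value) is one common choice among several; it is an assumption of this sketch, not something the article prescribes.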

Hazelcast supports the cache-aside pattern as well as the read-through and write-through patterns, in which the caching layer itself interacts with the database, reading from it when an object is missing from the cache and writing to it when data changes.

Figure 2. Read/write-through pattern

These mechanisms free the developer from the need to manually synchronize the state between the cache and the main database, as required in Redis, where it is also necessary to implement logic for updating and reading data from storage. In Hazelcast, on the contrary, all the logic of interaction with the database is implemented at the level of the caching layer, which allows you to write cleaner, more structured and maintainable code.
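The read-through and write-through behaviour described above can be sketched in plain Java as follows. Hazelcast exposes hooks of this kind through its MapLoader and MapStore interfaces, but the sketch deliberately does not use them: `BackingStore`, `ThroughCache`, and `InMemoryStore` are hypothetical names introduced only for illustration, and an in-memory map stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;

// Read-through / write-through sketch: the caching layer owns all store access.
// All types here are hypothetical stand-ins, not Hazelcast classes.
class ReadWriteThroughSketch {

    // Persistence hook that the cache calls on the application's behalf.
    interface BackingStore<K, V> {
        V load(K key);
        void store(K key, V value);
    }

    static class ThroughCache<K, V> {
        private final Map<K, V> entries = new HashMap<>();
        private final BackingStore<K, V> store;

        ThroughCache(BackingStore<K, V> store) {
            this.store = store;
        }

        V get(K key) {
            V value = entries.get(key);
            if (value == null) {
                value = store.load(key);          // read-through on a miss
                if (value != null) {
                    entries.put(key, value);
                }
            }
            return value;
        }

        void put(K key, V value) {
            store.store(key, value);              // write-through: persist synchronously
            entries.put(key, value);              // then update the cached copy
        }
    }

    // In-memory stand-in for a database table.
    static class InMemoryStore implements BackingStore<String, String> {
        final Map<String, String> rows = new HashMap<>();
        int loads = 0;

        public String load(String key) {
            loads++;
            return rows.get(key);
        }

        public void store(String key, String value) {
            rows.put(key, value);
        }
    }

    public static void main(String[] args) {
        InMemoryStore db = new InMemoryStore();
        db.rows.put("k1", "v1");
        ThroughCache<String, String> cache = new ThroughCache<>(db);
        System.out.println(cache.get("k1"));      // loaded through the cache
        cache.put("k2", "v2");                    // persisted through the cache
        System.out.println(db.rows.get("k2"));
    }
}
```

Compared with cache-aside, the calling code only ever sees `get` and `put`; the store access lives entirely inside the caching layer.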

Redis and Hazelcast also have differences in the implementation of clustering, a mechanism for combining multiple servers or nodes into a single system, which allows for high availability, fault tolerance, and scalability of applications. In Redis, clustering is implemented through sharding [5], which allows you to distribute data between multiple nodes. In this case, each node stores only a part of the data, and Redis clusters automatically determine which node is responsible for a particular key. However, the process of adding or removing nodes requires manual redistribution of shards, which increases the complexity of management. In addition, when a node fails, developers need to intervene to restore data and functionality, and operations with data located on other nodes involve additional network costs.

On the other hand, Hazelcast was originally designed as a distributed system and offers more advanced clustering features. It provides automatic management of data distribution and dynamically redistributes it when nodes are added or removed, which greatly simplifies cluster administration. When a node fails, Hazelcast automatically restores data and redistributes the load, which helps to increase the system's fault tolerance. Moreover, nodes in Hazelcast can communicate with each other without the need for additional network calls, which reduces delays and improves performance.

Thus, the key differences between Redis and Hazelcast in the context of clustering are that Redis requires manual management of sharding and recovery, whereas Hazelcast automates these processes, providing higher fault tolerance and efficiency of inter-node communication. Redis may be more suitable for simple scenarios, while Hazelcast is preferable for distributed systems requiring high availability and automatic management.
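To make the sharding mechanism on the Redis side more concrete: Redis Cluster maps every key to one of 16384 hash slots by taking the CRC16 of the key modulo 16384, and each node serves a subset of those slots. The sketch below reproduces that mapping in plain Java. It ignores hash tags (the `{...}` key syntax), and the even split of slots across nodes in `nodeFor` is a simplifying assumption; in a real cluster, slot ranges are assigned to nodes explicitly.

```java
import java.nio.charset.StandardCharsets;

// Key-to-slot mapping in the style of Redis Cluster: CRC16(key) mod 16384.
// Hash tags are ignored and the slot-to-node split is a hypothetical even split.
class HashSlotSketch {
    static final int SLOT_COUNT = 16384;

    // CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;
                } else {
                    crc <<= 1;
                }
                crc &= 0xFFFF;                // keep the register at 16 bits
            }
        }
        return crc;
    }

    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    // Assumption: slots are split into contiguous, equally sized ranges.
    static int nodeFor(String key, int nodeCount) {
        int slotsPerNode = (SLOT_COUNT + nodeCount - 1) / nodeCount;
        return slotFor(key) / slotsPerNode;
    }

    public static void main(String[] args) {
        String key = "user:42";
        System.out.println(key + " -> slot " + slotFor(key) + ", node " + nodeFor(key, 3));
    }
}
```

Because the slot depends only on the key, adding or removing a node means moving slot ranges (and the keys in them) between nodes, which is the redistribution step discussed above.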

Queries in Redis and Hazelcast, despite the common use of the key/value model, have significant differences in approaches to extracting data by value properties when the key is unknown. Hazelcast provides predicate APIs, SQL-like WHERE expressions, as well as projection mechanisms, which allows you to perform queries against complex objects and JSON structures. These features make Hazelcast more convenient for performing complex data sampling and analysis, while maintaining the flexibility of working with objects of various structures.

In Redis, a similar approach requires a different storage architecture: data is usually represented as a hash, where each row is stored with a key generated based on the primary key of the table, and additionally included in a Set or Sorted Set. This storage method allows you to implement limited query capabilities, but requires additional design steps and may be less effective for complex attribute queries, since Redis does not support full-fledged predicates and SQL-like queries.
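The difference between the two query styles can be sketched without either product. In the example below, plain Java collections stand in for Redis hashes and sets, and a `java.util.function.Predicate` stands in for a predicate-style query; the key naming scheme (`person:<id>`, `idx:city:<value>`) and all class and method names are hypothetical. The first lookup relies on an index set that the application must maintain on every insert, while the second simply evaluates a condition against the stored values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

// Two ways to find rows by attribute value; plain collections stand in for the stores.
class AttributeQuerySketch {
    // Hash-per-row layout: row key -> field map (sorted for deterministic output).
    final Map<String, Map<String, String>> hashes = new TreeMap<>();
    // Manually maintained secondary index: index key -> set of row keys.
    final Map<String, Set<String>> indexSets = new HashMap<>();

    void insert(String id, String name, String city) {
        String rowKey = "person:" + id;
        Map<String, String> fields = new HashMap<>();
        fields.put("name", name);
        fields.put("city", city);
        hashes.put(rowKey, fields);
        // The application must remember to update the index on every write.
        indexSets.computeIfAbsent("idx:city:" + city, k -> new TreeSet<>()).add(rowKey);
    }

    // Index-based lookup: read the index set, then fetch each row by key.
    List<String> namesByCityViaIndex(String city) {
        List<String> names = new ArrayList<>();
        Set<String> rowKeys = indexSets.getOrDefault("idx:city:" + city, Collections.emptySet());
        for (String rowKey : rowKeys) {
            names.add(hashes.get(rowKey).get("name"));
        }
        return names;
    }

    // Predicate-style lookup: evaluate a condition against every stored value.
    List<String> namesWhere(Predicate<Map<String, String>> condition) {
        List<String> names = new ArrayList<>();
        for (Map<String, String> fields : hashes.values()) {
            if (condition.test(fields)) {
                names.add(fields.get("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        AttributeQuerySketch store = new AttributeQuerySketch();
        store.insert("1", "Ann", "Voronezh");
        store.insert("2", "Boris", "Moscow");
        store.insert("3", "Vera", "Voronezh");
        System.out.println(store.namesByCityViaIndex("Voronezh"));
        System.out.println(store.namesWhere(row -> "Voronezh".equals(row.get("city"))));
    }
}
```

The index-based variant answers only the questions an index was built for; a new attribute query requires a new index and a change to the write path, which is the additional design effort referred to above.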

When comparing Hazelcast and Redis in terms of working with memory, several key aspects can be identified that reflect their different architectural approaches and data management strategies.

Redis is focused on storing all data in RAM, which provides very fast access to it. Its data structures are optimized to minimize memory usage, and it supports various types such as strings, hashes, lists, and sets. Redis uses compression mechanisms and efficient memory allocators, which saves resources and increases throughput when working with large amounts of data. However, this approach implies that data exceeding the available RAM requires the use of disk storage, which can reduce performance.

On the other hand, Hazelcast runs in a Java environment and uses heap memory managed by the garbage collector. Data can be stored both in RAM and in more durable storage such as disks, which provides more flexibility. However, garbage collection in Hazelcast can cause pauses in applications, especially as the amount of data grows [6]. This can lead to delays and reduced performance, which is critical for highly loaded systems.

Another important aspect is the distributed nature of data storage. Redis usually runs as a single node; it also supports clustering, which divides data between multiple instances, but memory management then becomes more complex. Hazelcast, designed as a distributed system from the start, distributes data across nodes automatically, which allows for better memory usage and increased fault tolerance and lets it cope with growing data volumes by spreading the load among the cluster members.

Redis provides maximum data access speed in memory, while Hazelcast offers greater flexibility and scalability due to its distributed architecture, but it may encounter problems related to garbage collection and memory management.

Redis and Hazelcast both allow computing functions to be performed directly on a node or cluster member where a specific set of data is stored, which is especially useful in caching scenarios that require local access to cached data. For example, in Hazelcast, the program can be directed to the node where a specific key is located, which minimizes network delays and speeds up access to data already stored in the cache. Redis also implements similar functionality, which allows you to efficiently process cached data, avoiding additional calls to remote nodes.

However, differences in distributed computing approaches between Redis and Hazelcast also affect their caching capabilities. Hazelcast provides a Java program running on one of the nodes with access to data on other nodes in the cluster, which allows the distributed cache to be used more flexibly and efficiently. This is especially useful when caching data that is frequently accessed from different parts of the application, as Hazelcast supports caching scenarios with node-to-node communication and dynamic data allocation. In Redis, clusters and the Lua script engine do not support access to cached data on other nodes, limiting caching to the local area of the node and reducing the possibilities of flexible cache management.

Thus, Hazelcast provides a more powerful infrastructure for distributed computing and caching, allowing the cache to be used on multiple cluster nodes as a single memory, which is optimal for applications with intensive node-to-node interaction and distributed tasks. Redis, on the contrary, is better suited for tasks where caching and code execution are focused on a specific node without the need for inter-node communication, which makes it effective for localized caching and data processing within a single node.

Fault tolerance is an important aspect in the design of distributed systems, ensuring the continuity of applications even in the event of a failure of individual components [7]. In the context of Redis, fault tolerance is achieved through replication mechanisms and the use of Sentinel to monitor the health of nodes. In this architecture, the primary server (master) can have one or more replicas that are periodically updated to synchronize data. In case of failure of the main node, the replica can be upgraded to the master status. However, this process requires manual intervention or additional configuration of Sentinel, which can cause temporary downtime.

Hazelcast was originally developed with a focus on automating fault tolerance processes. Each node in the cluster stores replicas of data [8], which allows the system to respond effectively to failures. When a node fails, Hazelcast automatically redirects requests to other nodes containing up-to-date copies of data, which minimizes downtime and increases system availability. Moreover, the automatic data recovery and redistribution mechanism avoids the need for manual management and reduces the risks associated with data loss.

Thus, the key difference in fault tolerance approaches between Redis and Hazelcast is that Redis requires significant effort on the part of developers to maintain system performance, whereas Hazelcast offers a more reliable and automated solution. This makes Hazelcast a more preferable option for applications that require high availability and reliability in the face of failure of individual components.

The comparison results are summarized in Table 1.

Table 1. Comparison of Redis and Hazelcast

| Aspect | Redis | Hazelcast |
| --- | --- | --- |
| Threading | Single-threaded | Multithreaded |
| Caching patterns | Cache-aside | Cache-aside, read-through, write-through |
| Type of clustering | Manual sharding | Automatic management and redistribution |
| SQL-like queries | Not supported | Supported |
| Memory | Fast data access, but limited by the amount of RAM | Potential delays due to garbage collection, but able to process large amounts of data |
| Calculations | Limited to working with data on a single node | Distributed computing with automatic node load balancing |
| Fault tolerance | Failure requires manual intervention and configuration | On failure, requests are redirected automatically |

Redis and Hazelcast performance testing for CRUD operations

Initial data

The test program is written in Java using Spring Boot. The tests run on a data set stored in a CSV file. It contains over 250,000 unique records, each made up of fields of different data types (strings, integers, long integers, and floating-point numbers).

The Java client used for Redis is Jedis, a Java client library for interacting with Redis that provides a convenient, high-performance API for performing data operations. Jedis supports basic Redis commands as well as advanced features [9].

For Hazelcast, the Hazelcast Predicates API is used. It provides an interface for executing queries with specified conditions against distributed data structures such as IMap. It can be used to filter and find data based on complex conditions similar to SQL queries, which provides flexibility and high performance when working with distributed data [10].

The set of tests performed included:

  • Create: The source data is loaded and inserted into a Redis hash (via HSET) and a Hazelcast IMap. The test runs for 1 hour. During this time, data is read from the CSV file and added to the store until the test time expires.
  • Read: 5 types of queries are executed with different parameters, which are randomly selected from a predefined set. A set of queries is prepared in advance and executed in turn for 1 hour. The number of requests executed per second is recorded.
  • Update: Similar to the reading test, but with a different set of queries. The execution time is 1 hour. The result is the number of updated rows per second.
  • Delete: The test is similar to updating, it runs for 1 hour, and the number of deleted rows per second is recorded.
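The article does not show its test harness, so the following is only a hypothetical sketch of how a fixed-duration throughput measurement of this kind can be structured: an operation is repeated until the time budget expires, and the number of completed operations is divided by the elapsed time. The class name `ThroughputSketch`, the `measure` signature, and the injectable clock (added so that the logic can be checked with a simulated clock instead of waiting an hour) are all assumptions.

```java
import java.util.function.LongSupplier;

// Fixed-duration throughput measurement: repeat an operation until time runs out.
// Hypothetical sketch; it is not the harness used in the article.
class ThroughputSketch {

    // Returns completed operations per second over the measured interval.
    // nanoClock must return a monotonically non-decreasing time in nanoseconds.
    static double measure(Runnable operation, long durationNanos, LongSupplier nanoClock) {
        if (durationNanos <= 0) {
            throw new IllegalArgumentException("durationNanos must be positive");
        }
        long start = nanoClock.getAsLong();
        long completed = 0;
        while (nanoClock.getAsLong() - start < durationNanos) {
            operation.run();                  // e.g. one insert, read, update, or delete
            completed++;
        }
        long elapsedNanos = nanoClock.getAsLong() - start;
        return completed / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        // Measure a trivial stand-in operation for 50 ms using the system clock.
        double opsPerSecond = measure(() -> Math.sqrt(42.0), 50_000_000L, System::nanoTime);
        System.out.println("operations per second: " + opsPerSecond);
    }
}
```

In a real run, the `Runnable` would wrap a single client call, and the duration would be the one-hour window used in the tests above; warm-up and client-side batching, which can strongly affect such numbers, are outside the scope of this sketch.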

Hardware configuration

The tests were carried out on a machine with the following characteristics:

  • Processor: Apple M2 @ 3.49 GHz, 8 cores, 8 threads.
  • RAM: 16 GB.
  • Storage: 256GB PCIe® NVMe™ M.2 SSD.

Performance testing results

Create: Redis and Hazelcast showed almost similar results when inserting data, processing the same number of requests per second. Redis showed an average insertion of 1040 records per second, Hazelcast – 10020 records.

Figure 3. Result of the Create operation

Read: Redis showed higher performance when reading data. This is attributed to the higher efficiency of the Redis client library (Jedis), which made it possible to process requests faster. Redis read an average of 14 records per second, while Hazelcast read 4. However, Redis achieves its high read speed by keeping all data in memory, which makes it more demanding on RAM: Redis needed almost twice as much memory as Hazelcast to store the same amount of data (about 1300 bytes per record for Redis versus about 650 bytes for Hazelcast).

Figure 4. The result of the Read operation

Figure 5. Memory consumption

Update and Delete: Both solutions showed comparable performance when testing data update and deletion operations, ensuring minimal delays and high execution speed. Redis updated an average of 1203 records per second, Hazelcast – 1066 per second. When deleting, Redis showed an average of 1,380 records per second, while Hazelcast showed 1,275 records per second.

Figure 6. The result of the Update operation

Figure 7. The result of the Delete operation

Conclusions

In the course of the conducted research, a comparative analysis of two popular caching and distributed data storage systems, Redis and Hazelcast, was carried out. An assessment of their architecture, performance, fault tolerance and scalability mechanisms revealed key features that influence the choice of the optimal solution in the context of highly loaded distributed applications.

Characterized by a single-threaded architecture and high performance in data operations, Redis demonstrates low latency and fast data access, making it suitable for applications focused on fast data in memory. However, its limited flexibility in clustering and fault tolerance may require additional efforts when managing multiple nodes and ensuring data availability in more complex scenarios. Redis is optimal for horizontally scalable systems with simple data distribution requirements.

The Hazelcast system with multithreaded architecture and built-in clustering support provides more flexible and scalable capabilities for building distributed systems. Automatic data distribution and scaling mechanisms significantly simplify administration and increase fault tolerance. Hazelcast handles intensive node-to-node interactions and high fault tolerance requirements more efficiently, making it the preferred solution for large, high-load applications.

Performance testing has shown that Redis surpasses Hazelcast in terms of read speed. Both systems showed similar results when updating and deleting data, but Redis requires more RAM, which should be taken into account when choosing a solution for resource-intensive applications.

Thus, the choice between Redis and Hazelcast should be based on the specifics of the tasks. Redis is more suitable for applications where high performance is critical when working with data in memory, while Hazelcast is a better option for scalable distributed systems requiring high fault tolerance and flexibility.

References
1. Carlson, J. (2013). Redis in Action. New York: Manning.
2. Johns, M. (2015). Getting Started with Hazelcast. Birmingham: Packt Publishing.
3. Billig, V. A. (2016). Parallel computing and multithreaded programming. Moscow: INTUIT.
4. Kadomsky, A. A., & Zaharov, V. A. (2016). Efficiency of multithreaded applications. Scientific journal, 7. Retrieved from https://scientificmagazine.ru/images/PDF/2016/8/Nauchnyj-zhurnal-7-8.pdf
5. Boichenko, A. V., Rogozhin, D. K., & Korneev, D. G. (2014). Dynamic Scaling Algorithm for Relational Databases in Cloud Environments. Statistics and Economics, 6-2. Retrieved from https://statecon.rea.ru/jour/article/view/584/566
6. Filatov, A. U., & Miheev, V. V. (2022). Analysis of the efficiency of stream-local garbage collection in distributed data storage and processing systems. SibSUTI Bulletin, 1. Retrieved from https://vestnik.sibsutis.ru/jour/article/view/122/126
7. Goleva, A. I., Storozhenko, N. R., Potapov, V. I., & Shafeeva, O. P. (2019). Mathematical modeling of fault tolerance of information systems. NSU Bulletin, 4. Retrieved from https://intechngu.elpub.ru/jour/article/view/110/98
8. Borsuk, N. A., & Kozeeva, O. O. (2016). Analysis of Database Replication Methods in Online Service Development. Symbol of Science, 11-3. Retrieved from https://os-russia.com/SBORNIKI/SN-2016-11-3.pdf
9. Intro to Jedis – the Java Redis. Retrieved from https://www.baeldung.com/jedis-java-redis-client-library
10. PredicatesAPI. Retrieved from https://docs.hazelcast.com/hazelcast/5.5/query/predicate-overview

Peer Review


The article presents a comparative analysis of two leading in-memory caching solutions in distributed systems, Redis and Hazelcast. Both tools are used to improve the performance of modern distributed systems by speeding up data access and reducing the load on the main databases. The main attention is paid to the comparative characteristics of architecture, fault tolerance, scalability and performance, which allows us to assess their applicability in the context of specific tasks. The research methodology is based on a comparative analysis of the architectural features of Redis and Hazelcast, as well as on testing the performance of these platforms when performing CRUD operations. Testing was conducted using Java Spring Boot and client libraries such as Jedis and the Hazelcast Predicates API. The test results were analyzed for various operations: creating, reading, updating and deleting data, which allowed us to draw conclusions about performance and resource consumption. The relevance of the study is due to the growth of data volumes and the requirements for the performance of modern distributed systems. Given the need to scale applications and increase the speed of query processing, choosing the optimal caching solution becomes an important stage in system design. Therefore, the analysis of the advantages and limitations of Redis and Hazelcast is timely and in demand. The scientific novelty of the work lies in a comprehensive comparative analysis of Redis and Hazelcast, conducted in the context of their application in modern backend architectures. The article discusses in detail the key aspects affecting performance and fault tolerance, such as multithreading, support for various caching patterns, clustering features and working with memory. Another important contribution is the provision of test results on real data, which supports the conclusions about the practical applicability of the solutions under consideration.
The article is characterized by a well-structured presentation of the material. The introduction provides the necessary context and explains the significance of the study. The main part is divided into thematic blocks, each of which considers a separate aspect of the comparison, which makes it easy to follow the course of the analysis. The test results are presented using visual illustrations and graphs, which contributes to a deeper understanding of the data obtained. The article draws important conclusions about the advantages and disadvantages of Redis and Hazelcast, depending on the tasks facing the developer. Redis demonstrates high performance in data reading operations, but requires more RAM and manual control during clustering. Hazelcast offers more flexible capabilities for distributed systems, including automatic scaling and fault tolerance management. Such results will be of interest to specialists in the field of high-load application development and distributed system architecture, as well as to anyone who is considering performance optimization using caching solutions. The article is a valuable contribution to the study of the effectiveness of distributed caching platforms. The analysis reveals the strengths and weaknesses of Redis and Hazelcast, which helps to make an informed choice when designing the backend architecture of modern applications. The work is written in an academic style, the structure of the presentation is logical and makes it easy to perceive the material. It is recommended for publication, as it contains valuable data and conclusions that can be useful to a wide audience of specialists and researchers in the field of software development.