Reference:
Val'kov V.A., Stolyarov E.P., Korchagin A.A., Ermishin M.V., Yakupov D.O. Comparison of methods for optimizing the speed of reading/writing drives // Software systems and computational methods. 2024. No. 2. P. 73-85. DOI: 10.7256/2454-0714.2024.2.70900 EDN: DXCLJH URL: https://en.nbpublish.com/library_read_article.php?id=70900
Comparison of methods for optimizing the speed of reading/writing drives
DOI: 10.7256/2454-0714.2024.2.70900
EDN: DXCLJH
Received: 26-05-2024
Published: 02-06-2024

Abstract: The objects of this study are data storage devices of various types and levels of complexity, together with the principles of their operation. These are complex technical systems comprising many components and characterized by a high degree of integration. The subject of the research is the main characteristics of hard disk drives and solid-state drives: their structure, functional features, principles of operation, and ways of optimizing them. The purpose of the study is to determine the most effective methods for optimizing the operation of these devices, including aspects such as memory management, load balancing, and power management. The results can be used to improve the efficiency and performance of data storage systems and to create new technologies in this area. The study examines the performance of various disk storage solutions through a series of tests aimed at understanding their speed and its dependence on external factors. The main conclusion is the importance of applying optimization approaches in combination to improve read and write speeds. Optimizing reading and writing is critically important for modern high-performance computing systems, as well as for applications that require quick access to large amounts of information. The techniques used in the study contribute to a significant increase in the performance of storage devices: they account for the specifics of different device types, including hard drives and solid-state drives, and offer optimization approaches tailored to their characteristics. Overall, the results provide valuable insight into the principles of optimizing data storage and can serve as a basis for developing new strategies and solutions in this important area of information technology.

Keywords: Hard drives, Solid-state drives, Optimization, Efficiency, Fragmentation, Reading data, Defragmentation, Interface, Cache buffer, File system

Introduction

For a user of drives such as HDDs or SSDs, it is important to know the basic characteristics of these storage devices in order to choose the right product and increase the performance of the system. To do this, one needs to understand which characteristics affect performance and how they can be optimized. The main drive operations are writes and reads, which divide into sequential and random access. Sequential access means reading data in order, starting from the beginning, while random access allows any part of the drive to be addressed directly, without reading the preceding data. Random file access is one of the key aspects of storage performance, especially for disks, and is among the most time-consuming operations, so its speed is crucial for efficient data management. Random reads, especially small ones, are hard to buffer or prefetch in any way. Sequential operations, by contrast, can be chained together to reach maximum speed, and writes can be cached in memory and reordered before being committed.
Truly random reads have none of these advantages. In this article, we will look at some of the main characteristics that affect the speed of storage devices, as well as ways to optimize them.

Factors affecting the speed of reading and writing data

There are several key factors that affect the speed of reading and writing data on a disk. Let's look at some of them:

1. Drive type: There are different types of drives, such as HDDs (hard disk drives) and SSDs (solid-state drives). SSDs provide faster read and write speeds than HDDs due to the absence of moving parts.

2. Connection interface: There are various disk connection interfaces, such as SATA, NVMe (for SSDs) and others. More modern interfaces, such as NVMe, usually provide higher data transfer rates.

3. Rotation speed (for HDDs): For a hard disk, the rotation speed of the platters affects how quickly data on the disk can be accessed. A higher rotation speed usually means faster data access.

4. Cache buffer size: The on-disk cache buffer is used to temporarily store data before it is written to or read from the disk. A larger cache buffer can speed up reading and writing.

5. File system: The chosen file system also affects the speed of working with data on the disk. Some file systems are more efficient at managing files and providing quick access to them.

6. Fragmentation: Disk fragmentation can slow down reading and writing, since files are stored as scattered fragments on the disk, which requires additional time to gather them when reading.

In addition, physical disks can use Native Command Queuing (NCQ), which allows hard disks to internally optimize the execution order of incoming read and write commands. This reduces unnecessary movement of the drive head, which improves performance (and slightly reduces drive wear) under workloads with multiple simultaneous read/write requests, typical of server applications. NCQ can change performance, but in some cases a poor implementation can make things worse, especially under heavy load.

Optimizing read/write speed

There are many ways to optimize the speed of reading and writing data, and an integrated approach combining several of them can be used. Based on the factors affecting drive performance, several optimization methods can be identified:

Cache usage: Caching data can significantly speed up access to frequently used data. This is especially useful when the same data is accessed repeatedly, since it avoids repeated reads from slow media.

Parallel execution of operations: If there are several independent read/write operations, they can be executed in parallel to reduce the total execution time. This can be achieved with multithreading or asynchronous I/O, and can include both command-line utilities and RAID arrays.

Data striping: Where possible, data can be split up and distributed across multiple media, allowing parallel read/write operations on different devices and increasing overall access speed.

Using faster media: Moving data to faster media such as solid-state drives (SSDs) can significantly reduce data access times.
Materials and tasks

Before applying the optimization methods, the following programs were installed: Defraggler, a program for tracking data fragmentation and performing disk defragmentation, and PrimoCache, a software solution for caching data to speed up storage. The tests were carried out using the following drives:

· HDD WD Blue WD10EZEX
· HDD WD Blue WD10EALX
· SSD WD Blue WDS250G2B0A

To compare the optimization methods, performance tests were performed before and after each method was applied, followed by a comparative analysis. We set ourselves the following tasks:

1) Perform a disk performance analysis. First, we need to see how well the drives perform their functions.
2) Identify the problem areas. We need to determine what problems we are dealing with and sketch an approximate solution plan.
3) Develop a plan for optimizing the operation of the drives. There are many ways to improve performance; we need to identify the most effective ones.
4) Check the effectiveness. Re-run the performance analysis and organize the findings into results.
5) Draw conclusions from the work carried out.
Results

Performance was evaluated using CrystalDiskMark (version 9.0.3), a program for testing disk performance.
1. Disk defragmentation
Disk defragmentation is the process of rearranging files on a hard drive in order to improve performance. When a file is saved to disk, it may be split into several fragments placed on different parts of the disk. This happens because of constant reading and writing of files, so free space for new information is not always contiguous. The consequence of fragmentation is increased file access time, since the head must move around the disk to read all the fragments, which in turn slows down reading and writing. Defragmentation consolidates fragmented files by rearranging them on the disk so that their fragments are located closer to each other, which reduces data access time and improves performance.
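The study uses Defraggler, but, for illustration, Windows also ships a built-in defrag utility; a typical invocation (the drive letter C: is assumed here purely as an example) looks like this:

    defrag C: /U /V

Here /U prints the progress of the operation and /V enables verbose output.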
Fig.1 WD10EZEX Reading Test
Fig.2 WD10EZEX Writing Test
Fig.3 WD10EALX Reading Test
Fig.4 WD10EALX Writing Test

Thus, disk defragmentation helps to improve performance and increase read and write speeds by optimizing the placement of files on the disk.

2. Using solid-state drives (SSDs)

Solid-state drives (SSDs) have a number of advantages over conventional hard drives that make them much faster in terms of read and write speed.

- No moving parts: Solid-state drives have no moving mechanical elements, unlike hard drives, where data is recorded on rotating magnetic platters. This allows an SSD to provide faster access to data, without delays caused by mechanical movement.

- Fast data access: SSDs use NAND flash memory to store information, which provides very fast access to data. Read and write times on an SSD are significantly lower than on hard drives.

- No data fragmentation: On an SSD, data is stored in memory blocks, and the blocks themselves are divided into pages. Data is written to individual pages of a block, and data cannot be updated by simply overwriting the old contents; moreover, only an entire block can be erased. For this reason, SSDs perform garbage collection, an optimization process in which the SSD controller reorganizes data to simplify and speed up future writes. The TRIM command also helps optimize memory allocation and usage by allowing the operating system to notify the SSD which data blocks (pages) no longer carry a payload and need not be physically retained. In Windows, you can check whether TRIM is enabled with a single command, where an output of 0 means TRIM is enabled and 1 means it is disabled (see the example command after Fig.6).

Fig.5 TRIM

- Support for different NAND cell technologies: The standard types of NAND flash memory are SLC, MLC, TLC and QLC. The key differences between NAND memory types are cost, capacity, and endurance. Endurance is determined by the number of program/erase (P/E) cycles a flash memory cell can withstand before wearing out. A P/E cycle is the process of erasing and rewriting a cell, and the more P/E cycles a NAND technology can withstand, the longer the device lasts.

These factors make solid-state drives a popular choice for improving the performance and responsiveness of computers. An SSD is often paired with an HDD: instead of one large SSD, you can buy a small NVMe drive just for the operating system and working applications, and store all other files, distributions and backups on a cheap, slow SATA HDD. Although this is done mainly to save money and increase storage capacity, a big plus of this approach is a slight reduction in the load on the SSD and a corresponding increase in its lifetime. In addition, an SSD with more free space has a larger SLC cache and higher performance than a nearly full one.
Fig.6 Changing the size of the SLC cache depending on the amount of free space in the Intel SSD 665p
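The Windows TRIM check mentioned above (presumably what Fig.5 shows) uses the standard fsutil utility:

    fsutil behavior query DisableDeleteNotify

An output of DisableDeleteNotify = 0 means TRIM is enabled; a value of 1 means it is disabled.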
3. Configuring the cache

In hard drives (HDDs), the cache is usually a small amount of fast memory, most often SRAM or DRAM, used to temporarily store the data most frequently requested by the processor in order to speed up access to it. In solid-state drives (SSDs), the cache is usually a small amount of NAND flash memory, used to cache data in order to improve performance and reduce wear on the NAND cells. The cache increases access speed: when data is requested frequently, the cache allows it to be retrieved faster without resorting to slow reads from the main storage. Since the cache holds the data most frequently requested by the processor, it contributes to a faster response of the system as a whole. In the case of an SSD, the cache also helps reduce the number of write operations to the main storage, which in turn can extend the service life of the drive. Thus, the cache directly affects the performance of the drive.

To improve performance, it makes sense to tailor the drive cache to the user's needs using utilities such as PrimoCache. As a test, we created a "cache task" in PrimoCache for the previously used WD Blue WD10EZEX, allocating 6144 MB of RAM as an L1 cache and setting the block size to 512 KB.
Fig.7 WD10EZEX Reading test PrimoCache
Fig.8 WD10EZEX Writing test PrimoCache
Fig.9 The result of CrystalDiskMark before and after cache configuration

The test shows that sequential read and write speeds increased by about 40 times, but it should be understood that this speed is achieved simply because data is first written to the RAM allocated as a cache and only then flushed from RAM to the disk itself, at a speed limited by the disk's physical characteristics. This approach has its advantages, such as the ability to access data immediately after it is written or copied, because that data resides in RAM, which is many times faster. However, since RAM is volatile memory, if the power is lost before the data has been written to disk, that data is lost.
4. Using command-line utilities

Many users need to copy and transfer large amounts of data; one way to optimize drive operation in this situation is to use command-line utilities such as robocopy on Windows or rsync on Linux. Command-line utilities provide more flexible and powerful tools for managing and moving large amounts of data. They offer more precise control over the copying and transfer process, which makes it possible to optimize resource usage and improve performance. Moreover, robocopy can copy and transfer files faster than the standard Windows tools thanks to more efficient use of multithreading; the number of threads can be set with the "/mt:" option (see the example after Fig.11). Below is the log of copying a 9.19 GB file using robocopy, which took 1 minute 57 seconds.

Fig.10 Robocopy

Copying the same file using the standard Windows tools took about 2 minutes and 26 seconds.

Fig.11 Copying using standard Windows tools
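For reference, a robocopy invocation using the multithreading option described above might look like this (the source path, destination path, and file name are hypothetical):

    robocopy D:\source E:\backup bigfile.bin /mt:16

The /mt:16 option runs the copy with 16 threads; when /mt is given without a value, robocopy defaults to 8 threads.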
5. Using RAID

RAID (Redundant Array of Independent Disks) is a technology that combines multiple physical disks into a single logical device to improve performance, reliability, or both. RAID is used both on servers and on ordinary computers to protect data and/or increase access speed. There are many RAID levels, each with its own characteristics; some of the most popular are listed below:

RAID 0 (Striping): Data is divided into blocks and written to two or more disks in parallel (stripes). This increases the speed of data access, since read/write operations can be performed in parallel on all disks. However, if one of the disks fails, the data on all the other disks is lost, since the information is divided into parts without any protection.

RAID 1 (Mirroring): Data is written to two or more disks simultaneously. This provides data redundancy (mirroring), which increases storage reliability. However, the available disk space equals the capacity of a single disk, since all data is duplicated.

RAID 5 (Parity): Data is written to multiple disks along with control information (parity), which allows data to be recovered if one of the disks fails. RAID 5 provides a balance between performance and reliability while using fewer disks than mirroring.

RAID 6: Similar to RAID 5, but provides additional protection against disk failure by storing two parity blocks. This allows the system to recover data even if two disks fail at the same time.

RAID 10 (RAID 1+0): A combination of RAID 1 and RAID 0. Data is split into blocks and the blocks are mirrored onto other disks, combining the advantages of mirroring (high reliability) and striping (high performance).

The choice of a specific RAID level depends on performance, reliability, and budget requirements. RAID applications include system administration, data backup, working with big data, and client-server models. However, performance improvements are achieved most effectively with RAID 10 or RAID 0 rather than RAID 5 or RAID 6. The reason is that even under a purely write workload, the individual disks in a RAID 5 or RAID 6 array spend as many operations on reading as on writing in order to update parity, which hurts performance. In general, RAID 0 is preferable for improving performance, since the speeds of the member disks are essentially summed, although the total is determined by the slowest of them. A minimal example of creating a RAID 0 array is shown below.
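As a sketch of how such an array can be assembled in software on Linux, the commands below create a two-disk RAID 0 array with mdadm and format it with ext4; the device names /dev/sdb and /dev/sdc are hypothetical placeholders for two spare disks:

    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.ext4 /dev/md0

Note that creating the array destroys any data on the member disks, so this should only be run on empty drives.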
Conclusion

Thus, although not all existing methods of optimizing drive operation have been considered, the optimization approaches and performance-affecting characteristics discussed here make it possible to improve the performance of drives and of the system as a whole. It should be noted, however, that the choice of specific optimization methods depends on the tasks being solved and on the hardware and software used. The best results were shown by using RAM as an L1 cache and by RAID 0 arrays. As described above, these methods have their drawbacks, the main one probably being possible data loss. Disk defragmentation, although it does not give the same gains, also affects disk and system performance; automatic defragmentation has long been an integral part of operating systems, ensuring good storage performance without manual user intervention. An integrated approach to optimizing storage devices, taking into account both hardware and software solutions, can significantly improve the performance of modern computing systems and meet users' growing demands for data processing and storage speed.