A Systematic Review on Workload Change Detection in Distributed Databases
Abstract
Distributed databases have become an essential part of much of today's software. They offer numerous advantages, including scalability, fault tolerance, high availability, and improved performance. They solve many problems of centralized databases but bring challenges of their own, one of which is skewed access. The workload in a distributed DBMS often changes, and such fluctuations can make the system operate inefficiently. Imagine that access to one database row becomes 10 times more frequent, or that complex requests start operating on data that is widely distributed geographically. Such behavior shows that the initial data distribution cannot always remain efficient. Adaptive design techniques were invented to address this problem. In this article we review the common steps of adaptive techniques and focus on workload change detection and hot data identification.
The purpose of the article is to introduce the adaptive design approach for distributed database management systems and to review and analyze existing techniques and their steps, especially workload change detection and hot data identification. The final goal is to compare these techniques and identify their main issues.
As a result of this work, several existing approaches were analyzed, their common parts and differences were highlighted, and their main issues were presented.
The review shows that current solutions cannot give precise results without creating significant overhead for the system. Likewise, no approach provides up-to-date information about hot data without creating overhead. Overhead is a major issue in such situations: under skewed access patterns, distributed nodes can become very busy processing queries, and additional computation can lead to worse overall system performance than without adaptive design, or even to a node outage. The search for solutions that give precise and up-to-date results without significant overhead is therefore a broad field for future research.
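The skew scenario described in the abstract, where one row becomes roughly ten times hotter, can be illustrated with a minimal sketch. It is not taken from any of the reviewed systems: the function name `hot_keys`, the parameters `factor` and `min_count`, and the assumption that both windows cover equal time intervals are all hypothetical choices made for illustration.

```python
from collections import Counter

def hot_keys(baseline, current, factor=10.0, min_count=5):
    """Return keys whose access count in `current` is at least `factor`
    times their count in `baseline`.

    Hypothetical helper for illustration only. Both arguments are lists
    of accessed keys, assumed to be collected over equal time intervals.
    """
    base = Counter(baseline)
    cur = Counter(current)
    hot = []
    for key, count in cur.items():
        # Ignore keys with too few accesses to be meaningful.
        if count < min_count:
            continue
        # Keys unseen in the baseline are treated as having one access,
        # which avoids comparing against zero.
        if count >= factor * max(base[key], 1):
            hot.append(key)
    return sorted(hot)

# Baseline: three rows accessed evenly. Current: row:1 becomes 10x hotter.
baseline_window = ["row:1", "row:2", "row:3", "row:1", "row:2", "row:3"]
current_window = ["row:1"] * 20 + ["row:2"] * 2 + ["row:3"] * 2
print(hot_keys(baseline_window, current_window))
```

Exact per-key counting of this kind is precise, but it touches every access and keeps a counter for every key, which is the type of overhead the abstract identifies as the main concern.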
References
Luminate Data, LLC, “Year-End Music Industry Report 2023,” Luminate Data, LLC, 2023. [Online]. Available: https://luminatedata.com/reports/yearend-music-industry-report-2023/. [Accessed: Nov. 27, 2024]
M. T. Özsu and P. Valduriez, Principles of Distributed Database Systems. 4th edition. Cham, Switzerland: Springer Nature, 2020.
R. Taft et al., “E-Store: Fine-grained elastic partitioning for distributed transaction processing systems”, Proceedings of the VLDB Endowment, vol. 8, no. 3, pp. 245 – 256, 2014. https://doi.org/10.14778/2735508.2735514.
M. Serafini, R. Taft, A. J. Elmore, A. Pavlo, A. Aboulnaga and M. Stonebraker, “Clay: Fine-grained adaptive partitioning for general database schemas”, Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 445 – 456, 2016. https://doi.org/10.14778/3025111.3025125.
C. Curino, E. P. C. Jones, S. Madden and H. Balakrishnan, “Workload-aware database monitoring and consolidation”, in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. Athens, 2011, pp. 313 – 324. https://doi.org/10.1145/1989323.1989357.
A. Quamar, K. A. Kumar and A. Deshpande, “SWORD: Scalable workload-aware data placement for transactional workloads”, in Proceedings of the 16th International Conference on Extending Database Technology. Genoa, 2013, pp. 430 – 441. https://doi.org/10.1145/2452376.2452427.
C. Curino, E. Jones, Y. Zhang and S. Madden, “Schism: A workload-driven approach to database replication and partitioning”, Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 48 – 57, 2010. https://doi.org/10.14778/1920841.1920853.
S. Navathe, S. Ceri, G. Wiederhold and J. Dou, “Vertical partitioning algorithms for database design”, ACM Transactions on Database Systems, vol. 9, no. 4, pp. 680 – 710, 1984. https://doi.org/10.1145/1994.2209.
J. J. Levandoski, P.-Å. Larson and R. Stoica, “Identifying hot and cold data in main-memory databases”, in Proceedings of the 2013 IEEE 29th International Conference on Data Engineering (ICDE). Brisbane, 2013, pp. 26 – 37. https://doi.org/10.1109/ICDE.2013.6544811.
B. Glasbergen, M. Abebe, K. Daudjee, S. Foggo and A. Pacaci, “Apollo: Learning query correlations for predictive caching in geo-distributed systems” in Proceedings of the 21st International Conference on Extending Database Technology (EDBT). Vienna, 2018, pp. 253 – 264. https://doi.org/10.5441/002/edbt.2018.23.
M. Brendle, N. Weber, M. Valiyev, N. May, R. Schulze, A. Böhm and G. Moerkotte, “SAHARA: Memory footprint reduction of cloud databases with automated table partitioning” in Proceedings of the 25th International Conference on Extending Database Technology (EDBT). Edinburgh, 2022, pp. 13 – 26. https://doi.org/10.5441/002/edbt.2022.02.