Advancements in Big Data Analytics: A Comprehensive Review of Tools and Technologies, from Hadoop to Spark

Authors

  • Prabu Ravichandran, Sr. Data Architect, Amazon Web Services Inc., Raleigh, NC, USA

Keywords:

Big Data Analytics, Hadoop, Spark, Distributed Computing, Tools, Technologies, Advancements, Comparative Analysis, Data Processing, Insights

Abstract

This paper reviews advancements in big data analytics, focusing on the evolution of tools and technologies from Hadoop to Spark. Big data analytics has changed how organizations process, analyze, and derive insights from massive volumes of data, and distributed computing frameworks such as Hadoop and Spark have played a pivotal role in making the processing of large-scale datasets efficient. The paper examines the key features, functionalities, and comparative advantages of these frameworks, and surveys other relevant tools and technologies in big data analytics. By synthesizing current research findings and industry practices, it aims to describe the landscape of big data analytics tools and technologies and to support informed decision-making for organizations seeking to make use of big data.

Published

18-11-2020

How to Cite

Ravichandran, P. “Advancements in Big Data Analytics: A Comprehensive Review of Tools and Technologies, from Hadoop to Spark”. Journal of Science & Technology, vol. 1, no. 1, Nov. 2020, pp. 91-107, https://thesciencebrigade.com/jst/article/view/198.

License Terms

Ownership and Licensing:

Authors of this research paper submitted to the Journal of Science & Technology retain the copyright of their work while granting the journal certain rights. Authors maintain ownership of the copyright and have granted the journal a right of first publication. Simultaneously, authors agreed to license their research papers under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.

License Permissions:

Under the CC BY-NC-SA 4.0 License, others are permitted to share and adapt the work for non-commercial purposes, as long as proper attribution is given to the authors, acknowledgement is made of the initial publication in the Journal of Science & Technology, and any adaptations are distributed under the same license. This license allows for the broad dissemination and utilization of research papers.

Additional Distribution Arrangements:

Authors are free to enter into separate contractual arrangements for the non-exclusive distribution of the journal's published version of the work. This may include posting the work to institutional repositories, publishing it in journals or books, or other forms of dissemination. In such cases, authors are requested to acknowledge the initial publication of the work in the Journal of Science & Technology.

Online Posting:

Authors are encouraged to share their work online, including in institutional repositories, disciplinary repositories, or on their personal websites. This permission applies both prior to and during the submission process to the Journal of Science & Technology. Online sharing enhances the visibility and accessibility of the research papers.

Responsibility and Liability:

Authors are responsible for ensuring that their research papers do not infringe upon the copyright, privacy, or other rights of any third party. The Journal of Science & Technology and The Science Brigade Publishers disclaim any liability or responsibility for any copyright infringement or violation of third-party rights in the research papers.
