Cloud-agnostic architectures for machine learning based on Apache Spark. (September 2021)
- Record Type:
- Journal Article
- Title:
- Cloud-agnostic architectures for machine learning based on Apache Spark. (September 2021)
- Main Title:
- Cloud-agnostic architectures for machine learning based on Apache Spark
- Authors:
- Nagy, Enikő
Lovas, Róbert
Pintye, István
Hajnal, Ákos
Kacsuk, Péter
- Abstract:
- Highlights:
Cloud provider-independent cluster deployment in cloud
Scalable multi-VM virtual infrastructures
Big Data, Machine Learning, Stream Processing reference architectures
Automated deployment of clusters with Spark, HDFS, Kafka, Jupyter, RStudio software stacks
Abstract: Reference architectures for Big Data, machine learning and stream processing include not only recommended practices and interconnected building blocks but considerations for scalability, availability, manageability, and security as well. However, the automated deployment of multi-VM platforms on various clouds leveraging on such reference architectures may raise several issues. The paper focuses particularly on the widespread Apache Spark Big Data platform as the baseline and the Occopus cloud-agnostic orchestrator tool. The set of new generation reference architectures are configurable by human-readable descriptors according to available resources and cloud-providers, and offers various components such as Jupyter Notebook, RStudio, HDFS, and Kafka. These pre-configured reference architectures can be automatically deployed even by the data scientist on-demand, using a multi-cloud approach for a wide range of cloud systems like Amazon AWS, Microsoft Azure, OpenStack, OpenNebula, CloudSigma, etc. Occopus enables the scaling of cluster-oriented components (such as Spark) of the instantiated reference architectures. The presented solution was successfully used in the Hungarian Comparative Agendas Project (CAP) by the Institute for Political Science to classify newspaper articles.
- Is Part Of:
- Advances in engineering software. Volume 159(2021)
- Journal:
- Advances in engineering software
- Issue:
- Volume 159(2021)
- Issue Display:
- Volume 159, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 159
- Issue:
- 2021
- Issue Sort Value:
- 2021-0159-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-09
- Subjects:
- Reference architectures -- Big data -- Artificial intelligence -- Machine learning -- Cloud computing -- Orchestration -- Distributed computing -- Stream processing -- Spark
Computer-aided engineering -- Periodicals
Engineering -- Computer programs -- Periodicals
Engineering -- Software -- Periodicals
Periodicals
620.0028553
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09659978
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.advengsoft.2021.103029
- Languages:
- English
- ISSNs:
- 0965-9978
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 0705.450000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18297.xml