Machine Learning with Distributed Data Management and Process Architecture

Engin Baysal, Cuneyt Bayilmis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

As technology becomes an ever larger part of daily life, the data it produces has become almost impossible to manage and process, which has made new approaches to storage and analysis necessary. Growth in both the volume and the variety of data has driven the development of new methods in this context. This study uses distributed data management and analysis tools developed for data that cannot be processed with traditional methods. A machine learning application was developed using the Logistic Regression classification algorithm. The application was implemented on a data set obtained from sensors, using the pyspark libraries on a Spark cluster created with the Google Cloud service. The working environment, managed by YARN, was observed while the application ran.
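
To make the classification step concrete, the following is a minimal single-machine sketch of logistic regression trained by batch gradient descent. It is not the authors' implementation and does not use Spark: the abstract does not give the feature set, the labels, or the training settings, so the two-feature sensor readings, the learning rate, the epoch count, and the function names below are all hypothetical. In the pyspark libraries, this role is played by `pyspark.ml.classification.LogisticRegression`, which fits the same kind of model over data partitioned across the cluster.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss.

    lr and epochs are illustrative defaults, not values from the paper.
    """
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the mean gradient over the whole batch
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical, made-up sensor readings: two features scaled to [0, 1]
readings = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.30],  # class 0
    [0.80, 0.90], [0.90, 0.70], [0.85, 0.80],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(readings, labels)
predictions = [predict(weights, bias, x) for x in readings]
print(predictions)
```

A distributed version follows the same idea, but the gradient sum in the inner loop is computed per partition and combined across the cluster's worker nodes.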

Original language: English
Title of host publication: UBMK 2019 - Proceedings, 4th International Conference on Computer Science and Engineering
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 53-57
Number of pages: 5
ISBN (Electronic): 9781728139647
Publication status: Published - Sept 2019
Externally published: Yes
Event: 4th International Conference on Computer Science and Engineering, UBMK 2019 - Samsun, Turkey
Duration: 11 Sept 2019 to 15 Sept 2019

Publication series

Name: UBMK 2019 - Proceedings, 4th International Conference on Computer Science and Engineering

Conference

Conference: 4th International Conference on Computer Science and Engineering, UBMK 2019
Country/Territory: Turkey
City: Samsun
Period: 11/09/19 to 15/09/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • apache spark
  • big data
  • big data analytics
  • logistic regression
  • machine learning
  • pyspark
  • yarn
