Abstract
As technology becomes ever more pervasive in daily life, the data it produces has become almost impossible to manage and process with conventional means, creating a need for new approaches to storage and analysis. Growth in both the volume and the variety of data has necessitated the development of new methods. This study uses distributed data management and analysis tools developed for data that cannot be processed with traditional approaches. A machine learning application was developed using the Logistic Regression classification algorithm. It was implemented on a data set obtained from sensors, using the pyspark libraries on a Spark cluster created with the Google Cloud service. The working environment, managed by YARN, was observed while the application ran.
Original language | English |
---|---|
Title of host publication | UBMK 2019 - Proceedings, 4th International Conference on Computer Science and Engineering |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 53-57 |
Number of pages | 5 |
ISBN (Electronic) | 9781728139647 |
Publication status | Published - Sept 2019 |
Externally published | Yes |
Event | 4th International Conference on Computer Science and Engineering, UBMK 2019 - Samsun, Turkey Duration: 11 Sept 2019 → 15 Sept 2019 |
Publication series
Name | UBMK 2019 - Proceedings, 4th International Conference on Computer Science and Engineering |
---|---|
Conference
Conference | 4th International Conference on Computer Science and Engineering, UBMK 2019 |
---|---|
Country/Territory | Turkey |
City | Samsun |
Period | 11/09/19 → 15/09/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- apache spark
- big data
- big data analytics
- logistic regression
- machine learning
- pyspark
- yarn