Data Pre-Processing for Machine Learning Models using Python Libraries
Namrata Pandey1, Pawan Kumar Patnaik2, Sargam Gupta3

1Namrata Pandey*, Department of Computer Science and Engineering, Bhilai Institute of Technology Durg, Bhilai, Chhattisgarh, India.
2Dr. Pawan Kumar Patnaik, Department of Computer Science and Engineering, Bhilai Institute of Technology Durg, Bhilai, Chhattisgarh, India.
3Mr. Sargam Gupta, Department of Computer Science and Engineering, Bhilai Institute of Technology Durg, Bhilai, Chhattisgarh, India.

Manuscript received on March 28, 2020. | Revised Manuscript received on April 25, 2020. | Manuscript published on April 30, 2020. | PP: 1995-1999 | Volume-9 Issue-4, April 2020. | Retrieval Number: D9057049420/2020©BEIESP | DOI: 10.35940/ijeat.D9057.049420
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Data pre-processing is the process of transforming raw data into a useful dataset. It is one of the most important phases in building any machine learning model, because the quality and efficiency of a model depend directly on its dataset: if this step is skipped and a model is trained on datasets that contain missing values, the resulting model will be less efficient and inconsistent. This paper describes a methodology for pre-processing data in a sequence of seven steps using powerful open-source Python libraries, such as pandas, a high-level data manipulation tool, and scikit-learn, a machine learning library that supports both supervised and unsupervised learning and provides tools for model fitting, data pre-processing, model selection and many other utilities. These steps include importing datasets and dealing with missing values and categorical values, among others. This analysis helps in cleaning and transforming datasets, which can then be fed to any learning model to produce an efficient machine learning model.
Keywords: Pre-processing, Python libraries, machine learning.
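The three steps the abstract names explicitly (importing a dataset, dealing with missing values, and dealing with categorical values) can be sketched with pandas and scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' exact seven-step procedure: the toy dataset and its column names are hypothetical, and the choice of scikit-learn's `SimpleImputer` (mean imputation), `OneHotEncoder`, `ColumnTransformer` and `LabelEncoder` is an assumption about one reasonable way to carry out these steps.

```python
import io

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy dataset (illustrative only): one categorical feature,
# two numeric features with missing entries, and a Yes/No target column.
RAW_CSV = """country,age,salary,purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,,No
Spain,38,61000,No
Germany,,,Yes
"""


def load_dataset(text: str) -> pd.DataFrame:
    """Import the dataset; here from an in-memory CSV string instead of a file."""
    return pd.read_csv(io.StringIO(text))


def build_preprocessor(numeric_cols, categorical_cols) -> ColumnTransformer:
    """Assumed strategy: mean-impute numeric columns, one-hot encode categoricals."""
    return ColumnTransformer(
        [
            ("num", SimpleImputer(strategy="mean"), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0.0,  # always return a dense NumPy array
    )


def preprocess(df: pd.DataFrame):
    """Split off the target, encode it as integers, and transform the features."""
    y = LabelEncoder().fit_transform(df["purchased"])  # "No" -> 0, "Yes" -> 1
    features = df.drop(columns="purchased")
    pre = build_preprocessor(["age", "salary"], ["country"])
    X = pre.fit_transform(features)  # numeric columns first, then one-hot columns
    return X, y


if __name__ == "__main__":
    X, y = preprocess(load_dataset(RAW_CSV))
    print(X.shape)  # (5, 5): age, salary, and one column per country
    print(y)
```

Mean imputation and one-hot encoding are used here only because they are common defaults; median or most-frequent imputation, or an ordinal encoding, could be swapped in through the same `ColumnTransformer` without changing the rest of the flow.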