Feature Scaling
Feature Scaling is a data preprocessing technique. By preprocessing, we mean the transformations applied to the data before it is fed into an algorithm for further processing.

What is Feature Scaling?

Feature Scaling is a technique where we standardize the range of all the independent features of a dataset. It is also loosely called normalization. Generally, when we get raw data, the feature values vary on different scales. It is important to bring all the feature values onto the same scale so that the value of one feature does not dominate the others and hinder the performance of the learning algorithm. This process of bringing all the feature values onto the same scale is called feature scaling. Standardizing the feature values implicitly weights all features equally in the representation. Most of the time, the features are rescaled so that they have the properties of a standard normal distribution, with mean = 0 and standard deviation = 1.

Why is Feature Scaling required?

Many widely used machine learning algorithms are classifiers, and many classification algorithms work by calculating the distance between data points in feature space. If one feature has a much wider range of values than the others, it is likely to dominate the distance measure between the data points. Worse, if this feature turns out to be insignificant in the end, it will still have distorted the algorithm's results, reducing its accuracy.

For example, let us look at a subset of the wine dataset:
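The original table is not reproduced here, but you can inspect the same scale mismatch directly. A minimal sketch, assuming scikit-learn's built-in copy of the wine dataset (its magnesium and malic_acid columns correspond to the features discussed next):

from sklearn.datasets import load_wine

# Load the wine dataset as a pandas DataFrame (needs scikit-learn >= 0.23)
wine = load_wine(as_frame=True)
df = wine.frame

# Compare the ranges of the two features discussed in the text
print(df[["magnesium", "malic_acid"]].describe().loc[["min", "max"]])
# magnesium spans roughly 70 to 162, while malic_acid stays below 6,
# so magnesium will dominate any raw Euclidean distance.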
The values of the “Magnesium” feature range from 80 to 100 or above, while the values of the “Malic_Acid” feature hover around 1.something. If we now apply the distance formula (as we do in the case of KNN) to two data points here, Magnesium will dominate the distance value far more than any other feature, resulting in wrong predictions later.

Note: the Euclidean distance formula is d = √((x1 − x2)² + (y1 − y2)² + …)

Let's understand this better with the help of another example. Suppose you have a company's employee data. The age of the employees may be between 21 and 70 years, the size of the house they live in may be 500 to 5000 sq. feet, and their salaries may range from 30,000 to 80,000. In this situation, if you use a simple Euclidean metric, the age feature will play almost no role because it is several orders of magnitude smaller than the other features. However, it may contain important information that is useful for the task. Here, you may want to normalize the features independently to the same scale, say [0, 1], so that they contribute equally while computing the distance.

How is Feature Scaling applied in sklearn?

There are several ways to apply feature scaling to a dataset:

1. The easiest way to scale is the preprocessing.scale() function. A numpy array of values is given as input, and the output is a numpy array with scaled values. The values are scaled so that their mean is 0 and their standard deviation is 1, as shown in the sketch below.
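A minimal sketch of preprocessing.scale() on a toy numpy array (the values here are made up for illustration):

import numpy as np
from sklearn import preprocessing

# Toy data: three samples, two features on very different scales
X = np.array([[90.0, 1.2],
              [110.0, 2.5],
              [130.0, 4.1]])

X_scaled = preprocessing.scale(X)
print(X_scaled)

# Each column now has mean ~0 and standard deviation 1
print(X_scaled.mean(axis=0))  # approximately [0. 0.]
print(X_scaled.std(axis=0))   # [1. 1.]

Note that preprocessing.scale() is a one-shot convenience function; if you need to apply the same scaling to new data later, the StandardScaler class (which remembers the fitted mean and standard deviation) is the usual choice.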
2. Another method is to scale the features between a given minimum and maximum value, generally between 0 and 1, using preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True). feature_range is given in the form of a tuple (min, max). copy defaults to True; set it to False if you want an in-place transformation. Formula used:

X_std = (X − X.min) / (X.max − X.min)
X_scaled = X_std * (max − min) + min

where min and max are the bounds of feature_range. A short example follows below.
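A minimal sketch of MinMaxScaler applying the formula above, reusing the same hypothetical toy array:

import numpy as np
from sklearn import preprocessing

X = np.array([[90.0, 1.2],
              [110.0, 2.5],
              [130.0, 4.1]])

# Scale each feature into [0, 1]; copy=True leaves X itself untouched
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
X_scaled = scaler.fit_transform(X)
print(X_scaled)
# Each column's minimum maps to 0 and its maximum to 1:
# [[0.    0.   ]
#  [0.5   0.448]
#  [1.    1.   ]]

In practice, you fit the scaler on the training data and reuse the fitted minimum and maximum on the test data via scaler.transform(), so that both splits are mapped onto the same scale.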
Thanks for reading!