Welcome back, readers! So far we have been trying to understand Machine Learning in a simple way, keeping the beginner's context in mind. This is one more post in that spirit, explaining the various types of learning in Machine Learning.
The name says it: "Machine Learning" means a machine learns something. How a machine can learn is what we covered in our earlier posts (here). Now we will see how many types of learning are possible for machines. When I say types of learning, I mean the various classes of learning procedures (algorithm: a step-by-step procedure to solve a problem).
Types of Learning:
The following types of learning procedures can be identified:
- Rule association
- Deep Learning
I can understand that this sounds like Greek and Latin right now, but trust me, I will help you understand all of it in an easy way. We can't cover all the types in one post, so today I will explain Regression and keep the rest as a surprise for the next post.
The word regression has several meanings in English, such as reducing an effect or going back to a previous state. In statistics, however, regression means something different: it is the relationship between the value we are trying to find and the other values in the data. In other words, the output in the given data (the column of interest that we are trying to predict) depends on the other columns of data values.
Imagine you have data with two columns, age and salary, and salary is the one we want to predict. Take the column of interest as Y and the other as X. Regression then says that
Y depends on X, meaning Y can be derived from X and changes when X changes. Mathematically we say
Y is a function of X => Y = f(X)
To understand this easily, let's take an example. For the sake of imagination, suppose salary is 10000 multiplied by age. From our basic mathematics this can be written as
salary = 10000 * age
In our problem the function is multiplication by 10000, but in other problems it can be some other relation. So in general we call it a function of x: y = f(x).
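To make the idea of y = f(x) concrete, here is a tiny Python sketch of the toy relation above. The function name salary_from_age and the sample ages are made up for illustration only.

```python
def salary_from_age(age):
    """Toy relation from this post: salary is 10000 times the age."""
    return 10000 * age

# Hypothetical ages, just to see the function at work.
ages = [25, 30, 40]
salaries = [salary_from_age(a) for a in ages]
print(salaries)  # [250000, 300000, 400000]
```

Swap the body of the function for any other formula and you have a different f(x), but the idea stays the same: the output is computed from the input.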
So how does this relation help in making the machine learn? Visit this to know more about it.
Linear Regression:
So we know that regression is some relation in terms of f(x). Before we see what Linear Regression is, recall a few basic geometric equations we learnt at school, like
Line equation => y = mx + c, where m = slope (the inclination of the line when drawn on a graph sheet) and c = intercept (the point at which the line cuts the y axis on the graph sheet)
Circle equation => (x - a)^2 + (y - b)^2 = r^2, where (a, b) is the center of the circle and r is its radius.
In the same way, try to recall as many equations as possible: the cosine curve, the sine curve, the general sinusoidal equation, the plane equation, the polynomial equation, etc.
Now coming back to linear regression: if you take the line equation (for uni-variate data, which has only 1 input column) or a plane equation (for multivariate data, mainly bi-variate data with 2 input columns), we call it linear regression. The relation between the input values and the value of interest is given by the line (or plane) equation.
That means y = f(x) => y = mx + c, or y = a*x1 + b*x2 + c*x3 when there are several input columns x1, x2, x3
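As a rough sketch, the two equations above can be written as small Python functions. The names predict_line and predict_plane, and the optional intercept argument, are my own choices for illustration.

```python
def predict_line(x, m, c):
    """Line equation: y = m * x + c (one input column)."""
    return m * x + c

def predict_plane(inputs, weights, intercept=0.0):
    """Several input columns: y = a*x1 + b*x2 + c*x3 (+ optional intercept)."""
    return sum(w * x for w, x in zip(weights, inputs)) + intercept

# One input: age 30 with slope 10000 and intercept 0.
print(predict_line(30, m=10000, c=0))

# Three inputs x1, x2, x3 with weights a, b, c.
print(predict_plane([1, 2, 3], weights=[2, 3, 4]))
```

In both cases the "learning" part is about finding good values for the numbers m, c (or a, b, c); the shape of the equation is fixed up front.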
Imagine the graph above has the data points (age, salary) plotted in purple; linear regression says that the relation between them is a line equation.
We, being human, can see that the data may fit a line, so we tried the linear regression method. As humans we can also pick out the line that is visually close to all the data points. But how do we make the machine find the line that is closest to all the data points (this line is called the best fit line)? Adjusting the parameters, slope and intercept (m and c), gives various possible lines. Fine, so we told the machine to adjust the parameters and try the possible lines, but how will the machine know that a given line is the best fit line?
Finding the best fit line:
When we say best fit line, we mean that the line should be close to almost all the data points. So we talk about the distance between each data point and the line, measured at that point's x value (the input data column).
Consider the graph below.
We project each point onto the line, and the length of this projection is the distance between the point and the line. In the machine learning and statistics world this projection length is called the error. But why? In the ideal scenario we would have a line that passes through all the data points plotted on the graph. Data from the real, physical world will not follow the line exactly, so we call the distance by which a point deviates from the line the error.
Now, to calculate the total error for the line, we need each individual error, and then we add all the errors together.
Each error => distance between the actual data value (y) and the point on the line at the same x (in the machine learning world we call this the predicted point y`) => individual error => (y` - y)
Some of these values can be negative and would cancel out the positive ones, so we square the individual errors.
Now I sum all the squared individual errors (sum of errors) => (y1` - y1)^2 + (y2` - y2)^2 + ... + (yn` - yn)^2
to get a quantifiable error for that line.
In our example above => (Lh - H)^2 + (Li - I)^2 + (Lg - G)^2, where Lh, Li, Lg are the data points H, I, G projected onto the line, which we call the predicted points on the line.
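Here is a minimal Python sketch of the sum of squared errors described above. The data values and the function name sum_squared_error are hypothetical, picked only so the arithmetic is easy to follow.

```python
def sum_squared_error(xs, ys, m, c):
    """Total squared error of the line y = m * x + c over all data points."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = m * x + c            # the point on the line (y`)
        total += (predicted - y) ** 2    # squared individual error
    return total

# Hypothetical data: ages and salaries that do not sit exactly on the line.
ages = [25, 30, 35]
salaries = [260000, 290000, 355000]

# Individual errors for m=10000, c=0 are -10000, 10000 and -5000.
print(sum_squared_error(ages, salaries, m=10000, c=0))  # 225000000.0
```

A different choice of m and c gives a different total, and that is exactly the number the machine can use to compare one line against another.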
Okay, fine, we understood what error is, but what is it going to do for my machine learning?
If you remember, we were stuck on a question: the machine can now draw various lines by changing the parameters, but how will it identify the best fit line?
The best fit line is the line that has the minimum error. By continuously tuning the parameters we get a new line each time; we calculate the error for that line and compare it with the previous errors to find the smallest value. This smallest value over all possible lines is called the global minimum.
This process shows that tuning the parameters can decrease or increase the error, which means the error is related to the parameters. If you take the parameters as w and the error as E, then E is a function of w, so the error value is e = E(w).
See the graph below.
We use this graph just to visualize the error with respect to the parameters; in reality the machine will not draw all these graphs. Maybe it doesn't have the eyes to do so :-p
Here E(w) is the sum of errors that we calculated for each line.
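To see E(w) as a function of the parameter, here is a small sketch that tries a handful of slopes and keeps the one with the least error. To keep it simple I assume the intercept is fixed at 0 and only the slope w is tuned; the data and the list of candidate slopes are made up.

```python
# Hypothetical data that follows the toy relation salary = 10000 * age.
ages = [22, 30, 41, 55]
salaries = [10000 * a for a in ages]

def error_for_slope(w):
    """E(w): sum of squared errors of the line salary = w * age."""
    return sum((w * x - y) ** 2 for x, y in zip(ages, salaries))

# Try slopes 8000, 8500, ..., 12000 and record the error of each line.
candidates = range(8000, 12001, 500)
errors = {w: error_for_slope(w) for w in candidates}

# The best fit among the candidates is the slope with the smallest error.
best_w = min(errors, key=errors.get)
print(best_w, errors[best_w])
```

If you plotted the values in errors against the slopes, you would get a bowl-shaped curve like the one in the graph, with its lowest point at the best slope.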
There is also a chance that the machine's learning will not go smoothly towards the global minimum. Sometimes it gets stuck at a temporary minimum value that is not the global minimum; such temporary minima are called local minima. To search for the minimum, and to cope with local minima, we have methods and algorithms such as Gradient Descent and its variations. We will see those later in our blog.
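Just to give a taste before the dedicated post, here is a very small sketch of plain gradient descent on a single parameter. It assumes the intercept is 0, uses the average squared error as E(w), and the starting guess, learning rate and number of steps are values I picked by hand for this made-up data. It only shows how the parameter walks downhill towards a minimum; it does nothing special about local minima.

```python
# Hypothetical data that follows the toy relation salary = 10000 * age.
ages = [22, 30, 41, 55]
salaries = [10000 * a for a in ages]

def mean_squared_error(w):
    """Average squared error E(w) of the line salary = w * age."""
    return sum((w * x - y) ** 2 for x, y in zip(ages, salaries)) / len(ages)

def gradient(w):
    """Derivative of E(w) with respect to w (the slope of the error curve)."""
    return sum(2 * (w * x - y) * x for x, y in zip(ages, salaries)) / len(ages)

w = 0.0                 # assumed starting guess for the slope
learning_rate = 0.0001  # assumed step size, small enough for this data
for step in range(200):
    # Move w a little in the direction that reduces the error.
    w -= learning_rate * gradient(w)

print(round(w, 3), mean_squared_error(w))
```

Each step nudges w against the slope of the error curve, so the error keeps shrinking until w settles near the value that fits the data.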
Is the error always the sum of the distances between the data points and the prediction points?
Yes, the distance between the point predicted by the line (or any function) and the actual data point is called the individual error. But it is not necessary to always take the sum of squares to find the amount of error that the line produces.
You can also take the absolute value of each individual error, abs(individual error), and sum those, or take the average of all the errors. It all depends on the requirement and the data that you have for solving the problem.
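The different ways of totalling the error can be compared side by side in a short sketch. The three individual errors below are hypothetical numbers, and the variable names are mine.

```python
# Hypothetical individual errors (predicted - actual) for three data points.
individual_errors = [-10000, 10000, -5000]
n = len(individual_errors)

# Plain sum: negative and positive errors cancel, which hides the real error.
raw_sum = sum(individual_errors)

# Sum of absolute errors: every error counts as a positive distance.
sum_absolute = sum(abs(e) for e in individual_errors)

# Sum and average of squared errors: big errors are punished more.
sum_squared = sum(e ** 2 for e in individual_errors)
mean_squared = sum_squared / n

print(raw_sum)       # -5000
print(sum_absolute)  # 25000
print(sum_squared)   # 225000000
print(mean_squared)  # 75000000.0
```

The plain sum shows why we never use the raw errors directly: two large errors of opposite sign almost wipe each other out.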
I hope you have understood linear regression. If you take a polynomial equation in place of the linear equation, it becomes polynomial regression, but the concept remains the same.
Before we complete the topic, keep firmly in mind that regression techniques predict a numerical (continuous) value as output. In our example of age and salary, suppose we don't have data for a person of age 72 (unseen data). Then as per our equation above, his salary according to linear regression should be 72 * 10000 = 720000. Hope that is the dream for all :-P
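Putting it together, here is a sketch that fits a line to some made-up age and salary data and then predicts the salary for the unseen age 72. Note that instead of tuning the parameters step by step, it uses the standard least squares formula for the best fit line of one input column, which is a shortcut I swapped in and not the procedure described in this post. The data and the function name fit_line are hypothetical.

```python
# Hypothetical training data that follows salary = 10000 * age.
ages = [22, 30, 41, 55]
salaries = [10000 * a for a in ages]

def fit_line(xs, ys):
    """Least squares slope m and intercept c for the line y = m * x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

m, c = fit_line(ages, salaries)

# Age 72 is not in the data, so this is a prediction for unseen data.
predicted_salary = m * 72 + c
print(predicted_salary)
```

The output is a plain number, not a category, which is what makes this a regression problem.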
Also, any machine learning model should have these components: a learning equation, an error function, and tuning parameters.
Well, we will see the other types of learning in subsequent posts. Until then, read and enjoy Machine Learning!!