Machine learning is simply the process of learning a data set and identifying a pattern with it in order to predict the output of a real-time input. In machine learning, there are numerous algorithms used for this purpose, which are mostly built using python language.
Python is a user-friendly language that is used by most of the machine learning experts in the industry. Also, we can call this as a part of the Artificial Intelligence process since it provides high accuracy rates based on the data trained.
Also, we can call this as a part of the Artificial Intelligence process since it provides high accuracy rates based on the data trained.
Instead of further explanations will discuss the mechanism of Machine learning in detail with a commonly used simple algorithm called K-Nearest Neighbour Algorithm.
This algorithm mainly consists of 3 items as follows:-
- Training data set
- Testing data set
- KNN value
Training data set
The training data set means the data which should be trained in order to get output using the algorithm. Higher the training data set higher accuracy. The data should be training input parameters & training output value.
Testing data set
The data which should be tested against the training data set are called as testing data set. Simply the real-time input through which we expect an output.
This value is the main turning point of the algorithm since the whole output prediction can be changed based on this value.
Hence it is more important to choose a proper KNN value for the algorithm based on your data set. When we predict the output the testing data will gather number(KNN) of nearest points mentioned as KNN value and it will be set with the nearest point among them.
Higher the KNN value higher the nearest points it will gather. Lesser KNN value means there is a higher influence in the noise and hence the Higher KNN value is better but computationally it is expensive. Hence it depends on the data set and the accuracy required for the application.
Application of KNN algorithm
KNN Algorithm could be applied to various scenarios once it is understood completely. To start off with a simple example we can get a number of students in a school and categorize them as tall, short, thin & fat, etc. Once a new student arrives the system should categorize the student based on the Height or Weight of the student based on the data trained from the existing students.
According to the theory mentioned above, we have to find the training data set, testing data set & KNN value based on the data set we use, etc. Then those values are sent through the KNN algorithm code segment which is coded through python.
* Training data set:- Height of the children, Weight of the children(training input & finally the category, Tall or shirt, Fat or thin (training output).
Let’s go through the following code segment and get the complete idea of it:
#BEGINNING OF CODING
#The following code segments are necessary libraries to import for the KNN classifier to function
from sklearn import datasets
from statistics import mode
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
#end of import
#setting training data
#Training data is a text file and it should read as P1,P2,P3,…..Pn,N
#P stands for input parameters (In above example only one value: Height or Weight)
#N stands for category – This should be only one value (In above example: Tall or
#Short, Fat or Thin)
#Example image of a training data set text file:
#As per the image categorize all training items and for training place it with a unique number and you can assign it to a variable using an IF function.
#Setting testing data
#This text file should contain only testing input data (In the above example the height of the new comer to the school will be test input)
#Example image of a testing data set text file:
#This contains only testing input parameters, real time input from the system.
#In above example only one parameter should be their. (Height or Weight)
#Then initializing the X, Y coordination which are required as per KNN classifier algorithm
X=np.array((datavalues.iloc[:, [0,1]]), dtype=float)
Y=np.array((datavalues.iloc[:, 2]), dtype=float)
#Setting the KNN value (In above example lesser KNN value like 1 or 2 would do the job because the height level which is set to categorize a student as Tall or Short cannot have any overlaps, searching for multiple nearest neighbor might give false results)
knn = KNeighborsClassifier(n_neighbors=2)
#Fitting the X, Y values into the KNN classifier
#Initializing test data into X_test coordination
X_test = np.array(test values.iloc[:, [0,1]])
#Predicting the final result based on the training data set
prediction = np.array(knn.predict(X_test), dtype=float)
#As per the above example X_test should contain one single test input and test output will be provided via the value of “prediction”
#END OF CODING
Using the above small piece of code we can automate so many data sets that we come across in our daily life. Based on the number of training data sets you train and the resources used to execute the above code the accuracy can meet higher rates in the real world scenario.
Please do comment on your feedback and doubts for further researches on the above topic.
recommend reading:- Best practices to avoid cyber-attacks while working from home