Introduction To Machine Learning

Aditya Ranjan Behera
7 min readDec 28, 2020

--

What is Machine Learning?

Imagine you are learning cricket. On your first attempt you miss the wicket by a wide margin, but after several attempts you are able to hit it consistently. This happens because your brain learns from previous attempts and adjusts. When a machine does the same thing, it is called Machine Learning: the machine learns from its training dataset. This is achieved today through data analytics and artificial intelligence.

In general, “machine learning is a branch of Artificial Intelligence that provides a computer system with the ability to progressively learn and improve its performance on various tasks without being explicitly programmed to perform each of them”.

Why do we need Machine Learning?

“Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.” — Andrew Ng

Data is the lifeblood of every organization. Data-driven decisions increasingly make the difference between keeping up with the competition and falling further behind. Machine learning can be the key to unlocking the value of corporate and customer data and enacting decisions that keep a company ahead of the competition.

Some Applications of Machine Learning: -

1) Financial Services: — Risk analytics and regulation

2) Manufacturing: — Predictive maintenance and condition monitoring

3) Retail: — Upselling and cross-channel marketing

4) Healthcare and Life Sciences: — Disease identification and risk stratification

5) Travel and Hospitality: — Dynamic pricing

6) Logistics and Transportation: — Route optimization

7) Energy: — Energy demand and supply optimization

A Brief History of Machine Learning

You might think that machine learning is a relatively new topic, but no, the concept of machine learning came into the picture in 1950, when Alan Turing (Yes, the one from Imitation Game) published a paper answering the question “Can machines think?”.

In 1957, Frank Rosenblatt designed the first neural network for computers, which is now commonly called the Perceptron Model.

In 1959, Bernard Widrow and Marcian Hoff created two neural network models: ADALINE, which could detect binary patterns, and MADALINE, which could eliminate echo on phone lines.

In 1967, the Nearest Neighbor Algorithm was written that allowed computers to use very basic pattern recognition.

In 1981, Gerald Dejong introduced the concept of explanation-based learning, in which a computer analyses data and creates a general rule to discard unimportant information.

During the 1990s, work on machine learning shifted from a knowledge-driven approach to a more data-driven approach. During this period, scientists began creating programs for computers to analyze large amounts of data and draw conclusions, or “learn”, from the results. Over time, these developments matured into the modern age of machine learning.


Classification of Machine Learning Models: -

Machine learning models can be classified into three types: -

1) Supervised Learning: — In supervised learning, the training dataset comprises samples of input vectors {x} along with their corresponding target vectors. The task is to find a function f(x) that takes a data point x and produces the corresponding target value t, such that f(x) = t.
Supervised learning problems can be either Regression or Classification.

a. Regression: Here the target values lie in real space, so the output is one or more continuous variables.

b. Classification: In classification, the target values come from some finite set. The aim is to assign each input vector to one of a definite number of discrete categories.
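The two flavours of supervised learning can be sketched with scikit-learn (my choice of library here; the post names none). Both models are fit on tiny hypothetical datasets: a regressor learning a continuous target and a nearest-neighbour classifier learning discrete labels.

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Regression: the target t is a continuous value (here t = 2x).
X_reg = [[1], [2], [3], [4]]
t_reg = [2, 4, 6, 8]
reg = LinearRegression().fit(X_reg, t_reg)
print(reg.predict([[5]])[0])  # ≈ 10.0, since f(x) = 2x fits the data

# Classification: the target t is one of a finite set of labels.
X_clf = [[0, 0], [0, 1], [5, 5], [6, 5]]
t_clf = ["small", "small", "large", "large"]
clf = KNeighborsClassifier(n_neighbors=1).fit(X_clf, t_clf)
print(clf.predict([[5, 6]])[0])  # "large" — its nearest neighbour is [5, 5]
```

The same f(x) = t framing covers both cases; only the type of t changes.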

2) Unsupervised Learning: — Unsupervised learning deals with finding patterns or trends within the input data, without any target labels. Unsupervised learning can take the form of clustering, density estimation, or dimensionality reduction.

In clustering, we find clusters or groups of similar examples within the data.
Dimensionality reduction maps data from a higher-dimensional space down to a lower dimension, such as 2 or 3 dimensions, for better analysis and representation of the data.
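A minimal sketch of both ideas, again using scikit-learn on hypothetical points: k-means finds the two obvious groups with no labels given, and PCA projects the same 2-D points down to one dimension.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Two well-separated groups of points; no labels are provided.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]

# Clustering: k-means assigns each point to one of 2 clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # the first three points share one label, the last three another

# Dimensionality reduction: project the 2-D points onto 1 dimension.
X_1d = PCA(n_components=1).fit_transform(X)
print(X_1d.shape)  # (6, 1)
```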

3) Reinforcement Learning: — Like unsupervised learning, this setting has no associated target responses; instead, the algorithm receives positive or negative feedback for each solution it proposes. It can be compared to the “trial and error” method of learning in humans. Many such algorithms build on dynamic programming, and the system can be taught to respond without human intervention through a scheme of reward and punishment.
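The reward-and-punishment loop can be sketched in a few lines of plain Python (my own toy illustration, not from the post): the agent tries two actions, receives +1 or -1 as feedback, and updates a value estimate for each action until it prefers the rewarded one.

```python
import random

random.seed(0)
values = [0.0, 0.0]   # estimated value of each of the two actions
alpha = 0.1           # learning rate

for step in range(200):
    # Explore occasionally; otherwise exploit the current best estimate.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: values[a])
    # The environment rewards action 1 and punishes action 0.
    reward = 1 if action == 1 else -1
    # Move the estimate toward the observed reward.
    values[action] += alpha * (reward - values[action])

print(values[1] > values[0])  # True: the agent has learned to prefer action 1
```

No one ever tells the agent which action is "correct"; the preference emerges purely from the feedback signal.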

The Life cycle of Machine Learning

1) Gathering Data: — Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify the data related to the problem and obtain it.

In this step, we need to identify the different data sources, as data can be collected from sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle: the quantity and quality of the collected data determine the efficiency of the output. The more data we have, the more accurate the prediction can be.

This step includes the below tasks:

o Identify various data sources

o Collect data

o Integrate the data obtained from different sources

2) Data Preparation: — After collecting the data, we need to prepare it for the steps that follow. Data preparation is the step where we put our data into a suitable place and ready it for use in machine learning training. In this step, we first put all the data together and then randomize its ordering.

This step can be further divided into two processes:

o Data exploration:
It is used to understand the nature of data that we have to work with. We need to understand the characteristics, format, and quality of data.
A better understanding of data leads to an effective outcome. In this, we find Correlations, general trends, and outliers.

o Data pre-processing:
Now the next step is preprocessing of data for its analysis.
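Data exploration can be sketched with pandas on a hypothetical toy dataset (the column names here are my own invention), covering the three things mentioned above: characteristics, correlations, and outliers.

```python
import pandas as pd

# Hypothetical data: study hours vs. exam score, with one suspicious row.
df = pd.DataFrame({"hours": [1, 2, 3, 4, 50],
                   "score": [10, 20, 30, 40, 45]})

print(df.describe())          # characteristics: count, mean, std, quartiles
print(df.corr())              # general trends: pairwise correlations
print(df[df["hours"] > 10])   # a crude outlier check on the hours column
```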

3) Data Wrangling: — Data wrangling is the process of cleaning raw data and converting it into a usable format. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process, since cleaning the data is required to address quality issues.

The data we collect is not always fit for use as-is, since some of it may not be useful. In real-world applications, collected data may have various issues, including:

o Missing Values

o Duplicate data

o Invalid data

o Noise

So, we use various filtering techniques to clean the data. It is essential to detect and remove these issues, because they can negatively affect the quality of the outcome.
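A minimal wrangling sketch with pandas, on a hypothetical raw table that exhibits the issues listed above (a missing value, a duplicate row, and an invalid negative age):

```python
import pandas as pd

raw = pd.DataFrame({
    "age":  [25, None, 25, 30, -5],                    # missing and invalid values
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
})  # rows 0 and 2 are duplicates

clean = raw.drop_duplicates()        # remove duplicate data
clean = clean[clean["age"] > 0]      # drop missing (NaN) and invalid ages

print(list(clean["age"]))  # [25.0, 30.0] — only the valid, unique rows remain
```

Real pipelines might instead impute missing values rather than drop them; which filtering technique to use depends on the problem.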

4) Data Analysis: — Now the cleaned and prepared data is passed on to the analysis step. This step involves

o Selection of analytical techniques

o Building models

o Review the result

The aim of this step is to build a machine learning model that analyzes the data using various analytical techniques, and then to review the outcome. It starts with determining the type of problem, where we select machine learning techniques such as classification, regression, cluster analysis, or association; we then build the model using the prepared data and evaluate it.

5) Train Model: — The next step is to train the model: we use datasets and various machine learning algorithms to improve the model’s performance on the problem. Training is required so that the model can pick up the relevant patterns, rules, and features.

6) Test Model: — Once the machine learning model has been trained on a given dataset, we test it. In this step, we check the accuracy of the model by providing it with a test dataset. Testing determines the percentage accuracy of the model as per the requirements of the project or problem.
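The train-and-test steps together can be sketched with scikit-learn on its bundled Iris dataset (my choice of dataset and classifier; the post prescribes neither). The key idea is that accuracy is measured on data the model never saw during training.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
# Hold out 25% of the data as an unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = KNeighborsClassifier().fit(X_train, y_train)    # train step
acc = accuracy_score(y_test, model.predict(X_test))     # test step
print(f"test accuracy: {acc:.2%}")
```

Evaluating on the training set itself would overstate performance, which is why the split comes before training.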

7) Deployment: — The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system. If the prepared model produces accurate results as per our requirements, with acceptable speed, then we deploy it in the real system. But before deploying, we check whether the model keeps improving its performance on the available data. The deployment phase is similar to making the final report for a project.

Pros and Cons

Pros: -

  • Automation of everything
  • Wide range of applications
  • Scope for improvement
  • Efficient handling of data
  • Useful in education

Cons: -

  • Possibility of high error
  • Difficulty of algorithm selection
  • Cost of data acquisition
  • High time and space requirements

Summary

In this blog, I have presented the basic concepts of machine learning. I hope it helps, and that it motivates you to take an interest in the topic.

--

Aditya Ranjan Behera

A student in PG Diploma in Data Science at IIIT, Bangalore with an interest in data analysis, ETL, Machine learning and business problem-solving.