Introduction To Data Science

Aditya Ranjan Behera
Dec 28, 2020

“Data science” has become a buzzword that almost everybody talks about these days, and jobs in data science are a growing trend of the 21st century. But what is data science? Is it a recent discovery, or has it existed since ancient times?

So let’s dive into the world of data science.

What is Data Science?

Data science is a blend of statistics, mathematics, analysis, domain knowledge and information science. It is an interdisciplinary field that uses scientific methods, algorithms and systems to derive knowledge from structured and unstructured data.

Data science is primarily used to make decisions and predictions, drawing on predictive causal analytics, prescriptive analytics (predictive plus decision science) and machine learning.

Evolution Of Data Science

Data science is not a new discipline. The ancient Egyptians used census data to increase efficiency in tax collection, and they accurately predicted the flooding of the Nile River every year. Since then, people working with data have carved out a unique and distinct field for the work they do.

With the emergence of new technologies, the volume of data has grown exponentially. This has created an opportunity to analyze data and derive new insights from it. A data scientist is an expert in applying data science techniques: they analyze data and use machine learning algorithms to predict the future occurrence of an event.

Importance of Data Science

Data science is a significant part of the global economy.

Now let us look at the importance of data science in today’s world. Below are some of the reasons it matters.

1. The most important feature of data science is that it can be used in almost every industry, such as finance, e-commerce, healthcare, travel and education. With the help of data science, these industries can identify their challenges and address them effectively.

2. Because data science helps in identifying opportunities, governments, do-gooders and companies know exactly where action needs to be taken to drive a bigger impact. Its interdisciplinary approach is what lets them make sense of data and extract actionable insights.

3. Data gathered over time can be used for predictions that support decision making and lead to better results. Take business as an example: imagine you can train models more effectively and recommend products to your customers with more precision. Business becomes more direct and easier because you can understand the precise requirements of your customers.

4. When you have a large variety of data, it can be used for decision making. Consider a basic scenario: imagine your car has the intelligence to drive you to work. Wouldn’t that be amazing? This is how it works: the self-driving car collects data from sensors, radars, cameras and lasers to create a map of its surroundings. Based on this data, it uses advanced machine learning algorithms to decide when to slow down, when to speed up, when to overtake, where to turn and so on.

5. With the introduction of new technologies and smart devices, data usage is increasing exponentially. The term “Big Data” was introduced to describe data sets of this scale, and the field is continuously emerging and growing. Using tools that are developed regularly, big data helps organizations resolve complex issues related to IT, human resources and resource management efficiently and successfully.

6. Data science can be used in predictive analysis to help predict upcoming events. Weather forecasting is a good example of why this matters, and it makes it easy to appreciate the efforts of data scientists. Data from ships, aircraft, radars and satellites can be collected and analyzed to build models that forecast the weather and predict the occurrence of natural disasters.

7. Data science has also influenced the retail industry. As an example, in the past people had a close relationship with their local seller, who was able to fulfil their requirements in a personalized way. With the emergence and growth of supermarkets, this attention got lost. With the help of data analytics, however, sellers can connect with their clients again.

8. Data science helps organizations build this connection with their clients. With its help, organizations can develop a better and deeper understanding of how customers use their products.
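To make the idea of predictive analysis from point 6 a little more concrete, here is a minimal sketch in Python. It fits a straight line to past observations with ordinary least squares and extends it one step into the future. The monthly sales figures and the `fit_line` helper are made up for illustration; a real forecast would use far more data and a more capable model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: units sold in months 1 to 5.
months = [1, 2, 3, 4, 5]
units = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, units)

# Extend the fitted trend to month 6.
forecast = slope * 6 + intercept
print(f"Forecast for month 6: {forecast:.1f}")
```

Because the sample data grows by exactly 10 units per month, the fitted line continues that trend into month 6.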

Life Cycle of Data Science

Discovery:

Before you begin the project, it is important to understand the various specifications, requirements, priorities and required budget. You must possess the ability to ask the right questions. Here, you assess if you have the required resources present in terms of people, technology, time and data to support the project. In this phase, you also need to frame the business problem and formulate initial hypotheses (IH) to test.

Data preparation:

In this phase, you require an analytical sandbox in which you can perform analytics for the entire duration of the project. You need to explore, preprocess and condition data prior to modelling. Further, you will perform ETLT (extract, transform, load and transform) to get data into the sandbox.
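The ETLT steps described above can be sketched with nothing but the Python standard library. This is only an illustration under some assumptions: the raw CSV text, the `sales` table and the in-memory SQLite database standing in for the sandbox are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or a source system.
RAW_CSV = """customer,amount,region
alice, 120.50 ,north
bob,,south
carol,80,NORTH
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalise case, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records with a missing amount
        cleaned.append(
            (row["customer"].strip(), float(amount), row["region"].strip().lower())
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the sandbox database."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


# An in-memory SQLite database plays the role of the analytical sandbox.
sandbox = sqlite3.connect(":memory:")
load(sandbox, transform(extract(RAW_CSV)))

# Second transform, this time inside the sandbox: aggregate sales per region.
totals = sandbox.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The second transform runs inside the sandbox, which is the point of the extra "T": once the data is loaded, you can keep reshaping it with SQL without going back to the source.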

Model planning:

Here, you will determine the methods and techniques to draw the relationships between variables. These relationships will set the base for the algorithms which you will implement in the next phase. You will apply Exploratory Data Analysis (EDA) using various statistical formulas and visualization tools. Common tools for model planning are R, SAS and SQL analysis.
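As a small illustration of what EDA looks like in practice, the sketch below computes summary statistics for one variable and the Pearson correlation between two variables. It uses Python rather than the tools named above, and the advertising and sales figures are invented for the example.

```python
import statistics

# Hypothetical observations: advertising spend and the resulting sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Summary statistics describe a single variable.
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),
}

# Correlation describes the relationship between two variables.
r = pearson(ad_spend, sales)

print(summary)
print(f"correlation between ad spend and sales: {r:.2f}")
```

A correlation close to 1 or -1 suggests a strong linear relationship, which is a useful hint when choosing a model in the next phase.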

Model building:

In this phase, you will develop datasets for training and testing purposes. Here you need to consider whether your existing tools will suffice for running the models or whether you will need a more robust environment (such as fast, parallel processing). You will analyze various learning techniques like classification, association and clustering to build the model.
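Here is a minimal sketch of the training and testing idea, assuming a tiny hypothetical dataset with two classes. It splits the data and classifies each test point with a 1-nearest-neighbour rule, chosen only because it is short enough to write by hand. It covers classification only, not association or clustering, and a real project would normally rely on an established machine learning library.

```python
import math
import random

# Hypothetical labelled data: (feature_1, feature_2) -> class label.
DATA = [
    ((1, 1), "low"), ((2, 1), "low"), ((1, 2), "low"),
    ((2, 2), "low"), ((3, 2), "low"), ((2, 3), "low"),
    ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high"),
    ((9, 9), "high"), ((7, 8), "high"), ((8, 7), "high"),
]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict(train, point):
    """1-nearest-neighbour: return the label of the closest training point."""
    _, label = min(train, key=lambda item: math.dist(item[0], point))
    return label


train, test = train_test_split(DATA)

# Evaluate only on data the model has not seen during training.
correct = sum(predict(train, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Holding back part of the data for testing is what tells you whether the model generalizes or has simply memorized its training examples.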

Operationalize:

In this phase, you deliver final reports, briefings, code and technical documents. In addition, sometimes a pilot project is also implemented in a real-time production environment. This will provide you with a clear picture of the performance and other related constraints on a small scale before full deployment.

Communicate results:

Now it is important to evaluate whether you have been able to achieve the goal that you had planned in the first phase. So, in the last phase, you identify all the key findings, communicate them to the stakeholders and determine if the results of the project are a success or a failure based on the criteria developed in Phase 1.
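One way to make that success-or-failure decision explicit is to compare the model's metrics with a target agreed in Phase 1. The sketch below assumes a binary classifier; the labels, the predictions and the `TARGET_ACCURACY` threshold are all hypothetical.

```python
# Hypothetical success criterion agreed during the discovery phase.
TARGET_ACCURACY = 0.75

# Hypothetical ground truth and model predictions (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]


def evaluate(actual, predicted):
    """Compute accuracy, precision and recall for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


metrics = evaluate(actual, predicted)
success = metrics["accuracy"] >= TARGET_ACCURACY

print(metrics)
print("Project goal met" if success else "Project goal not met")
```

Reporting precision and recall alongside accuracy gives stakeholders a fuller picture, since accuracy alone can look good even when one class is predicted poorly.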

Summary

While data science is a vast subject, being an aggregate of several technologies and disciplines, it is possible to acquire these skills with the right approach. In the end, data science is a robust field that best fits people who have a knack for experimentation and problem-solving. With its large number of applications, data science has become one of the most versatile careers.

Aditya Ranjan Behera

A student in PG Diploma in Data Science at IIIT, Bangalore with an interest in data analysis, ETL, Machine learning and business problem-solving.