Python For Data Science

Aditya Ranjan Behera
9 min read · Jan 5, 2021


“Everyone knows that any scripting language shootout that doesn’t show Python as the best language is faulty by design.” (Max M.)

Python is the first choice of many programmers for several reasons: the language is easy to read, easy to work with and relatively simple to learn. Python also has a large community, for example on Stack Overflow, and plenty of learning resources are available.

Python also plays an important role in data careers. Learning Python for data science will help you pick up some useful skills.

Let’s grab those skills.

Before getting to those skills, we have to know what Python is.

Python is an interpreted, high-level, general-purpose, dynamically typed programming language. It is relatively simple to learn and use because the code is clean and easy to read, which is one reason so many programmers are familiar with it.

History of Python

Python was conceived in the late 1980s by Guido van Rossum at CWI in the Netherlands. Implementation began in December 1989 and the first version was released in 1991. It was designed as a successor to the ABC programming language, which was itself inspired by SETL, and it was capable of handling exceptions and interfacing with the Amoeba operating system. For all the details, see the Wikipedia article on Python.

Why Python for Data Science

Python has become a major force in the fields of AI, machine learning and data science, and it can open many doors for your career. Apart from AI, ML and data science, Python is also widely used for web development and GUIs.

The reason for using Python in data science is that it is easy to read and can handle very complex problems efficiently. It also has libraries such as NumPy, pandas and scikit-learn (sklearn), along with inbuilt functions, which provide statistical tools for handling data sets and for visualisation. This makes data science attainable for people coming from backgrounds like business and marketing. Most data scientists do not need deep programming knowledge of topics like cryptography or memory leaks. They need to write clean, logical code, and Python lets them do that, which is why it suits data analysis so well.

Now let’s understand Python in a better way.

Basic Data Types/structures and Inbuilt Functions in Python

Before learning Python in depth, we have to learn the data types, data structures and functions that come built into the language.

First, we will discuss the inbuilt data types:

1) Integer (int): It contains whole numbers, positive or negative, like 1, 2, 3, 4, 5. An integer can be as long as needed.

2) Float (float): It holds continuous values, meaning numbers written with a decimal point. Ex: 1.2, 34.6, 89.223

3) Complex numbers (complex): A complex number combines a real part and an imaginary part, like 2+3j.

4) String (str): A string is a sequence of characters. It can be delimited by single, double or triple quotes. In Python, anything inside quote delimiters is a string.

5) Boolean (bool): A Boolean holds only one of two values, True or False.
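As a quick illustration of the five inbuilt types listed above, the following sketch assigns one value of each and looks up its type. The variable names are arbitrary and chosen only for this example.

```python
# One value of each inbuilt type described above (names are illustrative).
count = 12345678901234567890      # int: whole number with no fixed size limit
price = 34.6                      # float: number with a decimal point
z = 2 + 3j                        # complex: real part 2, imaginary part 3
name = "data science"             # str: text in single, double or triple quotes
is_ready = True                   # bool: either True or False

samples = (count, price, z, name, is_ready)
type_names = [type(value).__name__ for value in samples]

for type_name, value in zip(type_names, samples):
    print(type_name, value)
```

The type() function used here is one of the inbuilt functions covered later in this post.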

Now we will discuss inbuilt data structures:

1) List: A list can store values of any data type, such as int, float, complex numbers and strings. It is enclosed in square brackets. It is ordered and mutable, which means we can change it after creating it. Its append() and pop() methods also let it be used as a LIFO stack. Ex: [1.2, 3, "a"]

2) Dictionary: A dictionary can also store any data type, but it stores values as key:value pairs. It is enclosed in curly brackets. It is mutable, but its keys must be immutable (hashable) values. Ex: {"a": [1, 2, 3, "Static"], "b": 23, "c": 12.3}

3) Set: Like a list, a set can store values, but only unique ones, and it does not keep them in any particular order. It is also enclosed in curly brackets. Ex: {1, 2, 3, 4, "set"}

4) Tuple: A tuple is similar to a list, but it is immutable. It is enclosed in parentheses. Ex: (1, 2, 3, 33, 4, 2, "list", 4.2)
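The four structures above can be tried out with the same example values. This is only a small sketch, and the variable names are made up for illustration.

```python
# list: ordered and mutable
items = [1.2, 3, "a"]
items.append("new")            # add to the end
last = items.pop()             # remove from the end (stack-like use)

# dict: key -> value pairs; values can be added or changed later
record = {"a": [1, 2, 3, "Static"], "b": 23, "c": 12.3}
record["d"] = "added"

# set: duplicates are dropped automatically
unique = {1, 2, 2, 3, 3, 4, "set"}

# tuple: like a list, but it cannot be changed after creation
point = (1, 2, 3, 33, 4, 2, "list", 4.2)
try:
    point[0] = 99
except TypeError as err:
    tuple_error = str(err)     # Python refuses the assignment

print(items, record, unique, point, tuple_error, sep="\n")
```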

The Python interpreter provides a number of inbuilt functions. Let’s discuss them:

1) Math

abs(): Returns the absolute value of a number

divmod(): Returns quotient and the remainder of integer division

max(): Returns the largest of the given arguments or items in an iterable

min(): Returns the smallest of the given arguments or items in an iterable

pow(): Raises a number to a power

round(): Rounds a floating-point value

sum(): Sums the items of an iterable
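A short sketch of the math functions listed above, using a made-up list of numbers:

```python
values = [3, -7, 12, 5]

absolute = abs(-7)                        # 7
quotient, remainder = divmod(17, 5)       # 3 and 2, because 17 = 3 * 5 + 2
largest = max(values)                     # 12
smallest = min(values)                    # -7
power = pow(2, 10)                        # 1024
rounded = round(3.14159, 2)               # 3.14
total = sum(values)                       # 13

print(absolute, quotient, remainder, largest, smallest, power, rounded, total)
```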

2) Type Conversion

ascii(): Returns a string containing a printable representation of an object

bin(): Converts an integer to a binary string

bool(): Converts an argument to a Boolean value

chr(): Returns the character (as a string) for the given integer code point

complex(): Returns a complex number constructed from arguments

float(): Returns a floating-point object constructed from a number or string

hex(): Converts an integer to a hexadecimal string

int(): Returns an integer object constructed from a number or string

oct(): Converts an integer to an octal string

ord(): Returns an integer representation of a character

repr(): Returns a string containing a printable representation of an object

str(): Returns a string version of an object

type(): Returns the type of an object or creates a new type object
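To make the conversion functions above more concrete, here is a small sketch with arbitrary example values:

```python
as_int = int("42")                 # string -> integer 42
as_float = float("3.5")            # string -> float 3.5
as_text = str(2021)                # integer -> string "2021"
as_complex = complex(2, 3)         # (2+3j)

falsy = bool(0)                    # False: zero counts as false
truthy = bool("text")              # True: a non-empty string counts as true

binary = bin(10)                   # "0b1010"
octal = oct(10)                    # "0o12"
hexadecimal = hex(255)             # "0xff"

letter = chr(65)                   # "A"
code_point = ord("A")              # 65

quoted = repr("hi")                # "'hi'" (the quotes are part of the result)
kind = type(as_float)              # <class 'float'>

print(as_int, as_float, as_text, as_complex, binary, octal, hexadecimal)
```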

3) Iterables and Iterators

all(): Returns True if all elements of an iterable are true

any(): Returns True if any element of an iterable is true

enumerate(): Returns an iterator of tuples containing indices and values from an iterable

filter(): Filters elements from an iterable

iter(): Returns an iterator object

len(): Returns the length of an object

map(): Applies a function to every item of an iterable

next(): Retrieves the next item from an iterator

range(): Generates a range of integer values

reversed(): Returns a reverse iterator

slice(): Returns a slice object

sorted(): Returns a sorted list from an iterable

zip(): Creates an iterator that aggregates elements from iterables
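The functions in this group are the ones used most often when looping over data. The sketch below applies them to two small made-up lists. Several of them return lazy iterators, so list() or dict() is used to see the results.

```python
scores = [72, 45, 90, 58]
names = ["asha", "ben", "chen", "dev"]

all_positive = all(s > 0 for s in scores)           # True
any_below_50 = any(s < 50 for s in scores)          # True, because of 45
numbered = list(enumerate(names, start=1))          # [(1, "asha"), (2, "ben"), ...]
passed = list(filter(lambda s: s >= 50, scores))    # [72, 90, 58]
doubled = list(map(lambda s: s * 2, scores))        # [144, 90, 180, 116]
by_name = dict(zip(names, scores))                  # {"asha": 72, "ben": 45, ...}
ranked = sorted(scores, reverse=True)               # [90, 72, 58, 45]
backwards = list(reversed(names))                   # ["dev", "chen", "ben", "asha"]
evens = list(range(0, 10, 2))                       # [0, 2, 4, 6, 8]
first_two = scores[slice(0, 2)]                     # same as scores[0:2]

it = iter(scores)                                   # explicit iterator
first = next(it)                                    # 72
second = next(it)                                   # 45

print(len(scores), numbered, by_name, ranked)
```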

4) Composite Data Type

bytearray(): Creates and returns an object of the bytearray class

bytes(): Creates and returns a bytes object (similar to bytearray, but immutable)

dict(): Creates a dict object

frozenset(): Creates a frozenset object

list(): Creates a list object

object(): Creates a new featureless object

set(): Creates a set object

tuple(): Creates a tuple object
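The constructors above mostly convert one kind of collection into another. A minimal sketch, with illustrative values only:

```python
letters = list("data")                  # ["d", "a", "t", "a"]
unique_letters = set(letters)           # {"d", "a", "t"}: duplicates removed
frozen = frozenset(letters)             # same contents, but immutable
pairs = dict([("x", 1), ("y", 2)])      # {"x": 1, "y": 2}
coords = tuple([1, 2, 3])               # (1, 2, 3)

raw = bytes([72, 105])                  # b"Hi": immutable sequence of bytes
buf = bytearray(raw)                    # mutable copy of the same bytes
buf[1] = 111                            # 111 is the code for "o"

# A frozenset is hashable, so it can be used as a dictionary key.
lookup = {frozen: "letters of 'data'"}

print(letters, unique_letters, pairs, coords, raw, buf)
```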

5) Classes, Attribute and Inheritance

classmethod(): Returns a class method for a function

delattr(): Deletes an attribute from an object

getattr(): Returns the value of a named attribute of an object

hasattr(): Returns True if an object has a given attribute

isinstance(): Determines whether an object is an instance of a given class

issubclass(): Determines whether a class is a subclass of a given class

property(): Returns a property attribute (a managed attribute) for a class

setattr(): Sets the value of a named attribute of an object

super(): Returns a proxy object that delegates method calls to a parent or sibling class
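These functions are easier to follow on a concrete class. The Shape and Circle classes below are hypothetical and exist only to exercise the functions listed above.

```python
class Shape:
    def __init__(self, name):
        self.name = name


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")       # run Shape.__init__ first
        self.radius = radius

    @property
    def diameter(self):                  # read like an attribute: c.diameter
        return 2 * self.radius

    @classmethod
    def unit(cls):                       # alternative constructor
        return cls(1)


c = Circle.unit()
is_shape = isinstance(c, Shape)          # True: a Circle is also a Shape
is_sub = issubclass(Circle, Shape)       # True

setattr(c, "colour", "red")              # same effect as c.colour = "red"
colour = getattr(c, "colour")            # "red"
weight = getattr(c, "weight", None)      # None: the default for a missing attribute
delattr(c, "colour")                     # same effect as del c.colour
still_has_colour = hasattr(c, "colour")  # False after deletion

print(c.name, c.diameter, is_shape, is_sub, colour, weight, still_has_colour)
```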

6) Input/ Output

format(): Converts a value to a formatted representation

input(): Reads input from the console

open(): Opens a file and returns a file object

print(): Prints to a text stream or the console
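A small sketch of the input/output functions above. It writes to a temporary file whose name, demo.txt, is chosen only for this example. The input() call is left as a comment because it would pause and wait for keyboard input.

```python
import os
import tempfile

two_decimals = format(3.14159, ".2f")     # "3.14"
right_aligned = format(42, ">6")          # "    42" (padded to width 6)

# name = input("Your name: ")             # would wait for the user to type

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# print() can write to a file as well as to the console.
with open(path, "w", encoding="utf-8") as f:
    print("pi is about", two_decimals, file=f)

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content, end="")
```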

7) Variable References and Scope

dir(): Returns a list of names in current local scope or a list of object attributes

globals(): Returns a dictionary representing the current global symbol table

id(): Returns the identity of an object

locals(): Updates and returns a dictionary representing the current local symbol table

vars(): Returns __dict__ attribute for a module, class, or object
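The scope-related functions are mainly useful for inspecting what names exist at a given point. The Point class and summarise function below are hypothetical helpers for this sketch.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def summarise(values):
    total = sum(values)
    return sorted(locals())         # names visible inside this function


p = Point(1, 2)
q = p                               # a second name for the same object

attributes = vars(p)                # {"x": 1, "y": 2}
has_x = "x" in dir(p)               # True: dir() lists attribute names
same_object = id(p) == id(q)        # True: both names refer to one object
local_names = summarise([1, 2, 3])  # ["total", "values"]
global_table = globals()            # dictionary of module-level names

print(attributes, has_x, same_object, local_names, type(global_table).__name__)
```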

8) Miscellaneous

callable(): Returns True if an object appears callable

compile(): Compiles source into a code or AST object

eval(): Evaluates a Python expression

exec(): Implements dynamic execution of Python code

hash(): Returns the hash value of an object

help(): Invokes the built-in help system

memoryview(): Returns a memory view object

staticmethod(): Returns a static method for a function

__import__(): Invoked by the import statement

Apart from these inbuilt functions, we can define our own functions with the def keyword and our own classes with the class keyword.
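As a minimal sketch of defining your own function and class, the example below introduces a hypothetical mean_of function and Dataset class. It also uses callable() and eval() from the miscellaneous group.

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


class Dataset:
    """A tiny container for a named list of numbers (illustrative only)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        return mean_of(self.values)


ages = Dataset("ages", [21, 25, 32])
average = ages.mean()                    # 26.0

function_is_callable = callable(mean_of) # True: functions can be called
number_is_callable = callable(average)   # False: a float cannot be called
evaluated = eval("2 + 3 * 4")            # 14; only use eval() on trusted text

print(ages.name, average, function_is_callable, number_is_callable, evaluated)
```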

Essential Python Libraries for Data Science

In data science, we have to do data processing, data mining, visualisation and analysis. There are many actively maintained Python libraries for data science, machine learning and artificial intelligence which help us in this work. Let’s go over some of the leading libraries in this field.

1) Pandas: It is a Python package that provides high-level data structures and tools which are well suited to data wrangling and data munging. These tools are designed for data analysis, data manipulation, aggregation and visualization. Pandas is built on NumPy, so it works easily with NumPy-centric applications while adding data structures with labelled axes. With pandas it is easy to handle missing data, and the labelled structures help prevent common errors.

2) SciPy: The SciPy package supports linear algebra, integration, optimization, statistics and other tasks frequently used in data science. It is user friendly and works directly on NumPy’s N-dimensional arrays, because its main functionality is built upon NumPy. Through its specific submodules, it provides efficient numerical routines such as numerical integration and optimization.

3) NumPy: NumPy is short for “Numerical Python”. It offers fast, precompiled functions for numerical routines, which makes it much easier to work with large multi-dimensional arrays and matrices. We don’t need to apply standard mathematical operations element by element in a loop, because one expression can operate on an entire data set. However, it offers only basic statistical functions rather than powerful statistical analysis capabilities.

4) Matplotlib: Matplotlib is a visualization library which makes it quick and easy to generate graphs, plots and charts from a data set. It is highly helpful in univariate and bivariate exploratory data analysis. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.

5) Scikit-learn: Scikit-learn is also known as sklearn. It is a machine learning library built on NumPy and SciPy. It provides many machine learning algorithms and helps the user quickly apply popular algorithms to data sets. It has the standard tools for ML tasks like classification, clustering and regression.

6) Seaborn: Seaborn is also a data visualisation library, and it is based on Matplotlib. It is often more user friendly than Matplotlib because common statistical plots need less code.

7) Scrapy: Scrapy is a user-friendly Python library that helps with large-scale web scraping (extracting data from public websites) and with storing the results in a preferred format.
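To give a feel for the first of these libraries in practice, here is a minimal sketch of NumPy’s loop-free arithmetic and pandas’ missing-data handling. It assumes both packages are already installed (for example with “pip install numpy pandas”), and the data values and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: one vectorized expression replaces an explicit Python loop.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
heights_m = heights_cm / 100               # divides every element at once
mean_height = heights_m.mean()

# pandas: a labelled table with one missing value (NaN).
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [10.0, np.nan, 30.0],
})
# Replace the missing score with the mean of the known scores (10 and 30).
filled = df["score"].fillna(df["score"].mean())

print(heights_m)
print(filled.tolist())
```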

Starting with Python

To program in Python, we first have to install it.

Installation guide for Python:

Steps to install Python:

1) First, download Python from python.org.

2) After the download finishes, install Python and add it to the PATH.

3) Steps to add Python to the PATH on Windows (if it was not added during installation):

   1) Open the Run box and enter sysdm.cpl

   2) This opens the System Properties window. Go to the Advanced tab and click Environment Variables.

   3) Under System variables, find the Path variable and click Edit.

   4) Then add the location of the folder that contains python.exe.
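Once the steps above are done, one way to confirm the installation is to start Python and run a few lines like the following. This is only a sketch. The shutil.which() lookup checks whether a command named python or python3 can be found on the PATH, and it may return nothing on systems that use a different launcher name.

```python
import shutil
import sys

version = sys.version_info                 # version of the running interpreter
interpreter_path = sys.executable          # where this interpreter is installed
on_path = shutil.which("python") or shutil.which("python3")

print("Python version:", f"{version.major}.{version.minor}.{version.micro}")
print("Interpreter location:", interpreter_path)
print("Found on PATH:", on_path)
```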

Jupyter Notebook

For programming in Python, it helps to have an interactive environment that allows coding, data exploration and debugging in the web browser. Jupyter Notebook is a popular choice for this. It runs in your default web browser and allows you to mix code, graphics and text. You could even say that it works like a content management system, since you can also write a blog post such as this one in a Jupyter Notebook.

Installation guide for Jupyter Notebook (before installing, make sure that you have an active internet connection):

If you are using Anaconda

1) Open the Anaconda Prompt and type “conda install -c conda-forge notebook”

2) Then, in the same prompt, type “jupyter notebook”

If you are using pip

1) Open the Run box and type “cmd”

This will open the command prompt

2) Then type “pip install notebook”

This will install Jupyter Notebook

3) Then, in the command prompt, type “jupyter notebook”

This will open Jupyter Notebook in your default browser. For further questions, see the Jupyter documentation.
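If you are unsure whether the installation worked, a quick check from Python is sketched below. It assumes the package was installed under the name notebook, as in the commands above, and it only reports whether Python can find that package.

```python
import importlib.util

# find_spec() returns None when a package cannot be found by this interpreter.
notebook_installed = importlib.util.find_spec("notebook") is not None

print("notebook package found:", notebook_installed)
```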

Summary

This beginner’s guide has just scratched the surface of Python for data science. Choosing a language to learn, especially if it’s your first, is an important decision. I hope this blog helps you get started with Python.


Aditya Ranjan Behera

A student in PG Diploma in Data Science at IIIT, Bangalore with an interest in data analysis, ETL, Machine learning and business problem-solving.