Python For Data Science
“Everyone knows that any scripting language shootout that doesn’t show Python as the best language is faulty by design” — Max. M
Python is a first choice for many programmers for several reasons: the language is easy to read, pleasant to work with and relatively simple to learn. It also has a large community, on Stack Overflow and elsewhere, and plenty of learning resources are available.
Python also plays an important role in data careers. Learning Python for data science will help you pick up some useful skills.
Let’s grab those skills.
Before getting those skills, we have to know what Python is.
Python is an interpreted, high-level, general-purpose, dynamically typed programming language. It is relatively simple to learn and use because its code is clean and easy to read, which is one reason so many programmers are familiar with it.
History of Python
Guido van Rossum conceived Python in the late 1980s at CWI in the Netherlands. He began implementing it in December 1989, and the first public release followed in 1991. Python was designed as a successor to the ABC programming language, which was itself inspired by SETL, and it was capable of handling exceptions and interfacing with the Amoeba OS. For the full details, see the Wikipedia article on Python.
Why Python for Data Science
Python has become a major force in AI, machine learning and data science, and it can open many doors for your career. Apart from AI, ML and data science, Python is also widely used for web development and GUIs.
Python is used for data science because it is easy to read and can handle very complex problems efficiently. It also has libraries such as NumPy, pandas and scikit-learn (sklearn), along with inbuilt functions, that provide statistical tools for handling data sets and for visualisation. This makes data science attainable for people coming from backgrounds like business and marketing. Most data scientists do not need deep programming knowledge in areas such as cryptography or memory-leak debugging; they need to write clean, logical code, and Python makes that easy, which is why it suits data analysis so well.
Now let’s get to know Python better.
Basic Data Types/structures and Inbuilt Functions in Python
Before going further with Python, we have to learn some of the data types, data structures and functions it provides.
First, we will discuss inbuilt data types:
1) Integer (int): It holds whole numbers such as 1, 2, 3, 4, 5, positive or negative. An int can be as long as needed; its size is limited only by available memory.
2) Float (float): It holds continuous values, meaning numbers written with a decimal point. Ex: 1.2, 34.6, 89.223
3) Complex numbers (complex): A complex number combines a real part and an imaginary part, such as 2+3j.
4) String (str): A string is a sequence of characters. It may be delimited by single, double or triple quotes. In Python, anything inside quote delimiters is a string.
5) Boolean (bool): A Boolean holds only one of two values, True or False.
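As a small illustration of these types, the sketch below creates one value of each and asks Python for its type. The variable names are made up for this example.

```python
# One example value for each basic inbuilt type (names are illustrative).
whole = 12345678901234567890   # int: whole number of any size
ratio = 34.6                   # float: number with a decimal point
z = 2 + 3j                     # complex: real part 2, imaginary part 3
name = "data science"          # str: sequence of characters
is_ready = True                # bool: True or False

values = (whole, ratio, z, name, is_ready)

# type() reports the type of each value.
for value in values:
    print(type(value).__name__, value)

type_names = [type(value).__name__ for value in values]
```

Running it prints the type name next to each value, which is a quick way to check what kind of data you are holding.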
Now we will discuss inbuilt data structures:
1) List: A list can store values of any data type, such as int, float, complex numbers and strings. It is enclosed in square brackets. It is an ordered sequence and it is mutable, which means we can change its contents after creating it. (With append() and pop() a list can also be used as a LIFO stack, but it is not limited to that.) Ex: [1.2, 3, "a"]
2) Dictionary: A dictionary can also store any data type, but it stores values as key:value pairs. It is enclosed in curly brackets. It is mutable, but its keys must be immutable (hashable) values. Ex: {"a": [1, 2, 3, "Static"], "b": 23, "c": 12.3}
3) Set: Like a list, a set can store multiple values, but it keeps only unique values and has no defined order. It is also enclosed in curly brackets. Ex: {1, 2, 3, 4, "set"}
4) Tuple: A tuple is similar to a list, but it is immutable. It is enclosed in parentheses. Ex: (1, 2, 3, 33, 4, 2, "list", 4.2)
The Python interpreter provides a number of inbuilt functions. Let’s discuss them:
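The difference between mutable and immutable structures is easier to see in code. Here is a minimal sketch; the variable names are chosen only for this example.

```python
# List: ordered and mutable, so items can be added or replaced.
items = [1.2, 3, "a"]
items.append("b")
items[0] = 9.9

# Dictionary: mutable mapping from keys to values.
record = {"a": [1, 2, 3, "Static"], "b": 23, "c": 12.3}
record["d"] = "new"

# Set: duplicates are dropped automatically.
unique = {1, 2, 2, 3, 3, 3}

# Tuple: immutable, so assigning to an element raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(items, record, unique, point, tuple_changed)
```

The list and the dictionary accept the changes, the set silently keeps one copy of each value, and the tuple refuses the assignment.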
1) Math
abs(): Returns the absolute value of a number
divmod(): Returns quotient and the remainder of integer division
max(): Returns the largest of the given arguments or items in an iterable
min(): Returns the smallest of the given arguments or items in an iterable
pow(): Raises a number to a power
round(): Rounds a floating-point value
sum(): Sums the items of an iterable
2) Type Conversion
ascii(): Returns a string containing a printable representation of an object
bin(): Converts an integer to a binary string
bool(): Converts an argument to a Boolean value
chr(): Returns the character (a one-character string) for the given integer code point
complex(): Returns a complex number constructed from arguments
float(): Returns a floating-point object constructed from a number or string
hex(): Converts an integer to a hexadecimal string
int(): Returns an integer object constructed from a number or string
oct(): Converts an integer to an octal string
ord(): Returns an integer representation of a character
repr(): Returns a string containing a printable representation of an object
str(): Returns a string version of an object
type(): Returns the type of an object or creates a new type object
3) Iterables and Iterators
all(): Returns True if all elements of an iterable are true
any(): Returns True if any elements of an iterable are true
enumerate(): Returns an iterator of tuples containing indices and values from an iterable
filter(): Filters elements from an iterable
iter(): Returns an iterator object
len(): Returns the length of an object
map(): Applies a function to every item of an iterable
next(): Retrieves the next item from an iterator
range(): Generates a range of integer values
reversed(): Returns a reverse iterator
slice(): Returns a slice object
sorted(): Returns a sorted list from an iterable
zip(): Creates an iterator that aggregates elements from iterables
4) Composite Data Type
bytearray(): Creates and returns an object of the bytearray class
bytes(): Creates and returns a bytes object (similar to bytearray, but immutable)
dict(): Creates a dict object
frozenset(): Creates a frozenset object
list(): Creates a list object
object(): Creates a new featureless object
set(): Creates a set object
tuple(): Creates a tuple object
5) Classes, Attribute and Inheritance
classmethod(): Returns a class method for a function
delattr(): Deletes an attribute from an object
getattr(): Returns the value of a named attribute of an object
hasattr(): Returns True if an object has a given attribute
isinstance(): Determines whether an object is an instance of a given class
issubclass(): Determines whether a class is a subclass of a given class
property(): Returns a property attribute, which manages access to a value on a class
setattr(): Sets the value of a named attribute of an object
super(): Returns a proxy object that delegates method calls to a parent or sibling class
6) Input/ Output
format(): Converts a value to a formatted representation
input(): Reads input from the console
open(): Opens a file and returns a file object
print(): Prints to a text stream or the console
7) Variable References and Scope
dir(): Returns a list of names in current local scope or a list of object attributes
globals(): Returns a dictionary representing the current global symbol table
id(): Returns the identity of an object
locals(): Updates and returns a dictionary representing the current local symbol table
vars(): Returns __dict__ attribute for a module, class, or object
8) Miscellaneous
callable(): Returns True if an object appears callable
compile(): Compiles source into a code or AST object
eval(): Evaluates a Python expression
exec(): Implements dynamic execution of Python code
hash(): Returns the hash value of an object
help(): Invokes the built-in help system
memoryview(): Returns a memory view object
staticmethod(): Returns a static method for a function
__import__(): Invoked by the import statement
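To make a few of these functions concrete, here is a short sketch that uses some of them on a small list. The list of scores is invented for the example.

```python
scores = [72, 88, 95, 61]  # made-up sample data

# Math
quotient, remainder = divmod(17, 5)   # integer division and remainder together
total = sum(scores)
highest = max(scores)
rounded = round(3.14159, 2)

# Type conversion
as_int = int("42")
as_hex = hex(255)
code_point = ord("A")
letter = chr(97)

# Iterables and iterators (wrapped in list() so the results can be printed)
indexed = list(enumerate(["x", "y"]))
squares = list(map(lambda n: n * n, range(4)))
evens = list(filter(lambda n: n % 2 == 0, range(6)))
pairs = list(zip("ab", [1, 2]))
ordered = sorted(scores)

print(quotient, remainder, total, highest, rounded)
print(as_int, as_hex, code_point, letter)
print(indexed, squares, evens, pairs, ordered)
```

Note that map(), filter(), zip() and enumerate() return iterators, so the sketch converts them to lists before printing.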
Apart from these inbuilt functions, we can define our own functions with the def keyword and our own classes with the class keyword.
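A user-defined function and class can be sketched as follows. The names mean and Dataset are hypothetical, picked only to show the syntax.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


class Dataset:
    """A tiny illustrative container for a named list of numbers."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def summary(self):
        # Reuse the function defined above inside a method.
        return {
            "name": self.name,
            "count": len(self.values),
            "mean": mean(self.values),
        }


sample = Dataset("heights", [150, 160, 170])
report = sample.summary()
print(report)
```

The function is called like any inbuilt one, and the class bundles data together with the operations that work on it.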
Essential Python Libraries for Data Science
In data science, we have to do data processing, data mining, visualisation and analysis. There are many actively maintained Python libraries for data science, machine learning and artificial intelligence that help with this work. Let’s go over some of the leading libraries in this field.
1) Pandas: pandas is a Python package that provides high-level data structures and tools well suited to data wrangling and data munging. These tools are designed for data analysis, data manipulation, aggregation and visualization. pandas is built on NumPy, so it works smoothly with NumPy-centric code while adding data structures with labelled axes. With pandas, it is easy to handle missing data, and its labelled, aligned data structures help prevent common errors caused by misaligned data.
2) SciPy: The SciPy library supports linear algebra, integration, optimization, statistics and other tasks used frequently in data science. It is user friendly and works directly on N-dimensional NumPy arrays, because its main functionality is built upon NumPy. Through its task-specific submodules, it provides efficient numerical routines such as numerical integration and optimization.
3) NumPy: NumPy stands for “Numerical Python”. It offers fast, precompiled functions for numerical routines, which makes it much easier to work with large multi-dimensional arrays and matrices. Operations apply to a whole array at once, so we don’t need to loop over a data set to apply standard mathematical operations to each element. However, it does not provide powerful statistical analysis capabilities on its own.
4) Matplotlib: Matplotlib is a visualization library that makes it quick and easy to generate graphs, plots and charts from a data set. It is very helpful in univariate and bivariate exploratory data analysis. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt or GTK.
5) Scikit-learn: Scikit-learn is also known as sklearn. This library focuses on machine learning and is built on NumPy and SciPy. It provides many machine learning algorithms and helps the user quickly apply popular algorithms to data sets. It has the standard tools for ML tasks such as classification, clustering and regression.
6) Seaborn: Seaborn is also a data visualisation library, and it is based on Matplotlib. It is often more convenient than Matplotlib for statistical plots because it needs less code.
7) Scrapy: Scrapy is a user-friendly Python library for large-scale web scraping (extracting data from public websites) and for storing the results in a preferred format.
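As a rough sketch of what two of these libraries look like in use, the example below applies an operation to a whole NumPy array without a loop and fills a missing value with pandas. It assumes NumPy and pandas are already installed, and the data is invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies to every element at once, no explicit loop.
values = np.array([1.0, 2.0, 3.0, 4.0])
doubled = values * 2
mean_value = values.mean()

# pandas: a small table with one missing entry (NaN) in the "sales" column.
frame = pd.DataFrame(
    {"city": ["A", "B", "C"], "sales": [10.0, np.nan, 30.0]}
)
missing_count = int(frame["sales"].isna().sum())

# Replace the missing value with the mean of the values that are present.
filled = frame["sales"].fillna(frame["sales"].mean())

print(doubled, mean_value)
print(missing_count, filled.tolist())
```

Filling with the column mean is only one possible choice; depending on the data, dropping the row or using another strategy may be more appropriate.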
Starting with Python
To program with Python, we first have to install it.
Installation guide for Python:
Steps to install Python:
1) First, download Python from python.org.
2) After the download, install Python and add it to the path.
3) Steps to add Python to the path on Windows (if it was not added during installation):
1) Open the Run box and enter sysdm.cpl
2) This opens the System Properties window. Go to Advanced and click Environment Variables
3) In the System variables list, find the Path variable and click Edit
Then add the location of the folder that contains python.exe.
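Once Python is installed, one way to confirm that the expected interpreter is being used is to run a tiny script like the sketch below. It only uses the standard sys module; the variable name is illustrative.

```python
import sys

# Build a "major.minor.micro" version string from the running interpreter.
version_text = "{}.{}.{}".format(*sys.version_info[:3])

print("Python version:", version_text)
print("Interpreter location:", sys.executable)
```

If the printed location is not the installation you expected, the path setting is the first thing to check.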
Jupyter Notebook
To program in Python, it helps to have an interactive environment that allows coding, data exploration and debugging in the web browser. Jupyter Notebook is a popular choice for this. It runs in your default web browser and allows you to mix code, graphics and text. You could even say it works like a content management system, as you can write a blog post such as this one in a Jupyter Notebook.
Installation guide for Jupyter Notebook (before installing, make sure that you have an active internet connection):
If you are using Anaconda
1) Open the Anaconda Prompt and type “conda install -c conda-forge notebook”
2) Then, in the same prompt, type “jupyter notebook”
If you are using pip
1) Open the Run box and type “cmd”
This will open the command prompt
2) Then type “pip install notebook”
This will install Jupyter Notebook
3) Then, in the command prompt, type “jupyter notebook”
This will open Jupyter Notebook in your default browser. For more details, see the official Jupyter documentation.
Summary
This beginner’s guide has just scratched the surface of Python for data science. Choosing a language to learn, especially if it’s your first, is an important decision. I hope this blog helps you get started with Python.