Pandas Basics: Machine Learning in Python

Dec 31, 2020

Continuing the series “Machine Learning in Python”, we turn to the next most commonly used Python library: Pandas. In the next few minutes, we will cover the basics of the Pandas library and how to get set up to explore the vast world of data.

“In the future, I think that programming languages are going to diminish in importance relative to data itself and common computational libraries.”
— Wes McKinney (Creator — Pandas)

Pandas: A tool for Data Analysis in Python

The name Pandas is derived from the term “panel data” and is also a play on the phrase “Python data analysis”. Pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
Pandas does have its drawbacks: because it holds data in memory, it tends to slow down and run for a long time on datasets larger than roughly 1 GB. Even so, it is widely used in the data science community for processing small to fairly large datasets.

Pandas is built around two data structures, the DataFrame and the Series. You either work with them directly or convert raw data into them first.

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.

Each column in a DataFrame is a Series. A pandas Series has no column labels, as it is just a single column of a DataFrame. A Series does have row labels.
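
To make the relationship between the two structures concrete, here is a minimal sketch. The table, its column names and its values are made up for illustration only.

```python
import pandas as pd

# Hypothetical example data: two columns of different types
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 47]})

# Selecting a single column returns a Series
ages = df["age"]

print(type(ages).__name__)  # Series
print(ages.name)            # age (the Series keeps the column's name)
print(list(ages.index))     # [0, 1, 2] (row labels shared with the DataFrame)
```

The Series carries the row labels of the DataFrame it came from, which is why operations between columns line up row by row.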

With the background covered, let's get straight to the hands-on part.

Working with Pandas: How to get set up!

Like any other library, Pandas first has to be installed:

pip install pandas

For additional help: Install Pandas

Import Pandas into your .py project:

import pandas
or
import pandas as pd
or
import pandas as <alias>

The examples in this post assume the conventional alias pd.

Pandas gives you three different ways of dealing with data:

Convert a Python list, dictionary or NumPy array to a Pandas DataFrame

import numpy as np
import pandas as pd

# List to DataFrame
lst = ["A", "B", "C"]
df = pd.DataFrame(lst)

# Dictionary to DataFrame
dct = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(dct)

# NumPy array to DataFrame
data = np.array([[5.8, 2.8], [6.0, 2.2]])
df = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]})

Open a local file with Pandas. This is usually a CSV file, but it could also be a delimited text file, an Excel file, etc.

pd.read_csv("../data_folder/data.csv")
or
pd.read_<filetype>()
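
If you want to try the reader without a file on disk, the sketch below parses delimited text from memory. The semicolon-separated content and its column names are invented for this example; `io.StringIO` simply stands in for a real file path.

```python
import io

import pandas as pd

# Stand-in for a file on disk: made-up, semicolon-delimited text
csv_text = "id;city;temp\n1;Oslo;4.5\n2;Cairo;29.0\n"

# sep tells Pandas which character separates the columns
df = pd.read_csv(io.StringIO(csv_text), sep=";")

print(df.shape)          # (2, 3)
print(list(df.columns))  # ['id', 'city', 'temp']
```

With a real file you would pass its path instead of the `io.StringIO` object and keep the same `sep` argument.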

Open a remote file, such as a CSV or JSON file on a website, through its URL, or read from a SQL table or database

# Reading from a raw URL
url = "https://raw.githubusercontent.com/.../data.csv"
c = pd.read_csv(url)

# Reading from a SQL database (assumes my_table already exists in it)
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
pd.read_sql("SELECT * FROM my_table;", engine)        # accepts a query or a table name
pd.read_sql_table('my_table', engine)                 # reads a whole table
pd.read_sql_query("SELECT * FROM my_table;", engine)  # accepts a query only
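
For a version you can run end to end, the following sketch uses Python's built-in `sqlite3` module instead of SQLAlchemy, which is a substitution made here for convenience. The table contents are made up, and the table is created first so that the query has something to read.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical table with made-up rows
source = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.7, 0.9]})
source.to_sql("my_table", conn, index=False)

# Run a query and get the result back as a DataFrame
result = pd.read_sql_query(
    "SELECT * FROM my_table WHERE score > 0.6 ORDER BY id;", conn
)
conn.close()

print(len(result))  # 2
```

Note that `pd.read_sql_table` is not used here because it requires a SQLAlchemy connection.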

You can also save the DataFrame you are working on to a different file format

df.to_<filetype>(<filename>)
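
As a rough illustration of the writer methods, the sketch below exports a small made-up DataFrame to CSV and JSON text and reads the CSV back. No file is created: when `to_csv` or `to_json` is called without a path, it returns the text instead.

```python
import io
import json

import pandas as pd

# Hypothetical data for the example
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)

# to_json behaves the same way; orient="records" gives one object per row
records = json.loads(df.to_json(orient="records"))

# Reading the CSV text back should reproduce the original table
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(df))  # True
```

Passing `index=False` keeps the row labels out of the CSV, which is usually what you want when the index is just the default 0, 1, 2 numbering.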

Other basic Pandas commands

Creating a Series

srs = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
# Without an explicit index, Pandas numbers the rows from 0

Set the Series name

srs.name = "Insert name"

Set index name

srs.index.name = "Index name"
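
Putting the Series commands together, a small sketch might look like this. The names "counts" and "letter" are placeholders chosen for the example.

```python
import pandas as pd

# A Series with explicit row labels
srs = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
srs.name = "counts"        # hypothetical Series name
srs.index.name = "letter"  # hypothetical index name

print(srs["b"])     # 2 (lookup by row label)
print(srs.iloc[1])  # 2 (lookup by position)
```

The Series name becomes the column name if the Series is later turned into a DataFrame, for example with `srs.to_frame()`.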

Create a DataFrame

df = pd.DataFrame(
         {"a" : [1 ,2, 3],
          "b" : [7, 8, 9],
          "c" : [10, 11, 12]}, index = [1, 2, 3])

Specify values for each row of a DataFrame

df = pd.DataFrame( 
     [[1, 2, 3], 
     [4, 6, 8],
     [10, 11, 12]],
     index=[1, 2, 3], 
     columns=['a', 'b', 'c'])
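
The two construction styles differ only in how the values are laid out: column by column in a dictionary, or row by row in a list of lists. A minimal sketch with made-up values, built both ways and compared:

```python
import pandas as pd

# Column by column: each dictionary key is a column
by_column = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=[1, 2, 3],
)

# Row by row: each inner list is a row, column names are given separately
by_row = pd.DataFrame(
    [[1, 4], [2, 5], [3, 6]],
    index=[1, 2, 3],
    columns=["a", "b"],
)

print(by_column.equals(by_row))  # True
```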

Understanding your data

# To get the first 5 rows of your data table
df.head()

# To get summary statistics (by default, for the numeric columns)
df.describe()
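
To see what these two calls return, here is a short sketch on a made-up numeric table. `head` accepts an optional number of rows, and `describe` returns a DataFrame whose row labels are the statistic names.

```python
import pandas as pd

# Hypothetical data: ten rows, two numeric columns
df = pd.DataFrame({
    "a": list(range(1, 11)),
    "b": [x * 0.5 for x in range(1, 11)],
})

first_three = df.head(3)  # first 3 rows instead of the default 5
stats = df.describe()     # count, mean, std, min, quartiles, max

print(len(first_three))        # 3
print(stats.loc["mean", "a"])  # 5.5
print(list(stats.index))
```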

Select a single value

# By position (first row, first column)
df.iloc[0, 0]
df.iat[0, 0]

# By label (row label 1, column label 'a' in the DataFrame created above)
df.loc[1, 'a']
df.at[1, 'a']
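
The difference between position and label is easiest to see when the row labels do not start at 0. The sketch below uses made-up data with row labels 10, 20 and 30.

```python
import pandas as pd

# Hypothetical data with row labels that differ from the row positions
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [7, 8, 9]},
    index=[10, 20, 30],
)

# By position: first row, second column
print(df.iloc[0, 1])    # 7
print(df.iat[0, 1])     # 7

# By label: row labelled 10, column labelled "b"
print(df.loc[10, "b"])  # 7
print(df.at[10, "b"])   # 7
```

Here `df.loc[0, "b"]` would raise a `KeyError`, because no row carries the label 0. `iat` and `at` are restricted to single values, while `iloc` and `loc` also accept slices and lists.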

Get the number of rows and columns

df.shape
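
`shape` is an attribute, not a method, and it returns a tuple in (rows, columns) order. A quick sketch on a made-up table with four rows and two columns:

```python
import pandas as pd

# Hypothetical data: 4 rows, 2 columns
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5.0, 6.0, 7.0, 8.0]})

n_rows, n_cols = df.shape

print(df.shape)  # (4, 2)
print(len(df))   # 4 (len counts rows)
```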

Other Pandas functions are described in the Pandas documentation.

This post gives a brief overview of the advantages and functionality of the Pandas library in Python. It is by no means a complete guide to Pandas, but a way to kickstart your machine learning journey with Pandas.

Thanks for reading.
Don’t forget to click on 👏!

Written by Divyansh Chaudhary