Pandas: Data Munging and Analysis In A Cuddly Form
Overview
Concept
pandas is an extraordinarily powerful data munging and analysis module, and is a key part of the Python-based scientific computing toolkit.
Features
Some of pandas’ most valuable features:

ability to read and write data to and from common plain-text formats (e.g. CSV, JSON, fixed-width column text, etc.), proprietary formats like Excel’s .xlsx, and online resources (e.g. files at a URL)

fast and efficient in-memory data handling (built on numpy and, optionally, Apache Arrow)

built-in ability to handle missing data (e.g. remove it, fill it, or interpolate it)

intuitive criteria-based filtering and grouping, similar to that of common databases

support for time-series data

seamless integration with other popular Python-based scientific computing tools, including numpy and matplotlib
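As a rough illustration, here is a minimal sketch combining several of these features. It assumes a small, made-up CSV dataset held in memory (the city, year, and rainfall columns are hypothetical), reads it, handles a missing value, filters rows, and groups them:

```python
import io

import pandas as pd

# Hypothetical CSV data kept in a string; pd.read_csv also accepts a path or URL.
csv_text = """city,year,rainfall
Oslo,2020,780
Oslo,2021,
Lima,2020,15
Lima,2021,9
"""

df = pd.read_csv(io.StringIO(csv_text))

# The empty rainfall field is read as NaN, i.e. missing data.
print(df["rainfall"].isna().sum())  # number of missing values

# Handle missing data: drop incomplete rows, or fill the gap with a value.
dropped = df.dropna()
filled = df.fillna({"rainfall": 0})

# Criteria-based filtering, similar to a SQL WHERE clause.
wet = df[df["rainfall"] > 100]

# Grouping, similar to SQL GROUP BY; mean() skips NaN by default.
avg_by_city = df.groupby("city")["rainfall"].mean()
print(avg_by_city)

# Write the filled data back out as CSV text (no path given, so a string is returned).
out = filled.to_csv(index=False)
```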
Numpy
Concept
pandas is built on top of numpy, a Python module for optimized handling of n-dimensional arrays of data. A basic understanding of numpy is thus helpful in understanding pandas.

arrays in numpy are referred to in the documentation as ndarrays, short for n-dimensional arrays.

each array stores values of a single data type, called that array’s dtype, usually an int, a float, or an object type.

like Python lists, each value in an ndarray is indexed with an integer, starting from 0.
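The short sketch below shows the dtype, shape, and integer indexing of a few small ndarrays; the array contents are arbitrary examples:

```python
import numpy as np

# Every ndarray has a single dtype shared by all of its values.
floats = np.array([1.5, 2.0, 3.25])
print(floats.dtype)  # float64

# Mixing ints and floats upcasts everything to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Arbitrary Python objects can be stored by requesting the object dtype.
objects = np.array(["a", 1, None], dtype=object)
print(objects.dtype)  # object

# Values are indexed by integer position, starting from 0.
print(floats[0])  # 1.5

# "n-dimensional": a 2-row, 3-column array has shape (2, 3) and 2 dimensions.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape, grid.ndim)
```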
Example Notebook
A basic understanding of numpy will help you master pandas.

Here is a Jupyter Notebook that shows some of the core features of numpy.
Open it from within JupyterLab or any other Jupyter Notebook viewer.
Installing
numpy and pandas, like most popular modules, can be installed with the conda or pip package managers, preferably into a virtual environment.
pip install numpy # try 'pip3' instead of 'pip' if your system requires it
pip install pandas
By convention, numpy and pandas are imported into a Python script with the np and pd aliases:
import numpy as np
import pandas as pd
Creating
ndarrays can be created from scratch with a variety of numpy functions.

np.array( [23, 56, 2] ) creates an ndarray from a Python list
np.random.random_sample( (3, 4) ) creates an ndarray with the given shape (number of rows and columns), filled with random floats in the range [0.0, 1.0)
np.zeros( (3, 4), dtype = int ) creates an ndarray in the given shape, filled with zeros
np.ones( (3, 4), dtype = int ) creates an ndarray in the given shape, filled with ones
np.arange( 10, 40, 5 ) is similar to Python's range() function: it creates an ndarray with values from the start up to, but not including, the stop value; the third argument indicates the step
np.linspace( 10, 40, 5 ) creates an ndarray of evenly spaced values from the start to the stop value, inclusive; the third argument specifies the number of values to generate
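Here are the creation functions above run side by side, as a sketch; the variable names are illustrative only. Note the difference in how np.arange() and np.linspace() treat the stop value:

```python
import numpy as np

from_list = np.array([23, 56, 2])

# Random floats drawn uniformly from the half-open interval [0.0, 1.0).
rand = np.random.random_sample((3, 4))

zeros = np.zeros((3, 4), dtype=int)
ones = np.ones((3, 4), dtype=int)

# Like range(): the stop value (40) is excluded, and 5 is the step size.
stepped = np.arange(10, 40, 5)
print(stepped)  # [10 15 20 25 30 35]

# The stop value (40) is included, and 5 is the number of values:
# 10.0, 17.5, 25.0, 32.5, 40.0
spaced = np.linspace(10, 40, 5)
print(spaced)
```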
Indexing and slicing
ndarrays in numpy can be indexed and sliced in the same manner as Python lists.
Take the following ndarray:
x = np.array( [10, 12, 14, 16, 18 ] )

x[2] refers to 14

x[-2] refers to 16

x[2 : 4] refers to array([14, 16])

x[ : 2] refers to array([10, 12])

x[ 2 : ] refers to array([14, 16, 18])
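A minimal sketch of the same indexing in code. It also shows one way ndarray slices differ from list slices: a slice of an ndarray is a view onto the original data rather than a copy. The 2-D grid array is an added example:

```python
import numpy as np

x = np.array([10, 12, 14, 16, 18])

print(x[2])     # 14
print(x[-2])    # 16, negative indices count from the end
print(x[2:4])   # [14 16], the stop index is excluded
print(x[:2])    # [10 12]
print(x[2:])    # [14 16 18]

# Unlike list slices, ndarray slices share memory with the original array,
# so writing to a slice also changes x.
view = x[:2]
view[0] = 99
print(x)        # [99 12 14 16 18]

# Use .copy() when an independent slice is needed.
independent = x[2:].copy()
independent[0] = 0
print(x[2])     # still 14

# 2-D arrays take one index (or slice) per dimension: [row, column].
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid[1, 2])   # 6
print(grid[:, 0])   # [1 4], the first column
```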
Simple math operations
It is straightforward to perform the same math operation across all values in an ndarray.

np.array([1, 2, 3, 4]) + 2 results in array([3, 4, 5, 6])

np.array([1, 2, 3, 4]) - 1 results in array([0, 1, 2, 3])

np.array([1, 2, 3, 4]) * 2 results in array([2, 4, 6, 8])

np.array([1, 2, 3, 4]) / 2 results in array([0.5, 1., 1.5, 2.])

np.array([1, 2, 3, 4]) > 2 results in array([False, False, True, True])

np.array([1, 2, 3, 4]) != 2 results in array([True, False, True, True])
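The sketch below, using arbitrary example arrays, shows the same element-wise behavior with a scalar, between two arrays of equal shape, and through a numpy math function, next to the equivalent pure-Python loop:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# A scalar is applied to every element of the array.
plus_two = a + 2          # [3, 4, 5, 6]
halves = a / 2            # [0.5, 1.0, 1.5, 2.0]; true division gives floats

# Two arrays of the same shape are combined element by element.
sums = a + b              # [11, 22, 33, 44]
products = a * b          # [10, 40, 90, 160]

# Comparisons also work element-wise and produce boolean arrays.
big = a > 2               # [False, False, True, True]

# The equivalent pure-Python version needs an explicit loop.
plus_two_list = [value + 2 for value in [1, 2, 3, 4]]

# numpy also provides element-wise math functions.
roots = np.sqrt(np.array([1, 4, 9]))  # [1.0, 2.0, 3.0]
```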
Basic statistics
numpy includes functions to perform basic statistics on any ndarray, such as calculating the min, max, mean, median, and standard deviation.
For example, take the following ndarray:
x = np.array([
[ 2, 50, 100],
[ 3, 60, 200],
[ 4, 55, 150],
[ 5, 40, 250]
])

x.mean() calculates the mean of all values in the flattened array
np.mean(x, axis=0) calculates the means ‘vertically’, i.e. one mean per column
np.mean(x, axis=1) calculates the means ‘horizontally’, i.e. one mean per row
the other functions (np.min(), np.max(), np.median(), and np.std()) accept the same axis argument.
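Applied to the array x above, the calls work out as follows. This is a sketch; the values in the comments were computed by hand from the column sums (14, 205, 700) and row sums (152, 263, 209, 295):

```python
import numpy as np

x = np.array([
    [2, 50, 100],
    [3, 60, 200],
    [4, 55, 150],
    [5, 40, 250],
])

overall = x.mean()                 # 919 / 12, about 76.58
col_means = np.mean(x, axis=0)     # one mean per column: [3.5, 51.25, 175.0]
row_means = np.mean(x, axis=1)     # one mean per row: 152/3, 263/3, 209/3, 295/3

col_min = np.min(x, axis=0)        # [2, 40, 100]
col_max = np.max(x, axis=0)        # [5, 60, 250]
col_median = np.median(x, axis=0)  # [3.5, 52.5, 175.0]
col_std = np.std(x, axis=0)        # population standard deviation (ddof=0) by default

print(overall, col_means, row_means, col_min, col_max, col_median, col_std, sep="\n")
```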
Filtering
You can use conditions (boolean expressions) to extract a subset of values from an ndarray.
Take the following ndarray:
a = np.array( [10, 12, 14, 16, 18 ] )

a[a < 15] results in array([10, 12, 14])

a[a > 15] results in array([16, 18])

a[ (a == 10) | (a > 16) ] results in array([10, 18])

a[ (a % 2 == 0) & (a > 16) ] results in array([18])
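A minimal sketch of boolean-mask filtering on the array a above, adding the ~ (not) operator and np.where(), which are not covered above. Python's and/or/not keywords do not work element-wise on arrays, so use &, |, and ~, with each condition in its own parentheses:

```python
import numpy as np

a = np.array([10, 12, 14, 16, 18])

# A comparison produces a boolean mask, one True/False per element.
mask = a < 15                      # [True, True, True, False, False]

# Indexing with the mask keeps only the positions where it is True.
small = a[mask]                    # [10, 12, 14]

# Parentheses are required because | and & bind more tightly than == and >.
either = a[(a == 10) | (a > 16)]   # [10, 18]
both = a[(a % 2 == 0) & (a > 16)]  # [18]
neither = a[~(a > 12)]             # [10, 12]

# np.where picks, element by element, from one of two options.
capped = np.where(a > 15, 15, a)   # [10, 12, 14, 15, 15]
```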
Removing null values
The value np.nan represents a null (missing) value, and the function np.isnan() can be used to find null values in an array.
For example, take the following data:
x = np.array([np.nan, 1, 12, np.nan, 3, 41])

np.isnan(x) results in array([True, False, False, True, False, False])

x[ np.isnan(x) ] results in array([nan, nan])

x[ ~np.isnan(x) ] results in array([1., 12., 3., 41.])
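The sketch below shows why np.isnan() is needed at all: NaN compares unequal to everything, including itself, so == cannot find it. It also shows np.nanmean(), one of numpy's NaN-ignoring statistics functions:

```python
import numpy as np

x = np.array([np.nan, 1, 12, np.nan, 3, 41])

# NaN is never equal to anything, not even itself, so == finds nothing.
print(np.nan == np.nan)        # False
found_with_eq = x == np.nan    # all False

# np.isnan is the reliable test.
is_null = np.isnan(x)          # [True, False, False, True, False, False]
cleaned = x[~is_null]          # [1.0, 12.0, 3.0, 41.0]

# A single NaN makes ordinary statistics NaN; the nan* variants skip it.
print(np.mean(x))              # nan
clean_mean = np.nanmean(x)     # (1 + 12 + 3 + 41) / 4 = 14.25
```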
Series
Concept
A Series in pandas is a one-dimensional sequence of values, often representing the values in either a single row or a single column of a tabular data structure.

Series have much in common with one-dimensional numpy ndarrays.
The main practical difference is that a Series is not restricted to integer indices; it can, for example, use strings as index labels.

In this sense, Series can resemble Python dictionaries.
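A minimal sketch of the dictionary-like side of a Series; the fruit prices and stock counts are made-up example data:

```python
import pandas as pd

# A Series with string index labels instead of 0, 1, 2, ...
prices = pd.Series([3.5, 1.25, 4.0], index=["apple", "banana", "cherry"])

# Label-based access, much like a dict lookup.
banana = prices["banana"]          # 1.25

# `in` checks the index labels, just as it checks the keys of a dict.
has_cherry = "cherry" in prices    # True

# A Series can also be built directly from a dict.
stock = pd.Series({"apple": 10, "banana": 0, "cherry": 7})

# Like an ndarray, a Series supports vectorized math and boolean filtering;
# arithmetic between two Series lines values up by index label.
value = prices * stock             # apple 35.0, banana 0.0, cherry 28.0
in_stock = stock[stock > 0]        # apple and cherry only

# .to_numpy() exposes the underlying values as an ndarray.
as_array = prices.to_numpy()
```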
Examples
See this example Jupyter Notebook for examples exhibiting some of the core Series concepts.
DataFrame
Concept
A DataFrame is the main data type that users of pandas interact with.

DataFrames hold data in a row/column format similar to a spreadsheet.

Each column is a pandas Series, and selecting a single row also returns a pandas Series.

Each row has an index label that identifies it (labels are usually, but not necessarily, unique).
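The sketch below builds a small DataFrame from made-up student data (the name, major, and score columns and the s1 to s4 row labels are hypothetical) and shows that both a column and a row come back as a Series:

```python
import pandas as pd

# Hypothetical example data: one row per student.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy", "Dee"],
        "major": ["math", "art", "math", "art"],
        "score": [88, 92, 75, 81],
    },
    index=["s1", "s2", "s3", "s4"],  # row index labels
)

# Selecting one column returns a Series indexed by the row labels.
scores = df["score"]
print(type(scores).__name__)  # Series

# Selecting one row by its label with .loc also returns a Series,
# this time indexed by the column names.
row = df.loc["s2"]
print(row["name"])  # Ben

# Criteria-based filtering, like a WHERE clause.
strong = df[df["score"] > 80]  # rows s1, s2, s4

# Grouping, like GROUP BY.
by_major = df.groupby("major")["score"].mean()  # art 86.5, math 81.5

print(df.shape)  # (4, 3): 4 rows, 3 columns
```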
Examples
See this example Jupyter Notebook for examples exhibiting some of the core DataFrame concepts.
Visualizations
Concept
pandas contains wrappers around the popular matplotlib plotting module, and includes functions for creating several common types of plots, available through the .plot accessor of a Series or DataFrame (e.g. df.plot.bar()):

bar() and barh() for vertical and horizontal bar plots, respectively
hist() for histograms and box() for boxplots
kde() or density() for density plots
area() for area plots, scatter() for scatter plots (DataFrame only)
hexbin() for hexagonal bin plots (DataFrame only)
pie() for pie plots
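As a sketch, assuming matplotlib is installed alongside pandas, the following draws three of these plot types side by side from a made-up monthly sales table (the sales and returns columns are hypothetical). The "Agg" backend renders off-screen so the script also runs outside Jupyter, and the figure is saved to an in-memory buffer rather than a file:

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly data.
df = pd.DataFrame(
    {"sales": [120, 95, 140, 180], "returns": [10, 12, 8, 15]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

# Passing ax= tells pandas which matplotlib Axes to draw on.
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

bar_ax = df.plot.bar(ax=ax1, title="Sales and returns")
hist_ax = df["sales"].plot.hist(ax=ax2, bins=4, title="Sales distribution")
scatter_ax = df.plot.scatter(x="sales", y="returns", ax=ax3)

# Each method returns the Axes, which accepts ordinary matplotlib calls.
bar_ax.set_ylabel("units")
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Passing explicit Axes keeps each plot on its own panel; without ax=, consecutive Series plots may be drawn onto the current Axes.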
Examples
See this example Jupyter Notebook for examples of data visualizations using pandas and matplotlib.
matplotlib examples
While a deep understanding of matplotlib is not necessary to use the pandas plotting functions, it can be helpful. Here is a sample Jupyter Notebook with some simple matplotlib examples that don’t use pandas.
Conclusions
Thank you. Bye.