# Series in pandas
A set of examples that exhibit some of the core features of the [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) data type in the `pandas` module.

## import

In [2]:
import numpy as np
import pandas as pd

## Create a Series

In [3]:
# simple series with automatic numeric indices
x = pd.Series([22, 44, 66, 88])
x

0    22
1    44
2    66
3    88
dtype: int64

In [23]:
# get value with numeric index
x[2]

66

In [24]:
# with custom indices
y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y

a    22
b    44
c    66
d    88
dtype: int64

In [25]:
# get value with custom index
y['c']

66

In [33]:
# from a Python dictionary
z = pd.Series({'foo': 22, 'bar': 44, 'baz': 66, 'bum': 88})
z

foo    22
bar    44
baz    66
bum    88
dtype: int64

In [98]:
# from a scalar
a = pd.Series(5, index=[0, 1, 2, 3, 4, 5])
a

0    5
1    5
2    5
3    5
4    5
5    5
dtype: int64
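
When building a Series from a dictionary, an `index` argument can also be passed to pick and order entries by label. The following is a small sketch of that behavior; the `data` dictionary and the `'qux'` label are made up for illustration.

```python
import pandas as pd

# hypothetical example data
data = {'foo': 22, 'bar': 44, 'baz': 66}

# index= selects and orders the dictionary entries by label;
# a label that is missing from the dictionary ('qux') is filled with NaN,
# which also turns the dtype into float64
z = pd.Series(data, index=['baz', 'foo', 'qux'])

print(z)
```

Note that `'bar'` is dropped because it is not listed in `index`.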

## Data types
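Unlike a typical `numpy` ndarray, which usually holds values of one numeric type, a single `pandas` Series can hold values of different Python types. When it does, the Series takes the generic `object` dtype.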

In [161]:
s = pd.Series( ['hello', 44, True, 3.14, [1, 2, 3] ] )
s

0        hello
1           44
2         True
3         3.14
4    [1, 2, 3]
dtype: object

Convert data types with `astype()`.

In [173]:
# convert to string
s = s.astype(str)
s

0        hello
1           44
2         True
3         3.14
4    [1, 2, 3]
dtype: object
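
`astype()` is probably more often used to go the other way, turning text into numbers. A minimal sketch, assuming every string in the Series is a valid number (the `prices` data is made up):

```python
import pandas as pd

# hypothetical numeric values stored as strings
prices = pd.Series(['1.5', '2.25', '10'])

# astype(float) parses each string and returns a new float64 Series
as_float = prices.astype(float)

# astype(int) on floats drops the fractional part
as_int = as_float.astype(int)

print(as_float.dtype)   # float64
print(as_int.tolist())  # [1, 2, 10]
```

If any string cannot be parsed as a number, `astype(float)` raises a `ValueError`.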

## Lambda expressions

In [176]:
# run a function to transform each value in the series
s.apply(lambda x: 'hello ' + x)

0        hello hello
1           hello 44
2         hello True
3         hello 3.14
4    hello [1, 2, 3]
dtype: object
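
For simple string transformations, `apply()` with a lambda is not the only option. The sketch below, using a made-up `words` Series, shows the same prefixing done with plain `+`, plus one of the vectorized `.str` methods.

```python
import pandas as pd

# hypothetical example data
words = pd.Series(['foo', 'bar', 'baz'])

# element-by-element, using a lambda
via_apply = words.apply(lambda w: 'hello ' + w)

# the same result, using the + operator on the whole Series
via_concat = 'hello ' + words

# string methods are available on the .str accessor
upper = words.str.upper()

print(via_concat.tolist())  # ['hello foo', 'hello bar', 'hello baz']
print(upper.tolist())       # ['FOO', 'BAR', 'BAZ']
```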

## Naming Series
Series can be given a custom name.

In [177]:
# create a Series with a custom name
x = pd.Series([22, 44, 66, 88], name='non-anonymous Series')
x

0    22
1    44
2    66
3    88
Name: non-anonymous Series, dtype: int64

In [58]:
# get the name
x.name

'non-anonymous Series'
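
A name can also be changed after the fact. A short sketch, with made-up names, showing that `rename()` returns a new Series and that the name becomes the column label when the Series is converted to a DataFrame:

```python
import pandas as pd

x = pd.Series([22, 44, 66, 88], name='original')

# rename() with a scalar returns a renamed copy; x keeps its old name
renamed = x.rename('renamed')

# the Series name is used as the column label in to_frame()
frame = renamed.to_frame()

print(x.name)                  # original
print(renamed.name)            # renamed
print(frame.columns.tolist())  # ['renamed']
```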

## Indexing
Accessing values from a `pandas` Series.

In [81]:
# an example Series, with string labels
x = pd.Series({'foo': 22, 'bar': 44, 'baz': 66, 'bum': 88})

In [82]:
# access by label
x['bar']

44

In [83]:
# access by position with plain brackets (deprecated in recent pandas versions for label-indexed Series; prefer .iloc)
x[1]

44

In [84]:
# access by integer index
x.iloc[1]

44

In [87]:
# access by label index
x.loc['bar']

44

In [89]:
# accessing a subset by positions with plain brackets (also deprecated in recent pandas versions; prefer .iloc)
x[ [0, 1, 2] ]

foo    22
bar    44
baz    66
dtype: int64

In [90]:
# accessing a subset by integer indices
x.iloc[ [0, 1, 2] ]

foo    22
bar    44
baz    66
dtype: int64

In [93]:
# accessing a subset by label indices
x.loc[ ['foo', 'bar', 'baz'] ]

foo    22
bar    44
baz    66
dtype: int64
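
The difference between `.loc` and `.iloc` matters most when the labels are themselves integers that do not match the positions. A small sketch, using made-up labels 10 through 40:

```python
import pandas as pd

# integer labels that are NOT the same as the positions 0..3
s = pd.Series([22, 44, 66, 88], index=[10, 20, 30, 40])

by_label = s.loc[20]     # the value whose label is 20
by_position = s.iloc[0]  # the first value, whatever its label

print(by_label)     # 44
print(by_position)  # 22

# there is no label 0, so a label lookup for it fails
try:
    s.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(missing_label_raises)  # True
```

Using `.loc` and `.iloc` explicitly avoids having to remember how plain brackets interpret an integer.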

## Slicing
Unlike `numpy` ndarrays, slicing a `pandas` Series will also slice the index.

In [38]:
# slice an automatically-indexed Series
x = pd.Series([22, 44, 66, 88])
x[2 : ]

2    66
3    88
dtype: int64

In [99]:
# the same thing, using iloc
x.iloc[2 : ]

2    66
3    88
dtype: int64

In [100]:
# slice a custom-indexed Series
y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y[2 : ]

c    66
d    88
dtype: int64

Slice syntax within the brackets, `[` and `]`, generally works the same way as regular Python list slices and `numpy` slices.
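
One exception worth knowing about: a label-based slice with `.loc` includes its end label, while a positional slice with `.iloc` excludes its end position, like a Python list. A minimal sketch:

```python
import pandas as pd

y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])

# positional slice: positions 1 and 2, end position 3 is excluded
by_position = y.iloc[1:3]

# label slice: from 'b' up to AND including 'c'
by_label = y.loc['b':'c']

print(by_position.index.tolist())  # ['b', 'c']
print(by_label.index.tolist())     # ['b', 'c']
```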

## Sorting

In [180]:
# unsorted
y = pd.Series(['Foo', 'Bar', 'Baz', 'Bum'])
y

0    Foo
1    Bar
2    Baz
3    Bum
dtype: object

In [187]:
# sorted by index
y.sort_index(ascending=False)

3    Bum
2    Baz
1    Bar
0    Foo
dtype: object

In [188]:
# sorted by value
y.sort_values(ascending=True)

1    Bar
2    Baz
3    Bum
0    Foo
dtype: object
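
Two details about sorting that are easy to miss: the sort methods return a new Series rather than changing the original, and strings are compared case-sensitively by default. The sketch below uses made-up mixed-case data and assumes a pandas version that supports the `key` argument of `sort_values()` (1.1 or later).

```python
import pandas as pd

# hypothetical mixed-case data
y = pd.Series(['Foo', 'bar', 'Baz', 'bum'])

# default sort: uppercase letters sort before lowercase ones
default_sorted = y.sort_values()

# case-insensitive sort: key receives the whole Series and returns the values to sort by
case_insensitive = y.sort_values(key=lambda s: s.str.lower())

print(default_sorted.tolist())    # ['Baz', 'Foo', 'bar', 'bum']
print(case_insensitive.tolist())  # ['bar', 'Baz', 'bum', 'Foo']
print(y.tolist())                 # ['Foo', 'bar', 'Baz', 'bum'] (unchanged)
```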

## Introspection
Accessing some metadata about a Series.

In [39]:
# the data type of the Series
x = pd.Series([22, 44, 66, 88])
x.dtype

dtype('int64')

In [41]:
# the shape of the Series... in this case a one-dimensional array with 4 values
y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y.shape

(4,)

In [61]:
# get the name of a named series
x = pd.Series([22, 44, 66, 88], name="non-anonymous Series")
x.name

'non-anonymous Series'
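
A few more attributes can be handy when inspecting a Series. This is a brief sketch of some of them, not a complete list:

```python
import pandas as pd

y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])

n_values = y.size          # number of values
n_dims = y.ndim            # a Series is always one-dimensional
labels = y.index.tolist()  # the index labels as a plain list
values = y.to_numpy()      # the underlying values as a numpy ndarray
all_unique = y.is_unique   # whether every value is distinct

print(n_values, n_dims)  # 4 1
print(labels)            # ['a', 'b', 'c', 'd']
print(all_unique)        # True
```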

## Simple math operations

In [6]:
# add a scalar to all values in a Series
x = pd.Series([22, 44, 66, 88])
x + 2

0    24
1    46
2    68
3    90
dtype: int64
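
Arithmetic on a Series returns a new Series and leaves the original untouched, so the result has to be assigned to a name to be kept. A minimal sketch (the name `bumped` is just for illustration):

```python
import pandas as pd

x = pd.Series([22, 44, 66, 88])

# the result is a new Series; x itself is not modified
bumped = x + 2

print(x.tolist())       # [22, 44, 66, 88]
print(bumped.tolist())  # [24, 46, 68, 90]

# to update x, assign the result back to it
x = x + 2
print(x.tolist())       # [24, 46, 68, 90]
```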

In [7]:
# subtract a scalar from all values in a Series
x = pd.Series([22, 44, 66, 88])
x - 2

0    20
1    42
2    64
3    86
dtype: int64

In [11]:
# divide all values in a Series by a scalar
x = pd.Series([22, 44, 66, 88])
x / 11

0    2.0
1    4.0
2    6.0
3    8.0
dtype: float64

... and so on. Comparison operators also work element-wise and return a boolean Series.

In [46]:
x > 50

0    False
1    False
2     True
3     True
dtype: bool

In [48]:
x != 44

0     True
1    False
2     True
3     True
dtype: bool
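
A boolean Series like the ones above can be used as a mask to select values. A short sketch of that pattern:

```python
import pandas as pd

x = pd.Series([22, 44, 66, 88])

# a boolean mask: True where the condition holds
mask = x > 50

# indexing with the mask keeps only the True positions (and their labels)
selected = x[mask]

# conditions combine with & (and) and | (or); each needs its own parentheses
combined = x[(x > 30) & (x != 66)]

print(selected.tolist())  # [66, 88]
print(combined.tolist())  # [44, 88]
print(int(mask.sum()))    # 2 (True counts as 1)
```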

In [18]:
# add two series together
x = pd.Series([22, 44, 66, 88])
x + x

0     44
1     88
2    132
3    176
dtype: int64

In [15]:
# add two series whose labels are in a different order; values are matched by label
x = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y = pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])
x + y

a    25
b    48
c    68
d    89
dtype: int64

## Math operations and the alignment of labels
Unlike `numpy` ndarrays, operations on Series automatically align by labels.

In [49]:
# for example, take two Series with the same set of labels, but in different orders
a = pd.Series({'foo': 22, 'bar': 44, 'baz': 66, 'bum': 88})
b = pd.Series({'bum': 1, 'baz': 2, 'bar': 3, 'foo': 4, })

In [51]:
# math operations will be performed on values that share the same label
a + b

bar    47
baz    68
bum    89
foo    26
dtype: int64

Besides this difference, all the basic math operations (+, -, *, /) between two Series work the same way as in `numpy` ndarrays.
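
Alignment also has a consequence when the two Series do not share all of their labels: any label found in only one of them produces `NaN`. The sketch below uses made-up data and shows the `add()` method with `fill_value` as one way around that.

```python
import pandas as pd

# hypothetical data: only 'bar' appears in both Series
a = pd.Series({'foo': 22, 'bar': 44})
b = pd.Series({'bar': 3, 'baz': 2})

# labels missing on either side become NaN, and the dtype becomes float64
total = a + b

# add() with fill_value treats a missing value as 0 instead
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```

The other operators have matching methods (`sub()`, `mul()`, `div()`) that accept `fill_value` in the same way.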

## Heads and tails
When dealing with large amounts of data, it's sometimes useful to see a sample of the data without viewing the entire data set. The `head()`, `tail()`, and `sample()` methods can help with this.

In [20]:
# first, let's generate a large Series

import numpy as np # import numpy for convenience in generating a lot of sample data

# make a really big Series from a random numpy ndarray
x = pd.Series( np.random.random(5000) ) 

In [25]:
# display x; pandas truncates the output of a long Series
x

0       0.081618
1       0.532477
2       0.878032
3       0.608577
4       0.302715
          ...   
4995    0.544468
4996    0.162016
4997    0.977883
4998    0.155737
4999    0.198541
Length: 5000, dtype: float64

In [23]:
# get the head... the first few values
x.head()

0    0.081618
1    0.532477
2    0.878032
3    0.608577
4    0.302715
dtype: float64

In [24]:
# get the tail... the last few values
x.tail()

4995    0.544468
4996    0.162016
4997    0.977883
4998    0.155737
4999    0.198541
dtype: float64

In [27]:
# get a sample of a few random values
x.sample(5)

2682    0.590757
1161    0.007108
878     0.275778
2050    0.666622
1396    0.261979
dtype: float64
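
All three methods accept the number of values to return, and `sample()` can be made repeatable with `random_state`. A small sketch, using a Series of consecutive integers so the results are easy to check (the seed value 42 is arbitrary):

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(5000))

first_three = x.head(3)  # the first 3 values
last_two = x.tail(2)     # the last 2 values

# the same random_state gives the same sample each time
sample_a = x.sample(5, random_state=42)
sample_b = x.sample(5, random_state=42)

print(first_three.tolist())      # [0, 1, 2]
print(last_two.tolist())         # [4998, 4999]
print(sample_a.equals(sample_b)) # True
```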

## Basic statistics
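Basic statistical methods, like `mean()`, `median()`, `min()`, `max()`, and `std()`, work much like their `numpy` equivalents, with two differences: they skip missing values (`NaN`) by default, and `std()` computes the sample standard deviation (`ddof=1`), where `numpy` defaults to the population standard deviation (`ddof=0`).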

In [28]:
# make a linearly-spaced series of 50 values from 1 to 100
x = pd.Series( np.linspace(1, 100, 50) ) 
x.head()

0    1.000000
1    3.020408
2    5.040816
3    7.061224
4    9.081633
dtype: float64

In [29]:
# get an overview of most common stats
x.describe()

count     50.000000
mean      50.500000
std       29.452257
min        1.000000
25%       25.750000
50%       50.500000
75%       75.250000
max      100.000000
dtype: float64

In [30]:
# calculate the mean value of the entire Series
x.mean()

50.5

In [35]:
# calculate the mean value for only those values in the Series that are less than 5
x[ x < 5 ].mean()

2.010204081632653

The other statistics functions - `min()`, `max()`, `median()`, `std()`, `count()` - work similarly.
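
Two places where the `pandas` results can differ from plain `numpy` are the default `ddof` of `std()` and the handling of missing values. A minimal sketch comparing the two (the `gappy` Series is made up):

```python
import numpy as np
import pandas as pd

x = pd.Series(np.linspace(1, 100, 50))

sample_std = x.std()                   # pandas default: ddof=1
population_std = np.std(x.to_numpy())  # numpy default: ddof=0

print(sample_std > population_std)                 # True
print(np.isclose(x.std(ddof=0), population_std))   # True

# pandas skips NaN by default; numpy propagates it
gappy = pd.Series([1.0, np.nan, 3.0])
pandas_mean = gappy.mean()
numpy_mean = np.mean(gappy.to_numpy())

print(pandas_mean)           # 2.0
print(np.isnan(numpy_mean))  # True
```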
