How To Isolate Trend, Seasonality And Noise From A Time Series

A headfirst dive into a powerful time series decomposition algorithm using Python

A time series can be thought of as being made of 4 components:

  • A seasonal component
  • A trend component
  • A cyclical component, and
  • A noise component.

The Seasonal component

The seasonal component explains the periodic ups and downs one sees in many data sets such as the one shown below.

Retail Used Car Sales. Data source: US FRED (Image by Author)

In the above example, the seasonal period is approximately 12 months and it peaks in March and bottoms out in November or December before peaking in March again.

A time series can contain multiple superimposed seasonal periods. A classic example is a time series of hourly temperatures at a weather station. Since the Earth rotates around its axis, the graph of hourly temperatures at a weather station will show a seasonal period of 24 hours. The Earth also revolves around the Sun in a tilted manner, leading to seasonal temperature variations. If you follow the temperature at the weather station at say 11 am for 365 days, you will see a second pattern emerging that has a period of 12 months. The 24 hour long daily pattern is superimposed on the 12 month long yearly pattern.

In the case of the hourly weather data, we know the underlying physical phenomena that cause the two seasonal patterns. But in most cases, it’s not possible to know all the factors that introduce seasonality into your data. And so, it is seldom easy to unearth all the seasonal periods that may be hiding in a time series.
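To make the idea of superimposed seasonal periods concrete, here is a minimal sketch that builds a synthetic hourly temperature series out of two sine waves and then pulls the two patterns apart by averaging. The mean temperature and both amplitudes are made-up numbers, not real weather data.

```python
import numpy as np

# Hypothetical hourly temperatures for one 365-day year (all numbers are made up).
hours = np.arange(365 * 24)
daily_cycle = 5.0 * np.sin(2 * np.pi * hours / 24)            # 24 hour period
yearly_cycle = 10.0 * np.sin(2 * np.pi * hours / (365 * 24))  # 12 month period
temperature = 15.0 + yearly_cycle + daily_cycle

# One row per day, one column per hour of the day.
by_day = temperature.reshape(365, 24)

# Averaging each day's 24 readings cancels the daily cycle and leaves the yearly one.
daily_mean = by_day.mean(axis=1)

# Averaging each hour of the day over the whole year cancels the yearly cycle
# and leaves the daily one.
hourly_profile = by_day.mean(axis=0)

print(np.ptp(daily_mean))      # swing of the yearly pattern, close to 20
print(np.ptp(hourly_profile))  # swing of the daily pattern, close to 10
```

Real data is rarely this clean, but the same averaging idea is what the decomposition procedure later in this article relies on.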

That being said, the commonly occurring seasonal periods are a day, week, month, quarter (or season), and year.

Seasonality is also observed on much longer time scales such as in the solar cycle, which follows a roughly 11 year period.

Daily sunspot count. Data source: SILSO (Image by Author)

The Trend component

The Trend component refers to the pattern in the data that spans across seasonal periods.

The time series of retail eCommerce sales shown below demonstrates a possibly quadratic trend (y = x²) that spans across the 12 month long seasonal period:

Retail eCommerce sales. Data source: US FRED (Image by Author)
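One rough way to check whether a trend looks quadratic rather than linear is to fit polynomials of degree 1 and 2 and compare how well they fit. The sketch below does this with NumPy on a made-up series; it is an illustration of the idea, not an analysis of the eCommerce data.

```python
import numpy as np

# Hypothetical series: a quadratic trend plus a little noise (made-up numbers).
rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
y = 0.05 * x**2 + 2.0 * x + 30.0 + rng.normal(0, 5, size=x.size)

# Fit polynomials of degree 1 and 2 by least squares.
linear_fit = np.polyval(np.polyfit(x, y, 1), x)
quadratic_fit = np.polyval(np.polyfit(x, y, 2), x)

# Compare the root mean squared error of the two fits.
rmse_linear = np.sqrt(np.mean((y - linear_fit) ** 2))
rmse_quadratic = np.sqrt(np.mean((y - quadratic_fit) ** 2))
print(rmse_linear, rmse_quadratic)
```

A clearly smaller error for the degree 2 fit suggests curvature in the trend. On seasonal data, it is better to run such a check on the smoothed trend than on the raw series.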

The Cyclical component

The cyclical component represents phenomena that happen across seasonal periods. Cyclical patterns do not have a fixed period like seasonal patterns do. An example of a cyclical pattern is the cycles of boom and bust that stock markets experience in response to world events.

Dow Jones % change in closing price from previous year (1880–2020). Data source: MeasuringWorth.com via Wikipedia (Image by Author)

The cyclical component is hard to isolate and it’s often ‘left alone’ by combining it with the trend component.

The Noise component

The noise or the random component is what remains behind when you separate out seasonality and trend from the time series. Noise is the effect of factors that you do not know, or which you cannot measure. It is the effect of the known unknowns, or the unknown unknowns.


Additive and Multiplicative effects

The trend, seasonal and noise components can combine in an additive or a multiplicative way.

Additive combination
If the seasonal and noise components change the trend by an amount that is independent of the value of trend, the trend, seasonal and noise components are said to behave in an additive way. One can represent this situation as follows:

y_i = t_i + s_i + n_i

where y_i = the value of the time series at the ith time step.
t_i = the trend component at the ith time step.
s_i = the seasonal component at the ith time step.
n_i = the noise component at the ith time step.

Multiplicative combination
If the seasonal and noise components change the trend by an amount that depends on the value of trend, the three components are said to behave in a multiplicative way as follows:

y_i = t_i * s_i * n_i
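The difference between the two models is easiest to see on synthetic data. The following sketch builds one additive and one multiplicative series from the same made-up trend and seasonal pattern (noise is left out to keep the comparison clean) and measures the size of the seasonal swing in each year.

```python
import numpy as np

# Hypothetical monthly data covering 10 years (made-up numbers).
months = np.arange(120)
trend = 100.0 + 2.0 * months
seasonal_pattern = np.sin(2 * np.pi * months / 12)

# Additive: the seasonal swing is a fixed amount, whatever the trend level.
y_additive = trend + 20.0 * seasonal_pattern

# Multiplicative: the seasonal swing is a fixed fraction (20%) of the trend.
y_multiplicative = trend * (1.0 + 0.2 * seasonal_pattern)

def yearly_swing(y):
    """Peak-to-trough range of the detrended series within each year."""
    return np.ptp((y - trend).reshape(-1, 12), axis=1)

print(yearly_swing(y_additive))        # the same in every year
print(yearly_swing(y_multiplicative))  # grows as the trend grows
```

When all values are positive, taking logarithms turns a multiplicative model into an additive one, since log(y_i) = log(t_i) + log(s_i) + log(n_i).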


A step-by-step procedure for decomposing a time series into trend, seasonal and noise components using Python

There are many decomposition methods available ranging from simple moving average based methods to powerful ones such as STL.

In Python, the statsmodels library has a seasonal_decompose() method that lets you decompose a time series into trend, seasonality and noise in one line of code.

In my articles, we like to get into the weeds. So before we use seasonal_decompose(), let’s do a deep dive into a simple, yet powerful time series decomposition technique.

Let’s understand how decomposition really works under the covers.

We’ll hand-crank out the decomposition of a time series into its trend, seasonal and noise components with a simple procedure based on moving averages, using the following steps:

STEP 1: Identify the length of the seasonal period
STEP 2: Isolate the trend
STEP 3: Isolate the seasonality+noise
STEP 4: Isolate the seasonality
STEP 5: Isolate the noise

We’ll use as an example the following time series of retail sales of used car dealers in the US:

Retail Used Car Sales. Data source: US FRED (Image by Author)

Let’s load the data into a pandas DataFrame and plot the time series:

import pandas as pd
import numpy as np
import math
from matplotlib import pyplot as plt


#load the data set; the DATE column becomes the index
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, index_col=0)

#parse the dates, which are stored in day-month-year format
df.index = pd.to_datetime(df.index, format='%d-%m-%y')

fig = plt.figure()

fig.suptitle('Retail sales of used car dealers in the US in millions of dollars')

df['Retail_Sales'].plot()

Now let’s begin the step by step decomposition of this time series.

STEP 1: Try to guess the duration of the seasonal component in your data. In the above example, we’ll guess it to be 12 months.
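If the period is not obvious from the plot, one common way to back up the guess is to look at the autocorrelation of the series at different lags. The sketch below is one possible approach, run on a made-up monthly series rather than the car sales data; the helper function autocorrelation() is hypothetical and written here only for illustration.

```python
import numpy as np

def autocorrelation(values, lag):
    """Sample autocorrelation of a 1-D array at the given positive lag."""
    centered = values - values.mean()
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

# Hypothetical monthly series: linear trend + 12 month seasonality + noise (made-up numbers).
rng = np.random.default_rng(42)
months = np.arange(240)
series = (100.0 + 0.5 * months
          + 10.0 * np.sin(2 * np.pi * months / 12)
          + rng.normal(0, 0.5, months.size))

# First-difference the series to remove most of the trend, which would
# otherwise dominate the autocorrelations.
differenced = np.diff(series)

# Look for the lag with the strongest positive autocorrelation.
acf = {lag: autocorrelation(differenced, lag) for lag in range(2, 19)}
best_lag = max(acf, key=acf.get)
print(best_lag)  # should come out at the seasonal period of this synthetic series
```

On real data the peak is usually less clear-cut, so treat the result as a hint to be checked against the plot and against what you know about the data.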

STEP 2: Now run a 12 month centered moving average on the data. This moving average is spread across a total of 13 months, i.e. 6 months on each side of the center month. The 12 month centered MA is an average of two 12 month moving averages that are shifted from each other by 1 month, effectively making it a weighted moving average.

Here is an illustration of how this centered MA can be calculated in Microsoft Excel:

Illustration of a 2 x 12 centered moving average (Image by Author)

This MA will smooth out seasonality and noise and bring out the trend.

Continuing with our Python example, here is how we can calculate the centered moving average in Python:

#Compute the 2x12 centered MA values; the first and last 6 months stay NaN
sales = df['Retail_Sales'].to_numpy(dtype=float)
cma = np.full(sales.size, np.nan)

for i in range(6, sales.size - 6):
    cma[i] = (
        sales[i - 6] * 1.0 / 24 +
        (
            sales[i - 5] +
            sales[i - 4] +
            sales[i - 3] +
            sales[i - 2] +
            sales[i - 1] +
            sales[i] +
            sales[i + 1] +
            sales[i + 2] +
            sales[i + 3] +
            sales[i + 4] +
            sales[i + 5]
        ) * 1.0 / 12 +
        sales[i + 6] * 1.0 / 24
    )

#Store the 2x12 centered MA values in a new column
df['2 x 12 CMA (TREND)'] = cma

Notice how the values at indices [i-6] and [i+6] are weighted by 1.0/24 while the rest of the values are each weighted by 1.0/12.
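The same 13 weights can also be applied in one shot with a convolution instead of an explicit loop. The function name centered_ma_2x12() below is hypothetical, and the check at the end runs on made-up data: a straight line plus a zero-mean 12 month pattern, for which the centered MA should return the straight line.

```python
import numpy as np

def centered_ma_2x12(values):
    """2 x 12 centered moving average; the first and last 6 positions stay NaN."""
    values = np.asarray(values, dtype=float)
    # 13 weights: 1/24 at both ends, 1/12 for the 11 values in between (they sum to 1).
    weights = np.r_[1 / 24, np.full(11, 1 / 12), 1 / 24]
    trend = np.full(values.size, np.nan)
    # mode='valid' keeps only the positions where all 13 values are available.
    trend[6:values.size - 6] = np.convolve(values, weights, mode='valid')
    return trend

# Hypothetical check on made-up data.
months = np.arange(48)
line = 50.0 + 3.0 * months
series = line + 8.0 * np.sin(2 * np.pi * months / 12)
trend = centered_ma_2x12(series)
print(np.allclose(trend[6:-6], line[6:-6]))  # True: the seasonal pattern averages out
```

With the data frame used in this article, the call would be df['2 x 12 CMA (TREND)'] = centered_ma_2x12(df['Retail_Sales']).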

Let’s plot the resulting time series that is contained in column ‘2 x 12 CMA (TREND)’:

#plot the trend component
fig = plt.figure()

fig.suptitle('TREND component of Retail sales of used car dealers in the US in millions of dollars')

df['2 x 12 CMA (TREND)'].plot()

plt.show()

As you can see, our moving average transformation has highlighted the trend component of the retail sales time series:

(Image by Author)

STEP 3: Now we have a decision to make. Depending on whether the composition is multiplicative or additive, we’ll need to divide or subtract the trend component from the original time series to retrieve the seasonal and noise components. If we inspect the original car sales time series, we can see that the seasonal swings are increasing in proportion to the current value of the time series. Hence we’ll assume that the seasonality is multiplicative. We’ll also take a small leap of faith to assume that the noise is multiplicative.

Thus the retail used car sales time series is assumed to have the following multiplicative decomposition model:

Time series value = trend component * seasonal component * noise component

Therefore:

seasonal component * noise component = Time series value / trend component

We’ll add a new column into our data frame and fill it with the product of the seasonal and noise components using the above formula.

df['SEASONALITY AND NOISE'] = df['Retail_Sales']/df['2 x 12 CMA (TREND)']

Let’s plot the new column. This time, we will see the seasonality and noise showing through:

fig = plt.figure()

fig.suptitle('SEASONALITY and NOISE components')

plt.ylim(0, 1.3)

df['SEASONALITY AND NOISE'].plot()

plt.show()
(Image by Author)

STEP 4: Next, we will get the ‘pure’ seasonal component out of the mixture of seasonality and noise, by calculating the average value of the seasonal component for all January months, all February months, all March months and so on.

#first add a month column
df['MONTH'] = df.index.month

#initialize the month based dictionaries to store the running total of the month wise seasonal sums and counts
average_seasonal_values = {1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0, 10:0, 11:0, 12:0}

average_seasonal_value_counts = {1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0, 10:0, 11:0, 12:0}

#calculate the sums and counts, skipping the NaN values at both ends
for i in range(0, df['SEASONALITY AND NOISE'].size):
    value = df['SEASONALITY AND NOISE'].iloc[i]
    month = df['MONTH'].iloc[i]
    if not math.isnan(value):
        average_seasonal_values[month] = average_seasonal_values[month] + value
        average_seasonal_value_counts[month] = average_seasonal_value_counts[month] + 1

#calculate the average seasonal component for each month
for i in range(1, 13):
    average_seasonal_values[i] = average_seasonal_values[i] / average_seasonal_value_counts[i]

#create a new column in the data frame and fill it with the value of the average seasonal component for the corresponding month

df['SEASONALITY'] = np.nan
seasonality_col = df.columns.get_loc('SEASONALITY')

for i in range(0, df['SEASONALITY AND NOISE'].size):
    if not math.isnan(df['SEASONALITY AND NOISE'].iloc[i]):
        df.iloc[i, seasonality_col] = average_seasonal_values[df['MONTH'].iloc[i]]
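The month-by-month averaging of STEP 4 can also be written without explicit loops by grouping on the month of the index. The sketch below uses a small made-up data frame named demo as a stand-in for the real one, with known monthly factors so that the result can be checked.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'SEASONALITY AND NOISE' column: known monthly
# factors multiplied by a little noise (made-up numbers), 10 years of monthly data.
rng = np.random.default_rng(1)
index = pd.date_range('2000-01-01', periods=120, freq='MS')
month_numbers = index.month.to_numpy()
true_factors = 1.0 + 0.1 * np.sin(2 * np.pi * (month_numbers - 1) / 12)
demo = pd.DataFrame(
    {'SEASONALITY AND NOISE': true_factors * rng.normal(1.0, 0.01, 120)},
    index=index,
)
# Mimic the NaNs that the centered moving average leaves at both ends.
demo.iloc[:6] = np.nan
demo.iloc[-6:] = np.nan

# Group the values by calendar month; mean() skips the NaNs.
by_month = demo['SEASONALITY AND NOISE'].groupby(demo.index.month)
monthly_means = by_month.mean()

# Broadcast each month's average back onto every row of that month,
# leaving NaN where the input was NaN, as the loop version does.
demo['SEASONALITY'] = by_month.transform('mean').where(
    demo['SEASONALITY AND NOISE'].notna()
)
print(monthly_means.round(3))
```

The loop version above is easier to follow step by step, while the grouped version is shorter and usually faster on long series.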

Let’s plot this pure seasonal component:

#plot the seasonal component
fig = plt.figure()

fig.suptitle('The \'pure\' SEASONAL component')

plt.ylim(0, 1.3)

df['SEASONALITY'].plot()

plt.show()
(Image by Author)

STEP 5: Finally, we will divide the noisy seasonal value that we had isolated earlier by the averaged out seasonal value to yield just the noise component for each month.

noise component = noisy seasonal component / averaged out seasonal component

df['NOISE'] = df['SEASONALITY AND NOISE']/df['SEASONALITY']

#plot the noise component

fig = plt.figure()

fig.suptitle('The NOISE component')

plt.ylim(0, 1.3)

df['NOISE'].plot()

plt.show()
(Image by Author)

So there you have it! We just hand cranked out the procedure for decomposing a time series into its trend, seasonal and noise components.

Here is a collage of the time series and its constituent components:

(Image by Author)
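A useful sanity check on a multiplicative decomposition is that multiplying the three components back together reproduces the original series. Here is a compact sketch of steps 2 to 5 on a made-up multiplicative series, followed by that check. It uses pandas rolling windows for the centered moving average, which is an alternative to the loop shown earlier, not the code used to produce the plots in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical multiplicative series (made-up numbers): 8 years of monthly data.
rng = np.random.default_rng(7)
index = pd.date_range('2010-01-01', periods=96, freq='MS')
t = np.arange(96)
y = pd.Series(
    (200.0 + 3.0 * t) * (1.0 + 0.15 * np.sin(2 * np.pi * t / 12)) * rng.normal(1.0, 0.02, 96),
    index=index,
)

# STEP 2: 2 x 12 centered MA = a 12 month MA, averaged with its 1 month shifted
# copy, then moved back by 6 months so that it is centered.
trend = y.rolling(12).mean().rolling(2).mean().shift(-6)

# STEP 3: seasonality * noise
seasonality_and_noise = y / trend

# STEP 4: average by calendar month, keeping NaN where the trend is undefined
seasonality = (
    seasonality_and_noise.groupby(index.month).transform('mean')
    .where(seasonality_and_noise.notna())
)

# STEP 5: noise
noise = seasonality_and_noise / seasonality

# Recompose and compare wherever the trend is defined
# (the first and last 6 months are NaN).
recomposed = trend * seasonality * noise
valid = recomposed.notna()
print(np.allclose(recomposed[valid], y[valid]))  # True
```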

Time series decomposition using statsmodels

Now that we know how decomposition works from the inside, we can cheat a little, and use seasonal_decompose() in statsmodels to do all of the above work in one line of code:

from statsmodels.tsa.seasonal import seasonal_decompose

components = seasonal_decompose(df['Retail_Sales'], model='multiplicative')

components.plot()

Here’s the plot we get:

Output of seasonal_decompose() on the Retail Used Car Sales data set (Image by Author)

Here is the complete Python source code:

import pandas as pd
import numpy as np
import math
from matplotlib import pyplot as plt

#load the data set into a pandas data frame and parse the day-month-year dates
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, index_col=0)
df.index = pd.to_datetime(df.index, format='%d-%m-%y')

#plot the data set
fig = plt.figure()
fig.suptitle('Retail sales of used car dealers in the US in millions of dollars')
df['Retail_Sales'].plot()
plt.show()

#add a column containing a 2 x 12 centered moving average. this column will capture the trend component in the time series
sales = df['Retail_Sales'].to_numpy(dtype=float)
cma = np.full(sales.size, np.nan)
for i in range(6, sales.size - 6):
    cma[i] = sales[i - 6] * 1.0 / 24 + (
        sales[i - 5] + sales[i - 4] + sales[i - 3] + sales[i - 2] + sales[i - 1] + sales[i] +
        sales[i + 1] + sales[i + 2] + sales[i + 3] + sales[i + 4] + sales[i + 5]
    ) * 1.0 / 12 + sales[i + 6] * 1.0 / 24
df['2 x 12 CMA (TREND)'] = cma

#plot the trend component
fig = plt.figure()
fig.suptitle('TREND component of Retail sales of used car dealers in the US in millions of dollars')
df['2 x 12 CMA (TREND)'].plot()
plt.show()

df['SEASONALITY AND NOISE'] = df['Retail_Sales']/df['2 x 12 CMA (TREND)']

#plot the seasonality and noise components
fig = plt.figure()
fig.suptitle('SEASONALITY and NOISE components')
plt.ylim(0, 1.3)
df['SEASONALITY AND NOISE'].plot()
plt.show()

#calculate the average seasonal component for each month
#first add a month column
df['MONTH'] = df.index.month

#initialize the month based dictionaries to store the running total of the month wise seasonal sums and counts
average_seasonal_values = {1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0, 10:0, 11:0, 12:0}
average_seasonal_value_counts = {1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0, 10:0, 11:0, 12:0}

#calculate the sums and counts, skipping the NaN values at both ends
for i in range(0, df['SEASONALITY AND NOISE'].size):
    value = df['SEASONALITY AND NOISE'].iloc[i]
    month = df['MONTH'].iloc[i]
    if not math.isnan(value):
        average_seasonal_values[month] = average_seasonal_values[month] + value
        average_seasonal_value_counts[month] = average_seasonal_value_counts[month] + 1

#calculate the average seasonal component for each month
for i in range(1, 13):
    average_seasonal_values[i] = average_seasonal_values[i] / average_seasonal_value_counts[i]

#create a new column in the data frame and fill it with the value of the average seasonal component for the corresponding month
df['SEASONALITY'] = np.nan
seasonality_col = df.columns.get_loc('SEASONALITY')
for i in range(0, df['SEASONALITY AND NOISE'].size):
    if not math.isnan(df['SEASONALITY AND NOISE'].iloc[i]):
        df.iloc[i, seasonality_col] = average_seasonal_values[df['MONTH'].iloc[i]]

#plot the seasonal component
fig = plt.figure()
fig.suptitle('The \'Pure\' SEASONAL component')
plt.ylim(0, 1.3)
df['SEASONALITY'].plot()
plt.show()

df['NOISE'] = df['SEASONALITY AND NOISE']/df['SEASONALITY']

#plot the noise component
fig = plt.figure()
fig.suptitle('The NOISE component')
plt.ylim(0, 1.3)
df['NOISE'].plot()
plt.show()

#Do all of the above using one line of code!
from statsmodels.tsa.seasonal import seasonal_decompose
components = seasonal_decompose(df['Retail_Sales'], model='multiplicative')
components.plot()
plt.show()

And here is the link to the data set used in the Python example.


Citations and Copyrights

Data sets

U.S. Census Bureau, Retail Sales: Used Car Dealers [MRTSSM44112USN], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/MRTSSM44112USN, June 17, 2020, under FRED copyright terms. Download link to curated data set.

U.S. Census Bureau, E-Commerce Retail Sales [ECOMNSA], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/ECOMN, under FRED copyright terms.

SILSO, World Data Center — Sunspot Number and Long-term Solar Observations, Royal Observatory of Belgium, on-line Sunspot Number catalogue: http://www.sidc.be/SILSO/, 1818–2020 (CC-BY-NA)

Samuel H. Williamson, “Daily Closing Values of the DJA in the United States, 1885 to Present,” MeasuringWorth, 2020
 URL: http://www.measuringworth.com/DJA/

Images

All images in this article are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.

