The trend extraction methods covered in the other vignettes return a
single smooth component. Often, though, an economic series is more
naturally described as the sum of three parts: a slow-moving
trend, a repeating seasonal pattern,
and an irregular remainder.
decompose_series() splits a series into these three
components and adds them as columns to the original data frame.
The theme below is used throughout the vignette for consistent styling.
library(ggplot2)
theme_series <- theme_minimal(paper = "#fefefe") +
theme(
legend.position = "bottom",
panel.grid.minor = element_blank(),
strip.background = element_rect(fill = "#2c3e50"),
strip.text = element_text(color = "#fefefe"),
axis.ticks.x = element_line(color = "gray40", linewidth = 0.5),
axis.line.x = element_line(color = "gray40", linewidth = 0.5),
axis.title.x = element_blank(),
palette.colour.discrete = c(
"#2c3e50",
"#e74c3c",
"#f39c12",
"#1abc9c",
"#9b59b6"
)
)It is worth being clear about the difference between this function
and augment_trends().
augment_trends() returns only the
trend (trend_* columns). The seasonal and
irregular movements are simply smoothed away.decompose_series() returns all three
components (trend_*, seasonal_*, and
remainder_*), and they add back up exactly to the original
series.Because there is a seasonal component,
decompose_series() requires seasonal data: monthly
(frequency = 12) or quarterly (frequency = 4).
For annual series there is nothing seasonal to isolate, so use
augment_trends() instead.
Let’s start with the gdp_construction dataset, a
quarterly index of Brazilian construction activity that ships with
trendseries.
ggplot(gdp_construction, aes(date, index)) +
geom_line(lwd = 0.7) +
scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
labs(
title = "Brazilian construction activity",
y = "Index"
) +
theme_seriesPassing the data to decompose_series() adds three new
columns. The frequency is detected automatically from the date
column.
gdp_parts <- gdp_construction |>
decompose_series(value_col = "index")
gdp_parts
#> # A tibble: 124 × 5
#> date index trend_stl seasonal_stl remainder_stl
#> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 1995-01-01 100 102. -4.37 1.93
#> 2 1995-04-01 100 101. -1.62 0.476
#> 3 1995-07-01 100 100. 4.44 -4.62
#> 4 1995-10-01 100 99.4 1.55 -0.946
#> 5 1996-01-01 97.8 101. -4.37 1.42
#> 6 1996-04-01 101. 102. -1.62 0.613
#> 7 1996-07-01 107. 103. 4.44 0.370
#> 8 1996-10-01 103. 104. 1.55 -2.49
#> 9 1997-01-01 101. 106. -4.37 -0.530
#> 10 1997-04-01 108. 109. -1.62 0.849
#> # ℹ 114 more rowsThe three components are named after the method (stl by
default): trend_stl, seasonal_stl, and
remainder_stl.
Reshaping to long format makes it easy to plot the original series alongside its three components.
gdp_long <- gdp_parts |>
pivot_longer(
cols = c(index, trend_stl, seasonal_stl, remainder_stl),
names_to = "component",
values_to = "value"
) |>
mutate(
component = factor(
component,
levels = c("index", "trend_stl", "seasonal_stl", "remainder_stl"),
labels = c("Observed", "Trend", "Seasonal", "Remainder")
)
)ggplot(gdp_long, aes(date, value)) +
geom_line(aes(color = component), lwd = 0.7, show.legend = FALSE) +
facet_wrap(vars(component), ncol = 1, scales = "free_y") +
scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
labs(
title = "STL decomposition of construction activity",
subtitle = "Observed = Trend + Seasonal + Remainder",
y = NULL
) +
theme_seriesThe default method is STL (Seasonal-Trend
decomposition via Loess), implemented with stats::stl().
The seasonal component is estimated with a loess smoother, the trend
with an adaptive moving average, and the remainder is whatever is left
over. The defaults (s.window = "periodic",
robust = FALSE) suit most economic series with a stable
seasonal pattern.
Fine control is available through the params list. The
two most useful options are an evolving seasonal pattern and robust
fitting.
# Allow the seasonal pattern to evolve slowly over time, and downweight outliers
gdp_robust <- gdp_construction |>
decompose_series(
value_col = "index",
params = list(s.window = 13, robust = TRUE)
)A s.window of "periodic" forces an
identical seasonal shape in every year; supplying a positive odd integer
instead lets that shape drift gradually. Setting
robust = TRUE reduces the influence of one-off spikes on
the trend and seasonal estimates.
The regression method
(methods = "regression") fits a single OLS model
\[ y_t = f(t) + s(t) + \epsilon_t \]
where \(f(t)\) is a polynomial in time and \(s(t)\) is a set of period (month or quarter) dummy variables. The trend is the constant plus the polynomial terms, the seasonal component is the period effect (centred to mean zero), and the remainder is the model residual.
Unlike STL, the regression trend is a smooth global polynomial, which
can be a better description of a series with a steady long-run
direction. For the examples that follow we use the
oil_derivatives dataset, which records monthly
petroleum-derivatives production in Brazil; we restrict it to the
1995–2006 window and work on the log scale.
suboil <- oil_derivatives |>
filter(between(date, as.Date("1995-06-01"), as.Date("2006-12-01"))) |>
mutate(lprod = log(production))
ggplot(suboil, aes(date, lprod)) +
geom_line(lwd = 0.7) +
labs(
title = "Petroleum derivatives production",
y = "Thousand barrels per day (log scale)"
) +
theme_seriesThe trend argument selects the polynomial form:
"linear" (the default), "quadratic", or
"cubic". Orthogonal polynomials are used by default for
numerical stability.
suboil_reg <- suboil |>
decompose_series(
value_col = "lprod",
methods = "regression",
trend = "cubic"
)
ggplot(suboil_reg, aes(date)) +
geom_line(aes(y = lprod, color = "Original"), lwd = 0.7) +
geom_line(aes(y = trend_regression, color = "Trend (reg.)"), lwd = 0.7) +
labs(title = "Regression cubic trend", y = "log production", color = NULL) +
theme_seriesIf the series is trend-stationary, remainder_regression
— the series with the trend and seasonality removed — is approximately
stationary.
ggplot(suboil_reg, aes(date, remainder_regression)) +
geom_line(lwd = 0.7) +
labs(title = "Regression remainder", y = NULL) +
theme_seriesIt is instructive to compare the two decomposition methods directly. Both produce fixed seasonal patterns, but STL follows short-term fluctuations more closely, producing a “cleaner” remainder.
comparison <- suboil |>
decompose_series(value_col = "lprod", methods = "stl") |>
decompose_series(value_col = "lprod", methods = "regression", trend = "cubic")
comparison_long <- comparison |>
pivot_longer(
cols = trend_stl:remainder_regression,
names_pattern = "(.*)_(.*)",
names_to = c("component", "method"),
values_to = "trend"
) |>
mutate(
component = factor(component, levels = c("trend", "seasonal", "remainder"))
)
ggplot(comparison_long, aes(date, trend)) +
# Original series
geom_line(
data = suboil,
aes(y = lprod),
layout = c(1, 2),
lwd = 0.5,
alpha = 0.5
) +
# Components and methods
geom_line(aes(y = trend, color = component), lwd = 0.7) +
facet_grid(vars(component), vars(method), scales = "free_y") +
guides(color = "none") +
labs(title = "STL vs regression decomposition", y = NULL) +
theme_seriesLike augment_trends(), decompose_series()
accepts a group_cols argument to decompose several series
at once. The data must be in tidy long format. Here we use the
electricity dataset, which records monthly electricity
consumption for three sectors (residential, commercial, and
industrial).
electricity_parts <- electricity |>
mutate(lvalue = log(value)) |>
decompose_series(value_col = "lvalue", group_cols = "name_series")
glimpse(electricity_parts)
#> Rows: 1,689
#> Columns: 7
#> $ date <date> 1979-02-01, 1979-03-01, 1979-04-01, 1979-05-01, 1979-06…
#> $ name_series <chr> "electric_commercial", "electric_commercial", "electric_…
#> $ value <dbl> 1030, 1057, 1044, 1038, 1002, 979, 985, 1047, 1067, 1113…
#> $ lvalue <dbl> 6.937314, 6.963190, 6.950815, 6.945051, 6.909753, 6.8865…
#> $ trend_stl <dbl> 6.908726, 6.918239, 6.927753, 6.937020, 6.946288, 6.9554…
#> $ seasonal_stl <dbl> 0.04401996, 0.04635439, 0.04502582, -0.01022254, -0.0527…
#> $ remainder_stl <dbl> -0.015431723, -0.001403624, -0.021963659, 0.018253335, 0…Each group is decomposed independently, and the components are stacked back into a single data frame.
ggplot(electricity_parts, aes(date)) +
geom_line(aes(y = seasonal_stl), color = "#2c3e50", lwd = 0.8) +
facet_wrap(vars(name_series), ncol = 1, scales = "free_y") +
scale_x_date(date_breaks = "3 years", date_labels = "%Y") +
labs(
title = "Electricity consumption by sector",
subtitle = "STL seasonal component extracted per group (log scale)",
y = "seasonal factor (log)"
) +
theme_seriesSo far we have handled multiplicative seasonality by logging the
series before decomposing (lprod = log(production),
lvalue = log(value)). Many macroeconomic series behave this
way: the seasonal variations grow with the level of the series, so an
additive decomposition of the raw data would leave a seasonal pattern
that widens over time.
Every decompose_series() method is additive by
construction. Instead of logging manually, you can pass
transform = "log", which logs the series, decomposes it
additively, and exponentiates the components back to the original
scale.
On the log scale the additive identity holds; after exponentiating,
the components satisfy the multiplicative identity
trend * seasonal * remainder = value. This requires
strictly positive data.
methods = "classic" is the textbook decomposition
implemented by stats::decompose(). The trend is a centred
moving average of order equal to the frequency, the seasonal component
is the average detrended value for each period, and the remainder is
what is left. It is simple and fast, but shouldn’t be used in
practice.
methods = "bsm" fits a Basic Structural (state-space)
Model with stats::StructTS(): stochastic level, slope, and
seasonal components estimated by maximum likelihood and extracted with
the Kalman smoother. Unlike the moving-average methods, it returns trend
and seasonal estimates for every observation and lets both
components evolve over time. The trade-off is that it relies on
numerical optimisation, which can occasionally fail to converge on short
or irregular series.
seasonal)methods = "seats" runs the U.S. Census Bureau’s
X-13ARIMA-SEATS program through the seasonal package — the
seasonal-adjustment procedure used by many statistical agencies.
seas() is called with its automatic defaults: ARIMA model
selection, log/level transformation, outlier detection, and calendar
adjustment. The SEATS trend-cycle and seasonally adjusted series are
then mapped to a trend/seasonal/remainder triple that reproduces the
original series exactly. Because X-13 selects its own transformation
internally, there is normally no need to set
transform = "log" for this method.
# requires the 'seasonal' package
seas_oil <- decompose_series(
suboil,
value_col = "lprod",
methods = "seats"
)
ggplot(seas_oil, aes(date)) +
geom_line(aes(y = lprod, color = "Original"), lwd = 0.7) +
geom_line(aes(y = trend_seats, color = "Trend (SEATS)"), lwd = 0.7) +
labs(title = "SEATS trend-cycle", y = "log production") +
theme_seriesggplot(seas_oil, aes(date, seasonal_seats)) +
geom_line(lwd = 0.7) +
labs(title = "SEATS seasonal component", y = NULL) +
theme_seriesOften our interest is mainly in the series without its
seasonal component. For convenience, deseason_series()
wraps decompose_series() and returns the seasonally
adjusted series directly.
deseas_oil <- deseason_series(
suboil,
value_col = "lprod",
methods = "seats",
# set to TRUE to also return the trend, seasonal, and remainder components
components = FALSE
)
ggplot(deseas_oil, aes(date, seasadj_seats)) +
geom_line(lwd = 0.7) +
labs(
title = "SEATS seasonally adjusted series",
y = "log production"
) +
theme_seriesThis is equivalent to using seasadj = TRUE in
decompose_series().
decompose_series() splits a seasonal (monthly or
quarterly) series into trend, seasonal, and remainder components that
sum to the original series."stl" (default),
"regression", "classic", "bsm",
and "seats"transform = "log" for
multiplicative seasonality, where the components instead multiply to the
original series.deseason_series() is a convenience wrapper that returns
only the seasonally adjusted series.