| Title: | São Paulo Metro Passenger Demand Data |
|---|---|
| Description: | Provides passenger demand data for the São Paulo metro system, covering 2012 to 2026. Datasets include monthly passenger entries and transported counts by line, average weekday station entries, daily station entries, and spatial geometries for metro and commuter train lines and stations. |
| Authors: | Vinicius Oike [aut, cre, cph] |
| Maintainer: | Vinicius Oike <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-07 07:12:52 UTC |
| Source: | https://github.com/viniciusoike/metrosp |
A daily calendar for São Paulo (city) covering 2012–2030, classifying each date as a holiday or business day. Includes national, state, and municipal holidays in São Paulo, with flags for optional work days (is_ponto_facultativo) and extended holiday weekends (is_feriadao).
calendar_spo
A data frame with one row per day and the following columns:
Calendar date (Date).
Calendar year (integer).
Day of week from lubridate::wday(): 1 = Sunday,
2 = Monday, ..., 7 = Saturday (integer).
TRUE for Saturdays and Sundays (logical).
TRUE when the date is a gazetted holiday
at any scope (logical).
TRUE when the date is neither a weekend
nor a holiday (logical).
Name of the holiday in Portuguese (character).
NA on non-holiday dates.
Scope of the holiday (character).
One of "national", "state", or "municipal";
NA on non-holiday dates.
TRUE for holidays that are technically
optional at the federal level (Carnaval, Corpus Christi) but observed
as holidays in São Paulo (logical).
TRUE when a holiday falls on Monday, Tuesday,
Thursday, or Friday, creating a potential extended weekend with the
adjacent Saturday/Sunday (logical).
The calendar covers the full date range of the
station_daily dataset (which begins in January 2012 with Line 4) and
extends through 2030 for forecasting use.
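The flag definitions above can be reproduced from a plain date vector. The following is a minimal base R sketch on a toy one-week calendar, not the package's build code. The column names `date`, `wday`, `is_weekend`, `is_holiday`, and `is_business_day` are assumptions made for illustration; only `is_feriadao` is named in the description.

```r
# Toy calendar for one week (Monday 2024-03-25 to Sunday 2024-03-31).
cal <- data.frame(
  date = seq(as.Date("2024-03-25"), as.Date("2024-03-31"), by = "day")
)

# as.POSIXlt()$wday is 0 = Sunday ... 6 = Saturday; adding 1 matches the
# lubridate::wday() convention (1 = Sunday ... 7 = Saturday).
cal$wday <- as.POSIXlt(cal$date)$wday + 1L

# Illustrative holiday list with a single entry (Good Friday 2024).
holidays <- as.Date("2024-03-29")

cal$is_weekend      <- cal$wday %in% c(1L, 7L)
cal$is_holiday      <- cal$date %in% holidays
cal$is_business_day <- !cal$is_weekend & !cal$is_holiday

# Extended weekend: holiday on Monday (2), Tuesday (3), Thursday (5),
# or Friday (6).
cal$is_feriadao <- cal$is_holiday & cal$wday %in% c(2L, 3L, 5L, 6L)
```

In this toy week, Monday through Thursday are business days, and the Friday holiday is flagged as an extended weekend.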
See also: station_daily for daily passenger data that can be
joined on date.
Spatial line geometries for São Paulo metro (METRO SP) and commuter train (CPTM) lines, including both currently operating lines and planned future expansions.
lines
An sf data frame with LINESTRING geometry (CRS: WGS84 / EPSG:4326) and the following columns:
Official line number (integer).
Portuguese color name of the line (character).
English color name of the line (character).
Operating company name (character).
Either "metro" (METRO SP) or "train" (CPTM)
(character).
Either "current" (operating) or "future"
(planned expansion) (character).
Line route geometry (sfc_LINESTRING).
Requires the sf package to work with spatial features. The
distinction between types follows GeoSampa's classification. Broadly,
"metro" lines run underground as a subway and "train" lines
run above ground as commuter rail, though exceptions exist.
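As a rough illustration of working with an object of this shape, the sketch below builds a two-row toy sf data frame and keeps only operating metro lines. It assumes the sf package is installed and that the type and status columns are called `type` and `status`; both names, and the coordinates, are made up for the example.

```r
if (requireNamespace("sf", quietly = TRUE)) {
  # Toy stand-in for `lines`; coordinates are arbitrary lon/lat pairs.
  toy_lines <- sf::st_sf(
    line_number = c(1L, 6L),
    type        = c("metro", "metro"),
    status      = c("current", "future"),
    geometry    = sf::st_sfc(
      sf::st_linestring(rbind(c(-46.64, -23.55), c(-46.63, -23.50))),
      sf::st_linestring(rbind(c(-46.70, -23.50), c(-46.66, -23.55))),
      crs = 4326
    )
  )

  # Keep operating metro lines only; planned expansions are dropped.
  current_metro <- toy_lines[
    toy_lines$type == "metro" & toy_lines$status == "current",
  ]

  # The subset keeps its geometry column and CRS (EPSG:4326).
  sf::st_crs(current_metro)$epsg
}
```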
GeoSampa, Prefeitura de São Paulo. https://geosampa.prefeitura.sp.gov.br/
See also: stations for station point locations.
A named character vector of official hex color codes for the six metro lines operated by METRO SP (Lines 1–3 and 15), ViaQuatro (Line 4), and ViaMobilidade (Line 5).
metro_colors
A named character vector of length 6. Names are English color names; values are hex color codes:
Line 1 — "#171796"
Line 2 — "#007A5E"
Line 3 — "#ED2E38"
Line 4 — "#FFD525"
Line 5 — "#874ABF"
Line 15 — "#8F8F8C"
Colors follow official METRO SP and ViaMobilidade branding. Only the six currently operating metro lines are included; CPTM train lines and planned future lines (e.g., Line 6 Orange, Line 17 Gold) are not covered.
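A named vector like this is typically used as a lookup table. The sketch below rebuilds a stand-in from the hex codes listed above rather than loading the package. The spelling and capitalisation of the names (`Blue`, `Green`, and so on) are an assumption.

```r
# Stand-in for metro_colors, built from the documented hex codes.
# The exact names used by the package are an assumption.
line_colors <- c(
  Blue   = "#171796",  # Line 1
  Green  = "#007A5E",  # Line 2
  Red    = "#ED2E38",  # Line 3
  Yellow = "#FFD525",  # Line 4
  Lilac  = "#874ABF",  # Line 5
  Silver = "#8F8F8C"   # Line 15
)

# Look up one color per observation; unname() drops the names so the
# result can be passed directly as a `col` argument to a plotting call.
obs_lines  <- c("Red", "Blue", "Red")
point_cols <- unname(line_colors[obs_lines])

# Names that are not in the vector (e.g. a future line) return NA.
missing_col <- line_colors["Orange"]
```

Because lines outside the six covered here return NA, it may be worth checking for missing values before plotting data that includes CPTM or future lines.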
See also: lines for the full line reference (numbers, names,
and route geometries).
Monthly count of passengers entering São Paulo metro stations, aggregated by metro line. Data covers October 2017 through 2026 for Lines 1, 2, 3, and 15; Line 4 from January 2012; Line 5 from October 2017. Sourced from the METRO SP transparency portal and the Insper Dataverse.
passengers_entrance
A data frame with the following columns:
First day of the month (Date).
Metro line number: 1, 2, 3, 4, 5, 15, or 99 for the network total (integer).
Abbreviated metric code (character). One of:
"total", "mdu", "msa", "mdo",
"max".
Passenger count (numeric).
Measurement type in English (character). One of:
"Total", "Average on Business Days",
"Average on Saturdays", "Average on Sundays",
"Daily Peak".
Measurement type in Portuguese (character). One of:
"Total", "Média dos Dias Úteis",
"Média dos Sábados", "Média dos Domingos",
"Máxima Diária".
English name of the metro line (character).
Portuguese name of the metro line (character).
Calendar year (integer).
Data by source and line:
Lines 1, 2, 3, and 15: METRO SP transparency portal, October 2017–2026.
Line 4 (Amarela/ViaQuatro): Insper Dataverse, January 2012–2026.
Line 5 (Lilás/ViaMobilidade): METRO SP transparency portal, October 2017–July 2018; Insper Dataverse, August 2018–2026.
Network total (line_number = 99): METRO SP transparency
portal only; may not be available for all years.
Metrics:
total: Total passengers in the month.
mdu: Average daily entries on business days
(Média dos Dias Úteis).
msa: Average daily entries on Saturdays
(Média dos Sábados).
mdo: Average daily entries on Sundays
(Média dos Domingos).
max: Daily maximum (Máxima Diária).
Months beyond the last published data point for each line are trimmed
during assembly; interior NAs (e.g. operational outages) are
preserved.
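Two practical consequences of this layout are that the network total (line_number = 99) should be separated from individual lines before summing, and that interior NAs need an explicit policy. The sketch below shows both on a toy long-format table with invented values. The column names `metric_abb` and `value` are assumptions; `line_number` appears in the text above.

```r
# Toy long-format table; all values are invented for illustration.
toy <- data.frame(
  date        = as.Date(c("2023-01-01", "2023-01-01", "2023-01-01",
                          "2023-02-01", "2023-02-01", "2023-02-01")),
  line_number = c(1L, 1L, 99L, 1L, 1L, 99L),
  metric_abb  = c("total", "mdu", "total", "total", "mdu", "total"),
  value       = c(100, 4, 300, NA, 5, 280)
)

# Keep monthly totals for individual lines only. Code 99 is the network
# total and would be double counted if summed together with the lines.
totals <- subset(toy, metric_abb == "total" & line_number != 99L)

# Interior NAs are preserved, so a plain sum propagates them ...
annual <- tapply(totals$value, totals$line_number, sum)

# ... while na.rm = TRUE sums only the months that were published.
annual_known <- tapply(totals$value, totals$line_number, sum, na.rm = TRUE)
```

Which of the two sums is appropriate depends on the analysis; the NA-propagating version makes gaps visible rather than silently understating the annual figure.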
Companhia do Metropolitano de São Paulo (METRO SP). https://transparencia.metrosp.com.br/dataset/demanda
See also: passengers_transported for transported counts,
station_averages for station-level weekday averages.
Monthly count of passengers transported by São Paulo metro, aggregated by metro line. Data covers October 2017 through 2026 for Lines 1, 2, 3, and 15, and October 2017 through December 2019 for Line 5. Sourced from the METRO SP transparency portal.
passengers_transported
A data frame with the following columns:
First day of the month (Date).
Metro line number: 1, 2, 3, 5, 15, or 99 for the network total (integer).
Abbreviated metric code (character). One of:
"total", "mdu", "msa", "mdo",
"max".
Passenger count (numeric).
Measurement type in English (character). One of:
"Total", "Average on Business Days",
"Average on Saturdays", "Average on Sundays",
"Daily Peak".
Measurement type in Portuguese (character). One of:
"Total", "Média dos Dias Úteis",
"Média dos Sábados", "Média dos Domingos",
"Máxima Diária".
English name of the metro line (character).
Portuguese name of the metro line (character).
Calendar year (integer).
All data comes from the METRO SP transparency portal. The Insper Dataverse
source does not include transported counts for Lines 4 or 5, so Line 4
(Amarela) is absent from this dataset and Line 5 (Lilás) is available only
for October 2017–December 2019, the period covered by the METRO portal. The
network total (line_number = 99) may not be available for all years.
Metrics:
total: Total passengers transported in the month.
mdu: Average daily passengers transported on business days
(Média dos Dias Úteis).
msa: Average daily passengers transported on Saturdays
(Média dos Sábados).
mdo: Average daily passengers transported on Sundays
(Média dos Domingos).
max: Daily maximum (Máxima Diária).
Months beyond the last published data point for each line are trimmed
during assembly; interior NAs (e.g. operational outages) are
preserved.
Companhia do Metropolitano de São Paulo (METRO SP). https://transparencia.metrosp.com.br/dataset/demanda
See also: passengers_entrance for entry counts,
station_averages for station-level weekday averages.
Monthly average of weekday (business day) passenger entries for each station in the São Paulo metro system. Data covers October 2017 through 2026 for Lines 1, 2, 3, and 15; Line 4 from January 2012; Line 5 from October 2017. Sourced from the METRO SP transparency portal and the Insper Dataverse.
station_averages
A data frame with the following columns:
First day of the month (Date).
Metro line number (integer).
Name of the metro station (character).
Average weekday passenger entries (numeric).
English name of the metro line (character).
Portuguese name of the metro line (character).
Calendar year (integer).
Only the weekday average (mdu) metric is available at the station level.
For line-level data with all five metrics, see
passengers_entrance. Trailing months whose data has not yet
been published by the source are excluded (rows with NA values are
dropped during assembly).
Station coverage by line and source:
Line 1 (Azul/Blue): 23 stations, October 2017–2026 (METRO SP portal).
Line 2 (Verde/Green): 14 stations, October 2017–2026 (METRO SP portal).
Line 3 (Vermelha/Red): 18 stations, October 2017–2026 (METRO SP portal).
Line 4 (Amarela/Yellow): January 2012–2026 (Insper Dataverse).
Line 5 (Lilás/Lilac): October 2017–July 2018 (METRO SP portal) and August 2018–2026 (Insper Dataverse).
Line 15 (Prata/Silver): 10 stations in 2020, 11 from January 2021 onward (Jardim Colonial added), October 2017–2026 (METRO SP portal).
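Changes in station coverage, such as the extra Line 15 station noted above, can be checked by counting distinct stations per line and year. The sketch below does this on a toy table with placeholder station names. The column names `date`, `line_number`, and `station_name` are assumptions.

```r
# Toy table: two stations report in 2020, three in 2021.
toy <- data.frame(
  date         = as.Date(c("2020-12-01", "2020-12-01",
                           "2021-01-01", "2021-01-01", "2021-01-01")),
  line_number  = 15L,
  station_name = c("A", "B", "A", "B", "C")
)
toy$year <- as.integer(format(toy$date, "%Y"))

# Number of distinct stations reporting per line and year.
n_stations <- aggregate(
  station_name ~ line_number + year,
  data = toy,
  FUN  = function(x) length(unique(x))
)
```

A jump in the count between years flags stations that entered the data partway through, which matters when comparing line-level sums of station averages over time.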
Companhia do Metropolitano de São Paulo (METRO SP). https://transparencia.metrosp.com.br/dataset/demanda
See also: station_daily for daily station entries,
passengers_entrance for monthly line-level totals.
Daily passenger entries at each station in the São Paulo metro system. Data covers January 2012 through 2026 for Line 4 and August 2018 through 2026 for Line 5 (both Insper Dataverse), and 2020 through 2026 for Lines 1, 2, 3, and 15 (METRO SP transparency portal).
station_daily
A data frame with the following columns:
Date of observation (Date).
Metro line number: 1, 2, 3, 4, 5, or 15 (integer).
Full station name (character).
Daily passenger entries (numeric).
English name of the metro line (character).
Portuguese name of the metro line (character).
Three-letter station abbreviation used internally
by METRO SP (character). NA for Lines 4 and 5 (Dataverse
source).
Calendar year (integer).
Station coverage and date range by line:
Line 1 (Azul/Blue): 23 stations, 2020–2026 (METRO SP portal).
Line 2 (Verde/Green): 14 stations, 2020–2026 (METRO SP portal).
Line 3 (Vermelha/Red): 18 stations, 2020–2026 (METRO SP portal).
Line 4 (Amarela/Yellow): January 2012–2026 (Insper Dataverse);
station_code is NA.
Line 5 (Lilás/Lilac): August 2018–2026 (Insper Dataverse);
station_code is NA.
Line 15 (Prata/Silver): 10 stations in 2020, 11 from 2021 onward (Jardim Colonial added), 2020–2026 (METRO SP portal).
Some stations appear on multiple lines (e.g., Ana Rosa on Lines 1 and 2,
Paraíso on Lines 1 and 2, Sé on Lines 1 and 3). These are recorded
separately for each line. Days beyond the last published data point for each line are trimmed
during assembly; interior NAs (e.g. operational outages) are
preserved.
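Because interchange stations are recorded once per line, grouping by station name alone would mix the two records. The sketch below joins a toy daily table to a toy calendar on date and computes a business-day average per line and station. All values are invented, and the column names `station_name`, `value`, and `is_business_day` are assumptions.

```r
# Toy daily entries for Sé, recorded separately on Lines 1 and 3.
days <- as.Date(c("2024-03-27", "2024-03-28", "2024-03-29"))
daily <- data.frame(
  date         = rep(days, times = 2),
  line_number  = rep(c(1L, 3L), each = 3),
  station_name = "Sé",
  value        = c(120, 100, 40, 80, 60, 20)
)

# Toy calendar: the third day is treated as a holiday.
cal <- data.frame(
  date            = days,
  is_business_day = c(TRUE, TRUE, FALSE)
)

# Left join on date, then keep business days only.
joined <- merge(daily, cal, by = "date", all.x = TRUE)
biz    <- subset(joined, is_business_day)

# Business-day average per (line, station); grouping by line keeps the
# two Sé records apart.
biz_avg <- aggregate(value ~ line_number + station_name, data = biz, FUN = mean)
```

Summing `value` across lines for the same station and date would instead give total entries at the interchange, if that is the quantity of interest.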
Companhia do Metropolitano de São Paulo (METRO SP). https://transparencia.metrosp.com.br/dataset/demanda
See also: station_averages for monthly weekday averages,
passengers_entrance for monthly line-level totals.
Inauguration (commercial opening) dates for São Paulo metro stations,
covering stations whose opening falls within or near the
station_daily / station_averages window. Used
to flag ramp-up periods in which monthly ridership is still climbing
toward steady-state and should generally be excluded from year-on-year or
baseline comparisons.
station_inauguration
A data frame with one row per (line, station):
Metro line number (integer).
Full station name (character).
Date of commercial opening (Date). NA
for stations whose opening predates the dataset window (i.e., they
were already operating when the data record begins).
Short label identifying the expansion phase, e.g.
"L15 Fase 4" (character).
Whether the inauguration date has been cross-checked
against the operator's announcement or an equivalently reliable
source (logical). Stations with verified = FALSE carry
best-effort dates and should not be relied on for legal or
publication purposes without re-checking.
Free-text annotations about the source or any caveats
(character, possibly NA).
TRUE when inauguration_date is
NA because the station opened before the data starts
(logical).
inauguration_date + 180 days, a heuristic
end of the initial ramp-up period, referred to below as ramp_up_end
(Date). NA when pre_data_window is TRUE.
The table is assembled by data-raw/build_station_inauguration.R
from data-raw/station_inauguration.csv. To extend the table or
verify uncertain dates, edit the CSV (setting verified = TRUE
once cross-checked) and re-run the build script.
Suggested use: when computing pre/post comparisons (e.g., 12 months vs.
the prior 12 months, or recovery vs. 2019), exclude stations where either
comparison window overlaps the period from inauguration_date to
ramp_up_end, to avoid mistaking ramp-up growth for organic demand
change.
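The exclusion rule can be written as an interval-overlap test. The sketch below is one possible implementation on a toy table with hypothetical station names and dates. The helper `overlaps_ramp_up()` and the `exclude` column are illustrative names, not part of the package.

```r
# Toy inauguration table; station names and dates are hypothetical.
inaug <- data.frame(
  station_name      = c("Old Station", "New Station"),
  inauguration_date = as.Date(c(NA, "2021-07-01"))
)
inaug$pre_data_window <- is.na(inaug$inauguration_date)
inaug$ramp_up_end     <- inaug$inauguration_date + 180  # NA stays NA

# TRUE when [start, end] intersects [win_start, win_end]; stations that
# opened before the data window (start is NA) are never flagged.
overlaps_ramp_up <- function(start, end, win_start, win_end) {
  !is.na(start) & start <= win_end & end >= win_start
}

# Comparison windows: calendar 2022 against calendar 2021.
current <- as.Date(c("2022-01-01", "2022-12-31"))
prior   <- as.Date(c("2021-01-01", "2021-12-31"))

# Exclude a station if its ramp-up period touches either window.
inaug$exclude <-
  overlaps_ramp_up(inaug$inauguration_date, inaug$ramp_up_end,
                   current[1], current[2]) |
  overlaps_ramp_up(inaug$inauguration_date, inaug$ramp_up_end,
                   prior[1], prior[2])
```

Here the new station's ramp-up ends on 2021-12-28, so it misses the 2022 window but falls inside the 2021 window and is excluded from the comparison.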
Compiled from operator announcements (Companhia do Metropolitano de São Paulo, ViaQuatro, ViaMobilidade).
See also: stations for spatial point locations,
station_averages for monthly weekday averages.
Spatial point locations for São Paulo metro (METRO SP) and commuter train (CPTM) stations, including both currently operating stations and planned future stations.
stations
An sf data frame with POINT geometry (CRS: WGS84 / EPSG:4326) and the following columns:
Station name in title case (character).
Line number the station belongs to (integer).
Portuguese color name of the line (character).
English color name of the line (character).
Operating company name (character).
Either "metro" (METRO SP) or "train" (CPTM)
(character).
Either "current" (operating) or "future"
(planned expansion) (character).
Station location (sfc_POINT).
Requires the sf package to work with spatial features. The
distinction between types follows GeoSampa's classification. Broadly,
"metro" lines run underground as a subway and "train" lines
run above ground as commuter rail, though exceptions exist.
GeoSampa, Prefeitura de São Paulo. https://geosampa.prefeitura.sp.gov.br/
See also: lines for line route geometries,
station_averages for passenger data by station.