JSC370 Final Project
JSC370 · Winter 2026 · University of Toronto

Carbon Dioxide Emissions per Capita, 1990–2023

How economic, energy, and land-use factors explain cross-country differences.

Yi Fan (Eric) Wang  ·  April 26, 2026

At a glance
  • ~7,400 country–year observations
  • 6 World Bank indicators
  • XGBoost: final best model
  • GDP per capita: strongest predictor
The story

Climate change is one of the major global challenges, and it is largely driven by carbon dioxide emissions from human activities. Those emissions are not evenly distributed across countries: countries at different stages of economic development rely on different energy sources, follow different urbanization paths, and face different land-use pressures, all of which leave distinct imprints on per-capita emissions.

This project uses a 34-year panel covering every country that reports to the World Bank to map how those imprints have evolved, and to ask which factors actually drive cross-country differences once the other predictors are accounted for. Six indicators carry the analysis: carbon dioxide emissions per capita as the outcome, and GDP per capita, fossil-fuel electricity share, renewable energy share, forest area, and urban population share as candidate predictors, with the calendar year added as a temporal control.

Research question

How have carbon dioxide emissions per capita evolved across countries from 1990 to 2023, and which economic, energy, and land-use factors explain and predict these differences?

The Midterm Report built the data pipeline on the World Bank Open Data API, cleaned 7,378 country–year records down to 5,499 complete cases, and laid the explanatory groundwork with an OLS regression (adjusted R² ≈ 0.66) and a Generalized Additive Model (GAM; pseudo R² ≈ 0.73). The GAM's smooth terms were significantly nonlinear, which made it clear that a strictly linear model was leaving explainable variance on the table.

The Final Report picks up from there. It keeps the same panel and predictor set so the two reports remain directly comparable, then replaces the linear and additive models with two tree-based models: a pruned Decision Tree as an interpretable baseline and a tuned XGBoost Regressor as the more flexible predictor. The train, validation, and test splits are grouped by country, so no country contributes rows to more than one split, which prevents country-level leakage. The result is both better predictive performance and a sharper, gain-based picture of which predictors actually do the work.
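The key idea of a country-grouped split is to shuffle country codes rather than rows, so every year of a country stays inside one split. A minimal sketch, assuming the panel is a list of dicts with a hypothetical `iso3` field; the 70/15/15 fractions and the seed are illustrative, not the report's settings.

```python
import random

def country_grouped_split(rows, train_frac=0.70, val_frac=0.15, seed=370):
    """Assign whole countries to train/validation/test so no country spans two splits.

    rows: list of dicts, each with a hypothetical 'iso3' country code.
    train_frac, val_frac, seed: illustrative values, not the report's settings;
        the test split receives whatever countries remain.
    """
    # Sort before shuffling so the result depends only on the seed, not on row order.
    countries = sorted({row["iso3"] for row in rows})
    random.Random(seed).shuffle(countries)

    n_train = round(train_frac * len(countries))
    n_val = round(val_frac * len(countries))
    members = {
        "train": set(countries[:n_train]),
        "val": set(countries[n_train:n_train + n_val]),
        "test": set(countries[n_train + n_val:]),
    }
    # Every row follows its country, so all years of a country stay together.
    return {name: [row for row in rows if row["iso3"] in codes]
            for name, codes in members.items()}
```

Because the test countries are never seen during fitting or tuning, test metrics estimate how well a model generalizes to new countries rather than to new years of familiar ones.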

Data sources

World Bank Open Data API

All six indicators come from the World Bank's World Development Indicators (WDI) via the Open Data API, pulled live for 1990–2023 for every reporting country. After dropping regional aggregates and standardizing missing-value markers, the panel covers 196 countries with 7,378 country–year rows, which the modelling stage filters to 5,499 complete cases.
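The cleaning steps described above (dropping aggregates, standardizing missing markers, keeping complete cases) might look like the following minimal sketch. The record layout, the `clean_panel` and `to_number` helpers, and the set of missing markers are hypothetical; the actual pipeline may use other markers and a dataframe library. One way to obtain the aggregate codes is the API's country list, where aggregates are tagged with the region "Aggregates".

```python
import math

# Hypothetical set of string markers treated as missing; the real pipeline's list may differ.
MISSING_STRINGS = {"", "..", "NA", "N/A", "NaN"}

def to_number(value):
    """Map a raw cell to a float, or None if it is any kind of missing marker."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.strip()
        if value in MISSING_STRINGS:
            return None
    number = float(value)
    return None if math.isnan(number) else number

def clean_panel(records, aggregate_codes, variables):
    """Drop aggregate rows, standardize missing values, and keep complete cases.

    records: list of dicts with 'iso3', 'year', and one key per variable (hypothetical layout).
    aggregate_codes: codes of regional aggregates to exclude.
    variables: outcome and predictor names that must all be present in a complete case.
    Returns (cleaned, complete): all country rows, and the subset usable for modelling.
    """
    cleaned = []
    for record in records:
        if record["iso3"] in aggregate_codes:
            continue  # aggregates summarize groups of countries; they are not countries
        row = dict(record)
        for var in variables:
            row[var] = to_number(row.get(var))
        cleaned.append(row)
    complete = [row for row in cleaned if all(row[var] is not None for var in variables)]
    return cleaned, complete
```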

| Indicator | WDI code | Unit |
|---|---|---|
| CO₂ emissions per capita (outcome) | EN.GHG.CO2.PC.CE.AR5 | metric tons / capita |
| GDP per capita | NY.GDP.PCAP.KD | constant 2015 USD / person |
| Fossil-fuel electricity share | EG.ELC.FOSL.ZS | % of electricity |
| Renewable energy share | EG.FEC.RNEW.ZS | % of final energy |
| Forest area | AG.LND.FRST.ZS | % of land area |
| Urban population share | SP.URB.TOTL.IN.ZS | % of population |
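Each WDI code in the table corresponds to one endpoint of the World Bank API (v2), which returns JSON as a two-element list: a metadata object (including the number of `pages`) followed by a list of observations. A minimal standard-library sketch of fetching one indicator, assuming that layout; the `INDICATORS` short names and the helper functions are hypothetical, and retries and API error responses are not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WDI_BASE = "https://api.worldbank.org/v2/country/all/indicator/"

# Hypothetical short names mapped to the WDI codes listed in the table above.
INDICATORS = {
    "co2_pc": "EN.GHG.CO2.PC.CE.AR5",
    "gdp_pc": "NY.GDP.PCAP.KD",
    "fossil_elec": "EG.ELC.FOSL.ZS",
    "renew_share": "EG.FEC.RNEW.ZS",
    "forest": "AG.LND.FRST.ZS",
    "urban": "SP.URB.TOTL.IN.ZS",
}

def wdi_url(code, start=1990, end=2023, page=1, per_page=20000):
    """Build the request URL for one indicator over a year range."""
    params = {"format": "json", "date": f"{start}:{end}", "per_page": per_page, "page": page}
    return f"{WDI_BASE}{code}?{urlencode(params, safe=':')}"

def parse_wdi_page(payload):
    """Return (total_pages, [(iso3, year, value), ...]) from one decoded JSON page."""
    meta, records = payload[0], payload[1] or []
    rows = [(r["countryiso3code"], int(r["date"]), r["value"]) for r in records]
    return meta["pages"], rows

def fetch_indicator(code):
    """Download every page for one indicator (network access required)."""
    rows, page, pages = [], 1, 1
    while page <= pages:
        with urlopen(wdi_url(code, page=page)) as response:
            payload = json.load(response)
        pages, page_rows = parse_wdi_page(payload)
        rows.extend(page_rows)
        page += 1
    return rows
```

A call such as `fetch_indicator(INDICATORS["gdp_pc"])` returns long-format rows; missing observations arrive as `None` and can be merged on `(iso3, year)` across the six indicators.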
Key findings

Three takeaways

Trajectory

Emissions peaked in the early 2010s

Global per-capita emissions rose through the 2000s, peaked around 2012, and then declined, with a sharp dip in 2020 during the COVID-19 slowdown.
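One way to read "global per-capita emissions" is an unweighted mean across countries for each year, followed by locating the year with the highest mean. The sketch below assumes that definition and a hypothetical `(iso3, year, value)` row layout; a population-weighted mean (total emissions divided by total population) answers a slightly different question and could peak in a different year.

```python
from collections import defaultdict

def yearly_mean(rows):
    """Unweighted mean of per-capita CO2 across countries, per year; missing values are skipped."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _iso3, year, co2 in rows:
        if co2 is None:
            continue
        sums[year] += co2
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}

def peak_year(series):
    """Year with the highest value in a {year: value} mapping."""
    return max(series, key=series.get)
```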

Best model

Tuned XGBoost wins on every metric

On the held-out test set, the tuned XGBoost outperforms both the pruned Decision Tree and an untuned XGBoost baseline in R², RMSE, and MAE.

Top driver

Log GDP per capita dominates

In gain-based importance, it is followed by renewable energy share and urban population share, while forest area, fossil-fuel electricity share, and the calendar year contribute the least.
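In the `xgboost` Python package, `Booster.get_score(importance_type="gain")` returns a dict of average gain per feature (`"total_gain"` gives the total instead); which variant the report uses is not stated here. A minimal sketch of turning such a dict into a normalized ranking, with hypothetical feature names; any numbers passed in are placeholders, not the report's values.

```python
def gain_shares(scores, features):
    """Normalize raw gain scores to shares summing to 1, sorted from largest to smallest.

    scores: mapping feature -> gain, e.g. the dict returned by Booster.get_score().
        Features never used in a split are absent from that dict, so they get 0.
    features: every feature the model was trained on (hypothetical names).
    """
    raw = {feature: float(scores.get(feature, 0.0)) for feature in features}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no gain recorded for any feature")
    shares = [(feature, gain / total) for feature, gain in raw.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)
```

Normalizing makes importances comparable across models with different numbers of trees, but gain still measures how much a feature improved the training loss, not a causal effect.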

Model comparison

Test performance across models

All three models are evaluated on the same held-out test set of unseen countries, using country-grouped train/validation/test splits to prevent country-level leakage. Metrics are on log carbon dioxide emissions per capita.

| Model | Test R² | Test RMSE | Test MAE |
|---|---|---|---|
| Decision Tree (pruned) | 0.706 | 0.558 | 0.374 |
| XGBoost (tuned, best) | 0.722 | 0.543 | 0.359 |

Higher R² is better; lower RMSE and MAE are better. The tuned XGBoost wins on every metric; the Final Report lists the full hyperparameters and importance plots.
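The three metrics can be computed directly from log-scale test targets and predictions. A minimal sketch with a hypothetical `regression_metrics` helper, using the standard definitions R² = 1 − SS_res / SS_tot, RMSE = √(mean squared residual), and MAE = mean absolute residual.

```python
import math

def regression_metrics(y_true_log, y_pred_log):
    """R², RMSE, and MAE for log-scale targets and predictions of equal length."""
    n = len(y_true_log)
    residuals = [t - p for t, p in zip(y_true_log, y_pred_log)]
    mean_true = sum(y_true_log) / n
    ss_res = sum(r * r for r in residuals)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true_log)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

Assuming the natural log, a log-scale error of 0.543 corresponds to a prediction off by a factor of about e^0.543 ≈ 1.72 in either direction, which is a rough way to read the tuned model's RMSE.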

What's on this site

Four ways to dive in

Final Report

Full written analysis

Introduction, methods, results, and conclusions. Available as HTML and as a downloadable PDF.

Midterm Report

The earlier OLS & GAM analysis

Background context: the midterm-stage analysis (OLS regression and GAM) that the Final Report extends. HTML only.

Interactive

Three Plotly figures

An animated world map of CO₂ emissions, a histogram by decade, and a CO₂-versus-GDP scatter plot by continent, all interactive.

Code

GitHub repository

Source .qmd files, the data pipeline, all figures, and the rendered website, so the whole analysis is reproducible.

Methodology

How the analysis flows

Data → Cleaning → EDA → Decision Tree + XGBoost → Evaluation
