This is an old article from 2022. With the support of AI now, it should be much easier to accomplish.
Background
I happened to see a video discussing national fortunes, and felt inspired after watching it, so I shared it on my video account. Later, I thought, could I find some data to support it? Thus, this video (and article) was created.
Data Acquisition
I initially wanted to find data starting from the Tang Dynasty, but then I realized that it seemed unreasonable, as it would be difficult to make horizontal comparisons. Collecting and estimating the data myself could be done, but it doesn’t seem to be my area of expertise. So I settled for existing data: GM-GDP-Dataset-v27.xlsx.
The data is on GDrive, and some friends may not be able to access it. You can get the download link by replying <span>0514GDP</span> in the public account backend.
Data Processing
File Format Conversion
The downloaded data is an <span>Excel</span> file with multiple <span>sheets</span>. To facilitate processing, we will convert it into a <span>csv</span> file. There are two methods:
- Simple and convenient, directly open the file with
<span>Excel</span>, select the desired<span>sheet</span>, and save it as<span>csv</span>. - Use a program.
<span>xlsx</span> to <span>csv</span>
- First, import the necessary packages:
import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Bar, Timeline
from pyecharts.commons.utils import JsCode
- Convert Excel to csv:
# Read Excel, specify the workbook
df = pd.read_excel("GM-GDP-Dataset-v27.xlsx", sheet_name="data-GDP-in-columns")
# Since the original data has two rows of descriptions, we need to reprocess the column names
columns = ["geo", "name"] # First and second columns
columns.extend(df.iloc[1, 2:].astype(int).astype(str).tolist()) # Other columns
df.columns = columns # Reset column names
df = df[2:] # Remove those two rows of descriptions
- Further processing:
df = df.fillna(0) # Fill NaN with 0
df.index = df["name"] # Change index to country names
df.pop("geo") # Discard abbreviation
df.pop("name") # Discard country names (already in the index)
- More processing:
df = df[:-7] # The last 7 rows are data for the world and continents, discard them
- Processing result:
df.head()
\t1800\t1801\t1802\t1803\t1804\t1805\t1806\t1807\t1808\t1809\t...\t2041\t2042\t2043\t2044\t2045\t2046\t2047\t2048\t2049\t2050
name
Afghanistan\t2.211294e+09\t2.211294e+09\t2.211294e+09\t2.211294e+09\t2.211294e+09\t2.211294e+09\t2.211294e+09\t2.211294e+09\t2.211294e+09\t2.211294e+09\t...\t1.663444e+11\t1.723039e+11\t1.784054e+11\t1.846484e+11\t1.910328e+11\t1.975585e+11\t2.042263e+11\t2.110376e+11\t2.179942e+11\t2.250976e+11
Albania\t2.982917e+08\t2.996139e+08\t3.009420e+08\t3.022761e+08\t3.036162e+08\t3.054195e+08\t3.067735e+08\t3.081335e+08\t3.094995e+08\t3.108722e+08\t...\t6.399410e+10\t6.485958e+10\t6.572149e+10\t6.657911e+10\t6.743400e+10\t6.828550e+10\t6.913390e+10\t6.997973e+10\t7.082276e+10\t7.166292e+10
Algeria\t1.969643e+09\t1.976833e+09\t1.984050e+09\t1.991293e+09\t1.998562e+09\t2.005859e+09\t2.013181e+09\t2.020530e+09\t2.027906e+09\t2.035309e+09\t...\t7.435795e+11\t7.666284e+11\t7.903464e+11\t8.147227e+11\t8.397461e+11\t8.654191e+11\t8.917401e+11\t9.186838e+11\t9.462136e+11\t9.742973e+11
Andorra\t3.551816e+06\t3.557750e+06\t3.563685e+06\t3.572586e+06\t3.578521e+06\t3.584456e+06\t3.590390e+06\t3.596325e+06\t3.605226e+06\t3.611161e+06\t...\t8.440631e+09\t8.609413e+09\t8.780199e+09\t8.951468e+09\t9.123809e+09\t9.295095e+09\t9.466711e+09\t9.638031e+09\t9.807867e+09\t9.977686e+09
Angola\t1.082731e+09\t1.086235e+09\t1.091491e+09\t1.096747e+09\t1.100251e+09\t1.105507e+09\t1.110763e+09\t1.116019e+09\t1.121275e+09\t1.124779e+09\t...\t4.941509e+11\t5.184238e+11\t5.437268e+11\t5.700874e+11\t5.975349e+11\t6.261032e+11\t6.558303e+11\t6.867555e+11\t7.189204e+11\t7.523675e+11
5 rows × 251 columns
The data is ready, let’s start plotting.
Plotting
Translation
From the data above, we can see that the country names are in English, which is not intuitive, so let’s translate them:
NameDict = {"China": "🇨🇳中国", "United States": "🇺🇸美国", "India": "🇮🇳印度", "Indonesia": "🇮🇩印尼", "Japan": "🇯🇵日本",
"France": "🇫🇷法国", "Italy": "🇮🇹意大利", "Russia": "🇷🇺俄罗斯", "United Kingdom": "🇬🇧英国", "Germany": "🇩🇪德国",
"Spain": "🇪🇸西班牙", "Bangladesh": "🇧🇩孟加拉", "Pakistan": "🇵🇰巴基斯坦", "Czech Republic": "🇨🇿捷克",
"Nigeria": "🇳🇬尼日利亚", "Turkey": "🇹🇷火鸡", "Netherlands": "🇧🇶荷兰", "Ukraine": "🇺🇦乌克兰", "Poland": "🇵🇱波兰",
"South Korea": "🇰🇷韩国", "Mexico": "🇲🇽墨西哥", "Egypt": "🇪🇬埃及", "Brazil": "🇧🇷巴西", "Canada": "🇨🇦加拿大",
"Saudi Arabia": "🇸🇦沙特", "Ireland": "🇮🇪爱尔兰", "Belgium": "🇧🇪比利时", "Austria": "🇦🇹奥地利", "Argentina": "🇦🇷阿根廷",
"Australia": "🇦🇺澳大利亚", "Switzerland": "『中立国』瑞士", "Philippines": "🇵🇭菲律宾", "South Africa": "🇿🇦南非",
"Thailand": "🇹🇭泰国", "Taiwan": "🇨🇳中国台湾", "Iran": "🇮🇷伊朗"}
The reason for not translating all of them is that I made a significant discovery: the top 20 countries hardly change, so there is no need to translate the other 100 or so.
Plotting
tl = Timeline().add_schema(play_interval=300, is_loop_play=False)
for i in range(1800, 2051):
data = df[str(i)].sort_values()[-20:]
bar = (
Bar(opts.InitOpts(bg_color="white"))
.add_xaxis([NameDict.get(x, x) for x in data.index.tolist()])
.add_yaxis("", [round(x/1000000000000, 2) for x in data.tolist()])
.reversal_axis()
.set_series_opts(label_opts=opts.LabelOpts(position="right"))
.set_global_opts(
title_opts=opts.TitleOpts(f"Purchasing Power Parity GDP (Trillions of USD) {i} Year")
)
.set_series_opts(
itemstyle_opts={
"normal": {
"color": JsCode(
"""new echarts.graphic.LinearGradient(0, 0, 0, 1, [{offset: 0,color: 'rgba(255, 0, 0, 1)'}, {offset: 1,color: 'rgba(0, 77, 167, 1)'}], false)"""
),
"barBorderRadius": [30, 30, 30, 30], "shadowColor": "rgb(0, 160, 221)",
}
}
)
)
tl.add(bar, "{}年".format(i))
tl.render_notebook() # Render directly
# tl.render("GDP.html") # Save as a webpage
At this point, the data support for national fortunes has been completed.
WeChatFerry: A tool that no longer exists
Reply <span>WCF</span> to observe🤖 college students reply <span>Quantum</span> to research quantum computing
Reply <span>Insurance</span> for consultation on insurance plans
Responses from humans may be slow, please be patient