Abstract
On March 8, 2024, NVIDIA's stock price experienced a significant decline, falling from its all-time high (ATH) of 972.10 USD to 875.02 USD, a drop of nearly 10%. This decrease occurred simultaneously with a decline in the S&P 500 index, which fell from 5187.05 to 5096.96 points. This observation raises the question: how can we isolate the macroeconomic effect from the price movement of a specific asset?
In this study, we use the cryptocurrency market as an example and propose a simple approach to price analysis: comparing a single asset against overall market performance. By measuring where a coin's market capitalization falls relative to the whole market on a given date, we can track fluctuations in the coin's percentile over time and better understand the underlying mechanisms driving price movements. Combined with historical data and other factors such as market trends, this approach can help investors make informed investment decisions and potentially identify new opportunities for growth and profit.
0. Import libraries
%matplotlib inline
import os, re
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
USE_GPU = True
data_path = r'C:\Users\marcc\My Drive\Data Extraction\geckoscan-all'
1. Data preparation
Our dataset was obtained from CoinGecko using web scraping techniques; specifically, the selenium and requests libraries are used (reference). The dataset comprises multiple snapshots of the entire cryptocurrency market on various dates. As of the date of this analysis, CoinGecko lists approximately 10,000 cryptocurrencies, but around 6,000 of these are not actively traded on any exchange, centralized or decentralized. Consequently, the initial dataset contains "nan" values, which we will exclude from our analysis. Previous studies (reference) have demonstrated that both market capitalization and trading volume follow an exponential distribution. To normalize our data, we apply a logarithmic transformation in base 10, a choice that makes the axis labels easier for readers to interpret.
The blocks below clean and transform the historical crypto market data to normalize it and exclude irrelevant values. We define a function called clean_list that removes commas, filters out non-numbers, and converts items to floats, and a function called log_transform that performs a base-10 logarithmic transformation using torch on the GPU or NumPy on the CPU. The code then builds the cleaned dataset by iterating through all files in the data directory, extracting market capitalization and trading volume, applying clean_list and log_transform, and storing the results as tuples of date-sequence pairs for analysis. This normalization and transformation process ensures that the data is in a consistent format.
# Preview data
preview = os.listdir(data_path)[-1]
df = pd.read_csv(os.path.join(data_path, preview))
df.head()
Unnamed: 0 | id | Symbol | Name | image | Price | MarketCap | market_cap_rank | fully_diluted_valuation | Volume24h | ... | total_supply | max_supply | ath | ath_change_percentage | ath_date | atl | atl_change_percentage | atl_date | roi | last_updated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | bitcoin | btc | Bitcoin | https://assets.coingecko.com/coins/images/1/la... | 68355.000 | 1.343466e+12 | 1.0 | 1.435840e+12 | 6.167528e+10 | ... | 2.100000e+07 | 21000000.0 | 69428.00 | -1.42662 | 2024-03-08T20:05:03.481Z | 67.810000 | 1.008265e+05 | 2013-07-06T00:00:00.000Z | NaN | 2024-03-09T05:16:02.259Z |
1 | 1 | ethereum | eth | Ethereum | https://assets.coingecko.com/coins/images/279/... | 3927.860 | 4.719836e+11 | 2.0 | 4.719836e+11 | 2.642884e+10 | ... | 1.201052e+08 | NaN | 4878.26 | -19.36480 | 2021-11-10T14:24:19.604Z | 0.432979 | 9.083959e+05 | 2015-10-20T00:00:00.000Z | {'times': 75.81178792735467, 'currency': 'btc'... | 2024-03-09T05:16:02.231Z |
2 | 2 | tether | usdt | Tether | https://assets.coingecko.com/coins/images/325/... | 1.001 | 1.017375e+11 | 3.0 | 1.017375e+11 | 9.669063e+10 | ... | 1.015923e+11 | NaN | 1.32 | -24.24184 | 2018-07-24T00:00:00.000Z | 0.572521 | 7.507701e+01 | 2015-03-02T00:00:00.000Z | NaN | 2024-03-09T05:15:13.214Z |
3 | 3 | binancecoin | bnb | BNB | https://assets.coingecko.com/coins/images/825/... | 489.580 | 7.531749e+10 | 4.0 | 7.531749e+10 | 4.075490e+09 | ... | 1.538562e+08 | 200000000.0 | 686.31 | -28.64016 | 2021-05-10T07:24:17.097Z | 0.039818 | 1.229874e+06 | 2017-10-19T00:00:00.000Z | NaN | 2024-03-09T05:15:38.966Z |
4 | 4 | solana | sol | Solana | https://assets.coingecko.com/coins/images/4128... | 147.180 | 6.522912e+10 | 5.0 | 8.410659e+10 | 5.485457e+09 | ... | 5.713691e+08 | NaN | 259.96 | -43.37098 | 2021-11-06T21:54:35.825Z | 0.500801 | 2.929537e+04 | 2020-05-11T19:35:23.449Z | NaN | 2024-03-09T05:15:53.054Z |
5 rows × 27 columns
# Clean data
def clean_list(lst: list) -> list:
    lst = [str(item) for item in lst]              # normalize everything to strings
    lst = [item.replace(',', '') for item in lst]  # strip thousands separators
    lst = [item for item in lst if item != "-"]    # drop placeholder entries
    lst = [item for item in lst if item != "nan"]  # drop missing values
    lst = [float(item) for item in lst]
    lst = [item for item in lst if item != 0]      # zero values carry no information
    return lst
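As a quick sanity check, the cleaning steps can be exercised on a small hand-made list; a compact restatement of clean_list is included so the snippet runs on its own:

```python
def clean_list(lst: list) -> list:
    # Stringify, strip thousands separators, drop placeholders, cast, drop zeros
    lst = [str(item) for item in lst]
    lst = [item.replace(',', '') for item in lst]
    lst = [item for item in lst if item not in ("-", "nan")]
    lst = [float(item) for item in lst]
    return [item for item in lst if item != 0]

sample = ["1,343,466", "-", "nan", "0", 875.02]
print(clean_list(sample))  # [1343466.0, 875.02]
```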
# Logarithmic transformation (base 10)
def log_transform(lst: list) -> list:
    if USE_GPU:
        vector = torch.tensor(lst).cuda()
        log_vector = torch.log10(vector)
        lst = log_vector.tolist()
    else:
        lst = np.log10(lst).tolist()  # match the GPU branch: base 10, returned as a list
    return lst
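On the CPU path the transform is just a base-10 logarithm; a minimal check with exact powers of ten (independent of the torch path, which requires a CUDA device):

```python
import numpy as np

# Exact powers of ten map to whole numbers under log10
values = [10.0, 1_000.0, 1_000_000.0]
log_values = np.log10(values).tolist()
print(log_values)
```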
# Create cleaned dataset
mcap_sequence, tvol_sequence = [], []
for filename in os.listdir(data_path):
    date = re.search(r'\d{4}-\d{2}-\d{2}', filename).group()
    df = pd.read_csv(os.path.join(data_path, filename))
    mcap = df['MarketCap'].tolist()
    mcap = clean_list(mcap)
    mcap = log_transform(mcap)
    mcap_sequence.append((date, mcap))
    tvol = df['Volume24h'].tolist()
    tvol = clean_list(tvol)
    tvol = log_transform(tvol)
    tvol_sequence.append((date, tvol))
2. Distribution plot
We will quickly visualize the distribution of market capitalization and trading volume across different dates. We observe that the distribution of market capitalization is approximately normal on the log scale, with a mean of roughly 1,000,000 USD. The distribution of trading volume is harder to characterize: evidence shows a slight positive skew, with a distinct spike between 10,000 USD and 1,000,000 USD. While the reason for this spike is still unclear, one possible explanation is the presence of pump-and-dump schemes, which are prevalent (reference) in the cryptocurrency market.
To visualize the distributions of market capitalization (mcap) and trading volume (tvol), we created a function called plot_distribution. It takes a sequence of data along with the x-axis and y-axis limits, and plots a histogram for each date in the sequence using matplotlib. The function lays the histograms out on a grid, sets the axis limits and a per-date title, and turns off the axes of any empty subplots. This allows us to visualize the distribution of the data and identify any trends or patterns.
# Plot distribution
def plot_distribution(sequence, x: tuple, y: tuple):
    num_plots = len(sequence)
    num_cols = 10
    num_rows = (num_plots // num_cols) + (num_plots % num_cols > 0)
    fig, axs = plt.subplots(num_rows, num_cols, figsize=(30, 30))
    for i, (date, lst) in enumerate(sequence):
        if i < num_plots:  # Check if there are still plots to be displayed
            row = i // num_cols
            col = i % num_cols
            axs[row, col].hist(lst, bins=30)
            axs[row, col].set_xlim(x)
            axs[row, col].set_ylim(y)
            axs[row, col].set_title(date)
    for i in range(num_plots, num_rows * num_cols):
        axs.flatten()[i].axis('off')
    plt.tight_layout()
    plt.show()
plot_distribution(mcap_sequence, (-2, 14), (0, 900))
plot_distribution(tvol_sequence, (-4, 12), (0, 1400))
3. Calculate statistics
We will now perform calculations on both market capitalization and trading volume for each day, including the calculation of means, standard deviations, and percentiles.
The code block below calculates statistical measures for a sequence of lists, where each list holds a single day's data. It uses two functions: calculate_stats, which takes a list of numbers and computes the mean, standard deviation, and percentiles; and calculate_stats_sequence, which applies calculate_stats to each list in a sequence and stores the results in a Pandas DataFrame. We call calculate_stats_sequence twice, once on the market capitalization sequence and once on the trading volume sequence, and assign the resulting DataFrames to the variables stats_mcap and stats_tvol, respectively.
# Calculate statistics for one day
def calculate_stats(lst: list) -> list:
    mean = np.mean(lst)
    std_dev = np.std(lst)
    quantiles = np.percentile(lst, [10, 25, 50, 75, 90])
    stats = [mean, std_dev] + quantiles.tolist()
    return stats

# Calculate statistics for each day and store in a Pandas DataFrame
def calculate_stats_sequence(sequence) -> pd.DataFrame:
    stats_list = []
    for date, lst in sequence:
        stats = calculate_stats(lst)
        stats_list.append([date] + stats)
    df = pd.DataFrame(stats_list, columns=['date', 'mean', 'std', '10th', '25th', 'median', '75th', '90th'])
    return df
stats_mcap = calculate_stats_sequence(mcap_sequence)
stats_tvol = calculate_stats_sequence(tvol_sequence)
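To confirm the statistics come out in the expected order, calculate_stats can be checked on the integers 1 to 100 (restated compactly here so the snippet is self-contained):

```python
import numpy as np

def calculate_stats(lst: list) -> list:
    mean = np.mean(lst)
    std_dev = np.std(lst)
    quantiles = np.percentile(lst, [10, 25, 50, 75, 90])
    return [mean, std_dev] + quantiles.tolist()

# Ordered output: [mean, std, 10th, 25th, median, 75th, 90th]
stats = calculate_stats(list(range(1, 101)))
print(stats)  # mean and median are both 50.5 for 1..100
```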
# Preview statistics for market cap
stats_mcap.head()
date | mean | std | 10th | 25th | median | 75th | 90th | |
---|---|---|---|---|---|---|---|---|
0 | 2023-08-09 | 6.159759 | 1.305085 | 4.675918 | 5.317879 | 6.124504 | 7.034170 | 7.783340 |
1 | 2023-08-10 | 6.158493 | 1.306935 | 4.665913 | 5.318132 | 6.124457 | 7.038537 | 7.777979 |
2 | 2023-08-15 | 6.156271 | 1.302976 | 4.670294 | 5.309894 | 6.120470 | 7.032239 | 7.776703 |
3 | 2023-08-20 | 6.131251 | 1.298832 | 4.656052 | 5.289159 | 6.098831 | 7.017901 | 7.732382 |
4 | 2023-08-20 | 6.132795 | 1.299603 | 4.656592 | 5.289621 | 6.103521 | 7.014202 | 7.733663 |
# Preview statistics for trading vol
stats_tvol.head()
date | mean | std | 10th | 25th | median | 75th | 90th | |
---|---|---|---|---|---|---|---|---|
0 | 2023-08-09 | 3.760884 | 1.909839 | 1.212187 | 2.528467 | 4.116629 | 5.041850 | 5.836456 |
1 | 2023-08-10 | 3.866598 | 1.893633 | 1.306425 | 2.715920 | 4.207285 | 5.113204 | 5.887423 |
2 | 2023-08-15 | 3.871511 | 1.901572 | 1.317395 | 2.725520 | 4.190132 | 5.109093 | 5.942390 |
3 | 2023-08-20 | 3.856813 | 1.834472 | 1.370883 | 2.720292 | 4.183303 | 5.066449 | 5.833962 |
4 | 2023-08-20 | 3.855298 | 1.850005 | 1.288607 | 2.741486 | 4.191136 | 5.096559 | 5.838132 |
4. Visualization
To visualize the progression of statistical values over time, we will create line graphs for each variable (i.e. market capitalization and trading volume).
Readers will observe a notable change in the market capitalization line graph starting from January 2024. This increase can be attributed to a combination of factors, including the introduction of spot Bitcoin ETFs and the approaching Bitcoin halving, which led to a sudden price increase across the entire cryptocurrency market. Furthermore, the standard deviation rises over this period, indicating that the bull market was accompanied by greater volatility.
In addition to the change observed in the market capitalization graph around January 2024, readers will also observe a notable change in the graph for trading volume around November 2023, which is attributable to a modification of data sources. Prior to November 2023, the data consisted of the first 100 pages of listings on CoinGecko, while post-November 2023, the data includes all listings on CoinGecko, resulting in an increase of approximately 2000 cryptocurrencies with relatively low reported trading volume. This shift is responsible for the observed decrease in means and percentiles, as well as the increase in standard deviation.
The code block below defines a function called plot_stats that takes a Pandas DataFrame and a title string as inputs. The function creates a subplot with three panels, each plotting a different statistical measure from the DataFrame: mean, standard deviation, and percentiles (10th, 25th, median, 75th, and 90th). The plots are labeled and titled accordingly, and the x-axis is set to display dates.
# Plot statistics for mean, std, and percentiles
def plot_stats(dataframe: pd.DataFrame, title: str):
    dataframe = dataframe.assign(date=pd.to_datetime(dataframe['date']))  # real dates so MonthLocator works
    fig, axs = plt.subplots(1, 3, figsize=(20, 5))  # set size
    # Mean over time
    axs[0].plot(dataframe['date'], dataframe['mean'], label='Mean')
    axs[0].set_xlabel('Date')
    axs[0].set_ylabel('Value')
    axs[0].set_title(f'Mean of {title} over Time')
    axs[0].xaxis.set_major_locator(mdates.MonthLocator())
    axs[0].grid(axis='y', linestyle='--')
    # SD over time
    axs[1].plot(dataframe['date'], dataframe['std'], label='Standard Deviation')
    axs[1].set_xlabel('Date')
    axs[1].set_ylabel('Value')
    axs[1].set_title(f'Standard Deviation of {title} over Time')
    axs[1].xaxis.set_major_locator(mdates.MonthLocator())
    axs[1].grid(axis='y', linestyle='--')
    # Quantiles over time
    axs[2].plot(dataframe['date'], dataframe['10th'], label='10th Percentile')
    axs[2].plot(dataframe['date'], dataframe['25th'], label='25th Percentile')
    axs[2].plot(dataframe['date'], dataframe['median'], label='Median')
    axs[2].plot(dataframe['date'], dataframe['75th'], label='75th Percentile')
    axs[2].plot(dataframe['date'], dataframe['90th'], label='90th Percentile')
    axs[2].set_xlabel('Date')
    axs[2].set_ylabel('Value')
    axs[2].set_title(f'Quantiles of {title} over Time')
    axs[2].legend(loc='center left', bbox_to_anchor=(1.05, 0.5))
    axs[2].xaxis.set_major_locator(mdates.MonthLocator())
    axs[2].grid(axis='y', linestyle='--')
    plt.show()
plot_stats(stats_mcap, "Market Capitalization")
plot_stats(stats_tvol, "Trading Volume")
5. Proposition
Based on the observed statistical patterns (i.e., mean, standard deviation, and percentiles) in the line graphs for market capitalization and trading volume, we propose an alternative approach to visualizing price movement. Specifically, we will isolate macroeconomic effects from price movement by comparing the market capitalization of a specific coin on a given date relative to the overall market. This will allow us to capture the fluctuations in the percentile of this coin over time.
The code block below defines a function called valuation, which calculates the percentile of a given coin for each day and stores it in a Pandas DataFrame. The function iterates through the data files, reads each snapshot, and computes the percentile of the coin's market cap and 24-hour trading volume relative to the entire market, returning a DataFrame with the date, market cap percentile, and trading volume percentile. We then define a function called plot_result, which takes a Pandas DataFrame and a coin name as input and plots the progression of the coin's market cap and 24-hour trading volume percentiles over time.
# Percentile calculation function
def calculate_percentile(number: float, lst: list) -> float:
    # Percentile rank: share of values at or below `number`, as a percentage
    count = 0
    for i in lst:
        if i <= number:
            count += 1
    percentile = (count / len(lst)) * 100
    return percentile
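A self-contained check of the percentile rank with ten evenly spaced values (the function is restated in compact form so the snippet runs on its own):

```python
def calculate_percentile(number: float, lst: list) -> float:
    # Share of values less than or equal to `number`, as a percentage
    count = sum(1 for i in lst if i <= number)
    return (count / len(lst)) * 100

market = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(calculate_percentile(5.0, market))   # 50.0
print(calculate_percentile(10.0, market))  # 100.0
```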
# Calculate the percentile of a given coin on each day and store it in a Pandas DataFrame
def valuation(coin_of_interest: str):
    output = []
    for filename in os.listdir(data_path):
        date = re.search(r'\d{4}-\d{2}-\d{2}', filename).group()
        df = pd.read_csv(os.path.join(data_path, filename))
        row = df[df['Symbol'].str.lower() == coin_of_interest.lower()]
        coin_mcap = row['MarketCap'].to_list()
        coin_mcap = clean_list(coin_mcap)
        coin_mcap = log_transform(coin_mcap)
        if coin_mcap:
            mcap = df['MarketCap'].tolist()
            mcap = clean_list(mcap)
            mcap = log_transform(mcap)
            percentile_mcap = calculate_percentile(coin_mcap[0], mcap)
        else:
            percentile_mcap = 0  # coin absent from this snapshot
        coin_tvol = row['Volume24h'].to_list()
        coin_tvol = clean_list(coin_tvol)
        coin_tvol = log_transform(coin_tvol)
        if coin_tvol:
            tvol = df['Volume24h'].tolist()
            tvol = clean_list(tvol)
            tvol = log_transform(tvol)
            percentile_tvol = calculate_percentile(coin_tvol[0], tvol)
        else:
            percentile_tvol = 0
        output.append([date, percentile_mcap, percentile_tvol])
    output = pd.DataFrame(output, columns=['Date', 'MarketCap Percentile', 'Volume24h Percentile'])
    return output
# Plot result
def plot_result(df, name):
    df = df.assign(Date=pd.to_datetime(df['Date']))  # real dates so MonthLocator works
    plt.subplots(figsize=(10, 5))
    plt.plot(df['Date'], df['MarketCap Percentile'], label='MarketCap Percentile')
    plt.plot(df['Date'], df['Volume24h Percentile'], label='Volume24h Percentile')
    plt.xlabel('Date')
    plt.ylabel('Percentile')
    plt.title(f'Percentile Progression of {name} vs Date')
    plt.gca().xaxis.set_major_locator(mdates.MonthLocator())
    plt.legend()
    plt.show()
6. Demonstration
To illustrate this approach, we will conduct a short demonstration using Bitcoin and a recently launched meme coin called PEPE, covering July 2023 to February 2024. Readers will observe that Bitcoin's percentile consistently remains at 100, in line with its position as the most capitalized cryptocurrency.
Please note that the demonstration provided is for illustrative purposes only. To gain meaningful insights into potential altcoins in the market and make informed investment decisions, it is essential to use historical data, along with other factors, such as market trends and analysis of the underlying technology.
# Demonstration for BTC
result = valuation("BTC")
plot_result(result, 'BTC')
# Demonstration for PEPE
result = valuation("PEPE")
plot_result(result, 'PEPE')