Simplified Market Simulation of Bitcoin by Fitting a T-distribution

Zhongmang (Marc) Cheng

Abstract

A Markov process is a framework for modeling random processes that are memoryless: given the present state, the future state of the process is independent of its past states. Financial models commonly assume that stock prices follow a Markov process, although this simplification may not capture the complexity of real-world market dynamics. Despite this limitation, we use a simulation approach to project the future price of bitcoin by assuming that its daily percentage changes are independent random draws from a single fixed distribution, specifically a t-distribution.
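In symbols, the modeling assumption can be sketched as follows. The notation is ours and is introduced only for illustration: P_t is the closing price on day t, r_t is the daily percentage change, and ν, μ, and σ are the shape, location, and scale of the t-distribution.

```latex
\Pr(P_{t+1} \mid P_t, P_{t-1}, \dots, P_0) = \Pr(P_{t+1} \mid P_t),
\qquad
P_{t+1} = P_t \left(1 + \frac{r_{t+1}}{100}\right),
\qquad
r_{t+1} \overset{\text{i.i.d.}}{\sim} t_{\nu}(\mu, \sigma)
```

Because each r_{t+1} is drawn independently of everything before day t, the next price depends on the history only through the current price P_t.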

0. Import libraries

In [92]:
%matplotlib inline

import random
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from tqdm import tqdm

import warnings
warnings.filterwarnings("ignore")

1. Data preview

In this analysis, we use the Yahoo Finance public download endpoint to retrieve up-to-date daily market data. We then compute the day-over-day percentage change in bitcoin's closing price, store the values in a list, and plot a histogram to visualize their distribution. The mean daily change is positive (about 0.21%), indicating that on average the price has tended to rise over the sample period. The standard deviation of about 3.68% shows, however, that typical day-to-day moves are far larger than this average drift.
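As a minimal sketch of the percentage-change step, the snippet below applies `pct_change` to a short, made-up price series. The prices and the variable names `close` and `returns_pct` are hypothetical and are not taken from the bitcoin data.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only (not real market data)
close = pd.Series([100.0, 110.0, 99.0, 108.9])

# pct_change() computes (p_t - p_{t-1}) / p_{t-1}; the first entry has no
# previous price and is NaN, so it is dropped before summarizing
returns_pct = (close.pct_change() * 100).dropna()

print("Daily returns (%):", returns_pct.round(2).tolist())
print("Mean daily return (%):", round(returns_pct.mean(), 4))
print("Share of up days:", round((returns_pct > 0).mean(), 4))
```

The mean return and the share of up days are printed separately because they answer different questions: a positive mean does not by itself imply that gains occur on more days than losses.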

In [93]:
# Price chart preview
df = pd.read_csv("https://query1.finance.yahoo.com/v7/finance/download/BTC-USD?period1=1400000000&period2=2000000000&interval=1d&events=history&includeAdjustedClose=true")

plt.figure(figsize=(15, 5))
# Parse the dates so that the yearly tick locator below works on a real date axis
plt.plot(pd.to_datetime(df['Date']), df['Close'])
plt.grid(color='gray', alpha=0.5)
plt.gca().xaxis.set_major_locator(mdates.YearLocator())
plt.title('Bitcoin Price since September 17th 2014')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.show()
In [94]:
# Histogram for observations
observations = (df['Close'].pct_change() * 100).tolist()
observations = [x for x in observations if not np.isnan(x)]

print("Mean:", np.mean(observations))
print("Standard Deviation:", np.std(observations))

plt.hist(observations, bins=60, range=(-30, 30), edgecolor='black')
plt.grid(color='gray', alpha=0.5)
plt.title('Histogram of Observed Daily Returns')
plt.xlabel('Daily Return')
plt.ylabel('Frequency')
plt.show()
Mean: 0.2137750647552026
Standard Deviation: 3.6827182471820006

2. Fit distribution

We will consider various probability distributions commonly used in statistical analysis, including the normal distribution, Cauchy distribution, t-distribution, F-distribution, alpha-distribution, beta-distribution, gamma-distribution, chi-distribution, and chi-squared distribution. For each candidate, we fit its parameters to the observations and compute the Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical cumulative distribution of the observations and the cumulative distribution of the fitted model. The candidate with the smallest statistic is taken as the best fit for the observations in question.

The Kolmogorov-Smirnov statistic always lies between 0 and 1. It equals 0 only when the empirical and hypothesized cumulative distribution functions coincide, and it grows as the two diverge, so a smaller value indicates a closer fit. It is never negative, and on its own it does not say whether the observations are more or less dispersed than the hypothesized distribution.
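To make the definition concrete, here is a minimal sketch that computes the one-sample Kolmogorov-Smirnov statistic by hand and compares it with `scipy.stats.kstest`. The helper name `ks_statistic` and the synthetic normal sample are our own choices for illustration and are not part of the analysis.

```python
import numpy as np
import scipy.stats as stats

def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    fitted = cdf(x)
    # The empirical CDF jumps at every data point, so the gap has to be
    # checked just after the jump (d_plus) and just before it (d_minus)
    d_plus = np.max(np.arange(1, n + 1) / n - fitted)
    d_minus = np.max(fitted - np.arange(0, n) / n)
    return max(d_plus, d_minus)

# Synthetic sample, for illustration only
rng = np.random.default_rng(0)
sample = rng.standard_normal(500)

manual = ks_statistic(sample, stats.norm.cdf)
reference = stats.kstest(sample, stats.norm.cdf).statistic

print("Manual KS statistic:", manual)
print("scipy KS statistic: ", reference)
```

Both values agree, and both are bounded by 0 and 1 because they are differences between two cumulative probabilities.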

Our analysis reveals that the t-distribution provides the best fit for the current observations. The estimated parameters are as follows:

  • Location (μ): 0.18
  • Scale (σ): 1.93
  • Shape, i.e. degrees of freedom (ν): 2.14

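These results indicate that the distribution of bitcoin's daily percentage changes is close to a t-distribution. The t-distribution resembles the normal distribution but has heavier tails, and it approaches the normal distribution as the shape parameter ν grows. An estimated ν of about 2 therefore implies that extreme daily moves occur far more often than a normal distribution would predict.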
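The practical meaning of heavy tails can be sketched by comparing tail probabilities of a standard normal distribution and a standard t-distribution. The value `NU = 2` is an illustrative choice close to the fitted shape, and the comparison is in units of each distribution's own scale parameter rather than at matched variances.

```python
import scipy.stats as stats

NU = 2  # illustrative shape value, chosen to be close to the fitted estimate

tail_probs = {}
for k in (2, 4, 6, 8):
    p_norm = 2 * stats.norm.sf(k)    # P(|Z| > k) for a standard normal
    p_t = 2 * stats.t.sf(k, df=NU)   # P(|T| > k) for a standard t with NU degrees of freedom
    tail_probs[k] = (p_norm, p_t)
    print(f"|x| > {k} scale units: normal = {p_norm:.2e}, t = {p_t:.2e}")
```

The gap between the two columns widens quickly as k increases, which is why a normal fit understates the frequency of very large daily moves.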

In [60]:
# Fit distribution for observations
def fit_dist(rv_list):
    
    distributions = [stats.norm, stats.cauchy, stats.t, stats.f,
                     stats.alpha, stats.beta, stats.gamma, 
                     stats.chi, stats.chi2]

    best_fit = None
    best_params = None
    best_ks_stat = np.inf

    for distribution in distributions:    
        
        # Fit the parameters, then perform the Kolmogorov-Smirnov test
        params = distribution.fit(rv_list)
        ks_stat, _ = stats.kstest(rv_list, distribution.cdf, args=params)
        
        if ks_stat < best_ks_stat:
            best_fit = distribution
            best_params = params
            best_ks_stat = ks_stat

    print("Best fit distribution:", best_fit.name)
    print("Best fit parameters:", best_params)
    print("Kolmogorov-Smirnov statistic:", best_ks_stat)
    
    return best_fit, best_params

dist, params = fit_dist(observations)
Best fit distribution: t
Best fit parameters: (2.1431054512763126, 0.17668432407380463, 1.9270381670653132)
Kolmogorov-Smirnov statistic: 0.02734883204485744
In [61]:
# Histogram for fitted distribution
plt.hist(dist.rvs(*params, size=10000), bins=60, range=(-30, 30), edgecolor='black')
plt.grid(color='gray', alpha=0.5)
plt.title('Histogram of Simulated Daily Returns')
plt.xlabel('Daily Return')
plt.ylabel('Frequency')
plt.show()

3. Demonstration

To simulate market movement, we randomly draw each day's percentage change from the fitted t-distribution. Because this distribution has very heavy tails, an unrestricted draw can occasionally fall below -100%, which would make the price negative. To prevent this and to limit unrealistic single-day jumps, we re-roll any draw that falls outside a fixed range (+/- 30%) until it lies within that range, and only then apply it to the current price.

In this demonstration, we plot the last four years of historical prices, followed by a single simulated path for the next two years that starts from the most recent closing price.
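The re-rolling rule can also be written in vectorized form, which may be useful when many draws are needed at once. This is only a sketch under stated assumptions: the helper `draw_truncated`, the parameter tuple, and the starting price of 100 are hypothetical and are not the values used in this notebook.

```python
import numpy as np
import scipy.stats as stats

def draw_truncated(dist, params, size, bound=30.0, rng=None):
    """Draw `size` values from `dist`, re-drawing anything outside [-bound, bound]."""
    rng = np.random.default_rng() if rng is None else rng
    draws = dist.rvs(*params, size=size, random_state=rng)
    outside = np.abs(draws) > bound
    # Keep re-drawing only the offending entries until none are left
    while outside.any():
        draws[outside] = dist.rvs(*params, size=int(outside.sum()), random_state=rng)
        outside = np.abs(draws) > bound
    return draws

# Hypothetical (shape, location, scale) values, for illustration only
example_params = (2.0, 0.2, 2.0)
changes = draw_truncated(stats.t, example_params, size=365, rng=np.random.default_rng(1))

# Compound the daily percentage changes into a price path
path = 100.0 * np.cumprod(1 + changes / 100)
print("Largest absolute daily change (%):", np.abs(changes).max())
print("Final simulated price:", path[-1])
```

Since every accepted change is at least -30%, each daily factor is at least 0.7 and the simulated price stays strictly positive.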

In [62]:
# Function for simulation
def simulate_market(dist, params, starting_price, depth):
    
    price = starting_price
    simulations = [starting_price]
    for i in range(depth):
        change = -100
        while change < -30 or change > 30:
            change = dist.rvs(*params)
        price = price * (1 + change / 100)
        simulations.append(price)

    return simulations
In [88]:
# Run the simulation and plot the result
simulation = df['Close'].to_list()[-365*4:-1] + simulate_market(dist, params, df['Close'].to_list()[-1], 365*2)
plt.figure(figsize=(15, 5))
plt.plot(simulation)
plt.grid(color='gray', alpha=0.5)
plt.title('Simulated Bitcoin Price from 2020 to 2026')
plt.xlabel('Days Past')
plt.ylabel('Price (USD)')
plt.show()

4. Simulation

To further analyze the results of the financial market simulation using a Markov process, we will run the simulation 100 times and plot all the resulting prices in a single graph. This will allow us to better compare the different potential outcomes.

In this simulation, we will start from the current market price as a basis to project potential trends over the next 365 days.
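Beyond plotting the paths, the spread of outcomes can be summarized numerically, for example with percentiles of the final simulated price. The sketch below is self-contained and uses hypothetical parameters and a starting price of 100. For brevity it clips extreme draws to +/- 30% instead of re-rolling them, which is a simplification and not the same procedure.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)
n_paths, n_days, start_price = 100, 365, 100.0

# Hypothetical (shape, location, scale) values, for illustration only
changes = stats.t.rvs(2.0, 0.2, 2.0, size=(n_paths, n_days), random_state=rng)

# Simplification: clip to +/- 30% instead of re-rolling out-of-range draws
changes = np.clip(changes, -30, 30)

# One row per simulated path, one column per day
paths = start_price * np.cumprod(1 + changes / 100, axis=1)
final = paths[:, -1]

p5, p50, p95 = np.percentile(final, [5, 50, 95])
print(f"Final price percentiles: 5% = {p5:.2f}, 50% = {p50:.2f}, 95% = {p95:.2f}")
```

Reporting a range of percentiles rather than a single average makes the wide dispersion across simulated paths explicit.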

In [91]:
# Simulate 100 times
simulations = []
plt.figure(figsize=(15, 5))
for i in tqdm(range(100)):
    simulations.append(simulate_market(dist, params, df['Close'].to_list()[-1], 365))

for series in simulations:
    plt.plot(series, color='grey', alpha=0.1)

plt.title('Multiple Series Plot of Bitcoin Price Simulation')
plt.xlabel('Days Past')
plt.ylabel('Price (USD)')
plt.ylim(0, 600000)
plt.show()
100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 150.48it/s]