Liquid Crypto Market Segmentation

Disclaimer: This is not financial advice. Anything stated in this article is for informational purposes only, and should not be relied upon as a basis for investment decisions. Chris Keshian may maintain positions in any of the assets or projects discussed on this website.




This is a seven-part series on FJ Labs’ investment process for liquid crypto:

  1. Building FJ Labs Liquid Crypto

  2. Liquid Venture Capital

  3. Liquid Crypto Market Segmentation

  4. Identifying Significant Variables that Impact Prices

  5. Data Dashboards for Liquid Crypto

  6. Research Process for Liquid Crypto

  7. Active Portfolio Management for Liquid VC



As discussed in my last post, liquid venture capital provides a more flexible investment paradigm for the emerging crypto asset class, and combines venture-style returns with lower risk and a shorter commitment.


Our goal as liquid VC investors is to take as input the total investable landscape of crypto assets and produce as output the best possible diversified portfolio of high-quality, early-stage crypto projects and protocols. Over the next few posts, I will discuss how we approach this process.


Our process consists of three distinct phases:

  • Research

  • Data Due Diligence

  • Investment

In order to accomplish our end goal, we must first start by segmenting the market into categories with similar characteristics.

Currently there are 22,932 unique crypto assets, the vast majority of which are entirely worthless. To reduce this total investable landscape, we first apply a coarse filter that truncates the list based on quantitative metrics such as market cap, FDV, token issuance, trading volume, and token distribution. We then qualitatively backfill this list with compelling early-stage tokens that might have been preemptively removed by our coarse quantitative filter.
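As a rough illustration, a coarse filter of this kind might look like the sketch below. The thresholds and column names are hypothetical placeholders, not our actual cutoffs:

```python
import pandas as pd

# Hypothetical thresholds -- the real cutoffs are tuned over time.
MIN_MARKET_CAP = 50_000_000   # USD
MIN_DAILY_VOLUME = 1_000_000  # USD
MAX_FDV_TO_MCAP = 10          # penalizes heavy unvested supply

def coarse_filter(universe: pd.DataFrame) -> pd.DataFrame:
    """Truncate the full token universe on simple quantitative metrics."""
    return universe[
        (universe["market_cap"] >= MIN_MARKET_CAP)
        & (universe["volume_24h"] >= MIN_DAILY_VOLUME)
        & (universe["fdv"] / universe["market_cap"] <= MAX_FDV_TO_MCAP)
    ]
```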

We then segment the token landscape into distinct verticals, around which we have an internal thesis. For example, if we are interested in exposure to DeFi DEXs, we will aggregate projects such as Uniswap, Curve, Sushiswap, Orca, Pancakeswap, and Balancer, amongst others. This delineation accomplishes a few goals:

  1. It lets us look at a vertical in aggregate, and then select the most compelling projects in each vertical

  2. It provides the competitive landscape against which we can track the growth of specific projects within each vertical

  3. It provides the basis of our regression analysis and data dashboarding (more on this in future posts)
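As a rough sketch of how this segmentation can be represented in code, consider the mapping below; the vertical names and ticker lists are hypothetical shorthand rather than our full taxonomy:

```python
# Hypothetical vertical taxonomy; the real one spans 19 verticals,
# each built around an internal thesis.
VERTICALS = {
    "defi_dex": ["UNI", "CRV", "SUSHI", "ORCA", "CAKE", "BAL"],
    "defi_lending": ["AAVE", "COMP", "MKR"],
    # ... remaining verticals elided
}

# Invert the mapping so each token can be tagged with its vertical,
# enabling intra-vertical comparisons and aggregate views.
TOKEN_TO_VERTICAL = {
    ticker: vertical
    for vertical, tickers in VERTICALS.items()
    for ticker in tickers
}
```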

After attempting to run a regression model for each of the crypto verticals we track, we realized that many had limited histories. Projects are often less than a year old, so running regressions across 19 unique verticals proved too fine-grained to identify meaningful independent variables that impact prices in each one. As such, we widened our focus and defined a higher-order grouping for the categories we invest in, landing on the following nomenclature:

  • Ecosystem Tokens - Layer 0, Layer 1, and Layer 2 assets that accrue value by selling blockspace.

  • App Tokens - Project-specific tokens that accrue value in proportion to cash flows, buy-and-burn mechanisms, usage, or governance.
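A minimal sketch of this roll-up, reusing the hypothetical vertical names from the earlier mapping:

```python
# Verticals that monetize blockspace roll up to "ecosystem"; everything
# else is treated as an app token. Vertical names are hypothetical.
ECOSYSTEM_VERTICALS = {"layer_0", "layer_1", "layer_2"}

def higher_order_group(vertical: str) -> str:
    """Collapse a fine-grained vertical into one of the two groups."""
    return "ecosystem" if vertical in ECOSYSTEM_VERTICALS else "app"
```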

The first (and most important) task in any regression pipeline is producing a clean and expressive dataset. From Token Terminal, we retrieved a dataset of 217 tokens containing price and 35 other fundamental variables (e.g. active developers, earnings, treasury holdings, etc.). We then merged this dataset with a set of 9 macro variables for each day, including the employment rate, GDP growth forecast, S&P 500, and US Treasury holdings. After processing this data, we had a rich set of over 150,000 rows of financial data to mine.
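A minimal sketch of that merge, assuming both sources are exported to CSV with a shared date column (file and column names here are hypothetical):

```python
import pandas as pd

# Hypothetical exports: 217 tokens x (price + 35 fundamentals), and
# 9 macro series sampled daily.
tokens = pd.read_csv("token_terminal_fundamentals.csv", parse_dates=["date"])
macro = pd.read_csv("daily_macro_variables.csv", parse_dates=["date"])

# Broadcast each day's macro snapshot onto every token's rows for that
# day, yielding one wide panel of 150,000+ rows.
panel = tokens.merge(macro, on="date", how="left")
```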


As shown in the data availability plot below for ecosystem tokens, this dataset extends as far back as 2013 with the inclusion of Bitcoin prices. More recent ecosystem tokens such as Sui and Arbitrum are present as well, with much less data. Bitcoin and Ethereum have the largest representation in this dataset, but other ecosystem tokens such as Fantom, Loopring, and Algorand help add diversity and robustness to our models.


For each period in our dataset (in this case, one day), we are interested in using all of the rich data we have curated to predict the return for the next period. As shown below, the daily returns in the dataset vary wildly, with a mean of 0.35% and a standard deviation of 6.8%. For comparison, the S&P 500's day-over-day returns have a mean of 0.0327% and a standard deviation of 0.975%. This demonstrates the extreme volatility present in the emerging crypto asset class.
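Continuing the hypothetical panel from the merge sketch above, the next-period target and its summary statistics could be computed along these lines:

```python
# Sort so per-token returns are computed in chronological order.
panel = panel.sort_values(["token", "date"])

# Daily return per token, shifted back one period so each row's
# features line up with the *next* day's return (the target).
panel["ret"] = panel.groupby("token")["price"].pct_change()
panel["next_ret"] = panel.groupby("token")["ret"].shift(-1)

print(panel["next_ret"].mean())  # ~0.35% in our dataset
print(panel["next_ret"].std())   # ~6.8%
```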


Within the ecosystem tokens, we then performed a correlation analysis of our daily variables. This correlation matrix reveals many interesting patterns. As expected, we find that the number of active developers correlates strongly with the number of code commits and contract deployers.

Additionally, as one would expect, the employment rate correlates positively with the total crypto market cap as well as with other macro variables such as household assets, meaning that individuals with incomes are able to direct some of that income towards assets like crypto.

Some counterintuitive trends are present as well, but upon closer inspection we can begin to understand them. For example, the large streak of negative correlations for earnings is due to Token Terminal's definition of earnings, which counts token issuance as a negative; thus, ecosystem tokens often have negative earnings due to high token issuance.

We also find that the S&P 500 correlates positively with token prices and the total crypto market cap, a reflection of general macro risk-on vs. risk-off sentiment.
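A sketch of the correlation pass over the ecosystem-token slice, assuming a `group` column produced by the earlier roll-up sketch; column names such as `active_developers` stand in for the Token Terminal fields:

```python
# Restrict to ecosystem tokens, then correlate all numeric daily
# variables against one another.
eco = panel[panel["group"] == "ecosystem"]
corr = eco.select_dtypes(include="number").corr()

# Spot-check an expected relationship: developers vs. code commits.
print(corr.loc["active_developers", "code_commits"])
```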

The next post in this series will uncover the structure in this data using regression techniques such as multivariate linear regression, decision trees, and random forests, all of which combine information across the rows of our correlation matrix to produce quantitative predictions of returns. We will then turn our focus to the differences in feature importance between ecosystem tokens and app tokens, which serve as the basis of our data dashboarding and portfolio tracking process.
