Advances in Financial Machine Learning – Ch3: Labelling


Let's go on with chapter 3 of Advances in Financial Machine Learning. The last chapter was about preparing the input data for machine learning, i.e. bars in this case. Chapter 3, on the other hand, is about labelling, which is necessary for supervised learning algorithms.

Fixed-Time Horizon Method: This is probably the most common method of labelling financial data. Here, an observation is assigned a value in {-1, 0, 1} based on the return r over a fixed time horizon. For example, the observation is labelled -1 if r < -theta (with theta being a fixed threshold value), 0 if |r| <= theta, and 1 if r > theta. De Prado, however, mentions multiple reasons to avoid this method:

  1. As already seen in chapter 2, time bars do NOT exhibit desirable statistical properties.
  2. It does not make sense to apply the same threshold value theta regardless of other factors, such as realized or implied volatility in the markets.
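
To make the rule concrete, here is a minimal sketch of fixed-time horizon labelling in plain Python. The function name `fixed_horizon_labels` and the toy price series are my own illustrative assumptions and do not come from the book.

```python
from typing import List, Optional


def fixed_horizon_labels(
    prices: List[float], horizon: int, theta: float
) -> List[Optional[int]]:
    """Label each observation by its return over the next `horizon` bars.

    Hypothetical helper for illustration: returns 1, 0 or -1 per observation,
    or None where there is not enough future data to compute the return.
    """
    labels: List[Optional[int]] = []
    for i in range(len(prices)):
        j = i + horizon
        if j >= len(prices):
            labels.append(None)  # horizon runs past the end of the data
            continue
        r = prices[j] / prices[i] - 1.0  # simple return over the horizon
        if r > theta:
            labels.append(1)
        elif r < -theta:
            labels.append(-1)
        else:
            labels.append(0)  # |r| <= theta
    return labels


# Toy data (made up): the same theta = 2% is applied to every observation.
prices = [100.0, 101.0, 103.0, 102.0, 99.0, 100.0]
print(fixed_horizon_labels(prices, horizon=2, theta=0.02))
```

Note that theta is the same for every observation, which is exactly the second criticism above.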

To solve these issues, some fixes are:

  1. Labelling with a varying threshold value. This may be achieved with a rolling exponentially weighted standard deviation of the returns, for example.
  2. As seen in the previous chapter, dollar and volume bars exhibit better statistical properties. Use these bars instead.
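
The first fix could be sketched as follows. This is a simplified recursive estimate of an exponentially weighted standard deviation, not the exact estimator used in the book or in any particular library. The names `ewm_volatility`, `dynamic_labels`, `span` and `multiplier` are assumptions made for this example.

```python
import math
from typing import List


def ewm_volatility(returns: List[float], span: int) -> List[float]:
    """Recursive exponentially weighted standard deviation of returns.

    Simplified sketch: no bias correction, smoothing factor 2 / (span + 1).
    """
    alpha = 2.0 / (span + 1.0)
    mean = returns[0]
    var = 0.0
    vols = [0.0]  # no dispersion can be estimated from a single point
    for x in returns[1:]:
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1.0 - alpha) * (var + diff * incr)
        vols.append(math.sqrt(var))
    return vols


def dynamic_labels(
    forward_returns: List[float], vols: List[float], multiplier: float
) -> List[int]:
    """Label with a threshold that scales with the volatility estimate.

    forward_returns[i] is the return realized after observation i,
    vols[i] is the volatility estimate available at observation i.
    """
    labels = []
    for r, vol in zip(forward_returns, vols):
        theta = multiplier * vol  # the threshold now varies per observation
        if r > theta:
            labels.append(1)
        elif r < -theta:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Toy numbers (made up): the thresholds are 2%, 2% and 4%.
print(dynamic_labels([0.03, 0.01, -0.05], [0.01, 0.01, 0.02], multiplier=2.0))
```

In calm markets a small move is enough to earn a non-zero label, while in volatile markets the same move is labelled 0.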

However, even adjusting for these flaws leaves out a crucial aspect: the path followed by prices. Ignoring the path is not realistic at all. What matters is not only whether a position would have shown a profit at the end of the horizon, but also whether it would have been stopped out before then.

The Triple-Barrier Method: A labelling method proposed by De Prado that takes the path of prices into account is the triple-barrier method. It labels an observation according to the first of three barriers that is touched. Two horizontal barriers represent the profit-taking and the stop-loss level. As introduced above, these two barriers are dynamic functions of volatility. In addition, there is a vertical barrier, which represents a deadline or time-stop. If the upper (profit-taking) barrier is hit first, the observation is labelled 1. If the lower (stop-loss) barrier is hit first, the observation is labelled -1. Finally, if the vertical barrier (time-stop) is hit first, the label is either 0 or the sign of the return r realized up to that point.
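
The following is a minimal sketch of this logic for a long position, assuming the barrier widths are already given as return distances (for example, multiples of a volatility estimate). The function name `triple_barrier_label` and its parameters are hypothetical and not De Prado's own code. For simplicity, a time-stop is always labelled 0 here.

```python
from typing import List


def triple_barrier_label(
    prices: List[float], start: int, pt: float, sl: float, max_holding: int
) -> int:
    """Return 1, -1 or 0 depending on which barrier is touched first.

    pt and sl are positive return distances for the profit-taking and
    stop-loss barriers; max_holding is the number of bars until the
    vertical barrier. If the data ends before the vertical barrier,
    the observation is also labelled 0 (a simplification).
    """
    entry = prices[start]
    end = min(start + max_holding, len(prices) - 1)
    for t in range(start + 1, end + 1):
        r = prices[t] / entry - 1.0  # return of the path so far
        if r >= pt:
            return 1   # upper barrier touched first
        if r <= -sl:
            return -1  # lower barrier touched first
    return 0           # vertical barrier (time-stop) reached


# Toy path (made up): the same prices produce different labels
# depending on where the barriers sit.
path = [100.0, 101.0, 99.0, 104.0, 97.0]
print(triple_barrier_label(path, 0, pt=0.03, sl=0.02, max_holding=4))
print(triple_barrier_label(path, 0, pt=0.03, sl=0.005, max_holding=4))
print(triple_barrier_label(path, 0, pt=0.03, sl=0.02, max_holding=2))
```

With a tighter stop-loss the position is stopped out before the profit target is reached, which is precisely the information a fixed-time horizon label throws away.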

The chart below shows the triple-barrier method in action. The upper horizontal barrier is the profit target, the lower horizontal barrier is the stop-loss level, the left vertical line marks the starting point of the observation and the right vertical barrier is the time-stop. In this case, neither profit-taking nor the stop-loss is triggered. Instead, the time-stop is hit.


Side & Size of a Bet: ML algorithms can help us learn both the side and the size of a bet. For instance, we would like the algorithm to learn the side of a bet when no underlying model is available to choose it. In this case, it is impossible to distinguish between the profit-taking and the stop-loss barrier, because the direction of the position is not known in advance. Consequently, the horizontal barriers are either non-existent or symmetric.

Meta-Labelling: If the side of the trade is known, we still need to find the appropriate size of the trade. After all, trade sizing is a defining factor for the outcome of a strategy. Here, meta-labelling enters the story. With meta-labelling, we already know the side of a trade. Therefore, we can distinguish between profit-taking and stop-loss, and the horizontal barriers do not have to be symmetric. The ML algorithm is trained to decide whether or not to take a specific trade (1 or 0). Then, the predicted probability may be used in the sizing decision.
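
A minimal sketch of how such 0/1 meta-labels could be built from the side chosen by a primary model and the return realized at the first barrier touch. The function names are hypothetical, and the linear mapping from probability to size in `bet_size` is only a placeholder of my own, not the sizing rule from the book.

```python
from typing import List


def meta_labels(sides: List[int], returns: List[float]) -> List[int]:
    """1 if the trade suggested by the primary model made money, else 0.

    sides[i] is +1 (long) or -1 (short); returns[i] is the return of the
    underlying between entry and the first barrier touch.
    """
    return [1 if side * ret > 0 else 0 for side, ret in zip(sides, returns)]


def bet_size(prob: float, max_size: float = 1.0) -> float:
    """Placeholder sizing rule (assumption): no bet at or below p = 0.5,
    then scale linearly up to max_size at p = 1."""
    return max_size * max(0.0, 2.0 * prob - 1.0)


# Toy example (made up): two of the four suggested trades were profitable.
print(meta_labels([1, -1, 1, -1], [0.02, 0.01, -0.03, -0.02]))
print(bet_size(0.75))
```

The secondary model is then trained on these meta-labels, and its predicted probability feeds the sizing function.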

Such binary classification problems represent a trade-off between type-I errors (false positives) and type-II errors (false negatives). Common evaluation metrics are Precision (True Positives / (True Positives + False Positives)), Recall (True Positives / (True Positives + False Negatives)), Accuracy ((True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)), and the F1-score (the harmonic mean of Precision and Recall, which measures the efficiency of the classifier). Meta-labelling is a good approach to obtain a meaningfully high F1-score. As a first step, a primary model with high recall is built, even if its precision is low. Then, meta-labelling is applied to correct for the low precision by filtering out false positives. As De Prado states, “The role of the secondary ML algorithm [Meta-labelling] is to determine whether a positive from the primary (exogenous) model is true or false. [...] Its purpose is to determine whether we should act or pass on the opportunity that has been presented.”
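
The metrics above can be computed directly from the four confusion-matrix counts. This is a small sketch with made-up counts. The function name `classification_metrics` is my own.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2.0 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Made-up counts: many true negatives inflate accuracy, while F1 stays modest.
for name, value in classification_metrics(tp=40, fp=60, tn=880, fn=20).items():
    print(f"{name}: {value:.3f}")
```

The example illustrates why accuracy alone is a poor yardstick when positives are rare, and why the F1-score is the more informative target.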

Further, four main advantages of Meta-labelling are:

  1. It can be layered on top of a white-box primary model, such as one based on fundamental analysis, which leads to the quantamental way of investing.
  2. Because the ML algorithm does not decide the side but only the size of the bet, overfitting is limited.
  3. Because decisions regarding size and side are separated, “sophisticated strategy structures” are possible. This relates to different strategies on the short and long side.
  4. As De Prado states, “achieving high accuracy on small bets and low accuracy on large bets will ruin you.” Therefore, a meta-labelling algorithm that focuses on exactly this sizing aspect is crucial.

Communicating these concepts in words is rather difficult. As for chapter 2, applications and solutions to the exercises will be available later on my GitHub page:



Uranium – The Ultimate Value Story?


Recently I listened to a terrific podcast conversation between Jesse Felder and Diego Parrilla. Mr. Parrilla mentioned how crucial it is to write down one's thoughts on a thesis. This is exactly what I want to do with this blog post. I want to write about a story that I have been following for quite a while. It seems to be one of the few value stories out there. The more I learn about the situation, the more compelling the opportunity seems to me. This post is about the Uranium sector: a small niche commodity that fuels one of the most important energy sources in the world.

I am writing this down purely to reflect on and order my thoughts. I want to see if I can formulate the bullish Uranium thesis. Have I at least partially understood the story, and can I lay out the thesis? We will see.

As a warning, I want to mention that I have not researched the Uranium sector in depth myself. I am no commodity analyst. Admittedly, my understanding of this industry is very limited. What I want to do, however, is summarize the reports and publications that I have studied and outline the bull case for Uranium, even without mining expertise. The most important sources I rely on include Kevin Muir's TheMacroTourist, Goehring & Rozencwajg, Adam Rodman, Rick Rule, Marin Katusa, Mike Alkin, John Quakes and Amir Adnani. These sources are all Yellow Cake (the form of Uranium I'm talking about) bulls. Further, some of the arguments have been made for over 3 years. Thus, my story is bull-biased, and while the outcome seems to be inevitable, I am not certain that it is imminent. However, the resource sector is extremely cyclical by nature and, as the saying goes, “bear markets are the authors of bull markets and bull markets are the authors of bear markets”. So, let's look into the story.

The Uranium story may be the ultimate contrarian play. There seems to be almost no commodity that is as universally hated as Uranium. It is not just environmentalists who hate Yellow Cake. The prolonged bear market has also made investors reluctant to even consider the commodity.

History has seen three big Uranium cycles to date. The first big bull market took place during the 1950s as a result of the nuclear arms race. Then, in the 1970s, when the oil shocks hit, nuclear energy became more prominent and desirable. Finally, the most recent bull market ended with the outbreak of the financial crisis in 2007, after Uranium had reached a spot price of 136 US$/lb.

Since then, the market has experienced a prolonged bear market. Nevertheless, before the Fukushima catastrophe, the spot price was still in the range of 70 US$/lb. However, the Fukushima incident and the resulting shift in public sentiment marked the beginning of a new 8-year-long bear market. After the catastrophe, policy makers around the globe, in countries such as Germany, vowed to take reactors offline. In Japan alone, 54 reactors were shut down. Since then, assets in the sector have been trading at depressed levels. Valuations are amongst the lowest of any industry. Uranium spot prices fell over 80% at some point and many miners and producers lost over 90% in equity value. Uranium companies have since been among the most distressed companies one can find in the resource business. In fact, the sector has been decimated. Global market capitalization declined from 130 billion US$ pre-Fukushima to 10 billion US$ post-Fukushima. In the same time frame, the number of companies operating in this business decreased from 500 to around 40 today.

Source: Cameco

While assets are trading at the lowest valuations in history, the stigma attached to the nuclear industry is at a high. Ever since the catastrophe in Fukushima, the nuclear industry has been perceived as a dying one. Being German, I can testify how quickly sentiment changed in the aftermath of the Fukushima incident.

However, this stigma around the word “Nuclear” is what creates this investment opportunity in the first place. People fail to make a distinction between nuclear enrichment for weapons and nuclear enrichment for energy, which are fundamentally different.

Nuclear energy is still one of the cleanest, cheapest, most efficient, most reliable and, by some admittedly debatable measures, safest energy sources. We find ourselves in times of depressed prices, no interest and even hate directed towards this industry. At the same time, fundamentals are rapidly changing for the better. This is exactly what makes the outlook so compelling.

The big deal is that we find ourselves in a situation in which uranium producers just can't make any money. None! But first, let us have a short look at how global production is distributed. Globally, there are just a handful of meaningful Uranium mining locations. The most important countries are Kazakhstan, with a market share of around 40%, and Canada, with a market share in the range of 22%. Less significant producers include Australia, the United States of America and several African countries. Given this situation, Kazakhstan and Canada basically dominate the Uranium market. We may consider them OPEC on steroids, even though there is no cartel. The most significant Uranium producers in these two countries are Kazatomprom in Kazakhstan and Cameco in Canada. Both companies are among the lowest-cost producers in the world. Interestingly, as we will see later, both recently cut production because prices are uneconomical. As a percentage of global supply, these production cuts were of a size that would make OPEC jealous.

But why is it not economic for virtually any miner in the world to mine Uranium under current conditions? Well, it is about the price of Uranium on the one hand and the cost of production on the other hand. Let us compare the cost of production with spot prices. At the time of writing, the spot uranium price is less than 30 US$/lb. This price already includes a considerable increase from the mid 20s in 2018. The average cost of production, on the other hand, is around 55 US$/lb. Yes, even after the rebound in 2018, spot prices still need to roughly double for companies just to break even. The spot price needs to rally close to 100% just to reach the global marginal cost. This seems like a rather surreal situation. And indeed, it is. One would expect miners not to continue operations when they are making losses. If companies can't even cover their marginal costs, production will be reduced. This is the logical conclusion.

This takes us back to Cameco and Kazatomprom, the duopoly that I just introduced (which we may call OPEC squared). Both companies have recently cut production. Most notably, in 2017 Cameco temporarily shut down its McArthur River mine. This shutdown was later made indefinite. The shutdown is remarkable for two reasons. First, McArthur River is one of the lowest-cost mines in the world. If this mine is not economic, then virtually no mine is. Secondly, with McArthur River's shutdown, 15 million pounds out of around 160 million pounds of global annual production, or just under 10% of global annual Uranium supply, were taken offline. Yes, close to 10%! This is a big deal. Oil markets go crazy when OPEC announces comparably small production cuts. Uranium production, on the other hand, decreases significantly and no one talks about it. The shutdown of McArthur River is remarkable by a further yardstick. Because of contractual obligations, Cameco still needs to provide more Uranium to utilities than it produces. At present prices, it is cheaper for the company to buy this Uranium in the spot market than to mine it itself. You may want to read this sentence again. It is cheaper for Cameco to buy Uranium at spot prices than to mine it itself! This, in turn, brings an additional buyer into the spot market. We will revisit this point a little later. Besides Cameco, Kazatomprom has been cutting production too. With these two giants decreasing mining activity, we may enter Uranium deficits much earlier than anticipated.

We see that Uranium production at current prices is not sustainable. Like any other commodity, Uranium has a tendency to move towards its marginal cost of production. Even when this price is reached, given the current under-investment in the industry, not a lot of new supply will be able to come online quickly.

But why have miners continued to produce at all at these prices? As we will see, the important aspect and possibly the catalyst to change current market conditions is the contracting cycle in the Uranium sector.

In the Uranium sector, there are two types of markets. Firstly, there is the spot market for the commodity. The spot price is what is publicly observable. It is the price shown in the chart above and the reference point for the price of Uranium. However, the spot market is less relevant for the industry than one may think. It is rather a measure of overall market sentiment. The economics of the miners do not depend on the spot market but on the contract/forward market. Miners and Uranium producers generally sell Uranium forward to utilities. The forward price is more abstract, or at least not publicly observable. In the Uranium markets, these forward contracts are agreed upon in long-term cycles of 7-10 years. This explains why miners have been able to survive up to this point. They have still been paid according to forward contracts signed in the past instead of the depressed spot market price.

For utilities, security of Uranium supply is of utmost importance. Hence, utilities generally do not source in the spot market but in the forward market. The fact that miners and utilities interact in the forward contract market, and that contract cycles are long-term in nature, is of special significance now. This is because one may expect many contracts to expire in the near future. Indeed, many contracts already expired in 2017. Therefore, a new contracting cycle is about to start. This new contracting cycle may be the catalyst for Uranium to move. Given that spot prices need to rally roughly 100% just to meet the marginal cost, no miner will sell Uranium forward at these prices.

A new contracting cycle for both existing and new reactors will eventually bring a new wave of demand into the market.

Thus far, utilities have been passive. In 2018, virtually no new contract volume was made available. After a prolonged bear market, investors and energy producers are still skeptical. There is another reason for the delay of the new contracting cycle, however. This reason is called Section 232. Under Section 232, the Department of Commerce in the United States may take action to protect domestic industries from international supply. Currently, tariffs on foreign Uranium are being considered. This is especially interesting when considering the fact that about 95% of US Uranium is imported. For now, however, this situation puts utilities in a holding position. They are waiting for the US ruling and the certainty it will bring as to where Uranium may be sourced.

When considering the demand of utilities, two main aspects have to be taken into account. Firstly, security of supply is of utmost importance for utilities. Currently, the start of the new contracting cycle may have been delayed due to low spot market prices, existing inventories and the awaited US ruling. However, one may expect utilities to start sourcing soon to secure supply. Secondly, and even more importantly, utilities are very inelastic buyers. The cost of Uranium is a very small percentage of the overall cost of operating a nuclear reactor. Therefore, Uranium prices have virtually no effect on the behavior of this giant inelastic buyer. There is almost no Uranium price that this buyer would not pay. On the other hand, the buyer does not step in and buy more Uranium at depressed levels either. This is what makes the low-cost commodity Uranium such a hypercyclical commodity.

Even if the start of the contracting cycle is further away than expected, one additional factor will support spot Uranium prices in the future. As already mentioned above, it is now cheaper for miners to buy the Uranium they are contractually required to supply in the spot market than to mine it themselves. After having closed the world's biggest mine at McArthur River, Cameco is one example of additional demand in the spot market. As price fundamentals deteriorate further, more production will be taken offline. Instead, the spot market will serve as an opportunity to buy material. This demand will put a natural floor under Uranium prices until the contracting cycle starts again. Since the spot market is very thin, this is somewhat of a self-fulfilling process. To use the words of Warren Buffett and Benjamin Graham, we may even speak of a “Margin of Safety” here.

Even though the western media does not convey this impression, a look at the demand side of Uranium reveals a lot of upcoming demand. Globally, there exist around 500 reactors, with 60 under construction and close to 190 new projects.

Traditionally, the center of nuclear power has been the West. The United States still has the largest fleet of reactors, and countries such as France and Japan were strong advocates of nuclear energy for a long time. Post-Fukushima, the tide has turned a little bit, with Japan taking reactors offline and Europe turning its back on the industry. In the West, many former supporters question this energy source nowadays. More importantly, however, demand from new players is coming into the market (while there will remain considerable demand from the old elite too).

Emerging markets in particular are leading the growth in demand. There is a strong need for this kind of energy in China, India and other countries such as South Korea. The developing world needs nuclear power to keep up with its energy demands. Nuclear energy is rather clean and reliable. Given the dire need to reduce carbon emissions, nuclear energy is a necessity if the lights are to stay on. While renewable energy sources will gain a foothold in the future, the fact that nuclear energy is always on (unlike solar or wind) is crucial for EM energy security. For example, an estimated 500 thousand to 1 million people die in China every year of causes related to air pollution. China's commitment to decarbonize will most likely rely on nuclear power. Currently, China has 45 running reactors with 15 under construction. By 2030, however, this number is set to increase to 111. Generally, there is quite a backlog of new reactors that will come online in China, India, the Middle East and other emerging markets. However, the demand side has been underestimated not merely for EM. In France, the phasing out of nuclear energy will be delayed, and Japan is bringing back some of its reactors too. Consequently, demand for the next decades seems to be secure.

Besides demand from utilities, there is another interesting form of demand. In recent years, financial demand has entered the market. This financial demand comes from investors buying and holding uranium. Yellow Cake plc from the UK is an example of such an investment vehicle. Obviously, the uranium purchased by these investors is taken out of the market. This is especially noteworthy if one remembers the extreme thinness of the spot Uranium market.

Finally, there is one US-centric perspective that must be investigated when analyzing the Uranium market. World Uranium supply relies heavily on Russia. Russia and “Russia-friendly” countries account for 65% of global supply. Russia itself accounts for 45% of global enrichment capacity. The United States, on the other hand, imports 95% of its Uranium from overseas and 50% from Russia-linked countries. Currently, the United States merely has a stock of 12 months of Uranium supplies. Considering the growing tensions between the West and Russia, energy security in the United States seems to be at risk given this situation. This is not merely a theory. In the midst of the tensions between Russia and the US, Russia did consider banning Uranium exports to the US at some point. One may interject that there is production capacity in countries such as Canada or Australia. However, producers there are partly tied up in long-term contracts with China and India and, as elaborated on earlier, new production cannot be brought online quickly. To incentivize domestic production in the US, the spot price will need to increase. And as indicated above, the Trump administration is increasingly considering protecting the domestic industry under Section 232.

In summary, the Uranium story is very attractive because we face a commodity trading in the spot market below its marginal cost. To incentivize future production, prices have to increase. Currently, we are in the very depressed stage of the Uranium cycle. However, the hypercyclicality of this commodity is what makes the investment so appealing. Let's remember that the demand for this commodity is very inelastic. Given the small proportion that Uranium has in the cost structure of running a nuclear reactor, one faces a buyer that is basically forced to buy at almost any price. Since Fukushima and the change in sentiment, there has been very little CAPEX globally. This situation contrasts with increasing demand over the next decades, however. Therefore, when the market goes into deficit in 2019/2020 for the first time since the 1980s, deficits will matter a lot in the context of inelastic demand and energy security. Even if we only see prices go up slightly, margins for some producers may improve considerably.

Of course, there are risks to this thesis. Firstly, a further nuclear accident would be a disaster for nuclear energy. Secondly, Chinese demand is a key input into the demand equation. If for some reason Chinese demand were to decrease, the thesis would be weakened too. Thirdly, given existing inventories at utilities, a further delay of the contracting cycle would delay the bull market. However, given the importance of supply security, this seems unlikely. Further, given the thin market, there almost seems to be a natural floor to the price.

Currently, I am not sure what the best investment opportunity is to play this bull story. I guess one may want to invest in the lowest cost producers in good jurisdictions. However, I will do more due diligence on this in the next weeks.

Writing this story, I realize that I really only have started to get a feel for this bull case and the investment opportunities. Obviously, I am bullish and very biased by my sources. However, I take this blog entry as a starting point to learn more about this great opportunity.



Advances in Financial Machine Learning – Ch2: Bars

Hi there,

Recently, I heard or read somewhere that the best way to learn something is twofold.

  1. Explain the topic to someone else.
  2. Write down your thoughts. This forces you to express your understanding in a concise manner and consequently shows if you really do understand the concept.

That's why I have decided to write more about the things I am learning about again.

Currently, I am studying Marcos López De Prado's book Advances in Financial Machine Learning, which is great, yet also challenging for a machine learning novice like me. Over the following months, I plan to publish summaries of the concepts covered in the book.

After an introductory first chapter, the second chapter of the book introduces some interesting approaches I had not heard about before. Firstly, the topic of “bars” is covered.

On this blog, I generally aim to explain the theory proposed in the book. Code implementations and tentative solutions to the exercises of the book will be available on my GitHub page soon:

Bar Types

There are various types of bars in finance. Each type has different advantages and disadvantages and, even more importantly, particular statistical properties. Probably the most famous and most widely used bars in the trading community are time bars, which are sampled at fixed time intervals, e.g. one minute, five minutes, one hour etc. De Prado argues that these bars are not desirable in machine learning for two reasons. Firstly, market activity does not flow at a constant rate throughout the day. Rather, the open and the close are usually much more active and volatile than the hours around noon, for example. Therefore, conventional time bars oversample low-activity, low-information periods and undersample high-activity periods. Secondly, time bars exhibit poor statistical properties. They are often serially correlated and are subject to heteroscedasticity.

Below, you see one-minute time bars throughout the day on the S&P 500 ETF SPX. Each minute, one bar is created.


The next type of bars, tick bars, partly addresses the issues encountered with time bars. Tick bars establish a proxy for the arrival of information. A bar is created every time a predefined number of ticks has been reached. If a lot of information hits the market, more ticks and consequently more bars result. Therefore, these bars circumvent the problem of non-constant market activity because bars are sampled at the rate at which information flows. Further, the statistical properties are better than for time bars. However, order fragmentation and the resulting number of ticks introduce some arbitrariness. Is there more information in five one-share trades than in one five-share trade?
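
As a rough sketch of the idea, tick bars can be built by cutting the tick stream into chunks of a fixed number of ticks. The representation of a tick as a `(price, volume)` tuple and the function name `tick_bars` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Tick = Tuple[float, float]  # (price, volume), an assumed toy representation


def tick_bars(ticks: List[Tick], ticks_per_bar: int) -> List[Dict[str, float]]:
    """Create one OHLCV bar for every `ticks_per_bar` ticks.

    A trailing chunk with fewer ticks is dropped (the bar is not complete).
    """
    bars = []
    for start in range(0, len(ticks) - ticks_per_bar + 1, ticks_per_bar):
        chunk = ticks[start:start + ticks_per_bar]
        prices = [price for price, _ in chunk]
        bars.append({
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            "volume": sum(volume for _, volume in chunk),
        })
    return bars


# Made-up ticks: seven trades yield two complete three-tick bars.
ticks = [(10.0, 5), (10.5, 1), (10.2, 2), (10.1, 1), (9.9, 3), (10.0, 4), (10.3, 1)]
for bar in tick_bars(ticks, ticks_per_bar=3):
    print(bar)
```

Every bar contains the same number of trades, no matter how long it took for them to arrive.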

Volume bars address this issue of fragmentation. Here, bars are sampled whenever a given volume of shares has been traded. Even more robust than volume bars are dollar bars. The gist is that whenever a given dollar amount (or amount in another currency) has been traded, a bar is created. This makes intuitive sense because the dollar value of an asset changes over time. Volume bars fail to account for this change in value. Dollar bars, on the other hand, are robust to this change and to changes in the number of shares outstanding. Therefore, the number of dollar bars over time is much more constant than the respective number of tick or volume bars.
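
Volume and dollar bars only differ in what is accumulated, so one small sketch covers both. As before, ticks are assumed to be `(price, volume)` tuples, and the function name `threshold_bars` and the flag `use_dollars` are hypothetical.

```python
from typing import Dict, List, Tuple

Tick = Tuple[float, float]  # (price, volume), an assumed toy representation


def threshold_bars(
    ticks: List[Tick], threshold: float, use_dollars: bool = False
) -> List[Dict[str, float]]:
    """Sample a bar whenever accumulated volume (or dollar value) reaches
    `threshold`. Ticks after the last completed bar are left out."""
    bars = []
    current: List[Tick] = []
    accumulated = 0.0
    for price, volume in ticks:
        current.append((price, volume))
        # Dollar bars accumulate traded value, volume bars traded units.
        accumulated += price * volume if use_dollars else volume
        if accumulated >= threshold:
            prices = [p for p, _ in current]
            bars.append({
                "open": prices[0],
                "high": max(prices),
                "low": min(prices),
                "close": prices[-1],
                "volume": sum(v for _, v in current),
            })
            current, accumulated = [], 0.0
    return bars


# Made-up ticks, sampled once by volume and once by dollar value.
ticks = [(10.0, 5), (10.5, 1), (10.2, 2), (10.1, 1), (9.9, 3), (10.0, 4), (10.3, 1)]
print(len(threshold_bars(ticks, threshold=6)))
print(len(threshold_bars(ticks, threshold=100, use_dollars=True)))
```

If the price of the asset doubles, a dollar bar fills with half as many shares, while a volume bar still needs the same number of shares.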

Next, I will have a look at so-called information-driven bars. These bars are based on concepts from market microstructure and aim to capture the occurrence of “informed trades”. As a comparison, the concept is similar to day traders watching for an imbalance between red (bearish) and green (bullish) ticks on the time and sales tape. An imbalance may indicate information or other microstructure effects.

De Prado presents various imbalance bars. I will briefly introduce these without elaborating on the mathematical derivation. However, as mentioned above, applications of the topics covered will follow on my GitHub site.

Tick Imbalance Bar (TIB): A tick imbalance bar is created whenever the imbalance of ticks exceeds a given expectation, where imbalance refers to the difference between ticks with a positive and ticks with a negative price change. Accordingly, tick imbalance bars are formed more frequently when informed trading takes place. We may therefore think of them as bars which all contain the same amount of information.
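
The mechanics can be illustrated with a strongly simplified sketch. Each tick is signed with the tick rule (a price increase counts as +1, a decrease as -1, an unchanged price inherits the previous sign), and a bar is closed once the absolute cumulative imbalance reaches a threshold. In the book, this threshold is an expectation that is updated dynamically. Here it is a fixed number, which is my simplification, and the function names are hypothetical.

```python
from typing import List


def tick_rule(prices: List[float]) -> List[int]:
    """Sign every tick: +1 for an uptick, -1 for a downtick.

    An unchanged price inherits the previous sign. The first tick is
    treated as +1 (an assumption, since it has no predecessor).
    """
    signs = []
    prev_sign = 1
    for i, price in enumerate(prices):
        if i > 0:
            change = price - prices[i - 1]
            if change > 0:
                prev_sign = 1
            elif change < 0:
                prev_sign = -1
        signs.append(prev_sign)
    return signs


def tick_imbalance_bar_ends(prices: List[float], expected_imbalance: int) -> List[int]:
    """Indices of the ticks that close a bar.

    Simplification: `expected_imbalance` is a fixed threshold instead of
    a dynamically updated expectation.
    """
    ends = []
    theta = 0  # cumulative signed tick imbalance within the current bar
    for i, sign in enumerate(tick_rule(prices)):
        theta += sign
        if abs(theta) >= expected_imbalance:
            ends.append(i)
            theta = 0
    return ends


# Made-up prices: a short run up closes one bar, a sell-off closes the next.
prices = [10.0, 10.1, 10.2, 10.1, 10.2, 10.1, 10.0, 9.9, 9.8]
print(tick_rule(prices))
print(tick_imbalance_bar_ends(prices, expected_imbalance=3))
```

One-sided order flow closes a bar after only a few ticks, whereas balanced flow can run for a long time without producing one.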

Volume/Dollar Imbalance Bar (VIB/DIB): These two types of bars share the same idea. Like tick imbalance bars, volume and dollar imbalance bars are related to order flow imbalances between bullish and bearish ticks. A bar is sampled whenever the volume or dollar imbalance exceeds a given expected threshold. Unlike standard volume and dollar bars, VIBs and DIBs do not depend on a constant bar size but adjust dynamically. Therefore, they are robust to corporate actions.

Runs bars monitor the sequence of trades within the overall order flow. Here, a bar is sampled whenever the accumulation of positive or of negative ticks is especially large. The imbalance itself is not of importance here. Rather, these bars look at whichever is larger: the accumulated positive ticks or the accumulated negative ticks. As before, runs bars may be applied broadly. There are tick runs bars, volume runs bars and dollar runs bars to address the various concerns.

That's it so far for this entry.

Now, I want to try to implement these bars with code.