Recently, I heard or read somewhere that the best way to learn something is twofold.
- Explain the topic to someone else.
- Write down your thoughts. This forces you to express your understanding concisely and consequently shows whether you really understand the concept.
That's why I have decided to start writing again about the things I am learning.
Currently, I am studying Marcos López de Prado's book Advances in Financial Machine Learning, which is excellent but, for a machine learning novice like me, also challenging. Over the coming months, I plan to publish summaries of the concepts covered in the book.
After an introductory first chapter, the second chapter of the book introduces some interesting approaches I had not heard of before, starting with the topic of “bars”.
On this blog, I generally aim to explain the theory presented in the book. Code implementations and tentative solutions to the book's exercises will be published on my GitHub page soon: https://github.com/Herrsosa/Advances_F_ML
There are various types of bars in finance. Each type has its own advantages and disadvantages and, even more importantly, its own statistical properties. Probably the most famous and most widely used type in the trading community is the time bar, which is sampled at fixed time intervals, e.g. one minute, five minutes or one hour. De Prado argues that these bars are not desirable in machine learning for two reasons. Firstly, markets do not process information at a constant rate over time: the hours around the open and the close, for example, are usually much more active than midday. Therefore, time bars oversample information during low-activity periods and undersample it during high-activity periods. Secondly, time bars exhibit poor statistical properties: their returns are often serially correlated and heteroscedastic.
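To make the idea of sampling by the clock concrete, here is a minimal pure-Python sketch of time bars. It assumes trades arrive as time-sorted `Trade(ts, price, volume)` records with timestamps in seconds; `Trade`, `Bar` and `time_bars` are hypothetical names chosen for illustration, not code from the book.

```python
from collections import namedtuple

# Hypothetical record types: a single trade ("tick") and an OHLCV bar.
Trade = namedtuple("Trade", ["ts", "price", "volume"])  # ts in seconds
Bar = namedtuple("Bar", ["start", "open", "high", "low", "close", "volume"])


def _make_bar(start, prices, volume):
    return Bar(start, prices[0], max(prices), min(prices), prices[-1], volume)


def time_bars(trades, seconds=60):
    """Group time-sorted trades into fixed windows of `seconds` length."""
    bars, prices, volume, window = [], [], 0, None
    for t in trades:
        key = int(t.ts // seconds)  # index of the window this trade falls into
        if key != window and prices:  # a new window starts: close the old bar
            bars.append(_make_bar(window * seconds, prices, volume))
            prices, volume = [], 0
        window = key
        prices.append(t.price)
        volume += t.volume
    if prices:  # close the last, possibly incomplete, window
        bars.append(_make_bar(window * seconds, prices, volume))
    return bars


trades = [Trade(0, 100.0, 10), Trade(30, 100.5, 5),
          Trade(61, 100.2, 20), Trade(200, 99.8, 1)]
for bar in time_bars(trades):
    print(bar)
```

In this toy example the window from 120 to 180 seconds contains no trades and therefore produces no bar; whether to emit empty bars instead is a design choice of the sketch.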
Below, you can see one-minute time bars over a trading day for the S&P 500 ETF SPX; one bar is created every minute.
The next type of bars, tick bars, partly addresses the issues encountered with time bars. Tick bars use the arrival of trades as a proxy for the arrival of information: a bar is created every time a predefined number of ticks (trades) has been reached. If a lot of information hits the market, more ticks and consequently more bars result. These bars therefore address the problem that market activity does not flow at a constant rate, because they are sampled at the rate at which information arrives. Furthermore, their statistical properties are better than those of time bars. However, order fragmentation makes the number of ticks somewhat arbitrary: is there more information in five one-share trades than in one five-share trade?
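A tick bar simply ignores the clock and closes after a fixed number of trades. The sketch below assumes the same kind of hypothetical `Trade` records as before; `ticks_per_bar` is a free parameter, and dropping the unfinished tail is one possible convention.

```python
from collections import namedtuple

# Hypothetical record types: a single trade and an OHLCV bar with a tick count.
Trade = namedtuple("Trade", ["ts", "price", "volume"])
Bar = namedtuple("Bar", ["open", "high", "low", "close", "volume", "n_ticks"])


def _make_bar(chunk):
    prices = [t.price for t in chunk]
    return Bar(prices[0], max(prices), min(prices), prices[-1],
               sum(t.volume for t in chunk), len(chunk))


def tick_bars(trades, ticks_per_bar):
    """Close a bar after every `ticks_per_bar` trades, ignoring clock time."""
    full = len(trades) - len(trades) % ticks_per_bar  # drop the unfinished tail
    return [_make_bar(trades[i:i + ticks_per_bar])
            for i in range(0, full, ticks_per_bar)]


trades = [Trade(i, 100.0 + i, 1) for i in range(5)]
print(tick_bars(trades, ticks_per_bar=2))  # two bars; the fifth trade is left over
```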
Volume bars address this issue of fragmentation: here, a bar is sampled whenever a given number of shares has been traded. Even more robust than volume bars are dollar bars. The gist is that a bar is created whenever a given dollar amount (or an amount in another currency) has been traded. This makes intuitive sense because the price of an asset changes over time: if the price doubles, the same dollar amount buys only half as many shares. Volume bars fail to account for this change in value, whereas dollar bars are robust to it, as well as to changes in the number of shares outstanding (e.g. through splits, new issues or buybacks). Therefore, the number of dollar bars per day is much more stable over time than the respective number of tick or volume bars.
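Volume and dollar bars differ only in what is accumulated per trade, so the sketch below uses one threshold function with a pluggable measure. The names and the toy data are hypothetical. As a simplification, a trade that crosses the threshold is not split, so a bar may overshoot slightly.

```python
from collections import namedtuple

Trade = namedtuple("Trade", ["ts", "price", "volume"])
Bar = namedtuple("Bar", ["open", "high", "low", "close", "volume", "dollar_value"])


def threshold_bars(trades, threshold, measure):
    """Close a bar once the running sum of measure(trade) reaches `threshold`."""
    bars, chunk, running = [], [], 0.0
    for t in trades:
        chunk.append(t)
        running += measure(t)
        if running >= threshold:
            prices = [x.price for x in chunk]
            bars.append(Bar(prices[0], max(prices), min(prices), prices[-1],
                            sum(x.volume for x in chunk),
                            sum(x.price * x.volume for x in chunk)))
            chunk, running = [], 0.0
    return bars


def volume_bars(trades, shares_per_bar):
    return threshold_bars(trades, shares_per_bar, lambda t: t.volume)


def dollar_bars(trades, dollars_per_bar):
    return threshold_bars(trades, dollars_per_bar, lambda t: t.price * t.volume)


# The price doubles halfway through: volume bars now carry twice the dollar value.
trades = [Trade(0, 50.0, 100), Trade(1, 50.0, 100),
          Trade(2, 100.0, 50), Trade(3, 100.0, 50)]
print([b.dollar_value for b in volume_bars(trades, 100)])   # [5000.0, 5000.0, 10000.0]
print([b.dollar_value for b in dollar_bars(trades, 5000)])  # [5000.0, 5000.0, 5000.0, 5000.0]
```

The toy data illustrates the argument above: after the price doubles, each volume bar represents twice as much traded value, while every dollar bar still represents the same amount.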
Next, I will look at so-called information-driven bars. These bars draw on market microstructure theory and aim to capture the occurrence of “informed trades”. For comparison, the concept is similar to day traders watching for an imbalance between red (bearish) and green (bullish) ticks on the time and sales tape. Such an imbalance may signal the arrival of new information or other microstructure effects.
De Prado presents various imbalance bars, which I will briefly introduce without going into their mathematical derivation. However, as mentioned above, I will keep publishing applications of the topics covered on my GitHub page.
Tick Imbalance Bar (TIB): A tick imbalance bar is created whenever the imbalance of ticks exceeds expectations, where imbalance refers to the difference between the number of ticks with a positive and a negative price change. Accordingly, tick imbalance bars are sampled more frequently when informed trading takes place. We may therefore think of them as candles that each contain the same amount of information.
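A rough sketch of the mechanics, as I understand them from the book: each tick gets a sign b_t via the tick rule (the sign of the price change, or the previous sign if the price did not change), the bar's imbalance θ_T is the running sum of these signs, and a bar closes once |θ_T| ≥ E0[T] · |2P[b_t = 1] - 1|, with both expectations estimated as exponentially weighted moving averages. Simplifications and assumptions in this sketch: the sign EWMA is updated tick by tick instead of only from prior bars, the initial `expected_ticks=20` and `alpha=0.1` are arbitrary, and the `min_threshold` floor is my own safeguard (not from the book) against the threshold collapsing to zero when buys and sells are balanced.

```python
def tick_rule(prices):
    """b_t = sign(p_t - p_{t-1}); if the price is unchanged, b_t = b_{t-1}."""
    signs, last = [], 0  # the very first tick has no predecessor: 0 (assumption)
    for i, p in enumerate(prices):
        if i > 0 and p != prices[i - 1]:
            last = 1 if p > prices[i - 1] else -1
        signs.append(last)
    return signs


def tick_imbalance_bars(prices, expected_ticks=20, alpha=0.1, min_threshold=1.0):
    """Return the index of the last tick of each tick imbalance bar."""
    signs = tick_rule(prices)
    exp_T = float(expected_ticks)  # initial guess for E0[T], the expected bar length
    exp_b = 0.0                    # EWMA estimate of 2P[b_t = 1] - 1
    bar_ends, theta, start = [], 0, 0
    for i, b in enumerate(signs):
        theta += b                               # signed tick imbalance of this bar
        exp_b = (1 - alpha) * exp_b + alpha * b  # updated tick by tick (simplification)
        threshold = max(exp_T * abs(exp_b), min_threshold)
        if abs(theta) >= threshold:              # imbalance exceeds its expectation
            bar_ends.append(i)
            exp_T = (1 - alpha) * exp_T + alpha * (i - start + 1)  # EWMA of bar lengths
            theta, start = 0, i + 1
    return bar_ends


rising = [100 + k for k in range(60)]
print(tick_imbalance_bars(rising)[0])   # 17: the first bar spans ticks 0 through 17
print(tick_imbalance_bars([100] * 60))  # []: no price changes, no imbalance
```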
Volume/Dollar Imbalance Bar (VIB/DIB): These two types of bars share the same idea. Like tick imbalance bars, volume and dollar imbalance bars are based on order flow imbalances between bullish and bearish ticks, but each tick is weighted by its volume or dollar value. A bar is sampled whenever the volume or dollar imbalance exceeds its expected value, which also addresses the fragmentation issue of tick bars. Because the threshold adjusts dynamically instead of relying on a constant bar size, VIBs and DIBs are also robust to corporate actions.
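The same sketch extends to volume and dollar imbalance bars by weighting each signed tick with its volume v_t (or its dollar value p_t · v_t), so that θ_T = Σ b_t v_t and the expected imbalance is estimated as an EWMA of b_t v_t. The same caveats as for the TIB sketch apply: the initial values, the tick-by-tick EWMA update and the `min_threshold` floor are my assumptions, and the function name is hypothetical.

```python
def tick_rule(prices):
    """b_t = sign(p_t - p_{t-1}); if the price is unchanged, b_t = b_{t-1}."""
    signs, last = [], 0  # the very first tick has no predecessor: 0 (assumption)
    for i, p in enumerate(prices):
        if i > 0 and p != prices[i - 1]:
            last = 1 if p > prices[i - 1] else -1
        signs.append(last)
    return signs


def volume_dollar_imbalance_bars(prices, volumes, use_dollars=False,
                                 expected_ticks=20, alpha=0.1, min_threshold=1.0):
    """Return the index of the last tick of each volume (or dollar) imbalance bar."""
    signs = tick_rule(prices)
    exp_T = float(expected_ticks)  # initial guess for E0[T]
    exp_bv = 0.0                   # EWMA estimate of E0[b_t * v_t]
    bar_ends, theta, start = [], 0.0, 0
    for i, (b, p, v) in enumerate(zip(signs, prices, volumes)):
        weight = p * v if use_dollars else v  # dollar value or share volume
        theta += b * weight                   # signed volume/dollar imbalance
        exp_bv = (1 - alpha) * exp_bv + alpha * b * weight
        threshold = max(exp_T * abs(exp_bv), min_threshold)
        if abs(theta) >= threshold:
            bar_ends.append(i)
            exp_T = (1 - alpha) * exp_T + alpha * (i - start + 1)
            theta, start = 0.0, i + 1
    return bar_ends


rising = [100 + k for k in range(60)]
print(volume_dollar_imbalance_bars(rising, [10] * 60)[0])  # 17
print(volume_dollar_imbalance_bars(rising, [10] * 60, use_dollars=True))
```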
Runs bars monitor the sequence of buys (positive ticks) and sells (negative ticks) within the overall trading activity. Here, a bar is sampled whenever the number of positive or negative ticks accumulated within the bar is especially large relative to expectations. The imbalance itself is not important here; rather, these bars look at the larger of the two counts, positive or negative. As with imbalance bars, runs bars can be applied broadly: there are tick runs bars, volume runs bars and dollar runs bars.
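A minimal sketch of tick runs bars under the same assumptions as the imbalance sketches: the run length is θ_T = max(number of buy ticks, number of sell ticks) in the current bar, and a bar closes once θ_T ≥ E0[T] · max(P[b_t = 1], 1 - P[b_t = 1]). As I understand the book's definition, the counts allow for sequence breaks, so the ticks do not need to be consecutive. Since max(P, 1 - P) is at least 0.5, no extra threshold floor is needed here. The starting values `expected_ticks=20`, `alpha=0.1` and a neutral P = 0.5 are arbitrary choices.

```python
def tick_rule(prices):
    """b_t = sign(p_t - p_{t-1}); if the price is unchanged, b_t = b_{t-1}."""
    signs, last = [], 0  # the very first tick has no predecessor: 0 (assumption)
    for i, p in enumerate(prices):
        if i > 0 and p != prices[i - 1]:
            last = 1 if p > prices[i - 1] else -1
        signs.append(last)
    return signs


def tick_runs_bars(prices, expected_ticks=20, alpha=0.1):
    """Return the index of the last tick of each tick runs bar."""
    signs = tick_rule(prices)
    exp_T = float(expected_ticks)  # initial guess for E0[T]
    exp_p = 0.5                    # EWMA estimate of P[b_t = 1], starts neutral
    bar_ends, buys, sells, start = [], 0, 0, 0
    for i, b in enumerate(signs):
        buys += int(b == 1)        # counts need not come from consecutive ticks
        sells += int(b == -1)
        exp_p = (1 - alpha) * exp_p + alpha * int(b == 1)
        theta = max(buys, sells)                   # length of the current run
        threshold = exp_T * max(exp_p, 1 - exp_p)  # E0[T] * max(P, 1 - P)
        if theta >= threshold:
            bar_ends.append(i)
            exp_T = (1 - alpha) * exp_T + alpha * (i - start + 1)
            buys, sells, start = 0, 0, i + 1
    return bar_ends


print(tick_runs_bars([100 + k for k in range(60)])[0])  # 19
print(tick_runs_bars([10, 11] * 30))  # alternating buys and sells
```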
That's it for this entry.
Now, I want to try to implement these bars in code.