Future of my TSCV package -- a letter to my users
Nearly two years ago, I developed a time-series cross-validation package, namely
tscv, which has since been widely adopted by scientists and quantitative traders worldwide.
Seeing ~1000 monthly downloads, I am delighted that I made some positive contributions to this world.
Meanwhile, in the last two years, a lot has happened to our world as well as to me.
Although I never for a second forgot my responsibility towards my users, I was, unfortunately, unable to maintain this package.
As a consequence, as you may have noticed, this package has been incompatible with
scikit-learn version 0.24 for the past two months.
To address this issue, I have decided to restore the compatibility and enhance
tscv, and this post serves as a witness to that resolution.
Compatibility with Scikit-Learn v0.24 and onwards
In this section, I will first explain what happened in v0.24, then present my solution for restoring the compatibility and the reasoning behind it.
The incompatibility results from the scikit-learn team’s decision to make the
safe_indexing function private.
safe_indexing was renamed as
This modification implies that the scikit-learn team can change the API of
_safe_indexing in every minor version upgrade (e.g.,
The net consequence is that, for third-party developers like me, we cannot rely on this function if we want to keep the compatibility within each major version family, or equivalently, across all minor versions (e.g.,
I have communicated with scikit-learn’s core developers and learned the motive behind this decision.
They told me that making
safe_indexing private gives them more freedom to expand the functionality of that function (e.g., indexing the new
I believe that this direction is a positive thing for the scikit-learn users.
Since scikit-learn is moving toward its first major version,
v1.0, with it being the renaming of the next version
v0.25, a private
_safe_indexing warrants the functionality expansion within the
That is, the users do not have to wait for
v2.0 for any compatibility-breaking enhancement.
To cope with the incompatibility, I have two choices, with the first one being keeping a forked copy of
By calling the
safe_indexing internally, I no longer need to worry about the compatibility.
The downside is that I cannot benefit from the evolution of
scikit-learn and thus limit the power of my
The second choice is to call the new
_safe_indexing instead so as to benefit from potential new features in
The downside is that I have to tune my package accordingly for every compatibility-breaking change the scikit-learn team makes.
I have decided to take the second approach.
It may indeed cause some trouble for us third-party developers, but this trouble is negligible compared to the benefit mentioned above.
The effort to make my users enjoy the newest and most powerful features in
scikit-learn is worthwhile.
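A minimal sketch of what the second approach can look like in practice, written under my own assumptions rather than copied from the tscv source: prefer the private _safe_indexing, fall back to the public safe_indexing on scikit-learn releases that predate the rename, and, purely so that the snippet runs standalone, fall back to plain list indexing when scikit-learn is not installed at all. The alias safe_index is a hypothetical name chosen for illustration.

```python
# Compatibility shim (illustrative sketch, not the actual tscv code).
try:
    # Newer scikit-learn: the helper is private and may change between releases.
    from sklearn.utils import _safe_indexing as safe_index
except ImportError:
    try:
        # Older scikit-learn: the helper is still exposed under its public name.
        from sklearn.utils import safe_indexing as safe_index
    except ImportError:
        # Assumption for this sketch only: no scikit-learn available,
        # so emulate row selection on a plain Python sequence.
        def safe_index(X, indices):
            return [X[i] for i in indices]

# Select rows 0 and 2 from a small sequence.
rows = safe_index([10, 20, 30, 40], [0, 2])
print(list(rows))
```

Keeping the import logic in one place means that a future compatibility-breaking change in scikit-learn only needs to be handled in this shim, not throughout the package.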
I intend to make
tscv compatible with every scikit-learn version onwards (>=
v0.22), and this will happen within the
As for older scikit-learn versions (<=
v0.0.5 version of
tscv (currently undergoing the stabilization process) will stay relevant.
I have released the first release candidate of
v0.0.5, and the binary can be downloaded here.
The final version is expected to come out by the end of the month.
If you notice any bug, please open a ticket in my GitHub repo.
Overlapped test sets
v0.0.5 also enables the feature known as overlapped test sets in the
From now on, you can use designs like the following:
    |=======o****    |
    |  =======o****  |
    |    =======o****|

    = : train    o : gap    * : test
The level of overlap is controlled by the newly added
For instance, the above example has a
rollback_size of 2.
rollback_size is defaulted to 0 and must be less than the
rollback_size permits more cross-validation folds.
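To make the effect of rollback_size concrete, here is a small standalone sketch that reproduces the diagram above with 16 samples. The function overlapped_splits and its parameter names are hypothetical and use fixed-size windows to mirror the picture; this is not how GapWalkForward is implemented. It also assumes that rollback_size must be smaller than the test set size, so that each fold still advances.

```python
def overlapped_splits(n_samples, train_size, gap_size, test_size, rollback_size=0):
    """Yield (train, test) index lists for windows that slide over the data.

    Hypothetical helper for illustration only; not part of tscv.
    """
    # Assumption: the rollback must leave a positive step between folds.
    if not 0 <= rollback_size < test_size:
        raise ValueError("rollback_size must satisfy 0 <= rollback_size < test_size")
    # Each fold starts (test_size - rollback_size) samples after the previous one,
    # so consecutive test sets share rollback_size samples.
    step = test_size - rollback_size
    window = train_size + gap_size + test_size
    for start in range(0, n_samples - window + 1, step):
        train = list(range(start, start + train_size))
        test_start = start + train_size + gap_size
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Reproduce the diagram: 7 training points, a gap of 1, 4 test points, rollback of 2.
for train, test in overlapped_splits(16, train_size=7, gap_size=1,
                                     test_size=4, rollback_size=2):
    print(f"train {train[0]}..{train[-1]}  test {test[0]}..{test[-1]}")
# train 0..6  test 8..11
# train 2..8  test 10..13
# train 4..10  test 12..15
```

With rollback_size=0 the same 16 samples would yield only two folds, since the window would advance by the full test size each time.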
The GapWalkForward class has 5 folds by default.
If a user wants to maximize the sample’s utility, say, with the first test set starting from the first data points, they will have to precompute the proper value of
This puts a burden on users, especially when the
rollback_size parameter is in use.
This inconvenience results from the legacy implementation of
GapWalkForward, which is a subclass of the
_BaseKFold virtual class.
I reckon that it is not the optimal implementation and am therefore planning to re-implement it. This is not refactoring, since refactoring should not change the API. Instead, I will overhaul the entire class, which will break backward compatibility.
The overhaul will happen in
v0.1.0, which hopefully will be released by the end of April.
By then, my users will have the most flexible time-series cross-validation tool possible.
This feature will not be backported to the
v0.0.X line, and the old behavior will be deprecated in
It usually will not cause any trouble.
It will become an issue only when a user upgrades to
v0.1.0 but still wants to stick to the old behavior of
In this case, they can switch to the native
TimeSeriesSplit class of
scikit-learn, which is equivalent to
If they are not happy with
TimeSeriesSplit and want the
v0.0.5 behavior implemented, they can open a ticket in the scikit-learn repository and @me.
(Edit on 9 May: The
v0.1.X line will still keep
GapWalkForward available for backward compatibility; it is deprecated but not removed. The new functionality is incorporated in the
In contrast to a particular government that hides everything from its citizens and the rest of the world, I believe that transparency is the key to making our world a better place.
For this purpose, I wrote this letter to communicate the future of
tscv to my users.
I hope that transparency can make my work more reliable and make me more dependable.
I strive to make the best software for my users, and in return, I hope my users can support me.
Your support is vital to the release of
v0.1.0, which will also incorporate the continuous integration toolchain and documentation to make it more production-ready (see the
You can support me via the following methods:
- Be a sponsor.
- I have a short paper related to time-series cross-validation that does not directly target this software. If it does not violate your academic integrity, please consider citing it (see README.md).
- The v0.0.5 version will come out by the end of March. It will solve the compatibility issue and incorporate some enhancements. A pre-release is now available here.
- The v0.1.0 version will come out by the end of April. It will overhaul the GapWalkForward class to make it more flexible. The backward compatibility will be dropped. (Edit on 9 May: it provides GapRollForward, a more flexible and powerful cross-validator.)
- Please consider supporting my work.