In this post, I will give an introduction to sketching, a statistical technique for handling large datasets. First, I will give the intuition behind sketching, which is also the most important and valuable part of this post. Then, I will describe the various sketching algorithms in detail. Finally, I will give a non-exhaustive list of theoretical results concerning the soundness of sketching. Since this post is an introduction, I will build my presentation around Ordinary Least Squares, which is the first topic in every machine learning course and arguably the most popular technique.
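A minimal sketch of the idea, assuming the simplest possible sketch (uniform row subsampling) and a one-feature Ordinary Least Squares problem. The synthetic data, the seed, the sizes, and the helper `fit_line` are hypothetical choices made only for illustration; other sketches, such as random projections, follow the same pattern of solving a much smaller problem.

```python
import random


def fit_line(xs, ys):
    """Closed-form OLS for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
n_rows, sketch_size = 20000, 500
xs = [rng.uniform(0, 10) for _ in range(n_rows)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# The "sketch": keep a uniform random subset of the rows.
rows = rng.sample(range(n_rows), sketch_size)

full_fit = fit_line(xs, ys)
sketched_fit = fit_line([xs[i] for i in rows], [ys[i] for i in rows])

print("full data:", full_fit)
print("sketch   :", sketched_fit)
```

The sketched fit uses 40 times fewer rows, yet its slope and intercept should land close to those of the full fit. How close, and under which conditions, is what the theoretical results are about.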
Zhang Shaogang is a television presenter in China, not esteemed but definitely controversial. His highly aggressive presenting style is loved by many and hated by just as many. He has had his moments of glory but has also suffered major career failures. This post will analyze the reasons behind his ebb and flow and contrast him with Western presenters of equal (if not greater) aggressiveness.
Not all tunnels are born equal; some can be your tomb. Not all vaccines are born equal; some can revive the epidemic. Not all software is born equal; some can cause air crashes. In this post, I will show you how a meticulous programmer can uncover a truth that even Richard Sutton has missed.
As a chronic sufferer of impostor syndrome, I have always known what it is and how it forms. Nevertheless, I have never been ashamed of it; in fact, it pushes me forward and helps me surpass myself again and again. To some extent, I am grateful for it and even proud of it. Although I am a master of impostor syndrome and proactively use it as a weapon, I had never found a proper mathematical model to describe it until I recently came across Simpson’s paradox once again. In this post, I will explain both concepts and make the link between them.
Recently my girlfriend wanted to know whether I love her more than she loves me. For this purpose, she asked me to rate my love on a scale of 1 to 10. To escape her interrogation, I returned the same question to her. In the end, each of us wanted to know how deep the other’s love was, but neither wanted to disclose his or her own secret. As a smart solution expert, I proposed to treat this dilemma as Yao’s Millionaires’ Problem.
A package is a group of reusable modules organized in one folder or a hierarchy of folders. Although modules are already reusable without being bundled in a package, a package structure allows the code to be published and used by other programmers. This blog post addresses some advanced package development issues that do not arise in module development.
Nearly two years ago, I developed a time-series cross-validation package, namely
tscv, which has since been widely adopted by scientists and quantitative traders worldwide.
Seeing about 1,000 monthly downloads, I am delighted to have made a positive contribution to this world.
Meanwhile, in the last two years, a lot has happened to our world as well as to me.
Although I never for a second forgot my responsibility towards my users, I was, unfortunately, unable to maintain this package.
As a consequence, as you may have noticed, this package has been incompatible with
scikit-learn version 0.24 for the past two months.
To respond to this issue, I have decided to restore the compatibility and enhance
tscv, and this post bears witness to my resolution.
The papers accepted at NeurIPS 2020 have been announced. This year there are 1899 accepted papers. I have compiled the metadata of all these papers, from which I can identify the laureates of this year’s conference. To determine the laureates, for both individuals and organizations, I used the following four criteria: author contribution index, first author index, organization influence index, and organization sustainability index.
When I started my PhD, I knew nothing about academia and thus spent a lot of time and effort uncovering its unspoken rules. I wish that someone could have lent me a hand, rather than leaving me wandering in the darkness. This painstaking experience has inspired me to help those younger than me so that they can have smoother sailing in their intellectual journeys. In this post, I will try something similar but more profound.
Recently, a reader asked me whether my time series cross-validation package
TSCV can be used for nested cross-validation.
I mulled it over and found that the answer is yes.
I planned to tell him the good news, but the answer quickly became lengthy.
Therefore, I decided to turn the answer into a standalone post to address this question.
In the following, I will explain the concept of nested cross-validation and its advantages, as well as how to use
TSCV or any similar packages for it.
The same content is also hosted on GitHub.
If you have any questions, you can ask in either place (preferably in both places).
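As a rough illustration of the concept, here is a plain-Python sketch of nested cross-validation on a time series. It does not use TSCV: the splitter `expanding_splits`, the toy moving-average forecaster, and the candidate `windows` are all hypothetical stand-ins. The inner loop picks a hyperparameter using only the outer training data, and the outer loop scores that choice on data the selection never saw. The sketch assumes the series is long enough for every fold to be non-empty.

```python
def expanding_splits(n_samples, n_splits):
    """Yield (train, test) index lists; training data always precedes test data."""
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        yield list(range(i * fold)), list(range(i * fold, (i + 1) * fold))


def forecast_error(series, train, test, window):
    """Mean absolute error when every test point is predicted by the mean
    of the last `window` training values (a toy forecaster)."""
    history = [series[i] for i in train][-window:]
    prediction = sum(history) / len(history)
    return sum(abs(series[i] - prediction) for i in test) / len(test)


def nested_cv(series, windows, outer_splits=3, inner_splits=2):
    """Return one (chosen_window, outer_test_error) pair per outer fold."""
    results = []
    for train, test in expanding_splits(len(series), outer_splits):

        def inner_score(window):
            # Inner loop: split the outer training set again, never touching `test`.
            errors = [
                forecast_error(series,
                               [train[j] for j in inner_train],
                               [train[j] for j in inner_test],
                               window)
                for inner_train, inner_test in expanding_splits(len(train), inner_splits)
            ]
            return sum(errors) / len(errors)

        best_window = min(windows, key=inner_score)
        # Outer loop: evaluate the selected hyperparameter on held-out data.
        results.append((best_window, forecast_error(series, train, test, best_window)))
    return results


series = list(range(24))  # hypothetical data: a pure linear trend
results = nested_cv(series, windows=[1, 3, 6])
print(results)
```

On a pure trend, the shortest window tracks the most recent level best, so the inner loop should select it in every outer fold. Any splitter that yields train and test indices in the same way could replace `expanding_splits`.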
All men are equal, but not all matrices have inverses. For instance, rectangular matrices do not have inverses; square matrices without full rank do not have inverses. The matrix rights activists among mathematicians (i.e. E. H. Moore, 1920; Arne Bjerhammar, 1951; and Roger Penrose, 1955) thus stood up and spoke for these computationally unfavored matrices. Thanks to their continual efforts, every matrix finally got an inverse, dubbed the Moore-Penrose (pseudo) inverse. These previously unfavored matrices have since contributed to academia and revolutionized statistics and machine learning. To mark its 100th anniversary, let me talk, in this post, about the Moore-Penrose inverse and its applications.
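A small worked example, assuming the special case of a matrix with full column rank, where the pseudoinverse reduces to the closed form (AᵀA)⁻¹Aᵀ. The helper names and the 3×2 matrix are hypothetical, and exact fractions are used so the checks hold without rounding. Rank-deficient matrices need a different route (for instance the singular value decomposition), which this sketch does not cover.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def pinv_full_column_rank(a):
    """Pseudoinverse of an n x 2 matrix with full column rank: (A^T A)^-1 A^T."""
    a_t = transpose(a)
    return matmul(inverse_2x2(matmul(a_t, a)), a_t)


# A rectangular matrix: it has no ordinary inverse.
A = [[Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(1)],
     [Fraction(1), Fraction(1)]]

A_pinv = pinv_full_column_rank(A)

# One of the defining Penrose conditions: A A+ A = A.
print(matmul(matmul(A, A_pinv), A) == A)
```

For this matrix the pseudoinverse acts as a left inverse (multiplying it by `A` gives the 2×2 identity), even though `A` itself is not square.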
In August, I got interested in Amazon Web Services (AWS) and spent some time getting an AWS Cloud Practitioner certificate. To put into practice what I had learned during the training, why not develop a web application, I asked myself. Thus, I decided to create a Plotly Dash dashboard and deploy it on AWS. The service that I chose is AWS Elastic Beanstalk. You can find, on the Internet, several guides written by amateurs that teach you how to deploy Dash on AWS. However, there is something lacking in all these guides. Therefore, I, also an amateur, decided to write a guide myself. In the following, I will show you how to achieve this “feat” step by step. To follow this guide, you need to know how to develop a Dash application and what AWS Elastic Beanstalk is.
Earlier this month (July 2019), mathematician Hao Huang posted a proof of the Sensitivity Conjecture, which had troubled mathematicians for 30 years. To people’s surprise, this proof is only two pages long and involves only undergraduate-level math. On the Internet, you can find some reports, written for the general public, about the background story and the interpretation of the Sensitivity Conjecture. Also, several experts, such as Terence Tao, are elaborating on it. Here, writing for students and non-experts, I will summarize the key steps in Hao Huang’s proof, in an attempt to help them quickly grasp the essentials.
This guide documents one coding style for static, class, and abstract methods in Python. Code written in this style runs under both Python 2.X and Python 3.X.
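One possible shape of such a style, given here as a sketch under assumptions rather than as the style the guide prescribes. It relies only on the standard `abc` module and creates the abstract base by calling `ABCMeta` directly, which avoids the metaclass syntax that differs between Python 2 and Python 3. The names `AbstractBase`, `Shape`, and `Square` are hypothetical.

```python
from abc import ABCMeta, abstractmethod

# Calling the metaclass directly is valid syntax in both Python 2 and Python 3.
AbstractBase = ABCMeta("AbstractBase", (object,), {})


class Shape(AbstractBase):
    sides = 0

    @abstractmethod
    def area(self):
        """Subclasses must override this method."""

    @classmethod
    def describe(cls):
        # Receives the class itself, so subclasses report their own data.
        return "{0} with {1} sides".format(cls.__name__, cls.sides)

    @staticmethod
    def is_valid_length(value):
        # Needs neither the instance nor the class.
        return value > 0


class Square(Shape):
    sides = 4

    def __init__(self, edge):
        if not self.is_valid_length(edge):
            raise ValueError("edge must be positive")
        self.edge = edge

    def area(self):
        return self.edge ** 2


print(Square.describe())
print(Square(3).area())
```

Instantiating `Shape` directly raises `TypeError` because `area` is still abstract, while `Square` can be instantiated since it overrides it.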
Abstract. This post introduces the concept of Nash Equilibrium into the Pokémon GO meta-game, with the aim of building a minimalist PvE list. The idea is to build an all-round team for gym battles or for boss raids with few Pokémon. This approach allows players to concentrate their resources on a small team of strong Pokémon instead of a large group of mildly strong Pokémon. To demonstrate the usage of the Nash Equilibrium, I use Timeout as the win condition for gym battles and total damage output for boss raids. The resulting minimalist lists are for reference; I provide, at the end of the post, the dataset and the code necessary for readers to build their own minimalist lists.
Yugioh Duel Links is a digital collectible card game (CCG), which can be played on mobile devices. Like many other CCGs, it has an in-game Ladder system, where players compete with each other to prove themselves the best duelist in the world. However, many players complain about the mechanism of this Ladder and suggest replacing it with the Elo system. In this post, I will show you, thanks to the cutting-edge research of DeepMind, that the Elo system, or any other system that uses averaging, is unavoidably inefficient for Duel Links.
Many talk about data science and machine learning with enthusiasm, but few know about one of the most important building blocks behind them: convex optimization. Indeed, nowadays nearly every data science problem is first transformed into an optimization problem and then solved by standard methods. Convex optimization, albeit basic, is the most important concept in optimization and the starting point of all understanding. If you are an aspiring data scientist, convex optimization is an unavoidable subject that you had better learn sooner rather than later.
In this post, I will discuss one of the two best papers at ICML 2018: Delayed Impact of Fair Machine Learning. Unlike other papers, which construct various innovative definitions of fairness, this paper analyzes the delayed impact of fairness policies. It shows that these policies do not necessarily improve the situation of the disadvantaged population: in some cases, they may hurt it in the long run.
TL;DR: Evolutionary game theory, initially developed for biology, has been successfully applied to other domains such as economics, sociology, and anthropology. This post will use it to explain the various metas of Yugioh Duel Links and predict the nonexistence of a meta that is simultaneously diverse, accessible, and not Rock–paper–scissors.
Image from Gustavo Santos at DeviantArt
Free-to-play (F2P) and Pay-to-play are two different business models. Pay-to-play requires players to make a fixed-amount purchase to play the game. Examples include World of Warcraft, Eve Online and so forth. Free-to-play usually adopts a freemium model and does not require any payment in advance to access the game. Instead, players can make in-game micropayments to enhance their gaming experience, which is also how the company makes money. Examples include League of Legends, Dota 2, Hearthstone, Clash Royale, Pokémon GO and so forth. F2P has been a great commercial success, and more and more online games follow this model.