What can we learn from Simpson's Paradox?

Simpson’s Paradox is one of the best-known paradoxes in statistics. A quick Google search turns up plenty of blog posts (many from the data science community) about this puzzling phenomenon. It is clearly a topic of real-world significance, and there seem to be important lessons that we are supposed to learn from it. But what are those lessons? Is it nothing more than a cautionary tale about how easily data analyses can go wrong?

A mind-boggling analogy between machine learning and quantum physics

A recent paper published in PNAS titled “The Fermi-Dirac distribution provides a calibrated probabilistic output for binary classifiers” caught my attention, because it describes a surprising relationship between machine learning and quantum physics. In fact, surprising is an understatement. Mind-boggling is more like it. According to the analogy developed by the authors, positive samples in binary classification problems are like… fermions?! What?! I decided that I should try to understand the gist of this paper, at least to the extent that I can.

Hey, Emacs Lisp function! Lemme give you a piece of advice!

Did you know that in some programming languages, you can give a function a piece of advice? The basic idea is this: if you are using an application or a library written by somebody else, what can you do if you need to modify the behavior of a particular function? You could modify its source code, if your version improves it for everybody. However, if you only want to customize it for your personal needs, a more lightweight solution might be desirable.
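
To make the basic idea concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is only an analogy for "around"-style advice, not how Emacs implements it. The names `add_around_advice`, `library`, `greet`, and `shout` are all hypothetical and made up for illustration.

```python
import functools
from types import SimpleNamespace


def add_around_advice(namespace, name, advice):
    """Wrap namespace.<name> so that every call goes through `advice`.

    `advice` receives the original function as its first argument and
    decides whether, and how, to call it. The original source is untouched.
    """
    original = getattr(namespace, name)

    @functools.wraps(original)
    def advised(*args, **kwargs):
        return advice(original, *args, **kwargs)

    setattr(namespace, name, advised)
    return original  # returned so the caller can restore it later


# Hypothetical stand-in for a library written by somebody else.
library = SimpleNamespace(greet=lambda name: f"Hello, {name}.")


def shout(original, name):
    # Personal customization: reuse the original result, but upper-case it.
    return original(name).upper()


unadvised_greet = add_around_advice(library, "greet", shout)
greeting = library.greet("world")

# Removing the advice is just putting the original function back.
library.greet = unadvised_greet
restored = library.greet("world")
```

The point of the design is that the customization lives entirely outside the library: the advice can be added or removed at any time, and the library's own code never changes.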

Beautiful ideas in programming: generators and continuations

In this post, I’ll summarize what I’ve learned from an attempt to gain a deeper understanding of two important concepts in programming: Python’s generators and Scheme’s continuations. The aim is not to teach Python or Scheme programming. Rather, I want to demonstrate that generators are special cases of a much more powerful construct: continuations. Continuations allow programmers to invent new control structures, and they are the foundation upon which iterators, generators, coroutines, and many other useful constructs can be built.
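
As a rough illustration of that relationship, the sketch below writes the same counter twice in Python: once as a generator, and once with "the rest of the computation" handed back explicitly as a closure. Python has no first-class continuations like Scheme's `call/cc`, so the closure is only a stand-in for one. The helper names `count_up`, `count_up_cps`, and `drain` are hypothetical.

```python
def count_up(n):
    """Ordinary generator: Python saves and resumes the frame for us."""
    for i in range(n):
        yield i


def count_up_cps(n, i=0):
    """Same sequence, but the suspended computation is an explicit value.

    Returns None when finished, otherwise a pair (value, resume), where
    calling resume() continues from where this step left off.
    """
    if i >= n:
        return None
    return i, lambda: count_up_cps(n, i + 1)


def drain(step):
    """Run a chain of (value, resume) steps to completion, like list()."""
    values = []
    while step is not None:
        value, resume = step
        values.append(value)
        step = resume()
    return values


from_generator = list(count_up(3))
from_closures = drain(count_up_cps(3))
```

Both versions produce the same values. The difference is that in the second one the "rest of the loop" is an ordinary object the caller holds, which is the flavor of what a continuation gives you in general.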

Simple exercises with grep, sed and awk in org-mode

For text processing, I had never bothered to learn classic Unix tools such as sed and awk, because I could always use Python's regular expression library. The syntax of sed and awk just seemed too arcane to me. However, I recently realized that for many simple ad-hoc tasks, even writing a Python script is too much overhead. This motivated me to learn to use regular expressions directly on the command line.