Doing Bad Data Science

I finished reading the free e-book Statistics Done Wrong, which lays out the perils of bad statistical analysis.  Ironically, the author goes on to offer statistical evidence of rampant bad analysis in academia, drawn from fact-checks of published academic papers.  Maybe the fact that this struck me as ironic is a sign that the author is a pretty effective teacher of data skepticism?

After months of procrastinating, I’ve finally gotten around to learning the Python libraries numpy and pandas within the IPython environment, all wrapped in the Enthought Canopy IDE’s virtual environment. All of this is eerily similar to using R inside RStudio, minus the virtualization. I personally prefer doing all sorts of general computering in Python, and I love having IPython’s %timeit magic command in Enthought Canopy, which lets me time single lines of code without leaving the IPython environment. I’d say that knowing Python is merely a psychological comfort, though: while numpy rhymes with Python’s batteries-included data structures, pandas is more of a stranger.
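
For anyone without IPython handy, the standard library's timeit module does roughly what %timeit does. Here is a minimal sketch; the two one-liners being timed are made-up stand-ins for whatever line you actually care about, and the run count of 1000 is an arbitrary choice.

```python
import timeit

# Two hypothetical one-liners to compare; swap in any expression.
# timeit.timeit runs the statement `number` times and returns total seconds.
list_time = timeit.timeit("sum([x * x for x in range(1000)])", number=1000)
gen_time = timeit.timeit("sum(x * x for x in range(1000))", number=1000)

print(f"list comprehension:   {list_time:.4f} s for 1000 runs")
print(f"generator expression: {gen_time:.4f} s for 1000 runs")
```

Inside IPython the same comparison is just `%timeit sum([x * x for x in range(1000)])`, and IPython picks the number of runs for you.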

I’m currently flirting with the idea of technical analysis, but drawing conclusions from asymmetric time series data just seems like something that would not work in theory. And in practice, the data simply comes in too slowly to compete with the professionals. Despite my own doubts, I’ve been enjoying the Poor Man’s High Speed Trading Platform, Quantopian. It has the neatest little forum, which combines source code, message boards, and the ability to immediately run or modify the code under discussion. I also found out about QuantConnect today, which is the C# alternative to Quantopian’s Python. Currently Quantopian has the livelier forum and has already begun beta testing Interactive Brokers integration, but I’ll be keeping an eye on both!
