Doing Bad Data Science

I finished reading the (free) e-book Statistics Done Wrong, which lays out the perils of bad statistical analysis.  Ironically, the author goes on to present statistical evidence of rampant bad analysis in academia, drawn from fact-checking published academic papers.  Maybe this is a sign that the author was pretty effective at teaching data skepticism?

After months of procrastinating, I’ve finally gotten around to learning the Python libraries numpy and pandas within the IPython environment, all of which are wrapped in the Enthought Canopy IDE’s virtual environment. All of this is eerily similar to using R inside RStudio, minus the virtualization. I personally prefer doing all sorts of general computering in Python, and I love the %timeit magic command, which lets me time single lines of code without leaving the IPython environment in Canopy. I’d say that knowing Python is merely a psychological comfort, though: while numpy rhymes with the batteries-included data structures, pandas is more of a stranger.
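Outside of IPython, the standard library's timeit module offers a similar kind of measurement. The following is only a minimal sketch: the two sum_of_squares functions are hypothetical examples made up for illustration, and the timings will differ from machine to machine.

```python
import timeit


def sum_of_squares_loop(n):
    """Sum the squares of 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_builtin(n):
    """Compute the same value with the built-in sum over a generator."""
    return sum(i * i for i in range(n))


# Both versions must agree before their timings are worth comparing.
assert sum_of_squares_loop(1000) == sum_of_squares_builtin(1000)

# timeit.timeit calls the function `number` times and returns total seconds.
loop_seconds = timeit.timeit(lambda: sum_of_squares_loop(1000), number=200)
builtin_seconds = timeit.timeit(lambda: sum_of_squares_builtin(1000), number=200)

print("loop:    {:.6f} s for 200 runs".format(loop_seconds))
print("builtin: {:.6f} s for 200 runs".format(builtin_seconds))
```

In an IPython session, the rough equivalent would be typing %timeit sum_of_squares_loop(1000) at the prompt.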

I’m currently flirting with the idea of technical analysis, but drawing conclusions from asymmetric time series data just seems like something that would not work in theory. And in practice, the data simply comes in too slowly to compete with the professionals. Despite my own doubts, I’ve been enjoying the Poor Man’s High Speed Trading Platform, Quantopian. It has the neatest little forum that incorporates source code, message boards, and the ability to immediately run or modify the discussed code. I also found out about QuantConnect today, which is the C# alternative to Quantopian’s Python. Currently Quantopian has the livelier forum and has already begun beta testing its Interactive Brokers integration, but I’ll be keeping an eye on both!

Python 3.0 Print Oddity

So I tried out the Python docs’ code sample for HTMLParser in the shell, but modified it a bit for Python 3.0 and also for my own purposes.  It seems that if you just re-run the class in IDLE, the behavior will update but the old print statements will persist.  So strange! I wonder why?

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            print("Encountered a P tag:", tag)
            print("Attrib:", attrs)

    def handle_endtag(self, tag):
        if tag == 'p':
            print("Encountered  an end P tag:", tag)

    def handle_data(self, data):
        print("Encountered   some data:", data)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1><p>Parse me!</p></h1></body></html>')

Looks like the correct behaviour, but why does the original print output get displayed?

Encountered   some data: Test
Encountered a start tag: p
Encountered   some data: Parse me!
Encountered  an end tag: p
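
One possible explanation, offered only as an assumption since the original IDLE session can't be replayed: re-running a class statement rebinds the class name to a brand-new class, but any instance created earlier still belongs to the old class and keeps its old methods. The sketch below uses a hypothetical Greeter class (nothing to do with HTMLParser itself) to show that effect in isolation.

```python
class Greeter:
    def greet(self):
        return "old message"


# This instance is created before the class gets redefined.
old_instance = Greeter()


# Re-running the class statement binds the name Greeter to a new class object.
class Greeter:
    def greet(self):
        return "new message"


new_instance = Greeter()

print(old_instance.greet())  # still uses the first definition
print(new_instance.greet())  # uses the second definition
```

If something similar happened here, creating a fresh parser = MyHTMLParser() after re-running the class would be enough to pick up the new print calls.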

This must be some residue of the previous Python version, although the docs suggest I should be able to do everything as before as long as I wrap the print arguments in parentheses.

 

Old: print "The answer is", 2*2
New: print("The answer is", 2*2)

 

I guess in Python 3+, only str.format is safe to use for all occasions, at least for the moment.
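
For reference, here is a small sketch of the Python 3 forms side by side. The variable names are made up for illustration; the only thing taken from the docs quote above is the "The answer is" example.

```python
answer = 2 * 2

# print is a function in Python 3, so its arguments go inside parentheses.
print("The answer is", answer)

# str.format builds the string first; the result can be printed or reused.
message = "The answer is {}".format(answer)
print(message)

# print() also accepts keyword arguments such as sep and end.
print("The", "answer", "is", answer, sep="-", end="\n")
```

The str.format version has the side benefit that the finished string can be stored or passed around before anything is printed.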

Learning Python Online for (mostly) Free and for Fun

I’ve spent the past few months learning Python independently, but I found so many free and friendly online resources that the task of learning could not have been easier.

At first I was discouraged by the spartan IDLE interactive environment, but luckily Microsoft released their Python Tools for Visual Studio, which works with either CPython or IronPython! Having been spoiled by the niceties of the VS IDE, I found this much-needed Python support to be the whole package: it adds debugging, IntelliSense, an interactive environment, and project creation support.

Starting from a clean slate free of any Pythonic knowledge, I began month one going over the chapters on the now-defunct Diveintopython3.org site and doing all practice problems on CodingBat.com, which cover some language basics.  I supplemented these with 25 problems from ProjectEuler.net, which is of course language agnostic.  

Since Diveintopython3.org doesn’t exist anymore, I’d say the next best equivalent is learnpython.org (still incomplete for the advanced topics at this time) and LearnPythonTheHardWay.  

In the next month, I did all the practice problems on Pyschools.com, which go into a bit more depth on Python language features.  I also found that doing all of the problems on CodeEval.com makes for great practice in any language, but I have thus far used them exclusively to practice Python.

I’ve also read the book Data Structures and Algorithms with Python, which takes advantage of Python to create a very practical and beginner-friendly introduction to this subject area.  I found the examples to be very educational, except for the very last chapter on trees.  It seemed as if the author got a bit sloppy towards the end with the sample code and didn’t quite finish the complexity discussion in the last chapter.  Although this book is not free, I highly recommend it to programming beginners looking to get more tools under their belt.
