Approaching acceptable performance

I finally ran the timings of _sre on pypy-c, the C-translated PyPy:

Pure literals: re.search(r'bar', 'bazbarfoo')
100 passes took 1.067957, 0.010680 per pass
Classes and stuff: re.search(r'\d+.\d+\s\w{,2}', 'Price 144,50 USD')
100 passes took 1.278839, 0.012788 per pass
Branching and grouping: re.search(r'<(strong|b|em)>.+?', 'Bla <em>bla</em>')
100 passes took 1.369234, 0.013692 per pass
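
Numbers of this shape can be reproduced with a small harness like the one below. It is only a sketch, not the script behind the figures above: the names CASES, time_search and run_benchmark are made up for illustration, and time.perf_counter assumes a reasonably modern Python 3. The patterns and subject strings are copied from the results.

```python
import re
import time

# (label, pattern, subject) triples copied from the results in this post.
CASES = [
    ("Pure literals", r'bar', 'bazbarfoo'),
    ("Classes and stuff", r'\d+.\d+\s\w{,2}', 'Price 144,50 USD'),
    ("Branching and grouping", r'<(strong|b|em)>.+?', 'Bla <em>bla</em>'),
]


def time_search(pattern, subject, passes=100):
    """Return (total_seconds, seconds_per_pass) for repeated re.search calls."""
    start = time.perf_counter()
    for _ in range(passes):
        re.search(pattern, subject)
    total = time.perf_counter() - start
    return total, total / passes


def run_benchmark(passes=100):
    """Time every case and print it in the same layout as the listings here."""
    for label, pattern, subject in CASES:
        total, per_pass = time_search(pattern, subject, passes)
        print("%s: re.search(%r, %r)" % (label, pattern, subject))
        print("%d passes took %f, %f per pass" % (passes, total, per_pass))


if __name__ == "__main__":
    run_benchmark()
```

Keep in mind that the re module caches compiled patterns, so every pass after the first mostly measures matching rather than compilation.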

I am pleasantly surprised: this is only around 1000 times slower than CPython with _sre.c. With the many general PyPy optimizations scheduled to happen soon, we can expect this number to shrink considerably. Even more interesting is the comparison with faked _sre (PyPy using _sre.c):

PyPy with faked _sre:
Pure literals: re.search(r'bar', 'bazbarfoo')
100 passes took 1.513024, 0.015130 per pass
Classes and stuff: re.search(r'\d+.\d+\s\w{,2}', 'Price 144,50 USD')
100 passes took 1.485282, 0.014853 per pass
Branching and grouping: re.search(r'<(strong|b|em)>.+?', 'Bla <em>bla</em>')
100 passes took 1.494799, 0.014948 per pass

Translated _sre performs pretty much the same as faked _sre, and is even somewhat faster: it takes roughly 8% to 29% less time in these three tests. Granted, I’m comparing apples (PyPy on top of CPython) to oranges (translated PyPy) here, but still, this really amazes me. In any case, this shows that PyPy is definitely on the right track.
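
To put a number on "pretty much the same", the totals from the two listings can be divided against each other. The snippet below is just that arithmetic. The dictionary and function names are illustrative, and the figures are the 100-pass totals quoted in this post.

```python
# Totals for 100 passes, copied from the two result listings in this post.
translated = {
    "Pure literals": 1.067957,
    "Classes and stuff": 1.278839,
    "Branching and grouping": 1.369234,
}
faked = {
    "Pure literals": 1.513024,
    "Classes and stuff": 1.485282,
    "Branching and grouping": 1.494799,
}


def speed_ratios(translated, faked):
    """Return translated/faked time per test; below 1.0 means translated is faster."""
    return {name: translated[name] / faked[name] for name in translated}


for name, ratio in speed_ratios(translated, faked).items():
    print("%-24s translated/faked = %.3f" % (name, ratio))
```

All three ratios come out below 1.0 (about 0.71, 0.86 and 0.92), so translated _sre is ahead in every test, most clearly on pure literals.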
