Just a few commands without any context:
Profiling with cProfile
This helped me find the slowest functions, because when optimizing I need to focus on those (best ratio of work needed vs. benefit). It also helped me find a function which did some unnecessary calculations over and over again:
$ python -m cProfile -o cProfile-first_try.out ./layout-generate.py ...
$ python -m pstats cProfile-first_try.out
Welcome to the profile statistics browser.
cProfile-first_try.out% sort
Valid sort keys (unique prefixes are accepted):
cumulative -- cumulative time
module -- file name
ncalls -- call count
pcalls -- primitive call count
file -- file name
line -- line number
name -- function name
calls -- call count
stdname -- standard name
nfl -- name/file/line
filename -- file name
cumtime -- cumulative time
time -- internal time
tottime -- internal time
cProfile-first_try.out% sort tottime
cProfile-first_try.out% stats 10
Sat Aug 12 23:19:40 2017 cProfile-first_try.out
18508294 function calls (18501563 primitive calls) in 8.369 seconds
Ordered by: internal time
List reduced from 2447 to 10 due to restriction <10>
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    27837    4.230    0.000    5.015    0.000 ./utils_matrix2layout.py:14(get_distance_matrix_2d)
    10002    1.356    0.000    1.513    0.000 ./utils_matrix2layout.py:244(get_measured_error_2d)
  5674796    0.572    0.000    0.572    0.000 /usr/lib64/python2.7/collections.py:90(__iter__)
  5340664    0.219    0.000    0.219    0.000 {math.sqrt}
  5432768    0.189    0.000    0.189    0.000 {abs}
   230401    0.183    0.000    0.183    0.000 /usr/lib64/python2.7/collections.py:71(__setitem__)
        1    0.178    0.178    0.282    0.282 ./utils_matrix2layout.py:543(count_angles_layout)
    10018    0.119    0.000    0.345    0.000 /usr/lib64/python2.7/_abcoll.py:548(update)
        1    0.102    0.102    6.749    6.749 ./utils_matrix2layout.py:393(iterate_evolution)
     1142    0.092    0.000    0.111    0.000 /usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py:1299(svd)
To explain the columns, the Instant User’s Manual says:
- tottime
- for the total time spent in the given function (and excluding time made in calls to sub-functions)
- cumtime
- is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.
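The same report can also be produced from inside a script instead of the interactive pstats browser. This is only a rough sketch written for Python 3: `distance_matrix`, `iterate` and `profile_report` are made-up names for illustration and are not taken from layout-generate.py. Only `cProfile.Profile`, `runcall`, `pstats.Stats`, `sort_stats` and `print_stats` come from the standard library.

```python
import cProfile
import io
import math
import pstats


def distance_matrix(points):
    # Hypothetical stand-in for an expensive function; not the author's code.
    return [[math.sqrt((ax - bx) ** 2 + (ay - by) ** 2) for bx, by in points]
            for ax, ay in points]


def iterate(points, rounds):
    # Recomputes the same matrix every round: the kind of repeated work
    # that shows up as a high ncalls and tottime in the report.
    total = 0.0
    for _ in range(rounds):
        total += sum(map(sum, distance_matrix(points)))
    return total


def profile_report(func, *args, **kwargs):
    """Run func under cProfile; return (result, top-10 report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Same as typing "sort tottime" and "stats 10" in the pstats browser.
    stats.sort_stats("tottime").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    points = [(float(i), float(i % 7)) for i in range(60)]
    _, report = profile_report(iterate, points, 20)
    print(report)
```

In the printed report, `iterate` should have a small tottime but a large cumtime (it mostly waits for its callees), which is the same pattern as `iterate_evolution` in the output above.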
Let's compile to C with Cython
Simply performing this on a module which does most of the work gave me about 20% speedup:
# dnf install python2-Cython
$ cython utils_matrix2layout.py
$ gcc `python2-config --cflags --ldflags` -shared utils_matrix2layout.c -o utils_matrix2layout.so
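After the build, utils_matrix2layout.py and utils_matrix2layout.so sit next to each other, so it seems worth checking which of them the interpreter actually imports rather than assuming. A small sketch, assuming Python 3; `module_origin` is a hypothetical helper name and `json` is used here only so the example runs anywhere:

```python
import importlib
import os


def module_origin(module_name):
    """Import module_name and report which file it was loaded from."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None)  # built-in modules have no file
    if path is None:
        return None, "builtin"
    extension = os.path.splitext(path)[1]
    # .so (Linux/macOS) or .pyd (Windows) means a compiled extension module.
    kind = "compiled" if extension in (".so", ".pyd") else "python"
    return path, kind


if __name__ == "__main__":
    # Replace "json" with the module you compiled, e.g. "utils_matrix2layout".
    print(module_origin("json"))
```

If the reported kind is "python" for the compiled module, the speedup measurement was probably taken against the plain .py file.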
There is much more to do to optimize it, but that would need additional work, so not now :-)
Some helpful links:
- use python2-config to get compile and linking options
- when you want to create *.so instead of an executable, you need to use -shared
- blog with summary and nice FAQ