Python Speed-Optimizations
Benchmarking
Howto
sudo python3 -m cProfile -o profile.pstats /opt/shepherd/software/python-package/shepherd_sheep/cli.py -v run --config /etc/shepherd/example_config_emulation.yml
python -m pip install snakeviz
snakeviz.exe .\profile.pstats
sudo python3 -m cProfile -o profile.pstats /opt/shepherd/software/python-package/shepherd_sheep/cli.py -v inventorize
# ⇾ cleaner call
sudo python3 -m cProfile -o profile.pstats /usr/bin/shepherd-sheep -v inventorize
PyCode-Performance
Name |
CPU-Load |
Description |
---|---|---|
h5-NoCompression |
77 % |
47 mb / 30s |
h5-lzf |
89 % |
24 mb/ 30s |
loggerOpt1 - ifVerb |
84 % |
same |
loggerOpt2 - traceOff |
83 % |
|
profiler1 |
83 % |
|
with monitors |
91 % |
tevent.wait() instead of time.sleep() |
same w/o profiling |
81 % |
looking at individual commands - also imports
sudo python3 -m timeit -n 1 -r 1 "import shepherd_core"
# 12.8 s on sheep, pydantic 2.4.0
# 11.5 s on sheep, pydantic 2.4.1
# 11.9 s on sheep, pydantic 2.4.2
# 12.4 s on sheep, pydantic 2.5.0 - new pydantic import-improvements?
# 12.1 s on sheep, pydantic 2.5.2
# 12.1 s on sheep, pydantic 2.6.1
# 12.8 s on sheep, pydantic 2.7.0 / 2.18.1 - with lib 24.4.2 (16.6s without cache, 17.7/13.8 deferred)
# sudo rm /root/.cache/shepherd_datalib/fixtures.pickle
#
sudo python3 -X importtime -c "import shepherd_core"
# to shell
sudo python3 -X importtime -c 'from shepherd_core.data_models.task import EmulationTask' 2> profile_pydantic.csv
# to file, now only replace some symbols by ";" and open with excel to sort
In addition, there are benchmarks in the main-repo: https://github.com/orgua/shepherd/tree/main/software/python-package/tests_manual
Optimization - Low-hanging Fruits
These things can be done to reduce cpu load:
ditch lzf-compression for writing files
compression-tradeoff: double the file-size for ~ 12 less percent-points cpu-load (absolut value)
disable logging-sys-monitors (cpu/ram-usage, sync … get written to h5-file) ⇾ per argument
reduce python-logging ⇾ verbose level argument
wait for python 3.11 and later ⇾ four performance-stages for cpython are planned (https://github.com/markshannon/faster-cpython/blob/master/plan.md)
omit writing of timestamp
overhaul buffer-exchange (segmented ring-buffer should go)
according to forum the CPU can be overclocked by 60%, RAM by 50%
sudo cpufreq-set -g powersave ⇾ 230mA @ 300 MHz
sudo cpufreq-set -g performance ⇾ 330mA @ 1 GHz
cpufreq-info
More Complex, but still easiest optimizations beyond that
datastream from memory-carveout to hdf5-file should be ported to cython (seems to be possible)
replace h5py by a more performant lib?
“click” seems to be slowing down start of programs substantially (is it parsing the whole codebase?)
Logging Performance-Impact
Logging-module of python has serious performance impact
example: 4*10 msg/s in debug-mode are >20 % overhead on BB
follow https://docs.python.org/3/howto/logging.html#optimization
avoid assembling these 4 most critical fast-Strings
init.py/emulator.return_buffer(), external verbose
datalog.py/LogReader.read_buffers(), generator with internal verbose ⇾ good enough
shepherd_io.py/ShepherdIO.get_buffer(), external verbose
SharedMem.read_buffer(), external verbose & GPIO-Msg disabled
try to avoid collection of useless data (thread,process,_srcfile)
Implemented Improvements:
warn in config-yamls about impact of verbose>2
avoid overhead (assembling strings, jump in logging-routines) by hiding verbose code in if-branch
reduce or consolidate debug-code
General python-Cleanups
range(len(x))
⇾enumerate(x)
initializes
list([])
⇾[]
,dict()
⇾{}
allow resizing the fifo-buffer, largest value seems to be 107 (< 10k pages)
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
not needed
str()
casting for paths before open(), and a lot of other castings removedasyncio.sleep()
orthreading.Event().wait()
in code? ⇾ use sleep(), .wait() has small overheadcompile h5py for beagle ⇾ fails, see below
cython, numba, nuitka, pypy: https://doc.pypy.org/en/latest/faq.html
Updating Py-Libs (without compiling)
Watch out for H5Py improvements ⇾ main load according to profiler
Profiling-results for h5py-3.4
runtime |
Function |
---|---|
624 s |
total runtime |
26 s |
h5.shape |
96 s |
sleep |
34 s |
h5.datalog.read_buffers.getitem |
447 s |
h5.datalog.write_buffers |
184 s |
h5.datalog.?.getitem(h5.group.py) |
103 s |
h5.datalog.?.setitem_(h5.dataset.py) |
Update H5Py
sudo /usr/bin/python3 -m pip show h5py
# ⇾ v2.1?
sudo /usr/bin/python3 -m pip list --outdated
sudo /usr/bin/python3 -m pip install --upgrade wheel h5py
# ⇾ v3.4
updated numpy is giving libblas-trouble
sudo /usr/bin/python3 -m pip uninstall numpy scipy
sudo apt --reinstall install python3-numpy python3-scipy
# further update all packets
sudo /usr/bin/python3 -m pip install --upgrade click cryptography decorator distlib
# failing because of distutil greenlet: gevent platformdirs pybind11 msgpack-numpy
sudo /usr/bin/python3 -m pip install --upgrade pyyml six virtualenv zope.event zope.interface
# another distutils: xdg
sudo /usr/bin/python3 -m pip install --upgrade --force-reinstall h5py --no-binary :all:
# ⇾ still fails libhdf5.so after over 1h
# lib-experiments
sudo /usr/bin/python3 -m pip install --upgrade --force-reinstall h5py numpy scipy
sudo apt install python3-dev gfortran libopenblas-base liblapack3 libopenblas-dev liblapack-dev libatlas-base-dev
libopenblas* liblapack*
sudo apt remove libopenblas-base # could be the culprit that overwrites the one working and needed lib
# https://stackoverflow.com/a/34956540
h5py-compilation-cookbook from kai (slightly modded):
sudo apt-get install libhdf5-dev
sudo pip3 install --upgrade cython
ln -s /usr/include/locale.h /usr/include/xlocale.h
#sudo /usr/bin/python3 -m pip uninstall numpy h5py
#sudo /usr/bin/python3 -m pip install --only-binary=numpy numpy==1.17.5
sudo /usr/bin/python3 -m pip install --no-binary=h5py h5py
# ⇾ v3.4, created wheel filename=h5py-3.4.0-cp39-cp39-linux_armv7l.whl size=5487437
# ⇾ relatively quick, but no benefit to precompiled version