Hunting a Memory-leak (h5py)
First Proof
RAM-Increase 5% (24mb) in 10 min, slower but steady increase later on
sheep starts with 13.2 % of system memory ⇾ after 5000 s it uses 28 % already
setup: 10 h input file, no output-writing for V & C & GPIO
mem-profiler shows slightly asymptotic behavior ⇾ maybe normal lazy garbage collection depending on free ram?
after not even 6h the BB became unstable due to saturated RAM
TLDR: h5py was the culprit ⇾ reading from files
Investigation
only a few possibilities to leaking memory in python ⇾ look for circular references and custom del()-methods
try to avoid exception-handling as a default-strategy in mainloop ⇾ only in shepherdio._get_msg() ⇾ no difference
file-descriptors or other things without calling close() can leak ⇾ not the case here
tracemalloc-profiler is in stdlib ⇾ brings no clue as mem usage and peak settle at a low value
constant timejumps and higher cpu-usage after 30000 s or 464 of 484 mb RAM
profile code with pympler, tracker, muppy, … (https://pythonhosted.org/Pympler/muppy.html)
finds nothing, code must hide in cpython (compiled libs) out of scope for profiles
valgrind ⇾ powerful, but too slow to work
sudo valgrind --tool=memcheck shepherd-sheep -vv run --config /etc/shepherd/example_config_emulation.yml
sudo valgrind --tool=memcheck --leak-check=yes shepherd-sheep -vv run --config /etc/shepherd/example_config_emulation.yml
chap https://stackoverflow.com/questions/61288749/finding-memory-leak-in-python-by-tracemalloc-module
fil, python memory profiler, https://pythonspeed.com/fil/docs/fil/what-it-tracks.html
trouble as arm is not natively supported, but github-issue for arm-macos gives a fix
#sudo /usr/bin/python3 -m pip install filprofiler
sudo apt install rustc
pip install git+https://github.com/pythonspeed/filprofiler.git#egg=filprofiler
fil-profile run --no-browser shepherd-sheep -vv run --config /etc/shepherd/example_config_emulation.yml
# ⇾ fails to compile for armV7 ⇾ missing SYS_mmap2
Disable Submodules
candidates: logging, memread, h5pywrite, compression ⇾ one by one
loglevel = 0
disable h5-writer & compression
Mods to allow uninterrupted testing
pru0/main.c, line99, //send_status(…NOFREEBUF
pypkg/init.py, line 626, start_time = + 25
not use click and logging (logging.getLogger(name).addHandler(NullHandler()))
rec: mem-usage is growing 11.3? to 12.9 % in 10min, 50% CPU
emu1: 13.0 to 13.6.. %, 22 % CPU
emu2: 14.1 to 15.3, 55 % CPU ?? ⇾ why not ~80 % ?
also replace shared_mem.read_buffer() by random-data
emu1: 11.7 to 13.7 % ⇾ ram-usage stays between
emu2: to 16.3 %
also replace .get_msg/buffer and emu.return_buffer() by dummy, also gc.collect() in between
untrottled run on 100% cpu
emu1: 11.7 to 13.7
emu2: 13.7 to 16.4
also skip hdf5-writing
emu1: 11.6 - 13.5
emu2: 13.6 - 14.6 ⇾ improved memory - for real?
also skip databuffer-Class
e12: up to 14.8
reading or writing is problem? one h5py-issue mentions vlen-type
rec. 10.x - 12.5
removing lzf again
rec. 10.7 - 11.3
isolated datalogger, 25 min sim,
rec 5.6 - 7.9 % (seems to be maxed there), emu
emu 6.0 % - 16 % (after 2330 s) ⇾ that’s the bug! reading from h5py, (lzf?)
with this result the code could be isolated and the bug is reproducible with pypthon 3.10, windows and even on x86/64bit