Learnings from other Testbeds
notes from meetings with the creators of established testbeds
covering pitfalls and (possible) solutions
the significance for shepherd is incorporated
Server
have an upgrade path
currently TUD/ZIH offers vServers and plenty of storage from its datacenters; this can be changed and expanded
current server: 2 cores, 4 GB RAM, 50 GB local storage, 10 TB network storage
the VMware tools offer the option to scale cores and RAM up to 64 (for privileged users)
avoid Windows as OS
splitting into several servers can be an option (storage with message broker, web interface, computing)
Data-Management
database (not a must)
allows easy analysis of data
immediate access is nice! real-time or direct access for devs (even via Jupyter notebooks)
needs lots of RAM
the easiest interface is a socket with a JSON protocol ⇾ inefficient for embedded systems, but it can often be avoided there (see the sketch after this list)
can produce downloadable data on the fly (at least CSV)
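a minimal sketch of what such a JSON-over-socket query could look like (host, port, and the request fields are made-up placeholders):

```python
import json
import socket

def query_datapool(host: str, request: dict, port: int = 8400) -> dict:
    """Send one JSON request, read back one newline-terminated JSON reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(json.dumps(request).encode() + b"\n")
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return json.loads(reply)

# e.g. fetch the last 60 s of one node's output voltage:
# query_datapool("testbed.example.org", {"node": 7, "signal": "v_out", "last_s": 60})
```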
hdf5 / raw-data
needs more custom code
analysis not available right away
database-choices
InfluxDB allows nanosecond timestamps in a series (see the sketch after this list)
PostgreSQL works for weeks
TimescaleDB, an InfluxDB competitor, seems to scale better with a large number of devices (>1000)
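for illustration, writing a nanosecond-stamped point with the influxdb-client package (assumes InfluxDB 2.x; URL, token, org, and bucket names are placeholders):

```python
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="testbed")
write_api = client.write_api(write_options=SYNCHRONOUS)

ts_ns = 1_600_000_000_000_000_000  # nanoseconds since the Unix epoch
point = (
    Point("power_trace")
    .tag("node", "shepherd-07")
    .field("voltage_V", 2.981)
    .time(ts_ns, WritePrecision.NS)  # nanosecond precision
)
write_api.write(bucket="traces", record=point)
```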
leave headroom, node resources can easily become the bottleneck
using third-party libs is always a good idea, but can come with performance penalties
usually the compute load can be partially moved to the server
performance-optimized forks of established libs are often available (if needed)
established ways / libs to stream data out
protobuf (can be slow in Python)
RabbitMQ / kombu (see the sketch after this list)
RPC
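a sketch of pushing a result message through RabbitMQ with kombu (broker URL, exchange, and queue names are placeholders):

```python
from kombu import Connection, Exchange, Queue

results = Exchange("results", type="direct")
queue = Queue("node-results", exchange=results, routing_key="traces")

with Connection("amqp://guest:guest@broker.local//") as conn:
    producer = conn.Producer(serializer="json")
    producer.publish(
        {"node": 7, "t_ns": 1_600_000_000_000_000_000, "v": 2.981},
        exchange=results,
        routing_key="traces",
        declare=[queue],  # make sure exchange and queue exist
    )
```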
is privacy a feature? avoid personal data (OAuth, ActiveDirectory), keep recorded data private (web admins may even get delete-access only)
Grafana seems to have trouble with big datasets
Nodes
lockups ⇾ powercycle
we (shepherd) may not have the chance to control PoE
there is an external watchdog on the shepherd-capelet
there should also be a watchdog integrated in the CPU (untested)
so far I (Ingmar) have had no trouble with the nodes; the software base seems solid
heat through sun, radiators, enclosure
degrades CPU and storage
the BBone stays below 40 °C; the case will support convection cooling
avoid storage without wear-leveling
SD cards are not suited for constant data storage (and industrial SLC versions are expensive)
large USB sticks tend to have wear-leveling ⇾ ours probably doesn't
the BBone has eMMC for the Linux image; wear-leveling support is unclear
filesystems like F2FS provide software-based wear-leveling
possible solution
keep results in RAM (avoids the local-storage bottleneck) and stream them to the server
use the USB stick for local storage of energy traces (more static data)
keep the system read-only (deactivate only for updates)
have replacement BBones at hand (automate installation)
collect logs
gives debugging hints in case of unwanted behavior
temperature, RAM usage, CPU usage, … (see the sketch after this list)
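one way to collect such samples with psutil (the thermal-zone path is an assumption that holds on many ARM Linux boards):

```python
import time
import psutil  # third-party: pip install psutil

def sample_node_health() -> dict:
    """Collect one health sample for the log stream."""
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        temp_c = int(f.read().strip()) / 1000.0  # millidegrees -> °C
    return {
        "t_unix": time.time(),
        "temp_C": temp_c,
        "cpu_pct": psutil.cpu_percent(interval=1.0),
        "ram_pct": psutil.virtual_memory().percent,
    }
```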
target repurposing
keep functionality general so it can be used for dev, testing, …
shepherd's nRF target already has extra GPIO on the edges and an optional USB port
alternative platforms (copy from 11_concept_hw)
Zynq (ARM cores + FPGA + shared mem)
pro: similar price to the BBAI, 1 GbE, FPGA in the same package, hw timestamping
con: Xilinx toolchain, overwhelming documentation, small community, long dev cycle
embedded AMD platform (V2000)
pro: compute power, relatively cheap (≥ 200 €), fast Ethernet, x86-64
con: no real-time out of the box, low GPIO count
slow software-controlled GPIO could be a disadvantage
sensor and actuator control can have tight timing constraints
the BBone PRU can also take control of GPIO (but this is a static decision made at boot time)
observer should always have control over target-reset
shepherd can trigger a soft-reset over JTAG/SWD (see the sketch after this list)
shepherd can cut power and power-cycle the target
add a resistor bridge just for safety
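a soft-reset over SWD could look like this with pyOCD (also mentioned under TODO as a programming alternative); the target name is only an example:

```python
from pyocd.core.helpers import ConnectHelper

# Connect to the debug probe attached to the target and pulse reset.
with ConnectHelper.session_with_chosen_probe(target_override="nrf52840") as session:
    session.target.reset()
```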
Web
the Flask framework is sufficient (same goes for its bigger brother Django); a minimal sketch follows this list
user management should also include groups (to share data pools)
the experiment scheduler could be built on RabbitMQ (message broker)
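a minimal Flask sketch of what a scheduling endpoint could look like (routes and the in-memory store are illustrative only):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
EXPERIMENTS: list = []  # placeholder; a real deployment would use the database

@app.post("/experiments")
def schedule_experiment():
    """Accept an experiment description and queue it for the scheduler."""
    EXPERIMENTS.append(request.get_json())
    return jsonify(id=len(EXPERIMENTS) - 1), 201

@app.get("/experiments/<int:exp_id>")
def get_experiment(exp_id: int):
    return jsonify(EXPERIMENTS[exp_id])
```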
Testbed
secure it against misuse
nodes can vanish, wander
fix the boxes in place :)
make them unobtrusive (glue them below desks, use a non-transparent case)
BUT be transparent about the function
user scripts can gain system access (heavy CPU load, buffer overflows, damaging the Linux partition)
put them in a sandbox (see the sketch after this list)
limit Python code to the shepherd framework / no other libs
run as a special user with only essential permissions
set to the lowest CPU priority (nice level)
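a sketch of how the supervisor could start a user script with reduced privileges; the account name and the limits are made-up, and switching users requires the supervisor itself to run privileged:

```python
import os
import resource
import subprocess

def run_user_script(path: str, timeout_s: int = 3600) -> subprocess.CompletedProcess:
    """Run a user script niced down and with hard resource caps."""
    def _limit() -> None:
        os.nice(19)  # lowest CPU priority
        resource.setrlimit(resource.RLIMIT_CPU, (600, 600))         # 600 CPU-seconds
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20,) * 2)  # 512 MiB memory

    return subprocess.run(
        ["python3", path],
        preexec_fn=_limit,
        user="shepherd-restricted",  # placeholder account with minimal permissions
        timeout=timeout_s,
        capture_output=True,
    )
```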
the voltage source could destroy hardware
GPIO and supply voltage for the shepherd target are always linked
the target PCB has over-voltage protection
experiment management will check that the emulated regulator's voltage matches the target's constraints (see the sketch after this list)
sanity-check everything!
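for illustration, such a voltage check could be as small as this (the target-constraint fields are hypothetical):

```python
def check_supply_voltage(v_set: float, target: dict) -> None:
    """Reject experiment configs whose emulated supply voltage violates target limits."""
    v_min, v_max = target["v_min"], target["v_max"]  # e.g. {"v_min": 1.8, "v_max": 3.6}
    if not v_min <= v_set <= v_max:
        raise ValueError(
            f"requested {v_set:.2f} V is outside the allowed "
            f"range [{v_min:.2f}, {v_max:.2f}] V of this target"
        )
```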
real-world testing is required
maintenance / development can easily occupy one person full-time
documentation for the next devs
TODO
try to design for low maintenance, multiple purposes, and high functionality / speed / quality
Filesystem
F2FS for USB sticks
find a read-only switch for the system partition
is there an easy way to integrate a fresh BBone into the system?
test target-programming with current shepherd design
pyOCD could be an alternative
BBone
is CPU usage really 70 % during emulation?
do performance profiling, find bottlenecks (see the sketch below)
raw data could be sent to the server, reducing overhead on the BB
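a profiling sketch with the standard library (run_emulation is a placeholder for the actual shepherd entry point):

```python
import cProfile
import pstats

# Profile one emulation run on the BBone to locate hot spots.
with cProfile.Profile() as prof:
    run_emulation()  # placeholder for the real shepherd emulation entry point

stats = pstats.Stats(prof)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(20)  # top 20 by cumulative time
```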