Learnings from other Testbeds

notes from meetings with creators of established testbeds
covering pitfalls and (possible) solutions
relevance for shepherd is included where applicable

Server

have an upgrade path
- currently TUD/ZIH offers vServers and plenty of Storage from its datacenters, can be changed, expanded
- current server: 2 Cores, 4 GB RAM, 50 GB local Storage, 10 TB network storage
- vmware-tools have option to set cores and ram to 64 (for privileged users)
avoid windows OS
splitting servers can be an option (storage with message broker, web-interface, computing)

Data-Management

database (not a must)
- allows easy analysis of data
- immediate access is nice! real-time or direct access for devs (even from Jupyter notebooks)
- needs lots of RAM
- easiest interface is a socket with json-interface ⇾ inefficient for embedded systems, can often be avoided
- can produce downloadable data on the fly (at least csv)
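
The point about producing downloadable CSV on the fly could look roughly like the sketch below. It assumes a hypothetical `samples` table (columns `node`, `timestamp_ns`, `voltage_uV`, `current_nA`) and uses an in-memory SQLite database from the Python standard library purely as a stand-in for whichever database is chosen; none of these names come from shepherd.

```python
import csv
import io
import sqlite3
from typing import Iterator


def _drain(buffer: io.StringIO) -> str:
    """Return the buffered text and reset the buffer for the next chunk."""
    text = buffer.getvalue()
    buffer.seek(0)
    buffer.truncate(0)
    return text


def stream_csv(conn: sqlite3.Connection, node: str, chunk_rows: int = 1000) -> Iterator[str]:
    """Yield CSV text chunk by chunk, so the full export never sits in RAM."""
    cursor = conn.execute(
        "SELECT timestamp_ns, voltage_uV, current_nA FROM samples "
        "WHERE node = ? ORDER BY timestamp_ns",
        (node,),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row is taken from the column names of the query.
    writer.writerow([column[0] for column in cursor.description])
    yield _drain(buffer)
    while True:
        rows = cursor.fetchmany(chunk_rows)
        if not rows:
            break
        writer.writerows(rows)
        yield _drain(buffer)


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up samples (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (node TEXT, timestamp_ns INTEGER, "
        "voltage_uV INTEGER, current_nA INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [
            ("node-1", 1000, 3300000, 120000),
            ("node-1", 2000, 3299000, 121000),
            ("node-1", 3000, 3298000, 119000),
            ("node-2", 1000, 3000000, 50000),
        ],
    )
    return conn


if __name__ == "__main__":
    for chunk in stream_csv(demo_connection(), "node-1", chunk_rows=2):
        print(chunk, end="")
```

A web handler could hand such a generator to its streaming-response mechanism, so the download starts before the query has finished.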
hdf5 / raw-data
- needs more custom code
- analysis not available right away
database-choices
- influxDB supports nanosecond-resolution timestamps for time series
- postgreSQL works for weeks
- timescaleDB, influx competitor, seems to scale better with a large number of devices (>1000)
leave headroom, node-resources can easily become the bottleneck
- using third party libs is always a good idea, but can come with performance penalties
- usually compute-load can partially be moved to server
- there are often performance optimized forks of established libs around (if needed)
established ways / libs to stream data out
- protobuf (can be slow for python)
- rabbitMQ / kombu
- RPC
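
To illustrate why a json-interface is called inefficient for embedded systems, the sketch below encodes the same samples once as JSON and once as a fixed binary record. It uses Python's standard `struct` module as a simple stand-in for a real serialization library such as protobuf; the record layout (64-bit timestamp in ns, 32-bit voltage in µV, 32-bit current in nA) is an assumption made for this example, not shepherd's format.

```python
import json
import struct
from typing import Iterable, List, Tuple

Sample = Tuple[int, int, int]  # (timestamp_ns, voltage_uV, current_nA), hypothetical layout

# Little-endian, no padding: uint64 + uint32 + uint32 = 16 bytes per sample.
RECORD = struct.Struct("<QII")


def encode_json(samples: Iterable[Sample]) -> bytes:
    """Encode samples as a JSON list of objects (easy to read, verbose)."""
    return json.dumps([{"t": t, "v": v, "i": i} for t, v, i in samples]).encode("utf-8")


def encode_binary(samples: Iterable[Sample]) -> bytes:
    """Encode samples as back-to-back fixed-size records."""
    return b"".join(RECORD.pack(t, v, i) for t, v, i in samples)


def decode_binary(payload: bytes) -> List[Sample]:
    """Inverse of encode_binary."""
    return [RECORD.unpack_from(payload, offset) for offset in range(0, len(payload), RECORD.size)]


if __name__ == "__main__":
    data = [(1_700_000_000_000_000_000 + n * 10_000, 3_300_000, 125_000 + n) for n in range(1000)]
    print("json bytes:  ", len(encode_json(data)))
    print("binary bytes:", len(encode_binary(data)))
```

The binary form needs no text parsing on the node and has a predictable size, at the cost of both sides having to agree on the layout in advance.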
is privacy a feature? avoid personal data (OAuth, ActiveDirectory), make recorded data private (even web-admins get delete-access only)
grafana seems to have trouble with big datasets

Nodes

lockups ⇾ powercycle
- we (shepherd) may not have the chance to control POE
- there is an external watchdog on the shepherd-capelet
- there should also be a watchdog integrated in cpu (untested)
- until now I (Ingmar) had no trouble with the nodes, software base seems solid
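
For the (untested) cpu-integrated watchdog, a minimal feeding loop might look like the sketch below. It relies on the standard Linux watchdog device behaviour: opening `/dev/watchdog` arms the timer, any write resets it, and if the writes stop the hardware reboots the board. The health check, the interval and the function name are assumptions for illustration; whether the BBone kernel exposes the device this way would need to be verified.

```python
import time
from typing import BinaryIO, Callable, Optional


def feed_watchdog(
    device: BinaryIO,
    is_healthy: Callable[[], bool],
    interval_s: float,
    max_kicks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Kick the watchdog while the node reports itself healthy.

    Returns the number of kicks. Once this function returns without the
    device being disarmed, the hardware timer runs out and resets the node.
    """
    kicks = 0
    while is_healthy() and (max_kicks is None or kicks < max_kicks):
        device.write(b"\0")  # any byte resets the hardware timer
        device.flush()
        kicks += 1
        sleep(interval_s)
    return kicks


if __name__ == "__main__":
    # Needs root and a real watchdog device; a stuck system stops kicking
    # and gets power-cycled by the hardware.
    with open("/dev/watchdog", "wb", buffering=0) as wdt:
        feed_watchdog(wdt, is_healthy=lambda: True, interval_s=10.0)
```

On drivers that support the "magic close" feature, writing `b"V"` right before closing disarms the watchdog, which is useful for a clean service shutdown.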
heat through sun, radiators, enclosure
- degrades cpu, storage
- BBone stays below 40 °C, case will support convective cooling
avoid storage without wear-leveling
- sd-cards are not suited for constant data-storage (and industrial SLC versions are expensive)
- large usb-sticks tend to have wear leveling ⇾ ours probably doesn't
- BBone has eMMC for the linux image, support for wear leveling is unclear
- flash-friendly filesystems like F2FS are log-structured and spread writes in software
- possible solution
  - keep results in ram (avoid local storage bottleneck) and stream to server
  - use usb-stick for local storage of energy-traces (more static data)
  - keep system read-only (only deactivate for updates)
  - have replacement BBones at hand (automate installation)
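
Whether the system partition is really kept read-only can be checked from the node itself. The sketch below parses text in the format of `/proc/mounts` (device, mount point, filesystem type, options, ...) and reports whether a mount point carries the `ro` option. The device names and the `/var/shepherd` path in the example are made up.

```python
from typing import Optional, Set


def mount_options(mounts_text: str, mount_point: str) -> Optional[Set[str]]:
    """Return the option set of a mount point, or None if it is not mounted."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Later entries shadow earlier ones for the same mount point.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
    return options


def is_read_only(mounts_text: str, mount_point: str = "/") -> bool:
    """True if the given mount point is currently mounted read-only."""
    options = mount_options(mounts_text, mount_point)
    if options is None:
        raise LookupError(f"{mount_point} is not mounted")
    return "ro" in options


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        print("root read-only:", is_read_only(handle.read(), "/"))
```

Splitting the option string matters: an option such as `errors=remount-ro` contains the text "ro" but does not mean the filesystem is read-only.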
collect logs
- gives debugging-hints in the event of unwanted behavior
- temperature, ram-usage, cpu-usage, …
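
A small collector for these values could be sketched as follows. It reads the usual Linux sources (`/proc/meminfo`, `/proc/loadavg`, and a thermal zone under `/sys/class/thermal`); the load average is only a rough proxy for cpu-usage, and whether `thermal_zone0` exists on the BBone is an assumption that needs checking. Function names are illustrative.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def ram_usage_percent(meminfo_text: str) -> float:
    """Share of RAM that is not available to new processes."""
    info = parse_meminfo(meminfo_text)
    total, available = info["MemTotal"], info["MemAvailable"]
    return 100.0 * (total - available) / total


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def parse_temperature_c(text: str) -> float:
    """Thermal zones report millidegrees Celsius as a plain integer."""
    return int(text.strip()) / 1000.0


def read_text(path: str) -> str:
    return Path(path).read_text()


def collect_health(read: Callable[[str], str] = read_text) -> Dict[str, float]:
    """Gather one snapshot; `read` is injectable so the parsing can be tested."""
    return {
        "ram_usage_percent": ram_usage_percent(read("/proc/meminfo")),
        "load_1min": parse_loadavg(read("/proc/loadavg"))[0],
        "temperature_c": parse_temperature_c(read("/sys/class/thermal/thermal_zone0/temp")),
    }


if __name__ == "__main__":
    print(collect_health())
```

Such a snapshot could be appended to the node log at a fixed interval and shipped to the server together with the other logs.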
target repurposing
- keep functionality general so it can be used for dev, testing, …
- shepherd's nRF-Target already has more gpio on edges and an optional usb-port
alternative platforms (copy from 11_concept_hw)
- zynq (arm-cores + FPGA + shared mem)
  - pro: similar price as BBAI, 1 GBE, FPGA in same Package, hw-timestamping
  - con: xilinx-toolchain, documentation is overwhelming, community small, long dev-cycle
- embedded amd platform (v2000)
  - pro: compute-power, relatively cheap (>= 200€), fast ethernet, x86-64
  - con: real-time not out of the box, low gpio-count
slow software controlled GPIO could be a disadvantage
- sensor and actuator control can have tight timing constraints
- BBone PRU can also take control over gpio (but this is a static / boot-time decision)
observer should always have control over target-reset
- shepherd can control soft-reset over jtag/swd
- shepherd can cut power and power-cycle target
- add resistor bridge just for safety

Web

flask-framework is sufficient (same with bigger brother django)
user-management should also include groups (to share data-pools)
experiment-scheduler could be done with rabbitMQ (message broker)

Testbed

secure against misuse
- nodes can vanish, wander
  - fix the boxes in place :)
  - make them unobtrusive (glue below desk, non-transparent-case)
  - BUT be transparent about the function
- user-scripts can get system-access (heavy cpu-load, buffer-overflow, damage linux-partition)
  - put in sandbox
  - limit py-code to shepherd-framework / no other libs
  - run as special user - only essential permissions
  - set to lowest cpu-priority (nice-level)
- voltage source could destroy hardware
  - gpio- and supply-voltage for shepherd-target are always linked
  - target-pcb has over-voltage-protection
  - experiment-management will check voltage of emulated regulator to match target-constraints
- sanity-check everything!
- real world testing required
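
Two of the measures for user-scripts listed above, lowest cpu-priority and a bound on cpu-load, can be sketched with the Python standard library on Linux. The snippet runs user code in a separate interpreter with niceness 19 and a hard CPU-time limit. It is only a partial illustration: it does not restrict imports, and it does not switch to a special user. Function names and default limits are assumptions.

```python
import os
import resource
import subprocess
import sys


def _restrict_child(cpu_seconds: int) -> None:
    """Runs in the child process right before the user code starts."""
    os.nice(19)  # lowest scheduling priority
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)
    # Never try to raise an existing hard limit, only lower it.
    limit = cpu_seconds if hard == resource.RLIM_INFINITY else min(cpu_seconds, hard)
    resource.setrlimit(resource.RLIMIT_CPU, (limit, limit))


def run_user_script(code: str, cpu_seconds: int = 5, timeout_s: float = 10.0):
    """Execute user-supplied Python code with low priority and bounded CPU time."""
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
        preexec_fn=lambda: _restrict_child(cpu_seconds),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )


if __name__ == "__main__":
    result = run_user_script("import os; print('niceness:', os.nice(0))")
    print(result.stdout, end="")
```

`preexec_fn` is not safe in multi-threaded parents, so a real service would more likely use a dedicated launcher process or a container. Running as a special user with only essential permissions could be added through the `user=` argument of `subprocess.run` (Python 3.9+), which requires the parent to have the privileges to switch users.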
maintenance / development can easily occupy one person full time
documentation for the next devs

TODO

try to design for low maintenance, multipurpose use, high functionality / speed / quality
Filesystem
- f2fs for usb-sticks
- find read-only-switch for system partition
is there an easy way to integrate a fresh BBone into the system?
test target-programming with current shepherd design
- pyOCD could be an alternative
BBone
- is cpu-usage really 70% during emulation?
- do a performance profiling, find bottlenecks
- raw data could be sent to the server, less overhead for BBone
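
For the profiling item, a first pass could use `cProfile` from the Python standard library, as sketched below. `emulate_block` is a made-up placeholder workload, not shepherd code; in practice the profiled call would be whatever function drives the emulation on the BBone.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top: int = 10, **kwargs):
    """Run func under cProfile; return its result and a text report.

    The report lists the `top` entries sorted by cumulative time, which is
    usually the quickest way to see where a call chain spends its time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def emulate_block(samples):
    """Placeholder workload standing in for one emulation step."""
    return sum(value * value for value in samples)


if __name__ == "__main__":
    _, report = profile_call(emulate_block, range(100_000))
    print(report)
```

The profiler itself adds overhead, so the absolute timings are less meaningful than the ranking of the entries.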