So I have begun work on a project I call Dusk. Dusk is a system that lets you manage multiple OpenWrt devices from one place. The goal of this project is to be able to use the LuCI web interface to configure and monitor lots of OpenWrt devices. In the architecture there is a configurable master node, and the rest of the devices are peers. You can make any device the master, and there will even be a role called backupMaster for redundancy.
Right now there is zero code written, as I am still coming up with the concept. I have finished a lot of the details, but there are still some critical components that need to be engineered. My goal with this project is to make a system that is simple yet powerful. I want to avoid complexity, so I am getting all the details straightened out ahead of time. It will be written in Rust and hopefully be small and light enough to be used in a lot of different places.
Currently it has the following planned features:
- Centralized data store duplicated across nodes
  - There is a change log with file hashes and change signatures
- Cycles that allow for continuous communication between master and nodes
  - Master sends out start and nodes respond (change data is optionally passed)
- Changes are only accepted if they are properly signed. Additionally, data is encrypted in transit
- Users are implemented with a key that is encrypted with a password
  - Sections can only be changed if they are signed by the proper key
- Each device has its own key that authenticates it on the network. That key is used by the device to update its current status in the device’s portion of the central datastore
- There is a backup recovery phrase that is a special user key that serves as a “break glass” recovery method. Without the proper signing keys, nodes will refuse to accept changes, so the humans would be locked out. The phrase is intended to provide a way to regain access so that you don’t have to reimage all devices
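
To make the signing rules above a bit more concrete, here is a rough Rust sketch of how a node might decide whether to accept a change. Nothing in it is final: `Change`, `Node`, `add_key`, `toy_hash`, and `toy_sign` are made-up names for illustration, and the "signature" is just a keyed `DefaultHasher` value standing in for a real signature scheme, so it is not secure. It only shows the acceptance logic: the signer must own the section, and the signature must check out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// One change log entry (hypothetical layout, not a final format).
#[derive(Debug, Clone)]
struct Change {
    section: String, // which part of the datastore is being changed
    file_hash: u64,  // hash of the new content
    signer: String,  // ID of the user or device key that signed it
    signature: u64,  // placeholder; a real signature would be a byte string
}

/// Placeholder content hash. DefaultHasher is NOT cryptographic.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Placeholder "signature": a keyed hash over the section and file hash.
/// ASSUMPTION: stands in for a real scheme with public/private keys.
fn toy_sign(secret: &str, section: &str, file_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    secret.hash(&mut h);
    section.hash(&mut h);
    file_hash.hash(&mut h);
    h.finish()
}

struct Node {
    keys: HashMap<String, String>,   // signer ID -> secret (toy, symmetric)
    owners: HashMap<String, String>, // section -> signer allowed to change it
    log: Vec<Change>,                // accepted changes, in order
}

impl Node {
    fn new() -> Self {
        Node { keys: HashMap::new(), owners: HashMap::new(), log: Vec::new() }
    }

    /// Register a signer and make it the owner of one section.
    fn add_key(&mut self, signer: &str, secret: &str, section: &str) {
        self.keys.insert(signer.to_string(), secret.to_string());
        self.owners.insert(section.to_string(), signer.to_string());
    }

    /// Accept a change only if it is signed by the key that owns the section.
    fn apply(&mut self, change: Change) -> Result<(), &'static str> {
        let owner = self.owners.get(&change.section).ok_or("unknown section")?;
        if owner != &change.signer {
            return Err("signer not authorized for section");
        }
        let secret = self.keys.get(&change.signer).ok_or("unknown signer")?;
        if toy_sign(secret, &change.section, change.file_hash) != change.signature {
            return Err("bad signature");
        }
        self.log.push(change);
        Ok(())
    }
}
```

With this, a device key would only be registered for that device's own status section, so `apply` rejects it if it tries to touch anything else, and a tampered signature is rejected before the change reaches the log.
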
What still needs to be figured out:
- Transport-level protocols
  - TCP
    - This would mean that the master has to sustain a bunch of TCP connections. Not ideal for an embedded device.
    - TCP is by definition two-way, which means that the master code would likely not be a separate part
  - UDP (multicast)
    - UDP means that data could be dropped, which would cause issues
    - The benefit would be that it has a lot less overhead. The master could send out a multicast and then the nodes would respond with simple UDP. This would scale nicely, but it isn’t ideal. I like the idea of UDP, but there are serious reliability risks. I also don’t know multicast programming
- Node reonlining
  - If a node is offline for a long time, it will need to be brought up to speed. No idea how to do that right now (I have the start of a few ideas)
- Node joining
  - I could just have an authorized user log in, but that would be fairly tedious. Ideally there should be joining keys that can be provided as part of a deployment image (device boots up and joins). The problem I can foresee is accidental joinings (hard factory reset) that lead to lots of ghost devices.
  - In short, joining is easy but automating it securely is hard
- Applying policies
  - There will need to be a way to set groups and policies. This is a long way down the road
- Device transport layer
  - Maybe in the far, far future I could work on a raw transport between devices. This would provide a layer 3 network that nodes could use to communicate. It wouldn’t be real time, as the data would first go through the master. The idea is that it could be used by systems like DAWN and usteer
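
On the UDP reliability worry from the transport item: one possible way to live with dropped datagrams is for the master to keep track of which nodes answered the current cycle and resend only to the ones that did not. The sketch below is just that bookkeeping in Rust, with no sockets involved. `Cycle`, `Next`, `on_response`, `on_timeout`, and `max_attempts` are all hypothetical names, and the retry policy is an assumption, not a decided part of Dusk.

```rust
use std::collections::BTreeSet;

/// What the master should do when the response window closes.
#[derive(Debug, PartialEq)]
enum Next {
    Done,                     // every expected node answered
    Resend(Vec<String>),      // send "start" again to these nodes
    MarkOffline(Vec<String>), // out of retries, treat these as offline
}

/// Bookkeeping for one cycle (hypothetical helper).
struct Cycle {
    id: u64,
    expected: BTreeSet<String>,
    answered: BTreeSet<String>,
    attempts: u32,
}

impl Cycle {
    /// Begin a cycle; counts as the first attempt at sending "start".
    fn start(id: u64, nodes: &[&str]) -> Self {
        Cycle {
            id,
            expected: nodes.iter().map(|n| n.to_string()).collect(),
            answered: BTreeSet::new(),
            attempts: 1,
        }
    }

    /// Record a response. Returns false for a stale cycle ID or unknown node.
    fn on_response(&mut self, cycle_id: u64, node: &str) -> bool {
        if cycle_id != self.id || !self.expected.contains(node) {
            return false;
        }
        self.answered.insert(node.to_string());
        true
    }

    /// Called when the response window closes without hearing from everyone.
    fn on_timeout(&mut self, max_attempts: u32) -> Next {
        let missing: Vec<String> =
            self.expected.difference(&self.answered).cloned().collect();
        if missing.is_empty() {
            Next::Done
        } else if self.attempts < max_attempts {
            self.attempts += 1;
            Next::Resend(missing)
        } else {
            Next::MarkOffline(missing)
        }
    }
}
```

The cycle ID in each response is what lets the master ignore late replies from an earlier cycle. Nodes that end up in `MarkOffline` are the same ones that would later need the "reonlining" catch-up.
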
In short, this is a complex project that will probably take some time (assuming I don’t get bored). I posted this because I thought it might be interesting to somebody.
Relevant docs: