Dusk: A project to create a simplified decentralized management system for OpenWRT and similar systems

So I have begun work on a project I call Dusk. Dusk is a system that allows you to manage multiple OpenWRT devices from one place. The goal of this project is to be able to use the LuCI web interface to configure and monitor lots of OpenWRT devices. In the architecture there is a configurable master node and the rest of the devices are peers. You can make any device the master, and there will even be a role called backupMaster for redundancy.
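To make the role idea concrete, here is a rough Rust sketch of the three roles and one possible failover rule. Nothing is implemented yet, so every name here is made up for illustration:

```rust
// Hypothetical sketch of the Dusk node roles. `Role` and the
// failover helper below are invented names, not existing code.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Master,
    BackupMaster,
    Peer,
}

/// One possible failover rule: if the master disappears, the backup
/// master promotes itself; a plain peer never promotes itself.
fn next_role_on_master_loss(current: Role) -> Role {
    match current {
        Role::BackupMaster => Role::Master,
        other => other,
    }
}
```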

Right now there is zero code written, as I am still coming up with the concept. I have finished a lot of the details, but there are still some critical components that need to be engineered. My goal with this project is to make a system that is simple yet powerful. I want to avoid complexity, so I am getting all the details straightened out ahead of time. It will be written in Rust and hopefully be small and light enough to be used in a lot of different places.

Currently it has the following planned features:

  • Centralized data store duplicated across nodes
    • There is a change log with file hashes and change signatures
  • Cycles that allow for continuous communication between master and nodes
    • Master sends out start and nodes respond. (change data is optionally passed)
  • Changes are only accepted if they are properly signed. Additionally, data is encrypted in transit
    • Users are implemented with a key that is encrypted with a password
    • Sections can only be changed if they are signed by the proper key
    • Each device has its own key that authenticates it on the network. That key is used by the device to update its current status in the device’s portion of the central datastore
  • There is a backup recovery phrase that is a special user key that serves as a “break glass” recovery method. Without the proper signing keys, nodes will refuse to accept changes, so the humans would be locked out. The phrase is intended to provide a way to regain access so that you don’t have to reimage all devices
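As a rough illustration of the change log above, here is a Rust sketch of a hash-chained log where each entry commits to the previous one, so tampering anywhere breaks verification from that point on. To keep the example dependency-free it uses std’s `DefaultHasher`; the real thing would use something like SHA-256 for the hashes and ed25519 for the change signatures, and all names here are invented:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy change log: each entry's hash covers its payload plus the
// previous entry's hash, forming a chain. NOT cryptographically
// secure (DefaultHasher is for hash maps); illustration only.

#[derive(Debug, Clone)]
struct ChangeEntry {
    seq: u64,
    payload: String, // e.g. a config diff
    prev_hash: u64,
    hash: u64,
}

fn entry_hash(seq: u64, payload: &str, prev_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seq.hash(&mut h);
    payload.hash(&mut h);
    prev_hash.hash(&mut h);
    h.finish()
}

fn append(log: &mut Vec<ChangeEntry>, payload: &str) {
    let seq = log.len() as u64;
    let prev_hash = log.last().map_or(0, |e| e.hash);
    let hash = entry_hash(seq, payload, prev_hash);
    log.push(ChangeEntry { seq, payload: payload.to_string(), prev_hash, hash });
}

/// Walk the chain; a tampered entry invalidates itself and everything after it.
fn verify(log: &[ChangeEntry]) -> bool {
    let mut prev = 0u64;
    log.iter().all(|e| {
        let ok = e.prev_hash == prev && e.hash == entry_hash(e.seq, &e.payload, prev);
        prev = e.hash;
        ok
    })
}
```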

What still needs to be figured out

  • transport level protocols
    • TCP
      • This would mean that the master has to sustain a bunch of TCP connections. Not ideal for an embedded device.
      • TCP is by definition two-way, which means that the master code would likely not be a separate part
    • UDP (multicast)
      • UDP means that data could be dropped which would cause issues
      • The benefit is that it has a lot less overhead. The master could send out a multicast and the nodes would respond with simple UDP. This would scale nicely, but it isn’t ideal: I like the idea of UDP, but there are serious reliability risks. I also don’t know multicast programming
  • Node reonlining
    • If a node is offline for a long time it will need to be brought up to speed. No idea how to do that right now (I have the start of a few ideas)
  • Node joining
    • I could just have an authorized user log in, but that would be fairly tedious. Ideally there should be joining keys that can be provided as part of a deployment image (device boots up and joins). The problem I can foresee is accidental joins (e.g. after a hard factory reset) that lead to lots of ghost devices.
    • In short, joining is easy but automating it securely is hard
  • Applying Policies
    • There will need to be a way to set groups and policies. This is long down the road
  • Device transport layer
    • Maybe in the far, far future I could work on a raw transport between devices. This would provide a layer-3 network that nodes could use to communicate. It wouldn’t be real time, as the data would first go through the master. The idea is that it could be used by systems like DAWN and usteer
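To show what the multicast cycle could look like on the wire, here is a Rust sketch of a fixed-size “cycle start” datagram that the master would multicast and every node would parse before replying over unicast UDP. The format (magic value, cycle counter, change-log head sequence) is purely illustrative, not a settled design:

```rust
use std::convert::TryInto;

// Hypothetical wire format for a "cycle start" datagram:
// 4-byte magic | 8-byte cycle number | 8-byte change-log head seq.
// All big-endian; 20 bytes total, far below any sane MTU.

const MAGIC: u32 = 0x4455_534B; // ASCII "DUSK"

#[derive(Debug, PartialEq, Eq)]
struct CycleStart {
    cycle: u64,
    head_seq: u64, // newest change-log sequence the master holds
}

fn encode(msg: &CycleStart) -> [u8; 20] {
    let mut buf = [0u8; 20];
    buf[0..4].copy_from_slice(&MAGIC.to_be_bytes());
    buf[4..12].copy_from_slice(&msg.cycle.to_be_bytes());
    buf[12..20].copy_from_slice(&msg.head_seq.to_be_bytes());
    buf
}

/// Reject anything that is the wrong length or lacks the magic prefix.
fn decode(buf: &[u8]) -> Option<CycleStart> {
    if buf.len() != 20 || buf[0..4] != MAGIC.to_be_bytes() {
        return None;
    }
    Some(CycleStart {
        cycle: u64::from_be_bytes(buf[4..12].try_into().ok()?),
        head_seq: u64::from_be_bytes(buf[12..20].try_into().ok()?),
    })
}
```

A node that sees a `head_seq` newer than its own log knows it is behind and can ask for the missing entries in its reply, which is one way the re-onlining problem could hook into the cycle.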

In short, this is a complex project that will probably take some time (assuming I don’t get bored). I posted this as I thought it might be interesting to somebody.

Relevant docs:

@Darin755 I am very interested in your project and would like to offer my help as a tester. I would use TCP as the transport protocol for your project. Since you aren’t familiar with multicast programming, it would be better to use TCP instead of UDP. You can always change the protocol to UDP at a later date, or someone else who knows multicast programming can fork your project. I have learned from managing my company’s programmers that if they wait until they have everything figured out to start a project, the project will never get done.

TCP simply isn’t going to scale the way I want. I want the master to work on a potato, but keeping lots of TCP connections open will eat through resources and bandwidth.

I am looking into either multicast or broadcast, with replies being plain UDP. I also need to figure out some way to fast-forward nodes that get disconnected or powered off for a while. Ideally the master should only ever work in broadcast/multicast, but that is tricky when you need to sync an out-of-date node with the network.

I can make UDP work a little like TCP, but it doesn’t work the other way around. UDP has way less overhead as it is stateless, meaning I can just process packets as they come in instead of worrying about TCP streams. Theoretically a cheap network device could handle thousands of UDP packets without issue.

The key piece I need is a simple algorithm for only sending needed changes. A JSON patch is unlikely to fit in a single packet, so it would need to be broken down and kept track of. My current idea is to have nodes say what data they need in their response and then have the master include that data at the start of the next cycle. I am also looking at node-to-node communication outside of the master, but that gets tricky as you need to account for the physical layer and latency between nodes.
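For the “broken down and kept track of” part, here is a rough Rust sketch of splitting a patch into MTU-sized chunks and reassembling them on the receiving side, even when UDP delivers them out of order. The chunk size and reassembly approach are placeholders, and lost chunks would still need a re-request mechanism like the request-in-response idea described above:

```rust
use std::collections::BTreeMap;

// Sketch only: split a byte patch into numbered chunks small enough
// for one UDP datagram, then reassemble once every chunk has arrived.
// No loss handling here; a real version needs timeouts/re-requests.

const CHUNK_SIZE: usize = 1200; // placeholder; stays under a 1500-byte MTU

fn split(patch: &[u8]) -> Vec<(u16, Vec<u8>)> {
    patch
        .chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(i, c)| (i as u16, c.to_vec()))
        .collect()
}

struct Reassembler {
    total: u16,
    parts: BTreeMap<u16, Vec<u8>>, // BTreeMap keeps chunks in index order
}

impl Reassembler {
    fn new(total: u16) -> Self {
        Reassembler { total, parts: BTreeMap::new() }
    }

    /// Feed one chunk (possibly out of order, duplicates are harmless);
    /// returns the full patch once every chunk has been seen.
    fn push(&mut self, index: u16, data: Vec<u8>) -> Option<Vec<u8>> {
        self.parts.insert(index, data);
        if self.parts.len() == self.total as usize {
            Some(self.parts.values().flatten().copied().collect())
        } else {
            None
        }
    }
}
```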

In short, this is very much a long-term project.
