Finding a way to test the security of high-entropy data

Hi!

I’m working on a school project where we are supposed to implement some sort of IoT solution. I had an idea for a distributed network of devices with sensors that gather high-noise (high-entropy) data. The thing is, I want to find a way to prove that the gathered data really has high entropy and can safely be used to seed CSPRNGs.

I was looking into a few papers on this topic, such as the NIST Statistical Test Suite (SP 800-22) and work from NYU, MIT, and BU.
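The frequency (monobit) test from SP 800-22 is simple enough to sketch in a few lines of Python. It only checks that ones and zeros are roughly balanced, so passing it is necessary but nowhere near sufficient:

```python
import math
import os

def monobit_test(bits):
    """NIST SP 800-22 frequency (monobit) test.

    Takes a sequence of 0/1 integers and returns the p-value;
    the sequence looks balanced if p >= 0.01.
    """
    n = len(bits)
    # Map 0 -> -1 and 1 -> +1; a balanced sequence sums to near zero.
    s = sum(2 * b - 1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# Try it on 10,000 bits from the OS CSPRNG (a stand-in for sensor data).
data = os.urandom(1250)  # 1250 bytes = 10,000 bits
bits = [(byte >> i) & 1 for byte in data for i in range(8)]
print(f"p-value: {monobit_test(bits):.4f}")
```

The full suite runs fifteen such tests; this is just the first and simplest.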

My idea for this comes from wikipedia and im not sure that it’s true that embedded devices have a hard time gathering entropy but i thought, hey maybe i could implement a network that generates entropy for the devices instead. The idea is that they could then fetch n nr of high entropy bits from some service idk if it’s a good idea but i thought it would be fun atleast
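To make that concrete, here is a minimal sketch of what the fetch service could look like, using only the Python standard library. The /entropy endpoint is made up, and os.urandom is just a stand-in for the real distributed pool:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class EntropyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/entropy":
            self.send_error(404)
            return
        # Clients request ?bytes=n; cap it so nobody drains the pool.
        params = parse_qs(url.query)
        n = min(int(params.get("bytes", ["32"])[0]), 1024)
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        # os.urandom stands in for bits drawn from the sensor pool.
        self.wfile.write(os.urandom(n))

if __name__ == "__main__":
    HTTPServer(("", 8000), EntropyHandler).serve_forever()
```

One thing worth noting: anyone who can read the bytes in transit learns the seed, so in practice this would have to run over TLS with some form of client authentication.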

Hi, can you ELI5 the entropy thing?
My understanding, from the outside looking in, is that you have a source, like noise, which has some pattern (think of the dips and troughs of sound waves), right? But if the source sound is repetitive, then it is a set pattern, right?
So if the source is a metronome, the sound has a pattern, and so it isn’t useful for generating entropy? I could be wrong here.
But if it were a camera watching a road, the data would have differences in the shapes, colours, speeds, and gaps of passing vehicles, so the system might take a long sample and have a reasonable amount of variance between frames?

Obviously one would want to take a measurement for microseconds, not minutes or hours, so the analogy is just to show variety?

If the analogy kinda stands, then the distributed network would need a source of data from which to measure variance and generate entropy?
So what would the devices have as a source/sensor, and would they be in environments that generate plentiful amounts of data at all times? There’s no use measuring light levels with a sensor placed in a cupboard, or sound levels in a deserted field.

Unless Entropy is an actual measurable, sense-able thing, and not the product of analysing something else for variance/errors?

What you are saying is correct: if there is a pattern to the noise, there isn’t much entropy.

For example, if I used a light sensor as you said, there would be a clear pattern to the noise (higher values during the day, lower at night).

There is an example in this video: https://www.youtube.com/watch?v=_PG-jJKB_do&t=535s where she discusses the Shannon entropy of the English language. Since there are so many rules, each letter doesn’t contribute the same amount of entropy. Entropy in this case just means how many yes/no guesses we need on average to find the answer; a fair coin toss would be 1 bit, since there are only two equally likely outcomes.
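To put the coin example in code, here is the plain frequency-based Shannon estimate (just the formula H = -sum(p * log2(p)), not a serious entropy estimator for real sensor data):

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """H = -sum(p * log2(p)), in bits per symbol."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("HT" * 500))             # fair coin: 1.0 bit/symbol
print(shannon_entropy("H" * 900 + "T" * 100))  # biased coin: ~0.47 bits
print(shannon_entropy("the quick brown fox"))  # ~3.9 bits, below log2(26) ~ 4.7
```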

What I’m thinking is that I use sensors such as a microphone, an air quality sensor, a camera and so on (the more the merrier) to generate data in a pool, which then gets scrambled the way /dev/random (random.c) does it (a twisted GFSR generator plus a CRC), while also mixing in other information such as the time between reads in microseconds. This would probably be fairly secure.

However, where my idea shines (in my opinion) is the distributed aspect: generating data from one source is one thing, but then add 10 or more sources from around a country, for example, and compile them into one larger pool, or maybe even multiple pools. The thing I want to do is prove that it would be difficult or impossible to recreate these pools even if you had knowledge of how the data was gathered.
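For anyone curious, here is a rough sketch of the pooling idea. I’m using SHA-256 as the mixing function instead of the kernel’s twisted GFSR (hashing is the standard way to condition raw sensor data), and the sensor names are placeholders:

```python
import hashlib
import os
import time

class EntropyPool:
    def __init__(self):
        self._state = hashlib.sha256()

    def add_sample(self, source_name: str, raw: bytes):
        """Mix in a raw sensor reading plus a microsecond timestamp."""
        ts = time.monotonic_ns() // 1000  # timing between reads adds jitter
        self._state.update(source_name.encode())
        self._state.update(ts.to_bytes(8, "little"))
        self._state.update(raw)

    def extract(self, n: int = 32) -> bytes:
        """Condense the pool into n bytes of seed material."""
        out = b""
        counter = 0
        while len(out) < n:
            h = self._state.copy()
            h.update(counter.to_bytes(4, "little"))
            out += h.digest()
            counter += 1
        return out[:n]

# Devices around the country would feed their readings into pools like this:
pool = EntropyPool()
pool.add_sample("mic", os.urandom(64))         # stand-in for microphone noise
pool.add_sample("air_quality", os.urandom(8))  # stand-in for sensor bytes
seed = pool.extract(32)                        # use this to seed a CSPRNG
```

A nice property of hashing everything together is that one bad or even attacker-controlled source can’t reduce the entropy contributed by the others.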

I think the lava lamps are in the lobby, but I suspect there is another entropy source locked away in a back room.

https://www.cloudflare.com/learning/ssl/lava-lamp-encryption/
