Free Downloadable RAM for Linux? Help!

The clickbait title got me here, reading suspiciously through the posts. But it looks like this is a genuine effort to make something different, and I can appreciate the commitment.

Reading through everything so far brought up these questions in my mind: what's the impact on other system resources when running this application? In the side-by-side test you posted on your YouTube channel, it looks like the 2GB machine has more CPU overhead compared to the 4GB one not running BitFlux. I think the overhead scales in some form or another as the amount of RAM increases. Do you have a formula for that scaling?
What's the impact on storage due to the constant logging of data needed? Will it be possible to accelerate and improve the AI workload using GPUs?

1 Like

All I saw was "we are adding telemetry to your Linux box," which defeats the purpose of running it for many people.

Linux memory management in my experience is fine. I do not see any of the issues described :man_shrugging:

1 Like

The product is not really aimed at me, but I think I get it. I don't know how much caching and slop/over-provisioning there is in Linux, but I'm sure it is measurable…

Personally I don't have many memory worries, but I just dumped too much RAM into my system, so I would have plenty.
(4GB should be fine, 8GB to run all apps; I use ZFS, so 16GB should be enough, but I have 32GB to remove worry.)

But my all-in-one router/NAS/server runs VMs and containers.
Almost all of them are over-provisioned and wasteful of resources, "just in case": the Pi-hole doesn't need 2GB of memory, etc.
The waste on my box is just a couple of sticks of RAM, already paid for. If I were more efficient, I might be able to run a bunch more machines on it, but I don't really need to.

What the dude's product does is allow one to min/max the memory of a machine and, as advertised, save costs by deploying fewer servers with smaller memory demands.

The app is a product and allows remote monitoring, graphing (possibly even overwriting each deployment's learned limits?).
But these remote tools require data extraction.
And the product needs to be licensed, so checks need to be made that one doesn't buy once and use it on all their machines.

Like, I am fine with paid open source products; they can be forked if the price gets excessive. Something simply working on Linux is rarer on the desktop than on servers, but I would not be surprised if the guys could legit save companies real money with it.

I know systems can be slimmed down to run on potatoes, but this might legit be a way to run slimmer, without having to specifically tune for each of the dozen different tasks that a score of temporary VMs might have been spun up for in some corporate cloud?

Title is clickbait tho. (Small boo&hiss)

3 Likes

Yes. But buffer/cache is a good thing, not a bad thing.

Nope. Quite the opposite there. On systems without SSDs, performance gets much, much better after the system has been up and running for quite a while. After weeks and weeks of running it performs well, UNTIL a reboot flushes the cache.

Wow! Does anybody really do this? That’s got to be one incredibly messed-up Linux system. I keep mine running for YEARS at a time, unless power outages force otherwise.

We originally thought this would be easier to characterize. Sure, CPU utilization is more spiky on the 2GB machine, but it's also more responsive and does the user's work faster, because it blocks a little less. So doing an apples-to-apples comparison is harder than expected. The primary way we're tracking it, given our target market, is to take a workload with high CPU requirements and see if we negatively impact that workload; we've seen flat to positive results in our testing.

Not yet, but this is exactly the sort of feedback from the community that is so helpful; I mean, now it seems obvious. The overhead does scale mostly linearly with the amount of memory, but we also have limits built in and keep it mostly single-threaded. In practice the percentage of CPU required goes down, though, because higher-RAM machines tend to have more and better cores.

Pretty minimal for the logging. We don't really log data continuously; it's more like taking a snapshot on the minute scale, which ends up being, I think, in the 100KB/s range for average machines. The extra swap overhead on a good candidate is also similarly minimal once you hit steady state.

Yes. We've had this working, but it required a giant TensorFlow library to be packaged with the app, so we switched to TFLite. We need to either work on shrinking the 'full' library or do the GPU enablement on TFLite, which I think is on by default for Android, so it should work.

1 Like

@Trooper_ish you get it.

I was going for funny. Like downloadable RAM is a joke. Missed the mark apparently. Sorry about that.

Does this product help prevent process death via OOM?

How does this product work with the OOM killer in the mix?

Absolutely. The idea is to preemptively shrink your workloads down below the watermarks that will trigger the OOM killer. At the moment we aren’t messing with any of the OOM daemon settings or anything. We do for our Android variant, but that’s another story/business model etc.

The big idea is to flip people's relationship with memory. If you knew how much memory was needed, you'd just size everything correctly as a cost of doing business. The OOM killer is only a thing because you have no idea how much memory you actually need, right?

Say you create a little Node.js app. You have almost no control over how much memory is used by the Node engine, the libraries you import, the kernel's page cache mechanisms, etc. Add Docker, k8s, databases, and proxies, and when you scale that app it gets worse.

The kernel memory metrics tell you what is, but they are not very useful in saying what could be. We proactively shrink your workloads down to what they can be to remove that guesswork, so over time you have a very accurate model of how much memory your workload needs.
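To make the watermark point a bit more concrete: this isn't our code, just a toy script against the stock /proc interface that shows the per-zone watermarks the kernel's reclaim and OOM logic key off (page size assumed to be 4 KiB).

```python
# Toy monitor, not BitFlux code: read the per-zone watermarks that the kernel's
# reclaim and OOM logic key off, straight from /proc/zoneinfo, and report how
# much headroom each zone currently has above its "low" watermark.
import re

PAGE_SIZE = 4096  # bytes; typical for x86_64, adjust if your arch differs

def zone_watermarks():
    zones, current = {}, None
    with open("/proc/zoneinfo") as f:
        for line in f:
            node = re.match(r"Node (\d+), zone\s+(\S+)", line)
            if node:
                current = f"node{node.group(1)}/{node.group(2)}"
                zones[current] = {}
            elif current:
                wm = re.match(r"\s+(?:pages )?(free|min|low|high)\s+(\d+)", line)
                if wm:
                    zones[current][wm.group(1)] = int(wm.group(2))
    return zones

for zone, wm in zone_watermarks().items():
    if {"free", "min", "low", "high"} <= wm.keys():
        headroom_mib = (wm["free"] - wm["low"]) * PAGE_SIZE // (1024 * 1024)
        print(f"{zone}: free={wm['free']} low={wm['low']} "
              f"min={wm['min']} (headroom above low ~{headroom_mib} MiB)")
```

When free pages in a zone sink toward "min" is when direct reclaim and, eventually, the OOM killer get involved; the whole point is to keep workloads trimmed so you stay well above that.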

1 Like

Thanks for the comprehensive answer!

So the better responsiveness due to the better RAM availability offsets the increased RAM usage, that’s nice.

Does it make a difference if the program is scaled to run on more threads, like reduced turnaround times, or is the single-thread choice there just to reduce the overall impact on the CPU?

That keeps the added storage latency to a minimum.

This statement is not clear; can you elaborate on it?

Indeed. So you're planning on porting it to x86 to run on the CPU? Most smartphone SoCs integrate AI and ML hardware engines to accelerate this kind of workload.

Mostly this. Also, the client-side app does ask the kernel for stuff from /proc and /sys that requires low-level locks, and we didn't think having every core potentially tied up was a good idea. But we never really seriously tested that. We didn't need to speed that function up, and it'd be extra work to multithread that part.
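For a feel of what that kind of client-side sampling looks like, here's a rough single-threaded sketch; the field names are standard /proc/meminfo ones, but the interval, log path, and everything else are made up, and the real agent collects far more than this.

```python
# Sketch of minute-scale, single-threaded /proc sampling: one thread wakes up,
# reads a handful of counters from /proc/meminfo, and appends a snapshot.
import time

FIELDS = {"MemTotal", "MemFree", "MemAvailable", "SwapFree", "Cached"}

def read_meminfo():
    snap = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            if key in FIELDS:
                snap[key] = int(value.split()[0])   # values are in kB
    return snap

def sample_forever(interval_s=60, logfile="/var/tmp/mem_snapshots.log"):
    with open(logfile, "a") as out:
        while True:
            out.write(f"{time.time():.0f} {read_meminfo()}\n")
            out.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    sample_forever()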

Say you start up a big workload. As the workload starts using memory, BitFlux will learn its usage patterns and start swapping. That's going to result in a bit of a spike in swap at the beginning, but once the learning is done it shouldn't need to do any swapping. That's what I mean by steady state.

A good candidate workload is just one with something to learn. If the access pattern is truly random, then BitFlux will have to do more swapping. The problem is that a lot of benchmarks are random, while real workloads look much more like a normal or Pareto distribution.
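A toy simulation (not our code, just an illustration with made-up numbers) of why that matters: under a skewed, Zipf/Pareto-like access pattern most of the working set sits cold and is learnable, while under uniformly random access almost nothing is.

```python
# Toy illustration: compare how much of a working set goes untouched under
# uniformly random access vs. a skewed (Zipf-like) access pattern.
# Untouched pages are what a reclaim policy can learn to swap out.
import numpy as np

PAGES = 100_000          # pages in the workload's working set
ACCESSES = 1_000_000     # page accesses observed in one learning window
rng = np.random.default_rng(0)

# Uniformly random accesses, like many synthetic benchmarks.
uniform = rng.integers(0, PAGES, ACCESSES)

# Skewed accesses: a few hot pages get most of the traffic, like real apps.
skewed = rng.zipf(2.0, ACCESSES) % PAGES

for name, hits in [("uniform", uniform), ("skewed", skewed)]:
    untouched = PAGES - np.unique(hits).size
    print(f"{name:8s}: {untouched / PAGES:6.1%} of pages never touched")
```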

More the other way around: we have an x86-oriented build that we ported to Android for some partners in the mobile space. I'm just commenting that the Android build seemed to have the HW engines enabled for TFLite, whereas the x86 version doesn't make it as obvious how to do that. Random interesting aside: TensorFlow's support for integrating with C++ is pretty poor, which we found surprising given that the library is written in C++. We might actually rewrite in Rust; the TF build support there seems robust in my build experiments.

1 Like

How well does BitFlux work with cache-hungry applications? Is performance impacted? Things like ZFS, SQL, Redis, Memcached, stuff like that.

Also, what does the monetization strategy look like, and how would that telemetry call back to you guys? I'm all for paying for a good product, just curious how the books balance out.

Have you guys considered using the native LRU eviction policy preemptively, instead of your AI model? I am curious about the difference in performance there.

I gotta say, this all looks really neat.

1 Like

We love cache-hungry applications. As an example, we did a collaboration with a large payment processor that uses MySQL; in the test we set up for them we went from 6 to 8 instances of MySQL per server and increased the number of transactions per second by something like 10%.

We're not charging at the moment, but we're planning on charging a fee for the resources under management, so for this product it would be per GB. We don't know what the rate should be yet. We're thinking some sort of freemium model, like under 32GB is free. Maybe free for non-commercial use.

Telemetry goes back to our dashboard; we upload encrypted JSON files with the high-level data. Does that answer the telemetry question?

We interact with the existing LRU eviction mechanisms already. Our kernel changes amount to an interface that allows userspace to add stuff to the front of the list and tell it to go.
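Our interface is a custom kernel patch, but if you want a feel for the userspace-drives-reclaim idea on a stock kernel, cgroup v2's memory.reclaim file (Linux 5.19+) is the closest upstream analogue. A minimal sketch, with a hypothetical cgroup name and assuming you have write permission:

```python
# Sketch only: proactive reclaim on a stock kernel via cgroup v2's
# memory.reclaim file (Linux 5.19+). Not the BitFlux interface, just the
# closest upstream analogue: userspace decides when and how much, the kernel
# evicts from its own LRU lists.
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/myapp.slice")   # hypothetical target cgroup

def proactive_reclaim(nbytes: int) -> None:
    """Ask the kernel to reclaim roughly nbytes from this cgroup."""
    try:
        (CGROUP / "memory.reclaim").write_text(f"{nbytes}\n")
    except OSError as err:
        # The write fails with EAGAIN if the kernel couldn't reclaim it all.
        print(f"reclaim request only partially satisfied: {err}")

if __name__ == "__main__":
    proactive_reclaim(64 * 1024 * 1024)   # try to trim ~64 MiB of cold pages
```

The difference is that with memory.reclaim the kernel still picks which pages go; the point of our patch is that userspace gets to nominate the pages too.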

As far as the policy goes… the kernel is very general-purpose and tries desperately not to take up resources. I fundamentally don't think policy should be the kernel's business anyway. Different use cases require different policies. This is already accepted as normal for power management, for example: a laptop has different policy requirements than a phone, a light switch, a server, a mainframe, or a TV, yet they all run Linux. You don't want a unified policy to handle all those situations, and you don't want to add a new power management driver in the kernel for each use case either. We happen to think that most of the system resources the kernel manages should be subject to that same logic.

The second point is that AI is better suited to this policy-level game than biased, human-designed algorithms.

Thanks!

1 Like

That makes total sense. Given the amount of data the program is working with, using more threads also wouldn't make much difference. If I had to make a wild guess, I'd say making the program multi-threaded would mean wasting more time on context switching than on doing actual work.

So the default behaviour is "if the data has not been accessed for a given time, it goes to swap", right? Once the algorithm has done its job, the program will reduce the swappiness.
How is the learning going to work for tasks that spend less than one minute in RAM, since that's the interval between snapshots? Will they always be thrown to swap?

Sure, real workloads are more predictable; that's how they're designed to work. Otherwise even the required specs to run them wouldn't make sense, right?

I was referring to TFLite, but I got it now.

I was asking if TFLite would run on the CPU exactly for this reason. Most ARM SoCs are equipped with ML/AI hardware, so this workload would basically run on dedicated hardware on those SoCs, while on x86 it would run on the CPU only. Maybe something like a Google Coral NPU could accelerate the workload on the low-spec machines this software is designed for.
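For reference, TFLite hands work to an accelerator through a delegate; here's roughly what pointing it at a Coral Edge TPU looks like from Python. A minimal sketch: the model path is a placeholder, and an Edge TPU needs a model compiled for it, otherwise everything just runs on the CPU.

```python
# Minimal sketch: pointing TFLite at a Coral Edge TPU through a delegate,
# with a plain-CPU fallback when the Edge TPU runtime isn't installed.
# "model.tflite" is a placeholder; the Edge TPU wants an edgetpu-compiled model.
import numpy as np
import tflite_runtime.interpreter as tflite

try:
    delegates = [tflite.load_delegate("libedgetpu.so.1")]   # Coral runtime
except (OSError, ValueError):
    delegates = []                                          # plain CPU

interpreter = tflite.Interpreter(model_path="model.tflite",
                                 experimental_delegates=delegates)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_output_details()[0]
print(interpreter.get_tensor(out["index"]))
```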

Found that out myself too. But I guess it's a result of the direction ML is heading, which is the use of Python.
Happy to see that you're going for C++; that's a great sign of a well-thought-out project.

Mostly, I think so… We evolved the algorithm using ML rather than designing it, right?

If a workload doesn't persist long enough for the AI to have an opinion about it, it just gets left alone. So if you have a workload that spikes for 10s, we don't intervene; we're all about making sure you have the free memory space to let those spikes allocate. We will, however, clean up the caches more aggressively than a default system.

Okay yeah, they even try to optimize with specialized instructions where they can.

Exactly. I've thought about branching into IoT, I just haven't figured out the market/biz case for that yet.

Kinda surprising right?

@Jared_Hulbert

It’s pretty cool someone is looking at this problem space.

Have you considered any collaborations with any cloud providers? … Have you talked to anyone already working in that space, like honeycomb.io or New Relic, about potentially bundling parts of your solution into their offering?

1 Like

Thanks!

We’re for sure building out a DataDog plugin.

We have not thought too seriously about some of the others. If anybody has any contacts at those places we should talk to, DM me; we'd be game to talk.