What is that open source code REALLY doing - and how do you know?

This is an issue I frequently find myself discussing, and I would like to get input from the community - especially tinfoil-hat types and maybe even @wendell? It comes up frequently in passing, but I couldn't find a broad overall discussion thread on it.

How do we know that open source code is any safer than corporate code? So far my best answer is "You don't really, but I know Windows 10 is out to get me."

I'm interested both in preventative actions users can take to verify software and in the different ways you all answer this question when someone asks you.

A successful checksum simply indicates that it hasn't been tampered with, but do you know what all it was originally designed to do?

I have heard of a few tricks, such as deploying behind a firewall that blocks all traffic and watching what is trying to get outside of the network, but that only tells me so much about what is going on under the hood of my machine.

We rely on community peer review for major repos and such, but who is to say that a whole project team isn't building in vulnerabilities or back doors? It may even be well intentioned, though still exploitable.

Thoughts, best practices, and past experiences are all welcome :)
Thanks!

Peer review is the simple answer. Do things sometimes get past it? Yes (e.g. Heartbleed in OpenSSL), but this system is still much better. Open source isn't perfect, it is just much better.

5 Likes

The easiest answer comes in two parts:

  1. Open source projects with a big community have a lot of people looking into the code, so they get a lot of peer review. The more people involved, the better the chances of finding a bug, security hole, etc.
  2. It is possible for a project to be small and not get enough peer review to find all the bugs and security holes, but in reality that is not that frequent. Those who open their source to third parties are inviting people to look at it; if you have bad intentions from the get-go, you will try to hide the code instead. But let's assume it happens, either through human error or bad intent. The project will still be more secure, because when it is audited, the code being open means people will be able to find and fix/report those problems.

The biggest security problems that came from open source code were from projects with only a few developers maintaining the code, or from abandoned code that hadn't been updated in a while.

Regarding best practices: learn to code/understand code and review it yourself? On the more down-to-earth side of things, look for projects that have big communities, that are frequently audited, etc.

Edit: as @Tex said,

2 Likes

That is certainly good advice right here - I was actually just over on the "Intro to python" thread learning some basics based on this very train of thought :)

And to your other point, I suppose transparency to a larger audience does improve the overall peer review quality. I have heard of "binary blobs" being in code though, and I understand this is a tough area to make transparent/review? Admittedly my knowledge is limited on exactly what that is. I think it is "pre-compiled" low-level stuff, like drivers - is that correct? If so, how would that typically be reviewed? I'm self/Google taught, so sorry if I sound a little dumb asking these questions :)

I guess this loops back into @Tex 's response that it's not perfect, just better.

1 Like

If it's open source, the code will be somewhere, even if the final product is distributed pre-compiled. But for those cases where the code is not distributed, the same rule as closed code applies: reverse engineering. Take the input, take the output, and try to figure out what it does. I'm not a coder (not anymore, at least). I started as a software developer a long time ago (for the last 12 years I've been a sysadmin/devops engineer), and now I can read code in different languages, but writing code is another story. One of those languages is assembly. Back then, my first experiences with cracking were to take an executable file, decompile it to assembly, and understand that...

Keep in mind that by corporate code I'm guessing you mean proprietary? Plenty of free and open source code is corporate-made code.

This isn't exactly correct. A checksum verifies the integrity of the data; by that I mean you can verify that what you downloaded is the same as what the site that published the checksum said it would be.

That doesn't verify it hasn't been tampered with; it could have been tampered with and the checksum changed to match.
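To make that distinction concrete, here is a toy Python sketch (the byte strings are obviously made up): a matching hash only proves the file matches the published checksum, not that the publisher was honest.

```python
import hashlib

# Toy demonstration with made-up bytes: integrity vs. authenticity.
original = b"legitimate installer bytes"
tampered = original + b" plus injected payload"

# An attacker who can replace the download can also replace the published checksum...
published_checksum = hashlib.sha256(tampered).hexdigest()

# ...so your integrity check happily passes even though the file was altered.
assert hashlib.sha256(tampered).hexdigest() == published_checksum
print("Checksum matches, but that only proves integrity, not who produced the file.")
```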

How do you verify that what you downloaded is legitimate? You need to check it against a signature from the owner (e.g. the author's PGP key).

The way Fedora does this is that they sign the checksums with their private key.
You download the checksum and verify it with Fedora's public key; this shows that the checksum was signed by Fedora and Fedora only.
You now know the checksum is legitimate and valid.
You can then use the checksum to verify the integrity of the ISO download.

In doing this you gain a couple of things:

  • You have proven who the download came from.
  • You have verified that it is untampered and complete.

https://getfedora.org/verify
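To make that process a bit more concrete, here is a minimal Python sketch of the same two steps: verify the signature on the checksum file, then hash the ISO and compare. The file names are hypothetical placeholders, it assumes the publisher's public key is already in your GPG keyring, and it shells out to the real gpg tool rather than reimplementing signature checking.

```python
import hashlib
import subprocess

ISO_PATH = "Fedora-Workstation-Live.iso"        # hypothetical ISO file name
CHECKSUM_PATH = "Fedora-Workstation-CHECKSUM"   # signed checksum file from the same site

# 1. Verify the checksum file's signature (raises if gpg reports a bad signature).
subprocess.run(["gpg", "--verify", CHECKSUM_PATH], check=True)

# 2. Compute the SHA-256 of the ISO that was actually downloaded, in chunks.
sha256 = hashlib.sha256()
with open(ISO_PATH, "rb") as iso:
    for chunk in iter(lambda: iso.read(1024 * 1024), b""):
        sha256.update(chunk)
digest = sha256.hexdigest()

# 3. Compare it against the entry in the signed checksum file.
with open(CHECKSUM_PATH, "r", encoding="utf-8", errors="replace") as f:
    if any(digest in line and ISO_PATH in line for line in f):
        print("OK: signature valid and checksum matches")
    else:
        raise SystemExit("MISMATCH: do not trust this download")
```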

This is how it should be done. But not all distributions do this.

Essentially the same process is used for your OS's packages: they are verified with the PGP keys of the packagers/Fedora project, so you know they were packaged by trusted sources.

Debian goes one step further with packaging and has begun implementing reproducible builds. This essentially means you can build the .deb package from source code and produce an identical .deb, allowing you to independently verify that no extra code was added during the build.
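As a toy illustration of what reproducibility buys you (not the actual Debian tooling - debrebuild, diffoscope and friends do far more), the check at the end boils down to comparing hashes of the official package and your own rebuild. The paths below are made up.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

official = sha256_of("hello_2.10-2_amd64.deb")          # package downloaded from the mirror
rebuilt = sha256_of("rebuild/hello_2.10-2_amd64.deb")   # package you built yourself from source

if official == rebuilt:
    print("Reproducible: bit-identical, so no extra code was added during the build")
else:
    print("Differs: compare the two packages, e.g. with diffoscope")
```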

There are a good few distros where a number of checks and balances are done and you're able to see the whole process, something that can't be said for Windows or even some other distros:

  • community peer review
  • company peer review
  • independent audits (the EU is conducting an audit on Apache, among other things)

Open code is essential in my opinion - but not just that it's open, rather that it ensures you as an end user have specific freedoms (talking about the four freedoms):

  • The freedom to run the program as you wish, for any purpose (freedom 0).
  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  • The freedom to redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

There's a reason these have stood the test of time: as long as they stand, you, along with everyone else, will always have the ability to look at the code, and if you can look at the code you can check that it works as intended.

There's a good video from Stallman where he mentions this; it's almost herd immunity. Everyone needs to have those freedoms, but not everyone needs to exercise them - you don't all need to know how to code, only enough people do.

That's where its power lies: the fact that anyone can look at any code and find bugs, make fixes, add features, and find vulnerabilities is a good thing.

"The bad guys" will always find vulnerabilities regardless if the code is open or closed. At least with being open, anyone else can also find those vulnerabilities and report them and fix them.

But remember that code being open is much, much more than just the ability to fix bugs or its potential security benefits; it is, at its core, about giving and protecting certain freedoms for the users of that code.

(hope that makes sense, I'm very tired)

2 Likes

If you're the ultra-paranoid type, you're best off building your own OS, I suppose. The answer is trust.

You can also firewall all the things and then whitelist your way from there. That'll keep your OS from "calling home."
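If you want to watch for things slipping past that whitelist from the host side, here is a rough Python sketch (using the third-party psutil package) that lists established outbound connections and flags anything not on an allow-list. The addresses are made-up placeholders; a real setup would enforce this in the firewall itself rather than just report it.

```python
import psutil

# Hypothetical allow-list: hosts your machine is expected to talk to.
ALLOWED_REMOTES = {"192.0.2.10", "192.0.2.53"}

for conn in psutil.net_connections(kind="inet"):
    # Only look at established connections that actually have a remote endpoint.
    if conn.raddr and conn.status == psutil.CONN_ESTABLISHED:
        if conn.raddr.ip not in ALLOWED_REMOTES:
            name = psutil.Process(conn.pid).name() if conn.pid else "unknown"
            print(f"Unexpected outbound connection: {name} -> {conn.raddr.ip}:{conn.raddr.port}")
```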

1 Like

In addition to what @Eden and others have said, don't forget that many projects have undergone formal audits by third parties - sometimes foreign governments and the like. This is responsible for many of the fixes we have had lately. And after some missteps, e.g. with the OpenSSL libraries, steps have been taken to ensure more and better code review.

There have been some clandestine attempts to slip in code here and there. Most have failed, though a few have succeeded - I would argue by being defective by design.

One such attempt to secretly co-opt the Linux kernel was a single-character change in the source, but Linus caught it.

Spooky, eh? :)

6 Likes

@Eden Thank you for the very thorough and comprehensive reply.

O_O time to reinstall. Everything. I never realized Fedora utilized this verification method, I'll certainly have to look into it more in depth as a distro. I will also look into this reproducible build concept of Debian, thanks for pointing me in this direction.

In regards to the 4 freedoms and herd immunity, I certainly think that there is something to this. Additionally, (to me at least) there is the social psychology principle of reciprocity in play. I picked up Ubuntu back in 2008 because I was in college, broke, and it was free. I'm still using Linux today - and try to contribute any way I can - because I've come to love it for these very freedoms. Any kid that can get their hands on some old hardware and some connectivity can learn, and even create, a lot on this platform. One of the ways I contribute is to tell people about it, and those repeated discussions are kind of what sparked this thread. Inevitably I get the response "Nothing in this world is free, if you are not paying for it then YOU are the product" ... and thus this thread was born :)

Thanks again for taking the time to respond, great post!

1 Like

@Wendell

This is very spooky indeed, and it wasn't until I heard this story that I really put on my tinfoil hat, stopped blindly trusting anything Linux, and started trying to learn how to verify my systems. It was a bit of a wake up call to know that something like that could happen to someone like him, and in his own dev environment. I'm certain that whoever did that originally is still trying ... and might have even succeeded already somewhere else.

The open source peer review system has multiple weaknesses.

  1. It applies foremost to the more popular open source software. Software that is niche, or not considered critical, will have fewer wary eyes on it.
  2. The eyes on it may not have the resources (competence or time) to detect or deal with software weaknesses appropriately.
  3. When you fix a problem and present a patch, the patch may not be reviewed or accepted upstream within a reasonable amount of time, which may end with you maintaining your own non-peer-reviewed temporary branch.

If it is of paramount importance to you, you should always do a peer review at your company if you have the local resources to do it, unless there is a trusted third-party formal audit, or a trusted third party to do it for you.

As mentioned and/or implied previously by others, high-risk popular software such as OpenSSL, the Linux kernel, etc. usually enjoys plentiful peer review and audits. Hopefully those are also sufficient.


I believe this is also related to the subject. Professionally, I have overseen multiple introductions of third-party open source code into our proprietary code (permissive licenses/linkages), and for our longest-running project under my supervision it ended up being handled according to the following three distinct strategies (a rough sketch of the third workflow is below):
- OpenSSL, the JDK, the C/C++ standard libraries, the Linux kernel, etc.: we trust them, and we lack the resources to deal with them ourselves.
- We have incorporated a third-party library into our own code, treating it the same as our own code (usually a minor-complexity project, or a specific small part of another project).
- We have incorporated a third-party library into our own code, but we diff every change (git pull) to the project manually into our local version, reviewing the code and documentation.
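For that third strategy, the review step might look roughly like the following Python sketch: fetch upstream, dump the diff against the vendored copy, and merge only after a human has read it. The remote and branch names are assumptions, and it simply shells out to git.

```python
import subprocess

UPSTREAM_REMOTE = "upstream"   # hypothetical remote pointing at the third-party project
UPSTREAM_BRANCH = "master"     # hypothetical branch we track

# 1. Fetch the latest upstream commits without touching our local branch.
subprocess.run(["git", "fetch", UPSTREAM_REMOTE], check=True)

# 2. Produce the diff between our vendored copy and upstream for manual review.
diff = subprocess.run(
    ["git", "diff", f"HEAD..{UPSTREAM_REMOTE}/{UPSTREAM_BRANCH}"],
    check=True, capture_output=True, text=True,
).stdout

if diff:
    with open("upstream-review.diff", "w") as f:
        f.write(diff)
    print("Review upstream-review.diff before merging anything.")
else:
    print("No upstream changes to review.")
```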

There is a lot of variance in the quality of open source code.

Apart from the specific subject of peer review weaknesses regarding safety and security, at work we also highly value operational safety and operational availability. In simple words, we don't like bugs. We also acknowledge that our preferences, priorities, and needs may not match those of the open source project maintainers (supplier incompatibilities - they may want to take their project in an entirely different direction than we are going with ours). All these aspects taken together further contribute to us applying the above strategies.

1 Like

The biggest forte of open source software is that developer performance is higher, because the software is visible to everyone and the developer's name stays attached to the code they've committed. The open source community is also very outspoken and pretty strict about the quality principles of open source development, and about code hygiene. Even open source code that is mainly developed by commercial outfits immediately starts to suffer from quality loss. An example is Google: Google projects are based on open source code from other open source projects, and then they add their own projects, like Chromium for instance. The problem is that Google developers cut corners; they will not take the time and study to work with the existing open source code from other developers that is upstream of Chromium, but will instead change that code to fit it into the Chromium code, which is downstream. That is why Chromium and other commercial open source projects have been kicked out of many repositories, for instance.
In practice, this system of strict open source principles and code hygiene works: for instance, the Fedora project kicked Chromium out of their repos because Google was not respecting upstream and was not following the quality guidelines with regard to software packaging. Canonical, on the other hand, is very lenient with Google, in that they let a Google employee commit Google open source projects like Chromium directly - without prior peer review - into the Ubuntu repositories. Now everybody knows that there is a serious quality difference between Fedora and Ubuntu. When a new release of Ubuntu comes out, even an LTS release, it's probably less stable and contains more bugs than Fedora Rawhide on a bad day. A Fedora beta is more stable than a development release of Ubuntu ever gets during the entire release cycle. It's up to the user to make a wise choice.

Really high-quality code can be achieved by using a long-term tested, high-quality code base - for example the Debian Stable branch, which doesn't get many upgrades at all, basically only security patches - and compiling it yourself locally based on strict needs. There is very little compromise in such a solution: you don't have to trust other people, you can keep an overview without making code review a full-time job, and it's a functional solution as long as the highest level of performance and compatibility with modern technology are not required.

Other solutions go for a strong safety net, like Fedora with SELinux, or OpenSuSE with the combination of custom user hardening through Yast, custom user software packaging through SuSEStudio, and a safety net through AppArmor. You can achieve a very high level of security with code that is not 100% perfect and code that might be largely untested and unproven because it's super bleeding edge.

Any way you put it, these solutions are ONLY available in open source software. A comparison to proprietary software isn't even possible, because proprietary software doesn't even consider this level of code quality assurance or these options.

2 Likes

@Blunderbuss
Thanks for bringing your professional experience to the conversation. To me, having not yet lived in the professional software development space, I've always guessed that professional dev companies have some code that they create internally and also resources that they tap for some of their code's more basic functionality. To an outsider looking in, your three-tiered approach sounds like an excellent model - particularly for managing a team.

Basically: don't mess with what you're not equipped to handle, leverage pre-existing and high-quality resources to achieve your project goals quickly, and deep-dive what is left over. I imagine that model is also easily scaled to a tiered funnel workflow if you have a large team of varying skill levels that you are trying to manage and don't want your product to hit the news as the target of the next big hack. If you don't mind me asking, what is the average size of the project teams you have supervised, and do you think that team size has played any role in your projects eventually gravitating to this three-tiered approach?

@Zoltan
It is interesting to see the way the transparency of code affects the quality of the product, particularly with regard to individuals placing their name on the code. It's almost like it instills more pride than just doing something as your 9-5 day job. And while it is true that the community seems very outspoken, I have sometimes seen some of these discussions - once again as a rookie looking in - and wondered who is right in their opinions of code hygiene. From my perspective it seems that the same sense of pride that creates very high quality code can frequently escalate into core disagreements on "what right looks like." This, in turn, fragments the community into many different distros and ultimately undercuts the market share of Linux as a whole due to internal competition. I sometimes wonder if this truly makes the end product "better"? The bright side of all of this is that, with such a vast variety, it creates an outstanding real-world proof of concept for competing viewpoints - such as you mentioned with your stability/vulnerability comparison of Fedora/Suse and Ubuntu.

There are rules in open source software development. There are general rules like "never break upstream" and "don't shift problems to userspace", but there are also rules that pertain to a certain distro or a certain packaging system. For instance, the requirements of packaging for RPM distros are quite stringent. These were made after years of experience by a well-informed community, and they are not open for discussion. They are part of the quality aspect of RPM distros. Part of the success of SuSEStudio lies in the fact that SuSE, as an RPM distro, can offer DEB packages quite easily, since the packaging requirements for those are much less stringent, yet people who are looking for DEB packages know that they can trust the quality of the DEB packages in SuSEStudio because it's an RPM distro.
It's not like there is a lack of organisation or structure in open source software development. There is a strict hierarchy, which is also normal given the fact that code stays attributed to the creator thereof, whatever happens with that code at a later stage. GPL-licensed does not mean unlicensed.
Maintainers also keep a tight grip on the code pools in projects. Github was definitely a big help in organising the huge code pools of open source projects as open source development grew so incredibly large. The open source code pool is many, many times larger than any closed source pool, and there are many, many more maintainers, developers, and contributors. Without rules and a good versioning and communications system, this would not be manageable. But all the required infrastructure is there, mainly because of the huge support for open source software from the enterprise world, government agencies, militaries, law enforcement, etc.

Many of the biggest members and contributing companies to open source, like Intel, Microsoft, Samsung, etc... have an extreme interest in open source software and are spending billions on it, but they have no interest in spreading open source software amongst the consumers, because they exploit the fact that - at the consumer level - the benefit of open source is largely undiscovered, and they want to keep it that way.

1 Like

Most recently Telegram & Signal... with interesting results.

1 Like

I heard about Signal. How were the results for Telegram?

They passed, but there was scepticism...


https://eprint.iacr.org/2016/1013.pdf

1 Like

The team sizes, the individual skills, and also the company culture have all contributed in a major way to developing this approach. We mainly and ideally work in small ad hoc teams: four or fewer lucid, self-organizing individuals who recognize what they need and when they need it, know each other well, and know how to rely on each other, compensate, and distribute the burden. Communication is very efficient both within the team and outside it.

Each project is started with one of us becoming a supervisor for it - I am a supervisor of several products and responsible for the properties and correctness of those products and for selecting the methods applied to developing and maintaining them. I can always call in for help, and I can always be called on for help. The same applies to all of us - sometimes I work within my own parameters, sometimes I work within someone else's parameters, all within the same day, and those parameters are per project. A project supervisor vets everything that enters their home base.

As a result, our proprietary code for each product is managed very much like an Open Source project. Perhaps like a less popular one, with very few peers reviewing it, but with a guarantee that it will be properly reviewed by those peers, and also some specific external stakeholders.

The long-running project I've mentioned simply has the property of having been honed through our skills and biases, and tempered by measuring and adjusting the results over time. I think of software development in general as a long run (particularly with that project, which has turned 7 years old now and is still growing), so I have become inclined to apply the mass manufacturing principle to it - you cut ten times, then measure each cut and adjust your posture... cut ten times again, measure... cut a hundred times, measure... cut a hundred times a few thousand times... and you realize this is as good as it gets. You have just made both the quality of each cut and the time it will take very measurable and very predictable. So it makes more sense to treat as much of the code as uniformly as possible. The downside is that anyone who wants to play with it needs to learn the necessary ten ways to cut and needs to apply them thoroughly. The upside is that there are only ten ways to cut, and no need to polish. Presumably, anyone with sufficient cognitive ability can learn to do it efficiently. The code can then also be further refactored easily, and the worst of the cutting and measuring can then be automated.

A project with a shorter life expectancy would have been treated very differently. A project with more people would probably have a more formal description of the process and a couple of manuals, but I don't think it should be treated any differently with the other parameters remaining the same. However, in different company cultures and business models, it could just be more efficient to have more people on it - optimizing the development parameters the way we did could be more difficult to achieve (possibly inefficient to pursue) in a larger group that must be inclined to adapt to it, or in a group where the level of competence was (or was expected to be) more variable.

That is exactly what happens. If not to you, then to your customer.