Content credentials (C2PA) and privacy

I am a software developer employed by a company (Truepic) that is seeking to make media on the Internet more trustworthy. Two of my primary responsibilities are to interface with the C2PA working group and implement the specification in a closed-source library.

In a recent (January 12) news show, L1 briefly touched on C2PA technology and some valid privacy concerns that go along with it. In the comments on that video, I offered to have a dialogue about how some companies are choosing to address those concerns. I didn’t get any response, and perhaps YouTube comments are not the best medium for such a discussion.

I’m opening up the topic here now, in case anyone is interested. My purpose is not to promote my company or the technology, but to:

  1. Ensure that conversation on the topic is well informed.

  2. Learn more about concerns from the community and perhaps brainstorm to determine if more can be done to address those concerns.

I will preface any such discussion with some caveats.
I am here without express permission from my employer, and I am not an official representative. My views do not necessarily represent the views of my employer. There are (pretty much always) things brewing in the C2PA community that I am not at liberty to disclose, and that fact might be reflected in some of my responses. Similarly, my company is constantly in discussion with multiple other companies about how content credentials can be added to their workflows. I can’t comment on any unannounced partnerships nor on the specific proprietary technologies in use.

If no one is interested in such a discourse, I am fine with that and will fade into the background again. But if you’re curious (or even if you just want to pick a fight) I’m opening the proverbial floor for discussion here.


I think the general concern is how we use this tech without making our current dystopia worse.

It’s not that the tech is bad; it’s kind of cool, on a technical level, that there is a verifiable fingerprint in pictures taken from cameras.

When politics are involved and someone seeks to subvert its content, how do we go forward with less abuse of the tech, and who gets to be the arbiter of truthfulness?


Thank you for the comments. I appreciate the concerns and would like to try to address them, though there’s little information about specifically where those concerns stem from. So I’ll just try to talk, in general, about how the technology works, and see if maybe that lends direction to the discussion.

You mentioned that it is kind of cool that there is a verifiable fingerprint in the images. (Aside… I’ll move forward with the assumption that we’re talking about images, which is the most popular application right now, but the C2PA spec actually supports signing any set of digital data, from a simple text file to an entire file system of heterogeneous digital content.)

The term “fingerprint” is somewhat overloaded. The current specification is designed as a layered stack of content hashes, with the topmost layer being a digital signature bound to a private/public key pair (PKI). The public key is delivered in a certificate that provides the identity of the signer, and thus provides context for the trust decision that must be made by the consumer of the image. If I am understanding correctly, I believe that this certificate (and the means of obtaining it) is the root of the caution surrounding this technology. Please correct me if I am wrong.
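To make the hash-then-sign idea concrete, here is a minimal sketch in Python using the `cryptography` and `hashlib` libraries. The file name and manifest layout are invented for illustration; a real C2PA manifest is a CBOR structure holding many hashes, and the public key travels inside an X.509 certificate rather than being handed over directly.

```python
# Minimal hash-then-sign sketch; NOT the C2PA wire format.
import hashlib
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Signer side: a P-256 key pair. In practice the public half is
# delivered inside a certificate that identifies the signer.
private_key = ec.generate_private_key(ec.SECP256R1())
public_key = private_key.public_key()

image_bytes = open("photo.jpg", "rb").read()  # hypothetical file

# Lower layer: a hash of the content itself.
content_hash = hashlib.sha256(image_bytes).digest()

# Top layer: a signature over the structure holding that hash.
# A real manifest holds many hashes; one stands in for them here.
manifest = b"sketch-manifest:" + content_hash
signature = private_key.sign(manifest, ec.ECDSA(hashes.SHA256()))

# Consumer side: recompute the hash, rebuild the manifest, and
# verify. This raises InvalidSignature if anything was altered.
public_key.verify(signature, manifest, ec.ECDSA(hashes.SHA256()))
print("signature valid; content matches what was signed")
```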

The way the specification is currently worded, there is no guidance about how a certificate is obtained, or how the consumer goes about determining whether or not the certificate (and thus the signer) is trustworthy. It is open to a spectrum of possibilities including:

  • I know you personally and send you a self-signed certificate that you can add to your personal trust list so that you can verify any images signed by me in the future.
  • Company X creates a verification software package with a built-in trust list that includes certificates for many reputable media publishing companies, and issues updates to that software package periodically when new companies are added (a rough sketch of this option follows the list).
  • Some central organization or cooperative maintains a universally accepted trust list behind a public API that can be accessed by any compliant software.
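As a rough sketch of the second option above: the verifier ships with fingerprints of certificates it trusts and consults only that local list. Every name and fingerprint here is a placeholder; the first option (a personally exchanged self-signed certificate) would be the same mechanism with a user-managed list instead of a vendor-managed one.

```python
# Sketch of a local, built-in trust list; all entries are placeholders.
import hashlib

# Shipped with the verifier and updated periodically, much like a
# browser's bundled certificate-authority list.
TRUSTED_CERT_FINGERPRINTS = {
    "3f2a...",  # placeholder: a media publisher's signing cert
    "9b1c...",  # placeholder: a camera vendor's signing cert
}

def is_trusted(cert_der: bytes) -> bool:
    """True if this certificate's SHA-256 fingerprint is on the list."""
    return hashlib.sha256(cert_der).hexdigest() in TRUSTED_CERT_FINGERPRINTS
```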

I will say that the industry, in general, has been very vocal recently in criticizing this lack of clarity in the specification, and it’s possible that there will be some level of guidance in the future about how a central trust list might be maintained, similar to how major browsers today ship with a list of trusted TLS/SSL certificate authorities.

I think I’ve also heard concerns that user activity could be tracked by matching up images that were signed with the same certificate. I can’t speak to all implementations, but I can tell you that the projects I’m involved with work very hard to avoid this type of correlation by providing new certificates (with new key pairs) to each camera on a daily basis, and this cadence could be tailored for different workflows. Avoiding this type of tracking was crucial to (for example) Project Providence, which used the technology to allow the documentation of damage in Ukraine. Allowing the irrefutable documentation without tracking the individual was of utmost importance in that application.
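For the curious, here is what a daily rotation could look like in code. This is my own sketch, not any shipping implementation: a self-signed certificate stands in for one issued by a real CA, and the 24-hour lifetime is just the cadence described above.

```python
# Sketch: a fresh key pair and 24-hour certificate each day, so
# images from different days can't be correlated by signing key.
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def issue_daily_cert():
    key = ec.generate_private_key(ec.SECP256R1())
    # Deliberately generic subject: nothing identifying the user.
    name = x509.Name(
        [x509.NameAttribute(NameOID.COMMON_NAME, "anonymous capture device")]
    )
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)  # self-signed here; a CA would sign in practice
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=1))  # 24-hour lifetime
        .sign(key, hashes.SHA256())
    )
    return key, cert
```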

I also want to stress, in case it isn’t obvious, that the only “online” piece of this whole puzzle is the acquisition of a trusted certificate. Otherwise, the signing of an image and the verification of that signed image can both be done completely locally on client hardware, and don’t require any network connectivity, thus making those parts of the process inherently untraceable.

Hopping back to the term “fingerprint” for a bit, I mentioned that the current C2PA spec is, put simply, a set of hashes that are delivered along with the image. This is similar to how you can download a hash of a software package to verify that it hasn’t been tampered with. These hashes can very easily be removed from an image, if desired, and are often inadvertently removed simply by posting a picture to a social media site or sending it via text or email. While some might see this as a good thing, the general consensus is that this fragility is a liability of the technology and is preventing wide adoption.
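The software-package analogy in code, for anyone who hasn’t seen it: recompute the hash and compare it to the published value. The checksum and file name below are made up.

```python
# Verify a download against a published checksum (values made up).
import hashlib

published_sha256 = "0" * 64  # hypothetical value from the vendor's site
actual = hashlib.sha256(open("package.tar.gz", "rb").read()).hexdigest()
print("intact" if actual == published_sha256 else "hash mismatch")
```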

Two other technologies are being discussed in some circles that might provide for a stronger binding between the signature and the content. These are:

  • Fingerprinting - This tends to refer to using a perceptual hash to make a fuzzy, non-deterministic connection between an image and its content credentials, even when the signature has been (intentionally or inadvertently) stripped from the image, much like a reverse image search (see the sketch after this list).

  • Watermarking - This is a digital code that is (usually) imperceptibly embedded in an image and can be read by (typically) proprietary software. Watermarks have varying degrees of robustness to image modifications and can often be decoded even after various transforms have been applied. The data in the watermark could be used to retrieve content credentials if they have been misplaced.
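Here is a rough sketch of the fingerprinting idea, using the third-party Pillow and imagehash Python packages (nothing to do with the C2PA spec itself). Unlike a cryptographic hash, a perceptual hash changes only slightly under resizing or recompression, so a small distance between hashes suggests the same underlying image.

```python
# Perceptual-hash sketch; file names are hypothetical.
from PIL import Image
import imagehash

original = imagehash.phash(Image.open("original.jpg"))
candidate = imagehash.phash(Image.open("reposted.jpg"))

# Hamming distance between the two 64-bit hashes; a small distance
# suggests the candidate matches the original, so its stripped
# content credentials could be looked up and re-attached.
distance = original - candidate
print(f"perceptual distance: {distance}")
```

A lookup service mapping nearby hashes back to stored credentials is exactly the infrastructure (and privacy surface) discussed next.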

While these technologies are more robust than C2PA alone, they also require internet access and infrastructure, and as such may raise more privacy concerns. I want to stress, though, that nothing formal has been done to link these technologies to C2PA. There might be some guidance or a sibling specification in the future, but as of right now, there are only tech demos and discussions.

I apologize for the long, rambling response. Hopefully there is some useful information in here. Please feel free to poke holes in my arguments. A lot of people have spent a lot of time debating these very concepts behind closed doors, but just because we think we have a solution that works well doesn’t mean that it’s infallible, or that there isn’t a better option.
