Instead of trusting people you could place trust in something else.
You’ve no doubt seen movies where a kidnapped person is placed next to a television showing a live broadcast, and both are filmed as proof that the person was still alive when the broadcast went to air. An even older version is a photograph of the person holding up a newspaper. In the latter case, the date on the masthead provides the trust mechanism; in the former, it’s the live broadcast. Both are outside the kidnapper’s control, so both can be trusted by whoever receives the ransom demand.
Depending on how important the file is, and how frequently you need to generate such files, a similar approach could be used. Instead of placing trust in people, place it in space and time.
Take a file and hash it. Print the hash onto a transparency sheet and stick it to the window of your car. Drive through public streets (e.g. areas visited by tourists), past a public event (e.g. a New Year’s Eve fireworks display) or past a large-scale construction project (e.g. a bridge). From inside the car, take a few minutes of video of the scene beyond, filmed through the transparency sheet so the hash is in frame. Bonus points if a laptop is reading the hash aloud via text-to-speech at the same time. Bundle the video, the hash and the file into an archive and store or send that.
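The hashing and bundling steps could be sketched as follows. This is only an illustration under assumptions not made above: SHA-256 as the hash, a zip file as the archive, and function names (`sha256_hex`, `printable`, `bundle`) that are made up for the example.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_hex(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file as a hex string, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def printable(hex_digest, group=4):
    """Split the digest into short groups so it is easier to read on a printed sheet."""
    return " ".join(hex_digest[i:i + group] for i in range(0, len(hex_digest), group))


def bundle(file_path, video_path, archive_path):
    """Store the file, the video and a text file holding the hash in one zip archive."""
    digest = sha256_hex(file_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(file_path, arcname=Path(file_path).name)
        zf.write(video_path, arcname=Path(video_path).name)
        zf.writestr("hash.txt", digest + "\n")  # assumed name for the hash entry
    return digest
```

You would print `printable(sha256_hex("contract.pdf"))` onto the transparency before the drive, then call `bundle("contract.pdf", "drive.mp4", "proof.zip")` once the video exists. The file names here are placeholders.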
Public places/events/constructions are unique in time and space. You cannot “go back in time” and overlay a forgery of a physical/audible hash onto live video of such public spaces without the fake being obvious to even an amateur observer.
The reason is simple: each character of the hash obscures part of the public space beyond it. The letter G could completely hide a person, for example, or half a car, a pram, or even a distant skyscraper. Substituting a different character or hash means you would have to “make up” all of the previously obscured scenery that is now revealed, in every frame. That is not feasible without leaving artefacts that an average human eye would quickly and easily detect.
Current state-of-the-art algorithms can interpolate frames and dynamically predict and reconstruct backgrounds, but only if the occluded area is relatively small and geometrically simple, and the frame deltas are minimal (the scenery isn’t moving very fast). A high-resolution camera filming at a low frame rate (say 1 frame per second) from a moving vehicle easily undermines the interpolation process and makes artefacts inevitable and obvious.
So, not exactly ‘clickety-click’ easy, but doable… and you don’t need to trust any humans. In essence, all you are doing is simultaneously recording the hash and a unique point in space and time, one that is beyond your control and cannot be recreated. That preserves, validates and dates the hash. The rest is easy.
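For whoever receives the archive, “the rest” might look like the check below: recompute the hash of the archived file and compare it with the value read off the video by eye or by ear. It is a minimal sketch that assumes a zip archive and SHA-256, neither of which is prescribed above, and `verify_bundle` is a hypothetical name.

```python
import hashlib
import zipfile


def verify_bundle(archive_path, file_name, hash_seen_in_video):
    """Return True if the archived file hashes to the value transcribed from the video."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(archive_path) as zf:
        with zf.open(file_name) as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
    # A hand-transcribed hash may contain spaces or upper-case letters; normalise it.
    transcribed = "".join(hash_seen_in_video.split()).lower()
    return digest.hexdigest() == transcribed
```

A match only shows that the file corresponds to the hash on the transparency. Judging whether the footage itself is genuine remains a job for a human viewer.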