Cache busting - What's the 'correct' way?

Here’s one for the web developers out there.

I’m working on putting the Devember project back up in its new+improved MediaWiki form among other new updates, but am back at the same ‘problem’ I had three months ago. I have a domain, winterstorm.com, at which I’ll be using both home.winterstorm.com and wiki.winterstorm.com (not up yet, currently parked) for the homepage and MediaWiki respectively. Got the DNS ready and working for this; the homepage/landing page is just a small static HTML site, hosted on Apache HTTP under Arch.

In Devember, I realized that when I went to work and tried to update my page remotely on Linode at the time, I wasn’t seeing any changes - that’s when I learned about all the layers of caching that I had to bust in order for that update to actually make its way from the server to me.

My solution at the time was to rename the root directory with the current date every time I made a change, so the landing page would move from www.winterstorm.com/20211231/index.html to www.winterstorm.com/20220101/index.html and so on. Yeah, it was a kludge, but I’m a hardware person primarily and it’s the best I could come up with at the time. :man_shrugging:

I tried every browser hack and every .htaccess item I could find, but nothing reliably did what I’m looking for.

TL;DR: What is the ‘correct’ way to break web caching and force a page which is regularly updated to fetch every time it’s refreshed/reloaded?

I don’t know the correct answer, but you may want to investigate TTL (no, not the transistor version :stuck_out_tongue: ). Time To Live indicates how long cached data stays valid; once it expires, the next page load should fetch a fresh copy. I think you can handle that server-side in a config somewhere. Syntax and exact location depend on what server engine you’re running, obviously.
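
To make the TTL idea concrete: in HTTP the lifetime is usually carried in the Cache-Control response header as max-age, in seconds. Here is a rough Python sketch of the freshness check a cache performs. The function names are made up for illustration, and it deliberately ignores other directives such as no-store and s-maxage.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True while a cached copy is younger than its max-age.

    Simplification (assumption): with no explicit max-age we treat the copy
    as stale, whereas real caches may fall back to heuristics.
    """
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return False
    return age_seconds < max_age
```

With max-age=0 nothing is ever fresh, so the cache has to go back to the server on every load.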

HTH!

For scripts and assets and stuff you can just add a query param. Something like:

https://domain.tld/path/to/image.png?v=1.01

If your application isn’t looking for it, then the param doesn’t do anything but bust the cache because the request is treated as something new that’s never been seen before. I’ve had to do that before when working with overly sticky CDNs which refuse to give up their cached copies of stuff any other way.
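
If bumping the number by hand gets old, the version can be derived from the file contents, so the URL only changes when the file does. This is a minimal Python sketch of that variation, not what the post above describes literally: with_version and the 8-character SHA-256 prefix are choices made up for the example.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url: str, content: bytes, param: str = "v") -> str:
    """Return url with ?v=<short content hash>, replacing any existing v param."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    parts = urlsplit(url)
    # Keep other query params, drop a stale version param if one is present.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, digest))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

In practice you would read the bytes from the asset on disk at build or deploy time and write the resulting URL into the HTML.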

If it’s “correct” or not really depends on what effect clearing a cache will have on the site. If it’s going to tank performance, then you probably shouldn’t do it. If not, then nobody cares.

There’s a whole other set of tricks for clearing caches while the site is under significant load mostly centered around avoiding the stampeding/thundering herd problem.

Advice taken, thanks.

I just accidentally ran into something that might come in handy too:

I’ve spent the morning throwing GoDaddy overboard and moving to Cloudflare for reasons unrelated to the original question, and in the course of setting everything back up I noticed the ‘Purge Cache’ button. It sounds a lot like it’ll take care of the CDN side of things, and I can handle the browser cache on my end.

At the office right now, so I’m away from the server and can’t really check how it works, will play around with it more tonight.

Nope, that’s the DNS cache, scratch that idea. Looks like the actual resource caching tools are only available on Cloudflare Pro plans.

Shift+refresh in the browser usually works well enough while you’re developing.

Basically you can set the Cache-Control header server side, and there are some additional nuances to that.
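
For a page that should be rechecked on every load, the usual value is Cache-Control: no-cache, which lets caches keep a copy but makes them revalidate it with the server before reusing it. On Apache that would be set in the server config or .htaccess; the sketch below just shows the header on a tiny Python static server standing in for it. NoCacheHandler and make_server are names invented for the example.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class NoCacheHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds Cache-Control: no-cache to every response."""

    def end_headers(self):
        # Added just before the header block is closed, so it applies to
        # files, directory listings, and error pages alike.
        self.send_header("Cache-Control", "no-cache")
        super().end_headers()


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a server for the current directory."""
    return ThreadingHTTPServer(("127.0.0.1", port), NoCacheHandler)
```

Calling make_server().serve_forever() from the site directory and looking at the response headers in the browser dev tools is a quick way to see the effect.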


Version skew is complicated; If-Modified-Since headers help with that.
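
Roughly how If-Modified-Since plays out on the server: the browser sends the date of the copy it already holds, and the server answers 304 Not Modified (no body) if the file hasn't changed since then, or 200 with the new content if it has. A simplified Python sketch of that decision, where response_status is a made-up helper and last_modified is assumed to be a timezone-aware datetime:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def response_status(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full response
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

Real servers like Apache already do this for static files; the sketch is only meant to show why a revalidation is cheap when nothing changed.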

That is actually not a good way to do this. Some cache servers don’t cache stuff when they see a query parameter, so this might actually have the opposite effect.

The best practice from what I know is still using a “fake” filename like https://domain.tld/path/to/image.12345.png and rewriting those to the version-less files via .htaccess or the corresponding nginx directive.
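
The mapping itself is just a pattern match that drops the numeric segment before the extension. Here it is sketched in Python rather than as an actual .htaccess or nginx rule; strip_version and the extension list are assumptions for illustration, so adjust them to whatever assets you actually version.

```python
import re

# name.<digits>.<ext> -> name.<ext>, only for a fixed set of asset types.
_VERSIONED = re.compile(
    r"^(?P<stem>.+)\.(?P<version>\d+)\.(?P<ext>css|js|png|jpe?g|gif|svg|webp)$"
)


def strip_version(path: str) -> str:
    """Map a versioned asset path to the real file; leave other paths alone."""
    match = _VERSIONED.match(path)
    if match is None:
        return path
    return f"{match['stem']}.{match['ext']}"
```

So /path/to/image.12345.png is served from /path/to/image.png, while the cache sees a brand-new URL each time the number changes.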

I’d also recommend html5boilerplate to anyone; they also have server configs for various servers. It’s essentially a collection of best practices from webdev veterans and various pretty-well-known figures.

Regarding the cache busting, I’d suggest reading through this:

Check the very bottom for the actual cache busting.

I agree it falls into the “quick and dirty” category, and you’ll get no end of fervor and vitriol from the SEO folks when talking about query params in general. They’ve just got no respect for the little guys. Except for UTM params. They need those. :grinning:

If you’ve got direct access to cache control headers or tools made specifically for management of these caches, then by all means use them. If not, then you’ve got to do the best you can, because nobody really cares why a stuck image is stuck. They just want it unstuck.