Why do RSS 'readers' exist?

tl;dr: I can’t find a single RSS client designed to download and archive full webpages, and I don’t understand why.

I use RSS. I love RSS. I love the very idea of RSS. I get mad every time I see a “subscribe to my email newsletter” signup on a website instead of an RSS feed.

I love RSS, but I must be thinking about RSS in an entirely different way from most people, because I don’t see how the (apparently) main use case (other than podcasting) is even a use case at all. Maybe someone here can help me understand (and maybe solve my problem!).

RSS is for receiving new things from a given source. It isn’t just that, though. It exists to handle that problem in a particular way. It allows sources to be aggregated in a unified and centralized way - that is, you can parse, reorganize, and combine feeds.

That’s really cool. It lets us do things like listen to the latest of several different podcasts in one place.
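
To make that concrete, a toy sketch: a few lines of Python (assuming the third-party feedparser library; the feed URLs are placeholders) are enough to merge several feeds into one newest-first list.

    # Sketch: combine several feeds into one chronological list.
    # Assumes: pip install feedparser. The feed URLs are placeholders.
    import feedparser

    FEEDS = [
        "https://example.com/podcast-a/feed.xml",
        "https://example.com/podcast-b/feed.xml",
    ]

    entries = []
    for url in FEEDS:
        entries.extend(feedparser.parse(url).entries)

    # Newest first; entries without a parsed date sort last.
    entries.sort(key=lambda e: e.get("published_parsed") or (0,), reverse=True)

    for e in entries[:10]:
        print(e.get("published", "?"), "-", e.get("title", "(untitled)"))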

The parsability of RSS feeds is obviously their core feature. What confuses me is what that parsability is being used for. With all the wonder of RSS, we only seem to get RSS readers. (Note: I realize that in the context of RSS, “reader” is generally a synonym for “client.” I hope it will be clear what I mean from the rest of the post.)

What is the point of RSS readers? I don’t mean all types of RSS clients. There are some good applications that just exist to parse new links from feeds (and the data about those links), and open those links in whatever browser you want. I understand that - that’s just the core of what RSS is for.

What I don’t understand is the purpose of the “full” RSS readers. They seem generally to provide lots of extra little convenience features, without providing the one absolutely foundational thing that any halfway competent developer would implement first - that is, if they were designing for any of the use cases that make sense to me.

What RSS readers don’t do is what podcast clients (often) do. Many podcast clients include settings to automatically download the audio content of each episode. These are the only clients that I can see any point to.

If I’m sitting at a desktop (or even laptop) PC with a stable, high-speed internet connection, I can go to the webpage for whatever feed content I’m interested in and download or stream the podcast right then. The only reason to use a dedicated podcast app (for more than feed tracking) is to always keep the latest podcasts already downloaded to my device, “ready” for cases where I can’t or don’t want to download them when and where I am listening to them.

Lots of podcast client apps seem to share my intuition here. However, I have searched off and on for years to find a general purpose RSS client that is designed to account for this use case. I can’t find a single RSS client designed to download and archive full webpages. I don’t understand why. I don’t understand why anyone would even sit down to write an RSS client app, and find themselves working on features like “favorites” for posts, built-in browsers or audio playback, etc., without first accounting for full HTML downloading of new posts’ pages. It could just be integration with wget or something - but I haven’t been able to find any program that does this.

I must be missing something.

Does anyone know of a general purpose RSS client that supports automated full-file/webpage downloading?

Does anyone understand why this isn’t what people are always trying to do with RSS?

A lot of RSS feeds only include snippets. It takes another webpage parser to do the job of downloading the page. Pinboard may be the only service I know that will archive saved pages, but even that isn’t RSS specific.

2 Likes

The absolute — Number 1 — reason that everyone I know uses RSS is to be able to consume the actual information content without having to wade through all of the other distracting crap that you get on (modern) web pages themselves.

A typical news article is ~5 kB of actual textual information content. The average web page is ~4000 kB. That means roughly 3995/4000 ≈ 99.9% of everything you download is pointless and irrelevant garbage.

Why on Earth would anyone possibly want to waste 99.9% of their bandwidth downloading full pages? It makes no sense at all (to me, at least).

RSS is an incredibly effective bandwidth-saver, bullshit filter, signal-to-noise-ratio amplifier, anti-tracking/privacy/security tool, etc., etc. — all while making news easier and more enjoyable to digest. You get all of that goodness precisely because readers/clients do not download full pages. That’s the whole point of Really Simple Syndication — it’s really simple. Simple is good. Simple works.

If you want Really Complex Syndication, with inbuilt Bandwidth Wasting, Tracking, and Gaping Security Holes, then you want an RCSBWTGSH reader, not an RSS reader.

Good luck with your search.

12 Likes

It was really relevant back when dial-up was common - websites didn’t have all the extra crap and ads back then.

4 Likes

I’d like this twice if I could

2 Likes

Because that’s not what a “client” should do. A client should be lightweight, simple and fast. To convert RSS feed snippets into full content, you want a server or a service. Yahoo Pipes was very popular (but complex) until it closed down.

Other long-running services:
http://feedenlarger.com/
https://www.fivefilters.org/full-text-rss/
http://fullcontentrss.com/
https://www.freefullrss.com/

To do it yourself, you can install something like TT-RSS on Linux, or write your own with a tiny bit of Python knowledge.
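
A minimal sketch of that DIY route, assuming the third-party feedparser, requests, and readability-lxml libraries (the feed URL is a placeholder): read a snippet-only feed, fetch each linked page, and extract the full article.

    # Sketch: expand a snippet-only feed into full article text.
    # Assumes: pip install feedparser requests readability-lxml
    # The feed URL below is a placeholder, not a real endpoint.
    import feedparser
    import requests
    from readability import Document

    feed = feedparser.parse("https://example.com/snippets/feed.xml")

    for entry in feed.entries:
        resp = requests.get(entry.link, timeout=30)
        resp.raise_for_status()
        doc = Document(resp.text)  # strips nav/ads, keeps the main article
        print(doc.title())
        print(doc.summary())       # cleaned-up article HTML, ready to store

The hosted services listed above do essentially this, just run for you on a server.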

These days, a web page isn’t one file that can be downloaded… It’s loads of 3rd party javascript, style sheets, web socket requests, etc. It would be insane to shovel all the required complexity into a little client application. You might as well just use an RSS add-on in Firefox/Chrome on your phone.

4 Likes

The RSS readers are doing their job in the way RSS was designed. It was designed to contain the entire article so that you could download the feed and get the entire article in the application and formatting of your choice.

But that is not how developers do RSS feeds these days. Most (especially commercial) websites provide a feed that only contains the headlines or, with a bit of luck, a text snippet, and then only link to the full article instead of including the full article in the feed. The reasons are already listed above: mostly advertising. With the full article in the feed, ad revenue is obviously going away - you are essentially giving your article away for free (whereas with the website link, the traffic to the website at least has a chance of generating ad revenue).
Some websites might have an optional paid tier that includes an RSS feed with the full article, but not every website has or does that.

In the end it comes down to the developers not implementing the full articles. It is neither the RSS standard’s nor the RSS reader’s fault. Downloading the entire website is simply out of scope for a typical RSS reader application, because as stated above it is just too complex for a simple reader. It might be frustrating, but that’s just how it is.

As for your comparison with Podcasts:
For a podcast the “full” content of the feed is the actual audio file, so it is already included in the feed. This is opposed to most website RSS feeds where the “full” content is not included, as described above. That is where the disconnect between the two lies.
In other words: Not including the full article in the feed is the equivalent of not including the link to the audio file in a podcast feed.

2 Likes

Lots of podcast client apps seem to share my intuition here. However, I have searched off and on for years to find a general purpose RSS client that is designed to account for this use case. I can’t find a single RSS client designed to download and archive full webpages. I don’t understand why. I don’t understand why anyone would even sit down to write an RSS client app, and find themselves working on features like “favorites” for posts, built-in browsers or audio playback, etc., without first accounting for full HTML downloading of new posts’ pages. It could just be integration with wget or something - but I haven’t been able to find any program that does this.

I must be missing something.

The point of an RSS client is to get the raw data (the article) and display it to you in a nice format, absent the usual distractions like ads or navigation features, etc., which come with a web page. An RSS feed doesn’t even include the full page you are looking for. It is literally the text of the article in XML form.

It’s not any different than a podcasting client; you are just misunderstanding the form of the data the client is dealing with. It’s not receiving a full html page in the first place (which would be wasteful).

Why not just pin a website as a tab in your browser and archive the pages you want yourself? Or maybe get a web scraping tool that can watch a link and archive pages for you?

Rebooting this post because I’m looking at the same problem again.

Of course, I got a lot of “This is dumb, why are you even interested in this?” responses before.

Responding in general terms:

  1. Sure, it would be nice to filter out the extra ads and crap that comes with a modern webpage. It’s generally more important to have everything that you want than to not have anything that you don’t want, however.

  2. As has been pointed out, contemporary RSS feeds often only include snippets of articles. This is the other thing I hate about modern RSS, but I have a hackish solution for it. (Tools exist to follow those links and turn the main articles into a feed.)

  3. Yes, I know this is an extremely resource inefficient way of reading things. This is not about the best way of reading things. This is about archiving things.

  4. To be clear, I want to download the full target of whatever the feed points to. If that is a stripped down version of an article - great! Many sites do not provide properly cleaned up “usable” RSS feeds.

If you want to do that, pipe the link the RSS points to into wget to download the entire page including assets. I’m sure there are tutorials out there that do that.

If you want to go further and only want the text, you can also put this through lynx -dump.
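
A rough sketch of that pipeline, assuming feedparser is installed and wget and lynx are on your PATH (the feed URL and archive directory are placeholders):

    # Sketch: archive every link in a feed with wget, plus a text dump via lynx.
    # Assumes: pip install feedparser; wget and lynx available on PATH.
    import subprocess
    import feedparser

    feed = feedparser.parse("https://example.com/feed.xml")  # placeholder URL

    for entry in feed.entries:
        # -p: page requisites (images, CSS), -k: rewrite links for local
        # viewing, -E: add .html extensions, -P: output directory.
        subprocess.run(["wget", "-p", "-k", "-E", "-P", "rss-archive", entry.link])
        # Plain-text rendering of the same page.
        dump = subprocess.run(["lynx", "-dump", entry.link],
                              capture_output=True, text=True)
        print(dump.stdout[:200])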

1 Like

wget knows how to grab images, style sheets, etc., but it doesn’t speak javascript, so it doesn’t work on a great many web pages these days.

If my previous list of RSS full article extractors wasn’t adequate, here’s some more:

https://github.com/AboutRSS/ALL-about-RSS#full-article-extractors

1 Like

Good point, depends on the websites I guess

I do a similar thing to what you’re wanting with the best stuff out of my RSS service.

Here’s what happens. I browse through FreshRSS (hosted on my home server), I find things that are interesting and read them. If I want to save one, I open the full webpage and copy its link. Then I open Shiori (hosted on the same server) and I add a bookmark to that link, with appropriate tags, and if I’m concerned that the content might change (it’s political, or maybe I’m dimly aware of that site having poor previous migrations, or I’m just paranoid), I can archive it by ticking a box.

Is it a long process? Sure. Is it annoying? No. I don’t read 80% of the stuff that comes up in my FreshRSS, and I don’t care - attention is limited. Of that 20%, I maybe use Shiori to save or bookmark another 20%, or 4% of the total. So it’s not common enough to be bothered by.

RSS doesn’t necessarily present the whole webpage. Most feeds present just snippets of the page, in order to keep the RSS feed small (kinda like news titles). You then go visit the webpage and view the full article / site. Some people and software will fill the RSS feed with the whole website (except embedded stuff like videos and images - usually those are presented as simple links in the feed), to make it easier for people, so they don’t have to visit the whole website. It’s a design philosophy: some people want readers to come and open the site, to generate ad revenue or present “the whole experience,” while others just want the information to be transmitted, or to make things easier for their readers.

I believe newsboat can be scripted to download the whole webpage using wget / curl, similar to how you can use newsboat + youtube-dl to pre-download videos from youtube.
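
For example, a newsboat macro can temporarily swap the browser command for wget - a sketch along these lines (the wget flags, archive directory, and firefox fallback are my assumptions, untested):

    # In ~/.newsboat/config: press ',' then 'w' on an item to archive the
    # linked page with wget, then restore the normal browser.
    macro w set browser "wget -p -k -E -P ~/rss-archive %u" ; open-in-browser ; set browser "firefox %u"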

I feel you. Technically speaking, email subscriptions are “superior” insofar as you don’t have to constantly poll a resource every n minutes / hours to see if there’s anything new, you just receive an email every time something new comes up. What I don’t like about email is the privacy implication (and all the mail ads that are so inherently tied to email subscriptions, just because some people sell those lists, while others don’t know how to secure them and they get leaked).

RSS is just a protocol. What gets presented in it is entirely up to the administrator / maintainer / publisher etc.

This is just additional coding; RSS has nothing to do with it, other than grabbing information from a source or multiple sources.

It could probably be because not many people know about RSS. I mentioned newsboat - it’s a CLI RSS client, and being text-based, you can easily parse its output into other programs that can do the web download for you.

However, web 2.0 is absolute crap. Even if you download the webpage, you most likely only get an empty template, or at most a white page and some links; when you proceed to open the saved page, you still have to go to the website, due to all the javascript nonsense the “modern” web uses. So you can’t archive websites these days without your client also knowing how to process JS, like you could with basic HTML+CSS websites.


I realized this is an old post before I started bashing the keyboard. I see some of the info I wrote was already answered, but I took time to write so I will post anyway (and I believe I did at least a fair job at being somewhat neutral for the most part).

1 Like

Simplicity, TBH. Instead of scrolling through 8 different websites, wading through fluff/uninteresting crap/ads, I get a chronological list of the news with headlines and blurbs to read (and the option to open the full article).

So, if I’ve not read the news in a few days - say, for example, over a long weekend - I can buzz through the feed list in my RSS reader in about 15 minutes, picking out the interesting or good bits, whereas browsing the individual sites would easily eat up an hour and much more data (not that it matters, I’m on an uncapped network 95% of the time).

For me it’s another one of those tweaks that frees up time and brain cycles.

I can’t remember the name of the reader… but at one point there was an RSS reader on Android I used that would actually grab a (simplified) copy of the page for offline reading; it even had options for grabbing embedded videos and pictures.

4 Likes