A set of crappy RSS scripts to handle RSS in a Unix way.

RSS tools

Following an old idea from 2007, published in my ancient blog post titled RSS Everything?, this set of tools is designed to work with RSS (Really Simple Syndication) in a manner consistent with the Unix philosophy.

The code committed in this repository was originally old Python code from 2007. It might break your PC, harm your cat, or cause the Flying Spaghetti Monster to lose a meatball.

As 2024 marks the resurgence of RSS and Atom [1], I decided to update my rudimentary RSS tools to make them contemporary.

Forks and pull requests are more than welcome. Be warned: this code was initially created for experimenting with RSS workflows.

Requirements

  • Python 3
  • Feedparser

Tools

rssfind

rssfind.py is a simple script designed to discover RSS or Atom feeds from a given URL.

It employs two techniques:

  • The first involves searching for direct link references to the feed within the HTML page.
  • The second uses a brute-force approach, trying a series of known paths for feeds to determine if they are valid RSS or Atom feeds.

The script returns an array in JSON format containing all the potential feeds it discovers.

Usage: Find RSS or Atom feeds from an URL
usage: rssfind.py [options]

Options:
  -h, --help            show this help message and exit
  -l LINK, --link=LINK  http link where to find one or more feed source(s)
  -d, --disable-strict  Include empty feeds in the list, default strict is
                        enabled
  -b, --brute-force     Search RSS/Atom feeds by brute-forcing url path
                        (useful if the page is missing a link entry)
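The two discovery techniques described above can be sketched with the standard library alone. This is a minimal illustration, not the code of rssfind.py: the names `find_feed_links`, `candidate_feed_urls` and the `CANDIDATE_PATHS` list are assumptions made for the example, and the sketch only builds the brute-force candidates without fetching or validating them.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in <link> elements.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# Hypothetical candidate paths; the list used by rssfind.py may differ.
CANDIDATE_PATHS = ["feed", "rss", "atom.xml", "rss.xml", "index.xml"]


class FeedLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate"> elements pointing to feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative references against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url):
    """Technique 1: direct link references found in the HTML page."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


def candidate_feed_urls(base_url):
    """Technique 2 (first half): build the URLs a brute-force pass would try."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path) for path in CANDIDATE_PATHS]


if __name__ == "__main__":
    page = '<html><head><link rel="alternate" type="application/rss+xml" href="/index.xml"></head></html>'
    print(json.dumps(find_feed_links(page, "https://example.org/blog/")))
```

In a complete brute-force pass, each candidate URL would still have to be fetched and parsed to check that it is a valid RSS or Atom feed.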

rsscluster

rsscluster.py is a simple script that clusters items from an RSS feed based on a specified time interval, expressed in days. The maxitem parameter defines the maximum number of items to keep after clustering. This script can be particularly useful for platforms like Mastodon, where a user might be very active in a single day and you want to cluster their activity into a single RSS item for a defined time slot.

rsscluster.py --interval 2 --maxitem 20 "http://paperbay.org/@a.rss" > adulau.xml
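To give an idea of what clustering by time interval can look like, here is a small sketch. It is one possible reading of the description above, not the algorithm of rsscluster.py: the function name `cluster_items`, the `(datetime, title)` item shape and the rule "a cluster starts at its newest item and absorbs everything less than `interval_days` older" are all assumptions.

```python
from datetime import datetime, timedelta


def cluster_items(items, interval_days=2, maxitem=20):
    """Group (published, title) pairs into clusters, newest first.

    A cluster starts at its newest item and absorbs every older item
    published less than interval_days before that starting point.
    """
    interval = timedelta(days=interval_days)
    clusters = []
    for published, title in sorted(items, reverse=True):
        if clusters and clusters[-1]["start"] - published < interval:
            clusters[-1]["titles"].append(title)
        else:
            clusters.append({"start": published, "titles": [title]})
    # Keep at most maxitem clusters, as each cluster becomes one RSS item.
    return clusters[:maxitem]


if __name__ == "__main__":
    posts = [
        (datetime(2024, 1, 10), "third post"),
        (datetime(2024, 1, 9), "second post"),
        (datetime(2024, 1, 5), "first post"),
    ]
    for cluster in cluster_items(posts, interval_days=2):
        print(cluster["start"].date(), cluster["titles"])
```

Each resulting cluster would then be written out as a single RSS item covering its time slot.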

rssmerge

rssmerge.py is a simple script designed to aggregate RSS feeds and merge them in reverse chronological order. It outputs the merged content in text, HTML, or Markdown format. This tool is useful for tracking recent events from various feeds and publishing them on your website.

python3 rssmerge.py --maxitem 30 --output markdown "http://api.flickr.com/services/feeds/photos_public.gne?id=31797858@N00&lang=en-us&format=atom"  "http://www.foo.be/cgi-bin/wiki.pl?action=journal&tile=AdulauMessyDesk" "http://paperbay.org/@a.rss" "http://infosec.exchange/@adulau.rss"
Usage: rssmerge.py [options] url

Options:
  -h, --help            show this help message and exit
  -m MAXITEM, --maxitem=MAXITEM
                        maximum item to list in the feed, default 200
  -s SUMMARYSIZE, --summarysize=SUMMARYSIZE
                        maximum size of the summary if a title is not present
  -o OUTPUT, --output=OUTPUT
                        output format (text, phtml, markdown), default text
python3 rssmerge.py --maxitem 5 --output markdown "http://api.flickr.com/services/feeds/photos_public.gne?id=31797858@N00&lang=en-us&format=atom"  "http://www.foo.be/cgi-bin/wiki.pl?action=journal&tile=AdulauMessyDesk" "http://paperbay.org/@a.rss" "http://infosec.exchange/@adulau.rss"

Sample output from rssmerge


- [harvesting society #street #streetphotography #paris #societ](https://paperbay.org/@a/111908018263388808)
- [harvesting society](https://www.flickr.com/photos/adulau/53520731553/)
- [late in the night#bynight #leica #streetphotography #street ](https://paperbay.org/@a/111907960149305774)
- [late in the night](https://www.flickr.com/photos/adulau/53520867709/)
- [geography of illusion#photography #art #photo #bleu #blue #a](https://paperbay.org/@a/111907911876620745)
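The merge step behind output like the sample above can be sketched as follows. This is a simplified illustration and not the code of rssmerge.py, which relies on Feedparser: here the entries are plain dictionaries with assumed keys (`title`, `summary`, `link`, `updated`), and the names `merge_entries` and `to_markdown` are made up for the example.

```python
from datetime import datetime


def merge_entries(feeds, maxitem=200):
    """Flatten several lists of entries and sort them newest first."""
    merged = [entry for feed in feeds for entry in feed]
    merged.sort(key=lambda entry: entry["updated"], reverse=True)
    return merged[:maxitem]


def to_markdown(entries, summarysize=60):
    """Render entries as a Markdown list of links.

    When an entry has no title, the first summarysize characters of
    its summary are used as the link text instead.
    """
    lines = []
    for entry in entries:
        label = entry.get("title") or entry.get("summary", "")[:summarysize]
        lines.append(f"- [{label}]({entry['link']})")
    return "\n".join(lines)


if __name__ == "__main__":
    photos = [{"title": "harvesting society",
               "link": "https://example.org/photos/1",
               "updated": datetime(2024, 2, 10, 9, 0)}]
    toots = [{"title": "",
              "summary": "harvesting society #street #streetphotography",
              "link": "https://example.org/toots/1",
              "updated": datetime(2024, 2, 10, 9, 5)}]
    print(to_markdown(merge_entries([photos, toots], maxitem=5)))
```

Sorting the flattened list once keeps the sketch short; for a large number of already sorted feeds, `heapq.merge` would be an alternative.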

rssdir

rssdir.py is a simple and straightforward script designed to convert any directory on the filesystem into an RSS feed.

rssdir.py --prefix https://www.foo.be/cours/ . >rss.xml
Usage: rssdir.py [options] directory

Options:
  -h, --help            show this help message and exit
  -p PREFIX, --prefix=PREFIX
                        http prefix to be used for each entry, default none
  -t TITLE, --title=TITLE
                        set a title to the rss feed, default using prefix
  -l LINK, --link=LINK  http link set, default is prefix and none if prefix
                        not set
  -m MAXITEM, --maxitem=MAXITEM
                        maximum item to list in the feed, default 32
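As a rough idea of how a directory can be turned into a feed, the sketch below builds an RSS 2.0 document with the standard library. It is not the code of rssdir.py: the function name `directory_to_rss`, the choice to list only regular files at the top level, and the ordering by modification time are assumptions for the example.

```python
import os
import xml.etree.ElementTree as ET
from email.utils import formatdate
from urllib.parse import quote


def directory_to_rss(directory, prefix="", title=None, maxitem=32):
    """Return an RSS 2.0 document listing the files of a directory.

    Files are ordered by modification time, newest first, and each
    entry links to prefix + file name.
    """
    with os.scandir(directory) as entries:
        files = [entry for entry in entries if entry.is_file()]
    files.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title or prefix
    ET.SubElement(channel, "link").text = prefix
    ET.SubElement(channel, "description").text = f"Files in {directory}"

    for entry in files[:maxitem]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry.name
        # Percent-encode the file name so spaces and the like stay valid in a URL.
        ET.SubElement(item, "link").text = prefix + quote(entry.name)
        # RSS 2.0 expects RFC 822 style dates.
        ET.SubElement(item, "pubDate").text = formatdate(entry.stat().st_mtime, usegmt=True)

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(directory_to_rss(".", prefix="https://www.foo.be/cours/"))
```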

rsscount

rsscount.py is a straightforward script that counts the number of items per day in an RSS feed. It is used to build the wiki creativity index. The script accepts any number of URL arguments, and its output can be fed to statistical tools.

python3 rsscount.py https://paperbay.org/@a.rss | sort
20240121	3
20240124	1
20240128	4
20240130	1
20240131	1
20240201	1
20240203	2
20240204	3
20240210	4
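The per-day counting shown above boils down to grouping publication dates by day. Here is a minimal sketch, assuming the dates have already been extracted from the feed as `datetime` objects; the names `count_per_day` and `format_counts` are illustrative and do not come from rsscount.py.

```python
from collections import Counter
from datetime import datetime


def count_per_day(dates):
    """Count how many items were published on each day (key: YYYYMMDD)."""
    return Counter(date.strftime("%Y%m%d") for date in dates)


def format_counts(counts):
    """Render counts as tab-separated 'day<TAB>count' lines, sorted by day."""
    return "\n".join(f"{day}\t{count}" for day, count in sorted(counts.items()))


if __name__ == "__main__":
    published = [
        datetime(2024, 1, 21, 9, 30),
        datetime(2024, 1, 21, 18, 0),
        datetime(2024, 1, 24, 7, 15),
    ]
    print(format_counts(count_per_day(published)))
```

One line per day with a tab separator keeps the output easy to pipe into tools such as sort, awk or gnuplot.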

License

rss-tools are open source/free software licensed under the permissive 2-clause BSD license.

Copyright 2007-2024 Alexandre Dulaunoy

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


  1. As web platforms continue to deteriorate in quality, and with the diminishing visibility across various pseudo-social networks coupled with the decline of RSS culture, the emergence of new open-source, federated networks using ActivityPub (an advanced RSS format) seems particularly timely. I believe that reviving open-source tools developed in 2007 for handling RSS is increasingly relevant. Many of these new federated platforms are revitalizing RSS, which is a trend that deserves encouragement and support.