Update README and flag descriptions

Gabriel Garrido 2024-05-19 18:09:28 +02:00
parent b5ead62468
commit bafed4ca9b
2 changed files with 173 additions and 21 deletions

184
README.md

@@ -20,17 +20,17 @@ Usage of mastodon-markdown-archive:
-dist string
Path to directory where files will be written (default "./posts")
-exclude-reblogs
Whether or not to exclude reblogs
Exclude reblogs
-exclude-replies
Whether or not exclude replies to other users
Exclude replies to other users
-filename string
Template for post filename
-limit int
Maximum number of posts to fetch (default 40)
-max-id string
Fetch posts lesser than this id
Fetch posts older than this id
-min-id string
Fetch posts immediately newer than this id
Fetch posts newer than this id
-persist-first string
Location to persist the post id of the first post returned
-persist-last string
@@ -42,20 +42,20 @@ Usage of mastodon-markdown-archive:
-template string
Template to use for post rendering, if passed
-threaded
Thread replies for a post in a single file (default true)
Thread replies for a post in a single file
-user string
URL of User's Mastodon account whose toots will be fetched
URL of Mastodon account whose toots will be fetched
```
## Example
Here is how I use this to archive posts from my Mastodon account.
I use this tool programatically, and I certainly do not want to recreate the archive from scratch each time. I exclude replies to others, and reblogs.
I use this tool programmatically, and I do not want to recreate the archive from scratch each time. I exclude reblogs and replies to others.
I first use this to generate an archive up to a certain point in time. Then, I use it to archive posts made since the last archived post.
I first used this to generate an archive of all the posts that I had published to date. Now I run it programmatically to archive any new posts.
Mastodon imposes an upper limit of 40 posts in their API. With `--persist-first` and `--persist-last` I can save cursors of the upper and lower bound of posts that were fetched. I can then use Mastodon's `max-id`, `min-id`, and `since-id` parameters to get the posts that I need, depending on each cae.
Mastodon imposes a maximum limit of 40 posts in this API. With `--persist-first` and `--persist-last` I can save cursors of the upper and lower bound of posts that were fetched. I then use the API's `max-id`, `min-id`, and `since-id` parameters to get the posts that I need, depending on each case.
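As a rough illustration of how these cursors map onto a request, here is a minimal Go sketch that builds a URL for Mastodon's account statuses endpoint (`/api/v1/accounts/:id/statuses`). The `statusesURL` helper and the account id `12345` are hypothetical and exist only for this example; the tool's own client may build its requests differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// statusesURL is a hypothetical helper that builds a request URL for
// Mastodon's account statuses endpoint. Empty cursors are left out of the
// query string, so the server falls back to returning the most recent posts.
func statusesURL(instance, accountID string, limit int, maxID, minID, sinceID string) string {
	q := url.Values{}
	q.Set("limit", strconv.Itoa(limit))
	if maxID != "" {
		q.Set("max_id", maxID) // only posts older than this id
	}
	if minID != "" {
		q.Set("min_id", minID) // posts immediately newer than this id
	}
	if sinceID != "" {
		q.Set("since_id", sinceID) // posts newer than this id
	}
	return fmt.Sprintf("%s/api/v1/accounts/%s/statuses?%s", instance, accountID, q.Encode())
}

func main() {
	// "12345" is a made-up account id used only for illustration.
	fmt.Println(statusesURL("https://social.coop", "12345", 40, "112326240503555949", "", ""))
}
```

Leaving every cursor empty corresponds to the first run described in the sections below, where no `./first` or `./last` file exists yet.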
### Generating an entire archive
@@ -69,15 +69,40 @@ mastodon-markdown-archive \
--max-id=$(test -f ./last && cat ./last || echo "")
```
Calling this for the first time will fetch the most recent 40 posts. With `--persist-last`, the 40th post's id will be saved at `./last`.
Calling this for the first time will fetch the most recent 40 posts. With `--persist-last=./last`, the id of the oldest fetched post will be saved at `./last`.
Calling this command iteratively will fetch the account's posts in reverse chronological time, 40 posts at a time. If my account had 160 posts, I'd need to call this command 4 times to create the archive.
Calling this command iteratively will fetch the account's posts in reverse chronological order, 40 posts at a time.
You can use a simple bash script to automate this process. Adding the `--porcelain` flag prints the number of fetched posts to stdout, which can then be used to decide whether to continue or stop fetching posts:
```bash
#!/bin/bash
while true; do
command="mastodon-markdown-archive --dist=./example \
--exclude-replies=true \
--exclude-reblogs=true \
--user=https://social.coop/@ggpsv \
--porcelain=true \
--threaded=true \
--persist-last=./last \
--max-id=$(test -f ./last && cat ./last || echo '')"
output=$($command)
if [[ "$output" -eq 0 ]]; then
echo "No posts returned. Exiting"
break
fi
echo "Fetched $output posts. Continuing."
sleep 1
done
```
### Getting the latest posts
Calling this for the first time will fetch the most recent 40 posts. With `--persist-first`, the most recent post's id will be saved at `./first`.
With `--persist-first=./first`, the id of the most recent post will be saved at `./first`.
Calling this command iteratively will only fetch posts that have been made since the last retrieved post.
Calling this command iteratively will only fetch posts that have been made since the last retrieved post:
```sh
mastodon-markdown-archive \
@@ -89,9 +114,136 @@ mastodon-markdown-archive \
--since-id=$(test -f ./first && cat ./first || echo "")
```
## Template
## Threading
By default, this tool uses the [post.tmpl](./files/templates/post.tmpl) template to create the markdown file. A different template can be used by passing its path to `--template`.
By default, posts by the author in reply to another post by the author will be written out as separate files.
For information about variables and functions available in the template context, refer to the `Write` method in [files.go](files/files.go#L95-L101).
However, posts can be threaded together using the `--threaded=true` flag. With threading, the descendants of a post will not be written out as separate files. Instead, only the top post will be written out.
The program will aggregate the post's descendants in reverse chronological order and make them available in the template via the [Descendants](https://pkg.go.dev/git.garrido.io/gabriel/mastodon-markdown-archive/client#Post.Descendants) method. This can be used in [templates](#templating) to render threaded posts as a single post, which the default template does.
When threading, the `AllMedia` and `AllTags` methods will yield the aggregated [MediaAttachment](https://pkg.go.dev/git.garrido.io/gabriel/mastodon-markdown-archive/client#MediaAttachment) and [Tag](https://pkg.go.dev/git.garrido.io/gabriel/mastodon-markdown-archive/client#Tag), respectively.
### Orphaned posts
Mastodon limits its statuses API to a maximum of 40 posts at a time, and the `--limit` flag can be used to limit this further.
Because of this limit, it is possible that posts in a thread end up split across different responses. Or, a user may maintain a long-lived thread of posts that gets updated sporadically. This results in an orphaned post, which is a post whose parent is not within the same batch of posts returned by a single API call.
In either case, the program will fall back to using the [status context](https://docs.joinmastodon.org/methods/statuses/#context) endpoint to rebuild the corresponding thread from the top.
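To make the definition of an orphaned post concrete, here is a minimal Go sketch that finds posts whose parent is missing from a batch. The `Post` type and the `orphans` function are hypothetical stand-ins written for this example; they are not the tool's actual types or detection logic.

```go
package main

import "fmt"

// Post is a hypothetical, trimmed-down stand-in for the tool's post type.
type Post struct {
	ID          string
	InReplyToID string // empty for top-level posts
}

// orphans returns the posts in batch that reply to a post which is not
// part of the same batch.
func orphans(batch []Post) []Post {
	seen := make(map[string]bool, len(batch))
	for _, p := range batch {
		seen[p.ID] = true
	}
	var out []Post
	for _, p := range batch {
		if p.InReplyToID != "" && !seen[p.InReplyToID] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	batch := []Post{
		{ID: "3", InReplyToID: "2"},
		{ID: "2", InReplyToID: "1"}, // parent "1" is not in this batch
		{ID: "4"},
	}
	for _, p := range orphans(batch) {
		fmt.Println("orphan:", p.ID, "parent:", p.InReplyToID)
	}
}
```

In this sketch only post `2` is reported, since its parent `1` was not returned in the same batch. Post `3` is not an orphan because its parent `2` is present.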
## Templating
The contents of the file and the filename for each post can be customized using templates. This provides enough flexibility to use this tool for various purposes. The templates are evaluated as Go [text templates](https://pkg.go.dev/text/template), so it should be possible to do anything that's supported in a Go template.
For example, if you're using this to syndicate posts to a site built using a static site generator, you can customize the output so that it adheres to specific requirements around front-matter structure or filename formats.
### Post
Out of the box, this tool uses the [post.tmpl](./files/templates/post.tmpl) template to create the post file. It converts the post content to markdown, threads replies, and defines some attributes in the front-matter using YAML.
For example, for this [post](https://social.coop/@ggpsv/112326240503555949):
```md
---
date: 2024-04-24 12:40:10.029 +0000 UTC
post_uri: https://social.coop/users/ggpsv/statuses/112326240503555949
post_id: 112326240503555949
tags:
- FrameworkLaptop
- fedora
---
Back at dual-booting on the [#FrameworkLaptop](https://social.coop/tags/FrameworkLaptop). Last time it was Ubuntu, but now I have gone with [#Fedora](https://social.coop/tags/Fedora) 40 KDE.
I'm impressed with how things just work with this laptop. Major props to the [@frameworkcomputer](https://fosstodon.org/@frameworkcomputer) team for supporting these distros out of the box.
I simply decrypted my drive, shrunk it, created a partition, booted off a USB key, installed Fedora, encrypted both partitions, and that's it.
Also, KDE Plasma 6 looks incredibly crisp on this screen.
```
A different template can be used by passing its path to `--template`. The template must comply with Go template syntax.
For example, a `jekyll.tmpl` template with customized front-matter:
```
---
layout: post
title: {{ substr 0 5 .Post.Id }}
published: true
---
{{ .Post.Content | toMarkdown }}
```
Passing it to the command as `--template=./jekyll.tmpl` will yield a file that looks like this:
```md
---
layout: post
title: 11232
published: true
---
Back at dual-booting on the [#FrameworkLaptop](https://social.coop/tags/FrameworkLaptop). Last time it was Ubuntu, but now I have gone with [#Fedora](https://social.coop/tags/Fedora) 40 KDE.
I'm impressed with how things just work with this laptop. Major props to the [@frameworkcomputer](https://fosstodon.org/@frameworkcomputer) team for supporting these distros out of the box.
I simply decrypted my drive, shrunk it, created a partition, booted off a USB key, installed Fedora, encrypted both partitions, and that's it.
Also, KDE Plasma 6 looks incredibly crisp on this screen.
```
You might even want to use HTML as the output and thus have a `html.tmpl` file:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{ .Post.Id }}</title>
</head>
<body>
{{.Post.Content}}
</body>
</html>
```
Passing it to the command as `--template=./html.tmpl` will yield a file that looks like this:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>112326240503555949</title>
</head>
<body>
<p>Back at dual-booting on the <a href="https://social.coop/tags/FrameworkLaptop" class="mention hashtag" rel="tag">#<span>FrameworkLaptop</span></a>. Last time it was Ubuntu, but now I have gone with <a href="https://social.coop/tags/Fedora" class="mention hashtag" rel="tag">#<span>Fedora</span></a> 40 KDE.</p><p>I&#39;m impressed with how things just work with this laptop. Major props to the <span class="h-card" translate="no"><a href="https://fosstodon.org/@frameworkcomputer" class="u-url mention">@<span>frameworkcomputer</span></a></span> team for supporting these distros out of the box.</p><p>I simply decrypted my drive, shrunk it, created a partition, booted off a USB key, installed Fedora, encrypted both partitions, and that&#39;s it. </p><p>Also, KDE Plasma 6 looks incredibly crisp on this screen.</p>
</body>
</html>
```
### Filename
Out of the box, this tool uses `<post id>.md` as the post filename format. For example, this [post](https://social.coop/@ggpsv/112326240503555949) is saved as `112326240503555949.md`.
A different format for the filename can be used by passing a template string to `--filename`. The string must comply with Go template syntax.
For example, to create post files that are prefixed with the post's creation date in `YYYY-MM-DD` format and suffixed with the post id, pass `--filename='{{.Post.CreatedAt | date "2006-01-02"}}-{{.Post.Id}}.md'`.
An extension suffixed to the filename template will be used if present. Otherwise, `.md` is used as the default file extension.
Following the HTML example in the [post template section](#post) above, you can pass `--filename='{{.Post.Id}}.html'` to use HTML as the extension.
### Available functions and variables
For both the post and filename templates, the following functions and variables are available:
#### Functions
* Standard Go template functions
* All [Sprig](https://masterminds.github.io/sprig/) functions
* `toMarkdown` to convert the post's HTML content to Markdown, without escaping any markdown syntax
* `toMarkdownEscaped` to convert the post's HTML content to Markdown, escaping any markdown syntax
#### Variables
* [Post](https://pkg.go.dev/git.garrido.io/gabriel/mastodon-markdown-archive/client#Post)

10
main.go

@@ -13,13 +13,13 @@ import (
func main() {
dist := flag.String("dist", "./posts", "Path to directory where files will be written")
user := flag.String("user", "", "URL of User's Mastodon account whose toots will be fetched")
excludeReplies := flag.Bool("exclude-replies", false, "Whether or not exclude replies to other users")
excludeReblogs := flag.Bool("exclude-reblogs", false, "Whether or not to exclude reblogs")
user := flag.String("user", "", "URL of Mastodon account whose toots will be fetched")
excludeReplies := flag.Bool("exclude-replies", false, "Exclude replies to other users")
excludeReblogs := flag.Bool("exclude-reblogs", false, "Exclude reblogs")
limit := flag.Int("limit", 40, "Maximum number of posts to fetch")
sinceId := flag.String("since-id", "", "Fetch posts greater than this id")
maxId := flag.String("max-id", "", "Fetch posts lesser than this id")
minId := flag.String("min-id", "", "Fetch posts immediately newer than this id")
maxId := flag.String("max-id", "", "Fetch posts older than this id")
minId := flag.String("min-id", "", "Fetch posts newer than this id")
persistFirst := flag.String("persist-first", "", "Location to persist the post id of the first post returned")
persistLast := flag.String("persist-last", "", "Location to persist the post id of the last post returned")
templateFile := flag.String("template", "", "Template to use for post rendering, if passed")