Skip to content

Instantly share code, notes, and snippets.

@ttscoff
Last active February 20, 2025 03:09
Show Gist options
  • Save ttscoff/4974164772623e95e8a1ef46a3eec68a to your computer and use it in GitHub Desktop.
Save ttscoff/4974164772623e95e8a1ef46a3eec68a to your computer and use it in GitHub Desktop.
#!/usr/bin/env ruby
# Archive linkding bookmarks to Markdown files
# Can use [Gather](https://brettterpstra.com/projects/gather-cli/)
# for conversion (if installed), or use Marky
# the Markdownifier (web-based).
#
# See options below for configuration
#
# This script is designed to run once initially, and then
# be set to run in the background at intervals using cron
# or launchd.
#
# The script can be run as root if needed, and will
# appropriately set permissions on generated files to make
# them accessible by your user (see config).
#
# You can specify a certain tag to only archive bookmarks
# containing that tag. This is useful because if you're
# just linking a tool or interesting app, you probably
# don't need a Markdown archive of it. Tagging allows
# selective and intentional archiving. But if you leave it
# blank, the script will just archive all bookmarks.
#
# Bookmarks will be pulled and markdownified using either
# [Gather](https://brettterpstra.com/projects/gather-cli/)
# (Mac only) or the web based [Marky the
# Markdownifier](https://markdownrules.com). This can be
# changed in the configuration.
#
# If you want to ensure future access to images, set
# `localize_images` to true and any accessible linked
# images in an article will be downloaded to your local
# machine.
#
# To use the script, you'll need Ruby available. Install
# with a package manager or version manager if you don't have
# it.
#
# Save the script in your PATH and make it
# executable with `chmod a+x linkding.rb`. Modify the
# options section at the top with your server name, API key
# (https://example.com/settings/integrations), and set the
# various options. Run it once from the command line to
# test and to archive initial existing bookmarks with the
# specified tag. Once run successfully, you can then set up
# a cron job or launchd agent to run in the background at
# intervals (I recommend at least a 5 minute interval).
#
# MIT license, copyright Brett Terpstra 2024
#
%w[time cgi fileutils json erb].each do |filename|
require filename
end
options = {
server: 'https://your_linkeding_base_address', # your linkding install
api_key: "XXXXXX", # API key, https://your_install/settings/integrations
note_tag: ".archive", # tag to archive, leave empty to archive all bookmarks
notes_folder: "/Users/username/Dropbox/LinkDing/", # folder to store converted notes in
localize_images: false, # if true, download images to notes folder
images_subfolder: "images", # leave empty for no subfolder
user: "username:staff", # for setting permissions when run as root
tool: :marky, # set to :gather or :marky, Gather is preferred (requires installation),
gather_path: "/usr/local/bin/gather" # path to Gather binary
}
module Util
class << self
# Run chown on a file to set owner
#
# @param file [String] path to file
#
def chown(file, user = nil)
return unless user
`chown #{user} "#{file}"`
end
# Run chmod on a file to set standard permissions
# (755 for dirs, 644 for files)
#
# @param file [String] path to file
#
def chmod(file)
if File.directory?(file)
`chmod 755 "#{file}"`
else
`chmod 644 "#{file}"`
end
end
# Combo method for setting owner and permissions
#
# @param file [String] path to file
# @param user [String] user and optional group to set on file
#
def permissions(file, user = nil)
return unless user
chown(file, user)
chmod(file)
end
end
end
class ::String
# Indent every line of a string a given
# number of spaces
#
# @param distance [Integer] number of spaces
#
def indent(distance = 2)
indent = " " * distance
indent + gsub(/\n/, "\n#{indent}").rstrip
end
def sanitize
gsub(/\//, ':').gsub(/\|/, '-').gsub(/#/, 'hash ').gsub(/[^:\-a-z0-9 _,.?!]/i, '')
end
#
# Discard invalid characters and output a UTF-8 String
#
# @return [String] UTF-8 encoded string
#
def scrub
encode('utf-16', invalid: :replace).encode('utf-8')
end
#
# Destructive version of #utf8
#
# @return [String] UTF-8 encoded string, in place
#
def scrub!
replace scrub
end
#
# Method to save linked images to local images
#
# @param options [Hash] hash of options
#
# @option options [String] :notes_folder path to notes folder
# @option options [String] :images_subfolder subfolder to save to (blank for root)
# @option options [String] :user user and optional group to set for permissions
#
def localize_images(options)
folder = options[:notes_folder]
subfolder = options[:images_subfolder]
user = options[:user]
begin
unless subfolder.nil? || subfolder.strip.empty?
folder = File.join(folder, subfolder)
FileUtils.mkdir_p(folder) unless File.directory?(folder)
end
gsub(%r{(?<=:\s|\()https?://[^ ]+/([^/ ]+\.(?:png|jpe?g|gif|pdf|avif|webp|mov|ogg|mp4))(?:\?.*?)?(?=\s|\)|"|$)}i) do
m = Regexp.last_match
img = m[0].strip
image_name = m[1].strip
target = File.join(folder, image_name)
unless File.exist?(target)
puts "🔻 Downloading image #{image_name}"
puts `curl -SsL -o "#{target}" "#{img}"`
Util.permissions(target, user)
end
subfolder.empty? ? image_name : "#{subfolder}/#{image_name}"
end
rescue StandardError => e
puts "Failed to localize images"
puts e
puts e.backtrace
self
end
end
# Destructive version of #localize_images
def localize_images!(options)
replace localize_images(options)
end
# Some fixes for content created by Gather
# Removes empty lines, fixes weird self-links
def gather_fix
empties = []
out = strip.scrub
out.gsub!(/^(\s*\n){2,}/, "\n\n")
out.gsub!(/(?<!\[)(<.*?>)\](\[\d+\])/) do
m = Regexp.last_match
empties << m[2]
m[1]
end
empties.each do |e|
out.gsub!(/^\s*#{Regexp.escape(e)}: .*?$/, '')
end
out
end
# destructive version of #gather_fix
def gather_fix!
replace gather_fix
end
# Some fixes for Marky output
# Removes metadata to be replaced by this script's metadata
def marky_fix
content = strip.scrub
content.gsub!(/^(date|title|tags|source):.*?\n/, '')
content.strip
end
# destructive version of #marky_fix
def marky_fix!
replace marky_fix
end
end
class Linkding
# Initialize a new instance
#
# @param options [Hash] hash of options
#
# @option options [String] :server Linkding server
# @option options [String] :api_key API key
# @option options [String] :note_tag Tag to archive
# @option options [String] :notes_folder Folder to save to
# @option options [Boolean] :localize_images Download images
# @option options [String] :images_subfolder Subfolder to download images to
# @option options [String] :user user[:group] to apply permissions
# @option options [Symbol] :tool Tool to use for markdownifying (:gather or :marky)
#
def initialize(options = {})
return self if options.empty?
@options = options
end
# retrieves the JSON output from the Linkding API
#
# @param api_call [String] the API path to call
#
# @return [Hash] Converted hash of output
def get_json(api_call)
JSON.parse(`curl -SsL -H 'Authorization: Token #{@options[:api_key]}' '#{@options[:server]}#{api_call}'`)
end
# compares bookmark array to existing bookmarks to find new urls
#
# @return [Array] array of unsaved bookmarks
#
def new_bookmarks
search = "&q=%23#{@options[:note_tag]}" if @options[:note_tag] && !@options[:note_tag].empty?
call = "/api/bookmarks/?limit=1000&format=json#{search}"
json = get_json(call)
bookmarks = json["results"]
offset = 0
while json["next"]
offset += 1
json = get_json(call + "&offset=#{offset}")
bookmarks.concat(json["results"])
end
existing_files = Dir.glob('*.md', base: @options[:notes_folder])
unless existing_files.empty?
bookmarks.reject! do |s|
existing_files.include? "#{s["title"].sanitize}.md"
end
end
bookmarks
end
# Test if URL result is meta redirect
#
# @return [String] final url after following redirects
#
def redirect?(url)
content = `curl -SsL "#{url}"`.scrub
if content =~ /meta http-equiv=["']refresh["'].*?url=(.*?)["']/
url = redirect?(Regexp.last_match(1))
end
url
end
# markdownify url with Marky the Markdownifier
#
# @param url [String] URL to markdownify
#
# @return [String] markdown content
#
def marky(url)
url = redirect?(url)
call = %(https://heckyesmarkdown.com/api/2/?url=#{CGI.escape(url)}&readability=1)
`curl -SsL '#{call}'`.marky_fix
end
# markdownify url with Gather
#
# @param url [String] url to markdownify
#
# @return [String] markdown content
#
def gather(url)
url = redirect?(url)
`#{@options[:gather_path]} "#{url}"`.gather_fix
end
end
## Should require absolute paths as running as root will expand to wrong path
# options[:notes_folder] = File.expand_path(options[:notes_folder])
ld = Linkding.new(options)
puts "#{Time.now.strftime('%Y-%m-%d %H:%M')}: Starting"
# retrieve recent bookmarks
new_bookmarks = ld.new_bookmarks
# Retrieve content with specified tool
def get_content(options, bookmark, ld)
puts "🕷️ #{options[:tool].to_s =~ /^m/ ? "Markdownifying" : "Gathering"} #{bookmark['url']}"
content = if options[:tool].to_s =~ /^m/
ld.marky(bookmark['url'])
else
ld.gather(bookmark['url'])
end
content
end
# archive content and merge new bookmarks into main database
new_bookmarks.each do |bookmark|
begin
content = get_content(options, bookmark, ld)
if content.strip.empty?
puts "😥 Failed to gather"
next
end
puts "✅ Gathered"
content.localize_images!(options) if content && options[:localize_images]
# puts content
title = bookmark["title"].strip
url = bookmark["url"].strip
description = bookmark["description"].strip.empty? ? "" : "\ndescription: >\n#{bookmark["description"].indent}"
notes = bookmark["notes"].strip.empty? ? "" : "\nnotes: >\n#{bookmark["notes"].indent}"
tags = bookmark["tag_names"]
tags.delete('.archive')
tags = tags.join(", ")
added = Time.parse(bookmark["date_added"]).strftime('%Y-%m-%d %H:%M')
# Template for markdown output, YAML headers and content
# description and notes added as header keys
template = ERB.new <<~ENDTEMPLATE
---
title: "<%= title %>"
source: <%= url %>
date: <%= added %><%= description %><%= notes %>
tags: [<%= tags %>]
---
<%= content %>
ENDTEMPLATE
out = template.result(binding)
filename = File.join(options[:notes_folder], "#{bookmark["title"].sanitize}.md")
puts "💾 Writing content to #{filename}"
File.open(filename, 'w') do |f|
f.puts out
end
# Set permissions on generated file, in case we're running as root
Util.permissions(filename, options[:user])
rescue StandardError => e
puts "😥 Failed to gather #{bookmark['url']}"
puts e
puts e.backtrace
end
end
puts "🗃️ Archived #{new_bookmarks.count} new bookmarks"
@krzysztofjeziorny
Copy link

This is excellent and works on a Mac with both Marky und Gather. Many thanks for putting it together.

As an idea, maybe would it be easy to extend the tools with Pandoc? It's a Swiss army knife for file conversions, and can create Markdown files from html, as a filter. So it would be an option on non-MacOS for local conversion. Example:
curl --silent https://pandoc.org/installing.html | pandoc --from html --to markdown_strict -o installing.md

@ttscoff
Copy link
Author

ttscoff commented Nov 28, 2024

As an idea, maybe would it be easy to extend the tools with Pandoc?

Entirely possible, but the benefit of both Marky and Gather is the automatic removal of menus, ads, comments, etc. Pandoc just converts everything on the page into the notes, which is rarely what I want. Marky uses Pandoc at its core, so it's a good solution for non-Mac users, with the benefit of detritus removal.

@krzysztofjeziorny
Copy link

Thanks for the explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment