@ttscoff
Last active February 20, 2025 03:09
#!/usr/bin/env ruby
# Archive linkding bookmarks to Markdown files
# Can use [Gather](https://brettterpstra.com/projects/gather-cli/)
# for conversion (if installed), or use Marky
# the Markdownifier (web-based).
#
# See options below for configuration
#
# This script is designed to run once initially, and then
# be set to run in the background at intervals using cron
# or launchd.
#
# The script can be run as root if needed, and will
# appropriately set permissions on generated files to make
# them accessible by your user (see config).
#
# You can specify a certain tag to only archive bookmarks
# containing that tag. This is useful because if you're
# just linking a tool or interesting app, you probably
# don't need a Markdown archive of it. Tagging allows
# selective and intentional archiving. But if you leave it
# blank, the script will just archive all bookmarks.
#
# Bookmarks will be pulled and markdownified using either
# [Gather](https://brettterpstra.com/projects/gather-cli/)
# (Mac only) or the web based [Marky the
# Markdownifier](https://markdownrules.com). This can be
# changed in the configuration.
#
# If you want to ensure future access to images, set
# `localize_images` to true and any accessible linked
# images in an article will be downloaded to your local
# machine.
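#
# For example, with the default `images_subfolder` of
# "images", an image reference such as
#
#   ![](https://example.com/media/photo.jpg)
#
# is downloaded to <notes_folder>/images/photo.jpg and
# rewritten in the archived note as
#
#   ![](images/photo.jpg)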
#
# To use the script, you'll need Ruby available. Install
# with a package manager or version manager if you don't have
# it.
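#
# For example, assuming Homebrew on macOS or apt on
# Debian/Ubuntu:
#
#   brew install ruby
#   sudo apt install ruby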
#
# Save the script in your PATH and make it
# executable with `chmod a+x linkding.rb`. Modify the
# options section at the top with your server name, API key
# (https://example.com/settings/integrations), and set the
# various options. Run it once from the command line to
# test and to archive initial existing bookmarks with the
# specified tag. Once run successfully, you can then set up
# a cron job or launchd agent to run in the background at
# intervals (I recommend an interval of at least 5 minutes).
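#
# A minimal setup sketch, assuming the script is saved as
# /usr/local/bin/linkding.rb (a hypothetical location) and
# output is logged to $HOME/linkding.log:
#
#   chmod a+x /usr/local/bin/linkding.rb
#   linkding.rb    # initial run and archive
#   crontab -e     # then add the entry below
#
#   */5 * * * * /usr/local/bin/linkding.rb >> "$HOME/linkding.log" 2>&1
#
# cron runs jobs with a minimal PATH, so if `env` can't find
# your Ruby there, call Ruby by its absolute path in the entry.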
#
# MIT license, copyright Brett Terpstra 2024
#
%w[time cgi fileutils json erb].each do |filename|
  require filename
end

options = {
  server: 'https://your_linkding_base_address', # your linkding install
  api_key: "XXXXXX", # API key, https://your_install/settings/integrations
  note_tag: ".archive", # tag to archive, leave empty to archive all bookmarks
  notes_folder: "/Users/username/Dropbox/LinkDing/", # folder to store converted notes in
  localize_images: false, # if true, download images to notes folder
  images_subfolder: "images", # leave empty for no subfolder
  user: "username:staff", # for setting permissions when run as root
  tool: :marky, # set to :gather or :marky; Gather is preferred (requires installation)
  gather_path: "/usr/local/bin/gather" # path to Gather binary
}

module Util
  class << self
    # Run chown on a file to set owner
    #
    # @param file [String] path to file
    # @param user [String] user and optional group (user:group)
    #
    def chown(file, user = nil)
      return unless user

      `chown #{user} "#{file}"`
    end

    # Run chmod on a file to set standard permissions
    # (755 for dirs, 644 for files)
    #
    # @param file [String] path to file
    #
    def chmod(file)
      if File.directory?(file)
        `chmod 755 "#{file}"`
      else
        `chmod 644 "#{file}"`
      end
    end

    # Combo method for setting owner and permissions
    #
    # @param file [String] path to file
    # @param user [String] user and optional group to set on file
    #
    def permissions(file, user = nil)
      return unless user

      chown(file, user)
      chmod(file)
    end
  end
end

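class ::String
  # Indent every line of a string by a given
  # number of spaces
  #
  # @param distance [Integer] number of spaces
  #
  # @return [String] indented string
  #
  def indent(distance = 2)
    indent = " " * distance
    indent + gsub(/\n/, "\n#{indent}").rstrip
  end

  # Convert a string into a safe filename: slashes become
  # colons, pipes become hyphens, "#" becomes "hash ", and
  # any character other than letters, digits, spaces, and
  # : - _ , . ? ! is dropped
  #
  # @return [String] sanitized string
  #
  def sanitize
    gsub(/\//, ':').gsub(/\|/, '-').gsub(/#/, 'hash ').gsub(/[^:\-a-z0-9 _,.?!]/i, '')
  end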

  #
  # Replace invalid byte sequences and output a UTF-8 String
  #
  # Note: this overrides Ruby's built-in String#scrub
  #
  # @return [String] UTF-8 encoded string
  #
  def scrub
    encode('utf-16', invalid: :replace).encode('utf-8')
  end

  #
  # Destructive version of #scrub
  #
  # @return [String] UTF-8 encoded string, in place
  #
  def scrub!
    replace scrub
  end

  #
  # Download linked images to the notes folder and rewrite
  # their URLs to point to the local copies
  #
  # @param options [Hash] hash of options
  #
  # @option options [String] :notes_folder path to notes folder
  # @option options [String] :images_subfolder subfolder to save to (blank for root)
  # @option options [String] :user user and optional group to set for permissions
  #
  # @return [String] content with local image paths (unchanged on error)
  #
  def localize_images(options)
    folder = options[:notes_folder]
    subfolder = options[:images_subfolder]
    user = options[:user]
    # Treat a nil or blank subfolder the same way everywhere below
    no_subfolder = subfolder.nil? || subfolder.strip.empty?
    begin
      unless no_subfolder
        folder = File.join(folder, subfolder)
        FileUtils.mkdir_p(folder) unless File.directory?(folder)
      end
      gsub(%r{(?<=:\s|\()https?://[^ ]+/([^/ ]+\.(?:png|jpe?g|gif|pdf|avif|webp|mov|ogg|mp4))(?:\?.*?)?(?=\s|\)|"|$)}i) do
        m = Regexp.last_match
        img = m[0].strip
        image_name = m[1].strip
        target = File.join(folder, image_name)
        unless File.exist?(target)
          puts "🔻 Downloading image #{image_name}"
          puts `curl -SsL -o "#{target}" "#{img}"`
          Util.permissions(target, user)
        end
        no_subfolder ? image_name : "#{subfolder}/#{image_name}"
      end
    rescue StandardError => e
      puts "Failed to localize images"
      puts e
      puts e.backtrace
      self
    end
  end

  # Destructive version of #localize_images
  def localize_images!(options)
    replace localize_images(options)
  end

  # Some fixes for content created by Gather
  # Collapses runs of blank lines and fixes weird self-links
  # (removing their reference definitions)
  def gather_fix
    empties = []
    out = strip.scrub
    out.gsub!(/^(\s*\n){2,}/, "\n\n")
    out.gsub!(/(?<!\[)(<.*?>)\](\[\d+\])/) do
      m = Regexp.last_match
      empties << m[2]
      m[1]
    end
    empties.each do |e|
      out.gsub!(/^\s*#{Regexp.escape(e)}: .*?$/, '')
    end
    out
  end

  # Destructive version of #gather_fix
  def gather_fix!
    replace gather_fix
  end

  # Some fixes for Marky output
  # Removes metadata to be replaced by this script's metadata
  def marky_fix
    content = strip.scrub
    content.gsub!(/^(date|title|tags|source):.*?\n/, '')
    content.strip
  end

  # Destructive version of #marky_fix
  def marky_fix!
    replace marky_fix
  end
end

class Linkding
  # Initialize a new instance
  #
  # @param options [Hash] hash of options
  #
  # @option options [String] :server Linkding server
  # @option options [String] :api_key API key
  # @option options [String] :note_tag Tag to archive
  # @option options [String] :notes_folder Folder to save to
  # @option options [Boolean] :localize_images Download images
  # @option options [String] :images_subfolder Subfolder to download images to
  # @option options [String] :user user[:group] to apply permissions
  # @option options [Symbol] :tool Tool to use for markdownifying (:gather or :marky)
  # @option options [String] :gather_path Path to the Gather binary
  #
  def initialize(options = {})
    @options = options
  end

  # Retrieve the JSON output from the Linkding API
  #
  # @param api_call [String] the API path to call
  #
  # @return [Hash] Converted hash of output
  #
  def get_json(api_call)
    JSON.parse(`curl -SsL -H 'Authorization: Token #{@options[:api_key]}' '#{@options[:server]}#{api_call}'`)
  end

  # Fetch bookmarks from the API (filtered by tag if
  # :note_tag is set), then drop any whose sanitized title
  # already exists as a Markdown file in the notes folder
  #
  # @return [Array] array of unsaved bookmarks
  #
  def new_bookmarks
    limit = 1000
    tag = @options[:note_tag]
    search = "&q=%23#{CGI.escape(tag)}" if tag && !tag.empty?
    call = "/api/bookmarks/?limit=#{limit}&format=json#{search}"
    json = get_json(call)
    bookmarks = json["results"]
    offset = 0
    # The API's offset counts bookmarks, not pages,
    # so advance it by the page size
    while json["next"]
      offset += limit
      json = get_json(call + "&offset=#{offset}")
      bookmarks.concat(json["results"])
    end
    existing_files = Dir.glob('*.md', base: @options[:notes_folder])
    unless existing_files.empty?
      bookmarks.reject! do |s|
        existing_files.include? "#{s["title"].sanitize}.md"
      end
    end
    bookmarks
  end

  # Follow HTML meta refresh redirects
  #
  # @param url [String] URL to check
  #
  # @return [String] final url after following meta refresh redirects
  #
  def redirect?(url)
    content = `curl -SsL "#{url}"`.scrub
    if content =~ /meta http-equiv=["']refresh["'].*?url=(.*?)["']/
      url = redirect?(Regexp.last_match(1))
    end
    url
  end

  # Markdownify url with Marky the Markdownifier
  #
  # @param url [String] URL to markdownify
  #
  # @return [String] markdown content
  #
  def marky(url)
    url = redirect?(url)
    call = %(https://heckyesmarkdown.com/api/2/?url=#{CGI.escape(url)}&readability=1)
    `curl -SsL '#{call}'`.marky_fix
  end

  # Markdownify url with Gather
  #
  # @param url [String] url to markdownify
  #
  # @return [String] markdown content
  #
  def gather(url)
    url = redirect?(url)
    `#{@options[:gather_path]} "#{url}"`.gather_fix
  end
end

## notes_folder should be an absolute path: when running as
## root, File.expand_path would expand ~ to root's home folder
# options[:notes_folder] = File.expand_path(options[:notes_folder])
ld = Linkding.new(options)
puts "#{Time.now.strftime('%Y-%m-%d %H:%M')}: Starting"
# retrieve recent bookmarks
new_bookmarks = ld.new_bookmarks

# Retrieve content with specified tool
def get_content(options, bookmark, ld)
  puts "🕷️ #{options[:tool].to_s =~ /^m/ ? "Markdownifying" : "Gathering"} #{bookmark['url']}"
  if options[:tool].to_s =~ /^m/
    ld.marky(bookmark['url'])
  else
    ld.gather(bookmark['url'])
  end
end

# Archive the content of each new bookmark to a Markdown file
new_bookmarks.each do |bookmark|
  begin
    content = get_content(options, bookmark, ld)
    if content.strip.empty?
      puts "😥 Failed to gather"
      next
    end
    puts "✅ Gathered"
    content.localize_images!(options) if content && options[:localize_images]
    # puts content
    # Escape quotes and backslashes so the title stays valid inside YAML double quotes
    title = bookmark["title"].strip.gsub(/["\\]/) { |c| "\\#{c}" }
    url = bookmark["url"].strip
    description = bookmark["description"].strip.empty? ? "" : "\ndescription: >\n#{bookmark["description"].indent}"
    notes = bookmark["notes"].strip.empty? ? "" : "\nnotes: >\n#{bookmark["notes"].indent}"
    tags = bookmark["tag_names"]
    # Drop the archive tag itself from the note's tags
    tags.delete(options[:note_tag])
    tags = tags.join(", ")
    added = Time.parse(bookmark["date_added"]).strftime('%Y-%m-%d %H:%M')
    # Template for markdown output, YAML headers and content
    # description and notes added as header keys
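    #
    # Example of the rendered header for a bookmark with a
    # description and no notes (illustrative values only):
    #
    #   ---
    #   title: "Example Article"
    #   source: https://example.com/article
    #   date: 2024-05-01 09:30
    #   description: >
    #     A short summary of the article
    #   tags: [reading, ruby]
    #   ---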
    template = ERB.new <<~ENDTEMPLATE
      ---
      title: "<%= title %>"
      source: <%= url %>
      date: <%= added %><%= description %><%= notes %>
      tags: [<%= tags %>]
      ---
      <%= content %>
    ENDTEMPLATE
    out = template.result(binding)
    filename = File.join(options[:notes_folder], "#{bookmark["title"].sanitize}.md")
    puts "💾 Writing content to #{filename}"
    File.open(filename, 'w') do |f|
      f.puts out
    end
    # Set permissions on generated file, in case we're running as root
    Util.permissions(filename, options[:user])
  rescue StandardError => e
    puts "😥 Failed to gather #{bookmark['url']}"
    puts e
    puts e.backtrace
  end
end
puts "🗃️ Archived #{new_bookmarks.count} new bookmarks"
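The script decides what counts as a new bookmark purely by filename: each bookmark title is passed through `String#sanitize`, and the bookmark is skipped if `<sanitized title>.md` already exists in the notes folder. The sketch below reproduces that check as a standalone program, with the `sanitize` mapping rewritten as a top-level function for illustration and hypothetical sample titles and folder contents.

```ruby
# Same character mapping as String#sanitize in the script above,
# written as a top-level function for illustration
def sanitize(title)
  title.gsub(%r{/}, ":").gsub(/\|/, "-").gsub(/#/, "hash ").gsub(/[^:\-a-z0-9 _,.?!]/i, "")
end

# Hypothetical bookmarks and a hypothetical notes folder listing
bookmarks = [
  { "title" => "Ruby 3.3 / What's new" },
  { "title" => "Gather | CLI #1" }
]
existing_files = ["Ruby 3.3 : Whats new.md"]

# Keep only bookmarks that have no matching Markdown file yet
new_ones = bookmarks.reject do |b|
  existing_files.include?("#{sanitize(b["title"])}.md")
end

# Print the filename each remaining bookmark would be written to
new_ones.each { |b| puts "#{sanitize(b["title"])}.md" }
```

Running it prints `Gather - CLI hash 1.md`; the first bookmark is skipped because its apostrophe is dropped and its slash becomes a colon, matching the existing file. One consequence of matching on titles: editing a bookmark's title in linkding causes it to be archived again under the new name, and two bookmarks whose titles sanitize to the same string end up sharing a single file.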
@krzysztofjeziorny

Thanks for the explanation!
