Last active
April 30, 2023 17:38
Cleans the focused Roam block after copy/pasting/importing from PDF
;; For anyone who pastes text from a PDF into Roam, either by copy-paste or by importing
;; PDF highlights via Readwise from the Reader app.
;; PDFs often carry hard-coded newline characters at the end of each line, and words are
;; frequently split by hyphens, so the text arrives broken up when pasted into Roam.
;; This simple script cleans up the text and merges the lines. A line shorter than the
;; preset threshold is treated as the end of a paragraph, so its newline is kept.
;; You can assign a keyboard shortcut to the command so that it runs instantly.
;;
;; Installation:
;; 1) copy this code into a child code block (`clojure`) under a parent block containing
;;    {{[[roam/cljs]]}}, anywhere on the [[roam/cljs]] page
;; 2) confirm "Yes, I know what I am doing"
;;
;; Usage:
;; 1) focus the block containing the text pasted from a PDF
;; 2) press Cmd-P (Ctrl-P on Windows) to open the Command Palette
;; 3) search for "Clean PDF text"
;; 4) press Enter
;;
;; TIP: you can easily set up a keyboard shortcut for this script via the Command Palette
(ns clean-block-30-4-2023
  (:require [clojure.string :as str]
            [roam.datascript :as rd]
            [roam.block :as block]))

;; Line-length threshold: a line shorter than this many characters is treated
;; as the end of a paragraph, so its newline is kept.
(def threshold 40)

(defn block-content
  "Returns the text of the block with the given uid."
  [uid]
  (rd/q '[:find ?text .
          :in $ ?uid
          :where [?e :block/uid ?uid]
                 [?e :block/string ?text]]
        uid))

(defn clean
  "Rejoins hyphen-split words, merges long lines with a space, and keeps the
  newline after lines shorter than `threshold`."
  [text]
  (->> (-> text
           ;; drop a hyphen followed by whitespace (including a newline);
           ;; note that this also removes a hyphen followed by a space mid-line
           (str/replace #"-\s" "")
           str/trim)
       str/split-lines
       (map #(if (> threshold (count %)) (str % "\n") (str % " ")))
       str/join
       ;; the last line also receives a separator, so trim it off
       str/trim))

(defn main []
  (js/window.roamAlphaAPI.ui.commandPalette.addCommand
   (clj->js
    {:label "Clean PDF text"
     :callback (fn [_]
                 (let [block-uid (:block-uid (js->clj (js/window.roamAlphaAPI.ui.getFocusedBlock)
                                                      :keywordize-keys true))
                       text (when block-uid (block-content block-uid))]
                   ;; do nothing when no block is focused or the block is empty
                   (when-not (str/blank? text)
                     (block/update {:block {:uid block-uid :string (clean text)}}))))})))

(main)
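To see what the cleaning step does without touching a Roam graph, the logic can be tried on a sample string in any Clojure or ClojureScript REPL. The sketch below restates that logic under hypothetical names (`clean-demo`, `sample`, and the `clean-demo` namespace are illustration only, not part of the script), takes the threshold as an argument, and adds a final trim as an assumption of this sketch so the result carries no trailing separator.

```clojure
(ns clean-demo
  (:require [clojure.string :as str]))

;; Hypothetical stand-in for the script's cleaning function; the threshold
;; is passed in so different values are easy to try.
(defn clean-demo [threshold text]
  (->> (-> text
           (str/replace #"-\s" "")   ; rejoin words split by a hyphen
           str/trim)
       str/split-lines
       ;; short line => end of paragraph (keep newline), otherwise merge
       (map #(if (> threshold (count %)) (str % "\n") (str % " ")))
       str/join
       str/trim))                    ; assumption: drop the trailing separator

;; Made-up sample imitating text pasted from a PDF.
(def sample
  (str "Hard line breaks from a PDF usually split sen-\n"
       "tences in the middle of a thought, and they keep\n"
       "going until a short line.\n"
       "The next paragraph then begins on its own line again."))

(println (clean-demo 40 sample))
```

With a threshold of 40, the first three input lines merge into one paragraph: the hyphen rejoins "sentences", and the 25-character line "going until a short line." keeps its newline. The last line then starts a new paragraph.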
Author
tombarys
commented
Apr 30, 2023