Draft 2023 05 06 Building An Org Mode To Markdown Exporter

Writing an org-mode to markdown converter on my own

Overall planning of the program

The first step is to go through the buffer line by line, with a small state machine driving the conversion.

I want each `#+begin_ai` block to be rendered as a series of blockquotes, one per interlocutor, with an empty line between them. We can experiment with and without separators.

We want to recognize if we are in:

ai mode
in ai ME section
in ai SYS section
in ai AI section
source mode
normal text mode

So, we need an enum for the states. That enum could be something like:

:normal (just no special context)
:in-dialogue (a `#+begin_ai` was recognized. We keep the interlocutor in a separate state variable, no need to overcomplicate things)
:in-dialogue-src (we are inside a triple-backquote code block inside an AI response)
:in-src (we are in a `#+begin_src` block)
:in-results (we are in a results block, if need be)
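
Emacs Lisp has no dedicated enum type, so one way to pin the states down is a constant list of keywords plus a membership check. This is only a sketch: the names `my/converter-states` and `my/converter-state-p` are hypothetical and not part of the converter.

```elisp
;; Hypothetical names; the keywords are the states listed above.
(defconst my/converter-states
  '(:normal :in-dialogue :in-dialogue-src :in-src :in-results)
  "The states the converter's state machine can be in.")

(defun my/converter-state-p (state)
  "Return t if STATE is one of `my/converter-states', nil otherwise."
  (and (memq state my/converter-states) t))
```

A check like this can catch typos such as a keyword written without its leading colon.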

We need the following methods:

is-begin-ai-block-p : matches begin_ai, returns the header line
is-end-ai-block-p : matches end_ai
is-begin-src-block-p : matches begin_src, returns the language and the header line
is-end-src-block-p : matches end_src
is-begin-quote-p : matches an opening triple backquote inside an AI block, returns the language
is-end-quote-p : matches the closing triple backquote
is-ai-dialogue-change : matches a [..] tag, returns the interlocutor
is-header-p
is-title-p
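
A few of these predicates could be sketched with plain regular expressions. The function names below are hypothetical stand-ins, and the sketch assumes that speaker tags look like `[ME]:`, `[AI]:` and `[SYS]:` at the start of a line, which is my reading of the ME/SYS/AI sections above rather than something verified against org-ai.

```elisp
(defun my/begin-ai-block-p (line)
  "Return LINE itself if it opens a `#+begin_ai' block, else nil."
  (let ((case-fold-search t))
    (and (string-match-p "\\`[ \t]*#\\+begin_ai\\(?:[ \t]\\|\\'\\)" line)
         line)))

(defun my/begin-src-block-p (line)
  "If LINE opens a `#+begin_src' block, return (LANGUAGE . LINE), else nil.
LANGUAGE is nil when the header names no language."
  (let ((case-fold-search t))
    (when (string-match
           "\\`[ \t]*#\\+begin_src\\(?:[ \t]+\\([^ \t\n]+\\).*\\)?[ \t]*\\'"
           line)
      (cons (match-string 1 line) line))))

(defun my/ai-dialogue-change (line)
  "If LINE starts with a speaker tag such as \"[ME]:\", return the speaker."
  ;; Assumption: only the upper-case tags ME, AI and SYS are recognized.
  (let ((case-fold-search nil))
    (when (string-match "\\`[ \t]*\\[\\(ME\\|AI\\|SYS\\)]:" line)
      (match-string 1 line))))
```

Returning the matched data instead of a bare `t` lets the caller test and extract in one step, for example `(car (my/begin-src-block-p line))` for the language.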

The main method of our state machine is going to be handle-state, which takes an input line, a current state, an output buffer, and returns a new state. I suspect an alist is the idiomatic way to keep structured data in emacs lisp.
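
To make the shape of that function concrete, here is a minimal sketch using an alist for the state. It only knows about dialogue blocks, `my/handle-state` is a hypothetical name, and the `mode` key is an assumption made for this example.

```elisp
(defun my/handle-state (line state output-buffer)
  "Convert LINE according to STATE and append the result to OUTPUT-BUFFER.
STATE is an alist with a `mode' key.  Return the new state alist."
  (let ((mode (alist-get 'mode state))
        (case-fold-search t))
    (cond
     ;; Entering a dialogue block: emit nothing, switch mode.
     ((and (eq mode :normal) (string-match-p "\\`#\\+begin_ai" line))
      '((mode . :in-dialogue)))
     ;; Leaving a dialogue block.
     ((and (eq mode :in-dialogue) (string-match-p "\\`#\\+end_ai" line))
      '((mode . :normal)))
     ;; Inside a dialogue: blockquote the line.
     ((eq mode :in-dialogue)
      (with-current-buffer output-buffer
        (goto-char (point-max))
        (insert "> " line "\n"))
      state)
     ;; Anything else passes through unchanged.
     (t
      (with-current-buffer output-buffer
        (goto-char (point-max))
        (insert line "\n"))
      state))))

(defun my/handle-state-demo ()
  "Run `my/handle-state' over a tiny document and return the output."
  (with-temp-buffer
    (let ((out (current-buffer))
          (state '((mode . :normal))))
      (dolist (line '("Intro" "#+begin_ai" "[ME]: hi" "#+end_ai" "Outro"))
        (setq state (my/handle-state line state out)))
      (buffer-string))))
```

Evaluating `(my/handle-state-demo)` returns the string `"Intro\n> [ME]: hi\nOutro\n"`: the block delimiters are dropped and the dialogue line is blockquoted.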

Creating the state struct

I don’t really know how to do structures in emacs-lisp, so let’s ask the LLM.

Asking multiple times about the documentation parameter

Let’s switch to a terser interlocutor. I suspect the information about defstruct is wrong, and that the LLM started writing Common Lisp instead.

After looking at the definition of `cl-defstruct`, it does indeed support documentation and type annotations for the slots. The terse Emacs interlocutor seems to be the one that was wrong. Let’s let it be a bit more verbose.

We can now put this into an org-mode block and evaluate it.

(require 'cl-lib)

;; Defining structures
(cl-defstruct my-struct
  "Documentation for my-struct."
  (field1 nil :documentation "Doc for field1")
  (field2 0 :type integer :documentation "Doc for field2"))

(cl-defstruct (my-struct2
               (:constructor create-my-struct2)
               (:predicate my-struct2-p)
               (:copier copy-my-struct2))
  "Documentation for my-struct2."
  (field1 nil :documentation "Doc for field1")
  (field2 "" :type string :read-only t :documentation "Doc for field2"))

(cl-defstruct (my-struct3 (:include my-struct))
  "Documentation for my-struct3, inheriting from my-struct."
  (field3 '() :type list :documentation "Doc for field3"))

;; Instantiating structures
(setq instance1 (make-my-struct :field1 '(:a :b :c) :field2 42))
(setq instance2 (create-my-struct2 :field1 'hello :field2 "world"))
(setq instance3 (make-my-struct3 :field1 1 :field2 2 :field3 '("a" "b" "c")))

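;; Accessing docstrings programmatically.
;; `documentation' looks up function docstrings, so it works for the
;; generated accessor but signals `void-function' for the struct name.
(documentation 'my-struct-field1 'function)
(condition-case err
    (documentation 'my-struct)
  (void-function err))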

Calling the `documentation` function with `'my-struct 'type` doesn’t seem to work.

Starting our actual implementation

No need to use the AI to write our state structure. We are going to use org-mode tangle to write it out. This is the first time I have used tangle and I don’t really know how it works, so I decided to start from the actual documentation.

(require 'cl-lib)

(cl-defstruct org-ai-to-md-state
  "Represents the internal state of the state machine used to convert org-ai flavoured org mode files to markdown."
  (state :normal :type symbol :documentation "The state variable.
Can be:
- :normal (no special context)
- :in-dialogue (within a begin_ai block)
- :in-dialogue-src (within a code block inside a dialogue)
- :in-src (within a begin_src code block)
- :in-results (within the results of evaluating a code block)")
  (current-speaker nil :type (or null string) :documentation "The current speaker if dialogue is active and a [..] tag was recognized.")
  (current-src-language nil :type (or null string) :documentation "When inside a code block, the current language (used for syntax highlighting)."))
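
As a quick illustration of what `cl-defstruct` generates, here is a sketch using a trimmed, hypothetical copy of the struct (`my/md-state`) so that it can be evaluated on its own. The constructor, accessors and predicate follow the default `make-NAME`, `NAME-SLOT` and `NAME-p` naming.

```elisp
(require 'cl-lib)

;; Trimmed, hypothetical copy of the state struct so the example runs alone.
(cl-defstruct my/md-state
  (state :normal)
  (current-speaker nil))

(defun my/md-state-demo ()
  "Create a state, update it in place, and return what it holds."
  (let ((s (make-my/md-state)))        ; slots start at their defaults
    ;; Accessors are `setf'-able places.
    (setf (my/md-state-state s) :in-dialogue
          (my/md-state-current-speaker s) "ME")
    (list (my/md-state-p s)
          (my/md-state-state s)
          (my/md-state-current-speaker s))))

(my/md-state-demo) ; => (t :in-dialogue "ME")
```

Because the struct is mutable, the line handler can either update the state it receives or return a fresh copy made with the generated `copy-my/md-state`.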

Let’s use the AI to write the main driver. This is more out of curiosity, because it would probably be easier to just write it out by hand.

(defun org-ai-to-md-handle-lines (state output-buffer lines)
  "Main driver of the converter.
Feed each element of LINES to `handle-line' together with
OUTPUT-BUFFER, threading STATE through the calls, and return
the final state."
  (dolist (line lines state)
    (setq state (handle-line state output-buffer line))))
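
To show how such a driver might be fed, here is a self-contained sketch with a stand-in line handler. All `my/` names are hypothetical, and the "state" is just a line counter so the threading through `dolist` is easy to see.

```elisp
;; Hypothetical stand-in for the real per-line handler: it copies LINE to
;; OUTPUT-BUFFER unchanged and returns STATE incremented by one.
(defun my/handle-line (state output-buffer line)
  (with-current-buffer output-buffer
    (goto-char (point-max))
    (insert line "\n"))
  (1+ state))

(defun my/handle-lines (state output-buffer lines)
  "Same shape as the driver above, but calling `my/handle-line'."
  (dolist (line lines state)
    (setq state (my/handle-line state output-buffer line))))

(defun my/convert-string (text)
  "Run the driver over TEXT and return (LINE-COUNT . OUTPUT)."
  (with-temp-buffer
    (let ((count (my/handle-lines 0 (current-buffer)
                                  (split-string text "\n"))))
      (cons count (buffer-string)))))

(my/convert-string "a\nb") ; => (2 . "a\nb\n")
```

Splitting the text up front keeps the handler independent of buffer positions, at the cost of holding every line in memory at once.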

Writing out the parsing functions

We can now try to have the LLM do some real lifting. We have a description of our task, we have thought about the parsing functions we want, and we have documented them.

Documenting argument and return types

This was quite a mouthful, so we definitely want to write some unit tests. But before we do that, I would like to see if there is an idiomatic way to describe arguments and return values and their types.

We already saw that the super terse agent is often wrong, so let’s try again with a slightly more verbose one.

I’m not entirely sure what to make of all that, but it was interesting to see what is out there.

Asking about providing autocompletion of our functions

I’m intrigued by the autocomplete part, however. Not necessarily because I want it to be used in this project, but because I have been thinking about it for a few other package ideas.

Interesting. I’m not sure this is really useful per se, but I think there is a decent chance that the custom completion backend for company-mode is a real thing.

Enabling autocompletion within org-mode babel blocks

While I’m at it, I will ask it about completion in source blocks in org-mode, which is something that has been bugging me.

wesen/DRAFT - 2023-05-06 - Building an org mode to markdown exporter.org
