Convert Contentful Export to Markdown (MDX)


I prompted ChatGPT o1 to help me write a script that converts a Contentful export to Markdown / MDX. I have used this script successfully to migrate the content.

The script reads the Contentful export JSON and assets from the ./export folder and writes the converted files to the ./mdx folder.
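To make the folder layout concrete, here is a minimal sketch of how an asset URL from the export JSON corresponds to a file on disk, assuming the default download structure (images.ctfassets.net/<space_id>/...) under the export folder. The helper name assetUrlToExportPath and the sample URL are made up for illustration and are not part of the script.

```javascript
const path = require('path');

// Hypothetical helper (not part of the script): map a Contentful asset URL
// to the place where the downloaded file is expected inside the export folder.
function assetUrlToExportPath(assetUrl, exportFolder) {
  // Drop an optional "http:"/"https:" prefix and the leading "//"
  const withoutProtocol = assetUrl.replace(/^(?:https?:)?\/\//, '');
  // Drop any query string such as "?w=800"
  const withoutQuery = withoutProtocol.split('?')[0];
  // Each URL segment becomes one folder level below the export folder
  return path.join(exportFolder, ...withoutQuery.split('/'));
}

// Made-up URL in the same shape as the ones found in an export file
console.log(
  assetUrlToExportPath('//images.ctfassets.net/space123/asset456/hash789/photo.png', './export')
);
```

If the printed path does not exist after your export, the export structure probably differs and the paths in the config need adjusting.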

Config

Set up the script by editing these variables:

// -------------------------------------------------
// 1) Update these paths to match your environment
// -------------------------------------------------
const exportFileName = 'contentful-export-...json'
const exportFolder = path.resolve('./export');

// The folder where Contentful images are stored
// (the default structure is images.ctfassets.net/<space_id>/).
// Adjust if your export structure differs.

const spaceFolder = '...';

// Where we’ll generate .mdx files
const mdxFolder = path.resolve('./mdx');

// Master assets folder — each post will get its own subfolder in here
const assetsFolder = path.join(mdxFolder, 'assets');
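Before running the conversion it can help to confirm that the configured input paths exist. The following is a small sketch of such a check, not part of the original script. The helper findMissingPaths and the placeholder file name and space ID are assumptions; substitute the values from your own config.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the subset of paths that do not exist on disk.
function findMissingPaths(paths) {
  return paths.filter((p) => !fs.existsSync(p));
}

// Placeholder values for illustration only
const exportFolder = path.resolve('./export');
const exportFileName = 'contentful-export-example.json'; // hypothetical file name
const spaceFolder = 'your-space-id'; // hypothetical space ID

const missing = findMissingPaths([
  path.join(exportFolder, exportFileName),
  path.join(exportFolder, 'images.ctfassets.net', spaceFolder),
]);

if (missing.length > 0) {
  console.warn('Missing paths, check the config:\n' + missing.join('\n'));
}
```

The output folders (mdx and mdx/assets) are not checked here because the script creates them when needed.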

Run the script

foo@bar:~$ yarn convert 

Export from Contentful

# Install contentful-cli; see https://www.contentful.com/developers/docs/tutorials/cli/installation/ for details
foo@bar:~$ brew install contentful-cli

# login
foo@bar:~$ contentful login

# List your spaces to find your Space ID (<SpaceID>)
foo@bar:~$ contentful space list

# Export your space <SpaceID> 
foo@bar:~$ contentful space export --space-id <SpaceID> --download-assets --include-archived --include-drafts

# See more options, such as --export-dir, --content-file and --error-log-file
foo@bar:~$ contentful space export --help
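The script expects fields named title, excerpt, content, date and thumb_img_path, so it is worth checking which content types and field names your own export contains. Below is a minimal sketch of such an inspection. The helper summarizeEntries and the sample data are made up for illustration; in practice you would pass the entries array parsed from your export JSON.

```javascript
// Hypothetical helper: group entries by content type and collect their field names.
function summarizeEntries(entries) {
  const summary = {};
  for (const entry of entries) {
    const typeId = entry?.sys?.contentType?.sys?.id || 'unknown';
    if (!summary[typeId]) summary[typeId] = { count: 0, fields: new Set() };
    summary[typeId].count += 1;
    Object.keys(entry?.fields || {}).forEach((name) => summary[typeId].fields.add(name));
  }
  return summary;
}

// Made-up sample in the same shape as the "entries" array of an export file
const sampleEntries = [
  { sys: { contentType: { sys: { id: 'post' } } }, fields: { title: { 'en-US': 'Hello' }, content: { 'en-US': '...' } } },
  { sys: { contentType: { sys: { id: 'post' } } }, fields: { title: { 'en-US': 'World' }, date: { 'en-US': '2025-01-01' } } },
  { sys: { contentType: { sys: { id: 'author' } } }, fields: { name: { 'en-US': 'Jane' } } },
];

for (const [typeId, info] of Object.entries(summarizeEntries(sampleEntries))) {
  console.log(`${typeId}: ${info.count} entries, fields: ${[...info.fields].join(', ')}`);
}
```

If your field names differ, adjust the field lookups in step 4 of the script accordingly.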
// converter.js
const fs = require('fs');
const path = require('path');
// -------------------------------------------------
// 1) Update these paths to match your environment
// -------------------------------------------------
const exportFileName = 'contentful-export-bz8w7lpo5q3h-master-2025-01-22T22-21-03.json';
const exportFolder = path.resolve('./export');
// The JSON file you exported from Contentful
const jsonFilePath = path.join(exportFolder, exportFileName);
// The folder where Contentful images are stored
// (the default structure is images.ctfassets.net/<space_id>/).
// Adjust if your export structure differs.
const spaceFolder = 'bz8w7lpo5q3h';
const imagesFolder = path.join(exportFolder, 'images.ctfassets.net', spaceFolder);
// Where we’ll generate .mdx files
const mdxFolder = path.resolve('./mdx');
// Master assets folder — each post will get its own subfolder in here
const assetsFolder = path.join(mdxFolder, 'assets');
// -------------------------------------------------
// 2) Create the MDX and assets folders if needed
// -------------------------------------------------
if (!fs.existsSync(mdxFolder)) {
  fs.mkdirSync(mdxFolder, { recursive: true });
}
if (!fs.existsSync(assetsFolder)) {
  fs.mkdirSync(assetsFolder, { recursive: true });
}
// -------------------------------------------------
// 3) Load the exported JSON
// -------------------------------------------------
const contentData = JSON.parse(fs.readFileSync(jsonFilePath, 'utf-8'));
// Usually you want entries of type "post"
const entries = contentData.entries || [];
// Also gather all known assets so we can handle images
const allAssets = contentData.assets || [];
// Build a map from assetId -> { fileName, localPath, url }
const assetMap = {};
allAssets.forEach((asset) => {
  const assetId = asset?.sys?.id;
  const assetUrl = asset?.fields?.file?.['en-US']?.url;
  if (!assetId || !assetUrl) return; // skip incomplete assets
  // Example of assetUrl:
  //   //images.ctfassets.net/<spaceId>/<assetId>/someFolder/file.png
  // Remove the leading "//" to avoid empty segments when splitting:
  const cleanedUrl = assetUrl.replace(/^\/\//, '');
  // e.g. ["images.ctfassets.net", "<spaceId>", "<assetId>", "someFolder", "file.png"]
  const segments = cleanedUrl.split('/');
  // Find the index of spaceFolder
  const spaceIndex = segments.indexOf(spaceFolder);
  if (spaceIndex === -1) return; // should never happen, but just in case
  // Last piece is the actual file name
  const fileName = segments[segments.length - 1] || `asset-${assetId}.dat`;
  // Everything AFTER spaceFolder is our local path
  // e.g. ["<assetId>", "someFolder", "file.png"]
  const relativeSegments = segments.slice(spaceIndex + 1);
  // Join them into one path: <assetId>/someFolder/file.png
  const localPath = path.join(...relativeSegments);
  assetMap[assetId] = {
    url: assetUrl,
    fileName,
    localPath
  };
});
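/**
 * Convert "some Title" -> "some-title" for folder / file naming
 */
function slugify(str) {
  return (str || '')
    .toLowerCase()
    .trim() // trim first; after the replacements below no whitespace is left to trim
    .replace(/\s+/g, '-')
    .replace(/[^a-z0-9-]/g, '') // remove any remaining unsupported characters
    .replace(/-+/g, '-')
    .replace(/^-+|-+$/g, ''); // drop leading and trailing hyphens
}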
/**
 * Helper: find any inline references to Contentful-hosted images
 * (//images.ctfassets.net/<spaceFolder>/<assetId>/...) and rewrite them
 * to local ./assets/<slug>/<fileName>, copying each image from the
 * export folder into the post's assets subfolder.
 */
function rewriteAndCopyImagesInContent(content, slug) {
  if (!content) return '';
  // Build the regex dynamically from the actual spaceFolder value. It matches e.g.
  // "//images.ctfassets.net/<spaceFolder>/4I8F.../maybe/folders/file.png",
  // with an optional "http:" or "https:" prefix so the whole URL gets replaced.
  // Note: spaceFolder is inserted unescaped, which is fine for plain alphanumeric space IDs.
  const regex = new RegExp(
    `(?:https?:)?//images\\.ctfassets\\.net/${spaceFolder}/(\\w+)/[^\\s)]+`,
    'g'
  );
  // Destination folder for this post's images
  const targetFolder = path.join(assetsFolder, slug);
  let match;
  let finalContent = content;
  while ((match = regex.exec(content)) !== null) {
    const foundAssetId = match[1]; // the captured <assetId>
    if (!assetMap[foundAssetId]) continue;
    const { fileName, localPath } = assetMap[foundAssetId];
    // Build the source path, including subfolders under <assetId>
    const sourcePath = path.join(imagesFolder, localPath);
    if (fs.existsSync(sourcePath)) {
      // Ensure the post's subfolder exists
      if (!fs.existsSync(targetFolder)) {
        fs.mkdirSync(targetFolder, { recursive: true });
      }
      // e.g. ./mdx/assets/<slug>/<fileName>
      const destPath = path.join(targetFolder, fileName);
      // Only copy if not already done
      if (!fs.existsSync(destPath)) {
        fs.copyFileSync(sourcePath, destPath);
      }
    }
    // Replace the entire Contentful URL with the local reference
    const localRef = `./assets/${slug}/${fileName}`;
    finalContent = finalContent.replace(match[0], localRef);
  }
  return finalContent;
}
// -------------------------------------------------
// 4) Process each entry -> create .mdx files
// -------------------------------------------------
entries.forEach((entry) => {
  // If you have multiple content types in your export, you might do:
  // if (entry?.sys?.contentType?.sys?.id !== 'post') return;
  const fields = entry?.fields;
  if (!fields) return;
  // Title, excerpt, body fields
  const rawTitle = fields.title?.['en-US'] || '';
  const excerpt = fields.excerpt?.['en-US'] || '';
  const body = fields.content?.['en-US'] || '';
  // Skip entries that have neither a title nor a body.
  // This check must run before the 'Untitled' fallback, otherwise it never triggers.
  if (!rawTitle && !body) return;
  const title = rawTitle || 'Untitled';
  // Generate the slug
  const slug = slugify(title);
  // Possibly handle a "date" field
  const rawDate = fields.date?.['en-US'] || '';
  // Optionally handle a single thumbnail image field, e.g. 'thumb_img_path'
  let thumbnailLocalPath = '';
  if (fields.thumb_img_path?.['en-US']?.sys?.id) {
    const assetId = fields.thumb_img_path['en-US'].sys.id;
    const { fileName, localPath } = assetMap[assetId] || {};
    if (fileName && localPath) {
      // Build the full source path
      const sourcePath = path.join(imagesFolder, localPath);
      // Destination folder for this post's images
      const postAssetsFolder = path.join(assetsFolder, slug);
      if (!fs.existsSync(postAssetsFolder)) {
        fs.mkdirSync(postAssetsFolder, { recursive: true });
      }
      const destPath = path.join(postAssetsFolder, fileName);
      if (fs.existsSync(sourcePath)) {
        fs.copyFileSync(sourcePath, destPath);
        // e.g. ./assets/<slug>/<fileName>
        thumbnailLocalPath = `./assets/${slug}/${fileName}`;
      }
    }
  }
  // Rewrite any inline images inside the body
  const finalBody = rewriteAndCopyImagesInContent(body, slug);
  // Build front matter (the template lines stay unindented so the YAML is not indented)
  const mdxFrontMatter = `---
title: "${title.replace(/"/g, '\\"')}"
excerpt: "${excerpt.replace(/"/g, '\\"')}"
date: "${rawDate}"
thumbnail: "${thumbnailLocalPath}"
---
`;
  // Combine front matter + content
  const mdxFileContent = mdxFrontMatter + finalBody;
  // Write the .mdx file
  const mdxFilePath = path.join(mdxFolder, `${slug}.mdx`);
  fs.writeFileSync(mdxFilePath, mdxFileContent, 'utf-8');
  console.log(`Created: ${mdxFilePath}`);
});
console.log('All posts and images processed!');
{
  "scripts": {
    "convert": "node converter.js"
  }
}
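
The front matter in converter.js is built by hand and escapes only double quotes, so a title or excerpt containing a backslash or a line break could produce invalid YAML. One possible alternative, sketched here as an assumption rather than as the script's own method, is to quote each value with JSON.stringify, since a JSON string is generally also a valid YAML double-quoted string. The helper name buildFrontMatter is hypothetical.

```javascript
// Hypothetical helper: build a YAML front matter block from a flat object of values.
function buildFrontMatter(fields) {
  const lines = Object.entries(fields).map(
    // JSON.stringify escapes quotes, backslashes and newlines in one step
    ([key, value]) => `${key}: ${JSON.stringify(String(value ?? ''))}`
  );
  // Trailing empty element yields a newline after the closing "---"
  return ['---', ...lines, '---', ''].join('\n');
}

console.log(buildFrontMatter({ title: 'He said "hi"', excerpt: 'Line one\nLine two' }));
```

In the script, the result could replace the mdxFrontMatter template literal, with title, excerpt, date and thumbnail passed in as keys.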