Migrating a Jekyll Blog to Docusaurus

July 22, 2024 · 12 min read

Owner, Software Engineer at 42dot

Recently, I undertook the task of moving my blog to a new platform. As I encountered various issues, I jotted down potential solutions, thinking they might be useful to others. Here’s a detailed account of the migration process.

Overview

my-home

In April 2024, my blog suddenly broke down.
- The issue was related to the jsDelivr CDN.
- Although the problem might have resolved itself with time, the accumulated inconvenience led me to mistakenly believe I had written faulty code.
Jekyll felt restrictive and difficult to maintain, prompting me to migrate to Docusaurus.
- Especially since the blog was built via a fork, GitHub contributions weren’t being recorded, which was disappointing.
There was an existing request for Docusaurus support in O2, but I hadn’t used Docusaurus before and couldn’t address it: [FR] support mkdocs-material, docusaurus · songkg7/o2 · Discussion #346 · GitHub

These reasons collectively led to the decision to migrate the blog.

Features of Docusaurus

Built on React
Supports both technical documentation and blogging
Includes version control and i18n
Functionality can be extended with plugins
Supports writing documents in Markdown and MDX

Language

TypeScript. Although there have been attempts to move away from TS in some recent projects, I personally didn’t see the need. I prefer TS over JS, so I set up the environment with TS.

Package Manager

Docusaurus supports npm, yarn, and pnpm. Having used npm extensively, I wanted to try either pnpm or yarn this time.

In the end, I chose yarn for the following reasons:

There were some aspects of setting up GitHub Actions with pnpm that I didn’t like.
A decisive factor was a post on the Toss tech blog about package managers, which led me to switch to yarn.

With the package manager chosen, you can now start the local dev server and preview the blog.

yarn start

Blog or Docs

Docusaurus offers both docs and blog modes. Since a development blog only needs the blog mode, I considered removing the docs page. However, this would eliminate the main landing page, which felt like a loss of a design element.

After some thought, I decided to keep the landing page and modify the docs format instead of going blog-only.

Mermaid

Mermaid is a tool I frequently use for quickly and easily creating diagrams with code. Docusaurus supports it via a plugin, so let’s include it.

yarn add @docusaurus/theme-mermaid

const config: Config = {
    markdown: {
        mermaid: true,
    },
    themes: ['@docusaurus/theme-mermaid'],
};

For more details, refer to the official documentation.

LaTeX

Occasionally, I need to input mathematical formulas. For data engineers or those who frequently use math, this is even more common. Typically, LaTeX is used for this, so let’s set it up in Docusaurus.

Docusaurus supports LaTeX through the KaTeX plugin.

yarn add remark-math@6 rehype-katex@7

import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';

const config: Config = {
  presets: [
    [
      'classic',
      {
        blog: {
          remarkPlugins: [remarkMath],
          rehypePlugins: [rehypeKatex],
        },
      },
    ],
  ],
};

Include the CSS as well.

const config: Config = {
    stylesheets: [
        {
            href: 'https://cdn.jsdelivr.net/npm/katex@0.13.24/dist/katex.min.css',
            type: 'text/css',
            integrity:
                'sha384-odtC+0UGzzFL/6PNoE8rX/SPcQDXBJ+uRepguP4QkPCm2LBxH3FA3y+fKSiJ+AmM',
            crossorigin: 'anonymous',
        },
    ],
};

Code Block Highlight

Since Java isn’t supported by default, I added it through the prism settings, along with bash. If your frequently used language isn’t highlighted, you can add it as needed.

import {themes as prismThemes} from 'prism-react-renderer';

const config: Config = {
    themeConfig: {
        prism: {
            theme: prismThemes.github,
            darkTheme: prismThemes.dracula,
            additionalLanguages: ['java', 'bash'],
        },
    },
};

Deploying to GitHub Pages

There are several deployment methods, but I wanted to handle everything within GitHub, so I chose GitHub Pages. The default domain (~.github.io) is also clean and convenient.

Let’s set up CI/CD with GitHub Actions to automatically deploy the blog whenever a post is published. First, create a yaml file in ./.github/workflows/. Here’s the content:

name: Deploy to GitHub Pages

on:
  push:
    branches:
      - main

jobs:
  build:
    name: Build Docusaurus
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: yarn

      - name: Install dependencies
        run: yarn install --frozen-lockfile
      - name: Build website
        run: yarn run build

      - name: Upload Build Artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: build

  deploy:
    name: Deploy to GitHub Pages
    needs: build

    permissions:
      pages: write
      id-token: write

    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

Then, set the Source to GitHub Actions in Settings > Pages to complete the deployment. From now on, every push to the main branch will trigger an automatic deployment.

source

Check the Source

If you set the Source to Deploy from a branch, the blog service might be interrupted during the deployment process.

Connecting a Custom Domain

Using the default GitHub domain (~.github.io) is fine, but I decided to buy a domain to give my blog a more professional feel. Having a custom domain will also make it easier to maintain SEO if I switch platforms in the future.

Purchasing and Registering a Domain

Purchased the domain haril.dev from GoDaddy (annual cost: $20)
Registered the custom domain with GitHub Pages.
First, check the IP address of the existing domain.

dig songkg7.github.io

execute-dig
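For an apex domain, the A records at the registrar need to point at the GitHub Pages addresses that dig reports. The following is a minimal sketch for checking that none were missed. The helper name `missingARecords` is hypothetical, and the four addresses are the ones GitHub documents for Pages at the time of writing, so treat them as an assumption and verify them against the dig output or the GitHub Pages documentation.

```typescript
// IPv4 addresses assumed to be the GitHub Pages apex-domain addresses.
// Verify them against `dig` or the GitHub Pages docs before relying on them.
const GITHUB_PAGES_IPS: readonly string[] = [
  '185.199.108.153',
  '185.199.109.153',
  '185.199.110.153',
  '185.199.111.153',
];

// Hypothetical helper: returns the expected addresses that are absent
// from the A records currently configured for the domain.
function missingARecords(configured: readonly string[]): string[] {
  const present = new Set(configured.map((ip) => ip.trim()));
  return GITHUB_PAGES_IPS.filter((ip) => !present.has(ip));
}

// Example: only two of the four records were added at the registrar.
const missing = missingARecords(['185.199.108.153', '185.199.110.153']);
console.log(missing);
```

An empty result means every expected address is already configured.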

Then, go to the GitHub Pages settings and register the purchased domain.

custom-domain

Optional: Verifying the Domain to Prevent Hijacking

Go to Profile Settings -> Pages
Add a TXT record just like you added the IP

before-verifiy

Copy the values from 1 and 2 into the GoDaddy records.

add-a-record

After adding the record, click the Verify button.

verifed-domains

This completes the domain verification, protecting it from hijacking. ~~Though it’s hard to imagine anyone wanting to hijack this domain...~~

Giscus Comments

Docusaurus allows you to modify the design of existing components. By implementing and adding the Giscus component, you can easily add a comment feature.

Reference: Add Giscus Comments to Docusaurus 3 Blog Posts and Doc Pages | Riku Block

Search Engine

While I don’t publish a huge number of posts, once the list of posts exceeded a single page, I felt the need for a search feature. Indexing the posts would allow visitors to search not just by title but also by content, making it easier to find what they’re looking for.

There are two commercial search engines you can use with Docusaurus:

Algolia
- In the free version, the crawler updates the search index once a week.
- Officially recommended by Docusaurus.
- You can manually trigger the index update.
Orama
- Very easy to implement. If no analysis is needed, you can just paste the code.
- Detects deployment triggers and updates the index upon deployment.
- Supports semantic search with OpenAI integration.
- Has a prettier UI compared to the somewhat clunky Algolia. I personally value design as much as functionality.

Algolia’s downside is that it doesn’t automatically update the index upon deployment. Given the better UI/UX, I chose Orama.

orama

Orama’s main page is also beautiful

However, I had to roll back because Orama doesn’t support Korean tokenization yet, making Korean search impossible 😭. Since my main audience is Korean, I had to switch back to Algolia.
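For reference, the Algolia settings live under `themeConfig.algolia` in the Docusaurus config. This is only a sketch of that fragment: the three credential values are placeholders, not values from this blog, and you should replace them with the ones from your own DocSearch application.

```typescript
// Fragment for `themeConfig.algolia` in docusaurus.config.ts.
// All three credential values are placeholders (assumptions), not real keys.
const algolia = {
  appId: 'YOUR_APP_ID',
  apiKey: 'YOUR_SEARCH_API_KEY', // the public, search-only key
  indexName: 'YOUR_INDEX_NAME',
  // Restricts results to the current language and version
  contextualSearch: true,
};

// In docusaurus.config.ts this would sit inside themeConfig:
// const config: Config = { themeConfig: { algolia } };
console.log(Object.keys(algolia));
```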

If Algolia Search Returns No Results

When using the Algolia search API with Docusaurus:

[
  "language:en",
  [
    "docusaurus_tag:default",
    "docusaurus_tag:docs-default-3.2.1",
    "docusaurus_tag:docs-community-current",
    "docusaurus_tag:docs-docs-tests-current"
  ]
]

These parameters are generated dynamically and sent with each request. They act as filters, known as facets in search engines, and include:

docusaurus_tag
lang
language
type
version

Therefore, when creating an Algolia index, these facets must be set. Often, these facets are not set when the DocSearch index is created.

This can happen if the crawler runs before the Docusaurus settings are reflected after DocSearch approval.

The solution is to check if all necessary index facets are present and add any that are missing.
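The check itself can be sketched as a small function. This is an illustration, not part of Docusaurus or the Algolia client: the names `missingFacets` and `bareAttribute` are hypothetical, and the sample input is made up. It assumes you have copied the index's `attributesForFaceting` setting out of the Algolia dashboard, where entries may be wrapped in modifiers such as `searchable(...)` or `filterOnly(...)`.

```typescript
// Facets that the Docusaurus search filters rely on.
const REQUIRED_FACETS: readonly string[] = [
  'docusaurus_tag',
  'lang',
  'language',
  'type',
  'version',
];

// Strips a searchable(...) or filterOnly(...) modifier, if present,
// to get the bare attribute name.
function bareAttribute(entry: string): string {
  const trimmed = entry.trim();
  const match = /^(?:searchable|filterOnly)\((.+)\)$/.exec(trimmed);
  return match ? match[1] : trimmed;
}

// Hypothetical helper: returns the required facets that are not configured.
function missingFacets(attributesForFaceting: readonly string[]): string[] {
  const configured = new Set(attributesForFaceting.map(bareAttribute));
  return REQUIRED_FACETS.filter((facet) => !configured.has(facet));
}

// Sample (made-up) index setting with one facet missing.
const sampleSetting = ['lang', 'searchable(language)', 'type', 'filterOnly(version)'];
const missingFromSample = missingFacets(sampleSetting);
console.log(missingFromSample);
```

Any name the function returns is an attribute to add in the index configuration.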

warning

Disabling contextualSearch by setting it to false will make the search work, but it’s not recommended.

facets

All the above facets must be present. In my case, the docusaurus_tag facet was missing, so I added it manually.

You can add missing attributes in Index > Configuration > Filtering and faceting - Facets by clicking + Add an Attribute.

add-an-attribute

After this, the search should work correctly.

Issue with Search Not Working Immediately After Index Update

After posting a new article and manually triggering the crawler to update the index, the new article wasn’t searchable.
Querying the Algolia index directly showed the new article.
Disabling contextualSearch locally didn’t help.
The issue seemed to be with the API.
The search results were correct immediately after triggering the crawl, but reverted to the old version when tested in a new window.
Suspected that the distributed search network (DSN) technology might be causing a delay in reflecting the updated index.
The issue resolved itself after about a day.

i18n

I’ve had a significant number of visitors from English-speaking countries and wanted to run an English blog for both practice and broader reach. Docusaurus supports i18n, so I decided to implement it.
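Before generating translation files, the locales have to be declared in the `i18n` field of the config. The fragment below is a sketch that assumes Korean is the default locale and English is the one being added. The locale codes are my assumption based on the description above, not a copy of the actual config.

```typescript
// Fragment for the `i18n` field of docusaurus.config.ts.
// Assumption: Korean is the default locale and English is the added one.
const i18n = {
  defaultLocale: 'ko',
  locales: ['ko', 'en'],
};

// In docusaurus.config.ts this would be placed in the config object:
// const config: Config = { i18n };
console.log(i18n.locales.join(', '));
```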

yarn run write-translations --locale en

Docusaurus

Now, the challenge is translating the documents.

Manual Translation...?

I want articles written in Korean to be translated into English.
If only the title.md file is included in the PR, could it be translated and included as /en/title.md?
There doesn’t seem to be a well-known solution for this.
Tried using Sweep AI, but it wasn’t designed for translation.
How about using GitHub Actions with the OpenAI or DeepL API?
Some basic pages like the 404 page might not redirect to the translated page depending on the hosting service. GitHub Pages doesn’t support this.
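Whatever tool does the translating, it has to write each translated post where Docusaurus looks for it. In a default setup, translated blog posts are read from `i18n/<locale>/docusaurus-plugin-content-blog/`. A minimal sketch of that path mapping follows. The function name `translatedBlogPath` is hypothetical, and it assumes the posts live under the default `blog/` directory.

```typescript
// Hypothetical helper: maps a source post path to the location where
// Docusaurus expects its translation (assuming the default blog plugin
// with posts stored under blog/).
function translatedBlogPath(sourcePath: string, locale: string): string {
  const prefix = 'blog/';
  if (!sourcePath.startsWith(prefix)) {
    throw new Error(`Not a blog post path: ${sourcePath}`);
  }
  const relative = sourcePath.slice(prefix.length);
  return `i18n/${locale}/docusaurus-plugin-content-blog/${relative}`;
}

// Example with a made-up post path.
const englishPath = translatedBlogPath('blog/2024-07-22/title.md', 'en');
console.log(englishPath);
```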

I found that such a GitHub action already exists (great minds think alike).

GPT Translate · Actions · GitHub Marketplace · GitHub

It requires a paid OpenAI API token, but since I already had one, it wasn’t a big deal.

🌐 Add LLM Translations by github-actions[bot] · Pull Request #10 · songkg7/haril.dev · GitHub

It works quite well. Since I’m also studying Japanese, I plan to provide content in English and Japanese after writing an article and proofreading the translations.

SEO

Enrich the front matter
- Docusaurus generates meta information based on the front matter.
You can easily check meta information on OpenGraph.

UpdateAt and UpdateBy

Just enable the option; no additional front matter is needed.

Since the update time is derived from the git history, check the fetch-depth option of actions/checkout in your GitHub Actions workflow. By default it fetches only the latest commit, so set fetch-depth: 0 to fetch the full history.

I rolled back the updateBy feature because:

Automated tasks showed GitHub Action as the modifier, which I didn’t like.
Although I aim for collaboration by exposing the edit button, I don’t expect many PRs.

For now, updateAt alone seems sufficient.
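As a sketch, the setup described above might look like the fragment below. It assumes that "updateAt" and "updateBy" correspond to the `showLastUpdateTime` and `showLastUpdateAuthor` options of the Docusaurus content plugins, and that the installed Docusaurus version supports them for the blog plugin. Check the plugin documentation for your version.

```typescript
// Fragment for the `blog` options of the classic preset.
// Assumption: these option names are available for the blog plugin in
// the installed Docusaurus version.
const blogOptions = {
  showLastUpdateTime: true, // show when the post was last modified
  showLastUpdateAuthor: false, // hide who made the last change
};

// In docusaurus.config.ts this would be merged into the preset's blog options:
// presets: [['classic', { blog: { ...blogOptions } }]]
console.log(blogOptions);
```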

Migration Tip: Adjusting Folder Structure

ls-tree

Existing directory structure. Markdown documents are directly under the _posts folder.

Docusaurus supports organizing files by folders, which is convenient for grouping resources like images. Given the number of posts, let’s write a shell script to adjust the structure in one go.

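#!/bin/bash
# Run from inside the _posts directory.

# Pass 1: move each post into a directory named after its date prefix
for file in *.md; do
  # Extract the yyyy-MM-dd prefix from the filename
  date=$(echo "$file" | rg -o '^\d{4}-\d{2}-\d{2}')

  # Skip files that do not start with a date
  [ -z "$date" ] && continue

  # Create a directory for the date (ignore if it already exists)
  mkdir -p "$date"

  # Move the file to the directory
  mv "$file" "$date/"
done

# Pass 2: remove the yyyy-MM-dd- prefix from every moved file
for path in */*.md; do
  dir=$(dirname "$path")
  new_filename=$(basename "$path" | sed 's/^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}-//')

  # Rename only when the name actually changes
  [ "$path" != "$dir/$new_filename" ] && mv "$path" "$dir/$new_filename"
done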

after-script-execution

All files are neatly moved in one go. Next, move the resources referenced in the posts to the appropriate directories.

Use the ripgrep command to find references to local resources.

ripgrep

With the relative paths of the files, you can move them accordingly.

Move each resource file to blog/{yyyy-MM-dd}.
Modify the references in the posts to ./resource.webp instead of img.

Using a shell script would make this quick and easy, but since there were only a few resource images, I did it manually.

I’ll leave this as an exercise for you. 😜 (Asking GPT for help would make this a breeze)
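For anyone who does want to script it, here is one possible sketch of the reference-rewriting half. It assumes the old posts reference images as `![alt](/img/name.ext)` or `![alt](img/name.ext)`. That pattern is my assumption, so adjust the regular expression to whatever ripgrep actually finds in your posts. The function name `rewriteImageRefs` is hypothetical.

```typescript
// Rewrites Markdown image links that point into an img directory so that
// they reference a file sitting next to the post instead.
// Assumption: old references look like ![alt](/img/name.ext) or
// ![alt](img/name.ext). Subdirectories under img/ are flattened.
function rewriteImageRefs(markdown: string): string {
  return markdown.replace(
    /(!\[[^\]]*\]\()\/?img\/(?:[^)\s]*\/)?([^)\s\/]+)/g,
    '$1./$2',
  );
}

// Example with a made-up post body.
const before = 'Intro\n\n![diagram](/img/diagram.webp)\n\nOutro';
const after = rewriteImageRefs(before);
console.log(after);
```

Because subdirectories are flattened, two images with the same filename in different folders would collide. With only a handful of images that is easy to check by hand.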

Image Alt-Text

I used to neglect alt-text for images, but I learned that it is essential for accessibility because it provides a text alternative for people using screen readers.
GitHub has added a workflow for this, which is worth checking out.
- GitHub - github/accessibility-alt-text-bot: An action to remind users to add alt text on Issues, Pull Requests, and Discussions

Conclusion

Here’s a brief KPT (Keep, Problem, Try) retrospective.

Regret not setting up redirects when impulsively shutting down the old blog and creating a new one.
- Considering existing visitors, I should have at least set up redirects.
- Tried to keep URLs unchanged, but it didn’t work out as planned.
Not setting up redirects was a significant mistake. The traffic logs show consistent 404 errors.
Let’s pay more attention to aspects that were hard to maintain elegantly on the previous platform.
- e.g., design, applying frontend technologies
