Rust Cured My Mental Exhaustion

Author | Jonas Hietala
Translator | 核子可乐
Editor | 褚杏娟

For the past nine years, I have been using Hakyll as my static site generator. Before that I mainly used Jekyll, and earlier still I built dynamic pages in Perl with Mojolicious and PHP with Kohana. I only have vague memories of those days, though, since I wasn't tracking my work in Git yet, so most traces of that development are lost.

Now, I have finally made up my mind to switch to a custom site generator written in Rust. With this rewrite, I mainly aim to solve three issues:

First, speed: the site has become increasingly slow to build. On my low-spec laptop, a complete site rebuild takes about 75 seconds (excluding compilation, just site generation). My blog has only 240 posts, so it shouldn't be this slow. Although I have a decent caching system in place and only run the watch command while editing, overall execution is still too slow.

Second, external dependencies. The site generator itself is written in Haskell, but it pulls in many Haskell libraries as well as other tools: my blog helper scripts are written in Perl, I use sassc for SASS conversion, Python's pygments for syntax highlighting, and s3cmd to upload the generated site to S3. Managing and updating so many dependencies is genuinely annoying, and I want to eliminate the hassle and focus back on the blog content.

Third, setup issues. Related to the numerous dependencies, my blog setup sometimes breaks, and it takes time to debug and fix. Sometimes, just when I have a spark of inspiration, something breaks, and I have to fix the site generator before I can write.

You might ask: what could possibly break on such a simple site? The main culprit is usually updates, which cause issues in unexpected ways. For example:

  • After a GHC update, the cabal packages can no longer be found.

  • Running the Haskell binary fails with:

[ERROR] Prelude.read: no parse

(This only occurs on the desktop; it runs fine on my low-spec laptop.)

Or the following Perl error:

Magic.c: loadable library and perl binaries are mismatched (got handshake key 0xcd00080, needed 0xeb00080)

(This only occurs on the laptop; it runs fine on the desktop.)

  • Different versions of Hakyll change Pandoc parameters, which broke code rendering in the Atom feed.

I know these are not huge problems, but I just want to write a blog post easily, so having it run smoothly is my top priority.

1 Haskell Caused My Mental Exhaustion

Actually, I quite like Haskell, especially its purely functional style. I also appreciate the declarative approach Hakyll takes to site configuration. For example, to generate static (i.e., standalone) pages:

match "static/*.markdown" $ do    route   staticRoute    compile $ pandocCompiler streams        >>= loadAndApplyTemplate "templates/static.html" siteCtx        >>= loadAndApplyTemplate "templates/site.html" siteCtx        >>= deIndexUrl

Even without knowing what $ and >>= mean, you can still see that we match files in the static/ folder, send them through pandocCompiler (to convert the source markdown), apply the templates, and finally de-index the URLs (so links don't end with index.html).

How simple and clear!

However, I haven’t used Haskell for many years, so every time I need to add a slightly more complex feature to the site, it requires a tremendous amount of effort.

For example, I wanted to add next/previous links in the posts, but found it difficult to implement easily. In the end, I had to take time to relearn Haskell and Hakyll. Even so, the solution I came up with was very slow, relying on linear search to find the next/previous post. Even now, I still don’t know how to implement this feature correctly through Hakyll.

I believe many experts have good methods, but for me, such a small feature consumes too much mental energy, which is truly unbearable.

2 Why Choose Rust?

  1. I enjoy using Rust, and my preference is enough to determine the implementation method for such hobby projects.

  2. Rust has strong performance, and text conversion is precisely a task it excels at.

  3. Cargo is very hassle-free. After installing Rust, I can just run cargo build and wait for the results.

Why reinvent the wheel? Because I wanted to take the initiative and see what kind of static site generator I could write myself. It shouldn't be too difficult, it gives me full control over my blog site, and it offers functionality and flexibility beyond existing generators. Of course, I know that flexible generators such as cobalt already exist. I just want that flexibility plus the fun of solving the problems myself.

Due to space limitations, I cannot walk through the entire build process in this article. Interested readers can view the project source code at the link below.

(https://github.com/treeman/jonashietala)

Breaking Down the “Hard Nuts”

Initially, I was worried that I wouldn’t be able to reproduce the various Hakyll features I was familiar with, such as template engines, syntax highlighting for multiple languages, or the watch command that automatically regenerates edited pages and acts as a file server, allowing me to view posts in the browser while writing.

But it turns out that each “hard nut” has a corresponding ideal tool. Here are some of the excellent libraries I used:

  • Using tera as the template engine. It is more powerful than Hakyll's templates because it can perform complex operations like loops (a rendering sketch follows after this list):

<div class="post-footer">  <nav class="tag-links">      Posted in {% for tag in tags %}{% if loop.index0 > 0 %}, {% endif %}<a href="{{ tag.href }}">{{ tag.name }}</a>{% endfor %}.  </nav></div>
  • Using pulldown-cmark to parse Markdown.

pulldown-cmark handles the standard CommonMark specification excellently and is fast, but it doesn't support nearly as much as Pandoc, so I had to implement the extensions I rely on separately. More on this later.

  • Using syntect for syntax highlighting, which supports Sublime Text syntax.

  • Using yaml-front-matter to parse metadata in posts.

  • Using grass as a pure Rust SASS compiler.

  • Using axum to create a static file server responsible for hosting the site locally.

  • Using hotwatch to monitor file changes, so that the page updates when the file content changes.

  • Using scraper to parse the generated HTML. I need this for some tests and specific conversions.

  • Using rust-s3 to upload the generated site to S3 storage.
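To show how the template side fits together, here is a minimal rendering sketch with tera; the Tag struct and the post_footer.html template name are my stand-ins for illustration, not the repo's actual names:

use serde::Serialize;
use tera::{Context, Tera};

// A hypothetical tag type matching the fields the template above uses.
#[derive(Serialize)]
struct Tag {
    href: String,
    name: String,
}

fn render_footer() -> tera::Result<String> {
    // Load all templates matching the glob once, up front.
    let tera = Tera::new("templates/**/*.html")?;

    let tags = vec![
        Tag { href: "/blog/tags/rust".into(), name: "rust".into() },
        Tag { href: "/blog/tags/blog".into(), name: "blog".into() },
    ];

    // Expose `tags` to the template, then render it.
    let mut ctx = Context::new();
    ctx.insert("tags", &tags);
    tera.render("post_footer.html", &ctx)
}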

Even with these libraries, my Rust source files still exceed 6,000 lines. I admit that Rust code can be a bit verbose, and my own skill level isn't high, but the workload for this project turned out to be much larger than expected. (Then again, software projects always seem to go that way…)

Markdown Conversion

Sticking to standard markdown would have made this step unnecessary, but over the years my posts have come to rely on many features and extensions that pulldown-cmark doesn't support, so I had to implement them myself.

Preprocessing

I set up a preprocessing step to create figures containing multiple images. It is a generic step that operates on blocks of the following form:

::: <type><content>:::

I use it for different types of image collections, such as Flex, Figure, and Gallery. Here’s an example:

::: Flex
/images/img1.png
/images/img2.png
/images/img3.png
Figcaption goes here
:::

It will be converted to:

<figure class="flex-33"><img src="/images/img1.png" /><img src="/images/img2.png" /><img src="/images/img3.png" /><figcaption>Figcaption goes here</figcaption></figure>

How is this achieved? Of course, using regular expressions!

use lazy_static::lazy_static;
use regex::{Captures, Regex};
use std::borrow::Cow;

lazy_static! {
    static ref BLOCK: Regex = Regex::new(
        r#"(?xsm)
        ^
        # Opening :::
        :{3}
        \s+
        # Parsing id type
        (?P<id>\w+)
        \s*
        $
        # Content inside
        (?P<content>.+?)
        # Ending :::
        ^:::$
        "#
    )
    .unwrap();
}

pub fn parse_fenced_blocks(s: &str) -> Cow<str> {
    BLOCK.replace_all(s, |caps: &Captures| -> String {
        parse_block(
            caps.name("id").unwrap().as_str(),
            caps.name("content").unwrap().as_str(),
        )
    })
}

fn parse_block(id: &str, content: &str) -> String {
    ...
}

(The image and graphic parsing part is too long, so let’s skip it for now.)
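For a flavor of what the elided parse_block does, here is a hypothetical, much-simplified version handling only the Flex case; the real implementation in the repo handles more types and details:

// A hypothetical, simplified parse_block: dispatch on the block id
// and wrap the content lines accordingly.
fn parse_block(id: &str, content: &str) -> String {
    match id {
        "Flex" => {
            let mut imgs = String::new();
            let mut caption = String::new();
            for line in content.trim().lines() {
                if line.starts_with('/') {
                    // Lines starting with `/` are image paths.
                    imgs.push_str(&format!(r#"<img src="{}" />"#, line.trim()));
                } else {
                    // Anything else becomes the figcaption.
                    caption.push_str(line.trim());
                }
            }
            format!(
                r#"<figure class="flex-33">{imgs}<figcaption>{caption}</figcaption></figure>"#
            )
        }
        // Figure, Gallery, ... would be handled similarly.
        _ => content.to_string(),
    }
}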

Extending pulldown-cmark

I also extended pulldown-cmark with my own conversions:

// Issue a warning during the build process if any markdown link is broken.
let transformed = Parser::new_with_broken_link_callback(s, Options::all(), Some(&mut cb));
// Demote headers (eg h1 -> h2), give them an "id" and an "a" tag.
let transformed = TransformHeaders::new(transformed);
// Convert standalone images to figures.
let transformed = AutoFigures::new(transformed);
// Embed raw youtube links using iframes.
let transformed = EmbedYoutube::new(transformed);
// Syntax highlighting.
let transformed = CodeBlockSyntaxHighlight::new(transformed);
let transformed = InlineCodeSyntaxHighlight::new(transformed);
// Parse `{ :attr }` attributes for blockquotes, to generate asides for instance.
let transformed = QuoteAttrs::new(transformed);
// parse `{ .class }` attributes for tables, to allow styling for tables.
let transformed = TableAttrs::new(transformed);

Header demotion and embedding raw YouTube links turned out to be quite simple to implement (a rough sketch of the header transform follows below). In hindsight, though, embedding YouTube links might have been better done in a pre- or postprocessing step.
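As an illustration of how such a transform can work, here is a minimal iterator adapter in the same spirit as TransformHeaders, written against the pulldown-cmark 0.8-style API where Tag::Heading carries a numeric level; the DemoteHeadings name and the omission of the id/a handling are my simplifications:

use pulldown_cmark::{Event, Tag};

// Demote headings by one level (h1 -> h2, ...), capped at h6.
struct DemoteHeadings<I> {
    inner: I,
}

impl<I> DemoteHeadings<I> {
    fn new(inner: I) -> Self {
        Self { inner }
    }
}

impl<'a, I: Iterator<Item = Event<'a>>> Iterator for DemoteHeadings<I> {
    type Item = Event<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.inner.next() {
            // Bump the level on both the start and end events.
            Some(Event::Start(Tag::Heading(lvl))) => {
                Some(Event::Start(Tag::Heading((lvl + 1).min(6))))
            }
            Some(Event::End(Tag::Heading(lvl))) => {
                Some(Event::End(Tag::Heading((lvl + 1).min(6))))
            }
            other => other,
        }
    }
}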

Pandoc supports adding attributes and classes to any element, which is very useful. So the following part:

![](/images/img1.png){ height=100 }

can be converted to this:

<figure>  <img src="/images/img1.png" height="100"></figure>

This functionality is used everywhere, so I decided to reimplement it in Rust, though in a less generic way.

Another Pandoc feature I relied on is evaluating markdown inside HTML tags. Without it, the following renders incorrectly:

<aside>My [link][link_ref]</aside>

I initially planned to implement this in the general preprocessing step, but then link references defined elsewhere in the document were always missing. So in the following example:

::: Aside
My [link][link_ref]
:::

[link_ref]: /some/path

the link is never converted into an actual link, since parsing only covers the content inside the ::: block, where [link_ref] is not defined.

> Some text{ :notice }

This invokes a notice parser, which in the example above creates an <aside> tag (instead of a <blockquote> tag) while keeping the parsed markdown intact.

Although existing crates use syntect to highlight code blocks, I also wrote a feature that wraps inline code in a <code> tag, so that a snippet like let x = 2; can be highlighted inline within a row of text.

Performance Improvements

I didn't spend much time optimizing performance, but I did find two easy wins.

The first: when using syntect with custom syntax definitions, dump the SyntaxSet to a binary format instead of re-parsing the syntax files at startup.
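A minimal sketch of what this can look like, using syntect's dumps module; the "syntaxes" folder and "syntaxes.bin" file names are my assumptions:

use syntect::dumps::{dump_to_file, from_dump_file};
use syntect::parsing::{SyntaxSet, SyntaxSetBuilder};

// Build the SyntaxSet (including custom .sublime-syntax files) once,
// e.g. in a build script, and dump it to a binary file.
fn build_dump() -> Result<(), Box<dyn std::error::Error>> {
    let mut builder = SyntaxSetBuilder::new();
    builder.add_from_folder("syntaxes", true)?;
    let set = builder.build();
    dump_to_file(&set, "syntaxes.bin")?;
    Ok(())
}

// At runtime, load the binary dump, which is much faster than
// re-parsing the syntax definitions on every startup.
fn load_dump() -> SyntaxSet {
    from_dump_file("syntaxes.bin").expect("failed to load syntax dump")
}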

The second is to use rayon to render in parallel. Rendering here means parsing markdown, applying templates, and creating the output files. Rayon shines at this kind of task, which is limited only by CPU performance, and it is very easy to use (as long as the code is structured appropriately). Here's a simplified example of rendering:

fn render(&self) -> Result<()> {    let mut items = Vec::new();     // Add posts, archives, and all other files that should be generated here.    for post in &self.content.posts {        items.push(post.as_ref());    }     // Render all items.    items        .iter()        .try_for_each(|item| self.render_item(*item))}

To achieve parallelization, we just need to change iter() to par_iter():

use rayon::iter::{IntoParallelRefIterator, ParallelIterator};

items
    .par_iter() // This line
    .try_for_each(|item| self.render_item(*item))

It’s that simple!

I'll also admit that the improvement from this was limited; the real performance gains come from the libraries I use. For example, my old site shelled out to an external pygments process written in Python for syntax highlighting, while the replacement is a highlighter written in Rust, which is not only much faster but also easier to parallelize.

Robustness Checks

Maintaining my own website has shown me how easy it is to make mistakes. For instance, I might inadvertently link to a non-existent page or image, or define a link reference like [my link][todo] and forget to fill it in before publishing.

Therefore, in addition to testing basic functionality like the watch command, I also parse the entire generated site and check that all internal links exist and are correct (this also validates the some-title fragment in /blog/my-post#some-title). External links need checking too, but I do that with a manual command.
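As a rough illustration of the internal-link check, here is a minimal sketch using scraper; check_internal_links and rendered_paths are hypothetical names, and the real checks in the repo do more:

use scraper::{Html, Selector};
use std::collections::HashSet;

// Collect every `href` in a rendered page and report the internal
// ones that don't correspond to a generated output path.
fn check_internal_links(html: &str, rendered_paths: &HashSet<String>) -> Vec<String> {
    let doc = Html::parse_document(html);
    let sel = Selector::parse("a[href]").unwrap();
    let mut broken = Vec::new();
    for el in doc.select(&sel) {
        if let Some(href) = el.value().attr("href") {
            // Only check site-internal links; external ones are checked manually.
            if href.starts_with('/') {
                // Strip any #fragment before looking up the path.
                let path = href.split('#').next().unwrap_or(href);
                if !rendered_paths.contains(path) {
                    broken.push(href.to_string());
                }
            }
        }
    }
    broken
}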

I also adopted fairly strict checks during generation, treating suspicious output as errors so that odd mistakes are caught as early as possible.

3 How Did It Turn Out?

At the beginning of the article, I listed some of the previous issues in the setup. Now let’s take a look at how they were specifically resolved.

  • Performance

Now, on the same low-spec laptop, a complete site rebuild (excluding compilation time) takes only 4 seconds, an 18x improvement, which is quite impressive. Of course, there is still room for more: I use rayon for file I/O, which an asynchronous approach could improve further, and I have no caching system, so every build regenerates all files (though the watch mode is reasonably smart about what it regenerates).

Please note that I am not saying Rust is necessarily faster than Haskell; I am only comparing two specific implementations here. I believe there are definitely experts who can achieve the same speed improvements in Haskell.

  • Single Dependency

Now everything is implemented in Rust, and there are no external scripts or tools to install and maintain.

  • Cargo is Hassle-Free

As long as Rust is installed on the system, cargo build just works. I think this might be one of Rust's most outstanding advantages: the build system doesn't cause trouble.

There is no manually hunting down missing dependencies, no sacrificing features for cross-platform compatibility, and no breakage when the build system automatically pulls in updates. Just lean back in your chair and wait for the code to compile.

4 Rust Cured My Mental Exhaustion

Although I found that building article series or next/previous links is indeed easier in Rust, I am not saying that Rust is simpler or easier to use than Haskell. What I mean is that Rust is easier for me to understand than Haskell.
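As a taste of why such features felt easier, here is a hypothetical sketch: with all posts sorted by date in a Vec, next/previous links are just adjacent indices. The Post and Neighbors types are illustrative names, not the repo's actual ones:

// Illustrative only: a minimal Post type standing in for the real one.
struct Post {
    title: String,
}

struct Neighbors<'a> {
    prev: Option<&'a Post>,
    next: Option<&'a Post>,
}

// With posts sorted by date, the previous/next post of the item at
// index `i` are simply its neighbors in the Vec, found in O(1) time
// rather than by a linear search.
fn neighbors(posts: &[Post], i: usize) -> Neighbors<'_> {
    Neighbors {
        prev: if i > 0 { posts.get(i - 1) } else { None },
        next: posts.get(i + 1),
    }
}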

The biggest difference is likely due to practical experience. I have been using Rust recently, while I have hardly interacted with Haskell since I created the website ten years ago. So if I also don’t touch Rust for ten years, it would definitely be painful to use it again.

Overall, I am very satisfied with this attempt. It was a fun and rewarding project, and although the workload exceeded my expectations, it did indeed eliminate the long-standing issues that troubled me. I hope my experience can be helpful to everyone.

Original link:

https://www.jonashietala.se/blog/2022/08/29/rewriting_my_blog_in_rust_for_fun_and_profit
