COCl2's blog home

Improve the performance of your Rust functions with const currying

Currying is a functional programming technique that lets you partially apply a function’s arguments and get back a new function that takes the remaining ones. It is widely used in functional programming languages like Haskell as a fundamental tool for many design patterns. Today, however, we will use the technique in Rust to improve the performance of our functions.
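As a quick refresher, here is what ordinary value-level currying looks like in Rust, sketched with a closure (a minimal illustrative example, not part of the technique below):

```rust
// Value-level currying: fix the first argument now, supply the rest later.
fn add(x: i32) -> impl Fn(i32) -> i32 {
    move |y| x + y
}

fn main() {
    let add_one = add(1); // partially applied: only `y` remains
    assert_eq!(add_one(41), 42);
}
```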

TL;DR: You can also take a look at the proc macro const-currying directly.

Const generics

Const generics is a feature that allows you to use constant values as generic parameters. A function with const generics is monomorphized at compile time for each constant value passed in, allowing for more compiler optimizations.

We can take a simple example to demonstrate how const generics work:

#[inline(never)]
pub fn f<const B: bool>(x: i32) -> i32 {
    if B {
        x + 1
    } else {
        -x + 1
    }
}

pub fn g(x: i32, y: i32) -> (i32, i32) {
    let x = f::<true>(x);
    let y = f::<false>(y);
    (x, y)
}

We add the #[inline(never)] attribute to prevent the compiler from inlining the function f. Here the function is simple enough to be inlined, but in a real-world scenario it might be too complex for that.

Here is the assembly generated under the -C opt-level=3 option:

example::f::hc5481d4b82c18b89:
        mov     eax, 1
        sub     eax, edi
        ret

example::f::heed6faf69bf7a990:
        lea     eax, [rdi + 1]
        ret

example::g::h7afc5ee1582b3b49:
        push    rbp
        push    rbx
        push    rax
        mov     ebx, esi
        call    example::f::heed6faf69bf7a990
        mov     ebp, eax
        mov     edi, ebx
        call    example::f::hc5481d4b82c18b89
        mov     edx, eax
        mov     eax, ebp
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret

Two symbols are generated for the function f: one for f::<true> and one for f::<false>. Each is fully optimized for its constant value. For comparison, if we remove the const generic and control the branch with a runtime parameter instead, the generated assembly becomes:

#[inline(never)]
pub fn f(B: bool, x: i32) -> i32 {
    if B {
        x + 1
    } else {
        -x + 1
    }
}

pub fn g(x: i32, y: i32) -> (i32, i32) {
    let x = f(true, x);
    let y = f(false, y);
    (x, y)
}

example::f::h3cb1252c09b7e367:
        mov     eax, esi
        neg     eax
        test    edi, edi
        cmovne  eax, esi
        inc     eax
        ret

example::g::h7afc5ee1582b3b49:
        push    rbp
        push    r14
        push    rbx
        mov     ebx, esi
        mov     esi, edi
        mov     r14, qword ptr [rip + example::f::h3cb1252c09b7e367@GOTPCREL]
        mov     edi, 1
        call    r14
        mov     ebp, eax
        xor     edi, edi
        mov     esi, ebx
        call    r14
        mov     edx, eax
        mov     eax, ebp
        pop     rbx
        pop     r14
        pop     rbp
        ret

The single function f::h3cb1252c09b7e367 has many more instructions than either of the two functions generated from the const generic version.

In the current Rust version, only primitive types such as integers, bool, and char can be used as const parameters; richer types must wait until the adt_const_params feature is stabilized.
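To make the restriction concrete, here is a small sketch of what stable Rust accepts today (the `repeat_char` name exists only for this example):

```rust
// Integers, `bool`, and `char` are accepted as const parameters on stable.
fn repeat_char<const N: usize>(c: char) -> String {
    std::iter::repeat(c).take(N).collect()
}

// Something like `fn f<const S: &'static str>()` would require the
// unstable `adt_const_params` feature.
fn main() {
    assert_eq!(repeat_char::<3>('x'), "xxx");
}
```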

Runtime parameter

Unfortunately, in real-world scenarios we often need to make arguments configurable at runtime, which makes it impossible to take advantage of const generics. There are usually two cases we want to optimize:

  1. Some call sites pass constant values, while others pass runtime values.
  2. The function is always called with a runtime-configurable value that usually equals some default. A typical example is the LIKE expression in SQL, which accepts an optional escape character that is almost always \.

In the first case, we can expect the compiler to inline the calls with constant values and generate optimized code; however, if the function is too complex to inline, those benefits are lost. The second case is even worse: the compiler knows nothing about the runtime default value, so it cannot optimize for it.

Const currying by hand

We can manually create two versions of the function: one taking const parameters, the other accepting runtime parameters.

fn like_ct<const ESCAPE: char>(s: &str, p: &str) -> bool { todo!() /* complex logic */ }
fn like_rt(s: &str, p: &str, escape: char) -> bool { todo!() /* complex logic */ }

Note: like_ct::<'\\'> is a curried version of like_rt with the escape character fixed to \, which is why I call this technique const currying.

We can also create a dispatch function for a highly biased calling pattern:

#[inline(always)]
fn like(s: &str, p: &str, escape: char) -> bool {
    if escape == '\\' {
        like_ct::<'\\'>(s, p)
    } else {
        like_rt(s, p, escape)
    }
}

In most cases, the function like is more efficient than like_rt. It should be marked #[inline(always)], which tells the compiler to always inline it. There are several cases:

  1. If the constant value \ is passed in, the branch is optimized away and like_ct::<'\\'> is called directly.
  2. If the runtime value \ is passed in, like_ct::<'\\'> is called with only the overhead of a simple branch.
  3. If any other runtime value is passed in (unlikely), like_rt is called and produces a correct result through the unoptimized path.
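Putting the whole pattern together, here is a runnable sketch that replaces the real LIKE matcher with a trivial stand-in (counting escape characters), since only the dispatch shape matters here:

```rust
// Simplified stand-in for the LIKE logic: count occurrences of the
// escape character. The real matcher would be far more complex; this
// only demonstrates the dispatch shape.
fn count_escapes_ct<const ESCAPE: char>(p: &str) -> usize {
    p.chars().filter(|&c| c == ESCAPE).count()
}

fn count_escapes_rt(p: &str, escape: char) -> usize {
    p.chars().filter(|&c| c == escape).count()
}

// Dispatch on the common constant so the hot path reaches the
// monomorphized version; #[inline(always)] lets the branch fold away
// when `escape` is itself a constant at the call site.
#[inline(always)]
fn count_escapes(p: &str, escape: char) -> usize {
    if escape == '\\' {
        count_escapes_ct::<'\\'>(p)
    } else {
        count_escapes_rt(p, escape)
    }
}

fn main() {
    assert_eq!(count_escapes(r"a\b\c", '\\'), 2); // hot, const-specialized path
    assert_eq!(count_escapes("a%b%c", '%'), 2); // cold, runtime path
}
```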

const-currying-rs

In the previous section we had to duplicate the complex logic in two functions, which is error-prone and hard to maintain. Things get even worse with more than one constant parameter, such as like_ct::<'\\', true> representing a case-sensitive LIKE expression with \ as the escape character. Since different curried combinations may be used in different places, up to $2^n$ functions have to be maintained, where $n$ is the number of constant parameters.

To solve this problem, I created the proc macro const-currying to generate the code for you.

Here is a simple example:

use const_currying::const_currying;

#[const_currying]
fn f1(
    #[maybe_const(dispatch = x, consts = [0, 1])] x: i32,
    #[maybe_const(dispatch = y, consts = [true, false])] y: bool,
    z: &str,
) -> (i32, String) {
    if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}

The following code will be generated:

#[inline(always)]
fn f1(x: i32, y: bool, z: &str) -> (i32, String) {
    match (x, y) {
        (1, false) => f1_x_y::<1, false>(z),
        (1, true) => f1_x_y::<1, true>(z),
        (0, false) => f1_x_y::<0, false>(z),
        (0, true) => f1_x_y::<0, true>(z),
        (x, false) => f1_y::<false>(x, z),
        (x, true) => f1_y::<true>(x, z),
        (1, y) => f1_x::<1>(y, z),
        (0, y) => f1_x::<0>(y, z),
        (x, y) => f1_orig(x, y, z),
    }
}
#[allow(warnings)]
fn f1_orig(x: i32, y: bool, z: &str) -> (i32, String) {
    if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
#[allow(warnings)]
fn f1_x<const x: i32>(y: bool, z: &str) -> (i32, String) {
    if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
#[allow(warnings)]
fn f1_y<const y: bool>(x: i32, z: &str) -> (i32, String) {
    if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
#[allow(warnings)]
fn f1_x_y<const x: i32, const y: bool>(z: &str) -> (i32, String) {
    if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
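To see the dispatcher in action, here is a condensed, runnable version of the generated code, simplified to a single const parameter for brevity (the names mirror the macro output above):

```rust
// Condensed version of the macro output: one const parameter instead
// of two, so the dispatch shape stays readable.
fn f1_y<const Y: bool>(x: i32, z: &str) -> (i32, String) {
    if Y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}

// Callers keep calling `f1` as before; the match forwards every call
// to a monomorphized version.
#[inline(always)]
fn f1(x: i32, y: bool, z: &str) -> (i32, String) {
    match y {
        true => f1_y::<true>(x, z),
        false => f1_y::<false>(x, z),
    }
}

fn main() {
    assert_eq!(f1(1, true, "abc"), (1, "abc".to_string()));
    assert_eq!(f1(5, false, "ab"), (-5, "ba".to_string()));
}
```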

Performance benefits

“No silver bullet” is a well-known principle in software engineering. This macro is no silver bullet either, and it’s not always the best choice. At the very least, it may heavily increase your binary size, and it will likely yield no great performance improvement if there are not many branches or allocations to eliminate. You should always profile your code before and after applying the macro to make sure performance actually improves.

In a benchmark of a LIKE expression with two const parameters, calling the generated dispatch function achieves about a 5% performance improvement over the original function called with runtime default parameters. That is not very exciting, but still worth trying on a critical path.

use const_currying::const_currying;
#[const_currying]
fn like(
    s: &str,
    p: &str,
    #[maybe_const(consts = [b'\\'])] escape: u8,
    #[maybe_const(consts = [true, false])] case_sensitive: bool,
) -> bool {
    // complex logic
}

Keyword generics?

Keyword generics is a new feature proposed by the Rust team that allows code to be generic over keywords such as const and async. Although the proposal doesn’t mention constness in argument position, it’s still possible to imagine a syntax like:

fn like(
    s: &str,
    p: &str,
    const? escape: u8,
    const? case_sensitive: bool,
) -> bool {
    // complex logic
}

The compiler would guarantee that a specialized version of the function is generated for each tuple of constant values passed in through arguments marked const?. If the compiler could do this automatically by analyzing call sites, many unused branches and much dead code could be eliminated.

Optimized automatically by a JIT runtime

I’m not a compiler expert, but I suspect this optimization is already performed by runtimes with JIT support, such as V8. A JIT runtime can analyze hot paths and generate specialized code, and there is no fundamental difference between specializing on a runtime type and specializing on a runtime value, so this should be easy for a JIT to achieve. The same optimization could likely also be done by an AOT compiler with profile-guided optimization.

Conclusion

The const-currying macro helps you generate code optimized for specific values, which is useful in performance-critical scenarios. Generally, the crate is more of a workaround, or a hint to the current Rust compiler. I hope the crate can be deprecated in a future Rust version by ergonomic built-in compiler features.