Improve performance of you Rust functions by const currying
Currying is a functional programming technique that allows you to partially apply a function’s arguments and return a new function that takes the remaining arguments. This is widely used in functional programming languages like Haskell, as a fundamental tool for many design patterns. However, today we use the technique in Rust to improve the performance of our functions.
TL;DR: You can also take a look to the proc macro const-currying directly.
Const generics
Const generics is a feature that allows you to use constant values as type parameters. A function with const generics will be monomorphized at compile time for each constant values passed in, allowing for more compiler optimizations.
We can take a simple example to demonstrate how const generics work:
#[inline(never)]
pub fn f<const B: bool>(x: i32) -> i32 {
if B {
x + 1
} else {
-x + 1
}
}
pub fn g(x: i32, y: i32) -> (i32, i32) {
let x = f::<true>(x);
let y = f::<false>(y);
(x, y)
}
We add the #[inline(never)]
attribute to prevent the compiler from inlining the function f
. Here the function is simple enough to inline, but in a real-world scenario, the function might be more complex and not inlined.
Here is the generated assembly code under -C opt-level=3
option:
example::f::hc5481d4b82c18b89:
mov eax, 1
sub eax, edi
ret
example::f::heed6faf69bf7a990:
lea eax, [rdi + 1]
ret
example::g::h7afc5ee1582b3b49:
push rbp
push rbx
push rax
mov ebx, esi
call example::f::heed6faf69bf7a990
mov ebp, eax
mov edi, ebx
call example::f::hc5481d4b82c18b89
mov edx, eax
mov eax, ebp
add rsp, 8
pop rbx
pop rbp
ret
There are two symbols generated for the function f
, one for f::<true>
and one for f::<false>
. Both are the most optimized version under the given constant value. As a comparison, if we remove the const generics and use a runtime parameter to control the branch, the generated assembly code will be:
#[inline(never)]
pub fn f(B: bool, x: i32) -> i32 {
if B {
x + 1
} else {
-x + 1
}
}
pub fn g(x: i32, y: i32) -> (i32, i32) {
let x = f(true, x);
let y = f(false, y);
(x, y)
}
example::f::h3cb1252c09b7e367:
mov eax, esi
neg eax
test edi, edi
cmovne eax, esi
inc eax
ret
example::g::h7afc5ee1582b3b49:
push rbp
push r14
push rbx
mov ebx, esi
mov esi, edi
mov r14, qword ptr [rip + example::f::h3cb1252c09b7e367@GOTPCREL]
mov edi, 1
call r14
mov ebp, eax
xor edi, edi
mov esi, ebx
call r14
mov edx, eax
mov eax, ebp
pop rbx
pop r14
pop rbp
ret
The function f::h3cb1252c09b7e367
has much more instructions than the two functions generated by const generics.
In current Rust version, only primitive types like numbers/bool/char can be used as constant parameters until a new feature called adt_const_params is stabilized.
Runtime parameter
Unfortunately, in real-world scenarios, we may always make the arguments configurable at runtime, which makes it impossible to utilize the performance benefits of const generics. There are usually two cases we want to optimize:
- A part of the function calls is with constant values, and the other part is with runtime values.
- The function is always called with a runtime configurable value, which is likely a given default value. A typical example is a
LIKE
expression in SQL, which accepts an optional escape character likely be\
.
In the first case, we can expect the compiler to inline the calls with constant values and generate optimized code. However, if the function is too complex to inline, the performance benefits can’t be achieved. In the second case, things are even worse, the compiler know nothing about the runtime default value, and is impossible to optimize for it.
Const currying by hand
We can manually create two versions of the function, one with the const parameters and the other accept runtime parameters.
fn like_ct<const ESCAPE: char>(s: &str, p: &str) -> bool { /* complex logic */ }
fn like_rt(s: &str, p: &str, escape: char) -> bool { /* complex logic */ }
Note: like_ct::<'\\'>
is a curried version of like_rt
with a given escape character \
, which is why I call it const currying.
We can also create a dispatch function for a highly biased calling pattern:
#[inline(always)]
fn like(s: &str, p: &str: escape: char) -> bool {
if escape == '\\' {
like_ct::<'\\'>(s, p)
} else {
like_rt(s, p, escape)
}
}
In most cases, the function like
is more efficient than like_rt
. And the function should be marked as #[inline(always)]
, which tells the compiler to always inline the function. There are several cases:
- If a constant value
\
is passed in, the branch is optimized away, and the functionlike_ct::<'\\'>
is called directly. - If a runtime value
\
is passed in, the functionlike_ct::<'\\'>
is called with a simple branch overhead. - If other runtime values are passed in (unlikely), the function
like_rt
is called, and produce a correct result by an unoptimized path.
const-currying-rs
In the previous section, we have to duplicate the complex logic in two functions, which is error-prone and hard to maintain. Things would be even worse if we have more than one constant parameter, such as like_ct::<'\\', true>
to represent a case-sensitive LIKE expression with \
escape character, and different currying may be used in different places, which will produces at most $2^n$ functions to maintain, where $n$ is the number of constant parameters.
To solve this problem, I created the proc macro const-currying to generate the codes for you.
For a simple example:
use const_currying::const_currying;
#[const_currying]
fn f1(
#[maybe_const(dispatch = x, consts = [0, 1])] x: i32,
#[maybe_const(dispatch = y, consts = [true, false])] y: bool,
z: &str,
) -> (i32, String) {
if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
The following codes will be generated:
#[inline(always)]
fn f1(x: i32, y: bool, z: &str) -> (i32, String) {
match (x, y) {
(1, false) => f1_x_y::<1, false>(z),
(1, true) => f1_x_y::<1, true>(z),
(0, false) => f1_x_y::<0, false>(z),
(0, true) => f1_x_y::<0, true>(z),
(x, false) => f1_y::<false>(x, z),
(x, true) => f1_y::<true>(x, z),
(1, y) => f1_x::<1>(y, z),
(0, y) => f1_x::<0>(y, z),
(x, y) => f1_orig(x, y, z),
}
}
#[allow(warnings)]
fn f1_orig(x: i32, y: bool, z: &str) -> (i32, String) {
if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
#[allow(warnings)]
fn f1_x<const x: i32>(y: bool, z: &str) -> (i32, String) {
if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
#[allow(warnings)]
fn f1_y<const y: bool>(x: i32, z: &str) -> (i32, String) {
if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
#[allow(warnings)]
fn f1_x_y<const x: i32, const y: bool>(z: &str) -> (i32, String) {
if y { (x, z.to_string()) } else { (-x, z.chars().rev().collect()) }
}
Peformance benifits
“No silver bullet” is a well-known principle in software engineering. The macro is not a silver bullet, and it’s not always the best choice to use it. At least, it may heavily increase you binary size, and likely no great performance improvement if there are not many branches or allocations to eliminate. You should always profile your code before and after using the macro, and make sure the performance is improved.
In a benchmark of LIKE
expression with two const parameters, calling the generated dispatch function can achieve about 5% performance improvement compared to the original function with runtime default parameters, which is not very exciting, but still worth trying in a critical path.
use const_currying::const_currying;
#[const_currying]
fn like(
s: &str,
p: &str,
#[maybe_const(consts = [b'\\'])] escape: u8,
#[maybe_const(consts = [true, false])] case_sensitive: bool,
) -> bool {
// complex logic
}
Keyword generics?
Keyword generics is a new feature proposed by the Rust team, which allows you to use keywords like const
as type parameters. Although the feature doesn’t mention the argument position constness, it’s still possible to imagine a syntax like:
fn like(
s: &str,
p: &str,
const? escape: u8,
const? case_sensitive: bool,
) -> bool {
// complex logic
}
The compiler should guarantee that a version of the function is generated for each tuples of constant values passed in if the arguments are marked as const?
. If compiler can do this automatically by analyze the calling sites, many unused branches and codes can be eliminated.
Optimized automatically by a JIT runtime
I’m not a compiler expert, but I guess the optimization is already done by some programming languages with JIT support like V8. JIT runtime can analyze the hot paths and generate optimized codes, there is no obvious difference between optimizing by a runtime type or a runtime value, it’s easy to achieve by a JIT runtime. I guess the optimization can also be done by an AOT compiler with a profile-guided optimization.
Conclusion
The const-currying macro can help you to generate codes optimized for specified values, which is useful in critical scenarios. Generally, the crate is more like a workaround or hint to the current Rust compiler. I hope the crate can be deprecated in the future version by some ease-of-use builtin features in compiler.