otesunki said something like that)
egg is an IR pattern matching and transformation library that uses E-Graphs to magically solve (almost) all problems regarding IR pattern matching.
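To give an idea of what that looks like in practice, here is a minimal sketch of typical egg usage (assuming egg 0.9 and its built-in SymbolLang; the rules and the simplify helper are purely illustrative): rewrite rules go in, the Runner saturates the e-graph, and an extractor picks the cheapest equivalent expression.

```rust
use egg::*;

// Simplify an s-expression using a tiny, illustrative rule set.
fn simplify(expr: &str) -> String {
    let rules: Vec<Rewrite<SymbolLang, ()>> = vec![
        rewrite!("add-comm"; "(add ?a ?b)" => "(add ?b ?a)"),
        rewrite!("add-zero"; "(add ?a 0)" => "?a"),
        rewrite!("mul-one";  "(mul ?a 1)" => "?a"),
    ];
    let start: RecExpr<SymbolLang> = expr.parse().unwrap();
    // The runner applies all rules until saturation (or a limit is hit).
    let runner = Runner::default().with_expr(&start).run(&rules);
    // Extract the smallest expression equivalent to the input.
    let extractor = Extractor::new(&runner.egraph, AstSize);
    let (_cost, best) = extractor.find_best(runner.roots[0]);
    best.to_string()
}

// e.g. simplify("(add (mul x 1) 0)") yields "x"
```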
Even though egg solves most problems, I still recommend using a pattern matching dialect, especially in multi-level compilers. It keeps your pattern matching more flexible and more future-proof, for example if a better alternative to egg suddenly appears, or if you decide that you want to match some complex patterns manually.
A problem with using egg in combination with pattern matching dialects is the extra work of writing the code that converts the dialect's patterns into egg patterns.
Additionally, some more complex transformations (not the matches) can't always be directly converted to egg transformations (depending on the feature set provided by the transformation dialect), but that can be solved by using a custom egg::Applier.
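As a rough idea of what that looks like, here is a minimal sketch of a custom Applier (assuming egg 0.9's Applier trait and SymbolLang; the constant-folding rule is just an illustration): the applier computes a new constant at rewrite time, which a plain right-hand-side pattern alone can't express.

```rust
use egg::*;

// Fold (add ?a ?b) when both operands are integer literals.
#[derive(Debug)]
struct ConstFoldAdd {
    a: Var,
    b: Var,
}

impl Applier<SymbolLang, ()> for ConstFoldAdd {
    fn apply_one(
        &self,
        egraph: &mut EGraph<SymbolLang, ()>,
        eclass: Id,
        subst: &Subst,
        _searcher_ast: Option<&PatternAst<SymbolLang>>,
        _rule_name: Symbol,
    ) -> Vec<Id> {
        // Read an e-class as an integer constant, if it contains one.
        fn get_const(egraph: &EGraph<SymbolLang, ()>, id: Id) -> Option<i64> {
            egraph[id].nodes.iter().find_map(|n| n.op.as_str().parse().ok())
        }
        let (a, b) = match (get_const(egraph, subst[self.a]), get_const(egraph, subst[self.b])) {
            (Some(a), Some(b)) => (a, b),
            _ => return vec![],
        };
        // Insert the folded constant and merge it into the matched e-class.
        let folded = egraph.add(SymbolLang::leaf((a + b).to_string().as_str()));
        if egraph.union(eclass, folded) { vec![eclass] } else { vec![] }
    }
}

fn const_fold_add() -> Rewrite<SymbolLang, ()> {
    let a: Var = "?a".parse().unwrap();
    let b: Var = "?b".parse().unwrap();
    let searcher: Pattern<SymbolLang> = "(add ?a ?b)".parse().unwrap();
    Rewrite::new("const-fold-add", searcher, ConstFoldAdd { a, b }).unwrap()
}
```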
More Advantages of Structured Pattern Matching
Smart Pattern Matchers
Instead of brute-forcing all peephole optimizations (of which there can be a LOT in advanced compilers), the compiler can organize the patterns to provide more efficient matching. I haven't investigated yet how best to do this. If you have any ideas regarding this, please contact me.
A structured pattern representation also enables other ways to speed up the matching and rewriting process.
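One obvious starting point is to index the patterns by the operation at their root, so that most patterns are never even tried for a given instruction. A minimal sketch, with purely hypothetical types:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Opcode { Add, Sub, And, Shl }

// The pattern tree and the rewrite it produces are elided here.
struct Pattern {
    root: Opcode,
}

struct PatternIndex {
    by_root: HashMap<Opcode, Vec<Pattern>>,
}

impl PatternIndex {
    fn new(patterns: Vec<Pattern>) -> Self {
        let mut by_root: HashMap<Opcode, Vec<Pattern>> = HashMap::new();
        for p in patterns {
            by_root.entry(p.root).or_default().push(p);
        }
        PatternIndex { by_root }
    }

    // Only patterns whose root opcode matches the instruction are tried;
    // everything else is skipped without doing any matching work.
    fn candidates(&self, op: Opcode) -> &[Pattern] {
        self.by_root.get(&op).map(Vec::as_slice).unwrap_or(&[])
    }
}
```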
Reversible Patterns
In source code, a bit test is usually written as `x & (1 << b)`, but compilers tend to have a high-level bit test operation (with exceptions). A reason for having higher-level primitives is that it allows the compiler to do more high-level optimizations, but also some target architectures have a bit test instruction that is more efficient. The bit test operation is then lowered to something like `x & (1 << b) != 0`, and that pattern is matched again in compiler passes that expect bit test operations.
Let's stay with the `x & (1 << b)` (bit test) example. Optimizing compilers should be able to detect that, and other bit test patterns (like `x & (1 << b) > 0`), and then replace those with a bit test operation. But they also have to be able to convert bit test operations back to their implementation for compilation targets that don't have a bit test instruction. Currently, compiler backends do this by having separate patterns for converting the implementation to its dedicated bit test operation, and back.
A better solution is to associate a set of implementations with the bit test operation, and make the compiler automatically reverse those to generate the best implementation (in the instruction selector for example).
Implementing pattern/transformation reversion can be challenging, but it provides many benefits, and all "big" compilers should definitely do this, in my opinion.
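A minimal sketch of what such an association could look like, with purely hypothetical types and a made-up pattern syntax: the operation declares its implementations once, and both the "raise" and "lower" rules are derived from that single declaration.

```rust
// An operation and the patterns (in the compiler's dialect) that implement it.
struct OpSpec {
    op: &'static str,
    implementations: &'static [&'static str],
}

const BIT_TEST: OpSpec = OpSpec {
    op: "(bit_test x b)",
    implementations: &[
        "(ne (and x (shl 1 b)) 0)",
        "(gt (and x (shl 1 b)) 0)",
    ],
};

// Raising: every known implementation is matched and replaced by the op.
fn raise_rules(spec: &OpSpec) -> Vec<(String, String)> {
    spec.implementations
        .iter()
        .map(|imp| (imp.to_string(), spec.op.to_string()))
        .collect()
}

// Lowering: for targets without the instruction, pick one canonical
// implementation (here simply the first one) and reverse the rule.
fn lower_rule(spec: &OpSpec) -> (String, String) {
    (spec.op.to_string(), spec.implementations[0].to_string())
}
```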
Runtime Library
Compilers typically come with a runtime library that implements more complex operations that aren't supported by most processors or architectures.
The implementations of those functions should also be written using that pattern matching dialect. This allows your backend to detect user code with a similar implementation to the one in the runtime library, giving you some additional optimizations for free.
I don't think any compiler currently does this either.
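Reusing the hypothetical OpSpec from the sketch above, a runtime routine such as a 16-bit byte swap could be declared the same way, so the backend recognizes user code that spells out the same shifts:

```rust
// If user code writes (x << 8) | (x >> 8) itself, the backend can
// recognize it as bswap16 and use the dedicated instruction or the
// runtime routine instead.
const BSWAP16: OpSpec = OpSpec {
    op: "(bswap16 x)",
    implementations: &["(or (shl x 8) (lshr x 8))"],
};
```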
Problems with Pattern Matching
The main problem is ordering the patterns.
As an example, consider these three patterns:
;; A
(add x:Const y) => (add y x)
;; B
(sub (add x y:Const) z:Const) => (lea x y (const_neg z))
;; C
(add x 1) => (inc x)
Now what should the compiler do when it sees this:
(sub (add 5 1) 2)
All three patterns would match:
;; apply A
(sub (add 5 1) 2) => (sub (add 1 5) 2)
;; only B applies now
(sub (add 1 5) 2) => (lea 1 5 (const_neg 2))
;; nothing applies anymore
;; alternatively apply B
(sub (add 5 1) 2) => (lea 5 1 (const_neg 2))
;; nothing applies anymore
;; alternatively apply C
(sub (add 5 1) 2) => (sub (inc 5) 2)
;; nothing applies anymore
Now which of those transformations should be performed?
This is not as easy to solve as it seems, especially in the context of instruction selection (specifically scheduling), where performance depends on the sequence of instructions as a whole, not just on each instruction individually.
Superscalar CPUs
Modern processor architecture features like superscalar execution make this even more complicated.
As a simple, unrealistic example, let's imagine a CPU (core) that has one bit operations execution unit, and two ALU execution units / ports.
This means that the CPU can execute two instructions in the ALU units and one instruction in the bit ops unit at the same time.
One might think that always optimizing `a & (1 << b)` to a bit test operation is good for performance. But in this example, that is not the case.
If we have a function that does a lot of bit tests next to each other, and the compiler replaces all of them with bit test operations, suddenly all of those operations depend on the single bit ops unit: instead of executing three instructions at a time (ignoring pipelining), the CPU can now only execute one at a time.
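To put rough, hypothetical numbers on it (assuming each unit retires one operation per cycle, and constant bit indices so the masks fold into immediates): eight independent bit tests take eight cycles when they all become bit test instructions queued on the single bit ops unit, but only about four cycles as plain and-with-mask operations spread across the two ALU ports, ignoring the compares that both forms need equally.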
This shows that we often won't know whether an optimization is actually beneficial until a late point in the compilation process, where we can simulate the CPU's instruction scheduling.
This does not only apply to instruction selection, but also to higher-level optimizations, such as loop and control flow related optimizations.
Conclusion
One can see how pattern matching dialects are the best way to approach pattern matching in compilers.
Someone wanted me to insert a takeaway here, but I won't.
PS: I'll hunt down everyone who still decides to hand-write pattern matching in their compiler's source code after reading this article.