Notes on Rewriting Systems, Part 1
I read about the Pocket Rewriting zine by Devine Lu Linvega the other day and decided it was interesting enough to implement it on my own. Of course as per usual, my interest is in a vastly different direction: have you ever heard about CLIPS? It's classified as an expert system engine, but the language should be Turing-complete and people have been using it like a "normal" programming language as well (there's also this book titled Adventures in Rule-Based Programming which uses CLIPS to write a text adventure game. I bought the book but I kinda regret it - it did serve as a nice introduction to CLIPS, but that's pretty much about it). Multiset rewriting systems are also used to model and verify security protocols (e.g. in tools like Tamarin), which I can't help but feel a little bit of familiarity because that's exactly what my failed graduation project was about…
A demo
I don't like Fractran for the simple reason of it using fractions which is one more layer of indirection, so I decided to directly work on symbols instead. This somehow took me an embarassing long time to write:
#lang racket/base (define (match-rule-head facts-pool rule-head) (for/and ([(rk rv) (in-hash rule-head)]) (and (hash-has-key? facts-pool rk) (>= (hash-ref facts-pool rk) rv)))) (define (apply-effect! acc r) (for ([(rk rv) (in-hash r)]) (let ((new-val (+ (hash-ref acc rk 0) rv))) (if (= new-val 0) (hash-remove! acc rk) (hash-set! acc rk new-val))))) (define (negate-effect r) (make-hash (for/list ([(rk rv) (in-hash r)]) (cons rk (- rv))))) (define (match-and-apply-rule facts-pool rule) (let ((x facts-pool) (rule-head (car rule))) (if (not (match-rule-head facts-pool rule-head)) #f (let ((rule-head-effect (negate-effect rule-head)) (rule-tail-effect (cdr rule))) (apply-effect! facts-pool rule-head-effect) (apply-effect! facts-pool rule-tail-effect) facts-pool)))) (define (run p) (define (check-and-run facts-pool program-acc program) (if (null? program-acc) facts-pool (let ((first-rule (car program-acc))) (let ((matchres (match-and-apply-rule facts-pool first-rule))) (if (not matchres) (check-and-run facts-pool (cdr program-acc) program) (check-and-run matchres program program)))))) (let ((program (car p)) (facts (cdr p))) (check-and-run facts program program))) (run (cons (list '(#hash((flour . 1) (sugar . 1) (apples . 1)) . #hash((apple-cake . 1))) '(#hash((apples . 1) (oranges . 1) (cherries . 1)) . #hash((fruit-salad . 1))) '(#hash((fruit-salad . 1) (apple-cake . 1)) . #hash((fruit-cake . 1)))) (make-hash '((sugar . 1) (oranges . 1) (apples . 2) (cherries . 1) (flour . 1))))) ;; '#hash((fruit-cake . 1))
I then began to think about how to compile a program in this language into code. Consider this: it is possible to produce a program that's majorly written in CLIPS, but the way you do this is to (1) generate a dump of your CLIPS program; (2) include the whole CLIPS implementation in your program (which is a thing you can do freely since CLIPS is in public domain) and (3) call the implementation with your CLIPS program dump in C. I don't like how I'm forced to include the whole CLIPS even if I only need a small part of it, so I think it would be nice if we don't have to do that.
Compiling these kind of programs is simple. One could easily make the observation that multisets are also sets of registers, and all the possible objects must have already been mentioned somewhere in the program, thus the number of registers can be inferred at compile-time. For example, for the following program:
flour, sugar, apples ==> apple-cake; apples, oranges, cherries ==> fruit-salad; fruit-salad, apple-cake ==> fruit-cake; [ sugar, oranges, apples, apples, cherries, flour ]
We first know the number of registers needed:
enum registers_enum { flour, sugar, apples, appleCake, oranges, cherries, fruitSalad, fruitCake, __ENUM_LIMIT }; char* register_names[8] = { "flour", "sugar", "apples", "apple-cake", "oranges", "cherries", "fruit-salad", "fruit-cake" }; long int r[8];
And then we can compile the initial state and the rules:
r[sugar] = 1; r[oranges] = 1; r[apples] = 2; r[cherries] = 1; r[flour] = 1; for (;;) { if (r[flour] >= 1 && r[sugar] >= 1 && r[apples] >= 1) { r[flour] -= 1; r[sugar] -= 1; r[apples] -= 1; r[appleCake] += 1; continue; } if (r[apples] >= 1 && r[oranges] >= 1 && r[cherries] >= 1) { r[apples] -= 1; r[oranges] -= 1; r[cherries] -= 1; r[fruitSalad] += 1; continue; } if (r[fruitSalad] >= 1 && r[appleCake] >= 1) { r[fruitSalad] -= 1; r[appleCake] -= 1; r[fruitCake] += 1; continue; } break; }
From this point forward you can do a lot of different things, including making it into a standalone program with a template like this:
#include <stdio.h> // enums... void report_state(enum registers_enum e) { printf("%s: %ld\n", register_names[e], r[e]); } int main() { // rules... for (int i = 0; i < __ENUM_LIMIT; i++) { if (r[i] > 0) { report_state(i); } } return 0; }
I added a very rudimentary parser to it; you can get the code here: https://codeberg.org/bctnry/lang3/src/commit/904e4f87acf399b8f0c739e4015a3231d7752629/main_demo.rkt.
This of course has the problem of only supporting the rewriting of simple symbols. If we extend the possible form of facts to support representing "compound" facts one would naturally want to be able to have variables in rules, e.g. expecting something like this:
path(?A, ?B), path(?B, ?C) ==> path(?A, ?B), path(?B, ?C), path(?A, ?C); [ path(dublin, dunleary), path(howth, dublin), path(malahide, dublin) ]
outputs path(howth,dunleary)
and path(malahide,dunleary)
, which would require one to be able to compile pattern matching and thus opening another whole can of worms…
Rete algorithm
CLIPS was said to use something called the Rete algorithm where rules are stored as some form of branch-intertwined tree; facts are funnelled down from the tree top whenever they're added to the fact database, and when a leaf node in the tree is reached it means that the full precondition of a rule is satisfied and that rule can be executed. I'm sure there's some "tree-based virtual machine" to be made and any set of rules can thus be compiled into programs for this virtual machine like how some regex engines compile regexes into programs for some specialized virtual machines, but unfortunately I don't have the time to properly figure this out…
Other similar things
"Multiset" means one does not care about the order of facts when matching for rules for execution. If we do care about the order the symbols are in, we've got Markov algorithm. If we incorporate randomness into it, we've got something similar to MarkovJunior. If we care about the order and have context-free rules we can have something similar to L-system, which is (most commonly) used to model plants.
Last update: 2024.11.2