This project can be found here on my GitHub.
At FLVS we have a Coding Club which I am apart of. The club does a lot of things, but every once in a while we do club projects, where we work on something as a group or individually, then present the project. I had a lot of Ideas for this upcoming project, but as I was preparing for my ACT, specifically the Math section, I was quite annoyed that a CAS wasn’t allowed on the test. In comparison, the SAT allows one of the math sections to have a calculator and it can be a CAS calculator. CAS stands for Computer Algebra System, a piece of software that can symbolically work with expressions. I love CAS a lot, not only because it helps me with my work but also I love the idea that I can symbolically work with expressions. Basically telling a computer how to work expressions. I had no idea how to write a CAS (and I still don’t know how to write one well) but I decided that it’d be a fun idea to write one. I was reading the The Rust Programming Language by Steve Klabnik and Carol Nichols and was starting to get confident in the language, so I decided to challenge myself by writing a CAS. This CAS is written in sub par Rust code as this is only my first project in the language. This overview will assume you have a basic understanding of Rust.
I first began by understanding what a CAS is composed of by parts, and most importantly what my implementation will be like. The resources for writing a CAS are quite minimal online, and I wasn’t really in the mood to dig super deep into papers. Besides, I really wanted to create my own CAS without any sort of pre-written example code, and instead rely on an Abstract paper which goes over the ideas of a CAS vs. the implementation of a CAS. There is plenty of CAS libraries that I could have utilized to get an idea of how to write one, but I instead opted to just utilize parts of The History of the Calculus and the Development of Computer Algebra Systems, most notably Part 5 which goes over the ideas of a CAS. I haven’t studied Calculus yet so it was a bit futile reading about stuff I don’t fully understand yet.
The paper utilizes the idea of a “Syntax Tree” as the data structure to represent an equation. This is done so that there can be multiple variables and to further simplify expressions, but due to time constraints and just minimizing the complexity of the project I opted to use a Binary Tree instead, thus only allowing for one variable. The difference here is that the Syntax Tree can have more than two children node, while the Binary Tree can only have two. This implementation is far from ideal for actual usage, as basically any complex expression that would be worthwhile to compute would include more than one variable. Along with this, the CAS must work recursively. The CAS supports:
If that order sounds familiar it is because it is, it is PEMDAS, something everyone learns. The order of operations in this case must be weighted reverse though, SADMEP. In actuality though Division and Multiplication are waited the same, and so is Addition and Subtraction. So it more looks like ASMDEP, but we don't have subtraction so it finally is just AMDEP. Instead of looking for the left most expression, we would instead go for the right most. This is important because for an expression like \(x \div 3 \times 2\), if we worked it should be \(2 \times x \div 3\), but writing it as a tree and not using the ASMDEP right-left rule, it'd be like writing \(x \times (3 \div 2)\) and I don't think we want that.
What We Are Working With
The Token Enum
The Token enum is defined as:
These are what each node's `data_type` can be, but LGROUP and RGROUP end up never being on our tree. Subtraction can be removed as \(x - x\) can be defined as \(x + (-1 \times x)\) and \(-x\) can be defined as \(-1 \times x.\)
The Tree Structure
The tree structure is defined as
Box pointer puts data on the Heap. This lecture by Sami Alqadi on the CS196 at Illinois channel goes over Smart Pointers in the Context of a Linked Lists, which includes the
Box pointer. If you want to understand these Smart Pointers more I recommend that video along with Learn Rust With Entirely Too Many Linked Lists. Line 1 says that the Structure is cloneable. Each Node consists of 3 parts. One, a
data_type that is a Token (as defined above). Two, an Option of a Box of a Node. And finally three, another Option of a Box of a Node. The left and right do matter, and the tree (should) follow these rules:
- When there are two NUMs, lesser NUM goes to the left.
- NUM Values always go to the left.
- VAR Values always go to the right unless there is another VAR or an Operator as sibling.
- Operators (which I refer to as Types in the comments) always go the right when there is NUM or VAR as sibling.
Exception: EXP and DIV parents ignore these rules, as they are left vs. right child sensitive.
Steps of the Program
- Ask the user to input an expression.
- Create a vector of
&strobjects, splitting them via the white spaces.
- Convert the vector from
\nfrom the end of the expression if it exists.
- Parse the Vector into
- Iterate over the vector and use
token::tokenizerto tokenize each part of the Expression.
token::tokenizesplits up the expression into it’s parts past what the vector did (eg. splitting
- Return a Vector of
- Parse the Vector of
tree::process(token_vector)puts the vector into
token::fix_groups(token_vector)which returns the fixed vector and all of the group locations. We will get into this process in a bit as this was actually one of the more fun things to work with.
- Create a binary tree by calling the recursive function
raw_nodeis a vector created by
tree::split_locator(token_vector, group_locations). This basically splits the expression into two halves. This splits the expression down and down while also following the tree’s rules. After we reach the bottom, it just returns all the way up, giving us a full tree that obeys the rules of PEMDAS.
- Parse our binary tree (Node object) to
- ’tree::simplify_node’ is a recursive method that again splits nodes then works back up. This is a little more complex as it also matches broader expressions to a certain depth. This method is a WIP right now, as it requires a lot of pattern matching and with that, time. A lot of expressions work, but a lot more don’t.
- Return the simplified node.
- Once we have this node, we print it out. Down the line it’d be optimal to use a parser written by another Team Member that generates a picture of the tree.
I won’t be going into everything I wrote for this, that would be too long. I will go over some stuff I enjoyed writing though. The comments on most of everything is good enough to learn what is going on with minimal knowledge of Rust, so if you want to take a look at it all and laugh at how I wrote some things, go ahead.
I really enjoyed writing this one and I think I came up with a clever way of doing it. This function does as it says, it finds groups. This problem was one of the more fun ones to solve.
- Lines 5-20: Find the total number of group values and if it isn’t divisible by 2, crash. We can’t have an uneven number of parenthesis, as that suggests they aren’t matched. Technically this doesn’t stop against ())) as that is still an even number but it is at least something.
- Lines 22-28: Declare
group_locationsas a Vector of
i32tuples. Then, if the
total_groupis equal to 0, we just return an empty vector.
- Lines 30-52: We declare
right_right_valueand then search for all left values until we find a right. Once we find a right, set the
left_right_valueto equal that place, and if there is only one pair of
RGROUPvalues, push these locations to
group_locationsand return it, otherwise break.
- Lines 54-78: While our
unsorted_lgroup_locationsisn’t empty, we are going to pop the last
lgroupvalue in the Vector then push that value along with our
left_right_value. After, we iterate from the
left_right_value + 1and the
token_vectorlength, thus shifting us over to find more left values then break if you find a right value. Repeat this process until we don’t have any left values.
- Lines 80-113: Now that we have the locations, we need to sort the values via the
left_valuein each vector, small to large. This is done using a selection sort, and I won’t get into that as that is pretty basic.
node_creation() function is used to create a Node, and as a result, the entire tree. It utilizes a function called
split_locater() to further split passed in branches. The
node_creation() function relies on this heavily to further split the Nodes.
- Lines 3-8: Get values for each of the
- Lines 10-17: If the branches are empy, that means that we are dealing with a
NUMvalue, so just declare
aas that Node.
- Lines 25-44: Now that we have established we aren’t working with
NUM, we have to split this Node into left vs. right, but to do that we have to follow the rules as spoken about in Knowledge Gathering. In the case of Exponents or Division though, we can ignore these rules as they are left vs. right sensitive. So, we just declare
ahere if the
- Lines 46-54: Now that we have established we aren’t working with
data_typewe can now split the
node_creationof a split of the
left_group_locations. We do the same for
second_branch_raw, just with the
right_branch. We call these first and second as we don’t know yet if the right will be on the left, or the left and right will just stay as left and right. We also declare the processed left and right branches, but they aren’t being used right now.
- Lines 55-63: I did a little oopsie here and forgot that you can check None values in Rust, so there is some useless
boolvalues here but it is fine, I will rewrite that part of the logic later. These variables check if there is a
NUMvalue for either left and right as well as
VARvalue for either left and right.
- Lines 65-101: Now, we determine if the
second_branch_rawis composed of
VAR, or neither. We do this using match statements. We store any
NUMvalues so as to compare them later.
- Lines 103-138: This is a huge chain that basically finds left vs. right. The comments indicate what it is comparing.
- Lines 140-148: Now that we have the
right_branch_processedvalues, we can declare that as
aand then return
a. When I rewrite this, I am going to remove the entire idea of
aand instead just use
returnstatements as it will be a bit faster.
This was a really fun project, but it isn’t quite finished yet. I still have to include more pattern matching and I would love to add one variable solving in the future. Coming from just learning Rust and knowing nothing about CAS, I am quite proud of how it turned out. Once I get some of the core features down, I plan on rewriting it to be a bit nicer and efficient, as well as turning it into a library instead of a program so you can import it. I hope that this project gives some insight into CAS at a very very basic level, as well as provides some good resources to learn more about them. I will update this later when I complete the project, either here or in a new article.