Introduction
This project can be found here on my GitHub.
At FLVS we have a Coding Club which I am apart of. The club does a lot of things, but every once in a while we do club projects, where we work on something as a group or individually, then present the project. I had a lot of Ideas for this upcoming project, but as I was preparing for my ACT, specifically the Math section, I was quite annoyed that a CAS wasn’t allowed on the test. In comparison, the SAT allows one of the math sections to have a calculator and it can be a CAS calculator. CAS stands for Computer Algebra System, a piece of software that can symbolically work with expressions. I love CAS a lot, not only because it helps me with my work but also I love the idea that I can symbolically work with expressions. Basically telling a computer how to work expressions. I had no idea how to write a CAS (and I still don’t know how to write one well) but I decided that it’d be a fun idea to write one. I was reading the The Rust Programming Language by Steve Klabnik and Carol Nichols and was starting to get confident in the language, so I decided to challenge myself by writing a CAS. This CAS is written in sub par Rust code as this is only my first project in the language. This overview will assume you have a basic understanding of Rust.
Knowledge Gathering
I first began by understanding what a CAS is composed of by parts, and most importantly what my implementation will be like. The resources for writing a CAS are quite minimal online, and I wasn’t really in the mood to dig super deep into papers. Besides, I really wanted to create my own CAS without any sort of pre-written example code, and instead rely on an Abstract paper which goes over the ideas of a CAS vs. the implementation of a CAS. There is plenty of CAS libraries that I could have utilized to get an idea of how to write one, but I instead opted to just utilize parts of The History of the Calculus and the Development of Computer Algebra Systems, most notably Part 5 which goes over the ideas of a CAS. I haven’t studied Calculus yet so it was a bit futile reading about stuff I don’t fully understand yet.
The paper utilizes the idea of a “Syntax Tree” as the data structure to represent an equation. This is done so that there can be multiple variables and to further simplify expressions, but due to time constraints and just minimizing the complexity of the project I opted to use a Binary Tree instead, thus only allowing for one variable. The difference here is that the Syntax Tree can have more than two children node, while the Binary Tree can only have two. This implementation is far from ideal for actual usage, as basically any complex expression that would be worthwhile to compute would include more than one variable. Along with this, the CAS must work recursively. The CAS supports:
- Parenthesis
- Exponents
- Multiplication
- Division
- Addition
- Subtraction
If that order sounds familiar it is because it is, it is PEMDAS, something everyone learns. The order of operations in this case must be weighted reverse though, SADMEP. In actuality though Division and Multiplication are waited the same, and so is Addition and Subtraction. So it more looks like ASMDEP, but we don't have subtraction so it finally is just AMDEP. Instead of looking for the left most expression, we would instead go for the right most. This is important because for an expression like \(x \div 3 \times 2\), if we worked it should be \(2 \times x \div 3\), but writing it as a tree and not using the ASMDEP right-left rule, it'd be like writing \(x \times (3 \div 2)\) and I don't think we want that.
What We Are Working With
The Token Enum
The Token enum is defined as:
|
|
These are what each node's `data_type` can be, but LGROUP and RGROUP end up never being on our tree. Subtraction can be removed as \(x - x\) can be defined as \(x + (-1 \times x)\) and \(-x\) can be defined as \(-1 \times x.\)
The Tree Structure
The tree structure is defined as
|
|
The Box
pointer puts data on the Heap. This lecture by Sami Alqadi on the CS196 at Illinois channel goes over Smart Pointers in the Context of a Linked Lists, which includes the Box
pointer. If you want to understand these Smart Pointers more I recommend that video along with Learn Rust With Entirely Too Many Linked Lists. Line 1 says that the Structure is cloneable. Each Node consists of 3 parts. One, a data_type
that is a Token (as defined above). Two, an Option of a Box of a Node. And finally three, another Option of a Box of a Node. The left and right do matter, and the tree (should) follow these rules:
- When there are two NUMs, lesser NUM goes to the left.
- NUM Values always go to the left.
- VAR Values always go to the right unless there is another VAR or an Operator as sibling.
- Operators (which I refer to as Types in the comments) always go the right when there is NUM or VAR as sibling.
Exception: EXP and DIV parents ignore these rules, as they are left vs. right child sensitive.
Steps of the Program
- Ask the user to input an expression.
- Create a vector of
&str
objects, splitting them via the white spaces. - Convert the vector from
&str
toString
. - Remove
\n
from the end of the expression if it exists. - Parse the Vector into
token::tokenize(string_vector)
. - Iterate over the vector and use
token::tokenizer
to tokenize each part of the Expression.token::tokenize
splits up the expression into it’s parts past what the vector did (eg. splitting(3
into(,3
.) - Return a Vector of
token::Token
. - Parse the Vector of
token::Token
intotree::process(token_vector)
. tree::process(token_vector)
puts the vector intotoken::fix_groups(token_vector)
which returns the fixed vector and all of the group locations. We will get into this process in a bit as this was actually one of the more fun things to work with.- Create a binary tree by calling the recursive function
tree::node_creation(raw_node)
. Whereraw_node
is a vector created bytree::split_locator(token_vector, group_locations)
. This basically splits the expression into two halves. This splits the expression down and down while also following the tree’s rules. After we reach the bottom, it just returns all the way up, giving us a full tree that obeys the rules of PEMDAS. - Parse our binary tree (Node object) to
tree::simplify_node(node)
. - ’tree::simplify_node’ is a recursive method that again splits nodes then works back up. This is a little more complex as it also matches broader expressions to a certain depth. This method is a WIP right now, as it requires a lot of pattern matching and with that, time. A lot of expressions work, but a lot more don’t.
- Return the simplified node.
- Once we have this node, we print it out. Down the line it’d be optimal to use a parser written by another Team Member that generates a picture of the tree.
Algorithm Breakdown
I won’t be going into everything I wrote for this, that would be too long. I will go over some stuff I enjoyed writing though. The comments on most of everything is good enough to learn what is going on with minimal knowledge of Rust, so if you want to take a look at it all and laugh at how I wrote some things, go ahead.
find_groups()
I really enjoyed writing this one and I think I came up with a clever way of doing it. This function does as it says, it finds groups. This problem was one of the more fun ones to solve.
Full Implementation:
|
|
- Lines 5-20: Find the total number of group values and if it isn’t divisible by 2, crash. We can’t have an uneven number of parenthesis, as that suggests they aren’t matched. Technically this doesn’t stop against ())) as that is still an even number but it is at least something.
- Lines 22-28: Declare
group_locations
as a Vector ofi32
tuples. Then, if thetotal_group
is equal to 0, we just return an empty vector. - Lines 30-52: We declare
unsorted_lgroup_locations
,left_right_value
andright_right_value
and then search for all left values until we find a right. Once we find a right, set theleft_right_value
to equal that place, and if there is only one pair ofLGROUP
andRGROUP
values, push these locations togroup_locations
and return it, otherwise break. - Lines 54-78: While our
unsorted_lgroup_locations
isn’t empty, we are going to pop the lastlgroup
value in the Vector then push that value along with ourleft_right_value
. After, we iterate from theleft_right_value + 1
and thetoken_vector
length, thus shifting us over to find more left values then break if you find a right value. Repeat this process until we don’t have any left values. - Lines 80-113: Now that we have the locations, we need to sort the values via the
left_value
in each vector, small to large. This is done using a selection sort, and I won’t get into that as that is pretty basic.
node_creation()
The recursive node_creation()
function is used to create a Node, and as a result, the entire tree. It utilizes a function called split_locater()
to further split passed in branches. The node_creation()
function relies on this heavily to further split the Nodes.
Full Implementation:
|
|
- Lines 3-8: Get values for each of the
raw_node
. - Lines 10-17: If the branches are empy, that means that we are dealing with a
VAR
orNUM
value, so just declarea
as that Node. - Lines 25-44: Now that we have established we aren’t working with
VAR
orNUM
, we have to split this Node into left vs. right, but to do that we have to follow the rules as spoken about in Knowledge Gathering. In the case of Exponents or Division though, we can ignore these rules as they are left vs. right sensitive. So, we just declarea
here if thedata_type
isEXP
orDIV
. - Lines 46-54: Now that we have established we aren’t working with
DIV
orEXP
as ourdata_type
we can now split thefirst_branch_raw
as anode_creation
of a split of theleft_branch
and it’sleft_group_locations
. We do the same forsecond_branch_raw
, just with theright_branch
. We call these first and second as we don’t know yet if the right will be on the left, or the left and right will just stay as left and right. We also declare the processed left and right branches, but they aren’t being used right now. - Lines 55-63: I did a little oopsie here and forgot that you can check None values in Rust, so there is some useless
bool
values here but it is fine, I will rewrite that part of the logic later. These variables check if there is aNUM
value for either left and right as well asVAR
value for either left and right. - Lines 65-101: Now, we determine if the
first_branch_raw
andsecond_branch_raw
is composed ofNUM
,VAR
, or neither. We do this using match statements. We store anyNUM
values so as to compare them later. - Lines 103-138: This is a huge chain that basically finds left vs. right. The comments indicate what it is comparing.
- Lines 140-148: Now that we have the
left_branch_process
andright_branch_processed
values, we can declare that asa
and then returna
. When I rewrite this, I am going to remove the entire idea ofa
and instead just usereturn
statements as it will be a bit faster.
Conclusion
This was a really fun project, but it isn’t quite finished yet. I still have to include more pattern matching and I would love to add one variable solving in the future. Coming from just learning Rust and knowing nothing about CAS, I am quite proud of how it turned out. Once I get some of the core features down, I plan on rewriting it to be a bit nicer and efficient, as well as turning it into a library instead of a program so you can import it. I hope that this project gives some insight into CAS at a very very basic level, as well as provides some good resources to learn more about them. I will update this later when I complete the project, either here or in a new article.