C++数据分析代写 | COMS W4115: Programming Assignment 2.2 (Syntax Analysis)


Course SummaryCourse Summary

Course: COMS 4115 Programming Languages and Translators (Fall 2021)

Website: https://www.rayb.info/plt4115-fall2021 University: Columbia University.

Instructor: Prof. Baishakhi Ray


Announcement Date:Announcement Date: October 26, 2021

Due Date:Due Date: Wednesday, November 8, 2021 by 11:59 PM.

Total Points:Total Points: 100

Grading BreakdownGrading Breakdown

Visualizing and Understanding ASTs:Visualizing and Understanding ASTs: 10

Task 1 (Recursive Functions):Task 1 (Recursive Functions): 40

Task 2 (Code Reformatting Tool):Task 2 (Code Reformatting Tool): 50

Assignment ObjectivesAssignment Objectives

From this assignment:

1. You will learnwill learn about the AST structure of the code.

2. You will get familiar with different AST APIs provided by LLVM/Clang.

3. You will get familiar with Clang’s ASTVisitor.

4. You will learnwill learn how to traverse an AST.

5. You will learnwill learn how to write a simple code reformatter.


You have already implemented a (albeit very simple) lexical analyzer/tokenizer that converts character sequences from a code stream to token sequences. As you learned in lecture, this token sequence is processed by the parser, and an abstract syntax tree (AST)abstract syntax tree (AST) is generated. In this assignment, you will use the clang compiler to generate an AST. Further, you will investigate two different case studies to analyze an AST.

Generating the AST (10 Points)Generating the AST (10 Points)

In Programming Assignment 2.1, you saw how to build LLVM. The Clang compiler front-end leverages the LLVM back-end infrastructure. For any code written in C , C++ , or Objective-C , you can generate (and visualize) the AST using Clang.

First, let the shell/terminal know where you built LLVM. We recommend adding the following line to your $HOME/.bashrc file:

export LLVM_HOME=<base directory where you built your llvm>; Next, try the following command:

$LLVM_HOME/build/bin/clang -cc1 -ast-dump examples/gcd.c -I /usr/include/ \

-I $LLVM_HOME/build/lib/clang/12.0.0/include/;

You should spend some time looking at the output. Please write a few sentences in the outputs/AST.txt file describing the following: * What information is being presented in the output * What you find interesting about the output

You can also generate a visual representation of the AST. Try running the following command:

export LLVM_HOME=<base directory where you built your llvm>;

$LLVM_HOME/build/bin/clang -cc1 -ast-view gcd.c -I /usr/include/ \

-I $LLVM_HOME/build/lib/clang/12.0.0/include/;

This will generate a dot file in your /tmp/ directory. Locate the created dot file and identify the filename. Run the following:

dot -Tpdf /tmp/<dot file filename> -o output/AstView.pdf

Take a look at the PDF file that is generated, and try to match that with the output generated in the previous step.

Analyzing the ASTAnalyzing the AST

The AST is not just for viewing purposes; there are many powerful things you can do with an AST. For example, you can warn developers when a variable name does

not follow a convention (

, camel casing) by traversing an AST; if you find or “visit” a variable name, you can verify whether that name matches the intended

convention. Otherwise, you can generate a warning.

In this particular assignment, you will investigate how to analyze an AST to extract important information for two different tasks.

Task 1: Recursive Functions (40 Points)Task 1: Recursive Functions (40 Points)

Recursive functions are often compact, easy to read, and easy to understand. However, they often incur huge overhead due to call stack maintenance, which is a major concern if you are developing for a resource-constrained system. Your task here is to determine whether a given function is direct recursive or not. We will describe the tools that you will use for this task in a later section.


1. int recursive_factorial(int n1){

2. if (n1 <= 1){

3. return 1;

5. return n1 * recursive_factorial (n1 – 1);

In the above C code snippet, the function recursive_factorial is a direct recursive function, since there is a call to itself inside the body (line 5). In contrast, the following function is not direct recursive; even though there is a function call at line 4, the callee is not the function itself:

1. int iterative_factorial(int n1){

2. int res = 1;

3. for(int i = 0; i <= n3; i++){

4. res = mult(res, i); // multiplication of res and i.

6. return res;

Task 2: Code Reformatting Tool (50 Points)Task 2: Code Reformatting Tool (50 Points)

We can perform a lot of interesting operations using an AST. In Task 1, you learned how to identify recursive functions. In this task, you will gain some hands-on experience on automatic code formatting by analyzing an AST.

Suppose you have a function call in your code such as the following:

Then, you will need to format it as:

foo (1, 2, 3, 5)

More specifically, you will need to format the call expression as <callee><space>(<arg1>,<space><arg2>,<space>…,<space><argn>) for all n arguments. The <space> represents a single space character ‘ ‘ . Note that the callee and/or arguments of a function may also be function calls themselves, and you will need to figure out how to reformat those nested function calls whenever they arise.


The conceptual AST structure of a CallExpr (Function Call Expression) node in an AST looks like this:


The Callee and any of the arguments (

, arg1 , arg2 , etc.) can also be CallExpr nodes. Take a look at the next examples, which illustrate (and explain) this


Note that the following code is very difficult to read:

1. int bar(int k){

3. bar ( k

6. return 0;

However, when you run your reformatter tool, the function call to foo at line 2 should be formatted as:

foo (bar (k), 1)

Now, it is definitely much easier to understand the function call!

Explanation:Explanation: in the function call at line 2, the callee is a function named foo , which takes in two arguments. One of the arguments (the first one) is also a function call that invokes bar with the argument k . The function call to bar is formatted as bar (k) , which contributes to the original foo function call’s formatting. Hence, the whole function call is reformatted as foo (bar (k), 1) .

Here is another example:

1. typedef int (*FuncPtr)(int, int);

3. int addNum(int a, int b) {

4. return a + b;

7. int mulNum(int a, int b) {

8. return a * b;

11. FuncPtr getFunc(int op) {

12. return op == 1 ? &addNum :

13. op == 2 ? &mulNum :

14. (FuncPtr)0;

15. } 16.

17. int main() {

18. int ret = getFunc(

19. 1+0 )( 5 , 6 );

20. return 0;

The reformatted code that you should generate is:

getFunc (1+0) (5, 6)

Explanation:Explanation: in line 18 of the above code, there is a function call. It is slightly more complicated than the one in the previous example.


Here, the Callee is not a function name; rather, it is another function call (a CallExpr node) to getFunc , which takes in one argument. Thus, we reformat the Callee to getFunc (1+0) , and we finally get the formatted output getFunc (1+0) (5, 6) .

Getting StartedGetting Started

To implement the above two tasks, you will build a Clang tool that uses LLVM/Clang’s RecursiveASTVisitor API. We have provided all the setup code to get started. However, we strongly recommend that you go over the API documentation of Clang tooling and AST visitors to understand the basic workflow.


1. Create a folder named clang-hw2 under $LLVM_HOME/clang/tools .

2. Copy the ClangHw2.cpp, CMakeLists.txt, hw2_util.h, and hw2_util.cpp files into $LLVM_HOME/clang/tools/clang-hw2 .

3. Edit the $LLVM_HOME/clang/tools/CmakeLists.txt file, and add this line: add_clang_subdirectory(clang-hw2) .

4. Now, go to $LLVM_HOME/build , and run make . When the build has successfully finished, it will generate a binary file named clang-hw2 in $LLVM_HOME/build/bin .

5. Finally, run the generated binary using the following command: $LLVM_HOME/build/bin/clang-hw2 examples/gcd.c —

About the CodeAbout the Code

The FunctionVisitor class is a recursive AST visitor, which implements three visitors for two different types of AST nodes. The VisitForStmt is called when Clang’s ASTVisitor encounters a ForStmt type of AST node. You DO NOTDO NOT have to do anything with this function; we are providing it to give you a head start with ASTVisitor. The VisitFunctionDecl function is called when a FunctionDecl (function declaration) node is encountered.

Here are some other notes about the tasks:

Task 1Task 1

We implemented VisitFunctionDecl , which calls the helper function isRecursiveFunction and decides whether that function is direct recursive or not. All youAll you

have to do is implement this have to do is implement this isRecursiveFunctionisRecursiveFunction function function.

You may consider the following constraint for Task 1:

We will only test C code inputs. You DO NOTDO NOT need to handle function calls in C++ or C++-specific functionality (including operator overloading or user-defined literals, etc.).

When you have fully implemented the first task and have run the tool with gcd.c , you will see the following output:

gcd_recursive – recursive

Task 2Task 2

From the VisitFunctionDecl function, we call analyzeCallExpressionReformat to perform a depth-first search (DFS) on the AST. While performing DFS, if we encounter any CallExpr node, we call the formatFunctionCall function for formatting the code of that call expression. Note that, you DON’TDON’T have to identify call

expressions in a given code snippet. We have already implemented that for you in this function. All you have to do is implement the All you have to do is implement the formatFunctionCallformatFunctionCall function function and return the formatted code string.and return the formatted code string.

You may consider the following constraints for Task 2: * You have to reformat only the CallExpr node. If you encounter any other node (for instance, 1+0 in line 19 of the second example is a BinaryOperator node ), you should copy the code as is from the input source. We have provided a helper function getSource to copy

the input code corresponding to a node. * The callee or arguments of a function call will be either a pure function call or a pure non-function call,

, there will not be a

mixture of functions and non-functions involved in binary expressions, conditional expressions, etc. As an example, we will NOTNOT test the following case:

foo(bar(3) + 1, 9 + bar(6))

We will only test C code inputs. You DO NOTDO NOT need to handle function calls in C++ or C++-specific functionality (including operator overloading or user-defined literals, etc.).

This problem may look like a simple character

” p a r s i n g a n d f o r m a t t i n g ”

problem, but you MUSTMUST use the template code we provided. Please DO NOTDO NOT change

function prototypes of any of the functions we have written.

We have also provided some other helper functions. SubmissionSubmission

Please follow these steps for submission:

1. Copy the completed ClangHw2.cpp file from your $LLVM_HOME/clang/tools/clang-hw2/ directory to the project’s src folder.

2. Complete the write-up in outputs/AST.txt.

3. Commit your code.

4. Push the code to the master branch. DisclaimerDisclaimer

This assignment belongs to Columbia University. It may be freely used for educational purposes.