banner
飞天御剑流

飞天御剑流

前端,jAVAsCRIPT
github

Frontend Compilation Principles the-super-tiny-compiler

Update After Four Months#

In fact, the recursive step behind is just using the reference data type feature of JS to simulate the pointer functionality similar to the C language 😐. It simply passes the pointer and continuously generates a data structure similar to a linked list, fills in the data, and finally obtains the target code... that's it... already.

At the beginning#

the-super-tiny-compiler is a compiler written in JavaScript on GitHub. The code comments refer to it as possibly the smallest compiler that can convert a Lisp-style language into a C-style language.

This compiler project can be said to be small but complete.

However, when I was reading the source code of the compiler, I was puzzled by the recursive process of generating a new AST.

So from now on, I will analyze the source code.

In simple terms, the principle of the compiler is as follows:
Lexical analysis
Syntax analysis (generate AST)
Convert oldAst to newAst
Finally, generate the target language syntax of the generated newAst
Using (add 2 (subtract 4 2)) as input, the resulting AST structure is as follows:

const ast = {
  type: "Program",
  body: [
    {
      type: "CallExpression",
      name: "add",
      params: [
        {
          type: "NumberLiteral",
          value: "2",
        },
        {
          type: "CallExpression",
          name: "subtract",
          params: [
            {
              type: "NumberLiteral",
              value: "4",
            },
            {
              type: "NumberLiteral",
              value: "2",
            },
          ],
        },
      ],
    },
  ],
};

After obtaining this structure, the following function is executed:


function transformer(ast) {
  let newAst = {
    type: "Program",
    body: []
  }

  ast._context = newAst.body
}

A new property is added to the ast object, and this property is pointed to the body property of newAst. Then, the following code is executed:

function transformer(ast) {
  let newAst = {
    type: "Program",
    body: []
  }

// Changing the properties of ast can directly affect newAst 
  ast._context = newAst.body

  traverser(ast, {
    // Handle numbers
    NumberLiteral: {
      enter(node, parent) {
        parent._context.push({
          type: 'NumberLiteral',
          value: node.value
        })
      }
    },
    // Handle strings
    StringLiteral: {
      enter(node, parent) {
        parent._context.push({
          type: 'StringLiteral',
          value: node.value,
        });
      },
    },

    CallExpression: {
      enter(node, parent) {
        let expression = {
          type: 'CallExpression',
          callee: {
            type: 'Identifier',
            name: node.name
          },
          arguments: []
        }

        node._context = expression.arguments;

        if (parent.type !== "CallExpression") {
          expression = {
            type: 'ExpressionStatement',
            expression
          }
        }

        parent._context.push(expression)
      }
    }


  })

  return newAst
}

As can be seen, the data changes of newAst are completed by executing only one traverser function. The function takes the ast just now as a parameter, and copies the body in newAst according to different types based on the behavior of traversing.

The internal of this function is as follows:

function traverser(ast, visitor) {

  function traverseArray(array, parent) {
    
  }

  function traverseNode(node, parent) {
    // Check if the passed-in node has the corresponding property
    let methods = visitor[node.type]

    // If it does, assign it to the body of its parent node
    if (methods && methods.enter) {
      methods.enter(node, parent)
    }

    // Then separately process the properties not included in the visitor
    switch (node.type) {
      // First execute the traversal of the body of the node
      case "Program":
        traverseArray(node.body, node);
        break;

      case "CallExpression":
        traverseArray(node.params, node)
        break;

      case "NumberLiteral":
      case "StringLiteral":
        break;

      default:
        throw new TypeError(node.type)
    }
  }

  traverseNode(ast, null)

}

When executing it, the main line is traverser -> traverseNode ->
ast is passed in as the node parameter, and the function is recursively called during the execution process.
When executed for the first time, methods is undefined, and then it enters the switch statement.
For the first time, the ast type is Program, and the body of ast is passed as a parameter and then break;
At this point, the main line of the function has been executed.
The subsequent execution flow is a pseudo-recursion or nested call to ast.body. Each time the function determines whether to generate a new ast based on its passed-in tree parameter according to the expression and parameters.

It is worth mentioning that when generating newAst, there is a statement like this:
node._context = expression.arguments;
It points the _context property of the passed-in node to a property under the current object, achieving the effect of reference. At this time, when using the visitor to traverse the syntax tree, no matter what object is passed in, because the reference relationship of the memory address has been built in the upper layer, it is only necessary to add to parent._context property.

Finally, the generated newAst is converted into our target language. There is nothing special about this. Just recursively generate the string.

In summary, this seemingly simple small project requires consideration of many details if you want to implement it yourself. Therefore, the road to in-depth exploration of the frontend is long and arduous!

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.