Understanding Babel and AST: A Deep Dive


Author: Chocolate Brain

https://juejin.cn/post/7235844016640933943

Babel

When I was very young, someone told me that code should be written artistically. I thought to myself: how advanced, and how pretentious. But as time has passed and various proposals have been adopted, the way we write JS has steadily evolved and the syntactic sugar has multiplied. What used to take three or four lines can now be done in one, and taking in the whole thing at a glance, well, there is something to it. Below is a small demo:

const demo = () => [1,2,3].map(e => e + 1)

After Babel transformation, this line of code actually looks like this:

var demo = function demo() {
  return [1, 2, 3].map(function (e) {
    return e + 1;
  });
};

This is how Babel achieves backward compatibility, so that the code can run in older environments. Babel's main functions are therefore:

  1. Code transformation
  2. Add missing features in the target environment via Polyfill (@babel/polyfill)

The @babel/polyfill module includes core-js and a custom regenerator runtime, which together emulate a complete ES2015+ environment. This means you can use new built-ins like Promise and WeakMap, static methods like Array.from or Object.assign, instance methods like Array.prototype.includes, and generator functions (provided that you use the @babel/plugin-transform-regenerator plugin). To add these features, the polyfill is installed into the global scope and onto built-in prototypes like String, which pollutes the global environment. As of Babel 7.4.0, @babel/polyfill is deprecated in favor of importing core-js/stable and regenerator-runtime/runtime directly in the source code; in practice, @babel/preset-env with its useBuiltIns option can inject the needed polyfills instead.

In practical projects, Babel can be configured in several ways (the source code analysis below covers the details). Here, we take babel.config.js as an example:
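As a rough illustration of the @babel/preset-env route, a config file might look like the sketch below. The useBuiltIns and corejs options belong to @babel/preset-env; the specific values are assumptions chosen for illustration and presume that core-js 3 is installed as a project dependency.

```javascript
// babel.config.js (illustrative sketch; the values are assumptions, adjust to your project)
module.exports = {
  presets: [
    [
      '@babel/preset-env',
      {
        // Inject only the core-js polyfills that each file actually uses
        useBuiltIns: 'usage',
        // Major version of core-js assumed to be installed in the project
        corejs: 3,
      },
    ],
  ],
};
```

With 'usage', polyfill imports are added per file instead of loading the whole library up front, which is the main practical difference from the old @babel/polyfill approach.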

module.exports = {
    presets: [...],
    plugins: [...],
} 

Plugins are the rules Babel follows when transforming code, and a preset is a collection of plugins; the source code analysis below covers both. If two transforms both visit the same node, for example the Program node, they run in the order in which the plugins and presets are listed:

  • Plugins run before Presets.
  • Plugin order is arranged from front to back.
  • Preset order is reversed, from back to front (the source code analysis explains why).
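The three ordering rules can be pictured with a small model. This is a simplified simulation, not Babel's implementation: executionOrder and the name and plugins fields are hypothetical and exist only for this illustration.

```javascript
// Simplified model of plugin/preset ordering; not Babel's actual implementation.
// `name` and `plugins` are hypothetical fields used only for this illustration.
function executionOrder(config) {
  // Plugins listed directly run first, front to back
  const own = config.plugins.map(plugin => plugin.name);
  // Presets are applied back to front
  const fromPresets = [...config.presets]
    .reverse()
    .flatMap(preset => preset.plugins.map(plugin => plugin.name));
  return [...own, ...fromPresets];
}

const order = executionOrder({
  plugins: [{ name: 'pluginA' }, { name: 'pluginB' }],
  presets: [
    { plugins: [{ name: 'preset1-plugin' }] },
    { plugins: [{ name: 'preset2-plugin' }] },
  ],
});
console.log(order);
// [ 'pluginA', 'pluginB', 'preset2-plugin', 'preset1-plugin' ]
```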

The compilation process of Babel can be divided into three stages:

  • Parsing: Parsing the code string into an abstract syntax tree.
  • Transformation: Performing transformation operations on the abstract syntax tree.
  • Code Generation: Generating a code string based on the transformed abstract syntax tree.

The AST is at the heart of this process, so we look at it next.

AST

What is AST

A classic interview question: what is the principle behind Babel? Answer: parse, transform, generate. Put simply, Babel converts code into a specific data structure according to certain rules, performs create, read, update, and delete operations on that structure, and then converts it back into code. Frontend developers are used to trees with clearly defined levels, and the AST is just such a tree. Going deeper leads into the profound subject of compiler principles.

As a frontend developer, it is hard to avoid the AST: webpack, ESLint, and low-level optimizations are all closely tied to it.

AST (abstract syntax tree) is an abstract representation of the syntax structure of source code. It represents the syntax structure of programming languages in a tree format, where each node on the tree represents a structure in the source code.

Generating an AST mainly consists of two steps:

Lexical Analysis (lexical analyzer)

The lexer scans the code according to predefined rules and groups the characters into lexical units (tokens), producing a token list. The demo code converted to tokens is as follows:

[
  {
    "type": "Keyword",
    "value": "const"
  },
  {
    "type": "Identifier",
    "value": "demo"
  },
  {
    "type": "Punctuator",
    "value": "="
  },
  {
    "type": "Punctuator",
    "value": "("
  },
  {
    "type": "Punctuator",
    "value": ")"
  },
  {
    "type": "Punctuator",
    "value": "=>"
  },
  {
    "type": "Punctuator",
    "value": "["
  },
  {
    "type": "Numeric",
    "value": "1"
  },
  {
    "type": "Punctuator",
    "value": ","
  },
  {
    "type": "Numeric",
    "value": "2"
  },
  {
    "type": "Punctuator",
    "value": ","
  },
  {
    "type": "Numeric",
    "value": "3"
  },
  {
    "type": "Punctuator",
    "value": "]"
  },
  {
    "type": "Punctuator",
    "value": "."
  },
  {
    "type": "Identifier",
    "value": "map"
  },
  {
    "type": "Punctuator",
    "value": "(" 
  },
  {
    "type": "Identifier",
    "value": "e"
  },
  {
    "type": "Punctuator",
    "value": "=>"
  },
  {
    "type": "Identifier",
    "value": "e"
  },
  {
    "type": "Punctuator",
    "value": "+"
  },
  {
    "type": "Numeric",
    "value": "1"
  },
  {
    "type": "Punctuator",
    "value": ")"
  }
]
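A token list like the one above can be produced by a very small scanner. The following is a minimal sketch that only covers the syntax appearing in the demo line; tokenize, KEYWORDS, and TOKEN_PATTERN are hypothetical names, and a real lexer handles far more cases (strings, comments, multi-character operators, and so on).

```javascript
// Minimal regex-based tokenizer sketch; it only covers the syntax used in the demo line.
const KEYWORDS = new Set(['const', 'let', 'var']);
// One sticky pattern with three groups: word, number, punctuator.
// Inside the last group "=>" is tried before the single characters, so it wins over "=".
const TOKEN_PATTERN = /\s*(?:([A-Za-z_$][\w$]*)|(\d+)|(=>|[=()[\].,+]))/y;

function tokenize(source) {
  const text = source.trimEnd();
  const tokens = [];
  let position = 0;
  while (position < text.length) {
    TOKEN_PATTERN.lastIndex = position;
    const match = TOKEN_PATTERN.exec(text);
    if (!match) throw new SyntaxError(`Unexpected input near index ${position}`);
    const [, word, number, punctuator] = match;
    if (word !== undefined) {
      tokens.push({ type: KEYWORDS.has(word) ? 'Keyword' : 'Identifier', value: word });
    } else if (number !== undefined) {
      tokens.push({ type: 'Numeric', value: number });
    } else {
      tokens.push({ type: 'Punctuator', value: punctuator });
    }
    // Continue right after the token that was just consumed
    position = TOKEN_PATTERN.lastIndex;
  }
  return tokens;
}

const demoTokens = tokenize('const demo = () => [1,2,3].map(e => e + 1)');
console.log(demoTokens.length); // 22
```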

Syntax Analysis (Syntax analyzer)

After obtaining the token list, the parser combines the tokens according to the grammar rules to form an AST. The AST generated from the demo code is too long to show in full, but it can be explored with the tool below:

Online AST conversion site: https://astexplorer.net/
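To give a feel for the structure, here is a hand-written, abridged AST for just the inner callback e => e + 1, together with a small walker that lists the node types it contains. The node type names follow Babel's AST format; location data and other fields are left out, and collectTypes is a hypothetical helper written for this illustration.

```javascript
// Hand-written, abridged AST for the inner callback `e => e + 1`.
// Node type names follow Babel's AST; location data and other fields are omitted.
const callbackAst = {
  type: 'ArrowFunctionExpression',
  params: [{ type: 'Identifier', name: 'e' }],
  body: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'e' },
    right: { type: 'NumericLiteral', value: 1 },
  },
};

// Depth-first walk that records the type of every node it meets.
function collectTypes(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach(child => collectTypes(child, found));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') found.push(node.type);
    Object.values(node).forEach(child => collectTypes(child, found));
  }
  return found;
}

const nodeTypes = collectTypes(callbackAst);
console.log(nodeTypes);
// [ 'ArrowFunctionExpression', 'Identifier', 'BinaryExpression', 'Identifier', 'NumericLiteral' ]
```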


AST with Babel

Next, we use Babel to carry out this sequence: generate the AST, transform it, and convert it back to JS. Taking the demo as an example, we do this by calling the API that Babel provides:

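const babel = require('@babel/core');
const t = babel.types; // the same builders and checks as @babel/types

// beforeFile holds the source to transform, here the demo line from above
const beforeFile = 'const demo = () => [1,2,3].map(e => e + 1)';
const transformLetToVar = babel.transformSync(beforeFile, {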
    plugins: [{
        visitor: {
            // [const, let] transformed to var
            VariableDeclaration(path) {
                if (path.node.kind === 'let' || path.node.kind === 'const' ) {
                    path.node.kind = 'var';
                }
            },
            // () => {} transformed to function, note the absence of {}
            ArrowFunctionExpression(path) {
                let body = path.node.body;
                if (!t.isBlockStatement(body)) {
                    body = t.blockStatement([t.returnStatement(body)]);
                }
                path.replaceWith(
                    t.functionExpression(
                        null,
                        path.node.params,
                        body,
                        false,
                        false
                    )
                );
            },
            // Process arrays
            CallExpression(path) {
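                // Only handle member calls such as arr.map(...); plain calls like fn() have no callee.property
                if (t.isMemberExpression(path.node.callee) && t.isIdentifier(path.node.callee.property, { name: 'map' })) {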
                    const callback = path.node.arguments[0];
                    if (t.isArrowFunctionExpression(callback)) {
                        let body = callback.body;
                        if (!t.isBlockStatement(body)) {
                            body = t.blockStatement([t.returnStatement(body)]);
                        }
                        path.node.arguments[0] = t.functionExpression(
                            null,
                            callback.params,
                            body,
                            false,
                            false
                        );
                    }
                }
            }
        }
    }]
});

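The code uses the transformSync method from @babel/core, which combines @babel/parser, @babel/traverse, and @babel/generator. Using the three packages directly, the code looks like this:

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default;
const generator = require('@babel/generator').default;
const t = require('@babel/types');

// The source to transform, here the demo line from above
const code = 'const demo = () => [1,2,3].map(e => e + 1)';

// Step 1: Parse the code to AST
const ast = parser.parse(code);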

// Step 2: Traverse and modify the AST
traverse(ast, {
    VariableDeclaration(path) {
        if (path.node.kind === 'let' || path.node.kind === 'const' ) {
            path.node.kind = 'var';
        }
    },
    ArrowFunctionExpression(path) {
        let body = path.node.body;
        if (!t.isBlockStatement(body)) {
            body = t.blockStatement([t.returnStatement(body)]);
        }
        path.replaceWith(
            t.functionExpression(
                null,
                path.node.params,
                body,
                false,
                false
            )
        );
    },
    CallExpression(path) {
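        // Only handle member calls such as arr.map(...); plain calls like fn() have no callee.property
        if (t.isMemberExpression(path.node.callee) && t.isIdentifier(path.node.callee.property, { name: 'map' })) {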
            const callback = path.node.arguments[0];
            if (t.isArrowFunctionExpression(callback)) {
                let body = callback.body;
                if (!t.isBlockStatement(body)) {
                    body = t.blockStatement([t.returnStatement(body)]);
                }
                path.node.arguments[0] = t.functionExpression(
                    null,
                    callback.params,
                    body,
                    false,
                    false
                );
            }
        }
    }
});

// Step 3: Generate new code from the modified AST
// generator returns an object; the generated source is in output.code
const output = generator(ast, {}, code);

@babel/core Source Code Analysis

In the conversion above we used babel.transformSync, which comes from @babel/core. Let's start from it for a simple source code analysis. First, we go to babel-core/src/index.ts, which mainly contains basic imports and exports. The part relevant to babel.transformSync is as follows:

export {
  transform,
  transformSync,
  transformAsync,
  type FileResult,
} from "./transform";

Next, we go directly to babel-core/src/transform.ts:

type Transform = {
  (code: string, callback: FileResultCallback): void;
  (
    code: string,
    opts: InputOptions | undefined | null,
    callback: FileResultCallback,
  ): void;
  (code: string, opts?: InputOptions | null): FileResult | null;
};

const transformRunner = gensync(function* transform(
  code: string,
  opts?: InputOptions,
): Handler<FileResult | null> {
  const config: ResolvedConfig | null = yield* loadConfig(opts);
  if (config === null) return null;

  return yield* run(config, code);
});
export const transform: Transform = function transform(
  code,
  optsOrCallback?: InputOptions | null | undefined | FileResultCallback,
  maybeCallback?: FileResultCallback,
) {
  let opts: InputOptions | undefined | null;
  let callback: FileResultCallback | undefined;
  if (typeof optsOrCallback === "function") {
    callback = optsOrCallback;
    opts = undefined;
  } else {
    opts = optsOrCallback;
    callback = maybeCallback;
  }

  if (callback === undefined) {
    if (process.env.BABEL_8_BREAKING) {
      throw new Error(
        "Starting from Babel 8.0.0, the 'transform' function expects a callback. If you need to call it synchronously, please use 'transformSync'.",
      );
    } else {
      // console.warn(
      //   "Starting from Babel 8.0.0, the 'transform' function will expect a callback. If you need to call it synchronously, please use 'transformSync'.",
      // );
      return beginHiddenCallStack(transformRunner.sync)(code, opts);
    }
  }

  beginHiddenCallStack(transformRunner.errback)(code, opts, callback);
};

export function transformSync(
  ...args: Parameters<typeof transformRunner.sync>
) {
  return beginHiddenCallStack(transformRunner.sync)(...args);
}
export function transformAsync(
  ...args: Parameters<typeof transformRunner.async>
) {
  return beginHiddenCallStack(transformRunner.async)(...args);
}
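The three exports differ only in how the result is delivered: as a return value, through a promise, or through a callback. Babel gets these variants from gensync, which derives them from a single generator function. The sketch below imitates that shape with plain functions; it is a deliberate simplification that does not use gensync, and makeRunner and the upper-casing "transform" are hypothetical stand-ins.

```javascript
// Simplified stand-in for the runner shape; real gensync works on generator functions.
function makeRunner(core) {
  return {
    // Return the result directly, throw on failure
    sync: (...args) => core(...args),
    // Deliver the result through a promise
    async: (...args) =>
      new Promise((resolve, reject) => {
        try {
          resolve(core(...args));
        } catch (error) {
          reject(error);
        }
      }),
    // Deliver the result through a Node-style callback passed as the last argument
    errback: (...args) => {
      const callback = args.pop();
      let result;
      try {
        result = core(...args);
      } catch (error) {
        callback(error);
        return;
      }
      callback(null, result);
    },
  };
}

// Hypothetical "transform" that just upper-cases its input
const runner = makeRunner(code => ({ code: code.toUpperCase() }));

const syncResult = runner.sync('let a');
let errbackResult = null;
runner.errback('let b', (error, result) => {
  errbackResult = result;
});
console.log(syncResult.code, errbackResult.code);
```

The point of the design is that the transformation logic is written once, and the calling convention is chosen at the edge.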

All three go through transformRunner, which takes two parameters, code and opts:

const transformRunner = gensync(function* transform(
  code: string,
  opts?: InputOptions,
): Handler<FileResult | null> {
  const config: ResolvedConfig | null = yield* loadConfig(opts);
  if (config === null) return null;

  return yield* run(config, code);
});

Where InputOptions is:

export type InputOptions = ValidatedOptions;

export type ValidatedOptions = {
  cwd?: string;
  filename?: string;
  filenameRelative?: string;
  babelrc?: boolean;
  babelrcRoots?: BabelrcSearch;
  configFile?: ConfigFileSearch;
  root?: string;
  rootMode?: RootMode;
  code?: boolean;
  ast?: boolean;
  cloneInputAst?: boolean;
  inputSourceMap?: RootInputSourceMapOption;
  envName?: string;
  caller?: CallerMetadata;
  extends?: string;
  env?: EnvSet<ValidatedOptions>;
  ignore?: IgnoreList;
  only?: IgnoreList;
  overrides?: OverridesList;
  // Generally verify if a given config object should be applied to the given file.
  test?: ConfigApplicableTest;
  include?: ConfigApplicableTest;
  exclude?: ConfigApplicableTest;
  presets?: PluginList;
  plugins?: PluginList;
  passPerPreset?: boolean;
  assumptions?: {
    [name: string]: boolean;
  };
  // browserslists-related options
  targets?: TargetsListOrObject;
  browserslistConfigFile?: ConfigFileSearch;
  browserslistEnv?: string;
  // Options for @babel/generator
  retainLines?: boolean;
  comments?: boolean;
  shouldPrintComment?: Function;
  compact?: CompactOption;
  minified?: boolean;
  auxiliaryCommentBefore?: string;
  auxiliaryCommentAfter?: string;
  // Parser
  sourceType?: SourceTypeOption;
  wrapPluginVisitorMethod?: Function;
  highlightCode?: boolean;
  // Sourcemap generation options.
  sourceMaps?: SourceMapsOption;
  sourceMap?: SourceMapsOption;
  sourceFileName?: string;
  sourceRoot?: string;
  // Deprecate top level parserOpts
  parserOpts?: ParserOptions;
  // Deprecate top level generatorOpts
  generatorOpts?: GeneratorOptions;
};

These options include plugins and presets. The plugins field is what we set on the options object passed as the second argument to babel.transformSync, and it is the basis of code transformation in Babel. Each plugin is a small JavaScript program that tells Babel how to perform a specific transformation, while a preset is a predefined collection of plugins. JavaScript has so many new features that listing every required plugin by hand would make the configuration very cumbersome, so Babel provides presets that bring in a whole set of plugins with one line.

Returning to transformRunner, its body has two steps: calling loadConfig and calling run. First, let's look at loadConfig. It is the loadFullConfig function exported from babel-core/src/config/full.ts, which handles loading configurations, presets, and plugins. The source code is as follows, and we will analyze it step by step:

export default gensync(function* loadFullConfig(
  inputOpts: unknown,
): Handler<ResolvedConfig | null> {
  const result = yield* loadPrivatePartialConfig(inputOpts);
  if (!result) {
    return null;
  }
  const { options, context, fileHandling } = result;

  if (fileHandling === "ignored") {
    return null;
  }

  const optionDefaults = {};

  const { plugins, presets } = options;

  if (!plugins || !presets) {
    throw new Error("Assertion failure - plugins and presets exist");
  }
  // Create presetContext object, adding options.targets to the original context
  const presetContext: Context.FullPreset = {
    ...context,
    targets: options.targets,
  };
  // ...
})

First, it calls loadPrivatePartialConfig to build the configuration chain. This method receives opts, validates and normalizes them, and returns a processed configuration object. Along the way, it converts some of the input values into absolute paths and creates some new objects and properties:

export default function* loadPrivatePartialConfig(
  inputOpts: unknown,
): Handler<PrivPartialConfig | null> {
  if (
    inputOpts != null &&
    (typeof inputOpts !== "object" || Array.isArray(inputOpts))
  ) {
    throw new Error("Babel options must be an object, null, or undefined");
  }
  const args = inputOpts ? validate("arguments", inputOpts) : {};
  const {
    envName = getEnv(),
    cwd = ".",
    root: rootDir = ".",
    rootMode = "root",
    caller,
    cloneInputAst = true,
  } = args;
  // Convert cwd and rootDir to absolute paths
  const absoluteCwd = path.resolve(cwd);
  const absoluteRootDir = resolveRootMode(
    path.resolve(absoluteCwd, rootDir),
    rootMode,
  );
  // If filename is a string, convert to absolute path
  const filename =
    typeof args.filename === "string"
      ? path.resolve(cwd, args.filename)
      : undefined;
  // resolveShowConfigPath method is used to resolve file paths, returning that path if it exists and points to a file
  const showConfigPath = yield* resolveShowConfigPath(absoluteCwd);
  // Reassemble the converted data into an object named context
  const context: ConfigContext = {
    filename,
    cwd: absoluteCwd,
    root: absoluteRootDir,
    envName,
    caller,
    showConfig: showConfigPath === filename,
  };
  // Call buildRootChain, the source code analysis of buildRootChain is below
  const configChain = yield* buildRootChain(args, context);
  if (!configChain) return null;

  const merged: ValidatedOptions = {
    assumptions: {},
  };
  // Iterate through configChain.options and merge into merged
  configChain.options.forEach(opts => {
    mergeOptions(merged as any, opts);
  });
  // Define a new options object
  const options: NormalizedOptions = {
    ...merged,
    targets: resolveTargets(merged, absoluteRootDir),

    // Tack the passes onto the object itself so that, if this object is
    // passed back to Babel a second time, it will be in the right structure
    // to not change behavior.
    cloneInputAst,
    babelrc: false,
    configFile: false,
    browserslistConfigFile: false,
    passPerPreset: false,
    envName: context.envName,
    cwd: context.cwd,
    root: context.root,
    rootMode: "root",
    filename:
      typeof context.filename === "string" ? context.filename : undefined,

    plugins: configChain.plugins.map(descriptor =>
      createItemFromDescriptor(descriptor),
    ),
    presets: configChain.presets.map(descriptor =>
      createItemFromDescriptor(descriptor),
    ),
  };

  return {
    options,
    context,
    fileHandling: configChain.fileHandling,
    ignore: configChain.ignore,
    babelrc: configChain.babelrc,
    config: configChain.config,
    files: configChain.files,
  };
}
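The merging step in the middle of this function can be pictured as folding the chain of option objects into one, with later entries overriding earlier ones. The sketch below works under two stated assumptions: undefined values never override an earlier value, and the nested assumptions object is merged key by key. mergeChain is a hypothetical helper written for illustration, not Babel's mergeOptions.

```javascript
// Hypothetical helper illustrating the fold; not Babel's mergeOptions.
function mergeChain(optionsList) {
  const merged = { assumptions: {} };
  for (const options of optionsList) {
    for (const [key, value] of Object.entries(options)) {
      if (value === undefined) continue; // assumed: undefined never overrides
      if (key === 'assumptions') {
        Object.assign(merged.assumptions, value); // assumed: nested merge, key by key
      } else {
        merged[key] = value; // later entries win
      }
    }
  }
  return merged;
}

const mergedOptions = mergeChain([
  { sourceType: 'script', comments: true, assumptions: { noDocumentAll: true } },
  { sourceType: 'module', comments: undefined, assumptions: { setPublicClassFields: true } },
]);
console.log(mergedOptions);
```

Here sourceType ends up as 'module', comments keeps its earlier value, and both assumptions survive.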

Overall, configuration loading does three things: it processes configurations, presets, and plugins. We start with buildRootChain, which is called above to build the configuration chain:

export function* buildRootChain(
  opts: ValidatedOptions,
  context: ConfigContext,
): Handler<RootConfigChain | null> {
  let configReport, babelRcReport;
  const programmaticLogger = new ConfigPrinter();
  // Generate programmatic options (program options), which will be used when using @babel/cli or babel.transform
  const programmaticChain = yield* loadProgrammaticChain(
    {
      options: opts,
      dirname: context.cwd,
    },
    context,
    undefined,
    programmaticLogger,
  );
  if (!programmaticChain) return null;
  const programmaticReport = yield* programmaticLogger.output();

  let configFile;
  // If a configuration file is specified, call loadConfig to load it; if not, call findRootConfig to load the root directory configuration
  if (typeof opts.configFile === "string") {
    configFile = yield* loadConfig(
      opts.configFile,
      context.cwd,
      context.envName,
      context.caller,
    );
  } else if (opts.configFile !== false) {
    configFile = yield* findRootConfig(
      context.root,
      context.envName,
      context.caller,
    );
  }
  // ...
}

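The findRootConfig method looks for each name in ROOT_CONFIG_FILENAMES in the root directory (by default, the current working directory) and loads the configuration file it finds. As loadOneConfig shows, all candidates are read, and finding more than one is an error: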

export const ROOT_CONFIG_FILENAMES = [
  "babel.config.js",
  "babel.config.cjs",
  "babel.config.mjs",
  "babel.config.json",
  "babel.config.cts",
];
export function findRootConfig(
  dirname: string,
  envName: string,
  caller: CallerMetadata | undefined,
): Handler<ConfigFile | null> {
  return loadOneConfig(ROOT_CONFIG_FILENAMES, dirname, envName, caller);
}

function* loadOneConfig(
  names: string[],
  dirname: string,
  envName: string,
  caller: CallerMetadata | undefined,
  previousConfig: ConfigFile | null = null,
): Handler<ConfigFile | null> {
  const configs = yield* gensync.all(
    names.map(filename =>
      readConfig(path.join(dirname, filename), envName, caller),
    ),
  );
  const config = configs.reduce((previousConfig: ConfigFile | null, config) => {
    if (config && previousConfig) {
      throw new ConfigError(
        `Multiple configuration files found. Please remove one:\n` +
          ` - ${path.basename(previousConfig.filepath)}\n` +
          ` - ${config.filepath}\n` +
          `from ${dirname}`,
      );
    }
    return config || previousConfig;
  }, previousConfig);

  if (config) {
    debug("Found configuration %o from %o.", config.filepath, dirname);
  }
  return config;
}
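The reduce in loadOneConfig enforces an "at most one config file" rule. The sketch below isolates that rule on plain data; pickSingleConfig is a hypothetical name, and the found array stands in for the results of reading each candidate filename, with null meaning the file does not exist.

```javascript
// Sketch of the "at most one config file" rule on plain data.
// `found` mimics the result of reading each candidate; null means "file not present".
function pickSingleConfig(found) {
  return found.reduce((previous, current) => {
    if (current && previous) {
      throw new Error(
        `Multiple configuration files found: ${previous.filepath}, ${current.filepath}`,
      );
    }
    return current || previous;
  }, null);
}

// Exactly one candidate exists: it is returned
const picked = pickSingleConfig([null, { filepath: 'babel.config.cjs' }, null]);
console.log(picked.filepath); // babel.config.cjs

// Two candidates exist: the call throws
let conflictMessage = null;
try {
  pickSingleConfig([{ filepath: 'babel.config.js' }, { filepath: 'babel.config.json' }]);
} catch (error) {
  conflictMessage = error.message;
}
console.log(conflictMessage);
```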

With loadConfig covered, we can finally move on to the run method, which receives three parameters: config (the result of the configuration loading analyzed above), code (the code string), and ast (an optional AST):

export function* run(
  config: ResolvedConfig,
  code: string,
  ast?: t.File | t.Program | null,
): Handler<FileResult> {
  const file = yield* normalizeFile(
    config.passes,
    normalizeOptions(config),
    code,
    ast,
  );
  // ...
}

The normalizeFile code is as follows. It takes config.passes (an array of passes, each of which is an array of plugins), the normalized configuration, the source code, and an optional ast as parameters:

export default function* normalizeFile(
  pluginPasses: PluginPasses,
  options: { [key: string]: any },
  code: string,
  ast?: t.File | t.Program | null,
): Handler<File> {
  code = `${code || ""}`;
  if (ast) {
    if (ast.type === "Program") {
      ast = file(ast, [], []);
    } else if (ast.type !== "File") {
      throw new Error("AST root must be a Program or File node");
    }

    if (options.cloneInputAst) {
      ast = cloneDeep(ast);
    }
  } else {
    ast = yield* parser(pluginPasses, options, code);
  }

  let inputMap = null;
  if (options.inputSourceMap !== false) {
    if (typeof options.inputSourceMap === "object") {
      inputMap = convertSourceMap.fromObject(options.inputSourceMap);
    }

    if (!inputMap) {
      const lastComment = extractComments(INLINE_SOURCEMAP_REGEX, ast);
      if (lastComment) {
        try {
          inputMap = convertSourceMap.fromComment(lastComment);
        } catch (err) {
          debug("discarding unknown inline input sourcemap", err);
        }
      }
    }

    if (!inputMap) {
      const lastComment = extractComments(EXTERNAL_SOURCEMAP_REGEX, ast);
      if (typeof options.filename === "string" && lastComment) {
        try {
          const match: [string, string] = EXTERNAL_SOURCEMAP_REGEX.exec(
            lastComment,
          ) as any;
          const inputMapContent = fs.readFileSync(
            path.resolve(path.dirname(options.filename), match[1]),
            "utf8",
          );
          inputMap = convertSourceMap.fromJSON(inputMapContent);
        } catch (err) {
          debug("discarding unknown file input sourcemap", err);
        }
      } else if (lastComment) {
        debug("discarding un-loadable file input sourcemap");
      }
    }
  }
  return new File(options, {
    code,
    ast: ast as t.File,
    inputMap,
  });
}

The parser function called here is defined in babel-core/src/parser/index.ts:

export default function* parser(
  pluginPasses: PluginPasses,
  { parserOpts, highlightCode = true, filename = "unknown" }: any,
  code: string,
): Handler<ParseResult> {
  try {
    const results = [];
    for (const plugins of pluginPasses) {
      for (const plugin of plugins) {
        const { parserOverride } = plugin;
        if (parserOverride) {
          const ast = parserOverride(code, parserOpts, parse);

          if (ast !== undefined) results.push(ast);
        }
      }
    }
    if (results.length === 0) {
      return parse(code, parserOpts);
    } else if (results.length === 1) {
      yield* [];
      if (typeof results[0].then === "function") {
        throw new Error(
          `You appear to be using an async parser plugin, ` +
            `which your current version of Babel does not support. ` +
            `If you're using a published plugin, you may need to upgrade ` +
            `your @babel/core version.`,
        );
      }
      return results[0];
    }
    throw new Error("More than one plugin attempted to override parsing.");
  } catch (err) {
    if (err.code === "BABEL_PARSER_SOURCETYPE_MODULE_REQUIRED") {
      err.message +=
        "\nConsider renaming the file to '.mjs', or setting sourceType:module " +
        "or sourceType:unambiguous in your Babel config for this file.";
      // err.code will be changed to BABEL_PARSE_ERROR later.
    }

    const { loc, missingPlugin } = err;
    if (loc) {
      const codeFrame = codeFrameColumns(
        code,
        {
          start: {
            line: loc.line,
            column: loc.column + 1,
          },
        },
        {
          highlightCode,
        },
      );
      if (missingPlugin) {
        err.message =
          `${filename}: ` +
          generateMissingPluginMessage(missingPlugin[0], loc, codeFrame);
      } else {
        err.message = `${filename}: ${err.message}\n\n` + codeFrame;
      }
      err.code = "BABEL_PARSE_ERROR";
    }
    throw err;
  }
}

Returning to the run method, after obtaining the AST, it calls the transformFile method for transformation:

function* transformFile(file: File, pluginPasses: PluginPasses): Handler<void> {
  for (const pluginPairs of pluginPasses) {
    const passPairs: [Plugin, PluginPass][] = [];
    const passes = [];
    const visitors = [];

    for (const plugin of pluginPairs.concat([loadBlockHoistPlugin()])) {
      const pass = new PluginPass(file, plugin.key, plugin.options);
      passPairs.push([plugin, pass]);
      passes.push(pass);
      visitors.push(plugin.visitor);
    }

    for (const [plugin, pass] of passPairs) {
      const fn = plugin.pre;
      if (fn) {
        const result = fn.call(pass, file);
        yield* [];
        if (isThenable(result)) {
          throw new Error(
            `You appear to be using an plugin with an async .pre, ` +
              `which your current version of Babel does not support. ` +
              `If you're using a published plugin, you may need to upgrade ` +
              `your @babel/core version.`,
          );
        }
      }
    }
    const visitor = traverse.visitors.merge(
      visitors,
      passes,
      file.opts.wrapPluginVisitorMethod,
    );
    traverse(file.ast, visitor, file.scope);
    for (const [plugin, pass] of passPairs) {
      const fn = plugin.post;
      if (fn) {
        const result = fn.call(pass, file);
        yield* [];
        if (isThenable(result)) {
          throw new Error(
            `You appear to be using an plugin with an async .post, ` +
              `which your current version of Babel does not support. ` +
              `If you're using a published plugin, you may need to upgrade ` +
              `your @babel/core version.`,
          );
        }
      }
    }
  }
}

In the transformFile method, each plugin's pre, visitor, and post are invoked in that order:

  • pre(file): Called before the traversal starts, with this bound to the plugin's PluginPass instance and the File object as its argument (see fn.call(pass, file) above). It is typically used to set up state that must be kept on the plugin state object for the whole traversal. The PluginPass instance carries information about the plugin's execution context.
  • visitor: This object defines the methods to be called during the traversal. Each key is the type of node to visit, and the value is the corresponding visitor method or an object containing enter and exit methods.
  • post(file): Called after the traversal is complete, with the same this and argument as pre. It is typically used for cleanup or to collect and use results computed during the traversal.
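The pre, traversal, post sequence can be modelled in a few lines. This is a simplified sketch rather than Babel's traverse: handlers here receive the raw node instead of a path, the per-plugin state is a plain object instead of a PluginPass, and runPass and countIdentifiers are hypothetical names.

```javascript
// Simplified model of one plugin pass; not Babel's traverse implementation.
function runPass(ast, plugins) {
  // One state object per plugin, playing the role of PluginPass
  const passes = plugins.map(plugin => ({ plugin, state: {} }));

  // 1. pre hooks, in plugin order
  for (const { plugin, state } of passes) {
    if (plugin.pre) plugin.pre.call(state, ast);
  }

  // 2. A single traversal that offers every node to every plugin's visitor
  function visit(node) {
    if (Array.isArray(node)) {
      node.forEach(child => visit(child));
      return;
    }
    if (!node || typeof node !== 'object') return;
    for (const { plugin, state } of passes) {
      const handler = plugin.visitor && plugin.visitor[node.type];
      if (handler) handler.call(state, node, state);
    }
    Object.values(node).forEach(child => visit(child));
  }
  visit(ast);

  // 3. post hooks, in plugin order
  for (const { plugin, state } of passes) {
    if (plugin.post) plugin.post.call(state, ast);
  }
}

const log = [];
const countIdentifiers = {
  pre() {
    this.count = 0;
    log.push('pre');
  },
  visitor: {
    Identifier(node, state) {
      state.count += 1;
    },
  },
  post() {
    log.push(`post: ${this.count} identifiers`);
  },
};

runPass(
  {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'e' },
    right: { type: 'NumericLiteral', value: 1 },
  },
  [countIdentifiers],
);
console.log(log); // [ 'pre', 'post: 1 identifiers' ]
```

Running all visitors in one traversal, instead of one traversal per plugin, is what merging the visitors buys.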

Next, we return to the run method:

{
  // ...
  let outputCode, outputMap;
  try {
    if (opts.code !== false) {
      ({ outputCode, outputMap } = generateCode(config.passes, file));
    }
  } catch (e) {
    e.message = `${opts.filename ?? "unknown file"}: ${e.message}`;
    if (!e.code) {
      e.code = "BABEL_GENERATE_ERROR";
    }
    throw e;
  }
  return {
    metadata: file.metadata,
    options: opts,
    ast: opts.ast === true ? file.ast : null,
    code: outputCode === undefined ? null : outputCode,
    map: outputMap === undefined ? null : outputMap,
    sourceType: file.ast.program.sourceType,
    externalDependencies: flattenToSet(config.externalDependencies),
  };
}

Finally, the generateCode method is called to convert the AST back to code. Its source follows the same override pattern as parser:

export default function generateCode(
  pluginPasses: PluginPasses,
  file: File,
): {
  outputCode: string;
  outputMap: SourceMap | null;
} {
  const { opts, ast, code, inputMap } = file;
  const { generatorOpts } = opts;

  generatorOpts.inputSourceMap = inputMap?.toObject();

  const results = [];
  for (const plugins of pluginPasses) {
    for (const plugin of plugins) {
      const { generatorOverride } = plugin;
      if (generatorOverride) {
        const result = generatorOverride(ast, generatorOpts, code, generate);

        if (result !== undefined) results.push(result);
      }
    }
  }

  let result;
  if (results.length === 0) {
    result = generate(ast, generatorOpts, code);
  } else if (results.length === 1) {
    result = results[0];

    if (typeof result.then === "function") {
      throw new Error(
        `You appear to be using an async codegen plugin, ` +
          `which your current version of Babel does not support. ` +
          `If you're using a published plugin, you may need to upgrade ` +
          `your @babel/core version.`,
      );
    }
  } else {
    throw new Error("More than one plugin attempted to override codegen.");
  }

  let { code: outputCode, decodedMap: outputMap = result.map } = result;

  if (result.__mergedMap) {
    outputMap = { ...result.map };
  } else {
    if (outputMap) {
      if (inputMap) {
        outputMap = mergeSourceMap(
          inputMap.toObject(),
          outputMap,
          generatorOpts.sourceFileName,
        );
      } else {
        outputMap = result.map;
      }
    }
  }

  if (opts.sourceMaps === "inline" || opts.sourceMaps === "both") {
    outputCode += "\n" + convertSourceMap.fromObject(outputMap).toComment();
  }

  if (opts.sourceMaps === "inline") {
    outputMap = null;
  }

  return { outputCode, outputMap };
}

At this point, the source code analysis of the run method has been completed, and the source code analysis of @babel/core that started with babel.transformSync has also been finished!

Simple JavaScript Compiler (Like Babel)

Next, we will build a simple compiler as a demo, following the same parse, transform, generate process:

Node Types (constants.js)

const TokenTypes = {
    Keyword: "Keyword",
    Identifier: "Identifier",
    Punctuator: "Punctuator",
    String: "String",
    Numeric: "Numeric",
    Paren: 'Paren',
    Arrow: 'Arrow'
}

const AST_Types = {
    Literal: "Literal",
    Identifier: "Identifier",
    AssignmentExpression: "AssignmentExpression",
    VariableDeclarator: "VariableDeclarator",
    VariableDeclaration: "VariableDeclaration",
    Program: "Program",
    NumericLiteral: "NumericLiteral",
    ArrowFunctionExpression: 'ArrowFunctionExpression',
    FunctionExpression: 'FunctionExpression'
}

module.exports = {
    TokenTypes,
    AST_Types
}

Lexical Analysis (tokenizer.js)

const { TokenTypes } = require("./constants")

// Match the keyword "let" exactly (anchored so that names such as "letter" stay identifiers)
const KEYWORD = /^let$/
// Match "=", ";"
const PUNCTUATOR = /[=;]/
// Match whitespace
const WHITESPACE = /\s/
// Match letters
const LETTERS = /[A-Za-z]/
// Match digits
const NUMERIC = /[0-9]/
// Match "(" and ")"
const PAREN = /[()]/

function tokenizer(input) {
    const tokens = []
    let current = 0
    // Traverse the string
    while (current < input.length) {
    let char = input[current]
    // Handle keywords and variable names
    if (LETTERS.test(char)) {
        let value = ''
        // Collect consecutive letters. The bounds check matters: at the end of the
        // input char is undefined, and LETTERS.test(undefined) tests the string
        // "undefined", which matches and would loop forever
        while (current < input.length && LETTERS.test(char)) {
            value += char
            char = input[++current]
        }
        // Emit a Keyword token for keywords, otherwise an Identifier token
        tokens.push({
            type: KEYWORD.test(value) ? TokenTypes.Keyword : TokenTypes.Identifier,
            value: value
        })
        continue
    }
    // Check if it’s a parenthesis
    if (PAREN.test(char)) {
        tokens.push({
            type: TokenTypes.Paren,
            value: char
        });
        current++;
        continue;
    }
    // Check if it’s an arrow symbol
    if (char === '=' && input[current + 1] === '>') {
        tokens.push({
            type: TokenTypes.Arrow,
            value: '=>'
        });
        current += 2; // Skip the two characters
        continue;
    }
    // Determine if it’s a number
    if (NUMERIC.test(char)) {
        let value = '' + char
        char = input[++current]
        while (NUMERIC.test(char) && current < input.length) {
            value += char
            char = input[++current]    
        }
        tokens.push({ type: TokenTypes.Numeric, value })
        continue
    }
    // Check if it’s a symbol, "=", ";"
    if (PUNCTUATOR.test(char)) {
        const punctuators = char // Create a variable to save the matched symbol
        current++
        tokens.push({
            type: TokenTypes.Punctuator,
            value: punctuators
        })
        continue;
    }
    // Handle whitespace, skip on whitespace
    if (WHITESPACE.test(char)) {
        current++
        continue;
    }
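    // Handle strings
    if (char === '"') {
        let value = ''
        // Ignore the leading quote
        char = input[++current]
        // Traverse until the closing quote is encountered
        while (current < input.length && char !== '"') {
            value += char
            char = input[++current]
        }
        // Without this check an unterminated string would loop forever
        if (current >= input.length) {
            throw new SyntaxError('Unterminated string literal')
        }
        // Skip the trailing quote
        current++
        // The token value keeps its surrounding quotes
        tokens.push({ type: TokenTypes.String, value: '"'+value+'"' })
        continue;
    }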
    // If no rule matches the current character, throw an error
    throw new TypeError('Unknown character: ' + char)
  }
  return tokens
}
module.exports = tokenizer
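The character-by-character loop above can also be expressed with a single sticky regular expression. The following is a sketch of that alternative, not a replacement for the tokenizer above: tokenizeWithRegex, TOKEN and SPACE are hypothetical names, and the group names are assumed to mirror the TokenTypes values so that the token stream has the same shape.

```javascript
// One alternative per token kind; the order matters ("=>" must be tried before "=").
const TOKEN = /(?<Arrow>=>)|(?<Word>[A-Za-z]+)|(?<Numeric>[0-9]+)|(?<String>"[^"]*")|(?<Paren>[()])|(?<Punctuator>[=;])/y;
const SPACE = /\s*/y;

function tokenizeWithRegex(input) {
  const tokens = [];
  let pos = 0;
  while (true) {
    // Skip whitespace; this pattern always matches, possibly with an empty string
    SPACE.lastIndex = pos;
    SPACE.exec(input);
    pos = SPACE.lastIndex;
    if (pos >= input.length) break;

    // The sticky flag forces the match to start exactly at pos
    TOKEN.lastIndex = pos;
    const match = TOKEN.exec(input);
    if (match === null) {
      throw new TypeError("Unknown character: " + input[pos]);
    }
    // Exactly one named group is defined for each successful match
    const [group, value] = Object.entries(match.groups).find(([, text]) => text !== undefined);
    const type = group !== "Word" ? group : value === "let" ? "Keyword" : "Identifier";
    tokens.push({ type, value });
    pos = TOKEN.lastIndex;
  }
  return tokens;
}

const demoTokens = tokenizeWithRegex("let a = () => 1;");
console.log(demoTokens.map((token) => token.type).join(" "));
// Keyword Identifier Punctuator Paren Paren Arrow Numeric Punctuator
```

The hand-written loop is easier to step through while learning; the regex version is shorter but hides the scanning logic inside the pattern.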

Syntax Analysis (parser.js)

const {TokenTypes, AST_Types} = require("./constants");

function parser(tokens) {
    let current = 0;

    function walk() {
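        let token = tokens[current];
        // Running past the last token means the input ended mid-statement
        if (!token) {
            throw new SyntaxError('Unexpected end of input');
        }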

        if (token.type === TokenTypes.Numeric) {
            current++;
            return {
                type: AST_Types.NumericLiteral,
                value: token.value,
            };
        }

        if (token.type === TokenTypes.String) {
            current++;
            return {
                type: AST_Types.Literal,
                value: token.value,
            };
        }

        if (token.type === TokenTypes.Identifier) {
            current++;
            return {
                type: AST_Types.Identifier,
                name: token.value,
            };
        }

        if (token.type === TokenTypes.Keyword && token.value === 'let') {
            token = tokens[++current];

            let node = {
                type: AST_Types.VariableDeclaration,
                kind: 'let',
                declarations: [],
            };

            while (token && token.type === TokenTypes.Identifier) {
                node.declarations.push({
                    type: AST_Types.VariableDeclarator,
                    id: {
                        type: AST_Types.Identifier,
                        name: token.value,
                    },
                    init: null,
                });

                token = tokens[++current];

                if (token && token.type === TokenTypes.Punctuator && token.value === '=') {
                    token = tokens[++current];
                    if (token && token.type === TokenTypes.Paren) {
                        // Only the parameterless form "() => <expression>" is supported
                        const close = tokens[current + 1];
                        const arrow = tokens[current + 2];
                        if (
                            token.value !== '(' ||
                            !close || close.type !== TokenTypes.Paren || close.value !== ')' ||
                            !arrow || arrow.type !== TokenTypes.Arrow
                        ) {
                            throw new SyntaxError('Expected "() =>" after "="');
                        }
                        // Skip "(", ")" and "=>", then parse the function body
                        current += 3;
                        node.declarations[node.declarations.length - 1].init = {
                            type: AST_Types.ArrowFunctionExpression,
                            params: [],
                            body: walk(),
                        };
                    } else {
                        node.declarations[node.declarations.length - 1].init = walk();
                    }
                }

                token = tokens[current];
                if (token && token.type === TokenTypes.Punctuator && token.value === ';') {
                    current++;
                    break;
                }
            }
            return node;
        }
        throw new TypeError('Unexpected token type: ' + token.type);
    }
    let ast = {
        type: AST_Types.Program,
        body: [],
    };
    while (current < tokens.length) {
        ast.body.push(walk());
    }

    return ast;
}
module.exports = parser;
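The arrow function branch of the parser above checks for "(", ")" and "=>" one token at a time. One way to keep such checks readable is a small cursor with an expect method that either consumes the expected token or reports what was missing. This is only a sketch under that assumption: createCursor and parseEmptyArrow are hypothetical helpers, not part of the parser above, and the body is limited to a numeric literal to keep the example short.

```javascript
// Hypothetical cursor over a token array.
function createCursor(tokens) {
  let current = 0;
  return {
    // Consume the next token if it matches, otherwise report what was expected.
    expect(type, value) {
      const token = tokens[current];
      const matches =
        token !== undefined &&
        token.type === type &&
        (value === undefined || token.value === value);
      if (!matches) {
        const wanted = value === undefined ? type : type + ' "' + value + '"';
        throw new SyntaxError("Expected " + wanted + " at token index " + current);
      }
      current++;
      return token;
    },
  };
}

// Parse the only arrow form handled here: "() => <number>".
function parseEmptyArrow(cursor) {
  cursor.expect("Paren", "(");
  cursor.expect("Paren", ")");
  cursor.expect("Arrow");
  const body = cursor.expect("Numeric");
  return {
    type: "ArrowFunctionExpression",
    params: [],
    body: { type: "NumericLiteral", value: body.value },
  };
}

const arrowTokens = [
  { type: "Paren", value: "(" },
  { type: "Paren", value: ")" },
  { type: "Arrow", value: "=>" },
  { type: "Numeric", value: "1" },
];
const arrowNode = parseEmptyArrow(createCursor(arrowTokens));
console.log(arrowNode.body.value); // 1

let parseError = null;
try {
  // A closing parenthesis first is rejected with a descriptive message
  parseEmptyArrow(createCursor([{ type: "Paren", value: ")" }]));
} catch (error) {
  parseError = error.message;
}
console.log(parseError); // Expected Paren "(" at token index 0
```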

Traverser (traverser.js)

const constants = require("./constants")
const { AST_Types } = constants
function traverser(ast, visitor) {
    // Visit every node in an array by calling traverseNode on each child
    function traverseArray(array, parent) {
        array.forEach(function(child) {
            traverseNode(child, parent);
        });
    }
    function traverseNode(node, parent) {
        // Check if there’s a corresponding method in the visitor for the type.
        const method = visitor[node.type]
        if (method) {
            method(node, parent)
        }
        // Handle each different type of node separately.
        switch (node.type) {
            case AST_Types.Program: 
                traverseArray(node.body, node) 
                break
            case AST_Types.VariableDeclaration:
                traverseArray(node.declarations, node);
                break;
            case AST_Types.VariableDeclarator:
                traverseNode(node.id, node);
                // init is null for a declaration without an initializer, e.g. "let a;"
                if (node.init) {
                    traverseNode(node.init, node);
                }
                break;
            case AST_Types.ArrowFunctionExpression:
                traverseArray(node.params, node);
                traverseNode(node.body, node);
                break;
            // Leaf nodes have no children to traverse
            case AST_Types.AssignmentExpression:
            case AST_Types.Identifier:
            case AST_Types.Literal:
            case AST_Types.NumericLiteral:
                break
            default:
                throw new TypeError('Unknown node type: ' + node.type)
        }
    }
    // Trigger the traversal of the AST, passing null for the root node which has no parent.
    traverseNode(ast, null)
}
module.exports = traverser
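The traverser above calls each visitor method once, before descending into the children. Babel's visitors can also be objects with enter and exit hooks, where exit runs after the children have been visited. The following sketch shows how that idea could look for this small AST. The names traverse and CHILD_KEYS are hypothetical, and the child-key table is an assumption that mirrors the switch statement above.

```javascript
// Which properties hold child nodes for each node type (assumed for this demo).
const CHILD_KEYS = {
  Program: ["body"],
  VariableDeclaration: ["declarations"],
  VariableDeclarator: ["id", "init"],
  ArrowFunctionExpression: ["params", "body"],
  Identifier: [],
  Literal: [],
  NumericLiteral: [],
};

function traverse(node, visitor, parent = null) {
  const keys = CHILD_KEYS[node.type];
  if (!keys) throw new TypeError("Unknown node type: " + node.type);

  const handler = visitor[node.type];
  // A bare function is treated as the enter hook
  const enter = typeof handler === "function" ? handler : handler && handler.enter;
  const exit = typeof handler === "function" ? undefined : handler && handler.exit;

  if (enter) enter(node, parent);
  for (const key of keys) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach((item) => traverse(item, visitor, node));
    } else if (child) {
      // Skips null children such as a missing initializer
      traverse(child, visitor, node);
    }
  }
  if (exit) exit(node, parent);
}

// Hand-written AST for: let a = () => 1;
const sampleAst = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "let",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "a" },
      init: {
        type: "ArrowFunctionExpression",
        params: [],
        body: { type: "NumericLiteral", value: "1" },
      },
    }],
  }],
};

const visitLog = [];
traverse(sampleAst, {
  VariableDeclaration: {
    enter: () => visitLog.push("enter declaration"),
    exit: () => visitLog.push("exit declaration"),
  },
  Identifier: (node) => visitLog.push("identifier " + node.name),
  NumericLiteral: (node) => visitLog.push("number " + node.value),
});
console.log(visitLog);
// [ 'enter declaration', 'identifier a', 'number 1', 'exit declaration' ]
```

An exit hook is useful when a transformation needs the already processed children, for example when a parent node is rebuilt from its transformed parts.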

Transformer (transformer.js)

const traverser = require("./traverser")
const constants = require("./constants")
const { AST_Types } = constants

function transformer(ast) {
    const newAst = {
        type: AST_Types.Program,
        body: [],
        sourceType: "script"
    };
    // Give visitors a way to reach the new tree: the old Program node
    // carries a reference to the body of the new Program node
    ast._context = newAst.body
    // Pass AST and visitor into traverser
    traverser(ast, {
        // Convert let to var
        VariableDeclaration: function(node, parent) {
            const variableDeclaration = {
                type: AST_Types.VariableDeclaration,
                // The declarators are shared with the old tree, so the init
                // replacement made below is also visible in the new tree
                declarations: node.declarations,
                kind: "var"
            };
            parent._context.push(variableDeclaration)
        },
        // Convert an arrow function into a function expression
        ArrowFunctionExpression: function (node, parent) {
            const functionExpression = {
                type: AST_Types.FunctionExpression,
                params: node.params, // Keep parameter list unchanged
                body: node.body, // Keep function body unchanged
            }
            if (parent.type === AST_Types.VariableDeclarator) {
                parent.init = functionExpression;
            }
        },
    });
    return newAst
}
module.exports = transformer
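The transformer above builds the new tree through ast._context and rewrites parent.init in place, so the original AST is modified as a side effect. As a contrast, here is a sketch of the same two rewrites (let to var, arrow function to function expression) written as a pure recursive function that leaves its input untouched. The name toEs5 is hypothetical, and the sketch handles only the node types of this demo; it makes no attempt at the scope or this handling that a real transform needs.

```javascript
// Hypothetical pure transform: returns a new tree and never mutates the input.
function toEs5(node) {
  switch (node.type) {
    case "Program":
      return { ...node, body: node.body.map(toEs5) };
    case "VariableDeclaration":
      // Every declaration is emitted as var
      return { ...node, kind: "var", declarations: node.declarations.map(toEs5) };
    case "VariableDeclarator":
      return {
        ...node,
        id: toEs5(node.id),
        init: node.init ? toEs5(node.init) : null,
      };
    case "ArrowFunctionExpression":
      return {
        type: "FunctionExpression",
        params: node.params.map(toEs5),
        body: toEs5(node.body),
      };
    default:
      // Leaf nodes (Identifier, Literal, NumericLiteral) are copied as they are
      return { ...node };
  }
}

// Hand-written AST for: let a = () => 1;
const inputAst = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "let",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "a" },
      init: {
        type: "ArrowFunctionExpression",
        params: [],
        body: { type: "NumericLiteral", value: "1" },
      },
    }],
  }],
};

const outputAst = toEs5(inputAst);
console.log(outputAst.body[0].kind);                      // var
console.log(outputAst.body[0].declarations[0].init.type); // FunctionExpression
console.log(inputAst.body[0].kind);                       // let
```

The visitor-based version scales better once many independent rewrites are needed, which is why Babel's plugins are built around visitors. The pure version is easier to test because the input can be compared before and after.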

Code Generator (codeGenerator.js)

const constants = require("./constants")
const { AST_Types } = constants

function codeGenerator(node) {
    // Handle different types of nodes
    switch (node.type) {
        // For a Program node, generate each statement in its body and join them with newlines
        case AST_Types.Program:
            return node.body.map(codeGenerator)
                .join('\n')
        case AST_Types.VariableDeclaration:
            return (
                node.kind + ' ' + node.declarations.map(codeGenerator).join(', ')
            )
        case AST_Types.VariableDeclarator:
            // A declarator without an initializer is just its name
            if (node.init === null) {
                return codeGenerator(node.id)
            }
            return (
                codeGenerator(node.id) + ' = ' +
                codeGenerator(node.init)
            )
        case AST_Types.Identifier:
            return node.name
        case AST_Types.Literal:
            // The tokenizer already stored the string together with its quotes
            return node.value
        case AST_Types.NumericLiteral:
            return node.value
        case AST_Types.FunctionExpression:
            // The function expression closes its own body, so literals stay plain values
            return (
                'function(' + node.params.map(codeGenerator).join(', ') +
                ') { return ' + codeGenerator(node.body) + '; }'
            )
        default:
            throw new TypeError('Unknown node type: ' + node.type)
    }
}
module.exports = codeGenerator

Entry Point (index.js)

const tokenizer = require('./tokenizer')
const parser = require('./parser')
const transformer = require("./transformer")
const codeGenerator = require("./codeGenerator")

const demo = 'let a = () => 1;'
const tokens = tokenizer(demo)
const AST = parser(tokens)
const newAST = transformer(AST)
const newCode = codeGenerator(newAST)
console.log(newCode)
console.dir(newAST, {depth: null})

The final transformation result is as follows:

var a = function() { return 1; }
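A quick way to check that generated code is valid JavaScript is to evaluate it. The sketch below does this for the output line above using the built-in Function constructor. The variable names are assumptions made for this check, and evaluating strings this way is only appropriate for trusted input such as this demo.

```javascript
// The generated code, copied from the output of the demo
const generated = "var a = function() { return 1; }";

// Wrap the code in a function body and hand back the declared variable
const declared = new Function(generated + ";\nreturn a;")();

console.log(typeof declared); // function
console.log(declared());      // 1
```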

The newly generated AST is as follows:

{
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'a' },
          init: {
            type: 'FunctionExpression',
            params: [],
            body: { type: 'NumericLiteral', value: '1' }
          }
        }
      ],
      kind: 'var'
    }
  ],
  sourceType: 'script'
}
