Published on

TypeScript: Embrace the types

Authors
  • Eduard Lavuš
    Name
    Eduard Lavuš
    Title
    Developer
    Social media profiles

When I was offered to work in TypeScript, I immediately fell in love with its implementation of a type system. It made my head fill with thoughts of all the borderline insane types we could craft to make entire classes of errors a thing of the past. This comes at the cost of having to manage those type definitions, of course, but we still have tests. Strong types are complementary here, allowing tests to focus more on actual features and less on ensuring that we use our internal APIs correctly.

Creating a language parser

Having experienced Haskell at university, having written some template magicks in C++, and in general being interested in higher-order types, I felt more than comfortable experimenting in TypeScript. As I descended into the depths of development, I was tasked with producing a parser for a domain-specific language – Comlink – that powers our integrations at Superface. This language was simple enough (and the project was lenient enough) to create our own recursive descent parser, to define a lexer and its tokens, and to then start defining syntax rules. Afterwards came rule combinators, conceptually simple "or", "and", "optional", "repeat", etc. As the complexity was rising, it became clear that it was imperative to create comprehensive typing for our code to keep our sanity at reasonable levels. And we got more focused tests as a bonus. It allowed us to handle local complexity better by constraining global complexity—by checking correct integration at the type level.

Consider the following type definitions, which imply a very simple toy language, with a construct like foo = 1 + bar - baz. We also assume equal operator precedence.

type IdentifierToken = { kind: 'identifier'; identifier: string };
type NumberToken = { kind: 'number'; value: number };
type Operator = '+' | '-' | '=';
type OperatorToken = { kind: 'operator'; operator: Operator };

type Token = IdentifierToken | NumberToken | OperatorToken;
type TokenStream = Token[];

type ExpressionTerm = string | number;
type ExpressionOperator = '+' | '-';
type ExpressionChain = {
  head: ExpressionTerm;
  elements: [ExpressionOperator, ExpressionTerm][];
};
type Assignment = {
  name: string;
  expression: ExpressionChain;
};

The syntax rule for Assignment could be written in plain English: identifier and "=" and (number or identifier) and optional repeat (operator and (number or identifier)).

Ideally, we’d like our programmatic definition of this rule to be very similar so that the intuition is easy to transfer. We would also like to have generic and correctly typed partial rules. Having to define each rule individually, manually assign types, and compose them together would be overwhelming. Instead, we want a simple API with automatic type resolution. TypeScript gives us just that.

type NonemptyArray<T> = [T, ...T[]];
type PeelSyntaxRule<T> = T extends SyntaxRule<infer R> ? R : T;
type PeelSyntaxRuleArray<A> = { [k in keyof A]: PeelSyntaxRule<A[k]> };
type ApplyFilter<T, F> = F extends [] ? T : F[any];

abstract class SyntaxRule<T> {
  static identifier(): SyntaxRule<string>;
  static number(): SyntaxRule<number>;
  static operator<F extends Operator[]>(
    ...filter: F
  ): SyntaxRule<ApplyFilter<Operator, F>>;
  static or<U extends SyntaxRule<unknown>[]>(
    ...rules: U
  ): SyntaxRule<PeelSyntaxRuleArray<U>[any]>;
  static and<U extends SyntaxRule<unknown>[]>(
    ...rules: U
  ): SyntaxRule<PeelSyntaxRuleArray<U>>;
  static optional<U>(rule: SyntaxRule<U>): SyntaxRule<U | undefined>;
  static repeat<U>(rule: SyntaxRule<U>): SyntaxRule<NonemptyArray<U>>;

  abstract map<U>(f: (value: T) => U): SyntaxRule<U>;
  abstract match(stream: TokenStream): [T, TokenStream];
}

Note that since we are still in a dynamic language, we don’t have to worry as much about the practicality of the implementation. We are always scoped to a single function, so we can afford to use tricks such as the any type and as casts to convince the compiler that we know what we are doing. We localize complex code so that all other code can be made simpler, and the types check the usage.

And now we have everything to define the rules themselves. Composability comes by binding commonly partial rules to variables. In the code block above, we also defined some convenience in terms of filtering Operators and the map function, but these could just as easily be achieved by subclassing. The following example includes types explicitly, but the only requirements are return types when mapping. All other types are correctly inferred by TypeScript, especially for partial rules.

const ruleTerm: SyntaxRule<ExpressionTerm> = SyntaxRule.or(
  SyntaxRule.identifier(),
  SyntaxRule.number()
);
const ruleExpressionChain: SyntaxRule<ExpressionChain> = SyntaxRule.and(
  ruleTerm,
  SyntaxRule.optional(
    SyntaxRule.repeat(SyntaxRule.and(SyntaxRule.operator('+', '-'), ruleTerm))
  )
).map(
  (
    value: [
      ExpressionTerm,
      NonemptyArray<['+' | '-', ExpressionTerm]> | undefined
    ]
  ): ExpressionChain => {
    return {
      head: value[0],
      elements: value[1] ?? [],
    };
  }
);
const ruleAssignment = SyntaxRule.and(
  SyntaxRule.identifier(),
  SyntaxRule.operator('='),
  ruleExpressionChain
).map((value: [string, '=', ExpressionChain]): Assignment => {
  return {
    name: value[0],
    expression: value[2],
  };
});

And there we have it. We now have all the types needed to comfortably call ruleAssignment.match(stream) on a token stream and get an Assignment node for our toy language. If we ever need to change the definition of the assignment rule, we can do so knowing that the compiler will guide us, derive correct types, and show sensible errors. We have improved the developer experience by utilizing the power of the language.

Mr. President, I have types on my hands

You can find a working implementation in this TypeScript playground, and the real project is open source, so you can see the full implementation of the parser on GitHub. However, it is more complex because it deals with error reporting, has more token types, and includes an embedded scripting language. It is still in development, so some code distilled in this article might actually be cleaner than the one in the project. Iterative development is often like that, which is why it is so beneficial to have both an expressive typing system and strong testing coverage to ensure the quality of our product.

In the next article, I will follow up with another type construct from the same project, which pushes the boundaries of how far is too far. Stay tuned!

Automate the impossible.
Superface. The LLM-powered automation agent that connects to all your systems.

Try it now