Notes on how to create a Language Grammar and Custom Theme for a Textmate Bundle

Over the past couple of years I've been working a lot with Craft CMS and Twig. There was an existing PHP-Twig Textmate bundle which was a good starting point but there were several Craft-specific improvements I wanted to make. Certain keywords were not recognized or highlighted incorrectly and a number of handy tab shortcuts and key bindings were not available. On top of this, I wanted to customize a theme to my preferences and struggled to get the themes I had working with the grammar in a way that made me happy.

After a few attempts to customize the PHP-Twig bundle, I ended up using it mostly as a reference and re-building the Craft-Twig fork of the bundle from scratch. Rebuilding the bundle helped me get a better understanding of how to create a Language Grammar, how a Language Grammar and a Theme are related to one another, and gave me a fair bit of practice writing regular expressions (an arcane skill that mysteriously brings me joy).

Documentation on Language Grammars remains a bit elusive and, while some references on theming are a bit less scarce, I've found it plenty difficult to get up to speed on the topic and feel confident about what I was doing.

My treatment of these topics will not be comprehensive; this post is my attempt to document some of those things I learned along the way. First, as simply a reference for myself, and second, hopefully, as a resource that others can refer to if they'd like to help maintain, customize, or contribute to a Language Grammar or Theme that is valuable to their workflow.

I have no special knowledge on this topic. Most of what I'll share below has been learned through a process of trial and error and cursing – which I will refrain from in my memorandum.

Below, I'll discuss some highlights on how to:

  1. Create a Language Grammar that defines the syntax of your code
  2. Customize a Theme to style your code to your preferences

Creating a Language Grammar

By far, the most challenging thing to get my head around while building out a Textmate Bundle was the Language Grammar. There is limited documentation and it takes a good amount of time banging your head against your own theories until things start to make sense. My first pass at this was customizing the PHP-Twig bundle to address some additional functionality. On my second pass, I decided to rebuild the Language Grammar from scratch. Starting from scratch, the structure of the grammar and how things related to one another became a lot clearer and I was able to address several more language patterns that I had initially struggled with.

Your Language Grammar consists mainly of two key things:

  1. Patterns - the rules (regular expressions) that will be used to identify the different parts of your document
  2. Scope selectors - the scope names (similar to CSS classes) that will be used by your theme to style the code in your text editor

The patterns use Ruby-style regular expressions syntax. I found it helpful to have the online Ruby regex tester Rubular on hand for testing as well as the Ruby Doc Regex page open as a reference for things like lookaheads, lookbehinds, options, and so on.
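To make those two pieces concrete, here is a minimal sketch of a grammar in the property list format the Textmate bundle editor uses. The language name (untitled), the scope names, and the keyword list are placeholders chosen for illustration; none of them are taken from the Craft-Twig bundle. Each rule pairs a pattern (match, or begin and end) with the scope name that a theme can later target.

```
{ scopeName = 'source.untitled';
  comment = 'Illustrative sketch: every name below is a placeholder';
  patterns = (
    { name = 'keyword.control.untitled';
      comment = 'match: a single-line pattern that receives one scope name';
      match = '\b(if|else|for|endfor)\b';
    },
    { name = 'string.quoted.double.untitled';
      comment = 'begin/end: a region that can hold nested patterns';
      begin = '"';
      end = '"';
      patterns = (
        { name = 'constant.character.escape.untitled';
          match = '\\.';
        }
      );
    }
  );
}
```

The match rule scopes a single token, while the begin/end rule scopes everything between the two quotes and lets the nested escape rule apply only inside the string.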

The Textmate docs are probably the best source of information on how Language Grammars and Scope Selectors work. They definitely aren't the easiest doc pages to make sense of on the first read. They dive right into things and assume you have a certain amount of knowledge about the context of what you're doing. After enough fussing around and enough things finally clicking, I found myself referring back to the docs regularly and they now make a lot more sense in helping me tie together the things I learned along the way.

I'm also thankful to Matt Neuburg who took the time to document his own experience: Writing a TextMate Grammar: Some Lessons Learned. It was nice to know that I wasn't the only person struggling for weeks to make sense of these things, and his post has several worthwhile gotchas to look out for and tips that came in handy.

Workflow considerations

There are a few things I found extremely helpful to my workflow while building out the Craft-Twig Language Grammar. It may take weeks to wrap your head around what's going on, but once you make sense of things, they are internally consistent and you can make a lot of progress quickly (or at least as quickly as you can write regular expressions!).

Language Grammar Unit Test

Color schemes can't be tested in the same way that code can, but taking the time to pull together a document (or documents) with examples of all of the different code you want to have supported in your theme is extremely helpful. On one side of my screen I could make changes in the Textmate bundle editor, and on the other, I could immediately see those changes in my example document with my language grammar and color scheme selected.

For my situation, I created this Craft-Twig Unit Test where I aggregated examples from across the Craft and Twig docs, and various other scenarios that you might run into while writing code for websites.

In the long run, as I polish up my preferred themes to work with my full web development workflow, I'll probably add additional test files for PHP and Javascript and any other common languages I find myself using across a web build.

Language Grammar Comments

One of the biggest challenges of working with a Language Grammar is that as you get farther into the project, the project becomes more complex as code constructs start referring back to themselves. At first, you might just be defining a comment block, a function, or a string. Later on, you'll have an array within an array that might refer to a function with a concatenated string.
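One way a grammar can cope with this kind of nesting is the repository: named rules that other rules pull in with include, so a rule can even include itself. A rough sketch follows, assuming hypothetical rule names (twig-array, twig-string) that are not copied from the Craft-Twig bundle:

```
{ scopeName = 'source.untitled';
  comment = 'Illustrative sketch: rule and scope names are hypothetical';
  patterns = (
    { include = '#twig-array'; },
    { include = '#twig-string'; }
  );
  repository = {
    twig-array = {
      name = 'meta.array.twig';
      comment = 'An array may contain strings and further arrays';
      begin = '\[';
      end = '\]';
      patterns = (
        { include = '#twig-array'; },
        { include = '#twig-string'; }
      );
    };
    twig-string = {
      name = 'string.quoted.double.twig';
      begin = '"';
      end = '"';
    };
  };
}
```

Because twig-array includes itself, something like ["a", ["b", ["c"]]] is handled by the same rule however deep the nesting goes.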

To make things a little easier on ourselves (because, we won't be doing this again for another year, right?), we can leave ourselves comments using the comment key/value pair. Here's what that looks like in a simple pattern for arithmetic operators:

twig-operators-arithmetic = {
  name = 'keyword.operator.arithmetic.twig';
  comment = 'Twig Arithmetic Operators';
  match = '(\+|-|\*\*|\*|//|/|%)';
};

Free-Spacing Regular Expressions

Simple comments can add some clarity, but taking it a step further we can improve comments by enabling free-spacing in our regular expression patterns. This helps enormously as we build complex regular expressions and handle less common situations; it gives us a way to leave ourselves a little trail of breadcrumbs back into our own thought process down the road.

We can enable free-spacing using the x option, which tells the regex engine to ignore whitespace in the pattern and to treat everything after a # as a comment. We can see how this helps add some clarity in the example of a more complex pattern that matches Twig filters (which may appear in a couple of different formats and where we had to make a few more assumptions about how the pattern works).

twig-filters = {
  name = 'support.function.filters.twig';
  comment = 'Twig Filters.';
  match = '(?x) # Enable free-spacing mode
    (        # Match all filters with a pipe character in front of them
      \s?    # optional whitespace character
      (\|)   # a pipe character
      \s?    # optional whitespace character
      \b     # word boundary
      (
        (?!\d) # make sure our filter does not begin with a digit
        [\w]+  # one or more word characters
      )
      \b     # word boundary
    )
    |
    (        # Match all filters that appear after the filter block tag keyword
      (?<=filter\s) # Positive lookbehind: keyword "filter" and a whitespace character
      \b     # word boundary
      ([\w]+) # one or more word characters
      \b     # word boundary
    )
  ';
};

At the beginning of our regex pattern, we are triggering our options on the subexpression level using the (?on-off) construct: (?x).
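One consequence of free-spacing mode is worth a sketch: because whitespace is ignored and # starts a comment, a literal space or hash in the pattern has to be written explicitly, for example as \s and \#. The rule below is a hypothetical illustration for Twig comment delimiters, not the rule used in the Craft-Twig bundle. It also avoids apostrophes inside the regex comments, on the assumption that the pattern lives in a single-quoted string, where a stray apostrophe would end the string early.

```
twig-comment-block = {
  name = 'comment.block.twig';
  comment = 'Hypothetical sketch: Twig comment delimiters in free-spacing mode';
  begin = '(?x)
    \{\#   # opening delimiter: the hash is escaped so it is not read as a comment
  ';
  end = '(?x)
    \#\}   # closing delimiter: same escaping for the hash
  ';
};
```

Without the backslash, everything from the hash to the end of that line would be treated as a comment and the delimiter would silently drop out of the pattern.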

Previewing and testing updates and edits

As you define your Language Grammar and see it come to life in your selected theme, you begin to want more details on how the Scope Selectors are being applied to your files. Textmate will show you a popup with the current scopes anywhere in your document if you select some code or place your cursor within the text you want more information about, and press Control-Shift-P.

When you make a change to your Language Grammar file in the Textmate Bundle Editor, you will see how those changes are interpreted by your theme immediately in your editor. The same is not true for themes. If you make an update to your Custom Theme in Textmate, you'll need to reload it before seeing how it's affected by the Language Grammar. For this reason, I found it better to work on the Language Grammar in Textmate and note down changes I wanted to make to a theme, and then work on Custom Theme updates in Sublime Text using the ColorSchemeEditor package which I discuss more below.

Miscellaneous

For additional insights, I will again recommend checking out the Matt Neuburg article I referenced above. It has some great notes about property list syntax, regular expressions, settings, and other gotchas like being aware of the order of your patterns, as the order has an effect on how the rules get matched.
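As a small illustration of the ordering point, assume two hypothetical rules that can both match at the same position. As I understand it, when rules tie on where they match, the one listed first wins, so the more specific rule goes first. The rule and scope names here are invented for the example:

```
patterns = (
  { name = 'keyword.operator.comparison.twig';
    comment = 'Hypothetical rule: listed first so that == is scoped as one token';
    match = '==|!=|<=|>=';
  },
  { name = 'keyword.operator.assignment.twig';
    comment = 'Hypothetical rule: would claim the first = of == if it came first';
    match = '=';
  }
);
```

Swapping the two entries would split == into two assignment operators, which is the kind of surprise that is easy to spot in a test document.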

Considerations when creating a Custom Theme for your text editor

We all love our preferred theme until we run into a scenario where we don't. As I've dug into customizing themes it's become apparent that – in a modern web development workflow where you may work across any number of Language Grammars on a given project – most themes don't take this into account.

Even on a fairly simple web project these days you might find yourself using some version of HTML, CSS, Javascript, PHP and potentially a handful of custom frameworks in many of those languages. Most themes seem to be built with certain languages in mind and an unacknowledged disregard for other languages.

The challenge arises because all Language Grammars need to choose which scope selectors apply to the languages they are defining. With the limited clarity around how those Language Grammars work, and many Language Grammars being maintained by different developers who likely work primarily with one type of code over another, the scope selectors from language to language aren't applied consistently and, following suit, most themes seem to target support for one Language Grammar over another.

To get some insight into the state of affairs, I recommend looking over the Standard Themes and Standard Scopes sections in the Matt Neuburg article and comparing those to the Naming Conventions section of the Textmate Language Grammars doc page. Matt reviewed several themes and identified the scope selectors they have in common. While Matt's done a good job pulling the data together, and paints a fairly accurate picture of the state of affairs, it's an unsatisfying list that leaves you with unsatisfying conclusions on how to best approach this for your own Language Grammar.

The main conclusions that can be drawn are:

  1. In practice, many Custom Themes implement scope selectors in a different way than the official list of scopes
  2. To support the most possible themes, plan your scope selectors to be as general as possible

The scope entity.name.tag is commonly used for tags. By using that as your base scope you increase the chances a Custom Theme will support that scope of your Language Grammar at all. Language Grammars can, theoretically, all support the same generic tag scope, and also add a more specific scope to allow theme designers to adjust to their specific language's use cases.

For example, if we wanted to add general support in the Craft-Twig Bundle, we might set tags with the scope of entity.name.tag. However, if we want to give designers the option to customize their themes specific to our use case, we can update that scope to be entity.name.tag.twig. A theme rule that targets entity.name.tag still matches the longer scope, so tags will likely be styled in most themes, and if any designers want to build a theme that really takes into consideration the Twig use case, they have the control they need to style Twig tags differently than other tags.
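A sketch of how this plays out on the theme side, assuming a grammar that scopes Twig tags as entity.name.tag.twig. The two theme entries and their colors are invented for illustration. The generic entry covers tags in every language, Twig included, and the more specific entry overrides it only for Twig:

```
settings = (
  { name = 'Tag name (any language)';
    scope = 'entity.name.tag';
    settings = { foreground = '#CDA869'; };
  },
  { name = 'Tag name (Twig only)';
    scope = 'entity.name.tag.twig';
    settings = { foreground = '#9B859D'; };
  }
);
```

A theme that only ships the first entry still colors Twig tags, which is the payoff of starting from the generic scope.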

Tools for theme building

The whole reason we care about defining a Language Grammar is so we have more control over how we style and interact with our code in the text editor. Once the Grammar is in place, any workflow considerations in your text editor can shift more towards usability and problem solving.

The biggest challenges I've had to overcome in understanding theming for text editors have been those of vocabulary and tooling. At the end of the day, it's really just like styling HTML with CSS, except with less flexible class names and uglier tools. Coming to that understanding took a while.

While the following example is wrong in several ways, it has worked for me as a paradigm to approach building Custom Themes and I think it is a paradigm that several web designers more familiar with HTML and CSS than Language Grammars will appreciate. Let's compare how styling a web page in HTML and CSS is like styling a text editor document with a Language Grammar and Custom Theme.

A Web Page with HTML and CSS

We have a CSS style definition that can vary in how specific it is:

.comment.block.twig { color: #5F5A60 }

And those CSS styles adjust the display of our HTML page:

<div class="comment block twig">{# Comment #}</div>

A Text Editor Document with Language Grammar and Scope Selectors

We define our styles in our Custom Theme file:

{   name = 'Comment';
  scope = 'comment';
  settings = {
    fontStyle = '';
    foreground = '#5F5A60';
  };
}

And the conceptual HTML in our Text Editor would look like the following (our Language Grammar defines what patterns to match and where to place the style classes, which it calls scopes):

<div class="comment block twig">{# Comment #}</div>

There are several tools and packages out there that can help with the process of building Custom Themes. They are not the most well documented tools nor do they have the most intuitive workflows, but with a little effort you can figure out a workflow that works for you and get the hang of them. I'm not going to go into too much detail on where to find or install themes here. In general, it's easiest to get started by duplicating another theme you'd like to use as a base, and making small updates to it until you get the hang of what it would take to start rebuilding a theme from scratch.

The TMTheme Editor helps you edit a color scheme for Textmate, Sublime Text and more. You can upload a theme and edit a theme right in your browser. Double clicking on a Scope Name in the online editor pops up a dialog where you can see the scope you are styling, and it has a color picker right in the interface. You can upload your own code examples and themes, and explore how other themes are styling similar scopes. While handy, it does not allow you to upload your own Language Grammar, so if your Grammar is not in the list, you'll need another way to preview it outside the tool.

To this end, another package I found handy is the ColorSchemeEditor for Sublime Text. When active, you can click on any part of your grammar in an open file and ColorSchemeEditor will take you to the line in your selected theme to review or edit. While it definitely feels a little clumsy to be editing stylesheets via XML files, if your primary task is updating colors, it is at least workable.


This article is primarily about Textmate bundles; however, the usage of Textmate bundles is much more widespread than just the Textmate editor. Textmate bundles work well with Sublime Text, their Language Grammars can be used with PhpStorm, and Atom has a script which you can run to convert a Textmate bundle into an Atom package (I've had mixed results so far).

While I've done my best to be accurate above, if you see any information that can be improved upon, drop me a note. If the one of you that has read this far is nearby, let's grab a coffee and write poetry using regular expressions. It'll be a blast.