Creating a Ruby DSL: A Metaprogramming Tutorial

This article was written by an author from toptal.com (Origin).

Domain specific languages (DSLs) are an incredibly powerful tool for making it easier to program or configure complex systems. They are also everywhere: as a software engineer, you are most likely using several different DSLs on a daily basis.

In this article, you will learn what domain specific languages are, when they should be used, and finally how you can make your very own DSL in Ruby using advanced metaprogramming techniques.

This article builds upon Nikola Todorovic’s introduction to Ruby metaprogramming, also published on the Toptal Blog. So if you are new to metaprogramming, make sure you read that first.

What Is a Domain Specific Language?

The general definition of DSLs is that they are languages specialized to a particular application domain or use case. This means that you can only use them for specific things—they are not suitable for general-purpose software development. If that sounds broad, that’s because it is—DSLs come in many different shapes and sizes. Here are a few important categories:

Markup languages such as HTML and CSS are designed for describing specific things like the structure, content, and styles of web pages. It is not possible to write arbitrary algorithms with them, so they fit the description of a DSL.
Macro and query languages (e.g., SQL) sit on top of a particular system or another programming language and are usually limited in what they can do. Therefore they obviously qualify as domain specific languages.
Many DSLs do not have their own syntax—instead, they use the syntax of an established programming language in a clever way that feels like using a separate mini-language.

This last category is called an internal DSL, and it is one of these that we are going to create as an example very soon. But before we get into that, let’s take a look at a few well-known examples of internal DSLs. The route definition syntax in Rails is one of them:

Rails.application.routes.draw do
  root to: "pages#main"

  resources :posts do
    get :preview

    resources :comments, only: [:new, :create, :destroy]
  end
end

This is Ruby code, yet it feels more like a custom route definition language, thanks to the various metaprogramming techniques that make such a clean, easy-to-use interface possible. Notice that the structure of the DSL is implemented using Ruby blocks, and method calls such as get and resources are used for defining the keywords of this mini-language.
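
To make the idea of "blocks for structure, method calls for keywords" concrete, here is a minimal sketch of a route-like DSL. MiniRouter and its methods are hypothetical names made up for this illustration, and this is not how Rails implements routing; it also uses instance_eval, which is covered later in the article.

```ruby
# Hypothetical MiniRouter: an illustration only, not the Rails implementation.
class MiniRouter
  attr_reader :routes

  def initialize
    @routes = []
    @scope = []
  end

  # Entry point: runs the block with the router as self, so bare calls
  # such as `get` and `resources` resolve to the methods below.
  def self.draw(&block)
    router = new
    router.instance_eval(&block)
    router
  end

  # "Keyword" that records a single GET route under the current scope.
  def get(name)
    @routes << "GET /#{(@scope + [name]).join('/')}"
  end

  # "Keyword" that records a route and opens a nested scope for its block.
  def resources(name, &block)
    get(name)
    return unless block

    @scope.push(name)
    instance_eval(&block)
    @scope.pop
  end
end

router = MiniRouter.draw do
  resources :posts do
    get :preview
    resources :comments
  end
end

puts router.routes
# GET /posts
# GET /posts/preview
# GET /posts/comments
```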

Metaprogramming is used even more heavily in the RSpec testing library:

describe UsersController, type: :controller do
  before do
    allow(controller).to receive(:current_user).and_return(nil)
  end

  describe "GET #new" do
    subject { get :new }

    it "returns success" do
      expect(subject).to be_success
    end
  end
end

This piece of code also contains examples for fluent interfaces, which allow declarations to be read out loud as plain English sentences, making it a lot easier to understand what the code is doing:

allow(controller).to receive(:current_user).and_return(nil)

expect(subject).to be_success
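
The core trick behind most fluent interfaces is that each method returns an object that accepts the next call. The sketch below shows the simplest variant, where every builder method returns self. The Query class is a hypothetical example and is unrelated to the internals of RSpec or ActiveRecord.

```ruby
# Hypothetical Query class, used only to illustrate method chaining.
class Query
  def initialize(table)
    @table = table
    @conditions = []
    @limit = nil
  end

  # Each builder method returns self so that calls can be chained.
  def where(condition)
    @conditions << condition
    self
  end

  def limit(count)
    @limit = count
    self
  end

  # Terminal method that turns the collected state into a string.
  def to_sql
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE #{@conditions.join(' AND ')}" unless @conditions.empty?
    sql += " LIMIT #{@limit}" if @limit
    sql
  end
end

sql = Query.new("posts").where("status = 'published'").limit(10).to_sql
puts sql
# SELECT * FROM posts WHERE status = 'published' LIMIT 10
```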

Another example of a fluent interface is the query interface of ActiveRecord and Arel, which uses an abstract syntax tree internally for building complex SQL queries:

Post.
  select([
    Post[Arel.star],
    Comment[:id].count.
      as("num_comments"),
  ]).
  joins(:comments).
  where.not(status: :draft).
  where(
    Post[:created_at].lte(Time.now)
  ).
  group(Post[:id])

Although the clean and expressive syntax of Ruby along with its metaprogramming capabilities makes it uniquely suited for building domain specific languages, DSLs exist in other languages as well. Here is an example of a JavaScript test using the Jasmine framework:

describe("Helper functions", function() {
  beforeEach(function() {
    this.helpers = window.helpers;
  });

  describe("log error", function() {
    it("logs error message to console", function() {
      spyOn(console, "log").and.returnValue(true);
      this.helpers.log_error("oops!");
      expect(console.log).toHaveBeenCalledWith("ERROR: oops!");
    });
  });
});

This syntax is perhaps not as clean as that of the Ruby examples, but it shows that with clever naming and creative use of the syntax, internal DSLs can be created using almost any language.

The benefit of internal DSLs is that they don’t require a separate parser, which can be notoriously difficult to implement properly. And because they use the syntax of the language they are implemented in, they also integrate seamlessly with the rest of the codebase.

What we have to give up in return is syntactic freedom—internal DSLs have to be syntactically valid in their implementation language. How much you have to compromise in this regard depends largely on the selected language, with verbose, statically typed languages such as Java and VB.NET being on one end of the spectrum, and dynamic languages with extensive metaprogramming capabilities such as Ruby on the other end.

Building Our Own—A Ruby DSL for Class Configuration

The example DSL we are going to build in Ruby is a reusable configuration engine for specifying the configuration attributes of a Ruby class using a very simple syntax. Adding configuration capabilities to a class is a very common requirement in the Ruby world, especially when it comes to configuring external gems and API clients. The usual solution is an interface like this:

MyApp.configure do |config|
  config.app_id = "my_app"
  config.title = "My App"
  config.cookie_name = "my_app_session"
end

Let’s implement this interface first—and then, using it as the starting point, we can improve it step by step by adding more features, cleaning up the syntax, and making our work reusable.

What do we need to make this interface work? The MyApp class should have a configure class method that takes a block and then executes that block by yielding to it, passing in a configuration object that has accessor methods for reading and writing the configuration values:

class MyApp
  class << self
    def config
      @config ||= Configuration.new
    end

    def configure
      yield config
    end
  end

  class Configuration
    attr_accessor :app_id, :title, :cookie_name
  end
end

Once the configuration block has run, we can easily access and modify the values:

MyApp.config
=> #<MyApp::Configuration ...>

MyApp.config.title
=> "My App"

MyApp.config.app_id = "not_my_app"
=> "not_my_app"
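
For readers who want to try this first version as a single script, the sketch below combines the class and the configuration block, and adds two checks that are not in the snippets above: the configuration object is memoized, and a misspelled attribute (the made-up `titel`) fails loudly because no such accessor exists.

```ruby
class MyApp
  class << self
    # Lazily creates one shared Configuration instance for the class.
    def config
      @config ||= Configuration.new
    end

    # Yields the configuration object to the caller's block.
    def configure
      yield config
    end
  end

  class Configuration
    attr_accessor :app_id, :title, :cookie_name
  end
end

MyApp.configure do |config|
  config.app_id = "my_app"
  config.title = "My App"
end

puts MyApp.config.title                  # My App
puts MyApp.config.cookie_name.inspect    # nil (never assigned)
puts MyApp.config.equal?(MyApp.config)   # true (same object every time)

# A typo in an attribute name raises instead of being silently accepted.
begin
  MyApp.configure { |config| config.titel = "typo" }
rescue NoMethodError => e
  puts "unknown attribute: #{e.name}"    # unknown attribute: titel=
end
```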

So far, this implementation does not feel enough like a custom language to be considered a DSL. But let’s take things one step at a time. Next, we will decouple the configuration functionality from the MyApp class and make it generic enough to be usable in many different use cases.

Making It Reusable

Right now, if we wanted to add similar configuration capabilities to a different class, we would have to copy both the Configuration class and its related setup methods into that other class, as well as edit the attr_accessor list to change the accepted configuration attributes. To avoid having to do this, let’s move the configuration features into a separate module called Configurable. With that, our MyApp class will look like this:

class MyApp
  include Configurable
  # ...
end

Everything related to configuration has been moved to the Configurable module:

module Configurable
  def self.included(host_class)
    host_class.extend ClassMethods
  end

  module ClassMethods
    def config
      @config ||= Configuration.new
    end

    def configure
      yield config
    end
  end

  class Configuration
    attr_accessor :app_id, :title, :cookie_name
  end
end

Not much has changed here, except for the new self.included method. We need this method because including a module only mixes in its instance methods, so our config and configure class methods will not be added to the host class by default. However, if we define a special method called included on a module, Ruby will call it whenever that module is included in a class. There we can manually extend the host class with the methods in ClassMethods:

def self.included(host_class)     
  host_class.extend ClassMethods  
end
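
The difference between instance methods and module-level methods is easy to check in isolation. In the sketch below, all names (Greeter, PlainMixin, Robot) are hypothetical: Greeter uses the included hook to add a class method, while PlainMixin defines a module-level method that includers do not receive.

```ruby
module Greeter
  # Ruby calls this hook each time the module is included somewhere.
  def self.included(host_class)
    host_class.extend ClassMethods
  end

  module ClassMethods
    def greeting
      "hello from #{name}"
    end
  end

  # Regular instance method, mixed in by include as usual.
  def greet
    self.class.greeting
  end
end

module PlainMixin
  # Defined on the module itself; include does not copy it to the host.
  def self.helper
    "not available on includers"
  end
end

class Robot
  include Greeter
  include PlainMixin
end

puts Robot.greeting               # hello from Robot
puts Robot.new.greet              # hello from Robot
puts Robot.respond_to?(:helper)   # false
```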

We are not done yet—our next step is to make it possible to specify the supported attributes in the host class that includes the Configurable module. A solution like this would look nice:

class MyApp
  include Configurable.with(:app_id, :title, :cookie_name)
  # ...
end

Perhaps somewhat surprisingly, the code above is syntactically correct: include is not a keyword but simply a regular method that expects a Module object as its parameter. As long as we pass it an expression that returns a Module, it will happily include it. So, instead of including Configurable directly, we need a method named with on Configurable that generates a new module customized with the specified attributes:

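module Configurable
  def self.with(*attrs)
    # Anonymous class that holds the configuration attributes
    config_class = Class.new do
      attr_accessor *attrs
    end

    # Anonymous module with the class methods for the host class
    class_methods = Module.new do
      define_method :config do
        @config ||= config_class.new
      end

      def configure
        yield config
      end
    end

    # Module to be included; its hook extends the host with class_methods
    Module.new do
      singleton_class.send :define_method, :included do |host_class|
        host_class.extend class_methods
      end
    end
  end
end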

There is a lot to unpack here. The entire Configurable module now consists of just a single with method, with everything happening within that method. First, we create a new anonymous class with Class.new to hold our attribute accessor methods. Because Class.new takes the class definition as a block and blocks have access to outside variables, we are able to pass the attrs variable to attr_accessor without problems.

def self.with(*attrs)            
   
  config_class = Class.new do    
    attr_accessor *attrs         
  end
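
The same pattern can be tried outside of Configurable. This is a small sketch with made-up attribute names (host and port), showing that the block given to Class.new sees the surrounding local variable and that the resulting class is anonymous.

```ruby
attrs = [:host, :port]

# Class.new takes the class body as a block; the block closes over `attrs`.
config_class = Class.new do
  attr_accessor(*attrs)
end

config = config_class.new
config.host = "localhost"
config.port = 8080

puts config.host                  # localhost
puts config.port                  # 8080
puts config_class.name.inspect    # nil (the class is anonymous)
```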

The fact that blocks in Ruby have access to outside variables is also the reason why they are sometimes called closures, as they include, or “close over” the outside environment that they were defined in. Note that I used the phrase “defined in” and not “executed in”. That’s correct – regardless of when and where our define_method blocks will eventually be executed, they will always be able to access the variables config_class and class_methods, even after the with method has finished running and returned. The following example demonstrates this behavior:

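def create_block
  foo = "hello"            # define a local variable
  return Proc.new { foo }  # return a new block that returns foo
end

block = create_block       # call create_block to retrieve the block

block.call                 # create_block has already returned, yet
=> "hello"                 # the block can still give us the value of foo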

Now that we know about this neat behavior of blocks, we can go ahead and define an anonymous module in class_methods for the class methods that will be added to the host class when our generated module is included. Here we have to use define_method to define the config method, because we need access to the outside config_class variable from within the method. Defining the method using the def keyword would not give us that access because regular method definitions with def are not closures – however, define_method takes a block, so this will work:

config_class = # ...

class_methods = Module.new do
  define_method :config do
    @config ||= config_class.new
  end
  # ...
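
The scoping difference between def and define_method can be verified with a few lines. The sketch below uses hypothetical names (prefix, holder) and defines one method each way inside the same anonymous class.

```ruby
prefix = "cfg"

holder = Class.new do
  # define_method takes a block, so the method body can see `prefix`.
  define_method(:with_closure) { "#{prefix}_ok" }

  # def opens a fresh scope; `prefix` is not visible inside it.
  def without_closure
    prefix
  end
end

obj = holder.new
puts obj.with_closure             # cfg_ok

begin
  obj.without_closure
rescue NameError => e
  puts "def could not see prefix (#{e.class})"
  # def could not see prefix (NameError)
end
```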

Finally, we call Module.new to create the module that we are going to return. Here we need to define our self.included method, but unfortunately we cannot do that with the def keyword, as the method needs access to the outside class_methods variable. Therefore, we have to use define_method with a block again, but this time on the singleton class of the module, as we are defining a method on the module instance itself. Oh, and since define_method is a private method in Ruby versions before 2.5, we have to use send to invoke it instead of calling it directly:

class_methods = # ...

Module.new do
  singleton_class.send :define_method, :included do |host_class|
    host_class.extend class_methods
  end
end
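
As an aside, Ruby also offers define_singleton_method, which defines a method on the receiver's singleton class without going through send. The sketch below is an alternative formulation, not the approach used in this article, and the names shout, mixin, and Widget are made up for the example.

```ruby
class_methods = Module.new do
  def shout
    name.upcase
  end
end

mixin = Module.new do
  # Inside Module.new, self is the new module, so this defines the
  # `included` hook on the module itself, with closure access to
  # the class_methods local variable.
  define_singleton_method(:included) do |host_class|
    host_class.extend class_methods
  end
end

class Widget; end

# `mixin` is a local variable, so it is not visible inside a `class` body;
# calling include from the outside works around that.
Widget.include(mixin)

puts Widget.shout                 # WIDGET
```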

Phew, that was some pretty hardcore metaprogramming already. But was the added complexity worth it? Take a look at how easy it is to use and decide for yourself:

class SomeClass
  include Configurable.with(:foo, :bar)
end

SomeClass.configure do |config|
  config.foo = "wat"
  config.bar = "huh"
end

SomeClass.config.foo
=> "wat"

But we can do even better. In the next step we will clean up the syntax of the configure block a little bit to make our module even more convenient to use.

Cleaning Up the Syntax

There is one last thing that is still bothering me with our current implementation—we have to repeat config on every single line in the configuration block. A proper DSL would know that everything within the configure block should be executed in the context of our configuration object and enable us to achieve the same thing with just this:

MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name "my_app_session"
end

Let’s implement it, shall we? From the looks of it, we will need two things. First, we need a way to execute the block passed to configure in the context of the configuration object so that method calls within the block go to that object. Second, we have to change the accessor methods so that they write the value if an argument is provided to them and read it back when called without an argument. A possible implementation looks like this:

module Configurable
  def self.with(*attrs)
    not_provided = Object.new
  
    config_class = Class.new do
      attrs.each do |attr|
        define_method attr do |value = not_provided|
          if value === not_provided
            instance_variable_get("@#{attr}")
          else
            instance_variable_set("@#{attr}", value)
          end
        end
      end

      attr_writer *attrs
    end

    class_methods = Module.new do
      # ... (config is defined as before)

      def configure(&block)
        config.instance_eval(&block)
      end
    end

    # ... (the module is created and returned as before)
  end
end

The simpler change here is running the configure block in the context of the configuration object. Calling Ruby’s instance_eval method on an object lets you execute an arbitrary block of code as if it was running within that object, which means that when the configuration block calls the app_id method on the first line, that call will go to our configuration class instance.
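
Here is a minimal sketch of instance_eval on its own, using hypothetical Settings and Caller classes. It also shows the main trade-off of this technique: inside the block, self is the target object, so methods of the object that wrote the block are no longer reachable without an explicit receiver.

```ruby
class Settings
  def title(value)
    @title = value
  end

  def summary
    "title=#{@title.inspect}"
  end
end

settings = Settings.new

# The call has no explicit receiver; instance_eval makes `settings` the
# implicit self, so `title` resolves to Settings#title.
settings.instance_eval do
  title "My App"
end

puts settings.summary             # title="My App"

class Caller
  def helper
    "from caller"
  end

  # Inside the block self is `target`, not this Caller instance.
  def run(target)
    target.instance_eval { respond_to?(:helper) }
  end
end

puts Caller.new.run(settings)     # false
```
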

The change to the attribute accessor methods in config_class is a bit more complicated. To understand it, we need to first understand what exactly attr_accessor was doing behind the scenes. Take the following attr_accessor call for example:

class SomeClass
  attr_accessor :foo, :bar
end

This is equivalent to defining a reader and writer method for each specified attribute:

class SomeClass
  def foo
    @foo
  end

  def foo=(value)
    @foo = value
  end

  # ... and likewise for bar and bar=
end

So when we wrote attr_accessor *attrs in the original code, Ruby defined the attribute reader and writer methods for us for every attribute in attrs—that is, we got the following standard accessor methods: app_id, app_id=, title, title= and so on. In our new version, we want to keep the standard writer methods so that assignments like this still work properly:

MyApp.config.app_id = "not_my_app"
=> "not_my_app"

We can keep auto-generating the writer methods by calling attr_writer *attrs. However, we can no longer use the standard reader methods, as they also have to be capable of writing the attribute to support this new syntax:

MyApp.configure do
  app_id "my_app" # assigns a new value
  app_id          # reads the stored value
end

To generate the reader methods ourselves, we loop over the attrs array and define a method for each attribute that returns the current value of the matching instance variable if no new value is provided and writes the new value if it is specified:

not_provided = Object.new

attrs.each do |attr|
  define_method attr do |value = not_provided|
    if value === not_provided
      instance_variable_get("@#{attr}")
    else
      instance_variable_set("@#{attr}", value)
    end
  end
end

Here we use Ruby’s instance_variable_get method to read an instance variable with an arbitrary name, and instance_variable_set to assign a new value to it. Unfortunately, the variable name must be prefixed with an “@” sign in both cases, hence the string interpolation.

You might be wondering why we have to use a blank object as the default value for “not provided” and why we can’t simply use nil for that purpose. The reason is simple: nil is a valid value that someone might want to set for a configuration attribute. If we tested for nil, we would not be able to tell these two scenarios apart:

MyApp.configure do
  app_id nil # expectation: assigns nil
  app_id     # expectation: returns the current value
end

That blank object stored in not_provided is only ever going to be equal to itself, so this way we can be certain that nobody is going to pass it into our method and cause an unintended read instead of a write.
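
The difference between the two defaults can be seen side by side in the following sketch. Prefs and NilDefaultPrefs are hypothetical classes written for this comparison, and the sentinel check uses equal? (object identity) rather than the === comparison from the module above.

```ruby
# Unique sentinel object that no caller can accidentally pass in.
NOT_PROVIDED = Object.new

class Prefs
  # Reads when called without an argument, writes otherwise.
  def app_id(value = NOT_PROVIDED)
    if NOT_PROVIDED.equal?(value)
      @app_id
    else
      @app_id = value
    end
  end
end

class NilDefaultPrefs
  # Naive variant: cannot tell "no argument" apart from an explicit nil.
  def app_id(value = nil)
    value.nil? ? @app_id : (@app_id = value)
  end
end

prefs = Prefs.new
prefs.app_id "my_app"
prefs.app_id nil                  # an explicit nil is still a write
puts prefs.app_id.inspect         # nil

naive = NilDefaultPrefs.new
naive.app_id "my_app"
naive.app_id nil                  # silently treated as a read
puts naive.app_id.inspect         # "my_app"
```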

Adding Support for References

There is one more feature that we could add to make our module even more versatile—the ability to reference a configuration attribute from another one:

MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name { "#{app_id}_session" }
end

MyApp.config.cookie_name
=> "my_app_session"

Here we added a reference from cookie_name to the app_id attribute. Note that the expression containing the reference is passed in as a block—this is necessary in order to support the delayed evaluation of the attribute value. The idea is to only evaluate the block later when the attribute is read and not when it is defined—otherwise funny things would happen if we defined the attributes in the “wrong” order:

SomeClass.configure do
  foo "#{bar}_baz"     # evaluated right away, while bar is still nil
  bar "hello"
end

SomeClass.config.foo
=> "_baz"

If the expression is wrapped in a block, that will prevent it from being evaluated right away. Instead, we can save the block to be executed later when the attribute value is retrieved:

SomeClass.configure do
  foo { "#{bar}_baz" }  # stored as a block, evaluated when foo is read
  bar "hello"
end

SomeClass.config.foo    
=> "hello_baz"

We do not have to make big changes to the Configurable module to add support for delayed evaluation using blocks. In fact, we only have to change the attribute method definition:

define_method attr do |value = not_provided, &block|
  if value === not_provided && block.nil?
    result = instance_variable_get("@#{attr}")
    result.is_a?(Proc) ? instance_eval(&result) : result
  else
    instance_variable_set("@#{attr}", block || value)
  end
end

When setting an attribute, the block || value expression saves the block if one was passed in, or otherwise it saves the value. Then, when the attribute is later read, we check if it is a block and evaluate it using instance_eval if it is, or if it is not a block, we return it like we did before.

Supporting references comes with its own caveats and edge cases, of course. For example, you can probably figure out what happens if you read any of the attributes in this configuration:

SomeClass.configure do
  foo { bar }
  bar { foo }
end
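
For anyone who would rather run it than guess, the sketch below reproduces the circular configuration with a stripped-down stand-in for the configuration class. LazyConfig is a hypothetical name, and the sentinel check uses equal? instead of ===. Each read triggers the other block, so the calls recurse until Ruby gives up.

```ruby
class LazyConfig
  NOT_PROVIDED = Object.new

  [:foo, :bar].each do |attr|
    define_method attr do |value = NOT_PROVIDED, &block|
      if NOT_PROVIDED.equal?(value) && block.nil?
        # Read: evaluate stored blocks lazily in the context of the config.
        result = instance_variable_get("@#{attr}")
        result.is_a?(Proc) ? instance_eval(&result) : result
      else
        # Write: prefer the block if one was given.
        instance_variable_set("@#{attr}", block || value)
      end
    end
  end
end

config = LazyConfig.new
config.instance_eval do
  foo { bar }
  bar { foo }
end

begin
  config.foo
rescue SystemStackError => e
  puts "circular reference detected: #{e.class}"
end
```

A more defensive implementation could track which attributes are currently being resolved and raise a descriptive error instead, but that is beyond the scope of this sketch.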

The Finished Module

In the end, we have got ourselves a pretty neat module for making an arbitrary class configurable and then specifying those configuration values using a clean and simple DSL that also lets us reference one configuration attribute from another:

class MyApp
  include Configurable.with(:app_id, :title, :cookie_name)
end

MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name { "#{app_id}_session" }
end

Here is the final version of the module that implements our DSL—a total of 36 lines of code:

module Configurable
  def self.with(*attrs)
    not_provided = Object.new

    config_class = Class.new do
      attrs.each do |attr|
        define_method attr do |value = not_provided, &block|
          if value === not_provided && block.nil?
            result = instance_variable_get("@#{attr}")
            result.is_a?(Proc) ? instance_eval(&result) : result
          else
            instance_variable_set("@#{attr}", block || value)
          end
        end
      end

      attr_writer *attrs
    end

    class_methods = Module.new do
      define_method :config do
        @config ||= config_class.new
      end

      def configure(&block)
        config.instance_eval(&block)
      end
    end

    Module.new do
      singleton_class.send :define_method, :included do |host_class|
        host_class.extend class_methods
      end
    end
  end
end

Looking at all this Ruby magic in a piece of code that is nearly unreadable and therefore very hard to maintain, you might wonder if all this effort was worth it just to make our domain specific language a little bit nicer. The short answer is that it depends—which brings us to the final topic of this article.

Ruby DSLs—When to Use and When Not to Use Them

You have probably noticed while reading the implementation steps of our DSL that, as we made the external-facing syntax of the language cleaner and easier to use, we had to use an ever-increasing number of metaprogramming tricks under the hood to make it happen. This resulted in an implementation that will be incredibly hard to understand and modify in the future. Like so many other things in software development, this is also a tradeoff that must be carefully examined.

For a domain specific language to be worth its implementation and maintenance cost, it must bring an even greater sum of benefits to the table. This is usually achieved by making the language reusable in as many different scenarios as possible, thereby amortizing the total cost between many different use cases. Frameworks and libraries are more likely to contain their own DSLs exactly because they are used by lots of developers, each of whom can enjoy the productivity benefits of those embedded languages.

So, as a general principle, only build DSLs if you, other developers, or the end users of your application will be getting a lot of use out of them. If you do create a DSL, make sure to include a comprehensive test suite with it and to document its syntax properly, as it can be very hard to figure out from the implementation alone. Future you and your fellow developers will thank you for it.


About the Author

Máté is a full-stack software engineer with 7+ years of experience working on web applications of all sizes, from small greenfield projects to complex legacy systems. He is extremely organized, communicates very well, and prides himself in writing clean, future-proof code with a low tolerance for technical debt. Máté's formal education is in accounting, banking, and finance; therefore, he has extensive domain knowledge in these fields as well.

Saturday 7 October 2017
