We are generating ever-larger sets of data, far more than users can actually handle, and the technologies for managing that data keep introducing more complexity. This complexity is hard for end users to cope with, particularly at scale and when using rigid, coupled and sometimes counter-intuitively designed interfaces. The way data is processed and generated is as important as the way it is consumed.

The Latin word "data" is the plural form of "datum", which is closely related to the verb "to give". In that sense, data may be understood as valuable content that can be given, which in turn implies that someone asks for it. This is very interesting, since it highlights the need to work out a common language between giving (machine) and asking (human or machine), and therefore the importance of providing users with the right interfaces for retrieving data in an intuitive, flexible, consistent and efficient way.

I had the pleasure of attending the awesome "Custom Query Languages: Why? How?" talk by Anjana Vakil at the J on the Beach conference this year, which further sparked my interest in the topic. Query domain specific languages (DSLs) are all around us: almost every database management system out there provides its own query DSL, and that is the case for Valo too.

Valo’s core developers are dealing with a fairly complex system. Valo could be defined as a document-oriented database, but it goes beyond that: as a document-oriented database it provides very powerful text search capabilities for semi-structured data, and as a column-oriented database it provides a much faster mechanism for scan operations over time series data. Valo also provides a great set of built-in complex analytical functions to process, aggregate, reduce or apply machine learning algorithms across your data, whether it is historical or real time.

Providing all of these different services increases system complexity, but in the end it is all about users interacting with data: how well we allow our users to ask for the data they are looking for, and how well we return that data to them. This leads us to work with abstractions that are not leaky, so that we can model different scenarios in a decoupled way and let our users ask for the specific data they are interested in, in a way that feels natural and intuitive within the "real time, big data analytics" domain they are working on. This is what a domain specific language is a great tool for.

Domain specific languages expose semantics that model concepts at the same abstraction level as the domain problems. By exposing a model targeted at that specific domain area, they give us a common interface that both users and machines can understand, and that sits between the user's high-level requirements and the low-level technical details. This brings the flexibility, simplicity and accuracy users demand when dealing with such highly complex technical systems, and drives us towards a decoupled system architecture that is easier to scale, maintain and adapt to constantly changing product requirements, bringing our clients' expectations together with the business logic needed to meet them properly.

After attending Anjana Vakil's talk (watch it above), we talked further about how a DSL is designed and implemented and about future trends in DSL design. Well-designed DSLs, like general purpose programming languages, have traditionally been crafted using external tools in the form of lexer and parser generators such as Yacc/Bison or ANTLR. Those tools are a great choice if you want to start working on a DSL, as they help you focus on the language syntax and semantic rules without having to implement a lexer and a parser yourself. Building a parser from scratch is a time-consuming effort, but it eventually gives you a higher degree of control in areas such as performance, grammar conflict resolution, error reporting or even error recovery, which is especially useful if the grammar is very complex and you really care about being user-friendly.
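To give a feel for the work a generator saves you, here is a minimal sketch of a hand-rolled lexer in Scala over a tiny, hypothetical token set (not Valo's): even this toy version needs explicit character-by-character handling, and a real one would also need to track positions for error reporting.

```scala
// A sketch of writing a lexer by hand. The token set is hypothetical and
// deliberately tiny; a production lexer would also record source positions.
object HandRolledLexerSketch extends App {
  sealed trait Token
  final case class Ident(name: String)   extends Token
  final case class NumLit(value: Double) extends Token
  final case class Sym(text: String)     extends Token

  // Recursively consume the input, emitting one token per meaningful chunk.
  def tokenize(input: String): List[Token] = input.headOption match {
    case None                      => Nil
    case Some(c) if c.isWhitespace => tokenize(input.tail)
    case Some(c) if c.isLetter =>
      val (word, rest) = input.span(_.isLetterOrDigit)
      Ident(word) :: tokenize(rest)
    case Some(c) if c.isDigit =>
      val (num, rest) = input.span(ch => ch.isDigit || ch == '.')
      NumLit(num.toDouble) :: tokenize(rest)
    case Some(c) =>
      Sym(c.toString) :: tokenize(input.tail)
  }

  println(tokenize("load > 0.8")) // List(Ident(load), Sym(>), NumLit(0.8))
}
```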

Building a DSL begins with the domain model. This may be the most important piece, as it is what makes the DSL a fluent interface users can rely on to solve the specific domain problems they are interested in. Distillation, extensibility and composability are the qualities of a well-designed DSL abstract model. A DSL model should expose just the essential domain characteristics, leaving out unnecessary details; it should also be easy to extend, so it can grow incrementally without impacting users; and it should be composable, leading to higher-order abstractions [6]. This is usually an iterative process that progresses from an abstract design, through its implementation and integration, to a final evaluation that makes sure it actually fits user requirements.
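As an illustration of those qualities, here is a small, purely hypothetical query model in Scala (it is not Valo's actual API): it exposes only a handful of essential concepts, can grow by adding new condition types without breaking existing callers, and composes small conditions into larger expressions.

```scala
// A hypothetical query model sketching distillation, extensibility and
// composability. It is NOT Valo's actual API, only a conceptual example.
object QueryModelSketch extends App {
  // Distillation: only the essential concepts are exposed.
  sealed trait Condition {
    def and(other: Condition): Condition = And(this, other) // composability
    def or(other: Condition): Condition  = Or(this, other)
  }
  final case class Eq(field: String, value: String)       extends Condition
  final case class Gt(field: String, value: Double)       extends Condition
  final case class And(left: Condition, right: Condition) extends Condition
  final case class Or(left: Condition, right: Condition)  extends Condition

  // Extensibility: new clauses can be added without impacting existing users.
  final case class Query(stream: String, where: Option[Condition] = None) {
    def filter(condition: Condition): Query = copy(where = Some(condition))
  }

  // Higher-order abstractions built by composing the essential primitives.
  val query = Query("cpu_metrics")
    .filter(Eq("host", "web-01") and Gt("load", 0.8))
  println(query)
}
```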

This is why parser generators like ANTLR are a better alternative for faster prototyping. ANTLR takes a grammar file with the DSL rules as input and generates a lexer and a parser as output. A query written by the user is first fed to the lexer, which splits it into pieces called tokens, each representing a meaningful entity that is easier to process. The list of tokens is then sent to the parser, which performs all the necessary processing work and returns an abstract syntax tree (AST). The AST is then either compiled into some other form (e.g. bytecode) or interpreted by traversing its nodes and running the domain-specific business logic associated with them.
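The interpretation step can be pictured with a tiny, hypothetical AST in Scala; real ANTLR-generated parsers produce their own parse-tree classes (in Java by default), so this is only a conceptual sketch of walking the tree and evaluating each node.

```scala
// A tiny, hypothetical AST and interpreter illustrating the last step of the
// pipeline above: once the parser has produced a tree, evaluating the input
// is a matter of recursively traversing its nodes.
object AstInterpreterSketch extends App {
  sealed trait Expr
  final case class Num(value: Double)        extends Expr
  final case class Add(lhs: Expr, rhs: Expr) extends Expr
  final case class Mul(lhs: Expr, rhs: Expr) extends Expr

  // Interpret by pattern matching on each node and recursing into children.
  def eval(expr: Expr): Double = expr match {
    case Num(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Mul(l, r) => eval(l) * eval(r)
  }

  // AST for (1 + 2) * 3, as a parser might have produced it.
  val ast = Mul(Add(Num(1), Num(2)), Num(3))
  println(eval(ast)) // 9.0
}
```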

As an alternative to parser generators, most modern programming languages also offer support for building domain specific languages with parser combinators. This approach lets us develop parsers in a more declarative way, using the full feature set of the host programming language, which is far more expressive and familiar to developers than external tools, and results in a very modular, legible structure that is easy to maintain. If you are looking to implement a parser-combinator-based DSL, have a look at some of the great parser combinator libraries such as JParsec for Java, Pysec for Python or the standard parser combinators library for Scala. A minimal example of the style follows below.
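As a taste of that style, here is a minimal arithmetic parser written with Scala's parser combinators (in recent Scala versions the library ships as the separate scala-parser-combinators module): each grammar rule is just a value composed from smaller parsers, so the code reads much like the grammar itself.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// A minimal arithmetic parser: rules are composed with ~ (sequence),
// | (alternative) and ^^ (map the parsed value into a result).
object ArithmeticDsl extends JavaTokenParsers {
  def expr: Parser[Double] = term ~ rep(("+" | "-") ~ term) ^^ {
    case first ~ rest =>
      rest.foldLeft(first) {
        case (acc, "+" ~ value) => acc + value
        case (acc, _ ~ value)   => acc - value
      }
  }

  def term: Parser[Double] = factor ~ rep(("*" | "/") ~ factor) ^^ {
    case first ~ rest =>
      rest.foldLeft(first) {
        case (acc, "*" ~ value) => acc * value
        case (acc, _ ~ value)   => acc / value
      }
  }

  def factor: Parser[Double] =
    floatingPointNumber ^^ (_.toDouble) | "(" ~> expr <~ ")"

  def main(args: Array[String]): Unit =
    println(parseAll(expr, "1 + 2 * (3 - 1)")) // parsed: 5.0
}
```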

Whether or not you end up using a domain specific language as a solution to your specific business use cases, learning how to properly design and implement one will help you build better applications that are easier to maintain and scale, and might be the key to driving your business to success.

References

  • Debasish Ghosh. DSLs in Action, 2011.
  • Martin Fowler, with Rebecca Parsons. Domain-Specific Languages, 2010.