[CMake-Promote] Re: Language barrier

E. Wing ewmailing at gmail.com
Thu Dec 29 17:04:37 EST 2005


On 12/28/05, Brad King <brad.king at kitware.com> wrote:
> E. Wing wrote:
> > I think the biggest barrier to adoption of CMake is the language. (And
> > I struggle with the language myself at times.)
>
> There is a long history behind the state of the current CMake language.
>   The short version of the story is that it was not originally intended
> to be an imperative programming language at all.  It was originally
> meant to be a configuration file format to build a specific project.
> Over time its use was broadened to other projects and the language was
> developed and transformed as needed but had to mostly remain backwards
> compatible.  No one ever sat down to design a programming language.  The
> closest thing to that was when we wrote a real lexer and recursive
> descent parser to solve the parsing problem and allow multi-line strings.


This story is fairly common, and it is intertwined with Lua's
popularity. Typically, somebody starts off with a simple configuration
language for a domain-specific problem. Then the language grows and
grows until it becomes a hindrance for things like new features or
performance, because it was never designed to do any of this. Much of
Lua's user base came from this exact problem and adopted Lua to
overcome it.

One of Lua's most famous success stories, told by Bret Mogilefsky,
comes from this exact problem. It was also the first commercial use
of Lua in a game and is what launched Lua's popularity/dominance in
the game industry. LucasArts had their legacy SCUMM scripting engine.
The original engine seems to have been developed to address problems
as they arose and wasn't designed as a general purpose language. It was a
very nice and forgiving language, but as they added new capabilities
for new games, limitations arose and certain things they wanted to do
just couldn't be done. Many had tried writing a next-generation SCUMM
but all had failed. Then they got a new hire fresh out of school named
Bret Mogilefsky for Grim Fandango who didn't know what he was getting
into and volunteered to tackle the problem. He heard about Lua from a
colleague mentioning an article in Dr. Dobb's Journal and decided to
try it. He got the initial embedding to throw up text onto a screen in
the first day. The interpreter was only 30KB at the time, compared to
the multiple megabytes of SCUMM. (He actually thought
there was a problem with the compile because the binary was so small
so he contacted the authors for help. They had to tell him that was
about the right size.)

Anyway, Lua turned out to be a major success for them. At a later
Game Developers Conference, there was an open panel to discuss how to
make games lend themselves to customization/scripting, as it seemed to
be the future, but nobody knew how to do it effectively. Many evil things
were discussed like interpreted C++. Near the end, there were 200
"miserable" people. Then Bret's colleague from LucasArts said, "Or you
could just use Lua like Bret did". Bret was dragged up (unprepared)
and gave a quick spiel on Lua. (It's fast, small (30k), easy to embed,
free, got running quickly, easy to learn for content creators
(non-developers), etc.) Everybody was furiously writing notes. This
event is credited with Lua's adoption and popularity in the game
industry.


Anyway, Lua's origins are also in a configuration language, and I think
it transformed into the language it is today because of the same
realizations you have encountered.  Lua still retains elements of its
syntactic sugar for configuration/descriptions, which still makes it
an ideal candidate for configuration-based projects.



> In the earliest days of CMake when it was still integrated in another
> project other developers suggested using Tcl or Python as the language.
>   Since at the time it was a configuration file format with a 50 line
> parser implementation the idea of bringing in a whole language seemed
> like overkill.  We also didn't want to become dependent on the other
> language's maintainers to support all our platforms and maintain the
> language over time.

Yes, I can imagine maintaining a language and getting it to compile on
all platforms is a pain. That's why I didn't recommend starting with
Python (or the others): it has many features you typically don't need,
and those probably creep into platform specific code, which
complicates the ability to build reliably. I don't have much
experience with Tcl, but I remember hating the parser I was using
because it was really picky about whitespace and where braces went,
and it was really bad about reporting these errors. I also remember
Tcl being kind of slow.

Lua on the other hand is tiny and 100% ANSI C in the core so it is
really easy to build. People build and use Lua on all sorts of strange
embedded platforms so the community is very careful about keeping
platform specific code out of the core. The Lua standard library is
also very small (adds maybe 50KB compiled). It is also mostly ANSI C,
but they made an exception to this philosophy by adding dynamic
library support, calling it the "mother of all features". Once you can
load dynamic libraries, you can access any potential feature you need
for your specific project. But you can opt out of this if you don't
have dynamic library support or don't want it.

Lua also deviates a little in philosophy from other projects in that
they encourage static linking instead of dynamic linking. They expect
you to embed Lua directly into your app so you aren't affected by
incompatibilities between versions of Lua or if you modify Lua itself
to meet specific needs. Lua is small and simple enough to do this.
They provide source code for all previous versions starting with
version 1 on the website so you don't have to worry about losing
access to the code, or being forced to upgrade when a new release
comes out.


> > I know this is a non-trivial task, but have you considered adopting a
> > secondary language that people can write project descriptions in? For
> > example, if you could write CMake descriptions in Python instead of
> > the current language, most of the SCons vs CMake discussion would die
> > immediately.
>
> I have thought about this idea many times since the use of CMake has
> broadened.  No matter what language we pick there will always be people
> who don't like that choice, but having a more general purpose language
> would be useful.  Lua seems to have a compatible license at first glance.
>

Yes, no matter what you do, somebody will not like it. But from my
vantage point, a well defined/documented general purpose language
would be very useful. And if adopting a general purpose language is a
path you are willing to go down (I have no delusions that this isn't a
serious task), I think Lua would be the best one to go with (at least
to start with). It's simple, small, and fast. It seems to have become
one of the most popular and dominant languages for embedding. It has a
pretty good reputation in that few people seem overtly hostile to it
(compared to say Perl which can conjure up many arguments about the
syntax and readability). Lua's syntax is also probably familiar enough
to a C/C++ programmer (which is probably CMake's main user base) that
they can kind of figure it out at first glance without knowing the
language and won't be confused/intimidated/repelled by it (as opposed
to perhaps something like Scheme). And since Lua is a prime candidate
for other C/C++ based projects to embed, potential CMake users might
already use Lua or might see it as an opportunity to learn it with the
potential of using it later in their own projects which could be a
good marketing point. Contrast this with Python: the dominant Python
culture seems to prefer to do everything from within the language
instead of embedding, so the subset of these people who would actually
need/use something like CMake may be proportionally smaller.

Another thing that hasn't really been mentioned is that it could
potentially reduce the amount of code in the core CMake engine.
If you ever phase out the current language, you could remove
the custom interpreter code. I'm also wondering if Lua's macro
facility is powerful enough to emulate the current CMake language. You
might imagine the whole CMake language being implemented through Lua.

In addition to possibly removing the need for the custom interpreter
code in the core engine, you might be able to move a lot of the core
C++ code over to Lua as well. Typically, people who embed Lua complain
after they are finished with their project that they wrote way too much
code in C/C++ and should have pushed more into Lua, because in most
cases the performance difference is unnoticeable. I noticed it takes
quite a while to compile CMake because of all the C++ code. It would be
neat to see this build time reduced.

As for the license, yes it's very liberal and you shouldn't have any
problems. They basically want Lua to be free to use and wanted the
license with the fewest words (which led them to MIT).


> Another approach would be to follow Kitware's namesake (note "Kit") and
> turn CMake into a build-system generation toolkit (CMTK is taken but
> CMakeTk is not).  We could define a set of objects storing all the
> information needed to generate a build system and define an API to
> construct them.  Another part of the API would control creation and
> execution of generators.  It could even provide callback support to
> allow customized generation.
>
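
To make the toolkit idea quoted above a bit more concrete, here is a
minimal sketch in Python, purely for illustration. It assumes a tiny
object model and a toy generator with a callback hook. Every name in
it (Target, Project, MakefileGenerator, on_target) is hypothetical and
is not part of CMake or any existing API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Target:
    """One executable to build (hypothetical object, not a CMake type)."""
    name: str
    sources: List[str]


@dataclass
class Project:
    """Holds everything a generator needs to know about the build."""
    name: str
    targets: List[Target] = field(default_factory=list)

    def add_executable(self, name: str, sources: List[str]) -> Target:
        target = Target(name, list(sources))
        self.targets.append(target)
        return target


class MakefileGenerator:
    """Toy generator: turns a Project into Makefile text."""

    def __init__(self, on_target: Optional[Callable[[Target], None]] = None):
        # Optional callback invoked once per target during generation.
        self.on_target = on_target

    def generate(self, project: Project) -> str:
        lines = ["all: " + " ".join(t.name for t in project.targets)]
        for t in project.targets:
            if self.on_target is not None:
                self.on_target(t)
            srcs = " ".join(t.sources)
            lines.append("")
            lines.append(f"{t.name}: {srcs}")
            lines.append(f"\t$(CC) -o {t.name} {srcs}")
        return "\n".join(lines) + "\n"
```

A client would build a Project, call add_executable() for each target,
and hand the result to a generator. The callback is the only
customization point in this sketch; a real toolkit would need far more.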

Another thing to consider is to set your eyes towards not so much a
scripting language, but a (cross-platform) GUI mock IDE where people
can construct build descriptions visually without writing any code.
This is the other way to get over the "language barrier". But this
would be hard. A variant of this would be the ability to take a
project already created by Visual Studio or whatever and convert it
to a CMake representation so it can be translated into other
project formats. But you miss out on platform specific stuff and
features you will probably need that don't exist in the IDEs.

By the way, CMakeTk makes me think of the widget toolkit Tk (as in
Tcl/Tk). It might not be a good name to use. Maybe CMakeKit would be
better.

> Then the API could be wrapped into any number of languages, much like
> Kitware's VTK (www.vtk.org).  The current CMake language itself could be
> one of those languages for backward compatibility and people who like
> it.  The wrapping process could be made part of the client project's
> build so that each project could choose the language in which it wants
> to configure the build system without needing to distribute support for
> all languages with CMake.

I'm not sure what the client side wrapping process would be. If it's
too hard to set up, then most people will probably not use it. I'm
thinking it can't be more than one or two easy steps before you lose
people.

> We would have to be extremely careful about how the wrapped API would
> communicate with the CMake binary.  It would probably have to run in a
> separate process and use pipes to communicate.  Otherwise all client
> machines would have to be able to compile binaries that are
> dynamic-link-compatible with the CMake binary, which has been a real
> pain for the now little-used loaded command support.  Once pipes are
> used then the client language might be able to communicate with CMake
> without even having a C compiler, as long as the language supports pipes.
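
For reference, the pipe approach quoted above could look roughly like
the following Python sketch. It is only an illustration under stated
assumptions: the child process is a stand-in written in Python (not
the CMake binary), and the line-based JSON protocol, the "op" names,
and the BuildClient class are all hypothetical.

```python
import json
import subprocess
import sys

# Stand-in for the build tool. It reads one JSON request per line on
# stdin and writes one JSON reply per line on stdout. This protocol is
# invented for the example and is not a real CMake interface.
SERVER_SOURCE = r'''
import json, sys
targets = {}
for line in sys.stdin:
    req = json.loads(line)
    if req["op"] == "add_executable":
        targets[req["name"]] = req["sources"]
        reply = {"ok": True}
    elif req["op"] == "list_targets":
        reply = {"ok": True, "targets": sorted(targets)}
    else:
        reply = {"ok": False, "error": "unknown op"}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


class BuildClient:
    """Talks to the stand-in process over its stdin/stdout pipes."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", SERVER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, **request):
        # Send one request line, then block until one reply line arrives.
        self.proc.stdin.write(json.dumps(request) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin signals end-of-input, so the child exits its loop.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


def demo():
    client = BuildClient()
    try:
        first = client.call(op="add_executable", name="hello",
                            sources=["hello.c"])
        listing = client.call(op="list_targets")
    finally:
        client.close()
    return first, listing
```

Note that the client side needs no C compiler here, only a language
that can spawn a process and read and write its pipes. The sketch
ignores error handling, timeouts, and platform differences in process
and pipe handling.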

This sounds very hard and scary, particularly from a cross-platform
perspective and also from a debugging perspective. I would suggest
much simpler options.


1) Make Lua (or whatever: Python, Ruby, foo-mainstream language) a
secondary language and continue to support both the current CMake
language and the new one. There might be considerations to support
additional languages in the future as well which would be hard coded
directly into CMake. (None of the interprocess stuff.)

2) Once the Lua (or whatever) support is adopted and if it is
successful, adopt Lua (or whatever) as the new official language and
phase out the current one over time or make it a legacy support thing
with few or no new features added.

3) Add Lua (or whatever) with an eye towards something like Parrot.
I'm not a Parrot expert, but certainly the ideal situation would be
(assuming Parrot ever gets sorted out) that a Lua-mod comes out that
supports the common Parrot bytecode format (or can translate to Lua
bytecode) and then you could automatically support any Parrot
supported language (which would probably be any and all of the
mainstream scripting languages). Then the language barrier completely
drops out of the picture as well. I think this would ultimately be
cleaner and more portable than the interprocess idea. However, it is
very contingent on things that haven't been developed yet :)


Thanks,
Eric


