Language Design Deal Breakers

I’m a bit of a programming language nerd. I love to learn new languages. That said, I spend most of my days writing C++. It’s a truly awful language, with many warts and problems, but I know it well and with enough effort and pain you can get the job done. The biggest benefit of C++ is purely an historical accident: it’s great because it’s popular. There’s no trouble finding people who know it, or libraries that work with it.

The point is that for me, or anyone else, to actually switch from C++ to a different language, there has to be a substantial improvement to overcome the inertia that C++ has built up over the years.

I find that when I look at a new language, there are a number of “deal breakers” which simply mean that regardless of all its other nice features I will never take it as a serious contender. Note that this isn’t a fair fight. Something can be a deal breaker even if C++ doesn’t have that feature either. Any language has to be so much better than C++ that the benefits outweigh the inertia. A deal breaker is a feature that either A) puts it at a disadvantage compared to C++ or B) puts it on equal footing with C++ in an area where C++ is doing particularly poorly.

Here’s my list of language design requirements which, if unmet, would be deal breakers for me. What’s yours?

1. Must be statically typed

I know you’re supposed to be diplomatic and claim that there are two sides to this story, and no real right answer, but really, people who think dynamic typing is suitable for large scale software development are just nuts. They’ll claim it’s more flexible and general, but that’s just nonsense in my opinion. It may be true compared to utter strawman examples of static typing. If you think this is a real argument, go learn Haskell, and then get back to me.

There’s a wonderful correspondence between programs that can be statically typed, and programs that can be statically understood by other people. In other words, programs that you can understand without having to run a debugger or REPL, by just reading code. If you’re writing programs that rely on dynamic typing to work, you’re probably writing exceedingly difficult to understand code. If you are writing code that can be statically understood by other people, you’re probably writing down comments to indicate the allowable types and preconditions anyway (or you should) – so you might as well have those properties checked by the compiler.

Most code does not rely on dynamic typing to work. It could be written just as well in a statically typed language. In that case it’s just a matter of convenience – make specifying types light-weight enough that the extra syntactic overhead doesn’t outweigh the benefits (type inference helps; even the simple “local” kind that C# and C++11 have cuts down on much of the noise).
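As a rough illustration of that “local” inference, here is a small C++11-style sketch. The functions `count_words` and `most_common` are made up for this example; the point is only that every `auto` is resolved by the compiler, so the code stays fully statically typed while the long type names mostly disappear.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: count how often each word occurs.
std::map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const auto& word : words) {  // deduced: const std::string&
        ++counts[word];
    }
    return counts;
}

// Hypothetical helper: return the most frequent word, or an empty string
// if the input is empty.
std::string most_common(const std::vector<std::string>& words) {
    const auto counts = count_words(words);  // deduced: const std::map<std::string, int>
    auto best = counts.begin();               // deduced: the map's const_iterator
    for (auto it = counts.begin(); it != counts.end(); ++it) {
        if (it->second > best->second) best = it;
    }
    return best == counts.end() ? std::string() : best->first;
}
```

Function signatures still spell out their types, which is where they are most useful as documentation; only the local plumbing is inferred.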

Then there’s the correctness issue. Static typing catches mistakes early. This is a good thing. Now, dynamic typing advocates will say that of course any _real_ program will also have a unit testing suite, so types can be checked that way! This is obviously naïve in the extreme. In the real world, writing unit tests for everything very frequently fails the bang-for-buck test – the cost is just too high. This cost comes in two forms: first, the cost of writing the tests, and second, the friction it introduces to future modifications and refactorings – after all, if you have to update 20 unit tests to refactor a messy interface, you’re a lot less likely to do it. Examples of code you probably don’t want to write unit tests for include gameplay logic, which gets written and rewritten twenty times before ship as the result of iteration. Having to write test harnesses and tests for all this throwaway code is madness. Then of course there’s code that’s just plain hard to unit test, such as graphics. The point is that pervasive unit testing is a fantasy – it may be feasible in some domains, but it’s certainly not the case everywhere. Static typing is a great compromise – you basically document the things you would’ve put in comments anyway, but you do so in a standardized way that can be checked by the compiler. This catches many errors for very little cost. When unit tests make sense (as they sometimes do), you can add those too for even more coverage, but when they don’t you still have a basic sanity check.

Finally, performance. Yes, JITers can be very clever and employ tracing and stuff to close some of the gap here, but of course those techniques could be used by a statically typed language implementation as well. If we can statically decide what methods to call and what types to use, we will always be at an advantage compared to a language where this has to be determined at runtime. Not to mention that statically typed languages will encourage you to write code that can be executed efficiently (e.g. variables don’t change types over the course of a program). Also, static typing allows you to make performance guarantees that won’t be randomly broken tomorrow. For example, you can store an unboxed array of values, or store a value on the stack or embedded within another object. You don’t have to rely on a runtime tracing system to uncover and apply these optimizations (with typically no way to ensure that they actually occurred), which also means you don’t have to worry that you’ll fall off a performance cliff in the future when something subtle changes and causes the JITer to fail to optimize something. The programmer is in control, and can statically enforce important performance-related choices.
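To make the “statically enforce” part concrete, here is a minimal C++ sketch. The `Vec3` and `Particle` types and the `integrate` function are invented for illustration, and the size check assumes a mainstream compiler that inserts no padding between `float` members. The layout is pinned down with `static_assert`, so a change that silently introduced boxing or an extra pointer would fail the build instead of showing up later as a slowdown.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct Vec3 { float x, y, z; };

struct Particle {
    Vec3 position;  // embedded directly in Particle, not a pointer to a heap object
    Vec3 velocity;
};

// Checked at compile time: no hidden object headers, no indirection.
static_assert(std::is_trivially_copyable<Particle>::value,
              "Particle must stay a plain value type");
static_assert(sizeof(Particle) == 6 * sizeof(float),
              "Particle must be exactly six floats");

// std::vector<Particle> stores the particles contiguously ("unboxed"),
// so this loop walks one flat block of memory.
void integrate(std::vector<Particle>& particles, float dt) {
    for (auto& p : particles) {
        p.position.x += p.velocity.x * dt;
        p.position.y += p.velocity.y * dt;
        p.position.z += p.velocity.z * dt;
    }
}
```

Nothing here depends on an optimizer noticing a pattern at runtime; the layout is part of the program’s types.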

Of course if you want to allow dynamic types as an option for a subset of cases where it makes sense (e.g. dealing with untyped data read from external sources), that’s fine.
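One way such an opt-in could look, sketched with C++17’s `std::variant`: the `Dynamic` alias and `to_number` function below are hypothetical names for this example, not anything standard. The value’s type is only known at runtime, but the set of possible types is closed and the code that inspects it is still checked by the compiler.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical "dynamic" value, e.g. a field read from an external config file.
// std::monostate stands for "no value".
using Dynamic = std::variant<std::monostate, bool, double, std::string>;

// Convert to a number where that makes sense; otherwise return the fallback.
double to_number(const Dynamic& value, double fallback) {
    if (const double* d = std::get_if<double>(&value)) return *d;
    if (const bool* b = std::get_if<bool>(&value)) return *b ? 1.0 : 0.0;
    return fallback;  // monostate or string: not a number
}
```

The dynamic part stays confined to the boundary where the external data comes in; everything downstream of `to_number` deals in plain `double`.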

2. Memory safety

C++ fails this one, but it’s still a deal breaker. If I was content with worrying about memory scribbles and random access violations I’d just stick to C++. In order to offer enough benefits to convince me to switch from C++, your language needs to be memory safe.

Now, let me also state that I think the language needs an “escape hatch” where you can tag some function or module as “unsafe” and get access to raw pointers. The point is that this needs to be something optional that you have to explicitly enable (using a compiler switch, ideally) for very isolated cases, and you should never be required to use it for typical “application level code”.

This also implies some kind of guarantee about memory reclamation. We’ll get to that.

3. Yes, memory safety – no null pointer exceptions!

I use a broader definition of memory safety than some. In my view, if a program follows a reference and crashes because the reference is invalid, it’s not a memory safe language. This implies that in addition to making sure you never point to stale memory or memory of the wrong type (through type safety and automatic storage reclamation), you also have to make sure that a reference never points to null.

Null pointer exceptions have no business in modern languages. Getting rid of them costs no performance, and statically eliminates the one big remaining cause of runtime crashes that we still see in otherwise “modern” languages. Yes, it requires some thought (particularly w.r.t. how you do object construction), but this is a solved problem. There is only one valid reason not to do this and that’s legacy code – perhaps you didn’t know what you were doing when you started off, not realizing the issue and how simple the solution is, and now you have too much legacy code to fix it. A language without legacy issues should get this right, and yet many don’t.

Note: it’s not enough to simply have a “non-nullable-pointer” type, not even if it’s checked at compile time. You have to make sure that nullable pointers cannot be dereferenced. In other words, using the pointer-dereferencing operator on a nullable pointer should literally be a compile-time type error. You should have to do a check of the pointer first which introduces a dynamic branch where on one of the sides the type of the pointer has been changed to be non-nullable, which enables the dereferencing operator again.
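C++ cannot enforce this at the language level, but the shape of the idea can be approximated with a library type. In the sketch below, `Nullable`, `match`, `Player`, and `describe` are all hypothetical names. The wrapper deliberately has no `operator*` or `operator->`, so the only way to reach the target is through a branch that hands you a plain reference.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper for a reference that may be null.
// It exposes no dereferencing operator at all.
template <typename T>
class Nullable {
public:
    Nullable() : ptr_(nullptr) {}
    explicit Nullable(T& target) : ptr_(&target) {}

    // The null check and the "type change" happen together: the first
    // callback receives a T&, the second handles the null case.
    template <typename IfPresent, typename IfNull>
    auto match(IfPresent if_present, IfNull if_null) const -> decltype(if_null()) {
        return ptr_ ? if_present(*ptr_) : if_null();
    }

private:
    T* ptr_;
};

struct Player { std::string name; };

std::string describe(Nullable<Player> target) {
    // Writing target->name or (*target).name here is a compile-time error.
    return target.match(
        [](Player& p) { return "targeting " + p.name; },
        [] { return std::string("no target"); });
}
```

A language with this built in would do the same thing with ordinary `if` syntax and flow-sensitive typing rather than callbacks, and would also rule out dangling references, which this C++ approximation does not.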

Having every few statements potentially trigger a runtime crash is an extremely expensive price to pay… for nothing! We’re not talking about a pragmatic tradeoff here, where you pay the cost of reduced reliability but get something else in return – there’s just no real upside to allowing dereferencing of potentially null pointers.

4. Efficient storage reclamation

I mentioned above that I require automatic storage reclamation. Well, I also require it to be efficient. Too many modern languages fail this. I don’t really mind how they achieve performance, but two ways you could do it are to support pervasive use of value types (i.e. types that get embedded within their owners – which drastically reduces the number of pointers in your heap, and therefore the cost of GC) and to use thread-local heaps. Concurrent collection is another solution, though you’d have to be careful not to incur too much of an overhead on the mutator with whatever memory barrier you use. I’ve written about this issue twice in the past.

5. Great Windows support

Windows is still the most popular OS by far. If you don’t give me installers, or easily-buildable code, for Windows, I’m not buying. I realize a lot of academia is used to *nix, which causes a disproportionate preference for Linux and Mac in projects such as this, but in the real world your customers actually use Windows, so you should make sure it works.

Conclusion

The most common one of these deal breakers is probably dynamic typing. I usually just close the tab immediately when I see that a language is dynamically typed. So while strictly speaking this is the most common issue, it doesn’t feel like it because I never invest any amount of time even considering those languages anymore. They may have some benefits in terms of theoretical elegance (e.g. Lisp, or even Lua or Io), but for actual work I’ve seen too much of the benefits of static typing, and too much of the horrors of not having it. It’s so obvious that it barely qualifies as a deal breaker – it’s not even in the running to begin with.

So really, the most common deal breaker is probably point 3 above: null pointer exceptions. Whenever I see promising languages like Go, D or Nimrod that still repeat Hoare’s billion dollar mistake, I get a bit sad and frustrated. It’s like seeing a beautiful, elegant car that has just one tiny flaw in that the wheels occasionally fall off. It simply doesn’t matter how many awesome and innovative features you have if you don’t get the fundamentals right.

sebastiansylvan