It's another take on "Haskell with a Lisp syntax".
Would you actually prefer Haskell with Lisp syntax? Or is that not the point, and you just wanted a project that would teach you various things?
It's partly the point.
I'm not confident in this answer, but... I don't think I'd prefer Haskell-of-2020 if you straight up switched it to Lisp syntax. But if you took an early version of Haskell, switched that to Lisp syntax, and then continued evolving that through to 2020, I think I might like that better than actual Haskell-of-2020. (Assuming it managed to get a comparable amount of effort put into it, which is dubious. Not everyone likes Lisp syntax. And assuming the effort was directed by a comparable amount of... taste?, which is also dubious. Like, you're making the language design process more individualistic but also more democratic, there's no way that doesn't have some effect on what you end up with. I don't have strong opinions on whether the effect is good overall.)
Thanks! I've never used Lisp, but all the parentheses strike me as unappealing. Maybe I would like it if I tried it though.
My short experience with Lisp led me to the impression that Lisp actually doesn't have more parentheses than C/Java/JavaScript. It only seems so, because it has less of... all kinds of other things.
If you write the same algorithm in Lisp and in C/Java/JavaScript, the total number of parentheses will be approximately the same in both, but the Lisp code will be much shorter. After realizing this, the parentheses stopped bothering me, because their density suddenly felt like a good thing.
(Also, when you write Lisp code, you usually use an editor that highlights matching parentheses, and even inserts the closing ones automatically based on indentation. So forgetting to match them is actually not a problem in practice.)
Aside from the ease of meta-programming with Lisp syntax – as I mentioned in this comment on this post – the other major (historical) reasons why Lisp was nice to use have since been largely copied by newer languages.
I've found functional programming languages to be roughly as nice as the Lisps I've used previously, and with more 'standard' syntaxes.
But meta-programming can be extremely powerful and thus anything that makes it easier can be pretty useful too.
Clojure was the most recent Lisp (or Lisp-like) language I used. It's very nice and much more 'batteries included' than other Lisps I've played with in the past.
I've been doing a lot of work with Elixir lately. It doesn't have Lisp syntax, but I find it to be very nice in a lot of the ways that Lisp languages often are too.
Open variants (the sum-type version of extensible records) might be on that list too, but I’m not familiar with any prior uses of them so I guess there’s probably something that makes them difficult? Maybe they’re just not very ergonomic in actual use, in which case cool, they’ll fit right in.
OCaml has polymorphic variants which might be related to what you're thinking of.
Yes, thanks! Someone on reddit also pointed me at purescript.
I've realized that since the only language I know with extensible records is Elm, it doesn't say much that I don't know any with open variants.
Nice!
Related to [6], I have a vague hunch that the chief benefit of 'Lisp syntax' is that it's easy to parse and represent as Lisp data. Writing Lisp is much easier with 'paredit' plugins/features in one's editor. I often heavily format others' code to match my idiosyncratic 'visually scannable' style (tho not just in Lisp or Lisp-like languages).
I think I basically agree. If I had to pick a chief benefit (which I don't) I'd say that it enables easy macros - but it does that because it's easy to parse and represent as Lisp data, so to some extent it just depends what level you feel like looking at.
This is a toy language I've been designing, or at least implementing, for about a year.
It's another take on "Haskell with a Lisp syntax". I'm aware of prior art: Hackett, Axel and Liskell. I haven't looked closely at any of them, because doing that seemed like it might make me less likely to keep working on Haskenthetical.
I call it "toy" in the sense of… well, right now it's a toy like an RC plane is a toy. But my vague goal is to make it into a toy like the Wright flyer would have been a toy if it had been built in 2003. I'd like to get it, say, 80% of the way to being a "real" language. I have no intention or expectation of taking it the other 800% of the way. I have no intention or expectation of taking on responsibility-according-to-me here.
(And honestly, even the first 80% is super ambitious. I don't expect to get that far, it would just be nice to. If I never touch the project again after this, I won't consider my time wasted.)
If you're curious, the source code is available here¶1. If you have stack, `stack build --fast` should suffice to build, and then `stack exec -- haskenthe -h` to run the executable.

So far I've implemented basic Hindley-Milner type inference, the ability to define new types, and pattern matching. The only built-in types are `->`, `Float` (which is a Haskell `Double` under the hood), and `String` (a Haskell `Text`). Builtin functions are `+`, `-` and `*` (all of type `-> Float (-> Float Float)`) and `err!` (of type `-> String $a`). I don't yet have division because I haven't decided how I want to handle division by 0. I don't expect I'll come up with anything particularly exciting there, but also I haven't felt the need for division yet.

(Actually, the types `Either` and `,` are also builtin, along with functions to work with them: constructors `Left`, `Right` and `,`, and destructors `either`, `car` and `cdr`. But that's just because I added them before I added custom types, and I haven't bothered to remove them yet.)

I have a long list of things I'd like to include in future. Probably the ones that interest me most right now are macros2, extensible records, and compilation. I don't know how macros and compilation are supposed to fit together, but I have some vague ideas in my head. And clearly it's been done in the past, so I assume I can find out how.
Other things include IO, comments, imports, exhaustiveness checking and FFI. Maybe typeclasses, but I'm curious whether macros and lazy evaluation can make those less useful. Maybe lazy evaluation, but I'm on the fence about that.
Open variants (the sum-type version of extensible records) might be on that list too, but I'm not familiar with any prior uses of them so I guess there's probably something that makes them difficult? Maybe they're just not very ergonomic in actual use, in which case cool, they'll fit right in.
What I have so far isn't very interesting as a language, but it might be interesting enough to be worth writing about.
General language overview
There are seven types of expression right now. I'm going to assume you'll understand most of them just by name.
Literal values are Strings and Floats. Variables are bare words with very few restrictions. (Right now they can contain any combination of printable characters other than whitespace, `"`, `(` and `)`; except that they can't start with something that would parse as a Float.) Lambdas have the syntax `(λ (arg1 arg2 ...) expr)`, or if there's only one argument, `(λ arg expr)` is also acceptable3. Function calls, unsurprisingly, look like `(func arg1 arg2 ...)`, where all of those are expressions.

There are two forms of let-binding. Syntactically they're similar: `([let/letrec] ((name1 expr1) (name2 expr2) ...) body-expr)`. `let` is nonrecursive, so that `exprN` can only refer to `nameM` for M < N. (You can reuse names, e.g. `(let ((a 1) (a (+ 1 a))) a)` gives 2.) `letrec` is recursive, so that any `exprN` can refer to any `nameM`. (I haven't implemented any checks to forbid you from reusing names here, but probably only the first or last use of a name would have any effect.)

Finally there's pattern binding with `(if~ val pat if-match else)`. Pattern `pat` can be a literal string or float, or a variable name prefixed with `$`, or a constructor name possibly with other patterns as arguments. If the value matches the pattern, `if-match` gets evaluated, with any variables bound to the relevant parts of the pattern. Otherwise, `else` gets evaluated. For example:

(I'm leaning towards `#` for comments when I get around to that. Also, I'm assuming here that the `Maybe` type has been defined.)

There's no equivalent of a `case` statement that matches the same thing against multiple patterns. For that you'd need nested `if~`, and there's no exhaustiveness checking (i.e. nothing that would say "you missed a possibility").

This is all typechecked, so that you get a compilation error if you try to multiply a string by a float or whatever.
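To illustrate the nested-`if~` pattern, something like the following might stand in for a two-branch `case` (this is my own example, not one from the implementation, and it assumes `Maybe` has been defined):

```
(if~ val (Just $x) (+ x 1)
  (if~ val Nothing 0
    (err! "unreachable")))
```

Each `if~` tries one pattern, and its `else` branch falls through to the next; since there's no exhaustiveness checking, the final `err!` is there purely by convention.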
You can add explicit type declarations to expressions, or to parts of patterns, or to the variables in `let` or `letrec` bindings.4 A type declaration looks like `(: expr type)`, and a type is either a bare name for a type constructor, or a `$name` for a type variable, or a type application like e.g. `Maybe $a` or `-> (Maybe Float) String`. The root of a type application has to be a constructor, not a variable.

If a type declaration is more specific than it could be, it constrains the type of the expression; if it's more general, that's an error5:
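For instance (these are my own illustrations of the rule, not examples from the implementation):

```
(: (λ x x) (-> Float Float))
(: 3.0 $a)
```

The first is fine: it constrains the identity function, whose inferred type would be `(-> $a $a)`, down to `(-> Float Float)`. The second is an error, because `$a` is more general than the inferred type `Float`.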
Apart from expressions, the statements I've implemented so far are `def` for global definitions, and `type` to declare a new type.

Currently all the `def` statements get pulled together and brought into a single `letrec` around the top-level expression. (Each program currently is required to have exactly one of those.) So

is sugar for
Type declaration introduces new types, constructors, and eliminator functions. For example,

introduces three values into the environment: `Nothing` of type `Maybe $a`; `Just` of type `(-> $a (Maybe $a))`; and `elim-Maybe` of type `(-> $a (-> (-> $b $a) (-> (Maybe $b) $a)))`6. This last is the standard Haskell `maybe` function, but you get one for free whenever you declare a type.

Other type declarations would look like:
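I'm guessing at the exact `type` syntax here, modeled on the `Maybe` discussion above (a type head followed by one clause per constructor; these specific declarations are mine):

```
(type Bool True False)
(type Unit Unit)
(type (List $a) Nil (Cons $a (List $a)))
```

By the rule above, each of these would also generate an eliminator (`elim-Bool`, `elim-Unit`, `elim-List`) for free.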
(I'm tempted to name the standard unit type and its value `∅` instead. That's a bad name for the type, which is not an empty set, but it's a decent name for the value. It would be a fine name for the void type, but that type isn't useful enough to deserve such a concise name.)

`letrec` has something that's either a bug or, generously, a "missing feature that looks an awful lot like a bug when you don't realize that you're expecting it to be there". The way the typechecking works, inside the bindings for `letrec`, you can only use each bound variable at a single type. So you can't do

because that uses `id` at types `-> Float Float` and `-> String String`. (Never mind that if you could, it would be an infinite loop.) Haskell has this limitation too, though I'm not sure I've ever run into it naturally; I couldn't think of a non-contrived example.

In Haskenthetical, this applies across an entire binding group. So you also can't do this:
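A sketch of the kind of binding group that gets rejected (my own reconstruction, using the names discussed below):

```
(letrec ((id (λ x x))
         (some-float (id 3.0))
         (some-str (id "hello")))
  some-float)
```

Here `id`, `some-float` and `some-str` are all in one binding group, so `id` gets used at two different types within the group and typechecking fails.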
But that example would work if you translated it to Haskell. What gives?
Well, since `id` doesn't depend on `some-float` or `some-str`, you could easily rewrite that example as

And it turns out that Haskell just does that transformation for you automatically. It figures out what depends on what, and groups bindings in such a way as to impose the fewest possible restrictions. If you make that impossible by adding some contrived mutual references, you can make Haskell fail in the same way:
(You actually only need to reference one of `someFloat` or `someStr`, because once `id_` is used at a specific type, it no longer generalizes to `a -> a` in the body of the `let`.)

I haven't implemented this in Haskenthetical yet.
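In Haskell, a contrived version of that failure might look like this (my own sketch; the reference to `someFloat` inside `id_` is what forces all three bindings into one group, and the point is that it does not compile):

```haskell
broken :: (Float, String)
broken =
  let id_ x = const x someFloat       -- mentioning someFloat ties id_ into the binding group
      someFloat = id_ (3.0 :: Float)  -- id_ is now monomorphic at Float -> Float
      someStr = id_ "hello"           -- type error: id_ can't also be String -> String
  in (someFloat, someStr)
```

GHC rejects this because within a single binding group, `id_` isn't generalized, so it can't be used at both `Float -> Float` and `String -> String`.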
Implementation
I don't think there's anything particularly exciting about the implementation, if you're familiar with such matters. But for those who aren't, and who want to hear about them from me, read on.
I parse¶ the input text into a list of syntax trees using Megaparsec. The syntax tree only knows about a few types of token:
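From the description, the syntax-tree type is presumably something like this (`STBare` and `STTree` are named in the text; the literal cases are my guesses):

```haskell
import Data.Text (Text)

data SyntaxTree
  = STString Text        -- a string literal
  | STFloat Double       -- a float literal
  | STBare Text          -- any other bare word
  | STTree [SyntaxTree]  -- a parenthesized list of trees
```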
Then I parse each tree into a statement (or expression, but that's just a type of statement) by recognizing specific
STBare
values (at the head of anSTTree
) as needing special handling and passing everything else through to "assume this is a function getting called".Typechecking¶ is Hindley-Milner. When I wrote that essay, I said I didn't know how to implement HM typechecking. I have some idea now, and would describe it vaguely like this:
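As a very rough sketch of the "generate constraints, then solve them by unification" shape (a toy version I wrote for illustration, not Haskenthetical's actual code): walk the expression handing out fresh type variables and collecting equality constraints, then solve the constraints like so:

```haskell
-- A toy constraint solver for a Hindley-Milner-style typechecker.
data Type = TVar String | TCon String | TApp Type Type
  deriving (Eq, Show)

type Subst = [(String, Type)]

-- Apply a substitution to a type.
apply :: Subst -> Type -> Type
apply s (TVar v)   = maybe (TVar v) id (lookup v s)
apply s (TApp a b) = TApp (apply s a) (apply s b)
apply _ t          = t

-- Solve equality constraints one at a time, composing substitutions.
solve :: [(Type, Type)] -> Either String Subst
solve [] = Right []
solve ((a, b) : rest) = do
  s1 <- unify a b
  s2 <- solve [ (apply s1 x, apply s1 y) | (x, y) <- rest ]
  Right (s2 ++ [ (v, apply s2 t) | (v, t) <- s1 ])

unify :: Type -> Type -> Either String Subst
unify (TVar v) t = bind v t
unify t (TVar v) = bind v t
unify (TCon a) (TCon b) | a == b = Right []
unify (TApp a b) (TApp c d) = solve [(a, c), (b, d)]
unify a b = Left ("can't unify " ++ show a ++ " with " ++ show b)

bind :: String -> Type -> Either String Subst
bind v t
  | t == TVar v = Right []
  | occurs v t  = Left "occurs check"  -- would build an infinite type
  | otherwise   = Right [(v, t)]

occurs :: String -> Type -> Bool
occurs v (TVar w)   = v == w
occurs v (TApp a b) = occurs v a || occurs v b
occurs _ _          = False
```

The real implementation also has to generalize at `let` bindings and instantiate at variable uses, which is where most of the subtlety lives.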
Of course it's more complicated than that. For example, `let` and `letrec` need you to run solving during the unification phase. Also, declared types need to be treated specially, so that you can reject the program if the user declares `Just 3` as `Maybe $a`.
.Aside, a thing I don't fully understand: I haven't tried timing it, but this implementation looks to me like it's something like O(n²) in the size of the input. It's supposed to be roughly linear. I'm not sure if I'm missing something or if there's just a more efficient algorithm.
Anyway, that's roughly how I do it. I take this approach mostly from Write You a Haskell (notably chapter 7, section "constraint generation"7, but other chapters were useful for other parts of Haskenthetical too). But I had to figure out how to handle `letrec` myself, because the language implemented there uses `fix` instead8. I also took a lot from Typing Haskell in Haskell, especially for pattern matching. (I hadn't discovered it by the time I implemented `letrec`.) Neither source implements explicit type declarations9, so I had to figure those out for myself too. I'm not convinced I did a very good job.

Finally, evaluation¶: for the most part that's fairly straightforward. For example, when we evaluate a variable, we look up its value in the environment. When we evaluate a `let`, we evaluate something, add it to the environment under the relevant name, and go on to the next thing. There are a few types of values¶ that we need only when evaluating:

- Closures, created by evaluating a `λ` expression. A closure captures a snapshot of the current environment, the name of the argument, and the body expression. If a λ has multiple arguments, it returns nested closures.
- Builtins, which are Haskell functions `Val -> Either Text Val` (plus a name to distinguish them). Builtins and closures are ultimately the only things that can be called as functions.
- Thunks: `letrec` needs them, because we can't evaluate bindings before adding them to the environment or we'd get infinite recursion. Type eliminators are builtin values, but the `Val` they return is a Thunk (with empty environment), to keep the Haskell file Env.hs from having to reference Eval.hs.
- Tagged values: a symbol (a Haskell `Text` under the hood) paired with a list of other values. Constructors wrap their arguments in a tag, and eliminators and pattern matching compare those symbols. There's no way to look at or manipulate the symbol directly in Haskenthetical, but I'd be curious to explore that direction.
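Put together, the value type is presumably something in this vicinity (my sketch; the post's actual `Builtin` is a separate type with a `Builtin'` constructor, flattened here for brevity):

```haskell
import Data.Text (Text)

type Name = Text
data Expr        -- elided: the expression type produced by the parser
type Env = [(Name, Val)]

data Val
  = Float Double                            -- a Haskell Double under the hood
  | String Text
  | Builtin Name (Val -> Either Text Val)   -- a named Haskell function
  | Clos Env Name Expr                      -- captured env, argument name, body
  | Thunk Env Expr                          -- unevaluated expression, for letrec
  | Tag Text [Val]                          -- constructor symbol plus arguments
```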
Firstly: I have the data type

which some types use as a parameter, like

This lets us use a slightly different type `TVar` in different parts of the codebase. When we've merely parsed the program, we have no way to tell the kind of a type variable, so we have `NoExt` there. When it's been typechecked, the kind is known, so we include it. If there was a pass in which type variables simply shouldn't exist, we could write

and we wouldn't be able to use a `TVar` in that pass at all.

This technique is called "trees that grow", and I copied it directly from GHC. I'm not currently using it everywhere I could, for no principled reason that I can recall. There's a chance it'll be more trouble than it's worth at the level I'm working at. An annoying thing about it is that you can't use a regular `deriving` clause, so I have

which kind of sucks10.
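The general shape of the pattern, for readers who haven't seen it (this follows GHC's "trees that grow" recipe; the specific names are my guesses, not necessarily Haskenthetical's):

```haskell
{-# LANGUAGE TypeFamilies #-}

data Kind = Star | KFun Kind Kind  -- kinds, as in the typechecker
data NoExt = NoExt                 -- "no extra information in this pass"

data Ps  -- pass index: just parsed
data Tc  -- pass index: typechecked

-- The extension point: what a type variable carries in each pass.
type family XTVar p
type instance XTVar Ps = NoExt  -- kind not yet known after parsing
type instance XTVar Tc = Kind   -- kind known after typechecking

data MType p
  = TVar (XTVar p) String
  | TCon String
  | TApp (MType p) (MType p)

-- For a pass where type variables shouldn't exist at all, one could set
-- the instance to Data.Void.Void, making TVar impossible to construct.
```

Since `MType p` mentions the type family `XTVar p`, a plain `deriving (Eq, Show)` clause can't be used; you need standalone deriving with constraints on the family instances, which is the annoyance mentioned above.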
Secondly: builtin functions are kind of a pain to write manually. For example, `either` was previously defined as `Builtin $ Builtin' "either" heither`, where

(`Builtin` is a constructor of type `Val` containing a `Builtin`, and `Builtin'` is the only constructor of type `Builtin`. These names do not spark joy.)

It works, but it always felt like I should be able to do better. I spent a while trying to figure that out, and now the value is simply `heither`, where

I dunno if this is much better, honestly, but there we are. It needs `ApplicativeDo`; I never managed to either figure out a Monad that could do this, or prove that no such monad exists. (There's no Monad instance for the specific type that I use to implement this, because to write `join` for that monad you'd need to be able to extract the inner `[w]` from `([w], r -> ([w], r -> a))` without having an `r` to pass to the outer function, and that's not a thing that even makes sense to be able to do11. But there might be a different type that enables what I'm trying to do and does admit a Monad instance.)

So that's where it's at right now. Feel free to point out ways that it sucks, although not-sucking isn't the point. I'm also interested in pointers to how I might implement some of the things on my future list (I'm aware of Implementing a JIT Compiled Language with Haskell and LLVM), or other cool things I might like to put on that list, or even things you might happen to like about Haskenthetical.
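(A postscript on that Monad question: the shape `([w], r -> a)` in the footnote suggests the type is the product of `Const [w]` and a reader, something like the following reconstruction of mine. It's a lawful Applicative, which is exactly what `ApplicativeDo` needs, but `join` is impossible for the reason given in footnote 11.)

```haskell
-- The product of Const [w] and ((->) r): Applicative, but not Monad.
data B w r a = B [w] (r -> a)

instance Functor (B w r) where
  fmap f (B ws g) = B ws (f . g)

instance Applicative (B w r) where
  pure x = B [] (const x)
  B ws f <*> B ws' g = B (ws ++ ws') (\r -> f r (g r))

-- join would need B w r (B w r a) -> B w r a, i.e. extracting the inner
-- [w] from ([w], r -> ([w], r -> a)) without an r to apply. No such
-- function exists, so there's no compatible Monad instance for this type.
```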
I'm using ¶ to indicate links to the state of the repository as of this writing. ↩
I don't think I'll try for hygienic macros, despite recent events. My only experience with those has been in the small amount of Racket I've worked on, and I didn't manage to get my head around them. ↩
I want unicode in languages to be more mainstream. There are good reasons why it's not, but at least some of those are chicken-egg problems. For example, most people aren't set up to easily write in unicode, but that's partly because most people never have to. Fortunately, I'm in a position where I can ignore all the good reasons not to do something. ↩
While writing this I realized that while you can attach them to `λ` params as well, those currently aren't typechecked at all. ↩

But the error doesn't seem to work for type declarations in pattern bindings. That's another thing I noticed while writing this. ↩
Gee, you ever think maybe there's a reason Haskell doesn't use Lisp syntax? I feel like Lisp syntax kind of needs variadic applications to be readable, but Haskell semantics don't go well with those. I'm hoping to solve this disconnect with macros. ↩
Be aware that the implementation of let on that page doesn't work. It's been fixed in the repository, but not on the website. ↩
It's possible¶ to implement `fix` in Haskenthetical without `letrec`, so maybe I didn't need to figure it out. I could have just waited until I get macros and then implemented `letrec` in terms of `fix`. ↩

THIH does have them for binding groups (like those found in `let` and at the top level), but not expressions. That made me wonder if those weren't in the Haskell 98 report, like how Elm doesn't have them. But they're there: §3.16, "Expression Type-Signatures". ↩

If it annoys me too much, I can enable `UndecidableInstances` and do

↩
You could actually get somewhere by passing in `undefined`, as long as the inner `[w]` doesn't depend on the outer `r` and everyone involved is careful about strictness. I don't recommend this. ↩