no synonyms
[...]
Use compound words.
These two goals conflict. When compounding is common, there will inevitably be multiple reasonable ways to describe the same concept as a compound word. I think you probably want flexible compounding more than a lack of synonyms.
Thx.
Yep there are many trade-offs between criteria.
Btw, totally unrelatedly:
I think that in the past, in your work on abstraction, you probably lost a decent amount of time from not properly tracking the distinction between (what I call) objects and concepts. I think you likely at least mostly recovered from this, but in case you're not completely sure you've fully done so, you might want to check out the linked section. (I think it makes sense to start by understanding how we (learn to) model objects and to look at concepts only later, since minds first learn to model objects and only later carve up concepts as generalizations over similarity clusters of objects.)
Tbc, there's other important stuff besides objects and concepts, like relations and attributes. I currently find my ontology here useful for separating subproblems, so if you're interested you might read more of the linked post (if you haven't done so yet), even though you're surely already familiar with knowledge representation. But maybe you already track all that.
Criteria for creating a good language
Phonology
Vocabulary
Easy to learn grammar
Avoiding misunderstandings
Making rational thinking easier / Making irrational thinking harder
Allowing high expressivity / Allowing clear, precise thinking
Not needing to include much more information than is relevant
Conciseness: Be able to communicate quickly without needing too many syllables
Other criteria
Proposing an approach for creating a language
TLDR: Translating sentences into formal representations is probably often useful practice for getting a sense of how to design a good language.[1]
(The following is sorta low-quality. Unless you're specifically interested in designing a language, I basically recommend stopping here.[2])
(Thanks to claude-3.7 for helping me phrase parts of this section.)
The "starting formal" approach
One approach for designing a good grammar is to start with a formal system in which everything that can meaningfully be expressed in natural languages can be expressed, then to practice expressing lots of sentences in that framework, and then to add parsing rules that make expressing statements more convenient.
Advantages of the approach
Unambiguity: Logical statements in formal systems have precise meanings. This clarity can be preserved while adding convenience-oriented parsing rules.
Enhanced inference capabilities: Maintaining proximity to formal logical representations may facilitate easier inference, potentially enabling more effective recognition of conceptual connections. Furthermore, it might make it easier to see when an argument actually supports a position, versus when it doesn't directly support it or merely bears a vague resemblance to doing so.
Strengthened argumentation: The formal structure could enable more proof-like chains of reasoning, potentially allowing for more precise articulation of complex positions. This enhanced concreteness might also facilitate pinpointing specific flaws in arguments by making each logical step explicit and examinable. (Though the extent of this benefit remains speculative.)
Canonical expression: The initial formal system largely provides a standard way to express any given concept, eliminating the need to process multiple equivalent formulations. This property might make cognitive processing more efficient, in a similar way to how simplifying statements in automated theorem provers makes proof search more efficient.
(Though this canonicity has limitations. Introducing synonymous predicates can undermine it, and predicate logic itself offers alternative expressions (e.g., "NOT EXISTS" vs. "FORALL NOT"). Establishing conventions for preferred forms can help address this challenge; a toy sketch of such a convention follows this list.)
Promotion of precision: At least in my experience, working within formal systems encourages precise thinking. The process of trying to express something formally often highlights vague concepts, motivating the use of more concrete and well-defined predicates. Among other things, this can be useful to avoid Motte-and-Bailey fallacies.
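As promised above, here is a minimal Python sketch of such a normalization convention. The nested-tuple formula representation and the function name are my own hypothetical choices, not part of any formal system proposed here; the point is just that fixing one preferred quantifier form makes equivalent formulations collapse into a single canonical expression.

def normalize(f):
    """Rewrite a formula into a canonical form: negation pushed inward
    through quantifiers, double negations removed."""
    op = f[0]
    if op == "NOT":
        inner = f[1]
        if inner[0] == "EXISTS":   # NOT EXISTS x: P  ->  FORALL x: NOT P
            return ("FORALL", inner[1], normalize(("NOT", inner[2])))
        if inner[0] == "FORALL":   # NOT FORALL x: P  ->  EXISTS x: NOT P
            return ("EXISTS", inner[1], normalize(("NOT", inner[2])))
        if inner[0] == "NOT":      # NOT NOT P  ->  P
            return normalize(inner[1])
        return ("NOT", normalize(inner))
    if op in ("EXISTS", "FORALL"):
        return (op, f[1], normalize(f[2]))
    return f  # atomic predicates are left unchanged

# "NOT EXISTS x: P(x)" and "FORALL x: NOT P(x)" collapse to the same form:
assert normalize(("NOT", ("EXISTS", "x", ("P", "x")))) == ("FORALL", "x", ("NOT", ("P", "x")))
assert normalize(("FORALL", "x", ("NOT", ("P", "x")))) == ("FORALL", "x", ("NOT", ("P", "x")))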
Disadvantages of the approach
Foundation-dependent quality: The effectiveness of this approach hinges entirely on the initial formal representation chosen. An inadequate foundational system will likely propagate its limitations throughout the language.
Missing benefits from language evolution: Natural languages evolve organically over centuries, developing solutions to communicative challenges through distributed experimentation. A designed system bypasses this evolutionary process and may encounter unforeseen problems that natural selection would have addressed.
Potential learning barriers: There's a risk that the resulting grammar might present increased acquisition challenges for young language learners. The formal underpinnings could create cognitive hurdles not present in naturally evolved languages, though I'm not sure about this.
Defining shortcodes on top of our formal language
This section directly builds upon the formal statement representation presented in my post "Introduction to Representing Sentences as Logical Statements".
I haven't practiced expressing statements in the formal logic system that much yet, and I don't know yet which parsing rules might be most needed, but here are two examples of parsing rules that seem likely to be useful.
Agentic causes
Most English sentences that express events contain an agent who can usually be seen as causing the event. Consider the sentence "Alice gave the pen to Bob". We could express this in our system as:
{[t1,t2]: giving(Alice, Pen, Bob)} CAUSES {[t2, t2+delta]: holding(Bob, Pen)}
Where "giving(x1,x2,x3)" could be more precisely expressed as something like "x1 holding x2 in hand and moving it in the direction of the location of x3".
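To make the shape of such statements concrete, here is a minimal Python sketch of the data involved. The names TimedStatement and Causes are my own illustration, not notation from the formal system itself:

from dataclasses import dataclass

@dataclass
class TimedStatement:
    t_start: str    # e.g. "t1"
    t_end: str      # e.g. "t2"
    predicate: str  # e.g. "giving"
    args: tuple     # e.g. ("Alice", "Pen", "Bob")

@dataclass
class Causes:
    cause: TimedStatement
    effect: TimedStatement

# {[t1,t2]: giving(Alice, Pen, Bob)} CAUSES {[t2, t2+delta]: holding(Bob, Pen)}
pen_example = Causes(
    cause=TimedStatement("t1", "t2", "giving", ("Alice", "Pen", "Bob")),
    effect=TimedStatement("t2", "t2+delta", "holding", ("Bob", "Pen")),
)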
However, we might also want to be able to just say "Alice caused Bob to hold the pen" without needing to specify the method. The need for being able to treat agents as causes can be seen more directly in examples like "The chef caused the soup to taste fantastic", "The parent made the child do his homework", and "The doctor healed the patient".
The "CAUSES" connective in our system, however, only connects statements. To nevertheless capture agents as causes naturally, I propose that when we want to express
"A causes X"
, where "A" is an agent and "X" is a statement, this can usually be interpreted as "The fact that A was trying to achieve X and A was competent enough to achieve X, caused X".[3]Thus, I propose adding a keyword "causes" with the parsing rule:
"A causes X"
gets parsed into"{try(A, X) AND can(A, X)} CAUSES X"
Having added this to our language, we can now express some statements more concisely, e.g.:
"The doctor healed the patient."
Doctor causes {health(Patient, high)}
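Here is a minimal Python sketch of this shortcode as a purely string-level expansion; the helper name expand_causes is my own hypothetical choice:

def expand_causes(agent: str, statement: str) -> str:
    """Expand the shortcode "A causes X" into its full form."""
    return (f"{{try({agent}, {statement}) AND can({agent}, {statement})}}"
            f" CAUSES {statement}")

print(expand_causes("Doctor", "{health(Patient, high)}"))
# -> {try(Doctor, {health(Patient, high)}) AND can(Doctor, {health(Patient, high)})} CAUSES {health(Patient, high)}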
State changes
The sentence "Ben became rich in 2010" not only expresses that Ben is rich at the end of 2010 (and likely onward), but also that he wasn't rich before.
Likewise, "It started to rain at time t" doesn't just convey the same information as "It rained at t", but also that "It didn't rain for some period before time t".
Since it is annoying to write statements like
"{[t1,t2]: NOT {rains(Location)}} AND {[t2, t3]: rains(Location)}"
in full, we introduce the keyword "BECOME", which has the following parsing rule: "BECOME(X, t1=?, t2=?, t3=?)" gets parsed into "{[t1,t2]: NOT X[t]} AND {[t2, t3]: X[t]}"[4]
The "=?" means that those are optional parameters, where by default existential quantification over the variables is used. (So
"BECOME(X)"
returns"EXISTS t1, t2, t3: {{[t1,t2]: NOT X[t]} AND {[t2, t3]: X[t]}}"
.)[5]Now we can e.g. express "Mary fell asleep at 11pm yesterday" as:
BECOME({lambda t. sleep(Mary, t)}, t2="2025-03-29 11pm CET")[6]
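Here is a minimal Python sketch of this parsing rule as a purely syntactic macro. The helper name expand_become is hypothetical, and I read the rule as existentially quantifying only over the omitted time parameters:

def expand_become(x: str, t1=None, t2=None, t3=None) -> str:
    """Expand "BECOME(X, t1=?, t2=?, t3=?)" into its quantified form."""
    times = {"t1": t1, "t2": t2, "t3": t3}
    # Use the given constant (quoted) if a parameter was supplied,
    # otherwise keep the variable name so it can be quantified over.
    b = {k: (f'"{v}"' if v is not None else k) for k, v in times.items()}
    body = (f"{{[{b['t1']},{b['t2']}]: NOT {x}[t]}} AND "
            f"{{[{b['t2']},{b['t3']}]: {x}[t]}}")
    free = [k for k, v in times.items() if v is None]
    return f"EXISTS {', '.join(free)}: {{{body}}}" if free else body

print(expand_become("{lambda t. sleep(Mary, t)}", t2="2025-03-29 11pm CET"))
# -> EXISTS t1, t3: {{[t1,"2025-03-29 11pm CET"]: NOT {lambda t. sleep(Mary, t)}[t]} AND {["2025-03-29 11pm CET",t3]: {lambda t. sleep(Mary, t)}[t]}}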
Actually, most English event-expressing sentences implicitly describe a state change, and when the information that the previous state was different is worth conveying, the "BECOME" keyword can be used there as well.
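For instance, if it matters to convey that Bob wasn't holding the pen before, the effect side of the pen example above could (in my own illustrative rendering) also be written as:
BECOME({lambda t. holding(Bob, Pen, t)})
which by default existentially quantifies over the times at which the change happened.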
Concluding thoughts on designing a good language
The two keywords with corresponding parsing rules defined in this work ("causes" and "BECOME") represent merely the initial steps in what would be a comprehensive language development process using the "starting formal" approach. The transition from a bare formal logical system to a well-usable grammar would require addressing numerous additional challenges.
The proposed approach seems useful beyond the construction of a good grammar. By reducing abstract sentences to more concrete low-level statements, we can more clearly see the underlying meaning, from which we may carve natural ontologies with clear concepts. The methodical construction of a language with explicit design principles also provides a unique investigative lens into the nature of language itself, potentially revealing which conceptual primitives truly form the foundation of effective communication.
The language we are striving to create shares the ambitious vision of Leibniz's characteristica universalis (Leibniz, 1666)—a universal symbolic language capable of expressing all conceptual thought with mathematical precision and clarity. This path remains lengthy and complex, requiring substantial intellectual discipline throughout the process, particularly in resisting the temptation to introduce convenient but imprecise abstractions prematurely. While such a complete language may remain an aspirational goal, even partial progress toward this ideal can yield valuable insights for linguistics, cognitive science, and the philosophy of language.
Actually I'm proposing something more specific than this, but I'm not really that confident in the more specific version.
I wrote this for my (half-assed) Bachelor's thesis, and given that I wrote it I thought I might as well post it.
Note that this doesn't let us express agents as causes of states that were produced unintentionally. We cannot express "Mary broke the vase" as "Mary caused the vase to be broken" (assuming she didn't deliberately break it). I think this is a feature, not a bug.
In case you're wondering about the X[t]: remember that "[t2, t3]: X" stands for "(t2 ≤ t ≤ t3) ⇒ X". Given this reminder, it probably becomes clear that X[t] stands for the statement X evaluated at time t, so the statement X must actually take a time as input. Read further to see an example.
We might want a BECOME keyword that has more freedom for specifying times, e.g. for describing that something changed during some period without specifying exactly when. Perhaps we could define a probability distribution over t2. Overall, I'm not sure whether this definition of BECOME is optimal.
Actually, I perhaps ought to have defined the BECOME keyword in a way that doesn't require writing the "lambda t" explicitly.