Programming in a natural language

I was wondering today how different programming and natural languages really are, and whether one day someone could write applications in a natural language. Maybe you could even just speak to a computer as you would to a human, and have it understand what it needs to do, recognise for itself the efficient methods for approaching the problems provided, and use them.

This has raised two sets of questions for me. They don't exactly flow into each other; they're very much separate questions.

The first is: if we define "programming" as to "provide (a computer or other machine) with coded instructions for the automatic performance of a task" (thanks, Google), would telling your phone to set an alarm qualify as "programming"? This can certainly be done by voice command now, and it meets the definition: you're providing a computer or other machine, in this case your phone, with instructions (coded in a natural language of your choice) that follow a syntax it can recognise (e.g. "Set an alarm for _ o'clock"), so that it later automatically performs the task of ringing/vibrating/setting off fireworks under your bed/yodeling/whatever. You've just programmed your phone to set off an alarm. Does that make the natural language you just used a programming language too? Would that make all natural languages programming languages? If you're willing to push the "other machine" part of that definition to include humans, natural languages would definitely qualify as programming languages, and "instructing" would become almost synonymous with "programming". I think someone's already had a dig at this, and maybe that's how the esoteric language Chef happened?

The next is something I came across when people were comparing natural and programming languages: error tolerance. With most natural languages there's a high error tolerance: yuo hfal cna dpro snetenesce, rwod mujble nda lstli nudsterood. With programming languages, error tolerance is typically pretty low: misspell a variable name or omit a curly brace and you're out of luck. But this made me wonder, would it be possible to write a compiler or interpreter with error tolerance like the kind I relied on in the sentence above, which I hope you understood? In Go you can omit variable types on declaration by using "short variable declarations", where the type of the variable is inferred from what's on the right-hand side of the declaration. That's maybe like how in conversation I could some words and you'd still be to what I meant (inb4 grammar nazi: I omitted some words there to prove the point).
Would it be that much of a stretch to write a plugin for something like Visual Studio Code that at least suggests what you might have meant to type when you misspelt a vrabiael nmae, or places suggested semicolons where you've put a newline but forgotten the semicolon? Is this something people have thought about before but dismissed, out of fear that it'd cause more problems, like it thinking you've omitted something by accident that was in fact deliberate?
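
To make the "suggest what you might've meant" idea concrete, here is a minimal sketch in Python using the standard library's difflib. The function name suggest_name, the list of declared names and the 0.6 cutoff are all made up for illustration; this is not how any particular editor plugin works, just one way the suggestion step could look.

```python
import difflib

def suggest_name(unknown, known_names, cutoff=0.6):
    """Return known identifiers that look similar to an unknown one, best match first."""
    # get_close_matches scores each candidate by string similarity and
    # keeps only those scoring at least `cutoff` (0.0 to 1.0).
    return difflib.get_close_matches(unknown, known_names, n=3, cutoff=cutoff)

# Hypothetical set of names already declared in the file being edited.
declared = ["variable_name", "total_count", "user_input"]

print(suggest_name("varaible_name", declared))  # a typo close to a declared name
print(suggest_name("zzz", declared))            # nothing similar, so no suggestion
```

Note that it only suggests. Whether the suggestion is what you intended is still left to the person typing, which is exactly the worry about deliberate versus accidental omissions.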


Very good questions, and in part feasible: translating human language into machine code to be compiled and understood, with the entire grammar, dictionary and vocabulary of a specific language as variables... I think as deep learning evolves, it will eventually be considered some form of what you are talking about.

if (WhatYouSay){
} else { return IDisagree; }

Just be creative with your variable names ;)


Natural languages and formal languages are more or less alike, with some minor differences. For one, formal languages cannot handle ambiguity, but natural languages can. Underneath, though, natural languages are computed in the brain using syntactic structures and command relationships similar to those of formal languages. You should look up the work of Noam Chomsky, Massimo Piattelli-Palmarini, Jerry Fodor and Zenon Pylyshyn on this issue. Start with the seminal book Aspects of the Theory of Syntax.

Next, to answer your question about whether we can code in natural languages: we already do. On two levels! First, our speech and thought are generated by algorithms that are coded using the syntax of natural languages. Second, the syntactic operations and relations (MERGE, MOVE, COPY, C-command, M-command, etc.) are the foundations of formal languages. See for instance the Chomsky hierarchy, or the Chomsky-Schützenberger enumeration and representation theorems!

You're right about programming languages, but wrong about natural languages. The example you provided does not violate any major syntax rule of natural language. When a sentence violates syntactic rules of the sort that formal languages encode with curly braces, for example, our brains throw a syntax error while parsing it. The measurable signatures of this are event-related potentials such as the N400 effect (usually linked to semantic anomalies) and the P600 (usually linked to syntactic violations).

Either way, you should look at Aspects of the Theory of Syntax, The Algebraic Mind and Lectures on Government and Binding: The Pisa Lectures for some details on encoding, governing, binding, coreferencing and co-indexing syntactic rules in natural languages. The questions you are asking have been worked on since the 1950s, and are still among the frontiers of science.


First Question:
Yes, you could say setting your alarm clock with your voice is programming. But that doesn't make natural language a programming language, because a programming language has to satisfy certain constraints:
1. It has to be formal, and natural language is by definition informal.
2. When we refer to programming languages we usually mean a Turing-complete language, which basically means you can solve every computable problem with it. You can't do that with your alarm example.

Second Question:

Uhm, I don't think that is possible, and even if it were, I doubt it would actually be useful: it would probably break infinitely more things than it would solve. Let me elaborate.

The compiler has no clue what the semantics of your code are. The reason the compiler is so strict is to make sure that, once your code has compiled successfully, it still does exactly what you wrote. Making the compiler more relaxed would mean that two separate variable names, say Name and CName, could be mistaken for the same one. Also, there are infinitely many ways to misspell a variable or omit syntax elements. The compiler would have to parse your code and somehow guess what it was actually supposed to look like. Even if that were possible, the performance issues alone would make it basically unusable.
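
To illustrate the Name/CName problem, here is a small Python sketch. The "tolerant compiler" is purely hypothetical: tolerant_candidates and its max_distance threshold are invented for this example, and the matching uses plain Levenshtein edit distance, which is just one possible way such a compiler might guess.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def tolerant_candidates(written, declared, max_distance=1):
    """Names a hypothetical forgiving compiler could treat as 'what you meant'."""
    return [name for name in declared if edit_distance(written, name) <= max_distance]

declared = ["Name", "CName"]
print(edit_distance("Name", "CName"))         # 1: the two names are one typo apart
print(tolerant_candidates("BName", declared)) # ['Name', 'CName']: no way to choose
```

With a tolerance of a single character, a typo like BName is equally close to both variables, and even a correctly spelled Name is within range of CName, so the compiler would be guessing.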

But as a side note, I still don't think it is impossible. I could see how you could train a neural network to translate from natural language to an actual programming language, and from that to machine code. And if you constrain what a person can say, and in what order, I think it is very feasible.
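
As a rough sketch of the "constrained" end of that idea, here is what it looks like when only one sentence shape is allowed. This is not a neural network, just a hand-written pattern in Python; the sentence template is the alarm example from the original question, and parse_command and the returned dictionary layout are assumptions made up for illustration.

```python
import re

# The only sentence shape this toy parser accepts.
ALARM_PATTERN = re.compile(r"^set an alarm for (\d{1,2}) o'clock$", re.IGNORECASE)

def parse_command(sentence):
    """Translate one fixed sentence shape into a structured instruction, or None."""
    match = ALARM_PATTERN.match(sentence.strip())
    if match is None:
        return None  # anything outside the allowed shape is rejected
    hour = int(match.group(1))
    if not 1 <= hour <= 12:
        return None  # well-formed sentence, but not a valid clock hour
    return {"action": "set_alarm", "hour": hour}

print(parse_command("Set an alarm for 7 o'clock"))
print(parse_command("wake me up at seven"))
```

The tighter the constraint on what can be said, the easier the translation gets, but the further you are from real natural language.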

Dynamic_Gravity's suggestion actually works well for some languages, especially higher-level interpreted ones like Python, Perl and batch.

set bootFilesDrive=S:
set bcdStoreBIOS=%bootFilesDrive%\EFI\microsoft\boot\bcd
if exist "%bcdStoreBIOS%" (attrib -a -h -s "%bcdStoreBIOS%"
del "%bcdStoreBIOS%")

Practically still English, except for the cryptic switches in the attrib command.

Higher-level languages and programs actually handle ambiguity relatively well already. Examples: the one above, type conversions in JavaScript, and all programs that implement algorithms that derive contextual information (ffmpeg).

However, programming is ultimately stack-based.

The way we make the lower-level structures more robust is by stacking more structures on top. Machine code is abstracted by assembler, assembler by C, C by the JVM, Java by library engineers, and the Java language by the finished program with a GUI/CLI. We will likely someday get to the point where "programmers" can use natural-language voice commands to perform complicated tasks, but ultimately we need people who know the unforgiving syntax of the low-level languages.

While it is possible to extend an existing system using higher-level structures (programs, voice commands, scripts), creating completely new functionality and debugging require digging deeper into the stack.

If the phone alarm auto-crashes when set to 3:14 am, no voice command is going to fix it, and a plugin that suggests corrections in VS still requires an engineer who understands the syntax of the language to know whether that is what they intended.

High-level languages (informal/slang English, JavaScript, programs) can have a "high" level of ambiguity exactly because the low-level rules of the language are unforgiving. There being one right way makes it simple enough to derive exactly what was meant. Think of positional arguments in functions, or contractions like "who's", "they'd" and "it's".

As a side note: the more possible variations there are, the more computation the interpreter needs to figure out the correct intent. A function with several valid argument variations requires the compiler to examine the types of the arguments being passed to it and work through the variations until it finds one where the types all match. Compilers that do not do this will simply reject the call with an error.
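
The matching step can be sketched roughly like this in Python. It is a toy model, not how any real compiler is implemented: resolve_overload, the area functions and the AREA_OVERLOADS table are all invented for illustration, and the search is a simple first-match scan over the candidates.

```python
def resolve_overload(overloads, args):
    """Return the first implementation whose parameter types match the arguments."""
    for param_types, implementation in overloads:
        # Every extra variation is one more candidate to check.
        if len(param_types) == len(args) and all(
            isinstance(arg, expected) for arg, expected in zip(args, param_types)
        ):
            return implementation
    type_names = [type(arg).__name__ for arg in args]
    raise TypeError(f"no overload matches argument types {type_names}")

def area_square(side):
    return side * side

def area_rect(width, height):
    return width * height

def area_named(shape_name):
    return f"area of {shape_name} is unknown"

# Hypothetical table of the valid argument variations for one function name.
AREA_OVERLOADS = [
    ((int,), area_square),
    ((int, int), area_rect),
    ((str,), area_named),
]

def area(*args):
    return resolve_overload(AREA_OVERLOADS, args)(*args)

print(area(3))     # matches the (int,) variation
print(area(2, 5))  # matches the (int, int) variation
```

A call that matches none of the variations, such as area(1.5), falls through the whole table and ends in an error, which is the strict behaviour described above.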

In other words, Speakin' in contractions (they'd = "they had" or "they would"), abrvs metaphors, and terribad spelling etc. makes it 'ard 4 ppl to see wat you're sayin and sum Will jus call it. Having to deal with that ambiguity "or abstraction" is why higher-level languages (shell) are so slow compared to lower-level ones (C).
