When I write machine learning software, I tend to use the Place-Based Programming (PBP) paradigm. PBP caches your computations so you rarely have to perform the same computation twice.
The fundamental unit of data is a place, which refers to a location on disk. Consider the hard-coded string "I am Satoshi Nakamoto." You can compute the place of a string by hashing it.
;; This code is written in Hy.
;; See https://docs.hylang.org/en/stable/ for documentation of the language.
(import [hashlib [md5]] os)
(setv +place-dir+ ".places/")
(defn place-of [expression]
  "Returns the place of an expression"
  (os.path.join
    +place-dir+
    ;; one subdirectory per expression type, matching the output below
    (str (type expression))
    (+ (.hexdigest (md5 (.encode (str expression))))
       ".pickle")))
;; prints ".places/<class 'hy.models.HyString'>/17f36dc3403a328572adcea3fd631f55.pickle"
(print (place-of '"I am Satoshi Nakamoto."))
In Lisp, the ' tag means "do not evaluate the following expression". Note how we did not compute the place of the string's value directly. We computed the place of the source code which defines the string. We can replace our function with a macro so the user does not have to quote his or her code.
(import [hashlib [md5]] os)
(setv +place-dir+ ".places/")
(defmacro place-of [expression]
  "Returns the place of an expression"
  ;; the macro receives the unevaluated expression, so the caller need not quote it
  `(os.path.join
     +place-dir+
     (str (type '~expression))
     (+ (.hexdigest (md5 (.encode (str '~expression))))
        ".pickle")))
;; prints ".places/<class 'hy.models.HyString'>/17f36dc3403a328572adcea3fd631f55.pickle"
(print (place-of "I am Satoshi Nakamoto."))
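For readers who do not use Hy, the idea of hashing the source rather than the value can be sketched in plain Python. This is only an illustration under assumptions: `place_path` and `PLACE_DIR` are made-up names, the source is passed as a string because Python has no quote operator, and the per-type subdirectory is left out.

```python
import hashlib
import os

PLACE_DIR = ".places"  # hypothetical name, mirroring +place-dir+


def place_path(source: str) -> str:
    """Return the path that a piece of source text maps to (nothing is written)."""
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    return os.path.join(PLACE_DIR, digest + ".pickle")


# Two expressions with the same value but different source text
# map to different places; identical source maps to the same place.
a = place_path("1 + 1")
b = place_path("2")
print(a != b, a == place_path("1 + 1"))  # True True
```

The consequence is that the cache key depends on how the code is written, not on what it evaluates to.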
Whenever a function returns a place, it implicitly guarantees that the place is populated. The place-of macro is not allowed to just compute where a place would be if it existed. The macro must also save our data to the place if the place is not already populated.
(import pickle)
(defmacro/g! place-of [expression]
  "Returns the place of an expression"
  `(do
     (setv ~g!place
           (os.path.join
             +place-dir+
             (str (type '~expression))
             (+ (.hexdigest (md5 (.encode (str '~expression))))
                ".pickle")))
     ;; populate the place only if nothing is stored there yet
     (if-not (os.path.exists ~g!place)
       (do
         ;; create the type subdirectory on first use
         (os.makedirs (os.path.dirname ~g!place) :exist-ok True)
         (with [f (open ~g!place "wb")]
           (pickle.dump (eval '~expression) f))))
     ~g!place))
;; prints ".places/<class 'hy.models.HyString'>/17f36dc3403a328572adcea3fd631f55.pickle"
(print (place-of "I am Satoshi Nakamoto."))
Reading from a place is easier.
(defn value-of [place]
  (with [f (open place "rb")]
    (pickle.load f)))
;; prints "I am Satoshi Nakamoto."
(print (value-of (place-of "I am Satoshi Nakamoto.")))
This constitutes a persistent memoization system where code is evaluated no more than once.
(import [time [sleep]])
(print (value-of (place-of (do (sleep 5) "This computation takes 5 seconds"))))
The first time you call the above code it will take 5 seconds to execute. On all subsequent runs the code will return instantly.
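The "evaluated no more than once" behavior can be checked without waiting 5 seconds. Below is a rough Python sketch of the same populate-then-read cycle, assuming source is passed as a string and evaluated with `eval`. The names `place_of`, `value_of`, `slow_square`, and `env` are illustrative stand-ins, and a temporary directory replaces `.places/` so nothing is left on disk.

```python
import hashlib
import os
import pickle
import tempfile


def place_of(source, place_dir, env):
    """Return the place of `source`, evaluating it only if the place is empty."""
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    place = os.path.join(place_dir, digest + ".pickle")
    if not os.path.exists(place):
        os.makedirs(place_dir, exist_ok=True)
        with open(place, "wb") as f:
            pickle.dump(eval(source, env), f)  # only eval source you trust
    return place


def value_of(place):
    """Load the value stored at a place."""
    with open(place, "rb") as f:
        return pickle.load(f)


calls = []  # records every real evaluation


def slow_square(n):
    calls.append(n)
    return n * n


env = {"slow_square": slow_square}
with tempfile.TemporaryDirectory() as place_dir:
    first = value_of(place_of("slow_square(12)", place_dir, env))
    second = value_of(place_of("slow_square(12)", place_dir, env))

print(first, second, len(calls))  # 144 144 1
```

The second lookup finds the pickle on disk, so `slow_square` runs only once even though it is requested twice.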
I think this is fixable. An invocation
(f expr1 expr2)
will produce the same result as the last time you invoked it if:
- f is the same as last time, and so is every macro and type definition that is used transitively. Basically any code that it depends on in any way.
- expr1 and expr2 also obey this checklist.
I'm not sure this list is exhaustive, but it should be doable in principle. If I look at a function invocation and all the code it transitively depends on (say it's 50% of the codebase), and I know that that 50% of the codebase hasn't changed since the last time you ran the program, and I see that that 50% of the codebase is pure, and I trust you that the other 50% of the codebase doesn't muck with it (as it very well could with e.g. macros), then that function invocation should produce the same result as last time.
This is tricky enough that it might need language-level support to be practical. I'm glad that lsusr is thinking of it as "writing a compiler".
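One part of the checklist above, "has `f` or anything it depends on changed?", can be roughly sketched in Python by fingerprinting a function together with the functions it references. This is an assumption-heavy illustration, not a complete solution: `fingerprint` is a hypothetical helper, it only follows functions reachable through global names, it does not check purity, macros, or types, and bytecode hashes are only comparable within one Python version.

```python
import hashlib
import types


def fingerprint(func, seen=None):
    """Hash func's bytecode, names, and constants, plus those of every
    global function it references by name, transitively."""
    seen = set() if seen is None else seen
    if id(func) in seen:  # already covered (also guards against recursion)
        return ""
    seen.add(id(func))
    code = func.__code__
    h = hashlib.md5()
    h.update(code.co_code)
    h.update(repr(code.co_names).encode("utf-8"))
    # Skip nested code objects: their repr contains memory addresses.
    consts = [c for c in code.co_consts if not isinstance(c, types.CodeType)]
    h.update(repr(consts).encode("utf-8"))
    for name in code.co_names:
        dep = func.__globals__.get(name)
        if isinstance(dep, types.FunctionType):
            h.update(fingerprint(dep, seen).encode("utf-8"))
    return h.hexdigest()


def helper(x):
    return x + 1


def f(x):
    return helper(x) * 2


before = fingerprint(f)


def helper(x):  # simulate editing a dependency of f
    return x + 2


after = fingerprint(f)
print(before != after)  # True
```

Mixing such a fingerprint into the hash that names a place would make an edit to `helper` invalidate the cached results of `f`, even though the source of `f` itself is unchanged.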