Nice! I always enjoy reading these logs :-)
Python objects are scattered all over the place [on the heap] ... performance degradation is the price for Python's simple memory model. ... NumPy is optimized for making use of blocks of contiguous memory.
NumPy also has the enormous advantage of implementing all the numeric operators in C (or Fortran, or occasionally assembly). (If you want hardware accelerators, interop is a promising work in progress.)
You can substantially reduce memory fragmentation and GC pressure with only the standard library `array` module and `memoryview` builtin type, if your data suits that pattern. This is particularly useful to implement zero-copy algorithms for IO processing; as soon as the buffer is in memory anywhere you just take pointers to slices rather than creating new objects.
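A minimal sketch of that zero-copy pattern, using only the standard library. The buffer contents and the names `raw`, `payload`, `samples`, and `window` are made up for illustration:

```python
from array import array

# Pretend this buffer was filled by a single read from a file or socket.
raw = bytearray(b"HEADERpayload")

# Slicing a memoryview yields another view onto the same bytes, not a copy.
payload = memoryview(raw)[6:]
payload[0:1] = b"P"              # writing through the view changes `raw`
print(bytes(raw))                # b'HEADERPayload'

# The same idea works for typed numeric data stored in an array.
samples = array("d", [0.5, 1.5, 2.5, 3.5])   # packed C doubles
window = memoryview(samples)[1:3]            # view of elements 1 and 2
window[0] = 9.0                              # updates samples[1] in place
print(samples[1])                # 9.0
```

Whether this helps in practice depends on the data fitting a flat, single-type layout, as the comment notes.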
JIT implementations of Python (PyPy, Pyjion, etc) are also usually pretty good at reducing the perf impact of Python's memory model, at least if your program is reasonably sensible about what and when it allocates.
With `progn` and `:=`, it's possible to combine multiple statements into one, so effectively create a lambda with multiple statements.
Sounds like you're partway to updating onelinerizer.com for Python 3!
For the avoidance of doubt, the "obvious way" to do this (for an acculturated Python programmer) is with a nested `def`, which makes the `progn` thing non-obvious and therefore unpythonic. I strongly hinted at the obvious approach here, but konstell latched onto using a `lambda` instead (probably because she didn't realize that named functions could also be closures). I saw a teaching opportunity in this, so I rolled with it. I got to dispel the myth that lambdas can only have one line and also introduced assignment expressions. I was going to get around to the obvious way, but we ran out of time.
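For comparison, here is one way the nested-`def` approach could look. This is a sketch, not the version from the session: the message format, the use of `functools.wraps`, and the sample function `add` are all assumptions.

```python
import functools


def trace(func):
    """Print the arguments and return value of each call to `func`."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"{func.__name__} called with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result!r}")
        return result  # same value the undecorated function would return

    # `wrapper` is a closure: it remembers `func` from the enclosing call.
    return wrapper


@trace
def add(a, b):
    return a + b


total = add(1, 2)
```

Because `wrapper` is an ordinary function body, it can hold as many statements as needed, which is why no `progn` helper is required here.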
With progn and :=, it's possible to combine multiple statements into one, so effectively create a lambda with multiple statements.
konstell is using the terminology a little imprecisely here. In Python, an "expression" evaluates to an object, while a "statement" is an instruction that does not evaluate to an object (not even `None`). Most statement types can contain expressions; however, expressions cannot contain statements (`exec()` doesn't count).
One of the simplest types of statements in Python is the "expression statement", which contains a single expression and discards its result. A `progn()` expression can discard the results of subexpressions in a similar way, making them act like expression statements, but they are not technically Python statements. We also found an expression substitute for an assignment statement. It's ultimately possible to use expressions for everything you'd normally use statements for, but this is not the "obvious way" to do it.
See my Drython and Hissp projects for more on "onelinerizing" Python.
[Note to readers: The Jupyter notebook version of this post is here]
Previously: https://www.lesswrong.com/posts/fKTqwbGAwPNm6fyEH/an-apprentice-experiment-in-python-programming-part-3
Python Objects in Memory (from comments)
In the previous post, purge commented:
When I was coming up with an answer to this question, I got stuck on what the operator `is` did. I only had a vague sense of how to use it (I knew comparison with `None` was done via `is` but didn't know why), so I had to look up what `is` actually did.
Here's the doc for `id()`:
Then I understood that `is` would literally check if two objects are the same object. So in the above example we'd get `True False False` from `print(a == b, id(a) == id(b), a is b)` and `True True True` from `print(c == d, id(c) == id(d), c is d)`.
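The example from purge's comment is not reproduced here, so the snippet below is a stand-in. It assumes that `a` and `b` are two separately built lists with equal contents and that `c` and `d` are two names for one list.

```python
# Assumed setup: equal-but-distinct objects versus two names for one object.
a = [1, 2, 3]
b = [1, 2, 3]      # same contents as `a`, but a separate list object
c = [1, 2, 3]
d = c              # `d` is just another name for the object `c` refers to

first = (a == b, id(a) == id(b), a is b)
second = (c == d, id(c) == id(d), c is d)

print(*first)      # True False False
print(*second)     # True True True
```

In both rows the `id()` comparison and the `is` comparison agree, which is the point: `is` compares identity, while `==` compares value.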
Object Storage in Memory
Speaking of checking whether two objects are the same object stored in the same location in memory, gilch made more comments about object storage models (paraphrased):
Compared to C/C++, Python has a more consistent object storage model: everything is an object, only references to objects are stored on the stack, pointing to the actual objects stored in the heap. This means that Python objects are scattered all over the place. One important aspect of CPU optimization is caching contiguous blocks of memory in CPU caches, but Python's model causes the cache-miss rate to be high, since two objects adjacent to each other in memory are likely unrelated. This performance degradation is the price for Python's simple memory model.
For computing tasks with high performance requirements, NumPy is optimized for making use of blocks of contiguous memory.
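A small standard-library illustration of the difference between "a container of references" and "a block of raw values". It uses the `array` module in place of NumPy so that it runs without third-party packages; the variable names are made up for the example.

```python
from array import array

# A list stores references; the objects they point to live elsewhere on the heap.
row = [1.0, 2.0, 3.0]
grid = [row, row]            # two slots, both referring to one list object
grid[0].append(4.0)          # mutating through one reference...
shared_view = grid[1]        # ...is visible through the other
print(grid[0] is grid[1])    # True

# An array stores the raw C doubles back to back in a single buffer.
packed = array("d", [1.0, 2.0, 3.0])
buffer_bytes = len(packed.tobytes())
print(buffer_bytes == packed.itemsize * len(packed))   # True
```

NumPy arrays follow the second layout, which is what lets numeric code walk memory sequentially.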
`==` and `is`
Some remarks gilch made about `==` and `is`:
The `==` operator calls the `__eq__` method of an object. The default `__eq__`, inherited from `object`, uses `is`: it checks whether the two objects are the same object. (Source?) We can have two instances of a number, but not two instances of `True` or `False`.
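A short sketch of those remarks. The classes `Plain` and `Valued` are hypothetical and exist only to contrast the inherited `__eq__` with an overridden one.

```python
class Plain:
    """No __eq__ defined, so == falls back to the identity check from object."""

    def __init__(self, value):
        self.value = value


class Valued:
    """Overrides __eq__ so that == compares the stored values instead."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Valued):
            return NotImplemented
        return self.value == other.value


p1, p2 = Plain(1), Plain(1)
v1, v2 = Valued(1), Valued(1)

plain_equal = p1 == p2       # False: different objects, default __eq__
plain_self = p1 == p1        # True: same object
valued_equal = v1 == v2      # True: custom __eq__ compares values
valued_same = v1 is v2       # False: still two distinct objects

# True and False are singletons: every true bool is the same object.
bool_is_singleton = (bool(1) is True) and ((2 > 1) is True)
```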
`{}` and Set Constructor
We went into a tangent where gilch checked my understanding of sets. We encountered some corner cases like Python interpreting `True` as `1` and `False` as `0`:
`{}` is used to represent both sets and dictionaries, but `{}` itself would be interpreted as an empty dictionary instead of an empty set:
To make an empty set, we'd use the `set()` constructor:
Gilch gave me a puzzle: make an empty set without using the `set()` constructor.
I came up with the answer `{1} - {1}` pretty quickly, but gilch had another solution in mind that did not involve using any numbers or letters. Hint: passing in iterables to a constructor results in different values than passing in the same iterables in expressions:
Using splat, the other way to make an empty set without using the `set()` constructor is
Magic Methods for Attributes (Continued from last time)
When I was working on the solution that involved modifying the `__dict__` last time, I was getting pretty confused about the difference between `dir()`, `vars()` and `__dict__`.
Gilch started by asking me to construct a simple class and make an instance:
Then we listed out the attributes of `sc` in different ways:
The difference between `dir` and `vars` is that `dir` returns all attributes of an object, including the attributes of its class and attributes inherited from its superclasses; on the other hand, `vars` only returns attributes stored in the default `__dict__` attribute, which excludes inherited attributes. This StackOverflow question goes into more details.
`__mro__`
`__mro__` stands for "method resolution order," which provides the inheritance path from the current class all the way up to `object`. It is honestly the most handy tool I've learned from this session.
Note that `__mro__` is a class attribute, not an instance attribute:
Magic Methods for Attributes
Now we can verify that `dir(sc)` returns the sum of `vars(sc)`, `vars(SimpleClass)` and `vars(object)`:
Why did we need to convert the two lists to sets when comparing them at the end?
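The original listing is not shown here, so the following is a reconstruction under assumptions. The body of `SimpleClass` (a docstring plus an `__init__` that sets one attribute) is hypothetical, and the check relies on how CPython's default `dir()` gathers names.

```python
class SimpleClass:
    """A hypothetical stand-in for the class used in the session."""

    def __init__(self):
        self.x = 1


sc = SimpleClass()

# vars() only sees what is stored in each object's own __dict__.
instance_names = list(vars(sc))              # ['x']
class_names = list(vars(SimpleClass))
object_names = list(vars(object))

# Concatenating the three lists can repeat a name and is not sorted.
combined = instance_names + class_names + object_names
duplicates = {name for name in combined if combined.count(name) > 1}

# Comparing as sets ignores both the ordering and the repeated names.
matches = set(dir(sc)) == set(combined)
print(matches)               # True
print(sorted(duplicates))
```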
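Two of the attributes, `__init__` and `__doc__`, were overridden, so they appear in both `vars(SimpleClass)` and `vars(object)`. The concatenated list therefore contains duplicates that `dir(sc)` does not.
Inheritance and `__mro__`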
Noticing that I didn't understand inheritance completely, gilch gave another example.
Here, `SimpleClass3` inherits from `SimpleClass` and `SimpleClass2`. Both `SimpleClass` and `SimpleClass2` have implemented class method `x`; which one would `SimpleClass3` have?
However, this changes when we switch the order of inheritance:
So the inheritance order decides which superclass takes precedence. The Python documentation on method resolution order, as well as this talk, give more detailed explanations of the algorithm.
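A sketch of the situation described above. The class names follow the post, but the method bodies are invented, `x` is written as a plain instance method for simplicity, and `SimpleClass3Swapped` is a hypothetical name for the version with the bases reversed.

```python
class SimpleClass:
    def x(self):
        return "SimpleClass.x"


class SimpleClass2:
    def x(self):
        return "SimpleClass2.x"


class SimpleClass3(SimpleClass, SimpleClass2):
    pass


class SimpleClass3Swapped(SimpleClass2, SimpleClass):
    pass


# The first base listed wins when both define the same name.
print(SimpleClass3().x())            # SimpleClass.x
print(SimpleClass3Swapped().x())     # SimpleClass2.x

# __mro__ spells out the lookup order, ending at object.
mro_names = [cls.__name__ for cls in SimpleClass3.__mro__]
print(mro_names)   # ['SimpleClass3', 'SimpleClass', 'SimpleClass2', 'object']

# __mro__ lives on the class, not on instances.
instance_has_mro = hasattr(SimpleClass3(), "__mro__")   # False
```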
`__slots__`
`__slots__` is used for saving memory.
What happened here is that by defining `__slots__` we have removed the `__dict__` attribute from every instance of `SimpleClass4`. Not having a per-instance `__dict__` means less memory used.
As we can see here, `sc4` does not have a `__dict__` attribute, so `vars(sc4)` has become invalid too.
Accessing Attributes of a Superclass
Next, gilch provided an example of using the built-in `super`. First, we create a class `NewTuple` that inherits from `tuple`:
Then we can access the constructor of the superclass by calling `super().__new__` and passing in the `tuple` class as the first argument:
We get a `tuple` object when we call `NewTuple()`. However, this only works for subtypes of the superclass of the current class. If we pass in `list`, which is not a subclass of `tuple`, we would get an error:
Of course, we can always pass in the current class to make the constructor return an instance of the current class:
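The three cases above might look roughly like this. Only `NewTuple` is named in the post. `ListTuple` and `SelfTuple` are hypothetical names used to keep the variants side by side, and the `*items` signature is an assumption.

```python
class NewTuple(tuple):
    def __new__(cls, *items):
        # Passing `tuple` asks tuple.__new__ to build a plain tuple.
        return super().__new__(tuple, items)


class ListTuple(tuple):
    def __new__(cls, *items):
        # `list` is not a subtype of `tuple`, so tuple.__new__ refuses it.
        return super().__new__(list, items)


class SelfTuple(tuple):
    def __new__(cls, *items):
        # Passing `cls` returns an instance of the current class.
        return super().__new__(cls, items)


plain = NewTuple(1, 2)
print(type(plain).__name__)          # tuple

try:
    ListTuple(1, 2)
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None
print(error_message)

own = SelfTuple(1, 2)
print(type(own).__name__)            # SelfTuple
```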
Trace
Next puzzle from gilch: make a `@trace` decorator that prints inputs and return values.
I came up with a first pass solution:
Then gilch added a condition: the decorated function still needs to return the same value as the undecorated version.
I was pretty stumped on this one. It seemed that I'd need two different statements in the lambda function returned by the decorator for this to work, one to do the printing and the other one to return the value. So gilch gave me a hint: Think about what the expression `print('hi') or 1 + 2` evaluates to. Then it occurred to me that, since `print` returns `None`, I could use `or` to combine statements as long as only one of them evaluates to something with boolean value `True`. After an attempt, I also realized that the statement that produces the `True` value would need to come last to prevent the expression evaluation being short-circuited.
Progn
Gilch asked me to write a function named `progn` that takes any number of parameters and only returns the last one. Using `progn`, we can get rid of the `or`'s:
Assignment Expression
Gilch introduced the assignment expression, and we rewrote the solution to use it:
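The session's own code is not reproduced here. The following sketch shows one way such a rewrite could look, assuming Python 3.8 or later for `:=`. The message format and the sample function `add` are made up for illustration.

```python
def progn(*args):
    """Evaluate every argument (left to right) and return only the last one."""
    return args[-1]


def trace(func):
    # Each argument to progn plays the role of one "statement".
    return lambda *args, **kwargs: progn(
        print(f"{func.__name__} called with {args} {kwargs}"),
        (result := func(*args, **kwargs)),   # assignment as an expression
        print(f"{func.__name__} returned {result!r}"),
        result,                              # progn returns this last value
    )


@trace
def add(a, b):
    return a + b


total = add(1, 2)
```

Because `result` is returned as the final argument, the decorated function hands back the same value as the undecorated one, even when that value is falsy.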
Earlier I was stumped because I wanted to put two statements inside a lambda function but couldn't. With `progn` and `:=`, it's possible to combine multiple statements into one, so effectively create a lambda with multiple statements.