All of SatvikBeri's Comments + Replies

It's really useful to ask the simple question "what tests could have caught the most costly bugs we've had?"

At one job, our code had a lot of math, and the worst bugs were when our data pipelines ran without crashing but gave the wrong numbers, sometimes due to weird stuff like "a bug in our vendor's code caused them to send us numbers denominated in pounds instead of dollars". This is pretty hard to catch with unit tests, but we ended up applying a layer of statistical checks that ran every hour or so and raised an alert if something was anomalous, and those alerts probably saved us more money than all other tests combined.
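A minimal sketch of the kind of check I mean (hypothetical names and thresholds; the actual checks were richer than this, but the core idea is just "does this run's number look like recent runs"):

import statistics

def metric_is_anomalous(history, latest, max_sigma=4.0):
    # history: values of the same metric from previous pipeline runs
    # latest: the value produced by the current run
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    if std == 0:
        return latest != mean  # constant history: any change at all is suspicious
    return abs(latest - mean) / std > max_sigma

# A vendor silently switching from dollars to pounds shows up as a sudden level shift
if metric_is_anomalous([105.2, 98.7, 101.3, 99.8, 102.5], 81.0):
    print("ALERT: revenue metric looks anomalous, check upstream data")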

There was a serious bug in this post that invalidated the results, so I took it down for a while. The bug has now been fixed and the posted results should be correct.

One sort-of counterexample would be The Unreasonable Effectiveness of Mathematics in the Natural Sciences, where a lot of Math has been surprisingly accurate even when the assumptions were violated.

The Mathematical Theory of Communication by Shannon and Weaver. It's an extended version of Shannon's original paper that established Information Theory, with some extra explanations and background. 144 pages.

Atiyah & Macdonald's Introduction to Commutative Algebra fits. It's 125 pages long, and it's possible to do all the exercises in 2-3 weeks – I did them over winter break in preparation for a course.

Lang's Algebra and Eisenbud's Commutative Algebra are both supersets of Atiyah & Macdonald. I've studied each of those as well and thought A&M was significantly better.

Unfortunately, I think it isn't very compatible with the way management works at most companies. Normally there's pressure to get your tickets done quickly, which leaves less time for "refactor as you go".

I've heard this a lot, but I've worked at 8 companies so far, and none of them have had this kind of time pressure. Is there a specific industry or location where this is more common?

4Adam Zerner
Interesting. My impression is that it's pretty widespread across industries and locations. It's been the case for me in all four companies I've worked at: two were startups, two were mid-sized, and each was in a different state.

A big piece is that companies are extremely siloed by default. It's pretty easy for a team to improve things in their silo, significantly harder to improve something that requires two teams, and nearly impossible to reach beyond that.

Uber is particularly siloed: they have a huge number of microservices with small teams, at least according to their engineering talks on YouTube. Address validation is probably a separate service from anything related to maps, which in turn is separate from contacts.

Because of silos, companies have to make an extra... (read more)

2ChristianKl
It might very well be that there's no team responsible for IBAN validation at Amazon, as the team that's responsible for payment information might be an Amazon.com team that doesn't care about IBANs, which are only important for Amazon in Europe.

Cooking: 

  • Smelling ingredients & food is a good way to develop intuition about how things will taste when combined
  • Salt early is generally much better than salt late

Data Science:

  • Interactive environments like Jupyter notebooks are a huge productivity win, even with their disadvantages
  • Automatic code reloading makes Jupyter much more productive (e.g. autoreload for Python, or Revise for Julia)
  • Bootstrapping gives you fast, accurate statistics in a lot of areas without needing to be too precise about theory
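A minimal sketch of the bootstrapping point in Python (illustrative only, using the median as the example statistic): resample the data with replacement many times and read a confidence interval off the spread of the recomputed statistic.

import random
import statistics

def bootstrap_ci(data, stat=statistics.median, n_resamples=10_000, alpha=0.05):
    # Resample with replacement and recompute the statistic each time
    estimates = sorted(
        stat(random.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = estimates[int(alpha / 2 * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples)]
    return lower, upper

data = [2.1, 3.5, 2.9, 4.2, 3.1, 2.8, 5.0, 3.3]
print(bootstrap_ci(data))  # ~95% interval for the median, no distributional assumptions needed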

Programming:

  • Do everything in a virtual environment
... (read more)
1MarcelloV
These are nice; for the friends recommendation one, just be cautious of offering unsolicited advice and other-optimizing.
  • using vector syntax is much faster than loops in Python

To generalize this slightly, using Python to call C/C++ is generally much faster than pure Python. For example, built-in operations in Pandas tend to be pretty fast, while using .apply() is usually pretty slow.
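A small illustration of the kind of gap I mean (timings vary by machine and data size; this is just a sketch):

import numpy as np
import pandas as pd

df = pd.DataFrame({"height": np.random.rand(100_000) + 1.5,
                   "weight": np.random.rand(100_000) * 50 + 50})

# Vectorized: dispatches to compiled C loops inside numpy/pandas
bmi_fast = df["weight"] / df["height"] ** 2

# Row-wise .apply(): calls a Python function once per row, typically orders of magnitude slower
bmi_slow = df.apply(lambda row: row["weight"] / row["height"] ** 2, axis=1)

The second version pays Python-level function-call overhead on every row, which is exactly the pure-Python-vs-C gap described above.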

1Rudi C
Just use Julia ;)

I didn't know about that, thanks!

I found Loop Hero much better at higher speed, which you can set by modifying a variables.ini file: https://www.pcinvasion.com/loop-hero-speed-mod/

2ChristianKl
The example looks to me like it gives the library one equation that has to be minimized. I, on the other hand, have a bunch of equations.

The general lesson is that "magic" interfaces which try to 'do what I mean' are nice to work with at the top-level, but it's a lot easier to reason about composing primitives if they're all super-strict.

100% agree. In general I usually aim to have a thin boundary layer that does validation and converts everything to nice types/data structures, and then a much stricter core of inner functionality. Part of the reason I chose to write about this example is that it's very different from what I normally do.
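A sketch of the shape I mean (hypothetical example, not the code from the post): the boundary parses and validates raw input into typed structures once, and the core assumes it only ever sees valid data.

from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    quantity: int
    unit_price_cents: int

def parse_order(raw: dict) -> Order:
    # Boundary layer: validate and convert once, at the edge
    quantity = int(raw["quantity"])
    unit_price_cents = int(raw["unit_price_cents"])
    if quantity <= 0 or unit_price_cents < 0:
        raise ValueError(f"invalid order: {raw!r}")
    return Order(quantity, unit_price_cents)

def order_total_cents(order: Order) -> int:
    # Strict core: no validation or coercion, just the actual logic
    return order.quantity * order.unit_price_cents

print(order_total_cents(parse_order({"quantity": "3", "unit_price_cents": 250})))  # 750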

Important caveat for the pass-through approach

... (read more)

This is a perfect example of the AWS Batch API 'leaking' into your code. The whole point of a compute resource pool is that you don't have to think about how many jobs you create.
 

This is true. We're using AWS Batch because it's the best tool we could find for other jobs that actually do need hundreds/thousands of spot instances, and this particular job goes in the middle of those. If most of our jobs looked like this one, using Batch wouldn't make sense.

You get language-level validation either way. The assert statements are superfluous in that sense.

... (read more)

The reason to be explicit is to be able to handle control flow.

The datasets aren't dependent on each other, though some of them use the same input parameters.

If your jobs are independent, then they should be scheduled as such. This allows jobs to run in parallel.

Sure, there's some benefit to breaking down jobs even further. There's also overhead to spinning up workers. Each of these functions takes ~30s to run, so it ends up being more efficient to put them in one job instead of multiple.

Your errors would come out just as fast if you ran check_dataset_params

... (read more)
4philh
This has its own problems, but you could use inspect.signature, I think?
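A rough sketch of that approach (hypothetical function and parameter names), catching mismatched parameters up front before any job is launched:

import inspect

def build_report(start_date, end_date, region="us"):
    ...

def check_params(func, params):
    # Raises TypeError immediately if params don't match the function's signature
    inspect.signature(func).bind(**params)

check_params(build_report, {"start_date": "2021-01-01", "end_date": "2021-02-01"})  # fine

try:
    check_params(build_report, {"start_date": "2021-01-01"})
except TypeError as e:
    print(e)  # missing a required argument: 'end_date'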
3Zolmeister
This is a perfect example of the AWS Batch API 'leaking' into your code. The whole point of a compute resource pool is that you don't have to think about how many jobs you create. It sounds like you're using the wrong tool for the job (or a misconfiguration - e.g. limit the batch template to 1 vcpu).

You get language-level validation either way. The assert statements are superfluous in that sense. What they do add is in effect check_dataset_params(), whose logic probably doesn't belong in this file.

No, I meant a developer introducing a runtime bug.

"refine definite theories"

Where does this quote come from – is it in the book?

6[anonymous]

Is there a reason you recommend Hy instead of Clojure? I would suggest Clojure to most people interested in Lisp these days, due to the overwhelmingly larger community, ecosystem, & existence of ClojureScript.

6lsusr
I recommend Hy because it's what I personally use and I can therefore vouch for it. I have heard nothing but good things about Clojure. I even attend a Clojure user group. The Clojure programmers I meet tend to be smart which is a good sign.

Ah, that's a great example, thanks for spelling it out.

This is sometimes true in functional programming, but only if you're careful.

I think this overstates the difficulty; referential transparency is the norm in functional programming, not something unusual.

For example, suppose the expression is a function call, and you change the function's definition and restart your program. When that happens, you need to delete the out-of-date entries from the cache or your program will read an out-of-date answer.

As I understand, this system is mostly useful if you're using it for almost every function. In that case... (read more)

6justinpombrio
It really depends on what domain you're working in. If you're memoizing functions, you're not allowed to use the following things (or rather, you can only use them in functions that are not transitively called by memoized functions):

  • Global mutable state (to no-one's surprise)
  • A database, which is global mutable state
  • IO, including reading user input, fetching something non-static from the web, or logging
  • Networking with another service that has state
  • Getting the current date

Ask a programmer to obey this list of restrictions, and -- depending on the domain they're working in -- they'll either say "ok" or "wait what that's most of what my code does".

That's very clever! I don't think it's sufficient, though. For example, say you have this code:

(defnp add1 [x] (+ x 10)) ; oops typo
(defnp add2 [x] (add1 (add1 x)))
(add2 100)

You run it once and get this cache:

(add1 100) = 110
(add1 (add1 100)) = 120
(add2 100) = 120

You fix the first function:

(defnp add1 [x] (+ x 1)) ; fixed
(defnp add2 [x] (add1 (add1 x)))
(add2 100)

You run it again, which invokes (add2 100), which is found in the cache to be 120. The add2 cache entry is not invalidated because the add2 function has not changed, nor have its inputs. The add1 cache entries would be invalidated if anything ever invoked add1, but nothing does. (This is what I meant by "You also have to look at the functions it calls (and the functions those call, etc.)" in my other comment.)

This is very cool. The focus on caching a code block instead of just the inputs to the function makes it significantly more stable, since your cache will be automatically invalidated if you change the code in any way.
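A minimal sketch of that invalidation idea (illustrative Python, not the library under discussion): key the cache on a hash of the function's source as well as its arguments, so editing the function orphans its old entries.

import hashlib
import inspect

_cache = {}  # in practice this would be persisted to disk; a plain dict keeps the sketch short

def code_aware_memo(func):
    # Include a hash of the function's source in the cache key,
    # so changing the code automatically invalidates old entries
    src_hash = hashlib.sha256(inspect.getsource(func).encode()).hexdigest()

    def wrapper(*args):
        key = (func.__name__, src_hash, args)
        if key not in _cache:
            _cache[key] = func(*args)
        return _cache[key]
    return wrapper

@code_aware_memo
def slow_square(x):
    return x * x

print(slow_square(4))  # computed
print(slow_square(4))  # served from the cache; edit slow_square and the key changes

As justinpombrio points out below, hashing only the function's own source still misses changes in the functions it calls, so this alone isn't sufficient.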

3justinpombrio
More stable, but not significantly so. You cannot tell what an expression does just by looking at the expression. You also have to look at the functions it calls (and the functions those call, etc.). If any of those change, then the expression may change as well. You also need to look at local variables, as skybrain points out. For example, this function:

(defn myfunc [x]
  (value-of (place-of [EXPR INVOLVING x])))

will behave badly: the first time you call it it will compute the answer for the value of x you give it. The second time you call it, it will compute the same answer, regardless of what x you give it.

If you're using non-modal editing, in that example you could press Alt+rightarrow three times, use cmd+f, the end key (and back one word), or cmd+rightarrow (and back one word). That's not even counting shortcuts specific to another IDE or editor. Why, in your mental model, does the non-modal version feel like fewer choices? I suspect it's just familiarity – you've settled on some options you use the most, rather than trying to calculate the optimum fewest keystrokes each time.

Have you ever seen an experienced vim user? 3-5 seconds latency is completel... (read more)

I ended up using cmd+shift+i which opens the find/replace panel with the default set to backwards.

So, one of the arguments you've made at several points is that we should expect Vim to be slower because it has more choices. This seems incorrect to me: even a simple editor like Sublime Text has about a thousand keyboard shortcuts, which are mostly ad-hoc and need to be memorized separately. In contrast, Vim has a small, (mostly) composable language. I just counted lsusr's post, and it has fewer than 30 distinct components – most of the text is showing different ways to combine them.

The other thing to consider is that most programmers will use at lea... (read more)

2ChristianKl
Let's think about an example. I want to move my cursor. I might be in a situation where 3W; lllllllllllllllllllllllllllllllll; / with something else; and $b are all valid moves to get to my target location for the cursor. This has probably something like 3-5 seconds latency because I not only have to think about where my cursor should go but also about the way to get there. On the other hand, without VIM, having a proper keyboard that makes arrow keys easy to reach, I might have a latency of maybe 700 milliseconds. VIM frequently costs mental processing capacity because I have to model my code in my head in concepts like words (for w and b) that I wouldn't otherwise.
3ChristianKl
The issue is not just more choices but more choices to achieve the same result. In programming languages, Python achieved a large user-base through being easy to use, with core principles like "there should be one obvious way to do things". The problem is that it's not dependable when you can use the Vim shortcuts within other editors. If I use IdeaVim in IntelliJ I can use "*y to copy a lot of things to the clipboard, but not, for example, the text in hover popups, for which I actually need Ctrl+c and where I lose the ability to copy the text when I let Vim overwrite the existing shortcut.

I did :Tutor on neovim and only did commands that actually involved editing text, it took 5:46.

Now trying in Sublime Text. Edit: 8:38 in Sublime, without vim mode – a big difference! It felt like it was mostly uniform, but one area where I was significantly slower was search and replace, because I couldn't figure out how to go backwards easily.

2John_Maxwell
Interesting, thanks for sharing. Command-shift-g right?

This is a great experiment, I'll try it out too. I also have pretty decent habits for non-vim editing so it'll be interesting to see.

6SatvikBeri
I did :Tutor on neovim and only did commands that actually involved editing text, it took 5:46. Now trying in Sublime Text. Edit: 8:38 in Sublime, without vim mode – a big difference! It felt like it was mostly uniform, but one area where I was significantly slower was search and replace, because I couldn't figure out how to go backwards easily.

Some IDEs are just very accommodating about this, e.g. PyCharm. So that's great.

Some of them aren't, like VS Code. For those, I just manually reconfigure the clashing key bindings. It's annoying, but it only takes ~15 minutes total.

5paragonal
Thanks for your answer. Part of the problem might have been that I wasn't that proficient with vim. When I reconfigured the clashing key bindings of the IDE I sometimes unknowingly overwrote a vim command which turned out to be useful later on. So I had to reconfigure numerous times which annoyed me so much that I abandoned the approach at the time.

I would expect using VIM to increase latency. While you are going to press fewer keys you are likely going to take slightly longer to press the keys as using any key is more complex.

This really isn't my experience. Once you've practiced something enough that it becomes a habit, the latency is significantly lower. Anecdotally, I've pretty consistently seen people who're used to vim accomplish text editing tasks much faster than people who aren't, unless the latter are experts in the keyboard shortcuts of another editor such as emacs.

There's the paradox of choice

... (read more)
2ChristianKl
How much experience do you have with measuring the latency of things, to know what takes 400ms and what takes 700ms? Even if the total time for the task is reduced, the latency for starting the task might still be higher.

As far as I know there's almost no measurement of productivity of developer tools. Without data, I think there are two main categories in which editor features, including keyboard shortcuts, can make you more productive:

  1. By making difficult tasks medium to easy
  2. By making ~10s tasks take ~1s

An example of the first would be automatically syncing your code to a remote development instance. An example of the second would be adding a comma to the end of several lines at once using a macro. IDEs tend to focus on 1, text editors tend to focus on 2.

In general, I... (read more)

2ChristianKl
I would expect using VIM to increase latency. While you are going to press fewer keys, you are likely going to take slightly longer to press the keys as using any key is more complex.

There's the paradox of choice, and having more choices to accomplish a task costs mental resources. Vim forces me to spend cognitive resources to choose between different alternatives of how to accomplish a task.

All the professional UX people seem to advocate making interfaces as simple as possible.

Very cool, thanks for writing this up. Hard-to-predict access in loops is an interesting case, and it makes sense that AoS would beat SoA there.

Yeah, SIMD is a significant point I forgot to mention.

It's a fair amount of work to switch between SoA and AoS in most cases, which makes benchmarking hard! StructArrays.jl makes this pretty doable in Julia, and Jonathan Blow talks about making it simple to switch between SoA and AoS in his programming language Jai. I would definitely like to see more languages making it easy to just try one and benchmark the results.

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.  Yet we should not pass up our opportunities in that critical 3%." – Donald Knuth

3Gunnar_Zarncke
That would be my preferred quote too.

Yup, these are all reasons to prefer column orientation over row orientation for analytics workloads. In my opinion data locality trumps everything, but compression and fast transmission are definitely very nice.

Until recently, numpy and pandas were row oriented, and this was a major bottleneck. A lot of pandas's strange API is apparently due to working around row orientation. See e.g. this article by Wes McKinney, creator of pandas: https://wesmckinney.com/blog/apache-arrow-pandas-internals/#:~:text=Arrow's%20C%2B%2B%20implementation%20provides%20essential,... (read more)

I see where that intuition comes from, and at first I thought that would be the case. But the machine is very good at iterating through pairs of arrays. Continuing the previous example:

function my_double_sum(data)
    sum_heights_and_weights = 0
    for row in data
        sum_heights_and_weights += row.weight + row.height
    end
    return sum_heights_and_weights
end
@btime(my_double_sum(heights_and_weights))
>   50.342 ms (1 allocation: 16 bytes)
function my_double_sum2(heights, weights)
    sum_heights_and_weights = 0
    for (height, weight) in zip
... (read more)
5gjm
Looks like it is marginally quicker in the first of those cases. Note that you're iterating over the objects linearly, which means that the processor's memory-access prediction hardware will have a nice easy job; and you're iterating over the whole thing without repeating any, which means that all the cache is buying you is efficient access to prefetched bits.

After defining

function mss(data::Vector{HeightAndWeight})
    s = 0
    for i=1:40000000
        j=((i*i)%40000000)+1
        @inbounds s += data[j].weight * data[j].height
    end
    return s
end

function mss2(heights::Vector{Int},weights::Vector{Int})
    s = 0
    for i=1:40000000
        j=((i*i)%40000000)+1
        @inbounds s += weights[j] * heights[j]
    end
    return s
end

(mss for "my scattered sum"; the explicit type declarations, literal size and @inbounds made it run faster on my machine, and getting rid of overhead seems like a good idea for such comparisons; the squaring is just a simple way to get something not too predictable) I got the following timings:

julia> @btime(mss(heights_and_weights))
  814.056 ms (0 allocations: 0 bytes)
400185517392

julia> @btime(mss2(just_heights,just_weights))
  1.253 s (0 allocations: 0 bytes)
400185517392

so the array-of-structs turns out to work quite a lot better in this case. (Because we're doing half the number of hard-to-predict memory accesses.)

Note how large those timings are, by the way. If I just process every row in order, I get 47.9ms for array-of-structs and 42.6ms for struct-of-arrays. 40.3ms if I use zip as you did instead of writing array accesses explicitly, which is interesting; I'm not that surprised the compiler can eliminate the superficial inefficiencies of the zip-loop, but I'm surprised it ends up strictly better rather than exactly the same.

Anyway, this is the other way around from your example: struct-of-arrays is faster for me in that situation. But when we process things in "random" order, it's 20-30x slower because we no

Of course it's easy! You just compare how much you've made, and how long you've stayed solvent, against the top 1% of traders. If you've already done just as well as the others, you'd be in the top 1%. Otherwise, you aren't.

This object-level example is actually harder than it appears: performance of a fund or trader in one time period generally has very low correlation to the next, e.g. see this paper: https://www.researchgate.net/profile/David-Smith-256/publication/317605916_Evaluating_Hedge_Fund_Performance/links/5942df6faca2722db499cbce/Evaluating-Hedge-Fu... (read more)

That's why I said "how long you've stayed solvent." Thanks for bringing in a more nuanced statement of the argument + source.

An incomplete list of caveats to Sharpe off the top of my head:

  • We can never measure the true Sharpe of a strategy (how it would theoretically perform on average over all time), only the observed Sharpe ratio, which can be radically different, especially for strategies with significant tail risk. There are a wide variety of strategies that might have a very high observed Sharpe over a few years, but much lower true Sharpe
  • Sharpe typically doesn't measure costs like infrastructure or salaries, just losses to the direct fund. So e.g. you could view workin
... (read more)
2Liron
Nice ones. The first is probably the one that most accounts for funds like Titan marketing themselves misleadingly (IMO), but the others are still important caveats of the definition and good to know.

This is very, very cool. Having come from the functional programming world, I frequently miss these features when doing machine learning in Python, and haven't been able to easily replicate them. I think there's a lot of easy optimization that could happen in day-to-day exploratory machine learning code that bog standard pandas/scikit-learn doesn't do.

3lsusr
This is encouraging to hear. When I talk about this stuff to ML engineers, some instantly get it, especially when they come from a functional programming background. Others don't and it feels like there's a wall between me and them. I think I can replicate a lot of this in Python, even if it's a little clunky. It's just easier to start in Hy and then write a wrapper to port it to Python.

If N95 masks work, R95-100 and P95-100 masks should also work, and potentially be more effective - the stuff they filter is a superset of what N95 filters. They're normally more expensive, but in the current state I've actually found P100s cheaper than N95s.

I don't really understand what you mean by "from first principles" here. Do you mean in a way that's intuitive to you? Or in a way that includes all the proofs?

Any field of Math is typically more general than any one intuition allows, so it's a little dangerous to think in terms of what it's "really" doing. I find the way most people learn best is by starting with a small number of concrete intuitions – e.g., groups of symmetries for group theory, or posets for category theory – and gradually expanding.

In the case of Complex Analysis, I find the intuition of the Riemann Sphere to be particularly useful, though I don't have a good book recommendation.

One major confounder is that caffeine is also a painkiller, many people have mild chronic pain, and I think there's a very plausible mechanism by which painkillers improve productivity, i.e. just allowing someone to focus better.

Anecdotally, I've noticed that "resetting" caffeine tolerance is very quick compared to most drugs, taking something like 2-3 days without caffeine for several people I know, including myself.

The studies I could find on caffeine are highly contradictory, e.g. from Wikipedia, "Caffeine has been shown to have... (read more)

One key dimension is decomposition – I would say any gears model provides decomposition, but models can have it without gears.

For example, the error in any machine learning model can be broken down into bias + variance, which provides a useful model for debugging. But these don't feel like gears in any meaningful sense, whereas, say, bootstrapping + weak learners feel like gears in understanding Random Forests.
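For reference, the decomposition I mean (for squared error; stated from memory, so check an ML textbook for the exact conditions):

\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{variance}} + \sigma^2

where \sigma^2 is the irreducible noise.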

I think it is true that gears-level models are systematically undervalued, and that part of the reason is because of the longer payoff curve.

A simple example is debugging code: a gears-level approach is to try and understand what the code is doing and why it doesn't do what you want, a black-box approach is to try changing things somewhat randomly. Most programmers I know will agree that the gears-level approach is almost always better, but that they at least sometimes end up doing the black-box approach when tired/frustrated/stuck.

And in companies t... (read more)

4Panashe Fundira
To drill in further, a great way to build a model of why a defect arises is using the scientific method. You generate some hypothesis about the behavior of your program (if X is true, then Y) and then test your hypothesis. If the results of your test invalidate the hypothesis, you've learned something about your code and where not to look. If your hypothesis is confirmed, you may be able to resolve your issue, or at least refine your hypothesis in the right direction.

Black-box approaches often fail to generalize within the domain, but generalize well across domains. Neural Nets may teach you less about medicine than a PGM, but they'll also get you good results in image recognition, transcription, etc.

This can lead to interesting principal-agent problems: an employee benefits more from learning something generalizable across businesses and industries, while employers will generally prefer the best domain-specific solution.

Nit: giving IQ tests is not super cheap, because it puts companies at a nebulous risk of being sued for disparate impact (see e.g. https://en.wikipedia.org/wiki/Griggs_v._Duke_Power_Co.).

I agree with all the major conclusions though.

6Vaniver
For a long time, this was my impression as well, but Caplan claims the evidence doesn't bear this out. And many organizations do use IQ testing successfully; the military is a prime example.

For the orthogonal decomposition, don't you need two scalars? E.g. x = ay + bz. For a fixed y and z, there are choices of x for which there's no way to write x as ay + z.

2Rafael Harth
Ow. Yes, you do. This wasn't a typo either, I remembered the result incorrectly. Thanks for pointing it out, and props for being attentive enough to catch it. Or to be more precise, you only need one scalar, but the scalar is for y not z, because z isn't given. The theorem says that, given x and y, there is a scalar a and a vector z such that x=ay+z and y is orthogonal to z.
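For reference, the construction behind that statement (a standard inner-product-space fact, sketched here):

a = \frac{\langle x, y \rangle}{\langle y, y \rangle}, \qquad z = x - a y, \qquad \langle y, z \rangle = \langle y, x \rangle - a \langle y, y \rangle = 0,

so x = ay + z with z orthogonal to y, using only the single scalar a.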

My favorite book, by far, is Functional Programming in Scala. This book has you derive most of the concepts from scratch, to the point where even complex abstractions feel like obvious consequences of things you've already built.

If you want something more Haskell-focused, a good choice is Programming in Haskell.

I didn't downvote, but I agree that this is a suboptimal meme – though the prevailing mindset of "almost nobody can learn Calculus" is much worse.

As a datapoint, it took me about two weeks of obsessive, 15 hour/day study to learn Calculus to a point where I tested out of the first two courses when I was 16. And I think it's fair to say I was unusually talented and unusually motivated. I would not expect the vast majority of people to be able to grok Calculus within a week, though obviously people on this site are not a representative sample.

Quite fair. I had read Zvi as speaking to typical LessWrong readership. Also, the standard you seem to be describing here is much higher than the standard Zvi was describing.

A good exposition of the related theorems is in Chapter 6 of Understanding Machine Learning (https://www.amazon.com/Understanding-Machine-Learning-Theory-Algorithms/dp/1107057132/ref=sr_1_1?crid=2MXVW7VOQH6FT&keywords=understanding+machine+learning+from+theory+to+algorithms&qid=1562085244&s=gateway&sprefix=understanding+machine+%2Caps%2C196&sr=8-1)

There are several related theorems. Roughly:

1. The error on real data will be similar to the error on the training set + epsilon, where epsilon is roughly proportional to (datapoints / VC dime... (read more)

Yes, roughly speaking, if you multiply the VC dimension by n, then you need n times as much training data to achieve the same performance. (More precise statement here: https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension#Uses) There are also a few other bounds you can get based on VC dimension. In practice these bounds are way too large to be useful, but an algorithm with much higher VC dimension will generally overfit more.
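One common form of the bound behind that statement (constants and log factors vary between textbooks, so treat this as a sketch): with probability at least 1 - \delta over the training sample,

\text{test error} \;\le\; \text{training error} \;+\; \sqrt{\frac{d\left(\ln\frac{2m}{d} + 1\right) + \ln\frac{4}{\delta}}{m}}

where d is the VC dimension and m the number of training examples. Holding the gap fixed, multiplying d by n requires multiplying m by roughly n as well, which is the statement above.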

2John_Maxwell
Oh yeah, thanks, I think I remember seeing that. Do you happen to know the assumptions that proof makes? I'm having a hard time finding it in Vapnik's textbook. I vaguely remember it assuming that the correct hypothesis is in our hypothesis class, which made it seem kind of uninteresting. To be clear, I agree it's easier to overfit if you try lots of models, but my explanation of this would be more Bayesian than Vapnik's. (Maybe something like: If a researcher restricts themselves to only 10 models, they will choose 10 models they feel relatively optimistic about/assign a high prior probability to. A high prior with a high likelihood gives us better generalization than a low prior with a slightly higher likelihood; the posterior probability of the first model is greater.)

A different view is to look at the search process for the models, rather than the model itself. If model A is found from a process that evaluates 10 models, and model B is found from a process that evaluates 10,000, and they otherwise have similar results, then A is much more likely to generalize to new data points than B.

The formalization of this concept is called VC dimension and is a big part of Machine Learning Theory (although arguably it hasn't been very helpful in practice): https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension
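A toy simulation of that intuition (illustrative only): here every "model" is pure noise, so any apparent edge is spurious, and the best score found by a search grows with the number of models tried.

import random

random.seed(0)

def best_backtest_score(n_models, n_periods=250):
    # Each "model" earns i.i.d. noise returns; its score is its mean return.
    # The best of n_models looks better purely because we searched harder.
    return max(
        sum(random.gauss(0, 1) for _ in range(n_periods)) / n_periods
        for _ in range(n_models)
    )

print(best_backtest_score(10))      # a modest-looking edge
print(best_backtest_score(10_000))  # a much more impressive edge, equally meaningless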

9John_Maxwell
Is there a theoretical justification for using VC dimension as a basis for generalization, or is it treated as an axiomatic desideratum?

It's a combination. The point is to throw out algorithms/parameters that do well on backtests when the assumptions are violated, because those are much more likely to be overfit.
