Hi Dmitry,
To me it seems not unreasonable to think that some ideas from tropical geometry might be relevant to ML, for the simple reason that the functions coming up in ML with ReLU activation are PL, and people in tropical geometry have thought seriously about PL functions. Of course this does not guarantee that there is anything useful to be said!
One possible example that comes to mind in the context of your post here is the concept of polyhedral currents. As I understand it, here the notion of "density of polygons" is used as a kind of proxy for the deriva...
Get that agreement in writing.
I'm not sure that would be particularly reassuring to me (writing as one of the contributors). First, how would one check that the agreement had been adhered to (maybe it's possible, I don't know)? Second, people in my experience often don't notice they are training on data (as mentioned in a post above by ozziegooen).
I think this is a key point. Even the best possible curriculum, if it has to work for all students at the same rate, is not going to work well. What I really want (both for my past-self as a student, and my present self as a teacher of university mathematics) is to be able to tailor the learning rate to individual students and individual topics (for student me, this would have meant 'go very fast for geometry and rather slowly for combinatorics'). And while we're at it, can we also customise the learning styles (some students like to read, some like to sit in class, some to work in groups, etc)?
This is technologically more feasible than it was a decade ago, but seems far from common.
Thanks Charlie.
Just to be double-sure, the second process was choosing the weight in a ball (so total L2 norm of weights was <= 1), rather than on a sphere (total norm == 1), right?
Yes, exactly (though for some constant , which may not be , but this turns out not to matter).
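To make the ball/sphere distinction concrete, here is a minimal sketch of the two sampling procedures. It uses the standard trick of normalising a Gaussian vector to get a uniform direction, and scaling by U**(1/dim) to get a uniform point in the ball. The function names and the `radius` parameter are my own choices for illustration (the radius stands in for the constant mentioned above); this is not taken from the post.

```python
import math
import random

def l2_norm(w):
    """Euclidean (L2) norm of a weight vector."""
    return math.sqrt(sum(x * x for x in w))

def sample_on_sphere(dim, radius=1.0, rng=random):
    """Uniform point ON the sphere (norm == radius): normalise a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = l2_norm(g)
    return [radius * x / norm for x in g]

def sample_in_ball(dim, radius=1.0, rng=random):
    """Uniform point IN the ball (norm <= radius).

    A uniform direction is combined with a radius distributed as
    radius * U**(1/dim), where U is uniform on [0, 1).
    """
    direction = sample_on_sphere(dim, 1.0, rng)
    r = radius * rng.random() ** (1.0 / dim)
    return [r * x for x in direction]

rng = random.Random(0)  # fixed seed, only so the sketch is reproducible
w_ball = sample_in_ball(50, rng=rng)
w_sphere = sample_on_sphere(50, rng=rng)
print(l2_norm(w_ball), l2_norm(w_sphere))
```

Note that in high dimension most of the volume of the ball lies close to its boundary, so the two procedures give weight vectors of very similar norm.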
Is initializing weights that way actually a thing people do?
Not sure (I would like to know). But what I had in mind was initialising a network with small weights, then doing a random walk ('undirected SGD'), and then looking at the resulting distribution. Of course this will b...
Maths at my Dutch university also has homework for quite a few of the courses, which often counts for something like 10-20% of the final grade. It can usually be submitted online, so you only need to be physically present for exams. However, there are a small number of courses that are exceptions to this, and actually require attendance to some extent (e.g. a course on how to give a scientific presentation, where a large part of the course consists of students giving and commenting on each other's presentations - not so easy to replace the learning experience with a single exam at the end).
But this differs between Dutch universities.
I suspect the arXiv might not be keen on an account that posts papers by a range of people (not including the account-owner as coauthor). This might lead to heavier moderation/whatever. But I could be very wrong!
Some advice for getting papers accepted on arXiv
As some other comments have pointed out, there is a certain amount of moderation on arXiv. This is a little opaque, so below is an attempt to summarise some things that are likely to make it easier to get your paper accepted. I'm sure the list is very incomplete!
In writing this I don't want to give the impression that posting things to arXiv is hard; I currently have 28 papers there, have never had a single problem or delay with moderation, and the submission process generally takes me <15 mins these days....
The arXiv really prefers that you upload in TeX. For the author this makes it less likely that your paper will be flagged for moderation etc (I guess). So if it were possible to export to TeX I think that for the purposes of uploading to arXiv this would be substantially better. Of course, I don’t know how much more/less work it is…
Hi Charlie, If you can give a short (precise) description for an agent that does the task, then you have written a short programme that solves the task. I think then if you need more space to ‘explain what the agent would do’ then you are saying there also exists a less efficient/compact way to specify the solution. From this perspective I think the latter is then not so relevant. David
provable guarantees on the safety of an FHE scheme that do not rely on open questions in complexity theory such as the difficulty of lattice problems.
is far out of reach at present (in particular to the extent that there does not exist a bounty which would affect how likely people are to work on it). It is hard to do much in crypto without assuming some kind of problem to be computationally difficult. And there are very few results proving that a given problem is computationally difficult in an absolute sense (rather than just ‘at least as ...
P.s. the main thing I have taken so far from the link you posted is that the important part is not exactly about the biases of SGD. Rather, it is about the structure of the DNN itself; the algorithm used to find a (local) optimum plays less of a role than the overall structure. But probably I’m reading too much into your precise phrasing.
Hi Thomas, I agree the proof of the bound is not so interesting. What I found more interesting were the examples and discussion suggesting that, in practice, the upper bound seems often to be somewhat tight.
Concerning differential advancement, I agree this can advance capabilities, but I suspect that advancing alignment is somewhat hopeless unless we can understand better what is going on inside DNNs. On that basis I think it does differentially advance alignment, but of course other people may disagree.
Thanks very much for the link!
If you get the daily arXiv email feeds for multiple areas it automatically removes duplicates (i.e. each paper appears exactly once, regardless of cross-listing). The email is not to everyone's taste of course, but this is a nice aspect of it.
I was about to write approximately this, so thank you! To add one point in this direction, I am sceptical about the value of reducing the expectation for researchers to explain what they are doing. My research is in two fields (arithmetic geometry and enumerative geometry). In the first we put a lot of burden on the writer to explain themselves, and in the latter poor and incomplete explanations are standard. This sometimes allows people in the latter field to move faster, but
p.s.
For the more substantive results in section 4, I do believe the direction is always flat --> sharp.
I agree with this (with 'sharp' replaced by 'generalise', as I think you intend). It seems to me potentially interesting to ask whether this is necessarily the case.
Vacuous sure, but still true, and seems relevant to me. You initially wrote:
Regarding the 'sharp minima can generalize' paper, they show that there exist sharp minima with good generalization, not flat minima with poor generalization, so they don't rule out flatness as an explanation for the success of SGD.
But, allowing reparametrisation, this seems false? I don't understand the step in your argument where you 'rule out reparametrisation', nor do I really understand what this would mean.
Your comment relating description length to flatness seems nice. T...
Thank you for the quick reply! I’m thinking about section 5.1 on reparametrising the model, where they write:
every minimum is observationally equivalent to an infinitely sharp minimum and to an infinitely flat minimum when considering nonzero eigenvalues of the Hessian;
If we stick to section 4 (and so don’t allow reparametrisation) I agree there seems to be something more tricky going on. I initially assumed that I could e.g. modify the proof of Theorem 4 to make a sharp minimum flat by taking alpha to be big, but it doesn’t work like that (basically...
I'm not sure I agree with interstice's reading of the 'sharp minima' paper. As I understand it, they show that a given function can be made into a sharp or flat minimum by finding a suitable point in the parameter space mapping to the function. So if one has a sharp minimum that does not generalise (which I think we will agree exists) then one can make the same function into a flat minimum, which will still not generalise as it is the same function! Sorry I'm 2 years late to the party...
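A toy illustration of how the same function can sit at minima of very different sharpness. This is only a sketch under my own assumptions, not the construction from the paper: the model x -> w1 * w2 * x, the target x -> x, and the names `f`, `loss` and `sharpness` are all hypothetical choices made for illustration. Rescaling (w1, w2) to (alpha * w1, w2 / alpha) leaves the function unchanged while the largest Hessian eigenvalue of the loss changes.

```python
def f(x, w1, w2):
    """Toy two-parameter model; only the product w1 * w2 affects the function."""
    return w1 * w2 * x

def loss(w1, w2):
    """Squared error for fitting the target x -> x, minimised where w1 * w2 = 1."""
    return (w1 * w2 - 1.0) ** 2

def sharpness(w1, w2):
    """Largest Hessian eigenvalue of `loss` at a minimum (a point with w1 * w2 = 1).

    At such a point the Hessian is 2 * v v^T with v = (w2, w1), so its
    eigenvalues are 0 and 2 * (w1**2 + w2**2).
    """
    return 2.0 * (w1 ** 2 + w2 ** 2)

for alpha in (1.0, 10.0, 100.0):
    w1, w2 = alpha, 1.0 / alpha  # the same function for every alpha
    print(alpha, f(3.0, w1, w2), loss(w1, w2), sharpness(w1, w2))
```

In this toy the sharpness 2 * (alpha**2 + alpha**-2) grows without bound as alpha grows, but it never drops below 4 (attained at alpha = 1). So this particular rescaling can make a minimum arbitrarily sharp but not arbitrarily flat; making things flat needs a more general reparametrisation.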
if we gave research grants to smart and personable university graduates and gave them carte blanche to do with the money what they wished that would work just as well as the current system
This thought is not unique to you; see e.g. the French CNRS system. My impression is that it works kind of as you would expect; a lot of them go on to do solid work, some do great work, and a few stop working after a couple of years. Of course we cannot really know how things would have turned out if the same people had been given more conventional positions,
The request for elaboration concerned how the experience described related to the LCS hierarchy described in the post, which was (and remains) very unclear to me.
Definitely the antagonistic bits - I enjoyed the casual style! Really just the line ‘Sit down. Sit down. Shut up. Listen. You don’t know nothing yet’ I found quite off-putting - even though in hindsight you were correct!
Thanks! I thought it might be, but was unsure, and didn't want to make an awkward situation for the OP in case it was something very different...
I really liked the content, but I found some of the style (`Sit down!' etc) really off-putting, which is why I only actually read the post on my 3rd attempt. Obviously you're welcome to write in whatever style you want, and probably lots of other people really like it, I just thought it might be useful to mention that a non-empty set of people find it off-putting.
Super valid, I appreciate the feedback! For my own future reference, if you have an answer - was it more the general kind of casual/eclectic style, the "antagonistic" bits like what you quoted, or something else?
Can you elaborate on this a bit? I'm sorry to hear that you had a bad experience during fieldwork, though I'm afraid I'm not certain what you refer to by 'Active Personal Life'. Can you explain how the experience you relate connects to the LCS hierarchy?
I'm sceptical of your decision to treat tenured and non-tenured faculty alike. As tenured faculty, this has long seemed to me to be perhaps the most important distinction.
More generally, what you write here is not very consistent with my own experience of academia (which is in mathematics and in Europe, though I have friends and collaborators in other countries and fields, so I am not totally clueless about how things work there).
Some points I am not seeing in your post are:
For many academics, being able to do their own research and work with brilliant
So the set of worlds, , is the set of functions from to ...
I guess the should be a ? Also, you don't seem to define ; perhaps ?
I expect most people on LW to be okay being asked their Cheerful Price to have sex with someone.
I find this a surprising assertion. It does not apply to me, probably it does apply to you. Ordinarily I would ask if you had any other data points, but I don't want to take the conversation in this direction...
Sure, in the end we only really care about what comes top, as that's the thing we choose. My feeling is that information on (relative) strengths of preferences is often available, and when it is available it seems to make sense to use it (e.g. allowing circumvention of Arrow's theorem).
In particular, I worry that, when we only have ordinal preferences, the outcome of attempts to combine various preferences will depend heavily on how finely we divide up the world; by using information on strengths of preferences we can mitigate this.
(actually, my formula doubles the numbers you gave)
Are you sure? Suppose we take with , , then , so the values for should be as I gave them. And similarly for , giving values . Or else I have mis-understood your definition?
I'd simply see that as two separate partial preferences
Just to be clear, by "separate partial preference" you mean a separate preorder, on a set of objects which may or may not have some overlap with the objects we considered so far? Then somehow the work is just postponed to
...This seems really neat, but it seems quite sensitive to how one defines the worlds under consideration, and whether one counts slightly different worlds as actually distinct. Let me try to illustrate this with an example.
Suppose we have a consisting of 7 worlds, , with preferences
and no other non-trivial preferences. Then (from the `sensible case'), I think we get the following utilities:
.
Suppose now that I create two new copies , of the world which each differ by the p
...Thanks! I like the way your optimisation problem handles non-closed cycles.
I think I'm less comfortable with how it treats disconnected components - as I understand it you just translate each separately to have `centre of mass' at 0. If one wants to get a utility function out at the end one has to make some kind of choice in this situation, and the choice you make is probably the best one, so in that sense it seems very good.
But for example it seems vulnerable to creating 'virtual copies' of worlds in order to shift the centre of mass and push connected co
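To illustrate the worry about virtual copies, here is a minimal sketch. It assumes the rule as I described it above (translate each connected component so that its mean utility is 0); the world names and the utilities 0, 1, 2 are hypothetical and not taken from the post.

```python
def centre(utilities):
    """Translate a component's utilities so their mean ('centre of mass') is 0."""
    mean = sum(utilities.values()) / len(utilities)
    return {world: u - mean for world, u in utilities.items()}

# One connected component with three worlds (hypothetical utilities).
component = {"a": 0.0, "b": 1.0, "c": 2.0}

# The same component after adding two near-identical copies of the top world.
with_copies = dict(component, c_copy1=2.0, c_copy2=2.0)

before = centre(component)
after = centre(with_copies)

for world in component:
    print(world, before[world], after[world])
```

Differences within the component are unchanged, but every original world is shifted down (here by 0.4) relative to the worlds in any other component, purely because of how finely the top world was subdivided.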
...Thanks for the comment Charlie.
If I am indifferent to a gamble with a probability of ice cream, and a probability 0.8 of chocolate cake and 0.2 of going hungry
To check I understand correctly, you mean the agent is indifferent between the gambles (probability of ice cream) and (probability 0.8 of chocolate cake, probability 0.2 of going hungry)?
If I understand correctly, you're describing a variant of Von Neumann–Morgenstern where instead of giving preferences among all lotteries, you're specifying a certain collection of special type of pairs of lo
...Thanks for pointing me to this updated version :-). This seems a really neat trick for writing down a utility function that is compatible with the given preorder. I thought a bit more about when/to what extent such a utility function will be unique, in particular if you are given not only the data of a preorder, but also some information on the strengths of the preferences. This ended up a bit too long for a comment, so I wrote a few things in outline here:
https://www.lesswrong.com/posts/7ncFy84ReMFW7TDG6/categorial-preferences-and-utility-functions
It may...
Never mind - I had fun thinking about this :-).
Hi Stuart,
I’m working my way through your `Research Agenda v0.9’ post, and am therefore going through various older posts to understand things. I wonder if I could ask some questions about the definition you propose here?
First, that be contained in for some seems not so relevant; can I just assume X, Y and Z are some manifolds ( for some )? And we are given some partial order on X, so that we can refer to `being a better world'?
Then, as I understand it, your definition says the following:
Fix X, and Z. Let Y be a manifold...
Thanks for the reply, Zack.
The reason this objection doesn't make the post completely useless...
Sorry, I hope I didn't suggest I thought that! You make a good point about some variables being more natural in given applications. I think it's good to keep in mind that sometimes it's just a matter of coordinate choice, and other times the points may be separated but not in a linear way.
Hi Zack,
Can you clarify something? In the picture you draw, there is a codimension-1 linear subspace separating the parameter space into two halves, with all red points to one side, and all blue points to the other. Projecting onto any 1-dimensional subspace orthogonal to this (there is a unique one through the origin) will thus yield a `variable' which cleanly separates the points into the red and blue categories. So in the illustrated example, it looks just like a problem of bad coordinate choice.
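A small sketch of what I mean by projecting onto the orthogonal direction. The separating hyperplane (here the line x + y = 0 in the plane) and the sample points are made up for illustration and are not read off from the picture in the post.

```python
def project(point, normal):
    """Signed coordinate of `point` along `normal` (a plain dot product)."""
    return sum(p * n for p, n in zip(point, normal))

# Hypothetical data: the line x + y = 0 separates the two colours,
# although neither the x-coordinate nor the y-coordinate does so alone.
normal = (1.0, 1.0)
red = [(1.0, 0.5), (2.0, -1.0), (-0.2, 0.7)]
blue = [(-1.0, 0.5), (-2.0, 1.0), (0.2, -0.7)]

red_coords = [project(p, normal) for p in red]
blue_coords = [project(p, normal) for p in blue]

# The single new variable separates the colours by its sign.
print(red_coords, blue_coords)
```

In the original coordinates neither axis separates the colours on its own (both lists contain points with positive and negative x, and likewise y), but the one new coordinate does.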
On the other hand, one can easily have muc...
Hmm, so I'm very wary of defending tropical geometry when I know so little about it; if anyone more informed is reading please jump in! But until then, I'll have a go.
Hmm, even for a very small value of `might'? I'm not saying that someone who wants to contribute to ML needs to seriously consider learning some tropical geometry, just that if one already knows tropical geometry it's not ...