All of Meni_Rosenfeld's Comments + Replies

The web game greatly suffers from the network effect. There's just very little chance you'll get >=3 people to log on simultaneously, and of course, because of this people will give up on trying, worsening the effect.

Maybe we can designate, say, 12:00 AM and PM, UTC, as hours at which people should log on? This will make it easier to reach critical mass.

Proving a program secure all the way down from applying Schrödinger's equation to the quarks and electrons the computer is made of is far beyond our current abilities, and will remain so for a very long time.

Challenge accepted! We can do this, we just need some help from a provably friendly artificial superintelligence! Oh wait...

I've written a post on my blog covering some aspects of AGI and FAI.

It probably has nothing new for most people here, but could still be interesting.

I'll be happy for feedback - in particular, I can't remember whether my analogy with flight is something I came up with or something I heard here long ago. I'd be glad to hear whether it's novel, and whether it's any good.

How many hardware engineers does it take to develop an artificial general intelligence?

0Kaj_Sotala
The flight analogy, or at least some variation of it, is pretty standard in my experience. (Incidentally, I heard a version of the analogy just recently, when I was reading through the slides of an old university course - see pages 15-19 here.)

The perverse incentive to become alcoholic or obese can be easily countered with a simple rule - a person chosen in the lottery is sacrificed no matter what, even if he doesn't actually have viable organs.

To be truly effective, the system needs to consider the fact that some people are exceptional and can contribute to saving lives much more effectively than by being scrapped and harvested for spare parts. Hence, there should actually be an offer to anyone who loses the lottery: either pay $X or be harvested.

A further optimization is a monetary compensation to... (read more)

Of course. I'm not recommending that any genes have their host go celibate. I just disagree with the deduction "if you're celibate you can't have children, so there's no way your genes could benefit from it, QED".

0skeptical_lurker
Well, it works in ants, which share 75% of their genes with their siblings, but in humans... while it's not impossible that celibacy could increase genetic fitness, it's highly unlikely.

If your own celibacy somehow helps your relatives (who have a partial copy of your genes) reproduce, then the needs of your genes have been served. In general, genes have ways to pursue their agenda other than have their host reproduce. Sometimes genes even kill their host in an attempt to help copies of themselves in other hosts.

6skeptical_lurker
Your own celibacy would have to massively benefit others - for instance, if you were going to have 2 children, your celibacy would have to allow your sibling to have 4 extra children.
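The arithmetic behind this reply is just Hamilton's rule, rB > C: an altruistic trait can spread only if relatedness times the relatives' benefit exceeds the cost to the altruist. A minimal sketch (the offspring counts are illustrative, not from the thread):

```python
# Hamilton's rule: an altruistic trait can spread when r * B > C,
# where r is relatedness, B the benefit (extra offspring for the
# relative), and C the cost (offspring forgone by the altruist).

def min_benefit(cost_offspring, relatedness):
    """Extra offspring a relative must gain to offset the altruist's cost."""
    return cost_offspring / relatedness

# Full siblings share r = 0.5, so forgoing 2 children of your own
# requires enabling 4 extra sibling offspring just to break even.
print(min_benefit(2, 0.5))   # 4.0

# Ant workers (r = 0.75 with sisters, under haplodiploidy) need less:
print(min_benefit(2, 0.75))
```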

Is the link to "Logical disjunction" intentional?

0satt
It isn't! Thanks for catching that, I've fixed the link.

I was born in May, and I approve this message.

Ouch, I completely forgot about this (or maybe I never knew about it?), and that's a talk I wanted to hear...

Is it possible perhaps to get it in text form?

It's worth mentioning that the EFF resumed accepting Bitcoin donations a while ago.

https://supporters.eff.org/donate

I don't suppose it's possible to view the version history of the post, so can you state for posterity what "DOCI" used to stand for?

2gjm
It appears to have been something like "denotation OK, connotations iffy". (Someone objects to "iffy" in one of the comments.)

I think some factor for decreasing votes over time should be included. Exponentially decaying rates seem reasonable, and the decay time constant can be calibrated with the overall data in the domain (assuming we have data on voting times available).

1A1987dM
That's likely way too fast. It's not that rare for people to comment on posts several years old (especially on Main), and I'd guess such people also vote comments. (Well, I most certainly do.) You can use an exponential decay with a very large time constant, but that would mean that comments from yesterday are voted nearly as often as comments from three months ago. So, the increase in realism compared to a constant rate isn't large enough to justify the increase in complexity. (OTOH, hyperbolic decay is likely much more realistic, but it also has more parameters.)

I think it's reasonable to model this as a Poisson process. There are many people who could in theory vote, only few of them do, at random times.
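Combining the two ideas in this thread - a Poisson process for votes, with an exponentially decaying rate - gives a closed form for the expected vote count. A sketch, with made-up rate and time-constant values (in practice tau would be fit from the domain's voting-time data, as suggested above):

```python
import math

def expected_votes(lam0, tau, t):
    """Expected vote count by time t for a nonhomogeneous Poisson
    process with rate lam0 * exp(-t / tau): the integral of the rate,
    lam0 * tau * (1 - exp(-t / tau)), which approaches lam0 * tau."""
    return lam0 * tau * (1.0 - math.exp(-t / tau))

# lam0 = 5 votes/day initially, decay constant tau = 2 days (made up):
print(expected_votes(lam0=5.0, tau=2.0, t=30.0))   # ≈ 10.0, near the lam0*tau cap
```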

Given that a and b are arbitrary, I think the differences can be large. Whether they actually are large for typical datasets I can't readily answer.

In any case the advantages are:

  1. Simplicity. Tuning the parameters is a bit involved, but once you do, the formula to apply to each item is very simple. In many (not all) cases, a complicated formula reflects insufficient understanding of the problem.

  2. Motivation. Taking the lower bound of a confidence/credible interval makes some sense but it's not that obvious. The need for it arises because we don't model th

... (read more)
4EHeller
Yes, a and b are arbitrary, but if they aren't chosen well your model could be hugely inferior. I'd suggest making a few toy data sets and actually comparing the standard methods (Wilson score, Jeffreys interval) to yours before suggesting everyone embrace it. Edit for clarity: Just to be clear, the Jeffreys interval (which is usually very close to the Wilson bound) is essentially the same as your model but with the initial parameters 1/2, 1/2.

True. This is a problem since the current net vote count is mutable, while an individual vote, once cast, is not. You could try fitting a much more complicated model that can reproduce this behavior, calibrate it with A/B testing, etc. Or maybe try to prevent it by sorting according to quality, but not actually displaying the metrics.

But I fear that it would cause irreparable damage if the world settles on this solution.

This is probably vastly exaggerating the possible consequences; it's just a method of sorting, and both the Wilson-interval method and a Bayesian method are definitely far better than the naive methods.

I just feel that it will place this low-hanging fruit out of reach. e.g.,

Me: Hey Reddit, I have this cool new sorting method for you to try!

Reddit: What do you mean? We've already moved beyond the naive methods into the correct method. Here, see Miller's paper

... (read more)
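For reference, the Wilson-interval lower bound discussed in this thread (the method from Miller's post) takes only a few lines. A sketch, using z = 1.96 for a roughly 95% interval:

```python
import math

def wilson_lower_bound(pos, n, z=1.96):
    """Lower bound of the Wilson score interval for pos upvotes out of
    n total votes. z = 1.96 corresponds to ~95% confidence."""
    if n == 0:
        return 0.0
    phat = pos / n
    denom = 1 + z * z / n
    center = phat + z * z / (2 * n)
    spread = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return (center - spread) / denom

# Unlike the naive mean, a 1-of-1 item ranks below a 90-of-100 item:
print(wilson_lower_bound(1, 1))      # ≈ 0.21
print(wilson_lower_bound(90, 100))   # ≈ 0.83
```

The key property is that low-data items get pulled down hard, which is exactly the "penalize low-data items" behavior mentioned elsewhere in this thread.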

It means that the model used per item doesn't have enough parameters to encode what we know about the specific domain (where domain is "Reddit comments", "Urban dictionary definitions", etc.)

The formulas discussed define a certain mapping between pairs (positive votes, negative votes) to a quality score. In Miller's model, the same mapping is used everywhere without consideration of the characteristics of the specific domain. In my model, there are parameters a and b (or alternatively, a/(a+b) and a+b) that we first train per-domain, an... (read more)

This is interesting, especially considering that it favors low-data items, as opposed to both the confidence-interval-lower-bound and the notability adjustment factor, which penalize low-data items.

You can try to optimize it in an explore-vs-exploit framework, but there would be a lot of modeling parameters, and additional kinds of data will need to be considered. Specifically, a measure of how many of those who viewed the item bothered to vote at all. Some comments will not get any votes simply because they are not that interesting; so if you keep placing them on top hoping to learn more about them, you'll end up with very few total votes because you show people things they don't care about.

5Eliezer Yudkowsky
Yep. You'd want to check or guess the size of the user's monitor and where they were scrolling to, and calculate upvotes-per-actual-user-read. As things are read and not upvoted, your confidence that they're not super-high-value items increases and the value of information from showing them again diminishes.
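One standard way to get the explore/exploit behavior described above is Thompson sampling over per-comment Beta posteriors. A hedged sketch - the comment IDs and vote counts are made up, and this is not what any actual site does:

```python
import random

def pick_comment(stats, rng=random):
    """Thompson sampling: draw a plausible quality for each comment
    from its Beta posterior and show the one with the highest draw.
    Fresh comments have wide posteriors, so they get explored
    occasionally; comments that are shown but never upvoted see
    their posteriors tighten around low values and sink."""
    best_id, best_draw = None, -1.0
    for comment_id, (ups, downs) in stats.items():
        draw = rng.betavariate(1 + ups, 1 + downs)  # uniform Beta(1,1) prior
        if draw > best_draw:
            best_id, best_draw = comment_id, draw
    return best_id

# Made-up vote counts: a well-liked comment, a fresh one, a disliked one.
stats = {"liked": (50, 10), "fresh": (1, 0), "disliked": (2, 30)}
rng = random.Random(0)
print(pick_comment(stats, rng))
```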

The beta distribution is a conjugate prior for Bernoulli trials, so if you start with such a prior the posterior is also beta, which greatly simplifies the calculations. It also converges to normal for large alpha and beta, and in any case can be fitted to any mean and variance, so it's a good choice.
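Because of the conjugacy, the per-item computation reduces to counting. A minimal sketch, where a and b are the per-domain prior parameters discussed in this thread (the values below are illustrative, not fitted to any real domain):

```python
def posterior_mean(ups, downs, a, b):
    """Posterior mean quality under a Beta(a, b) prior and observed
    Bernoulli votes: the beta prior is conjugate, so the posterior is
    Beta(a + ups, b + downs), with mean (a + ups) / (a + b + ups + downs)."""
    return (a + ups) / (a + b + ups + downs)

# Made-up prior: mean a/(a+b) = 0.6, weight a+b = 5 "pseudo-votes".
print(posterior_mean(ups=4, downs=1, a=3, b=2))   # 0.7
```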

Whatever your target function is, you'll want the item with the greatest posterior mean for this target. To do this generally you'll need the posterior distribution of p rather than the mean of p itself. But the distribution just describes what you know about p, it doesn't itself encode properties such as "controversial".

Well, I think there is some sense of Bayesianism as a meta-approach, without regard to specific methods, which most of us would consider healthier than the frequentist mindset.

There are surely papers showing the superiority of frequentism over Bayesianism, and papers showing the differences between various flavors of Bayesianism and various flavors of frequentism. But that's not what I'm after right now (with the understanding that a paper can be on the "Bayesian" side and be correct).

(Link to How Not To Sort By Average Rating.)

I forgot to link in the OP. Then remembered, and forgot again.

Something of interest: the Jeffreys interval. Using the lower bound of a credible interval based on that distribution (which is the same as yours) will probably give better results than just using the mean: it handles small sample sizes more gracefully. (I think, but I'm certainly willing to be corrected.)

This seems to use specific parameters for the beta distribution. In the model I describe, the parameters are tailored per domain. This is actuall... (read more)

In the notation of that post, I'd say I am interested mostly in the argument over "Whether a Bayesian or frequentist algorithm is better suited to solving a particular problem", generalized over a wide range of problems. And the sort of frequentism I have in mind seems to be "frequentist guarantee" - the process of taking data and making inferences from it on some quantity of interest, and the importance to be given to guarantees on the process.

1jsteinhardt
How wide a range did you have in mind? It's certainly not the case that Bayesian methods are universally better than frequentist ones.

Would it? Maybe the question (in its current form) isn't good, but I think there are good answers for it. Those answers should be prominently searchable.

Except it's not really a prediction market. You could know the exact probability of an event happening, which is different from the market's opinion, and still not be able to guarantee profit (on average).

but the blue strategy aims to maximize the frequency of somewhat positive responses while the red strategy aims to maximize the frequency of highly positive responses.

It's the other way around.

I guess the Umesh principle applies. If you never have to throw food away, you're preparing too little.

1jimrandomh
Or worse, eating things you otherwise wouldn't whenever that would be necessary to keep things from spoiling.

If you haven't already, you can try deepbit.net. I did, and it's working nicely so far.

do you know that group? do you want their contact info?

No, and no need - I trust I'll find them should the need arise.

I'm interested in being there, but that's a pretty long drive for me. Is there any chance to make it in Tel-Aviv instead?

2DanArmak
If it helps, I'm coming to the meetup from T-A, and I can give you a ride to Jerusalem and back.
1AnnaSalamon
Not this time, although I could be talked into doing a Tel Aviv meet-up in a few weeks. This meet-up is actually flowing out of a Tel Aviv Transhumanist meet-up a week and a half ago (do you know that group? do you want their contact info?) which, in retrospect, I really should have announced on here.

At the risk of stating the obvious: The information content of a datum is its surprisal, the negative logarithm of the prior probability that it is true. If I currently give a 1% chance that the cat in the box is dead, discovering that it is dead gives me 6.64 bits of information.
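The arithmetic for the 1% example, for the record:

```python
import math

def surprisal_bits(p):
    """Information content (self-information) of an event whose
    prior probability is p, in bits: -log2(p)."""
    return -math.log2(p)

print(round(surprisal_bits(0.01), 2))   # 6.64
```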

Eliezer Yudkowsky can solve EXPTIME-complete problems in polynomial time.

Sorry, I'm not sure I know how to answer that.

-4adsenanim
The more complex a system becomes, the easier it is to destabilize it. Is this a conditional argument?

Just in case anyone didn't get the joke (rot13):

Gur novyvgl gb qvivqr ol mreb vf pbzzbayl nggevohgrq gb Puhpx Abeevf, naq n fvathynevgl, n gbcvp bs vagrerfg gb RL, vf nyfb n zngurzngvpny grez eryngrq gb qvivfvba ol mreb (uggc://ra.jvxvcrqvn.bet/jvxv/Zngurzngvpny_fvathynevgl).

When Eliezer Yudkowsky divides by zero, he gets a singularity.


Now that I've looked it up, I don't think it really has the same intuitions behind it as mixed strategy NE. But it does have an interesting connection with swings. If you try to push a heavy pendulum one way, you won't get very far. Trying the other way you'll also be out of luck. But if you push and pull alternately at the right frequency, you will obtain an impressive amplitude and height. Maybe it is because I've had firsthand experience with this that I don't find Parrondo's paradox all that puzzling.

1adsenanim
From what you are saying about the mixed strategy NE, I get that possible moves increase in relation to the complexity of the equilibrium, so that it becomes increasingly likely that any possible action could have an added emphasis that would cause a specific outcome as the equilibrium increases in complexity. e.g., with the pendulum motion you are describing, the pendulum does not require additional effort in both directions to increase, only one direction, and the effort need be only the smallest (or smaller in addition) in relation to the period and direction. An action too large in the same direction, or against the direction, will destabilize it. Isn't it true that the more precise the equilibrium, the less effort is required to destabilize it? I think that the main difference between our arguments is that while you are talking of simultaneous action, I am talking of sequential action...

Okay, then it looks like we are in agreement.

I'll consider it, but I don't know if I'm the right person for that, or if I'll have the time.

Short answer: I already addressed this. Is your point that I didn't emphasize it enough?

One thing should be kept in mind. A Nash equilibrium strategy, much like a minimax strategy, is "safe". It makes sure your expected payoff won't be too low no matter how clever your opponent is. But what if you don't want to be safe - what if you want to win? If you have good reason to believe you are smarter than your opponent, that he will play a non-equilibrium strategy you'll be able to predict, then go ahead and counter that strategy. Nash equilibria ar

... (read more)
2khafra
To expand on paulfchristiano's quibble with "if you don't know exactly what is going on, you should just play NE": The technically correct phraseology for that part may be too complicated for this post. But it would be closer to "if your belief that you can predict your opponent's non-ideal behavior, multiplied by your expected gain from exploitation of that prediction, exceeds your expected loss from failing to correctly predict your opponent's behavior, go ahead."
4paulfchristiano
I didn't really mean to insult your post (although I apparently did :). I was probably just as surprised as you at many of the comments in that thread. I agree that you should understand NE if you want to say anything useful about games (and that they are basically the complete story for two-player zero sum games from a theoretical perspective). The one thing I object to is the sentiment that "if you don't know exactly what is going on, you should just play NE." After re-reading your latest post, this is a bit of a straw man. I agree totally that if you know nothing else then you have nothing to do but play the NE, which is all that you actually said. However, you can put any game that you are faced with into the reference class of "games humans play," and so I don't think this fact is very relevant most of the time. In particular, if the question was "what should you do in an MMO with these properties" then there are many acceptable answers other than play the NE. It may or may not be the case that anyone in the thread in question actually gave one. In particular, because I can put all games I have ever played into the reference class "games that I have ever played," I can apply, over the course of my entire life, an online learning algorithm which will allow me to significantly outperform the Nash equilibrium. In practice, I can do better in many games than is possible against perfectly rational opponents.

Thanks. I'll move it as soon as I have the required karma.

I don't know if I qualify as a LW-er, but I'm in Israel. I'll be happy to meet you two if you are interested. I'm in Tel-Aviv every day.

Gambit said the only equilibrium was mixed, with 1/5 each of (blue sword, blue armor), (blue sword, green armor), (yellow sword, yellow armor), (green sword, yellow armor), and (green sword, green armor).

FWIW, my calculations confirm this - you beat me to posting. One nitpick - this is not the only equilibrium, you can transfer weight from (blue, green) to (red, green) up to 10%.

But if you can't look at the current distribution, you still need to use the equilibrium for this single choice. Otherwise, you're at risk that everyone will think the same as you, except for a few smarter players who will counter it.

I actually briefly considered mentioning correlated equilibria, but the post was getting long already.

You can't have your cake and eat it too. If the probability is low enough, or the penalty mild enough, that the rational action is to take the gamble, then necessarily the expected utility will be positive.

Taking your driving example, if I evaluate a day of work as 100 utilons, my life as 10MU, and estimate the probability to die while driving to work as 1/M, then driving to work has an expected gain of 90U.
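Checking that arithmetic (with M = one million):

```python
def expected_utility(p_death, u_workday, u_life):
    """Expected utility of driving to work: gain a day's work with
    probability (1 - p), lose your life with probability p."""
    return (1 - p_death) * u_workday - p_death * u_life

# The comment's numbers: 100 utilons per workday, 10M utilons for a
# life, and a one-in-a-million chance of dying on the commute.
print(expected_utility(1e-6, 100, 10_000_000))   # ≈ 90
```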

What exactly does maximizing expected utility yield in these particular cases?

For one, I could be convinced not to take A (0.01 could be too risky) but I would never take B.

Depends on how much money you currently have. According to the simple logarithmic model, you should take gamble B if your net worth is at least $2.8M.
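For the record, the "simple logarithmic model" decision rule is: accept a gamble iff the expected logarithm of your wealth goes up. A sketch with a made-up gamble (NOT the post's gamble B, whose stakes aren't quoted in this thread):

```python
import math

def accept_gamble(wealth, outcomes):
    """Log-utility decision rule: accept iff expected log-wealth
    after the gamble exceeds the log of current wealth. `outcomes`
    is a list of (probability, dollar change) pairs."""
    exp_log = sum(p * math.log(wealth + dx) for p, dx in outcomes)
    return exp_log > math.log(wealth)

# Hypothetical gamble: 10% chance to win $100M, 90% chance to lose $1M.
gamble = [(0.1, 100_000_000), (0.9, -1_000_000)]
print(accept_gamble(10_000_000, gamble))   # True: wealthy enough to risk it
print(accept_gamble(2_000_000, gamble))    # False: the likely $1M loss hurts too much
```

As in the comment above, whether to accept depends on current net worth: log utility makes a fixed dollar loss loom smaller the richer you are.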

Suppose someone offers you a (single trial) gamble C in which you stand to gain a nickel with probability 0.95 and stand to lose an arm and a leg with probability 0.05. Even though expectation is (-0.05arm -0.05leg + 0.95nickel), you should still take the gamble since the probability of winning on a single trial is very high - 0.95 to be exact.

Non-sarcastic version: Losing $100M is much worse than gaining $100K is good, regardless of utility of money being nonlinear. This is something you must consider, rather than looking at just the probabilities - so you shouldn't take gamble A. This is easier to see if you formulate the problems with gains and losses you can actually visualize.

0AnlamK
Is the problem that 0.01 or 0.05 is too high? Take a smaller value then. In fact, people take such gambles (with negative expectation but with high probability of winning) every day. They fly on airplanes and drive to work.

Hi.

I intend to become more active in the future, at which point I will introduce myself.