Summary
We’re worried that impact markets may set up incentives that prevent exactly the sort of projects that are good across many moral systems and worldviews from benefiting from them. (Cf. the Giver’s Dilemma.) Prospective funding suffers from the same problem, so this is not a disadvantage of retrofunding relative to prospective funding. Nevertheless, we’ve collected a few ideas for how to address the problem if it comes up.
The Retrofunder’s Dilemma
GiveWell writes:
Imagine that two donors, Alice and Bob, are both considering supporting a charity whose room for more funding is $X, and each is willing to give the full $X to close that gap. If Alice finds out about Bob’s plans, her incentive is to give nothing to the charity, since she knows Bob will fill its funding gap. Conversely, if Bob finds out about Alice’s funding plans, his incentive is to give nothing to the charity and perhaps support another instead. This creates a problematic situation in which neither Alice nor Bob has the incentive to be honest with the other about his/her giving plans and preferences – and each has the incentive to try to wait out the other’s decision.
Let’s suppose that the charity in question is in fact the top priority of each of the donors. Perhaps the most perverse aspect of this incentive structure is that a charity is more likely to be uncontroversially good from an impartial perspective the more donors agree that it is their top priority. Yet it is exactly this most uncontroversially good charity that suffers most.
This is a failure mode that we want to harden impact markets against.
Luckily, the outcome is in no participant’s interest, so we’re potentially faced with a mere “assurance game” problem. If we can find an arrangement that funds the top charity and that all participants agree to, we’ve solved the problem.
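To see why an assurance game is the more hopeful diagnosis, here is a toy payoff matrix (all numbers invented) for two retrofunders deciding whether to fund their shared top charity or wait for the other to do it. Unlike in a prisoner’s dilemma, mutual funding is itself a stable equilibrium; the problem is only getting there from wait/wait:

```python
# Toy assurance game between two retrofunders, Alice and Bob. "fund" =
# fund the shared top charity; "wait" = hold back and hope the other
# funder covers it. All payoffs are invented for illustration.
PAYOFFS = {  # (Alice's move, Bob's move) -> (Alice's payoff, Bob's payoff)
    ("fund", "fund"): (4, 4),  # gap split fairly: best outcome for both
    ("fund", "wait"): (1, 3),  # Alice pays the whole gap, Bob free-rides
    ("wait", "fund"): (3, 1),
    ("wait", "wait"): (2, 2),  # the top charity goes unfunded
}
MOVES = ("fund", "wait")

def is_nash(a, b):
    """True if neither player gains by unilaterally switching moves."""
    pa, pb = PAYOFFS[(a, b)]
    return all(PAYOFFS[(a2, b)][0] <= pa for a2 in MOVES) and \
           all(PAYOFFS[(a, b2)][1] <= pb for b2 in MOVES)

for a in MOVES:
    for b in MOVES:
        tag = "  <- Nash equilibrium" if is_nash(a, b) else ""
        print(f"{a:4}/{b:4} {PAYOFFS[(a, b)]}{tag}")
# Both fund/fund and wait/wait are equilibria, and fund/fund is better
# for everyone: the problem is coordination, not conflicting interests.
```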
Defection by Sellers
There are two importantly different versions of this problem. The first is the one above where retrofunders get stuck in a defect-defect equilibrium. But you could also imagine a world where funders run their separate prize contests and accept submissions not through a public marketplace like ours but through a form whose responses only they can see.
In that world, a seller can target only the prizes awarded by the groups least value-aligned with them, so as to preserve the resources of the more aligned groups for other projects.
Impact markets probably already alleviate this problem by making projects and funding decisions public and by making it impossible to target applications at particular funders only. There are also already norms against this kind of defection in effective altruism. I think the current norms still fall short, but at least they exist to some extent.
Broader or Narrower Targets
I previously thought that the way to maximize the counterfactually valid impact of impact markets was to make the prize questions more specific, so that they target only neglected problems. I now think that would be tantamount to defecting in a defect-defect equilibrium, one that leaves the most impactful funding opportunities untapped.
The targets or research questions that a prize contest rewards can be broader or narrower: “We want to see a proof or disproof that Vingean reflection is possible” as opposed to “We want to see contributions to AI safety.” There are advantages and disadvantages to both extremes.
With impact markets we want to bring together the people who want to do good but can’t (e.g., researchers with little money) and those who could do good but currently won’t (especially for-profit investors) – at least at the current margin, because many of them are probably already doing good up to the point that they can or want to.
But there are also those who can and do do good: funders and researchers with sufficient means. They are similar in some ways and could be seen as one group.
If so, these groups are not cooperating very well at the moment, so a one-sided attempt to cooperate would just lead to exploitation. If funders take this view, they’ll defect back, and we’ll be in a defect-defect equilibrium. This means in practice that they’ll pick targets that they think no one would otherwise work on, and therefore usually targets that are highly specific.
But these groups could also be seen as distinct because funders have a lot of flexibility in what to fund, so that they have a choice, whereas many researchers may be specialized to the point where the criterion of their personal fit more or less prescribes what they need to do. If so, it seems odd to interpret their behavior as defection. If funders take this view, it becomes less clear how they should act.
Another consideration is that the defection, if it is one, is not malicious. If asked, researchers will probably be truthful about the counterfactual of their work in a world without a given prize. So even broad targets will probably mostly allow us to measure our impact.
I don’t currently know how to think about this, so at the moment it seems reasonable either to start with broad targets and possibly accept some exploitation, or to start with narrow targets and gradually build up cooperation one contract at a time. Either could lead to our desired outcome: a norm where you can expect retrofunding if you do something great. In any case, different funders will probably have different preferences, and most likely we’ll be able to measure which approach leads to the better outcomes in the long run.
Remedies
Against Defection by Retrofunders
Confabs and S-Process. The most straightforward way to coordinate retrofunders is to let them talk to each other. There will probably be very few retrofunders for a long time – maybe two to five – which should make it easy for them to talk to each other and hash things out. (This excludes people who fund their own work, who are somewhat like retrofunders but perhaps less flexible, as mentioned above.)
In the causal case, coordination mechanisms like the S-Process (or some modification of quadratic funding) could help remedy the problem.
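The S-Process is hard to compress into a few lines, but quadratic funding is easy to sketch. Under idealized, collusion-free quadratic funding, a matching pool tops each project up toward the square of the sum of the square roots of the individual contributions, which systematically favors projects that many distinct funders agree on – exactly the projects that the Retrofunder’s Dilemma penalizes. A minimal sketch with invented numbers:

```python
# Minimal sketch of quadratic funding: each project is topped up toward
# (sum of sqrt(contribution))^2, so broad agreement among many funders
# attracts more matching than the same total from a single funder.
from math import sqrt

def qf_match(contributions):
    """Matching owed to one project under idealized quadratic funding."""
    ideal = sum(sqrt(c) for c in contributions) ** 2
    return ideal - sum(contributions)

# The same $300 total, spread differently:
print(qf_match([100, 100, 100]))  # three funders -> $600 of matching
print(qf_match([300]))            # one funder    -> $0 of matching
```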
An advantage of retrofunding is that the budgets typically get announced in advance and receive a lot of attention, probably more attention than statistics in an annual report.
Polis. The S-Process may become harder to use for a larger number of retrofunders, especially less sophisticated ones. But arguably the right kind of UI can still allow users who don’t understand marginal utility functions to benefit from it. If we do end up scaling to many retrofunders – or decide to treat self-funded researchers as retrofunders in their own right, or start an alternative market for only highly trusted impact certificates that is open to a larger number of retrofunders, or some other variation – then there’s another promising tool for us: the coordination mechanism Polis that was used in Taiwan. It seems plausible to me that we could limit the marketplace to research questions that come out of the Polis process: a large number of retrofunders enter their research questions and upvote or downvote all the others. The Polis algorithm then gradually highlights the questions that are most uncontroversially interesting to all market participants. Only some number of these are then entered into the market.
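The real Polis pipeline infers opinion groups from the raw vote matrix (dimensionality reduction plus clustering) and then surfaces statements with cross-group support. A toy sketch of just that final ranking step, with hand-labeled groups and invented vote fractions (the first two questions echo the examples above):

```python
# Toy sketch of the last step of a Polis-style process: rank research
# questions by cross-group support, so only questions that ALL opinion
# groups like rise to the top. Real Polis infers the groups from the
# vote matrix (PCA + clustering); here the two groups are hand-labeled
# and the approval fractions are invented.
votes = {  # question -> (approval in group A, approval in group B)
    "Prove or disprove that Vingean reflection is possible": (0.9, 0.8),
    "Contribute something, anything, to AI safety":          (0.9, 0.2),
    "Scale up a malaria-net distribution program":           (0.3, 0.9),
}

def consensus(a, b):
    return min(a, b)  # a question scores high only if BOTH groups like it

for q, (a, b) in sorted(votes.items(), key=lambda kv: -consensus(*kv[1])):
    print(f"{consensus(a, b):.2f}  {q}")
# Only the Vingean-reflection question survives; the partisan favorite
# of either single group scores low.
```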
Evidential Cooperation in Large Worlds may have recommendations for how to bargain in the acausal case, and these may be transferable to the causal case in certain communities.
Reselling. Funders may not always consume all the impact that they purchase but may want to hold some of it in the expectation of profiting from a resale to a future retrofunder. This introduces the opposite incentive to the one we’re worried about.
Against Defection by Sellers
No private trades. Retrofunders should generally not accept offers made to them in private but only those that are public on the marketplace. That prevents sellers from targeting particular retrofunders.
Automatic auctions. An automatic auction system like the one we want to use has the advantage that sellers can’t simply refuse better offers. Still, there might be multiple marketplaces, so a seller could choose the one used more by people with different values, or they could sell their certificate outside any market.
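The key property is that the marketplace, not the seller, selects the winning bid. A deliberately trivial sketch, with invented names and amounts:

```python
# Sketch of the "can't refuse better offers" property: the marketplace
# settles the sale at the highest bid, so a seller cannot steer the
# certificate toward a lower-bidding funder whose budget they would
# rather preserve. Names and amounts are invented.
bids = [("AlignedFund", 90_000), ("LessAlignedFund", 120_000)]

def settle(bids):
    """The marketplace, not the seller, picks the winner."""
    return max(bids, key=lambda bid: bid[1])

print(settle(bids))  # -> ('LessAlignedFund', 120000)
```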
Big projects. Impact markets should generally be used to finance projects that are big enough that they require seed investments. The profit motive of the investors counters the interests of the altruists running the project and will generally push toward making public offers.
Norms. What would be useful for prospective and retrospective funding alike would be to (1) do community-internal advocacy to strengthen moral cooperation norms, and (2) do community-internal education to make sure we all understand and agree on what counts as a defection. That way we’ll be more likely to feel bad about defecting and avoid it, and peers can pressure each other not to do it.
Paul Christiano has also thought about the problem: “By ‘norm’ I mean a rule that individuals can use for deciding how much to fund each public good. Here are two plausible desiderata for a norm: (1) If everyone always follows the norm, then we end up with the optimal levels of funding for the public goods. (2) If you start with a community that follows the norm and add a bunch of new people who behave manipulatively, they can never make the original community worse off.”
It stands to reason that either broader or narrower targets will lead to stronger norms or allow them to emerge more quickly; it would be interesting to investigate which.
Appendix
Original Formulation
When we were first asked about this scenario, it was in a configuration that we think is unlikely in practice. A fictional example with fictional actors:
Open Philanthropy Project (Open Phil): AI safety is 2x as valuable as international development.
Secretive organizer named Philip (Closed Phil): International development is 100x as valuable as AI safety.
Open Phil announces a contest: Do stuff we value, and we’ll reward you up to $1m!
Closed Phil invests $1m into international development, submits it to the contest, and offers the impact to Open Phil at $400k. Open Phil takes the offer: by their values, $400k spent directly on AI safety buys only the equivalent of $800k of international development impact, while the certificate represents $1m of it, so they would rather buy the certificate than some other AI safety impact.
Closed Phil reinvests the $400k into international development.
Rinse and repeat until Open Phil is out of money. Closed Phil has successfully leveraged all or most of Open Phil’s money, and Open Phil has invested comparatively little into AI safety.
After three iterations, > $1.5m may be invested into international development and < $500k into AI safety. If Open Phil had not announced any prize contest, $1m + $333k would’ve gone into international development and $667k would’ve gone into AI safety, which is closer to Open Phil’s preferences, so they don’t want to do a prize contest.
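A worked version of this arithmetic (the 40% resale price and the $1m figures are taken from the example above), run until the amounts become negligible rather than stopping after three iterations:

```python
# Closed Phil invests in international development (ID), sells the
# impact to Open Phil at 40% of face value, and reinvests the proceeds.
# Open Phil's spending converges to ~$667k, leaving ~$333k for AI safety.
budget = 1_000_000   # Open Phil's contest budget
invest = 1_000_000   # Closed Phil's first ID investment
id_total = 0.0

while invest >= 1:            # iterate until the amounts become negligible
    id_total += invest        # Closed Phil's money goes into ID
    price = 0.4 * invest      # Open Phil buys the certificate at 40%
    budget -= price
    invest = price            # Closed Phil reinvests the sale proceeds

print(f"into ID:             ${id_total:>12,.0f}")  # ~ $1,666,666
print(f"left for AI safety:  ${budget:>12,.0f}")    # ~ $333,334
```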
In this formulation it’s not clear why Open Phil would buy the impact, so we have a modus ponens/modus tollens type of situation. Clearly, Closed Phil is continually operating at a loss, so either they’re crazy or they’re interested in the impact and capable of producing it even without the retrofunding. Hence the counterfactual effect of the retrofunding is likely minimal, and Open Phil doesn’t want to fund something that is not counterfactually impactful. Maybe they’ll pay out once (perhaps at a still lower price) so as not to seem unpredictable to other participants and undermine trust in their retrofunding, but they’ll then adjust the purview of the contest to avoid this failure going forward, for example by using separate budgets for international development and AI safety or by focusing on AI safety alone.
An exception is if Closed Phil manages to deceive them, e.g., by lying about how much they invested or by claiming that they’re just cutting their losses because the project failed by their lights. As a pretext, they can make up some metric that the project didn’t meet but that Open Phil doesn’t care about.
The scenario can be rephrased to one where the deception is subtle and no one intentionally retrofunds nonoptimally. That’s the “defection by sellers” scenario above.
Another fun shower thought I had earlier today was about the problem where researchers (read: me) could be disincentivised to work on or publish something because, once it's published, prospective funders aren't going to pay for it. As usual, they'll reason that their purchase has no counterfactual impact if the thing is already done.
Solution: Do your work ahead of time and hold it hostage!
Ideally, I could just go up to a funder and tell them, "hand me X money and I'll publish or let you read this work I did." Now I can *demonstrate* the work before they pay for it, meaning trust and credentials have a lesser role to play. Unfortunately, if I show them my work before they pay, that obviates their incentive to pay. So I'm back to having to accept their reduced offer, which will be heavily penalised by uncertainty about my competence.
So instead, the potential funder(s) and I should pay a mutually trusted third party to valuate the work based on the funders' subjective values. The third party is committed to respecting IP.
Meh, it's dumb and I should go to sleep.
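For concreteness, a minimal sketch of the escrow flow described above; all names, numbers, and the payout rule are invented:

```python
# Minimal sketch of the third-party escrow idea: a mutually trusted
# appraiser reads the work and values it by the funder's own lights,
# and the funder pre-commits to paying that valuation before seeing
# the work itself. Everything here is invented for illustration.

def escrow_sale(work, appraise, funder_pay, publish):
    value = appraise(work)    # appraiser sees the work; the funder doesn't
    paid = funder_pay(value)  # funder pays based on the valuation alone
    if paid >= value:
        publish(work)         # only now does the work become public
        return True
    return False              # no deal; the work stays "hostage"

# Toy parties:
ok = escrow_sale(
    work="proof sketch of X",
    appraise=lambda w: 10_000,      # appraiser's valuation in the funder's terms
    funder_pay=lambda v: v,         # funder pre-committed to pay the valuation
    publish=lambda w: print("published:", w),
)
print("deal went through:", ok)
```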
Yes! Another reason to worry about narrow targets is costs of compromise*, and researchers are often better equipped to judge their own abilities, their interests, and which questions are highest-impact for them to research. Rarely will they find a prize that corresponds to what they can most effectively contribute.
Re Polis: It may help against the defect-defect coordination problem you talk about, but it doesn't solve the cost-of-compromise problem. The most upvoted research questions are still very narrow from the researcher's perspective.
*https://www.lesswrong.com/posts/DdDt5NXkfuxAnAvGJ/changing-the-world-through-slack-and-hobbies