Summary: The AI Safety GiveWiki (formerly Impact Markets) has completed its third round of retroactive impact evaluations – just in time to provide updated recommendations for the giving season! Here is a reminder of how the platform works.
Want to donate? Open up the page of our top project(s), double-check that they are still fundraising, and ka-ching!
Interested in regranting? Check out our post on the (now) $700,000 that is waiting to be allocated.
Top Projects
Our top projects stand out by virtue of their high support scores. There are a lot of ties between these top projects, so we’ve categorized them into tiers.
Note that we determine the top projects according to their support. Further down we’ll cover how our latest evaluation round worked out. But the support scores are two hops removed from those results: (1) projects receive support in the form of donations, as a function of donation size, earliness, and the score of the donor; (2) donors get their scores as a function of the size and earliness of their donations and the credits of the projects they donated to; (3) projects receive their credits from our evaluators:
Project credits → donor scores → project support.
This mimics the price discovery process of a for-profit impact market. Hence it’s also likely that the scores are slightly different by the time you read this article because someone may have entered fresh donation data into the platform.
We have tried to find and reach out to every notable AI safety project, but some may yet be missing from our list because (1) they haven’t heard of us after all, (2) they’re not fundraising from the public, (3) they prefer to keep a low profile, (4) etc. But at the time of writing, we have 106 projects on the platform that are publicly visible and fundraising.
Ties for Tier 1
These are the projects at the very top! FAR AI and the Simon Institute, each with a support score of (at the time of writing) 213.
Ties for Tier 2
They all have a support score of 212. Such small differences in support are probably quite uninformative. New data or tweaks to our algorithm could easily change their rank.
Ties for Tier 3
Other projects with > 200 support
Note that, while we now market the platform to the AI safety community, really any project can use it, and some may even fare well! We may introduce other specialized GiveWikis in the future.
If you’re just here for the results then this is where you can stop reading.
Evaluation Process
Preliminaries
For this evaluation round, we recruited Charbel-Raphael Segerie, Dima Krasheninnikov, Gurkenglas, Imma Six, Konrad Seifert, Linda Linsefors, Magdalena Wache, Mikhail Samin, Plex, and Steven Kaas as evaluators. Matt Brooks, Frankie Parise, and I may also have pitched in. Some of them ended up not having time for the evaluation, but since some of our communication was under the Chatham House Rule, I’m listing them all anyway for added anonymity.
Our detailed instructions included provisions for how to score project outputs according to quality and impact; how to avoid anchoring on other evaluators; how to select artifacts to strike a compromise between comprehensiveness, redundancy, and time investment; how to evaluate projects using wiki credits; and some tips and arrangements.
Outputs are things such as the papers or hackathons that organizations put out. They can create one project per output on our platform, or they can create one project for the whole organization. Conferences cannot be directly evaluated after the fact, so what our evaluators considered were artifacts, such as recordings or attendance statistics. This distinction makes less sense for papers, which are their own artifacts.
The projects were selected from among those that had signed up on our website (though in some cases I had helped out with that), limited to those with smaller annual budgets (in the five or low six digits, according to rough estimates) and those that were accepting donations. The set of outputs was limited to those from 2023 in most cases, to keep them relevant to the current work of the project, if any. We made a few exceptions when there were too few outputs from 2023 and there were older, representative outputs.
We hadn’t run an evaluation round at this scale before. Previously we were three evaluators and could just have a call to sync up. This time everything needed to be more parallelizable.
Hence we followed a two-pronged approach with (1) evaluations of individual outputs using scores, and (2) evaluations of the AI safety activities of whole projects using our wiki credits. If one kind of evaluation fell short, we had another to fall back on.
Lessons Learned
Fast-forward four fortnights, and it turned out that there were too many outputs and too few evaluators, so only two outputs had been evaluated more than twice (and ten had been evaluated more than once). By this metric, AI Safety Support and AI Safety Events did very well, leaving the other projects in the dust by a wide margin – but those numbers were carried by the scores of just one or two evaluators, so they’re most likely in large part due to the Optimizer’s Curse.
Hence we decided not to rely on this scoring for our evaluation and to fall back on the credits instead. But the evaluations came with insightful comments that are still worth sharing.
Next time we’ll use credits only and, at most, list some outputs to help evaluators who are unfamiliar with a project’s work get an idea of its most important contributions.
Wiki Credits Ranking
These are the normalized average credits that our evaluators have assigned to the projects. As mentioned above, these determine how richly donors to these projects get rewarded in terms of their donor scores, which then determine the project support: Project credits → donor scores → project support.
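For illustration, here is a minimal sketch of what “normalized average credits” could look like computationally. Averaging per project across the evaluators who scored it, and then dividing by the grand total so the credits sum to 1, is my assumption; the evaluator names and numbers are hypothetical, and the GiveWiki may normalize differently.

```python
# Minimal sketch: average each project's credits across the evaluators who
# scored it, then normalize. Dividing by the grand total (so credits sum to 1)
# is an assumption; the GiveWiki's actual normalization may differ.
def normalized_average_credits(credit_assignments):
    # credit_assignments: {evaluator: {project: credits}}
    totals, counts = {}, {}
    for assignment in credit_assignments.values():
        for project, credits in assignment.items():
            totals[project] = totals.get(project, 0.0) + credits
            counts[project] = counts.get(project, 0) + 1
    averages = {p: totals[p] / counts[p] for p in totals}
    grand_total = sum(averages.values())
    return {p: round(avg / grand_total, 3) for p, avg in averages.items()}

# Hypothetical example with two evaluators:
print(normalized_average_credits({
    "evaluator_a": {"FAR AI": 80, "AI Safety Support": 40},
    "evaluator_b": {"FAR AI": 60, "AI Safety Support": 60, "Orthogonal": 30},
}))
```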
Qualitative Results
AI Safety Events
AI Safety Unconference at NeurIPS 2022: One of the evaluators attended it and found it highly valuable for networking, but (empirically) only for networking within the AI safety community, not for recruiting new people to the space.
ML Safety Social at NeurIPS 2022: One evaluator estimated, based on this modeling effort, that the social was about 300 times as impactful as the reference output (“AI Takeover Does Not Mean What You Think It Means”). The estimate was even higher for the safety unconference at the same conference.
Hence AI Safety Events had generally very high ratings. It is not listed among our top recommendations because we don’t have enough donation data on it. If you have supported AI Safety Events in the past, please register your donations! You may well move a good chunk of the (now) $700,000 that donors seek to allocate!
The Inside View
AI Takeover Does Not Mean What You Think It Means: This was our calibration output – it allowed me to understand how each evaluator was using the scale and to scale their values up or down accordingly. The evaluators who commented on the video were generally happy with its production quality. Some were confused by the title (Paul’s models are probably well known among them) but found it sad that it had so few views. The main benefit over the blog post is probably reaching more people, which hasn’t succeeded to any great degree. Maybe we need an EA/AIS marketing agency? I’m also wondering whether it could’ve benefited from a call to action at the end.
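As an aside, here is a minimal sketch of the kind of rescaling a shared calibration output makes possible. Dividing each evaluator’s scores by their score for the calibration output is my own illustrative assumption, and the example scores are hypothetical; the actual adjustment may have been more informal.

```python
# Minimal sketch: rescale each evaluator's output scores so that the shared
# calibration output maps to 1.0 for everyone. This exact rule is an
# assumption; the real adjustment may have been less mechanical.
CALIBRATION_OUTPUT = "AI Takeover Does Not Mean What You Think It Means"

def rescale_scores(scores_by_evaluator):
    # scores_by_evaluator: {evaluator: {output: score}}
    rescaled = {}
    for evaluator, scores in scores_by_evaluator.items():
        reference = scores[CALIBRATION_OUTPUT]
        rescaled[evaluator] = {output: score / reference
                               for output, score in scores.items()}
    return rescaled

# Hypothetical example: two evaluators who use the scale very differently
# still end up comparable after rescaling.
print(rescale_scores({
    "evaluator_a": {CALIBRATION_OUTPUT: 2, "Superalignment with Jan Leike": 6},
    "evaluator_b": {CALIBRATION_OUTPUT: 20, "Superalignment with Jan Leike": 50},
}))
```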
AI X-risk Research Podcast
Superalignment with Jan Leike: This interview was popular among evaluators, perhaps because they had largely already watched it. Some were cautious about scoring it too highly, simply because it hadn’t reached enough people yet. But in terms of content it was well regarded: “The episodes are high-quality in the sense that Daniel asks really good questions which make the podcast overall really informative. I think the particular one with Jan Leike is especially high-impact because Superalignment is such a big player, in some sense it’s the biggest alignment effort in the world.” (The episodes with Scott Aaronson and Vanessa Kosoy received lower impact scores but no comments.)
AI Safety Ideas
The website: “Seems potentially like a lot of value per connection.” The worries were that it might not be sufficiently widely known or used: “I think the idea is really cool, but I haven’t heard of anyone who worked on an idea which they found there.” And does it add much value at the current margin? “I couldn’t find a project on the site which was successful and couldn’t be attributed to the alignment jams. However, if there were some successful projects then it’s a decent impact. And I suspect there were at least some, otherwise Esben wouldn’t have worked on the site.” The evaluators didn’t have the time to disentangle whether people who participated in any Alignment Jams got some of their ideas from AI Safety Ideas or vice versa. All in all the impact scores were on par with the Jan Leike interview.
Orthogonal
Formalizing the QACI alignment formal-goal: Among Orthogonal’s outputs, this one scored highest on both quality and impact (with impact scores in between those of the three AXRP interviews above). Its quality scores were nonetheless lower than its impact scores: the evaluator found it very hard to read (while noting that it’s also just really hard to create a formal framework for outer alignment). The evaluator thinks that the whole QACI idea is very unlikely to work but highly impactful if it does. The other evaluated outputs were less notable.
Center for Reducing Suffering
Documentary about Dystopian Futures | S-risks and Longtermism: One evaluator gave the documentary a lower quality score than the reference output (“AI Takeover Does Not Mean What You Think It Means”) but noted that it “represents longtermism decently and gives an OK definition for s-risk.” They were confused, though, about why it was published on a channel with seemingly largely unrelated content (since the context of the channel will color how people see s-risks), and concerned that talking about s-risks publicly can easily be net negative if done wrong.
Avoiding the Worst - Audiobook: The audiobook got the highest impact rating among the CRS outputs even though an evaluator noted that they only counted what it added over the book – another way to access it – which isn’t much in comparison. (The book itself was outside of our evaluation window, having been published in 2022.)
FAR AI
Pretraining Language Models with Human Preferences: One evaluator was excited about this paper in and of itself but worried that it might be a minor contribution on the margin compared to what labs like OpenAI, DeepMind, and Anthropic might’ve published anyway. They mentioned Constitutional AI as a similar research direction.
Training Language Models with Language Feedback at Scale: While this one scored slightly lower quantitatively, the qualitative review was the same.
Improving Code Generation by Training with Natural Language Feedback: One evaluator was concerned about the converse in this case – that is, that the paper might’ve contributed to capabilities and hence had a negative impact.
Centre For Enabling EA Learning & Research (EA Hotel)
In general: “CEEALAR doesn’t have particularly impressive direct outputs, but I think the indirect outputs which are hard to measure are really good.” Or “the existence of CEEALAR makes me somewhat more productive in my everyday work, because it is kind of stress-reducing to know that there is a backup option for a place to live in case I don’t find a job.”
AI Safety Support
AI Alignment Slack: Invaluable for information distribution. One evaluator mentioned the numerous times that they found out about opportunities through this Slack.
Lots of Links page: “The best collection of resources we currently have,” but with a big difference between the quality and the impact score: “It could be better organized and more up to date (even at a time when it was still maintained).”
Epilogue
Want to regrant some of the (now) $700,000 aggregate donation budget of our users? Please register your donations! The GiveWiki depends on your data.
Are you already a grantmaker or regrantor for a fund? Use the GiveWiki to accept and filter your applications. You will have more time to focus on the top applications, and the applicants won’t have to write yet another separate application.
We’re always happy to have a call or answer your questions in the comments or by email.