The Leaky AI Safety Pipeline

May 19, 2026

At a glance

Many programs and fellowships exist in the AI safety community to get people interested in AI safety and build up their technical knowledge. However, there remains a substantial gap between being knowledgeable and being a credible researcher with published, peer-reviewed output. I share my own guidelines for breaking out as an independent researcher and argue for an online resource that matches aspiring researchers with collaborators and presents guides from successful established researchers.

This post was submitted to LessWrong and is targeted specifically to that community.

My first AI security paper as an independent researcher (with one other independent collaborator) was just accepted to the Security in Machine Learning Applications workshop at ACNS 2026. This was an 8-month process: I spent 2 weeks convincing myself that what I was trying to do was possible, 4 months building, evaluating, and collecting results, and another 4 months writing the paper (and making strategic mistakes) before submitting.

My paper was not radical or paradigm-shifting by any means, but I think getting through peer review and being told by other researchers that the contribution has value is a reasonable bar for becoming a “productive” researcher. If maximizing AI safety research output is our goal, then we ought to maximize the number of productive researchers.

How do we actually do that? I quite like this figure, taken directly from Chris_Leong’s earlier post on the pipeline toward becoming a productive researcher:

I want to focus specifically on the final two stages of this pipeline. Programs like BlueDot and SPAR can get people to understand the frameworks of AI safety, but the conversion rate into productive researchers with peer-reviewed contributions is a major leak in the AI safety research pipeline. There are several fellowships with substantial research output like MATS and other smaller groups, but these are notoriously selective and suffer from a serious shortage of mentors. Once you’re past this hurdle and get a paper through peer review, even at a small workshop, the credibility you gain and the networking opportunities you can access at an academic conference can turn AI safety research into a viable career.

It is not the case, as I have heard from many of my own friends in the EA community, that getting into a fellowship is the only way to pursue AI safety research. What I propose is that more people should commit to a high-collaboration approach to research, and that we should have scalable online BlueDot-style resources to outline the process and connect aspiring researchers with peer collaborators.

“But we already have Neel Nanda’s guide!” I hear you object. True, he advocates for an approach that worked for him. But this is not the approach that worked for me. And clearly it is also not a sufficient framework for many others, as the low conversion rate from BlueDot graduates to productive researchers remains a persistent issue, including among my own contacts.¹ The challenge is to turn research (the discovery and validation of novel ideas) into a scalable online curriculum like BlueDot, with minimal extra burden on existing mentors. Personally, I believe this can be done. My own guidelines for getting started in research look pretty different from Nanda’s (they are by no means comprehensive and not the point of my post):

  1. Do not work alone. Your thinking will be biased, especially if you rely too heavily on LLMs. LLMs can reinforce flawed premises and waste significant time, as I experienced firsthand. Just having another human being working with you is immensely valuable for self-calibration.²

    a. Nanda does note toward the end of his 16k-word guide: “Much easier than finding a mentor is finding collaborators, other people to work on the same project with, or just other people also trying to learn more about mech interp, who you can chat with and give each other feedback.” As far as I can tell, this is the extent of his argument for collaboration; the rest of the subsection addresses how to find collaborators. I think it is quite telling that one of my friends summarized his approach to me as “do short sprint projects alone.” I don’t believe he has sufficiently emphasized the importance of collaboration in research. In my experience, ~80% of the calibration that would ordinarily come from a mentor can be achieved just by explaining your thinking to an invested collaborator. Trying to “go it alone” has been hugely detrimental to my own research journey, and I would strongly advise anyone going down this path to at least try to find a collaborator first.

  2. Commit to the single problem you’re most interested in, and turn this into your “spike”. It takes decades to gain the knowledge of a professor, but only weeks or months to gain professor-level knowledge in a tiny slice of a field. Read as much as you can, reimplement papers, and think about simple extensions.

    a. In his Stage 3, Nanda argues: “I recommend working in 1-2 week sprints. At the end of each sprint, reflect and make a deliberate decision: continue, or pivot? The default should be to pivot unless the project feels truly promising.” But flitting between projects whenever things got difficult was exactly the problem that prevented me from making substantive contributions in the past. I had to learn to commit and work past failure instead of running from it. Failure is really not a signal to quit or pivot; it means you’re gaining valuable experience and insight about why the problem is hard. Stick with it and keep trying new ideas, and eventually you will have learned enough to start making tangible progress.

  3. Email the authors of the papers you like and propose simple extensions or validation/comparison studies to them. The less well-known the faculty, the better. Research is significantly easier with some mentorship, even if it is only high-level guidance. I did not do this, and was able to get by with only strategic guidance from people in entirely different fields. But you do need somebody who knows what they’re talking about to calibrate your thinking, and emailing the authors of the papers I based my own work on is something I absolutely should’ve done in retrospect.

    a. Nanda strongly argues for mentorship. I don’t disagree, but I recognize the reality that there aren’t enough mentors in AI safety research to go around. A less well-known faculty member in your rough research area is more than sufficient. Failing that, someone in a different field can still help you with strategy.

  4. Submit something, even if you expect to be rejected. My very first independent paper was completely outside my area of expertise and got soundly rejected by the Berkeley Economic Review. But I did this before my first in-domain paper to get comfortable putting my writing out there, get over my perfectionism, and get myself accustomed to failure and rejection. I got very valuable feedback and learned a lot about what the submission process looks like. I won’t promise that this is a good use of your time, but it helped me get over my submission anxiety.³

    a. Nanda also argues for the importance of writing up your work, but he notes blog posts, arXiv papers, a workshop paper, and a conference paper as viable options. I strongly disagree with the first two because of the lack of peer review from domain experts. Anybody can tell you that you did something cool, but reviewers at a workshop or conference can give you detailed technical feedback that’s very difficult to find otherwise. Peer review is basically free mentorship! Even if your paper is rejected, you will get incredibly useful signal on what the field values, where your thinking is flawed, and how you can improve for your next attempt. Moreover, there is basically no downside: the worst they can tell you is that your idea is bad or poorly defended.

    b. If you are accepted, it’s a massive boost to your credibility, and you get invited to a gathering of many other researchers who are interested in the same kinds of questions as you. This is a fast track to finding a good mentor! You don’t even have to submit a full paper; there are short papers, posters, position papers, and many other less competitive options.

I am still thinking through exactly what a “BlueDot-style resource” for turning people into productive researchers would look like. It won’t be a soapbox for my own approach to research; my point in outlining it is to show that “read Neel Nanda’s guide” is insufficient for many people, and we ought to aggregate different approaches from people who have actually broken out as independent researchers. Most importantly, we need to link people with collaborators who share the same interests. Collaborator matching alone could achieve an incredibly high return for a very low investment by mitigating the mentorship bottleneck and improving the quality of ideas.

I’d like to coordinate with 3-5 people to develop this resource. Comment if you’re interested or if you have thoughts on what it should look like.


  1. Out of roughly a dozen of my friends in the EA community who have gone through BlueDot, very few (including myself) have subsequently published peer-reviewed technical AI safety work. Many have applied to more competitive fellowships like MATS, but none have been accepted. All have expressed significant interest in doing more for the field, but have cited a lack of mentorship and guidance as their reasons for giving up on research. ↩︎

  2. Do not be grubby about authorship either. I do not believe there is any valor in sole-author papers. Bring on collaborators who are equally passionate about your project, even if they aren’t as technical, and use them to refine your thinking. ↩︎

  3. At the very least, it’ll help you calibrate your sense of how strong your work is. For my recent accepted paper, I believed it had a 70% chance of rejection. That was clearly false. For the previous rejected economics paper, I believed it had a 65% chance of acceptance. That was also clearly inflated. You build this skill by taking risks, making mistakes, and collecting more signal. ↩︎

