Selected Essays: ML Alignment and Theory Scholars
Featured Essays
A Threat-First Framework for AI's Impact on Humanity
My most basic north star is harm reduction, especially that of people who are alive today and who live all around the world, including and especially the Global South...
Contextual Image Description Evaluation for Accessibility
My project addressed the evaluation methodology for image captions, especially those written by VLMs, for images on the Internet, particularly for accessibility purposes to blind and low-vision people...
China's AI Regulation: A 57.5% Forecast
I claim that over the next 5 years, the ruling government of the People's Republic of China will impose regulations that significantly impede domestic private sector development of advanced AI...
Office Hours: A Mentorship Matching Network for AI Safety
I was one of many Computer Science majors who applied to our AI Safety club simply because it has "AI" in the name. I'm now in SPAR, have a future internship from EAG, and help run our fellowships...
Red-Teaming as a Service: A For-Profit AI Safety Org
Advertising on the basis of profitability attracts a fundamentally different but still equally talented audience as nonprofit or principles-forward approaches to AI safety...
Leading the Autonomous Car Internship
In summer 2023, I headed a computer vision internship at the University of Rhode Island; our team of six had four weeks to build autonomous model cars that could navigate a color-delineated track...
Revolutionary Love and the AI Safety Talent Pipeline
At a BlueDot Impact panel on AI safety talent, mid-career attendees frustrated by selectivity weren't satisfied by tips to "produce legible output." I think revolutionary love — community over competition — is the lever the field is missing...
A Beijing–Washington Hotline for AI Risk
The longstanding US-China rivalry has sweeping and sometimes tragic effects on scientists. If geopolitical tensions over the AI race flare, will we be lucky enough to have another Arkhipov or Petrov?...
01 · A Threat-First Framework for Ranking AI's Impact on Humanity
Prompt: Imagine each of the options below are fully achievable. Which would be the most beneficial for the long-term flourishing of humanity? Rank from top (most beneficial) to bottom (least). Explain the framework you used to arrive at your top three ranked outcomes. Please address all three of the following: What's your north star? What criteria did you apply? Walk through your top three through the lens of those criteria.
- Building oversight mechanisms that detect when AI systems pursue unintended goals
- Minimizing unrestricted access to state-of-the-art AI chips
- Building defenses against AI-enabled biological threats
- Detecting and labeling AI-generated misinformation and synthetic media
- Ensuring fair compensation for workers displaced by AI automation
- Reducing the environmental impact of AI data centers
- Protecting creators' work from unauthorized use in AI training data
- Accelerating AI applications in scientific research
- Building AI-powered robotic systems that can learn from physical interaction with their environment
- Reducing the computational cost of training frontier AI models
My most basic north star is harm reduction, especially for people who are alive today and who live all around the world (including and especially the Global South). I do think that, especially with new developments like Claude Mythos, the issue of scalable oversight (my top choice) essentially subsumes most other technical problems in AI safety, much as every problem in NP reduces to an NP-complete one. My ranking also reflects a general lack of trust in both AI and human agents, keeping in mind the sobering fact that capabilities are advancing far faster than we can even measure, let alone control (as the ongoing saturation of METR's time horizon metric currently shows).
I ordered the options by three broad categories: "direct threats" (top 3–4), "rippling threats" (next 3–4), and finally "positives" (the rest). Although positive advancements would obviously help humanity and presumably solve global issues, I don't believe they carry the same urgency as the threats posed by extremely capable AI models, which could actively cause mass death through misalignment or bad actors. Thus the top options address threats with a high potential to actively cause death through misused power; the middle ones address threats that could greatly lower quality of life in the longer term, especially for marginalised communities; and the final ones solve extant problems but are less risks that need to be mitigated than perpetual goals we get to pursue once we mitigate the risks.
(I think misinformation sits between the first two categories, since I personally feel most people underestimate the severe effects on a society of mis- and mal-education, especially relating to matters of propaganda, war, job skills, and general social harmony/function.)
The top three choices most definitely fit into the first category, though, which is why they top my list; I would say they're ordered from least to most predictable. Although AI-enabled biological pandemics would of course be incredibly bad, virology, immunology, epidemiology, etc. are fairly well-developed fields, and after COVID we have some level of infrastructure in place to protect against such threats. But rogue human agents misusing AI (#2) and rogue AI agents (#1), which would be even smarter than the humans, are extremely difficult to predict, as well as probably being unthinkably powerful. I think the existence of weapons of mass destruction and the dependence of our modern economy and world on digitised infrastructure leave the world insanely vulnerable.
02 · Contextual Image Description Evaluation for Accessibility
Prompt: Tell us about a project, problem, or research question you've spent more than 20 hours on in the past 12 months. Please address each of the following separately in your response: What were you trying to figure out, and why did it matter? What did you actually do? What surprised you? What did you do wrong, or what turned out to be suboptimal in retrospect? What's still unresolved or unknown?
My project addressed the evaluation methodology for image captions, especially those written by VLMs, for images on the Internet, particularly for accessibility for blind and low-vision (BLV) people. Current referenceless image description metrics like CLIPScore (a metric built on OpenAI's CLIP model that measures the semantic alignment between an image and an associated piece of text) only measure image-text compatibility, i.e. whether a description is visually accurate. However, a description can be visually accurate while either failing to describe what users would actually notice or failing its contextual communicative purpose. Kreiss et al. (2022) establish context relevance as an unsolved problem with CLIPScore, which only measures compatibility; I proposed that description quality decomposes into three independent axes: compatibility, context relevance, and saliency (i.e. how conspicuous or noticeable certain objects or features in an image tend to be to human viewers). This had implications for the work of Kapur et al. (2026), who use CLIPScore as a direct measure of description specificity; I expanded on this work by also introducing better-defined metrics for usefulness and accessibility to human readers. Because my findings showed a comparative bias in favour of sighted users when an LLM-as-judge method was used, both the novel decomposition of caption quality along several independent axes and the examination of evaluations from a disability justice perspective matter as we try to embed justice and equity into digital spaces.
I spent about 30 hours on this project across the latter half of this academic term. I designed and implemented several referenceless end-to-end NLP/CV evaluation pipelines, with methods grounded in literature review and validated using human subject rating data on a specially curated dataset of images, associated contexts, and human-written sample captions. I used PyTorch and pandas to compute CLIPScore, BERTScore recall, and a saliency alignment metric that I designed myself using both heatmap visualisation (grounded in human eye-tracking data) and bounded object detection. With these, I argued conceptually and showed empirically that the three axes are independent (i.e. the compatibility of a caption with its image, the number of salient objects it describes, and its relatedness to surrounding context are all independent). I presented this as an internal finding to my lab and intend to continue the work and eventually submit it for publication.
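For readers unfamiliar with the metric, here is a minimal sketch of the CLIPScore formula itself, assuming the standard definition from the original CLIPScore paper: w · max(cos(image, text), 0) with w = 2.5. It is not the pipeline described above. The embedding vectors are hypothetical toy values standing in for real CLIP image and text embeddings, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore: rescaled cosine similarity, floored at zero."""
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


# Hypothetical 3-d embeddings (real CLIP embeddings have hundreds of dimensions).
image_emb = [1.0, 0.0, 0.0]
aligned_caption_emb = [1.0, 0.0, 0.0]    # points the same way as the image
unrelated_caption_emb = [0.0, 1.0, 0.0]  # orthogonal to the image

print(clip_score(image_emb, aligned_caption_emb))
print(clip_score(image_emb, unrelated_caption_emb))
```

Note that nothing in the formula refers to the surrounding context of the image or to which objects a viewer would notice, which is why compatibility alone cannot capture the other two axes.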
One thing that surprised me was that BERTScore (which quantitatively measures token-level semantic alignment) aligned significantly more closely with the preferences of BLV users rating sample captions, while GPT-4o evaluation (LLM-as-judge) aligned significantly more closely with the preferences of sighted users. The reason is not yet known, but my hypothesis is that BLV users depend more on lexical content while sighted users can compensate with visual information (and that models like GPT somehow "think like sighted people" while underrepresenting BLV perspectives).
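One common way to quantify how closely a metric tracks a group's preferences is Spearman rank correlation between metric scores and that group's ratings. The essay does not say which statistic was used, so the following is only a sketch of that approach. All numbers are made-up toy values, not data from the study, and the group labels are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for five captions (illustrative only, NOT study data).
metric_scores = [0.10, 0.40, 0.35, 0.80, 0.90]  # e.g. an automatic metric
ratings_group_a = [1, 3, 2, 4, 5]               # hypothetical rater group A
ratings_group_b = [5, 4, 3, 2, 1]               # hypothetical rater group B

print(spearman(metric_scores, ratings_group_a))
print(spearman(metric_scores, ratings_group_b))
```

Computing the same statistic separately per rater group, as here, is what makes a group-level difference in metric alignment visible. The same function can also be used to check whether two metric axes are correlated with each other.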
Probably the most suboptimal thing in retrospect was the way I designed the saliency pipeline; it showed lower correlation than expected with human-rated caption quality.
It is also still unclear to me how much of that low correlation reflects saliency genuinely having little influence over human preferences and how much reflects failures in my engineering of the saliency pipeline; this is probably the biggest open issue I'd like to address in future work.
Works Cited
Kapur, Karren, et al. "Referenceless Evaluation of Image Descriptions via Discriminativity." arXiv preprint arXiv:2601.04609, 2026.
Kreiss, Elisa, et al. "Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics." Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2022, pp. 4912–4930.
03 · China's AI Regulation: A 57.5% Forecast
Prompt: Make a claim about advanced AI over the next 5 years that thoughtful, knowledgeable people would disagree about (i.e., don't state obvious claims). The claim should be strategically meaningful. State the claim precisely enough to be checkable 5 years from now. What probability do you assign to it? Why? What are some sub-questions or mechanisms that your estimate depends on? Name a specific observation in the next 12 months that would update you ≥20 percentage points up, and a separate observation that would update you ≥20 points down.
I claim that over the next 5 years, the ruling government of the People's Republic of China will impose regulations that significantly impede domestic private-sector development of advanced AI before Chinese frontier models match the capabilities of contemporaneous frontier models from the United States of America. I assign this event a 57.5% probability.
This claim rests on three questions: the incentives that the Chinese government does or does not have to develop advanced AI, its conceptions of AGI and AI safety, and my understanding of Chinese politics and its relationship with the market. First, I have come to understand that some Chinese citizens believe the ruling party of the PRC has little incentive to encourage development of super-advanced AI systems, primarily because this would destabilise its political power structure. Because governance in the PRC is centralised and highly regulated, particularly regarding digital spaces, capabilities that the party cannot scalably oversee ultimately threaten its interest in maintaining established political power and social order/harmony.
Additionally, government officials in any country understand AI in ways only questionably similar to those of its tech leaders. I believe that the CPC does not use the same frameworks for AI capabilities as Chinese or American tech leaders do. Its politics are not directly motivated by the forces of capital as politics are in the United States, and so it often operates antagonistically to tech leaders, billionaires, and other private institutions. Especially as the CPC has a vested interest in maintaining/restarting China's economic boom, I at the very least don't believe that it operates and forecasts in terms of X-risk, nor that an understanding of AI safety as something focused specifically on capabilities matters as much to it as AI's economic and socio-symbolic impact on the populace. In other words, its understanding of advanced AI may be entangled enough with established sectors (e.g. robotics, networking, IoT), as opposed to notions of AGI, that its goal in the advanced AI race wouldn't depend on accelerating capabilities alone; nor is it incentivised to let these capabilities, once achieved, lie in the hands of individuals and private companies.
As such, I find it somewhat likely that the CPC regulates private companies as a baseline, especially in response to rapidly shifting economic conditions, such that their capabilities cannot keep pace with those of advancing American AI. This probability is middling, though, because any one of these conditions might change, including through paranoia about losing the AI race or the development of stronger AI safety infrastructure in the PRC.
I'd update 20 percentage points up on this claim if the CPC cracked down on domestic tech leaders in AI as it did on Jack Ma in 2020, as this would confirm the strength I'd anticipated of its priority list. I'd update 20 percentage points down if it were revealed to have funded private-sector industrial espionage against American frontier labs, say, model weight theft from Anthropic on behalf of DeepSeek.
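To make the size of these updates concrete, here is a small worked calculation, offered as a sketch rather than as part of the forecast itself. It converts the stated 57.5% prior and a 20-point shift into the Bayes factor (likelihood ratio) that each observation would need to carry. The function names are illustrative, and the odds-form framing is an added assumption about how one might read the updates.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


def implied_bayes_factor(prior, posterior):
    """Likelihood ratio that moves `prior` to `posterior` under Bayes' rule."""
    return odds(posterior) / odds(prior)


PRIOR = 0.575  # probability stated in the essay
SHIFT = 0.20   # 20 percentage points, the update size the prompt asks for

up_factor = implied_bayes_factor(PRIOR, PRIOR + SHIFT)    # 57.5% -> 77.5%
down_factor = implied_bayes_factor(PRIOR, PRIOR - SHIFT)  # 57.5% -> 37.5%

print(f"update up:   Bayes factor ~ {up_factor:.2f}")
print(f"update down: Bayes factor ~ {down_factor:.2f}")
```

Read this way, the crackdown observation would need to be roughly two and a half times as likely in worlds where the claim is true as in worlds where it is false, and the espionage observation roughly that much less likely.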
04 · Office Hours: A Mentorship Matching Network for AI Safety
Prompt: Describe the organization or project you want to found. It should be related to AI safety and/or biosecurity. What problem does it address? Why now? What's the cheapest (in terms of time and money) possible test of whether it should exist?
I was one of many Computer Science majors who applied to our AI Safety club simply because it has "AI" in the name. I'm now in SPAR, have a future internship from EAG, and help run our fellowships, thanks mostly to enlightening conversations with high-context upperclassmen who gave me necessary foresight, support, and encouragement. I also personally know a lot of very talented and smart people my age who were turned away from AI safety and EA as a whole by very minor rejections and lack of bandwidth: a rejection from my own university AI safety club, or from an advising call with 80k Hours. Given my own success story, I believe extremely firmly in the simple power of organising and motivation on the scale of individual attention. In my experience, even if capacity is limited, the fact of an acceptance rather than a rejection is enough to inspire people towards a cause, especially younger people who are competing for every single job, position, social experience, and general opportunity in sight in today's world. Thus, I propose an "Office Hours" matching network, hosted by a nonprofit (whose name I'd still be working on), whereby high school and university students (especially those with little exposure to AI safety) can receive 1:1 mentorship calls with industry professionals, student organisers, and other high-context individuals. Even a short chat with such people could carry immense weight, helping students find both opportunities and the confidence to pursue them.
Many clubs and alumni programs at my school already offer such 1:1 chats with industry professionals; I've found them useful as a participant, but I consistently can't go deep on career advice because I keep hearing "I'm not in research", "I'm not in AI/ML", or "I don't know anything about AI safety." I'm lucky to have access to experienced peers to fill this gap, but most students might not have that chance. Scaling the reach of high-context people, from mentoring maybe only 2–3 people deeply to having meaningful 30-minute conversations with 20+ students/year, could multiply otherwise scarce attention. It would also be extremely low-cost; in a similar fashion to SPAR, a few organisers could bridge the talent gap by enabling networking, mentorship, and even simple human connection between the brightest minds within and outside of AI safety. Again, university students in particular are a primary and growing demographic for targeted advertisement of opportunity; in a world that may feel hopeless and barren of possibility, the inspiration to make change and the invitation to contribute however one can are possibly the single greatest motivator. As most AI safety orgs are starting to recognise, generalist bandwidth especially is quite necessary in AI safety, and I think that dedicated low-cost initiatives to scale the field's talent and attract attention to the cause would have extensive impact on downstream research, projects, and careers.
Probably the cheapest test of whether it should exist would be free: advertise 1:1 mentorship in prestigious AI careers from professionals at companies like OpenAI and Anthropic, and see the sheer number of signups from students at my university alone. We already get speakers from these companies and collaborators in our organising efforts willing to make time for our club members; this wouldn't be much further of a stretch, although dedicated nonprofit facilitation would serve its purpose and scale far better than one single club can. Much like my school club did for me, it could pipeline students otherwise on the periphery of the field into campus/local groups, advanced programs, and a supportive community in which to collectively thrive.
05 · Red-Teaming as a Service: A For-Profit AI Safety Org
Prompt: From Eric Ho's post about possible for-profit AI alignment org ideas, find a need or idea that you think is interesting and sketch an org idea around it. What's the theory of impact? What's the crux determining whether you're successful? Is this best-suited to be a non-profit or a for-profit org? Why?
From Eric Ho's post about possible for-profit AI alignment org ideas, the sentiment I find most true is that advertising on the basis of profitability attracts a fundamentally different but equally talented audience compared with nonprofit or principles-forward approaches to AI safety. While I still believe that principles-forward approaches are extremely important and would probably continue to make up the bulk of impact, I have also seen in my personal life both denial of the concept of X-risk and extreme apathy towards it as unrelated to climbing the American socio-economic ladder. As such, I believe there is a place for for-profit orgs to interact with nonprofits, frontier labs, and the government such that top talent is not wasted in quant trading or the MIC. This is also, in summary, my theory of impact: there are many people who, starting as early as college or high school, aim simply to maximise their incomes, applying what they know are their own exceptional talents to causes that may actively bring harm into the world and actually increase extinction risk. I think that establishing strong for-profit organisations that recruit from such talent pools, such as evals/auditing and red-teaming contractors, could be extremely beneficial here, as this work depends directly on innovation and out-of-distribution problem-solving. Even if these orgs end up contracting primarily for nonprofits or even governments, I think that the minor change in structure would allow them to take basically the same cooperative place in the AI safety ecosystem as a nonprofit would while attracting what is probably far more competitive talent. Particularly in cases like red-teaming, people who might otherwise act maliciously for financial gain could be incentivised to do the same kind of work for good instead.
As such, business models like evals and red-teaming as a service could become an extremely beneficial resource for frontier labs and governments in particular, without requiring them to uproot their current testing pipelines and employee composition.
Probably the crux determining whether this initiative succeeds would be whether it can offer competitive salaries and career pathways to "10x engineers" or other top performers in relevant fields. I would think that suitable indicators for the concept being proven or disproven are hiring statistics and the revenue the company generates, which reflects market pressures and other organisations' financial willingness to have this need filled by (hopefully) cream-of-the-crop engineers.
06 · Leading the Autonomous Car Internship
Prompt: Describe a concrete instance of you taking ownership of a project, program, or organization-level effort and getting something shipped. Include the scope, your role, the outcome, and what was suboptimal in retrospect.
In summer 2023, I headed a computer vision internship at the University of Rhode Island; our team of six had four weeks to build autonomous model cars that could navigate a color-delineated track. The team ranged from seasoned programmers and robotics veterans to artists and first-time coders. I structured our collaboration around these experiential gaps, creating open space for casual yet dynamic discussion, pairing less-experienced members with stronger ones on specific bottlenecks, and honouring the intellectual risk of experimental ideas with rigorous iteration alongside established methods. When we hit motor stops and navigation failures mid-project, integrating feedback from cross-background conversations produced the redesign that fixed them.
As I also organised most of the logistics of our internship, including funding, scheduling, catering, and transportation, I found myself naturally co-optimising our workflow for both flexible productivity and spontaneous fun. Revamping our meal plans for vegetarian accessibility enhanced creativity through inclusion, giving everyone the confidence to contribute. Rotating carpool groupings generated interdisciplinary build ideas that might never have surfaced organically in-lab. Paid time off for our intern retreat (a day trip to the Newport Mansions) rejuvenated spirits enough to finish a whole data processing pipeline the following Monday.
What was suboptimal in retrospect was probably too little focus on structure. Although my team members delivered spectacular results, I do think that enforcing whole-team meetings and collaboration times would probably have benefitted the cohesion and vision of the project. Especially as I was doing a lot of the intern work myself, I recognise that I also got caught up in the individual engineering problems I was solving rather than the grander vision or executive direction of the project; had I or someone else stepped up and taken on more explicit managerial responsibilities, it probably would've helped us bring our various efforts together into something more seamless.
Overall, though, this balance of rigorous investigation and active listening, of independence and community, allowed us to ultimately deliver an impressive product: a scalable system that completed challenging tracks with a 95% success rate. I'm personally most proud of how my highly adaptive, peer-to-peer leadership style facilitated the success of a diverse team while also building a long-lasting camaraderie.
07 · Revolutionary Love and the AI Safety Talent Pipeline
Prompt: What do you think is the most important lever for making sure AI goes well for humanity?
"'Revolutionary love' is the choice to labor for others, for our opponents, and for ourselves in order to transform the world around us" (Revolutionary Love Project).
At the BlueDot Impact panel "AI Safety Needs Generalists — Here's How to Get In", held on May 22, 2026, the attendees (and, I'd assume, the panelists, too) expressed an array of mixed emotions. As one can probably infer from the title, the panelists explained the urgent capacity bottlenecks that the field of AI safety is experiencing in terms of talent and organising capacity, and proposed a limited number of opportunities for people to apply to so that they can compete to fill that gap. As one might expect, some attendees, especially many of those who'd recently been rejected from the inaugural class of the generalist-focused Generator Residency, grew somewhat frustrated at these sentiments in the Zoom chat due to the apparent contradiction: how can AI safety recruiters claim to be so strapped for talent, and yet be seemingly hyperselective about who they let onto their teams? Particularly as many of the attendees were mid-career professionals attempting to pivot into AI safety, initial answers about showcasing previous cause-related work also didn't seem exactly helpful. Panelists proceeded to give suggestions about producing legible output as opposed to "LinkedIn slop" and demonstrating value and truth alignment (i.e. recognising that X-risk is real rather than believing "cope" that it will probably be solved somehow). However, although I didn't personally comment in the chat, I (and I'm sure many others) noticed that these answers didn't actually solve the problem. They were tips on how to distinguish yourself from other applicants to the position you want (helpful tips, sure), but they didn't answer why the position is so selective in the first place. As a result, some attendees' gripes and grumbles remained unresolved by the hour-long panel's end.
I've seen similar situations play out in my own life, too. One of my friends at my university expressed genuine interest in our AI Safety club, investing over two hours in the "intro to AIS" fellowship application, only to lose that interest completely once he was rejected (and start working on standard MLE instead). Another actually did make it in and has even attended board meetings, but he's been rejected from the external AI safety fellowships he's applied to, and is wrapped up in his part-time SWE internship. Even my own sister, a chronically offline med school student who I had no idea had ever heard of Effective Altruism, told me randomly that she'd actually applied for an 80k Hours advising call, only to be rejected and then never touch the movement again. (That last one took me by total surprise!)
As an AI Safety @ UCLA organiser who went from never having trained an AI model to SPAR participant within 6 months, I know there are definitely cases where the current talent pipeline absolutely succeeds; I'm totally invested in AI safety as a field and a career. I've also faced an arguably greater amount of rejection than my loved ones have, though; differences in outcomes between us can't be chalked up to simple (and slightly demoralising) concepts like competence, intelligence, or "legible impressive-ness." I'd say by far the biggest motivator for me to start doing ML research, apply to programs like SPAR and MATS, attend EAGs, and generally immerse myself in the context of the field has been the connections and genuine mentorship I've received from senior organisers in my university club. Encouragement, advice, networking opportunities, and most of all reminders to maintain perspective in the face of rejection, all coming from people I trusted, were the foundation for my motivation to continue my efforts in the field. One could say that I was already value-aligned and primed to do AI safety work, but I'm inclined to call this value alignment "revolutionary love." It captures the desire to labour and use my time and position to create revolutionary change with respect to daunting global risks and harmful structures, with the grit, determination, and aspirational altruism best characterised by a "loving" spirit for both current and future generations.
Conversely, I'm also inclined to think that revolutionary love on the side of organisers (people who can provide the mentorship, guidance, and unwavering support that I so enjoyed) is something the AI safety talent pipeline desperately needs. Especially as people increasingly scramble for impactful opportunities and experiences in the face of an extremely uncertain job market and world, I believe that revolutionary love, forged by community and human connection as opposed to hyper-optimising competition, will draw more deserving talent to the movement than monetary or careerist incentive structures alone. I think that the average person is more value-aligned than many AI safety people believe; empirically, the prospect of "doing good" is almost never seen as a negative by prospective applicants. However, with financial incentives and the attention economy competing for the same people, poor retention amidst a slew of (albeit often necessary) rejections does more harm to making AI go well for humanity than I think people care to admit. People today have existential and practical longings to be part of causes greater than themselves; I believe that focusing on expanding personable, accessible, and inertial movement-building will both reach people on the brink of meaningful contribution and leverage the psychology of community for dedicated long-term career retention. I don't see this lever as a panacea by any means, nor would I promise that every single attendee at that meeting can secure a stable position in AIS field-building. However, I think impressions of the panel might have been far more positive if panelists' answers had focused on such a sense of community, proposing that we make space for people who want to help rather than advising attendees on how best to step on and outcompete each other for false scarcity.
The movement must possess, in the words of Antonio Gramsci, a "pessimism of the intellect, optimism of the will", and prospective founders, field-builders, researchers, policymakers, strategists, and those in every other field of AIS must cultivate this optimism of the will on both sides of the talent pipeline.
Works Cited
Revolutionary Love Project. https://revolutionarylove.org/about/
08 · A Beijing–Washington Hotline for AI Risk
Prompt: Pick a governance intervention for advanced AI that you think is genuinely desirable but currently hard to implement. Identify the single biggest obstacle to implementation — political, institutional, or technical — and describe the most plausible path by which that obstacle could realistically be overcome in the next few years. (300 words)
Jane Ying Wu was a Chinese-born American neuroscientist who served as a professor and researcher at Northwestern University. After extensive investigation by the NIH for ties to alleged economic and intellectual espionage (with which she was never formally charged), Wu's lab was closed in May of 2024. On July 10 of the same year, she committed suicide in her home (Nature).
A 2023 study found that among Chinese American scientists, 72% did not feel safe as academic researchers and 65% were worried about collaborations with China (PNAS). As an Asian American Studies major, I find it obvious that the longstanding rivalry and deep-seated mistrust between the US and China have sweeping and sometimes tragic effects. I believe that this ingrained mistrust of Chinese people makes experts afraid of necessary cooperation and could easily lead to catastrophe; if geopolitical tensions over the AI race flare, will we be lucky enough to have another Arkhipov or Petrov?
In the spirit of the Moscow-Washington hotline set up after the Cuban Missile Crisis, I propose an idealised Beijing-Washington hotline, leveraging mutual interest in preventing MAD for collaborative risk assessment and for mitigating misinterpreted hostilities. Housing both ends of this hotline could create a controlled (but not surveilled) collaboration between Chinese and American academics, who are the experts outputting the research and recommendations that actually shape AI. Humanising Chinese scientists in the US (and vice versa) whilst restricting work to safety rather than capabilities (e.g. evals and taxonomies as opposed to "distillable" model weights) would build a culture of transparent trust and collaboration. The biggest obstacle would be finding the needed experts, especially Chinese Americans, who are willing to bridge international divisions despite the fear of investigation and suspicion. My hope is that transparency-as-protection, prioritising regulated, typical institutional bureaucracy over punishment and scrutiny, can encourage mutual risk mitigation, sustain cultural goodwill, and ensure a case like Jane Ying Wu's never happens again.
Works Cited
"Academics demand apology for scientist investigated for China ties but never charged." Nature. https://www.nature.com/articles/d41586-026-01113-7
"Caught in the crossfire: Fears of Chinese–American scientists." PNAS. https://www.pnas.org/doi/10.1073/pnas.2216248120