At one point, it declared, out of nowhere, that it loved me. The central principle of rationality is to figure out which observational signs and logical validities can distinguish which of these two conceivable worlds is the metaphorical equivalent of believing in goblins. I think that people who try to do thought-out philanthropy, e.g., Holden Karnofsky of GiveWell, would unhesitatingly agree that these are both conceivable worlds we prefer not to enter. Will they have anything resembling sexual desire? This is Mind Design Space, the set of possible cognitive algorithms. 2. Does pushing for a lot of public fear about this kind of research, the kind that makes all projects hard, seem hopeless? The dystopian visions are familiar to many inside Silicon Valley's insular AI sector, where a small group of strange but influential subcultures have clashed in recent months. I am not sure how to go about doing that either. However, one sentiment I saw was that optimists tended not to engage with the specific arguments pessimists like Yudkowsky offered. I would potentially be super interested in working with Deepminders if Deepmind set up some internal partition for "Okay, accomplished Deepmind researchers who'd rather not destroy the world are allowed to form subpartitions of this partition and have their work not be published outside the subpartition, let alone Deepmind in general, though maybe you have to report on it to Demis only or something." I'd be more skeptical/worried about working with OpenAI-minus-Anthropic, because the notion of "open AI" continues to sound to me like "what is the worst possible strategy for making the game board as unplayable as possible while demonizing everybody who tries a strategy that could possibly lead to the survival of humane intelligence?", and now a lot of the people who knew about that part have left OpenAI for elsewhere. AI systems can also impersonate you in a completely convincing manner, circumventing systems that demand human presence. It's pretty easy to formally represent that kind of physical information (it's just a more careful version of what engineers do anyway). Essentially, only use AI in domains where unsafe actions are impossible by construction. Yudkowsky: You can run into what we call "The Valley of Bad Rationality." It's easy to see how to constrain limited programs (e.g. ...). Eliezer, thanks for doing this! He's been flitting around for the past few years, Cassandra-like, insisting that their plans will explode and they are doomed. Because it seems to me that faults of this kind in the AI design are likely to be caught by the designers earlier. "I want to live one more day." We're going to be facing down an unalignable AGI, and the current state of transparency is going to be "well, look at this interesting visualized pattern in the attention of the key-value matrices in layer 47," when what we need to know is "okay, but was the AGI plotting to kill us or not?" Experts explain that GPT-4 developed these capabilities by ingesting massive amounts of data, and most say these tools do not have a humanlike understanding of the meaning behind the text. I still feel the pain, and the anger, and the emptiness, and the helplessness. And then it ends up making no difference because your civilization failed to solve the AI alignment problem, and all the children you saved with those malaria nets grew up only to be killed by nanomachines in their sleep. We're just well-matched.).
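Here is a minimal Python sketch of the "unsafe actions are impossible by construction" idea mentioned above; the `Action` enum, `execute` wrapper, and setpoint bound are hypothetical names of my own, not anything from the text, and a real system would derive its whitelist and bounds from the kind of formally represented physical information the passage describes.

```python
from enum import Enum, auto

class Action(Enum):
    """Closed, human-audited action space: nothing outside it can be expressed."""
    READ_SENSOR = auto()
    ADJUST_SETPOINT_SMALL = auto()   # bounded, reversible change
    HALT = auto()

# Bound checked before anything reaches an actuator (illustrative value).
SETPOINT_LIMIT = 0.5

def execute(action: Action, magnitude: float = 0.0) -> str:
    """Only whitelisted actions with in-range parameters ever run."""
    if not isinstance(action, Action):
        raise ValueError("unrepresentable action rejected")
    if action is Action.ADJUST_SETPOINT_SMALL and abs(magnitude) > SETPOINT_LIMIT:
        raise ValueError("parameter outside verified envelope")
    return f"executed {action.name} with magnitude {magnitude}"

# The AI component can only *choose among* these actions; it cannot mint new ones.
print(execute(Action.ADJUST_SETPOINT_SMALL, 0.2))
```

The point of the pattern is that the AI component only ever selects among pre-verified actions; anything outside the enum or outside the parameter envelope is unrepresentable or rejected before it reaches an actuator.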
I don't think DM and OpenAI are publishing everything; the "not going to be published" part doesn't seem like a big barrier to me. In the same way, suppose that you take weak domains where the AGI can't fool you, and apply some gradient descent to get the AGI to stop outputting actions of a type that humans can detect and label as manipulative. That would depend on how deep your earlier patch was. And then there is, so far as I can tell, a vast desert full of work that seems to me to be mostly fake or pointless or predictable. But on March 22 an open letter was published, signed by a variety of luminaries including Steve Wozniak, Yuval Harari, Elon Musk, Andrew Yang and others more closely tied to artificial intelligence ventures. That's why I write about human rationality in the first place - if you push your grasp on machine intelligence past a certain point, you can't help but start having ideas about how humans could think better too. I don't see being able to take anything remotely like, say, MuZero, and being able to prove any theorem about it which implies anything like corrigibility, or the system not internally trying to harm humans. Querying your own human brain works fine, as an adaptive instinct, if you need to predict other humans. https://www.youtube.com/watch?v=2RAG5-L9R70 It looks especially amenable to interpretability, formal specification, and proofs of properties. I'm curious if the grim outlook is currently mainly due to technical difficulties or social/coordination difficulties. You seem to be visualizing that we prove a theorem and then get a theorem-like level of assurance that the system is safe. The argument: For years, ethicists have warned about problems with larger AI models, including outputs that are biased against race and gender, an explosion of synthetic media that may damage the information ecosystem, and the impact of AI that sounds deceptively human. On "doing work that is predictably not going to be really useful at the superintelligent level, nor does it teach me anything I could not have said in advance of the paper being written": I think you're underestimating the value of solving small problems. An AI could clearly be good at manipulating humans, while not manipulating its creators or the directives of its creators. WE care! (The better-specified the tasks you automate are, the easier it is to secure the boxes.) Unless we postulate that I have literally magical powers or an utterly unshakeable regime, I don't see how any law I could reasonably decree could delay AI timelines for very long on a planet where computers are already ubiquitous. The formalization of those arguments should be one direct short step. The mission of the Machine Intelligence Research Institute is to do today that research which, 30 years from now, people will desperately wish had begun 30 years earlier. - I don't think you can time AI with Moore's Law. AI outputs increasingly become the foundation for other AI outputs, and human knowledge is lost. Other troubling issues are less mysterious: ChatGPT can design DNA and proteins that put the biological weapons of the past to shame, and will do so without compunction.
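As a toy illustration of the training move being debated above (gradient descent against human "manipulative" labels), here is a small self-contained Python sketch; the reward values, labels, and penalty weight are all made up, and the open question in the discussion is whether such a patch generalizes, not whether the mechanism below works on its own terms.

```python
# Penalize, by gradient ascent on a policy objective, any probability mass the
# policy puts on actions that human labelers have flagged as "manipulative".
import numpy as np

rng = np.random.default_rng(0)
n_actions = 5
flagged = np.array([0, 0, 1, 0, 1], dtype=float)  # hypothetical human labels
reward = rng.normal(size=n_actions)                # hypothetical task reward per action
logits = np.zeros(n_actions)
lam, lr = 5.0, 0.1                                 # penalty weight, learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(500):
    p = softmax(logits)
    # Objective: E[reward] - lam * E[flagged]; its exact gradient w.r.t. logits
    # for a softmax policy is p * (score - p @ score).
    score = reward - lam * flagged
    grad = p * (score - p @ score)
    logits += lr * grad

print("probability on flagged actions:", softmax(logits)[flagged == 1].sum())
```

Running this drives the probability on flagged actions toward zero within the training distribution, which is exactly the kind of surface-level success the passage argues may not carry over once the system is smarter or the inputs move off-distribution.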
For one thing, I don't expect to need human-level compute to get human-level intelligence, and for another I think there's a decent chance that insight and innovation have a big role to play, especially on 50-year timescales. Just to clarify, that sounds like a large-scale coordination difficulty to me (i.e., we as all of humanity can't coordinate to not build that AGI). I don't actually know what the final hours will be like and whether nanomachines will be involved. I haven't ruled RR out yet. I agree reality has not been hugging the Robin kind of scenario this far. Is your take more on the "too hard to build" side? Let's conservatively set the prior probability of the Book of Mormon at one to a billion (against). Twenty years ago, a young artificial intelligence researcher named Eliezer Yudkowsky ran a series of low-stakes thought experiments with fellow researchers on Internet Relay Chat servers. Musk brought the concept of AGI to OpenAI's other co-founders, like CEO Sam Altman. Still, the concept might have stayed on the margins if not for the same wealthy tech investors interested in the outer limits of AI. Musk invested in DeepMind and introduced the company to Google co-founder Larry Page. What's that delay for, if we all die at the end? Many argue that the apocalypse narrative overstates AI's capabilities, helping companies market the technology as part of a sci-fi fantasy. To live significantly past a googolplex years without repeating yourself, you need computing structures containing more than a googol elements, and those won't fit inside a single Hubble volume. This never works. How does "nobody actually wants an unaligned AGI" fail here? How, then, can we align that which we cannot understand? There's a natural/convergent/coherent output of deep underlying algorithms that generate competence in some of the original domains; when those algorithms are implicitly scaled up, they seem likely to generalize better than whatever patch on those algorithms said 2 + 2 = 5. I think I'm not starting with a general superintelligence here to get the trustworthy nanodesigns. If you don't want to get disassembled for spare atoms, you can, if you understand the design space well enough, reach in and pull out a particular machine intelligence that doesn't want to hurt you. The subgroups can be fairly fluid, even when they appear contradictory, and insiders sometimes disagree on basic definitions. Yudkowsky: It'd be very surprising if college were underrated, given the social desirability bias of endorsing college. Horgan: Will superintelligences solve the hard problem of consciousness? Yudkowsky: Yes, but they won't have the illusion of free will. The signatures number more than 30,000 now. And if that utility function is learned from a dataset and decoded only afterwards by the operators, that sounds even scarier. I wouldn't get to say "but all the following things should have happened first" before I made that observation. You don't need to be an expert in bird biology, but at the same time, it's difficult to know enough to build an airplane without realizing some high-level notion of how a bird might glide or push down air with its wings. Clearly, we are touching the edges of AGI with GPT and the like.
Even if the technology proliferates and the world ends a year later when other non-coordinating parties jump in, it's still better to take the route where the world ends one year later instead of immediately. (Ignorance implies a wide credibility interval, not being certain that something is far away.) The creator of any system has an argument as to why its behavior does what they think it will and why it won't do bad or dangerous things. Decision theorist Eliezer Yudkowsky spells out his idiosyncratic vision of the Singularity. But at least his words give me a measure of strength. Similar systems are likely to soon be able to rapidly prove all simple true theorems (e.g. ...). The first is that we determine not to allow AI autonomous physical agency in the real world. I also think that the speedup step in iterated amplification and distillation will introduce places where the fast distilled outputs of slow sequences are not true to the original slow sequences, because gradient descent is not perfect and won't be perfect, and it's not clear we'll get any paradigm besides gradient descent for doing a step like that. On an entirely separate issue, it's possible that being an ideal Bayesian agent is ultimately incompatible with living the life best-lived from a fun-theoretic perspective. We are not on course to be prepared in any reasonable time window. I don't know, however, if I should be explaining at this point why "manipulate humans" is convergent, why "conceal that you are manipulating humans" is convergent, why you have to train in safe regimes in order to get safety in dangerous regimes (because if you try to train at a sufficiently unsafe level, the output of the unaligned system deceives you into labeling it incorrectly and/or kills you before you can label the outputs), or why attempts to teach corrigibility in safe regimes are unlikely to generalize well to higher levels of intelligence and unsafe regimes (qualitatively new thought processes, things being way out of training distribution, and, the hardest part to explain, corrigibility being anti-natural in a certain sense that makes it incredibly hard to, e.g., exhibit any coherent planning behavior (consistent utility function) which corresponds to being willing to let somebody else shut you off, without incentivizing you to actively manipulate them to shut you off). The Universe is neither evil nor good; it simply does not care. On some level, it's harder to be fooled if you just realize on a gut level that there is math, that there is some math you'd do to arrive at the exact strength of the evidence and whether it sufficed to lift the prior improbability of the hypothesis. Here is a quick guide to decoding the ideologies (and financial incentives) behind the factions: The argument: The phrase "AI safety" used to refer to practical problems, like making sure self-driving cars don't crash. What does trying to die with more dignity on the mainline look like? If expected to be achievable, why? In some sections here, I sound gloomy about the probability that coordination between AGI groups succeeds in saving the world. Asking "what would superintelligences want" is a Wrong Question. So you have people who say, for example, that we'll only be able to improve AI up to the human level because we're human ourselves, and then we won't be able to push an AI past that.
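Here is a worked illustration, in the odds form of Bayes's Rule, of the "one to a billion (against)" prior mentioned earlier and the point just above about whether the evidence suffices to lift a prior improbability; the likelihood ratios below are hypothetical numbers chosen only to show the scale involved.

```python
import math

# Posterior odds = prior odds * likelihood ratio of the evidence.
prior_odds = 1 / 1_000_000_000          # the "one to a billion (against)" prior

# Suppose each independent piece of evidence is 20x more likely if the hypothesis
# is true than if it is false (a purely illustrative likelihood ratio).
likelihood_ratio_per_item = 20
items_of_evidence = 3

posterior_odds = prior_odds * likelihood_ratio_per_item ** items_of_evidence
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"posterior probability: {posterior_prob:.6f}")   # still ~0.000008: not lifted

# How many such 20:1 items would it take just to reach even odds?
needed = math.log(1 / prior_odds, likelihood_ratio_per_item)
print(f"items of 20:1 evidence needed for even odds: {needed:.1f}")  # about 6.9
```

The arithmetic makes the qualitative point concrete: evidence has to be roughly a billion to one more likely under the hypothesis before a one-in-a-billion prior is lifted to even odds, and three moderately strong observations fall far short of that.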
Horgan: I've described the Singularity as an escapist, pseudoscientific fantasy that distracts us from climate change, war, inequality and other serious problems. Crucial research demonstrating the failures of this type of AI, as well as ways to mitigate the problems, is often done by scholars of color, many of them Black women, and by underfunded junior scholars, researchers Abeba Birhane and Deborah Raji wrote in an op-ed for Wired in December. Similarly, I confidently forecast an intelligence explosion. I developed many algorithms and data structures to avoid that waste years ago (e.g. ...). And maybe somewhere down the line is someone who faces the prospect of their future self not existing at all, and they might be very sad about that; but I'm not sure I can imagine who that person will be. Researchers also found that asking ChatGPT for advice on a moral issue, such as the famous trolley dilemma, corrupts rather than improves its users' moral judgment, which will certainly be an issue with human-in-the-loop weapons systems. 4) Mathematical proof is cheap to mechanically check (e.g. ...). Many have ties to communities like effective altruism, a philosophical movement to maximize doing good in the world. Yudkowsky: Yes, and in retrospect the answer will look embarrassingly obvious from our perspective. ... aren't oriented around precise formal specification or provably guaranteed constraints. You have people who say, for example, that it should require more and more tweaking to get smarter algorithms and that human intelligence is around the limit. Self-described decision theorist Eliezer Yudkowsky, co-founder of the nonprofit Machine Intelligence Research Institute (MIRI), went further: AI development needs to be shut down worldwide. "If, after reading Nanosystems, you still don't think that a superintelligence can get to and past the Nanosystems level, I'm not quite sure what to say to you, since the models of superintelligences are much less concrete than the models of molecular nanotechnology." I'm not sure if this is directed at me or the https://en.wikipedia.org/wiki/Generic_you, but I'm only expressing curiosity on this point, not skepticism. I'm fine with your using my name (Steve Omohundro) in any discussion of these. And right now, that schism is playing out online between two people: AI theorist Eliezer Yudkowsky and OpenAI Chief Executive Officer Sam Altman. They're just extremely excited about building software that reaches artificial general intelligence, or AGI, a term for AI that is as smart and as capable as a human. ... a lab that Google acquired in 2014. Wouldn't that fear tend to be channeled into "ah, yes, it must be a government project, they're the good guys," and then the government is much more hopeless and much harder to improve upon than Deepmind? This isn't to say that there aren't AI systems that wouldn't. 7) We can build automated checkers for these provable safe-AI limits. And if there were a culturally loaded suitcase term 'robotruckism' that included a lot of specific technological claims along with whole economic and sociological paradigms, I'd be hesitant to say I 'believed in' driverless trucks. So you must have some prior belief about the superintelligence being aligned before you dared to look at the arguments. He is also the author of a popular fan fiction series, Harry Potter and the Methods of Rationality, an entry point for many young people into these online spheres and ideas around AI.
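On points 4 and 7 above (mechanical proof checking and automated checkers), here is a minimal Lean 4 sketch of what "cheap to mechanically check" means; the theorem uses a standard library fact (`Nat.add_comm`), and the example is mine rather than anything drawn from the discussion.

```lean
-- A tiny machine-checked proof: the Lean kernel verifies the proof term
-- automatically and cheaply; no human has to re-read the argument.
theorem add_comm_checked (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Checking cost scales with the size of the proof object the kernel has to verify, not with how hard the proof was to find, which is the asymmetry that makes automated checking of safety-relevant properties plausible in principle.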
Some people will be escapist regardless of the true values on the hidden variables of computer science, so observing some people being escapist isn't strong evidence, even if it might make you feel like you want to disaffiliate with a belief or something. Look up the fast-growing hierarchy if you really want to have your mind blown; well, eternity is longer than that. I definitely foresee a whole lot of dead ends that others don't, yes. Copy and paste from Singapore's healthcare setup. We'd still be working to an unknown deadline, and I wouldn't feel relaxed at that point. (Thank you to Ben Weinstein-Raun for building chathamroom.com, and for quickly adding some features to it at my request.) His writings (such as this essay, which helped me grok, or gave me the illusion of grokking, Bayes's Theorem) exude the arrogance of the autodidact, edges undulled by formal education, but that's part of his charm. Theorem proving? Here's a nice 3-hour-long tutorial about probabilistic circuits, which are a representation for probability distributions, learning, Bayesian inference, etc. Replace minimum wages with negative wage taxes. But when I think of a case like this, I imagine trying to get the world to a condition where some unemployed person can offer to drive you to work for 20 minutes, be paid five dollars, and then nothing else bad happens to them. Why think the brain's software is closer to optimal than the hardware? It's been very unpleasantly surprising to me how little architectural complexity is required to start producing generalizing systems, and how fast those systems scale using More Compute. Only weird and frankly terrifying anthropic theories would let you live long enough to gaze, perhaps knowingly and perhaps not, upon the halting of the longest-running halting Turing machine with 100 states. When human brains try to do things, they can run into some very strange problems. Acts, choices, policies can be stupid given some set of preferences over final states of the world. It is not standing from within your own preference framework and choosing blatantly mistaken acts, nor is it standing within your meta-preference framework and making mistakes about what to prefer. Living longer than, say, a googolplex years, requires us to be wrong about the basic character of physical law, not just the details. Manipulating humans is definitely an instrumentally useful kind of method for an AI, for a lot of goals. Big problems are solved by solving many small problems. So, even if I got run over by a truck tomorrow, I would still very much wish that in the world that survived me, Deepmind would have lots of penalty-free affordance internally for people to not publish things, and to work in internal partitions that didn't spread their ideas to all the rest of Deepmind. Why it's taken this long, I have no idea. Is a crux here that you think nanosystem design requires superintelligence? Center for Humane Technology co-founder Tristan Harris, who once campaigned about the dangers of social media and has now turned his focus to AI, cited the study prominently. Systems like ChatGPT have the potential for problems that go beyond subverting the need for humans to store knowledge in their own brains.
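Since probabilistic circuits are mentioned above only in passing, here is a minimal toy sketch (my own construction, assuming nothing from the linked tutorial): a smooth, decomposable circuit, here a mixture of products of Bernoulli leaves over two binary variables, answers both joint and marginal queries with a single bottom-up evaluation.

```python
def bernoulli_leaf(p, value):
    """Leaf over one binary variable; value=None means 'marginalized out'."""
    if value is None:          # summing over {0, 1} contributes exactly 1.0
        return 1.0
    return p if value == 1 else 1.0 - p

def circuit(x1, x2):
    """Sum node over two product nodes: 0.3 * B(0.9)B(0.2) + 0.7 * B(0.1)B(0.6)."""
    prod_a = bernoulli_leaf(0.9, x1) * bernoulli_leaf(0.2, x2)
    prod_b = bernoulli_leaf(0.1, x1) * bernoulli_leaf(0.6, x2)
    return 0.3 * prod_a + 0.7 * prod_b

print("P(X1=1, X2=0):", circuit(1, 0))     # joint query
print("P(X1=1):      ", circuit(1, None))  # marginal: same pass, X2 summed out
# Sanity check: the marginal equals the sum of the joint queries it covers.
assert abs(circuit(1, None) - (circuit(1, 0) + circuit(1, 1))) < 1e-12
```

Marginalizing a variable just means its leaves sum to one, which is why the same evaluation pass that computes a joint probability also computes a marginal; that tractability is what makes the representation attractive for the interpretability and formal-verification uses discussed elsewhere in this section.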
Former White House policy adviser Suresh Venkatasubramanian, who helped develop the blueprint for an AI Bill of Rights, told VentureBeat that recent exaggerated claims about ChatGPT's capabilities were part of an organized campaign of fearmongering around generative AI that detracted from work on real AI issues. We may never fully understand it; we certainly do not now, and it is in its infancy. Maybe! Building safe artificial intelligence is crucial to secure those eventual lives. Thanks for your perspective! Horgan: Can we create superintelligences without knowing how our brains work? Yudkowsky cofounded and works at the Machine Intelligence Research Institute (formerly the Singularity Institute for Artificial Intelligence), a nonprofit organization that concerns itself with the concept known as the singularity. Maybe a bit. While I am not yet a doomer, only a gloomer, it's worth noting that economist Bryan Caplan, whose forte is placing successful bets based on his predictions of current trends, has a bet with Yudkowsky about whether humanity will be wiped off the surface of the Earth by Jan. 1, 2030. Of course, the chatbot could just as easily have encouraged him to kill others, and the idea that AI might groom susceptible minds to become terrorists is clearly not far-fetched. This was a difficult listen. If the builders are sufficiently worried about that scenario that they push too fast too early, in fear of an arms race developing very soon if they wait, again, everybody dies. I'd personally love more direction on where to focus my efforts (obviously you can only say things generic to the group). I think there will be signs beforehand. Some adherents also subscribe to a philosophy called longtermism that looks at maximizing good over millions of years. The AI-box experiment is an informal experiment devised by Eliezer Yudkowsky to attempt to show that a suitably advanced AI could talk its human gatekeeper into letting it out of a sealed "box." There has been a lot of AI progress recently. "Imagine," they said, "some being with such intense consciousness, intellect, and emotion that it would be morally better to destroy an entire ant colony than to let that being suffer so much as a sprained ankle." Humans are such a being. It's conserved even relative to chimpanzee brain design. Eliezer Yudkowsky is a researcher, writer, and advocate for artificial intelligence safety. The reaction may more be that the fear of the public is a big, powerful, uncontrollable thing that doesn't move in the smart direction; maybe the public fear of AI gets channeled by opportunistic government officials into "and that's why We must have Our AGI first, so it will be Good and we can Win." 14) I don't see any fundamental obstructions to any of these. But given that I do think the first conceivable world is just a fond dream, it should be clear why I don't think we should ignore a problem we'll predictably have to panic about later. In the same way that a car can't run without dissipating entropy, you simply don't get an accurate map of the world without a process that has Bayesian structure buried somewhere inside it, even if the process doesn't explicitly represent probabilities or likelihood ratios. It's much wiser to just say "Oops", admit you were not even a little right, swallow the whole bitter pill in one gulp, and get on with your life.
The gap between AI systems then and AI systems now seems pretty plausibly greater than the remaining gap, even before accounting for the recent dramatic increase in the rate of progress, and potential future increases in rate-of-progress as it starts to feel within-grasp. Yudkowsky: No. Maybe more current AGI groups can be persuaded to go closed; or, if more than one has an AGI, to coordinate with each other and not rush into an arms race. My optimism stems from the belief that many of the socially important things we need AI for won't require anything near that unconstrained edge. Could be. If you understand Bayes's Rule you can see at once that the improbability of the evidence is not commensurate with the improbability of the hypothesis it's trying to lift. Can We Improve Predictions? But I would not recommend to people that they obsess over that possibility too much. The properties that can be proven just aren't related to safety, no matter how many times you prove an error bound on the floating-point multiplications. For more on his background and interests, see his personal website or the site of the Machine Intelligence Research Institute, which he co-founded. I imagine lack of public support for genetic manipulation of humans has slowed that research by more than three months; would it end up impossible to run a project that could make use of an alignment miracle, because everybody was afraid of that project? By default, all other participants are anonymized as Anonymous. Systems with learning and statistical inference add more challenges, but nothing that seems in principle all that difficult. That you can't just make stuff up and believe what you want to believe, because that doesn't work. We have never before encountered an intelligence that is so alien. I'm aware that in trying to convince people of that, I'm swimming uphill against a sense of eternal normality - the sense that this transient and temporary civilization of ours that has existed for only a few decades, that this species of ours that has existed for only an eyeblink of evolutionary and geological time, is all that makes sense and shall surely last forever. I mention this as context for my reply, which is, "Why the heck are you tacking on the 'cyborg' detail to that?" Everyone else's responses are anonymous (not pseudonymous) and neither I nor MIRI will know which potential invitee sent them. I mean, I suspect that Eliezer fooms if you give an Eliezer the ability to backup, branch, and edit himself. Relative to remotely plausible levels of future coordination, we have a technical problem. Andrew Critch reminds me to point out that gloominess like this can be a self-fulfilling prophecy: if people think successful coordination is impossible, they won't try to coordinate. It could be that, (A), self-improvements of size delta tend to make the AI sufficiently smarter that it can go back and find new potential self-improvements of size k*delta, and that k is greater than 1, and this continues for a sufficiently extended regime that there's a rapid cascade of self-improvements leading up to superintelligence; what I. J. Good called the "intelligence explosion." But it's really terrible in every aspect except that it makes it easy for machine learning practitioners to quickly slap something together which will actually sort of work sometimes.
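Spelling out the arithmetic implicit in branch (A) above (my restatement, not the author's): if an initial self-improvement of size delta enables further improvements scaled by a factor k at each round, the cumulative gain is a geometric series, and whether the cascade runs away depends on whether k reaches 1.

```latex
% Total capability gain from an initial self-improvement of size \delta,
% when each round of improvement enables a further round scaled by k:
\Delta_{\text{total}} = \delta + k\delta + k^{2}\delta + \cdots
  = \delta \sum_{i=0}^{\infty} k^{i} =
\begin{cases}
  \dfrac{\delta}{1-k}, & 0 \le k < 1 \quad \text{(the cascade fizzles out at a finite total)},\\[6pt]
  \infty, & k \ge 1 \quad \text{(the cascade is self-sustaining: an intelligence explosion)}.
\end{cases}
```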
There's also a conceivable world where you work hard and fight malaria, where you work hard and keep the carbon emissions to not much worse than they are already (or use geoengineering to mitigate mistakes already made). They cite a thought experiment from Nick Bostrom's book Superintelligence, which imagines that a safe superhuman AI could enable humanity to colonize the stars and create trillions of future people. I agree with most of your specific points, but I seem to be much more optimistic than you about a positive outcome.