{"id":267222,"date":"2024-12-21T01:10:27","date_gmt":"2024-12-21T01:10:27","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2024\/12\/21\/openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi\/"},"modified":"2025-06-25T17:09:58","modified_gmt":"2025-06-25T17:09:58","slug":"openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2024\/12\/21\/openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi\/","title":{"rendered":"OpenAI&#8217;s o3 model aced a test of AI reasoning \u2013 but it&#8217;s still not AGI"},"content":{"rendered":"<div id=\"\">\n<figure class=\"ArticleImage\">\n<div class=\"Image__Wrapper\"><img fetchpriority=\"high\" decoding=\"async\" class=\"Image\" width=\"1350\" height=\"900\" alt=\"\" src=\"https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg\" sizes=\"(min-width: 1288px) 837px, (min-width: 1024px) calc(57.5vw + 55px), (min-width: 415px) calc(100vw - 40px), calc(70vw + 74px)\" srcset=\"https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=300 300w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=400 400w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=500 500w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=600 600w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=700 700w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=800 800w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=837 837w, 
https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=900 900w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1003 1003w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1100 1100w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1200 1200w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1300 1300w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1400 1400w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1500 1500w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1600 1600w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1674 1674w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1700 1700w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1800 1800w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=1900 1900w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/20192226\/SEI_233869500.jpg?width=2006 2006w\" loading=\"eager\" fetchpriority=\"high\" data-image-context=\"Article\" data-image-id=\"2462001\" data-caption=\"OpenAI announced a breakthrough achievement for its new o3 AI model\" data-credit=\"Rokas Tenys \/ Alamy\"\/><\/div><figcaption class=\"ArticleImageCaption\">\n<div class=\"ArticleImageCaption__CaptionWrapper\">\n<p class=\"ArticleImageCaption__Title\">OpenAI announced a breakthrough achievement for its new o3 AI model<\/p>\n<p class=\"ArticleImageCaption__Credit\">Rokas Tenys \/ 
Alamy<\/p>\n<\/div>\n<\/figcaption><\/figure>\n<\/p>\n<p>OpenAI\u2019s new o3 artificial intelligence model has achieved a breakthrough high score on a <a href=\"https:\/\/www.newscientist.com\/article\/2437029-1m-prize-for-ai-that-can-solve-puzzles-that-are-simple-for-humans\/\">prestigious AI reasoning test<\/a> called the ARC Challenge, inspiring some AI fans to speculate that o3 has achieved <a href=\"https:\/\/www.newscientist.com\/article\/mg26234921-600-what-is-artificial-general-intelligence-and-is-it-a-useful-concept\/\">artificial general intelligence<\/a> (AGI). But even as ARC Challenge organisers described o3\u2019s achievement as a major milestone, they also cautioned that it has not won the competition\u2019s grand prize \u2013 and it is only one step on the path towards AGI, a term for hypothetical future AI with human-like intelligence.<\/p>\n<p>The o3 model is the latest in a line of AI releases that follow on from the large language models powering ChatGPT. \u201cThis is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models,\u201d said <a href=\"https:\/\/fchollet.com\/\">Fran\u00e7ois Chollet<\/a>, an engineer at Google and the main creator of the ARC Challenge, in a <a href=\"https:\/\/arcprize.org\/blog\/oai-o3-pub-breakthrough\">blog post<\/a>.<\/p>\n<h2>What did OpenAI\u2019s o3 model actually do?<\/h2>\n<p>Chollet designed the <a href=\"https:\/\/arcprize.org\/\">Abstraction and Reasoning Corpus<\/a> (ARC) Challenge in 2019 to test how well AIs can find correct patterns linking pairs of coloured grids. Such visual puzzles are intended to make AIs demonstrate a form of general intelligence with basic reasoning capabilities. But throwing enough computing power at the puzzles could let even a non-reasoning program simply solve them through brute force. 
To prevent this, the competition also requires official score submissions to stay within limits on computing power.<\/p>\n<p>OpenAI\u2019s newly announced o3 model \u2013 which is scheduled for release in early 2025 \u2013 achieved its official breakthrough score of 75.7 per cent on the ARC Challenge\u2019s \u201csemi-private\u201d test, which is used for ranking competitors on a public leaderboard. The computing cost of this result was approximately $20 per visual puzzle task, within the competition\u2019s requirement that total costs stay below $10,000. However, the harder \u201cprivate\u201d test that is used to determine grand prize winners has an even more stringent computing power limit, equivalent to spending just 10 cents on each task, which OpenAI did not meet.<\/p>\n<p>The o3 model also achieved an unofficial score of 87.5 per cent by applying approximately 172 times more computing power than it used for its official score. For comparison, the typical human score is 84 per cent, and an 85 per cent score is enough to win the ARC Challenge\u2019s $600,000 grand prize \u2013 if the model can also keep its computing costs within the required limits.<\/p>\n<p>But to reach its unofficial score, o3\u2019s costs soared to thousands of dollars per task. 
OpenAI requested that the challenge organisers not publish the exact computing costs.<\/p>\n<h2>Does this o3 achievement show that AGI has been reached?<\/h2>\n<p>No. The ARC Challenge organisers have specifically said they do not consider beating this benchmark to be an indicator of having achieved AGI.<\/p>\n<p>The o3 model also failed to solve more than 100 visual puzzle tasks, even when OpenAI applied a very large amount of computing power toward the unofficial score, said Mike Knoop, an ARC Challenge organiser at software company Zapier, in a <a href=\"https:\/\/x.com\/mikeknoop\/status\/1870172132136931512?t=1gMYo59NwGF3CYfcNYiquw&amp;s=19\">post<\/a> on X.<\/p>\n<p>In a <a href=\"https:\/\/bsky.app\/profile\/melaniemitchell.bsky.social\/post\/3ldqzeq23mk22\">post<\/a> on Bluesky, <a href=\"https:\/\/melaniemitchell.me\/\">Melanie Mitchell<\/a> at the Santa Fe Institute in New Mexico wrote of o3\u2019s progress on the ARC benchmark: \u201cI think solving these tasks by brute-force compute defeats the original purpose\u201d.<\/p>\n<p>\u201cWhile the new model is very impressive and represents a big milestone on the way towards AGI, I don\u2019t believe this is AGI \u2013 there\u2019s still a fair number of very easy [ARC Challenge] tasks that o3 can\u2019t solve,\u201d said Chollet in another <a href=\"https:\/\/x.com\/fchollet\/status\/1870170778458828851\">post<\/a> on X.<\/p>\n<p>Chollet did, however, describe how we might know when some form of AGI has demonstrated human-level intelligence. \u201cYou\u2019ll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible,\u201d he said in the blog post.<\/p>\n<p><a href=\"https:\/\/web.engr.oregonstate.edu\/~tgd\/\">Thomas Dietterich<\/a> at Oregon State University suggests another way to recognise AGI: checking AI systems against cognitive architectures, computational models that aim to capture how the human mind works. 
\u201cThose architectures claim to include all of the functional components required for human cognition,\u201d he says. \u201cBy this measure, the commercial AI systems are missing episodic memory, planning, logical reasoning and, most importantly, meta-cognition.\u201d<\/p>\n<h2>So what does o3\u2019s high score really mean?<\/h2>\n<p>The o3 model\u2019s high score comes as the tech industry and AI researchers have been reckoning with a <a href=\"https:\/\/www.newscientist.com\/article\/mg26435210-600-the-shine-began-to-wear-off-ai-in-2024-as-advances-slowed-down\/\">slower pace of progress<\/a> in AI models during 2024, compared with the explosive developments of 2023.<\/p>\n<p>Although o3 did not win the ARC Challenge, its high score suggests that AI models could beat the benchmark in the near future. Chollet also notes that an ensemble of official low-compute submissions can already score 81 per cent on the private evaluation test set.<\/p>\n<p>Dietterich also thinks that \u201cthis is a very impressive leap in performance\u201d. However, he cautions that, without knowing more about how OpenAI\u2019s <a href=\"https:\/\/www.newscientist.com\/article\/2447972-openais-warnings-about-risky-ai-are-mostly-just-marketing\/\">o1<\/a> and o3 models work, it is impossible to evaluate just how impressive the high score is. For instance, if o3 was able to practise on ARC problems in advance, that would have made its achievement easier. \u201cWe will need to await an open-source replication to understand the full significance of this,\u201d says Dietterich.<\/p>\n<p>The ARC Challenge organisers are already looking to launch a second and more difficult set of benchmark tests sometime in 2025. 
They will also keep the ARC Prize 2025 challenge running until someone achieves the grand prize and open-sources their solution.<\/p>\n<section class=\"ArticleTopics\">\n<p class=\"ArticleTopics__Heading\">Topics:<\/p>\n<ul class=\"ArticleTopics__List\">\n<li class=\"ArticleTopics__ListItem\"><a class=\"ArticleTopics__ListItemLink\" href=\"https:\/\/www.newscientist.com\/article-topic\/artificial-intelligence\/\" data-analytics-hook=\"topics-link\">artificial intelligence<\/a><span>\/<\/span><\/li>\n<li class=\"ArticleTopics__ListItem\"><a class=\"ArticleTopics__ListItemLink\" href=\"https:\/\/www.newscientist.com\/article-topic\/ai\/\" data-analytics-hook=\"topics-link\">AI<\/a><\/li>\n<\/ul>\n<\/section><\/div>\n<p><a href=\"https:\/\/www.newscientist.com\/article\/2462000-openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi\/?utm_campaign=RSS%7CNSNS&#038;utm_source=NSNS&#038;utm_medium=RSS&#038;utm_content=home\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI\u2019s new o3 artificial intelligence model has achieved 
a<\/p>\n","protected":false},"author":1,"featured_media":267223,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[177],"tags":[],"_links":{"self":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/267222"}],"collection":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/comments?post=267222"}],"version-history":[{"count":0,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/posts\/267222\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media\/267223"}],"wp:attachment":[{"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/media?parent=267222"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/categories?post=267222"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michigandigitalnews.com\/index.php\/wp-json\/wp\/v2\/tags?post=267222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}