AIDeveloper https://www.webpronews.com/developer/aideveloper/ Breaking News in Tech, Search, Social, & Business Thu, 10 Oct 2024 18:01:07 +0000 en-US hourly 1 https://wordpress.org/?v=6.6.2 https://i0.wp.com/www.webpronews.com/wp-content/uploads/2020/03/cropped-wpn_siteidentity-7.png?fit=32%2C32&ssl=1 AIDeveloper https://www.webpronews.com/developer/aideveloper/ 32 32 138578674 Nobel Prize Winner Geoffrey Hinton Proud Ilya Sutskever Fired Sam Altman https://www.webpronews.com/nobel-prize-winner-geoffrey-hinton-proud-ilya-sutskever-fired-sam-altman/ Thu, 10 Oct 2024 18:00:59 +0000 https://www.webpronews.com/?p=609353 Dr. Geoffrey Hinton, widely considered the “Godfather of AI,” says he is particularly proud of former student Ilya Sutskever for firing OpenAI CEO Sam Altman in 2023.

Sutskever was one of several OpenAI board members who led a coup against Altman in 2023, ousting him from the company. Pressure, from both inside and outside the company, ultimately led to Altman’s return, with Sutskever eventually leaving himself.

At the time of Altman’s ouster, reports indicated that Sutskever and the other board members were concerned that Altman was straying too far from OpenAI’s primary goal of safe AI development. The board felt Altman was pursuing profit at the expense of safety, a narrative that has been repeated by other executives who have left the company in recent months.

Hinton is the latest to lend weight those concerns. In a video post following his Nobel Prize win, Hinton touted the students he had over the years, particularly calling out Sutskever.

“I’d also like to acknowledge my students,” Hinton says in the video. “I was particularly fortunate to have many very clever students, much clever than me, who actually made things work. They’ve gone on to do great things.

“I’m particularly proud of the fact that one of my students fired Sam Altman, and I think I better leave it there and leave it for questions.”

Hinton then goes on to describe why Sutskever was involved in firing Altman.

“So OpenAI was set up with a big emphasis on safety,” he continues. “Its primary objective was to develop artificial general intelligence and ensure that it was safe.

“One of my former students Ilya Sutskever, was the chief scientist. And over time, it turned out that Sam Altman. Was much less concerned with safety than with profits. And I think that’s unfortunate.”

Hinton has long been a vocal advocate for need to develop AI with safety concerns front and center. He previously worked on AI at Google, before leaving the company and sounding the alarm over its rushed efforts to catch up with OpenAI and Microsoft.

Since leaving Google, Hinton has warned of the danger AI poses, saying efforts need to be taken to ensure it doesn’t gain the upper hand.

“The idea that this stuff could actually get smarter than people — a few people believed that,” Dr. Hinton said. “But most people thought it was way off. And I thought it was way off. I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that.

“I don’t think they should scale this up more until they have understood whether they can control it,” he added.

]]>
609353
How AI-Driven Amazon Q Developer Streamlines Code, Testing, and Security https://www.webpronews.com/how-ai-driven-amazon-q-developer-streamlines-code-testing-and-security/ Thu, 03 Oct 2024 13:49:32 +0000 https://www.webpronews.com/?p=609165 As development teams face increasing pressure to deliver high-quality code rapidly, tools that help streamline processes are becoming essential. Amazon Q Developer, an AI-powered assistant from AWS, is one such tool that promises to transform the development landscape by automating tasks such as code comprehension, testing, and debugging, while enhancing overall productivity.

In a recent demonstration, Betty Zheng, Senior Developer Advocate at AWS, showcased the potential of Amazon Q Developer to optimize various development tasks, offering a glimpse of what AI-driven development can achieve for developers working on cloud-native applications.

Catch our conversation on AI-Driven Amazon Q Developer!

 

Understanding Complex Code with Amazon Q Developer

One of the standout features of Amazon Q Developer is its ability to comprehend and summarize code in ways that allow developers to quickly grasp the architecture of new projects. Developers often face the challenge of onboarding into large, unfamiliar codebases, but Amazon Q mitigates this by parsing complex files like pom.xml and generating clear, actionable summaries. As Zheng points out, “Amazon Q helps us quickly understand the project metadata, dependencies, and build configurations in a matter of seconds.”

In her demonstration, Zheng explains how Amazon Q integrates seamlessly with popular IDEs such as VS Code and JetBrains, providing real-time explanations of the code at hand. For example, when inspecting a Spring Framework-based application, developers can simply highlight a section of code and ask Amazon Q to explain it. “This helps reduce the cognitive load on developers and allows them to focus on building and improving the application,” says Zheng.

The ability to break down complex code into simple, understandable steps is particularly useful when collaborating across teams. Amazon Q’s conversational AI can generate documentation on the fly, creating comments or JavaDoc strings for public methods. As Zheng illustrates, this feature significantly reduces the time needed for documentation, enhancing collaboration between team members.

Automated Debugging and Unit Testing

Debugging and testing are integral but time-consuming parts of software development. Amazon Q accelerates these tasks by identifying bugs, suggesting fixes, and even generating unit tests to ensure code quality. Zheng demonstrates how Amazon Q spotted an issue in a word-guessing game application, where the word selection was not functioning as expected. “By simply sending the problem code to Amazon Q, the tool provided a corrected version of the function, which we could immediately test and deploy,” Zheng explains.

The automated generation of unit tests is another powerful capability. Amazon Q creates comprehensive test cases to verify the correctness of functions, which not only improves code reliability but also boosts developer productivity by eliminating the need for manual test creation. “Unit testing is essential, but it can be a tedious task. With Amazon Q, we can generate these tests much more efficiently, ensuring higher code quality without slowing down the development process,” adds Zheng.

Additionally, Amazon Q enables continuous feedback during the development process by performing security scans. As Zheng notes, “The AI detects potential vulnerabilities and suggests fixes, ensuring that developers are writing secure code from the start.” This early detection of security risks helps teams maintain secure code without waiting until later stages of development when the cost of fixing issues is higher.

Streamlined Feature Development with Natural Language

Perhaps one of the most transformative features of Amazon Q Developer is its ability to take natural language input and translate it into functional code. In her demo, Zheng illustrates how developers can simply describe a new feature in plain English—such as adding a difficulty selection to the word-guessing game—and Amazon Q will automatically break down the request into logical steps. “The tool follows existing code patterns, reuses code where appropriate, and generates the necessary code to implement the new feature,” Zheng explains.

This capability allows teams to iterate quickly on new ideas without getting bogged down in the details of implementation. By interacting with Amazon Q using natural language, developers can go from concept to deployment in a fraction of the time it would take using traditional methods. As Zheng puts it, “You can build and test new features without leaving your IDE, making the entire development process more fluid and efficient.”

Improving Code Quality and Security

In addition to streamlining development tasks, Amazon Q helps improve overall code quality and security. Its real-time code scanning capabilities allow it to identify inefficiencies and potential vulnerabilities as developers write code. Zheng demonstrated how the tool scans for common security issues, offers best practices for remediation, and provides detailed explanations of the detected problems.

The value of this continuous scanning cannot be overstated. Longer feedback loops, especially when it comes to security issues, can lead to costly context-switching for developers. Amazon Q eliminates these delays by providing immediate feedback within the IDE, ensuring that developers can address issues as they arise rather than waiting until a formal code review or testing phase.

Moreover, Amazon Q ensures that developers are always working with the latest, most secure versions of their dependencies by automating package upgrades. This feature is especially critical for teams managing large projects with numerous dependencies, as it helps mitigate risks associated with outdated or vulnerable packages.

AI-Driven Development is Just Getting Started

Amazon Q Developer exemplifies the direction in which modern development workflows are headed. By leveraging AI, Amazon Q enhances every stage of the development lifecycle—from code comprehension and debugging to feature creation and security optimization. As Zheng highlights, “It turns tasks that would have taken days into actions that can be completed in just a few minutes.”

The implications for development teams are profound. With AI handling much of the heavy lifting, developers can focus on innovation and strategy rather than getting bogged down in routine tasks. This acceleration in the development process not only reduces time to market but also improves code quality, security, and maintainability.

In a fast-paced, competitive landscape, tools like Amazon Q Developer will be essential for teams looking to stay ahead. Whether you’re working on cloud-native applications or complex enterprise solutions, the integration of AI into your workflow can provide a critical advantage. Amazon Q Developer is leading this charge, demonstrating that AI-driven development is not a distant future—it’s happening now.

]]>
609165
The Unstoppable Rise of OpenAI’s o1 Models—And Why Experts Are Worried https://www.webpronews.com/the-unstoppable-rise-of-openais-o1-models-and-why-experts-are-worried/ Sat, 21 Sep 2024 11:05:04 +0000 https://www.webpronews.com/?p=608660 OpenAI’s newest release of the o1 models is nothing short of a game-changer in the artificial intelligence (AI) landscape. With capabilities far beyond anything seen before, these models are poised to revolutionize industries like healthcare, finance, and education. But along with these extraordinary abilities come serious questions about potential risks, including concerns over AI safety and the implications of wielding such power without sufficient oversight.

Tech executives across sectors are watching these developments closely, as the o1 models represent a significant leap in AI’s ability to handle complex reasoning tasks. However, the models also challenge established notions about the future of AI governance and raise questions about the ethical implications of deploying such powerful technology.

Listen to our conversation on the rise of OpenAI’s o1 models. Should you be worried?

 

The Unprecedented Capabilities of the o1 Models

The o1 series, which includes the o1-preview and o1-mini models, is a significant breakthrough in generative AI. As Timothy B. Lee, an AI journalist with a master’s in computer science, noted in a recent article, “o1 is by far the biggest jump in reasoning capabilities since GPT-4. It’s in a class of its own.” These models have demonstrated the ability to solve complex reasoning problems that were previously beyond the reach of earlier iterations of AI.

One of the most impressive aspects of the o1 models is their ability to handle multi-step reasoning tasks. For example, the models excel at breaking down complex programming problems into manageable steps, as OpenAI demonstrated during the launch event. By thinking step-by-step, the o1-preview model can solve intricate problems in fields like computer programming and mathematics, offering solutions far faster and with more accuracy than previous models.

This improvement is largely due to OpenAI’s use of reinforcement learning, which teaches the model to “think” through problems and find solutions in a more focused, precise manner. The shift from imitation learning, which involved mimicking human behavior, to reinforcement learning has allowed o1 to excel where other models struggle, such as in logic-heavy tasks like writing bash scripts or solving math problems.

A Double-Edged Sword: Are the o1 Models a Threat?

Despite these extraordinary capabilities, concerns about the potential dangers of the o1 models have been raised within the AI community. While OpenAI has been relatively reserved in discussing the risks, an internal letter from OpenAI researchers last year sparked considerable debate. The letter, which was leaked to Reuters, warned that the Q* project—which evolved into the o1 models—could “threaten humanity” if not properly managed. Although this might sound like a plot from a science fiction novel, the fears stem from the growing autonomy and reasoning power of these systems.

Much of the concern revolves around the speed and scale at which the o1 models can operate. By solving problems that require advanced reasoning—tasks once thought to be the exclusive domain of human intellect—the o1 models may introduce new risks if deployed irresponsibly. As Lee wrote in his analysis, “The o1 models aren’t perfect, but they’re a lot better at this [complex reasoning] than other frontier models.”

This has led to a broader conversation about AI safety and governance. While OpenAI has implemented safety protocols to mitigate risks, many industry leaders and researchers are pushing for more robust regulations to prevent the misuse of such powerful technologies. The question remains: Are we ready for AI systems that can think more critically and deeply than any model before?

Why Reinforcement Learning Makes o1 Different

The technical foundation of the o1 models is a significant departure from earlier AI systems. As Lee explains, the key to o1’s success lies in the use of reinforcement learning. Unlike imitation learning, which trains models to replicate human behavior based on predefined examples, reinforcement learning enables the model to learn from its mistakes and adapt in real-time. This capability is crucial for handling multi-step reasoning tasks, where a single mistake could derail the entire process.

To illustrate the difference, consider a basic math problem: “2+2=4.” In imitation learning, the model would simply memorize this equation and reproduce it when prompted. However, if the model were asked to solve a more complex equation, like “2+5+4+5-12+7-5=,” it might struggle because it has not learned how to break down complex problems into simpler parts.

Reinforcement learning addresses this issue by teaching the model to solve problems step by step. In the case of the o1 models, this has resulted in the ability to solve advanced math problems and write complex code, as seen in OpenAI’s demonstrations. This approach has allowed the o1 models to outperform even human experts in specific tasks, making them an invaluable tool for businesses that require deep, multi-step reasoning capabilities.

The Limitations: Where o1 Still Falls Short

Despite its many strengths, the o1 models are not without limitations. One of the most notable areas where the models struggle is spatial reasoning. In tests involving tasks that required a visual or spatial understanding—such as navigation puzzles or chess problems—both the o1-preview and o1-mini models produced incorrect or nonsensical answers.

For example, when asked to solve a chess problem, the o1-preview model recommended a move that was not only incorrect but also illegal in the game of chess. This highlights a broader issue with current AI systems: while they can excel at text-based reasoning tasks, they struggle with problems that require an understanding of physical or spatial relationships.

This limitation is a reminder that, despite the advancements in AI, we are still far from achieving a truly general artificial intelligence that can reason about the world in the same way humans do. As Lee pointed out, “The real world is far messier than math problems.” While o1’s ability to solve complex reasoning problems is impressive, it remains limited in its ability to navigate the complexities of real-world scenarios that involve spatial reasoning or long-term memory.

The Implications for Tech Executives: A Call for AI Governance

For tech executives, the release of the o1 models presents both an opportunity and a challenge. On one hand, the models’ extraordinary capabilities could revolutionize industries ranging from finance to healthcare by automating complex, multi-step reasoning tasks. On the other hand, the potential risks associated with such powerful systems cannot be ignored.

Executives must carefully consider how to integrate these models into their operations while ensuring that robust safety protocols are in place. This is especially important in industries where AI is used to make high-stakes decisions, such as healthcare or finance. The power of the o1 models to handle complex data and offer rapid solutions is unmatched, but without proper oversight, the risks could outweigh the benefits.

OpenAI’s efforts to collaborate with AI safety institutes in the U.S. and U.K. are a step in the right direction, but more needs to be done to ensure that AI systems are developed and deployed responsibly. As the capabilities of AI continue to grow, tech executives will play a crucial role in shaping the future of AI governance and ensuring that these technologies are used for the greater good.

The o1 Models Represent a New Era for AI

The o1 models represent a new era in artificial intelligence—one where AI systems are capable of deep, multi-step reasoning that was once thought to be the exclusive domain of human cognition. For businesses, these models offer unprecedented opportunities to automate complex tasks and unlock new insights from their data. But with this power comes a responsibility to ensure that AI is used ethically and safely.

As OpenAI continues to push the boundaries of what AI can do, the question for tech executives is not just how to leverage these models for growth, but also how to navigate the ethical and regulatory challenges that come with such extraordinary technology. The future of AI is here, and it’s both exciting and uncertain.

]]>
608660
OpenAI Establishes New Safety Board—Without Sam Altman https://www.webpronews.com/openai-establishes-new-safety-board-without-sam-altman/ Tue, 17 Sep 2024 14:35:18 +0000 https://www.webpronews.com/?p=608335 OpenAI has taken a major step toward improving its safety governance, establishing a new Safety and Security Committee that does not include Sam Altman. Altman has been CEO of OpenAI since 2019, outside of a short time in November 2023. 

OpenAI has faced ongoing criticism regarding its safety processes, with notable scientists and executives leaving the company over concerns it is not doing enough to address potential threats AI may pose. The company fueled concerns even more when it disbanded the “superalignment team” responsible for evaluating potential existential threats from AI.

Listen to a podcast conversation on OpenAI’s new safety board—Without Sam Altman!

 

In a move that is sure to allay fears, the company has unveiled the new Safety and Security Committee, and provided insight into how much power it has.

As one of its initial mandates, the Safety and Security Committee conducted a 90-day review of safety and security-related processes and safeguards and made recommendations to the full Board.

Following the full Board’s review, we are now sharing the Safety and Security Committee’s recommendations across five key areas, which we are adopting. These include enhancements we have made to build on our governance, safety, and security practices.

  • Establishing independent governance for safety & security
  • Enhancing security measures
  • Being transparent about our work
  • Collaborating with external organizations
  • Unifying our safety frameworks for model development and monitoring

The first recommendation is of particular note, as it gives the Safety and Security Committee far more power than previous safety oversight measures.

The Safety and Security Committee will become an independent Board oversight committee focused on safety and security, to be chaired by Zico Kolter, Director of the Machine Learning Department with the School of Computer Science at Carnegie Mellon University, and including Adam D’Angelo, Quora co-founder and CEO, retired US Army General Paul Nakasone, and Nicole Seligman, former EVP and General Counsel of Sony Corporation. It will oversee, among other things, the safety and security processes guiding OpenAI’s model development and deployment.

The Safety and Security Committee will be briefed by company leadership on safety evaluations for major model releases, and will, along with the full board, exercise oversight over model launches, including having the authority to delay a release until safety concerns are addressed. As part of its work, the Safety and Security Committee and the Board reviewed the safety assessment of the o1 release and will continue to receive regular reports on technical assessments for current and future models, as well as reports of ongoing post-release monitoring. The Safety and Security Committee will also benefit from regular engagement with representatives from OpenAI’s safety and security teams. Periodic briefings on safety and security matters will also be provided to the full Board.

The announcement is a welcome one and represents a major shift in OpenAI’s operations. The absence of Sam Altman from the committee is another welcome move. Altman has repeatedly come under fire for decisions, such as OpenAI releasing a voice the “Sky” voice that sounded eerily like Scarlett Johansson, despite the actor declining to lend her voice to the project. Altman even sent a tweet that seemed to indicate the intention to mimic Johansson’s voice. Similarly, Altman was ousted from OpenAI in 2023 amid growing concerns that he was prioritizing the commercialization of OpenAI’s work over safe development.

In view of Altman’s past, it will be a relief to investors and employees alike that he—and the rest of OpenAI leadership—finally have proper and independent oversight.

]]>
608335
OpenAI o1 Released: A New Paradigm in AI with Advanced Reasoning Capabilities https://www.webpronews.com/openai-o1-released-a-new-paradigm-in-ai-with-advanced-reasoning-capabilities/ Thu, 12 Sep 2024 19:30:35 +0000 https://www.webpronews.com/?p=607969 In a significant leap for artificial intelligence, OpenAI has introduced its latest model, o1, which represents a major advancement in how AI approaches complex reasoning tasks. Released on September 12, 2024, OpenAI o1 is designed to “think before responding,” employing a structured process known as chain-of-thought reasoning. Unlike previous models, o1 is trained using reinforcement learning to develop problem-solving strategies that mirror human cognitive processes. This enables the model to outperform its predecessors, including GPT-4o, on a variety of tasks in mathematics, science, and coding. OpenAI’s o1 is a preview of what could be a new era of AI, where models do not simply generate answers but reason their way to solutions.

The Foundations of OpenAI o1: Reinforcement Learning and Chain-of-Thought Processing

The critical distinction between o1 and earlier models like GPT-4o lies in its use of reinforcement learning (RL), which allows the model to iteratively improve its reasoning abilities. Traditional large language models (LLMs), including GPT-4o, are trained on massive datasets to predict the next word or token in a sequence, relying heavily on statistical patterns in the data. In contrast, OpenAI o1 uses RL to solve problems more dynamically, rewarding the model for correct solutions and penalizing incorrect ones. This method enables o1 to refine its internal decision-making process.

According to Mark Chen, OpenAI’s Vice President of Research, “The model sharpens its thinking and fine-tunes the strategies it uses to get to the answer.” This approach allows o1 to break down complex problems into smaller, manageable steps, similar to how a human might approach a challenging puzzle. In other words, the model doesn’t simply produce an answer—it “reasons” through the problem by analyzing multiple paths and revising its strategy as needed.

This chain-of-thought (CoT) method provides several advantages. First, it allows the model to be more transparent in its decision-making. Users can observe the step-by-step reasoning process as it unfolds, which increases the interpretability of the model’s outputs. Second, it enhances the model’s ability to handle multi-step problems. For example, when solving a mathematical problem or writing complex code, o1 iterates through each step, checking for logical consistency and correctness before moving on.

Chen explains: “The model is learning to think for itself, rather than trying to imitate the way humans would think. It’s the first time we’ve seen this level of self-reasoning in an LLM.”

Performance Benchmarks: Outperforming Humans in Science, Math, and Coding

The chain-of-thought and reinforcement learning techniques used by o1 have led to impressive results in competitive benchmarks. The model was tested against both human and machine intelligence on several reasoning-heavy tasks, and the outcomes were striking.

On the American Invitational Mathematics Examination (AIME), a test designed to challenge the brightest high school math students in the U.S., o1 achieved a 74% success rate when given a single attempt per problem, increasing to 83% with consensus voting across multiple samples. For context, GPT-4o averaged only 12% on the same exam. Notably, when allowed to process 1,000 samples with a learned scoring function, o1 achieved a 93% success rate, placing it among the top 500 students in the country.

In scientific domains, o1 demonstrated similar superiority. On GPQA Diamond, a benchmark for PhD-level expertise in biology, chemistry, and physics, o1 outperformed human PhDs for the first time. Bob McGrew, OpenAI’s Chief Research Officer, noted, “o1 was able to surpass human experts in several key tasks, which is a significant milestone for AI in academic research and problem-solving.”

In the realm of coding, o1 ranked in the 89th percentile on Codeforces, a competitive programming platform. This places the model among the top participants in real-time coding competitions, where solutions to algorithmic problems must be developed under tight constraints. The ability to apply reasoning across domains—whether in coding, math, or scientific inquiry—sets o1 apart from previous models, which often struggled with reasoning-heavy tasks.

Overcoming Traditional AI Limitations

One of the long-standing issues with AI models has been their tendency to “hallucinate”—generating plausible but incorrect information. OpenAI o1’s reinforcement learning and chain-of-thought processes help mitigate this issue by encouraging the model to fact-check its outputs during reasoning. According to Jerry Tworek, OpenAI’s Research Lead, “We have noticed that this model hallucinates less. While hallucinations still occur, o1 spends more time thinking through its responses, which reduces the likelihood of errors.”

In this sense, o1 introduces a more methodical approach to problem-solving. By considering multiple strategies and self-correcting as needed, the model minimizes the errors that plagued previous iterations of GPT models. Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School, who tested o1, remarked, “In using the model for a month, I saw it tackle more substantive, multi-faceted problems and generate fewer hallucinations, even in tasks that traditionally trip up AI.”

Technical Challenges and Future Development

Despite its advancements, o1 is not without its challenges. The model requires significantly more compute resources than its predecessors, making it both slower and more expensive to operate. OpenAI has priced o1-preview at $15 per 1 million input tokens and $60 per 1 million output tokens, approximately 3-4 times the cost of GPT-4o. These costs may limit the immediate accessibility of o1, particularly for smaller developers and enterprises.

Additionally, while o1 excels at reasoning-heavy tasks, it is less effective in other areas compared to GPT-4o. For instance, o1 lacks web-browsing capabilities and cannot process multimodal inputs, such as images or audio. This positions o1 as a specialized model for reasoning rather than a general-purpose AI. OpenAI has indicated that future iterations will address these limitations, with plans to integrate reasoning and scaling paradigms in upcoming models like GPT-5.

Looking ahead, OpenAI envisions further improvements to o1’s reasoning capabilities. Sam Altman, OpenAI’s CEO, hinted at the company’s ambitions, stating, “We are experimenting with models that can reason for hours, days, or even weeks to solve the most difficult problems. This could represent a new frontier in AI development, where machine intelligence approaches the complexity of human thought.”

Implications for AI Development

The release of OpenAI o1 signals a paradigm shift in how AI models are built and deployed. By focusing on reasoning, rather than simply scaling model size, OpenAI is paving the way for more intelligent, reliable AI systems. The ability to think through problems and self-correct has the potential to transform how AI is used in high-stakes domains like medicine, engineering, and legal analysis.

As Noah Goodman, a professor at Stanford, put it, “This is a significant step toward generalizing AI reasoning capabilities. The implications for fields that require careful deliberation—like diagnostics or legal research—are profound. But we still need to be confident in how these models arrive at their decisions, especially as they become more autonomous.”

OpenAI o1 represents a breakthrough in AI’s ability to reason, marking a new era in model development. As OpenAI continues to refine this technology, the potential applications are vast, from academic research to real-world decision-making systems. While challenges remain, the advancements made by o1 show that AI is on the cusp of achieving human-like levels of reasoning, with profound implications for the future of technology and the world.

]]>
607969
Sergey Brin Working On AI ‘Pretty Much Every Day’ https://www.webpronews.com/sergey-brin-working-on-ai-pretty-much-every-day/ Thu, 12 Sep 2024 14:04:32 +0000 https://www.webpronews.com/?p=607954 Sergey Brin’s return to Google appears to be continuing in full swing, with the founder helping lead the company’s AI efforts.

Brin returned to work at Google in early 2023 as part of the company’s attempt to catch up with OpenAI and Microsoft in the race to deploy generative AI. Alphabet CEO Sundar Pichai issued a “code red,” an all-hands effort, and brought both founders back to help brainstorm.

A few months after returning, Brin was reportedly working at the office several days a week, but it seems he is now working on AI every day. In an interview (lightly edited for grammar) with All-In Podcast’s David Friedberg, Brin provided insight into his work and just how much he’s working on AI.

“Honestly, like pretty much every day,” Brin said. “I think as a computer scientist, I’ve never seen anything as exciting as all of the AI progress that’s happened the last few years.”

“Every month there’s like a new amazing capability and I’m like probably doubly wowed as everybody else is that computers can do this,” Brin added. “I really got back into the technical work because I just don’t want to miss out on this as a computer scientist.”

Friedberg asked if AI was “an extension of search or a rewriting of how people retrieve information.”

Brin made clear his believe that AI goes far beyond search, touching on many different aspects of life and work.

“I think that AI touches so many different elements of day-to-day life, and sure, search is one of them,” Brin said. “But it kind of covers everything. For example, programming itself, the way that I think about it is very different now.

“Writing code from scratch feels really hard, compared to just asking the AI to do it,” Brin added, to laughter from the audience. “I’ve written a little bit of code myself, just for kicks, just for fun. And then sometimes I’ve had the AI write the code for me, which was fun.”

Brin went on to describe using the AI model to write code that generated a bunch of Sudoku puzzles, and 30 minutes later the AI was done. He said that even Google’s other engineers were impressed before noting that they didn’t use AI as much as he thought they should.

“They were kind of impressed because they don’t honestly use the AI tools for their own coding as much as I think they ought to.”

On Individual Models vs One ‘God Model’

Friedberg asked Brin if he thought AI would continue to be divided into task-specific AI models or if he believed the industry would succeed in creating “God Models,” AI models that are so good and well-rounded that they can be used across industries and applications.

Brin noted that current models were already much closer to that reality than they were 10 or 15 years ago, when chess playing models were big news. He expressed his belief the trend will continue.

“But I do think the trend is to have more a unified model,” Brin said. “I don’t know if I’d call it a God Model, but to have certainly sort of shared architectures and ultimately shared models.”

Google’s Conservative Culture and Taking Risks

One of the criticism Google has faced in the AI race is the conservative nature of its approach. Early on, investors and insiders were unhappy that OpenAI and Microsoft beat the company to the punch, launching the first widely used AI models. In our own coverage at WPN, we criticized Google’s conservative approach, while saying Microsoft—a much older company—was acting more like a fearless startup.

Friedberg addressed those issues with Brin, citing a story of Brin pushing engineers to include AI’s code-writing ability into Gemini, even if it was not 100% error-free.

“I think there’s a little bit of fear,” Brin acknowledged. “Yeah, we were too timid to deploy them, and for a lot of good reasons. Some make mistakes, they say embarrassing things, sometimes it’s just kind of embarrassing how dumb they are. Even today’s latest and greatest things make really stupid mistakes people would never make.

Ultimately, however, Brin believes AI’s ability to empower individuals to do things they otherwise wouldn’t be able to do is worth the risk of embarrassment.

“At the same time, they’re incredibly powerful, and they can help you do things you never would have done, and I’ve programmed really complicated things with my kids,” Brin said. “They’ll just program it because they just ask the AI using all these really complicated APIs all kinds of things that would take like a month to learn.

“So I think that capability is magic, and you need to be willing to have some embarrassments and take some risks. And I think we’ve gotten better at that. You guys have probably seen some of our embarrassments.

Optimism About the AI’s Future

Brin concluded the podcast by voicing his optimism about AI’s future, and just how much it brings to the table, much like Google search did decades before.

“I think there’s tremendous value to humanity,” Brin said. “And I think if you think back like when I was in college, let’s say, and there wasn’t really a proper internet or web like we know it today. The amount of effort it would take to get basic information, the amount of effort it would take to communicate before cell phones and things.

“We’ve gained so much capability across the world, but this new AI is another big capability and pretty much everybody in the world can get access to it in one form or another these days, and I think it’s super exciting.”

]]>
607954
Reflection 70B Outperforms GPT-4o: The Rise of Open-Source AI for Developers https://www.webpronews.com/reflection-70b-outperforms-gpt-4o-the-rise-of-open-source-ai-for-developers/ Sun, 08 Sep 2024 20:39:42 +0000 https://www.webpronews.com/?p=607660 The race between open-source models and proprietary systems has hit a turning point in AI development. Reflection 70B, an open-source model, has managed to surpass some of the most powerful models on the market, including GPT-4o, in a variety of benchmarks. Developed by Matt Shumer and a small team at GlaiveAI, Reflection 70B introduces a new era of AI with its unique Reflection-Tuning approach, allowing the model to fix its own mistakes in real-time. For developers, engineers, and tech professionals, the implications of this breakthrough go far beyond a simple improvement in accuracy—it signals a potential paradigm shift in how large language models (LLMs) are built, deployed, and scaled.

Why Reflection 70B Is a Game-Changer

Reflection 70B is not just another LLM in the crowded AI landscape. It’s built using Reflection-Tuning, a technique that enables the model to self-assess and correct its responses during the generation process. Traditionally, models generate an answer and stop there, but Reflection 70B takes things further by employing a post-generation feedback loop. This reflection phase improves the model’s reasoning capabilities and reduces errors, which is especially critical in complex tasks like logic, math, and natural language understanding.

As Shumer explained, “This model is quite fun to use and insanely powerful. With the right prompting, it’s an absolute beast for many use-cases.” This feature allows the model to perform exceptionally well in both zero-shot and few-shot learning environments, beating other state-of-the-art systems like Claude 3.5, Gemini 1.5, and GPT-4o on every major benchmark tested.

Performance on Benchmarks

For AI developers, one of the most compelling reasons to pay attention to Reflection 70B is its performance across a wide range of benchmarks. The model recorded a 99.2% accuracy on the GSM8k benchmark, which is used to evaluate math and logic skills. This score raised eyebrows within the AI community, with many questioning if the model had simply memorized answers. However, independent testers like Jonathan Whitaker debunked this notion by feeding the model problematic questions with incorrect “ground-truth” answers. “I fed the model five questions from GSM8k that had incorrect answers. It got them all right, rather than regurgitating the wrong answers from the dataset,” Whitaker noted, confirming the model’s superior generalization ability.

Shumer emphasizes that the model excels in zero-shot learning, where the AI has to solve problems without any prior examples. In a world where few-shot learning—providing models with several examples before they make predictions—dominates proprietary systems, Reflection 70B stands out for its ability to reason and solve problems with minimal input. “Reflection 70B consistently outperforms other models in zero-shot scenarios, which is crucial for developers working with dynamic, real-world data where examples aren’t always available,” says Shumer.

The Technology Behind Reflection-Tuning

So how exactly does Reflection-Tuning work? The process can be broken down into three key steps: Plan, Execute, Reflect.

  1. Plan: When asked a question, the model first plans how it will tackle the problem, mapping out potential reasoning steps.
  2. Execute: It then executes the plan and generates an initial response based on its reasoning process.
  3. Reflect: Finally, the model pauses, reviews its own answer, and evaluates whether any errors were made. If it finds mistakes, it revises the output before delivering the final response.

This technique mirrors human problem-solving methods, making the model more robust and adaptable to complex tasks. For developers, this approach is especially valuable when dealing with applications that require a high degree of accuracy, such as medical diagnostics, financial forecasting, or legal reasoning. Traditional models might require frequent retraining to achieve comparable results, but Reflection-Tuning enables the model to fine-tune itself on the fly.

In one test, the model was asked to compare two decimal numbers—9.11 and 9.9. Initially, it answered incorrectly but, through its reflection phase, corrected itself and delivered the right answer. This level of introspection is a significant leap forward in AI capabilities and could reduce the need for constant human oversight during AI deployment.

Open-Source Power: Democratizing AI Development

One of the most remarkable aspects of Reflection 70B is that it’s open-source. Unlike proprietary models like GPT-4o or Google’s Gemini, which are locked behind paywalls and closed platforms, Reflection 70B is available to the public. Developers can access the model weights via platforms like Hugging Face, making it easy to integrate and experiment with the model in a variety of applications.

Shumer emphasizes that this open approach has been key to the model’s rapid development. “Just Sahil and I! This was a fun side project for a few weeks,” he explained, highlighting how small teams with the right tools can compete with tech giants. The model was trained with GlaiveAI data, accelerating its capabilities in a fraction of the time it would take larger companies. “Glaive’s data was what took it so far, so quickly,” he added.

This open-access philosophy also allows developers to customize and fine-tune the model for specific use-cases. Whether you’re building a chatbot, automating customer service, or developing a new AI-driven product, Reflection 70B provides a powerful, flexible base.

The 405B Model and Beyond

Reflection 70B isn’t the end of the road for Shumer and his team. They’re already working on the release of Reflection-405B, a larger model that promises even better performance across benchmarks. Shumer is confident that 405B will “outperform Sonnet and GPT-4o by a wide margin.”

The potential applications for this next iteration are vast. Developers can expect Reflection-405B to bring improvements in areas such as multi-modal learning, code generation, and natural language understanding. With the trend toward larger, more complex models, Reflection-405B could become a leading contender in the AI space, challenging not just open-source competitors but proprietary giants as well.

Challenges and Considerations for AI Developers

While the performance of Reflection 70B is undoubtedly impressive, developers should be aware of a few challenges. As with any open-source model, integrating and scaling Reflection 70B for production environments requires a solid understanding of AI infrastructure, including server costs, data management, and security protocols.

Additionally, Reflection-Tuning may introduce latency in applications requiring real-time responses, such as voice assistants or interactive bots. Shumer acknowledges this, noting that the model’s reflection phase can slow down response times, though optimization techniques could mitigate this issue. For developers aiming to use the model in time-sensitive environments, balancing reflection depth and speed will be a key consideration.

An Interesting New Era for Open-Source AI

Reflection 70B is not just an impressive feat of engineering; it’s a sign that open-source models are capable of competing with—and even outperforming—proprietary systems. For AI developers, the model offers a rare combination of accessibility, flexibility, and top-tier performance, all packaged in a framework that encourages community-driven innovation.

As Shumer himself puts it, “This is just the start. I have a few more tricks up my sleeve.” With the release of Reflection-405B on the horizon, developers should be watching closely. The future of AI may no longer be dominated by closed systems, and Reflection 70B has shown that open-source might just be the key to the next breakthrough in AI technology.

]]>
607660
Elon Musk’s xAI Team Brings Colossus 100k H100 Training Cluster Online in Just 122 Days https://www.webpronews.com/elon-musks-xai-team-brings-colossus-100k-h100-training-cluster-online-in-just-122-days/ Mon, 02 Sep 2024 23:22:56 +0000 https://www.webpronews.com/?p=607303 In a move that puts an exclamation point on the massively accelerating pace of artificial intelligence development, Elon Musk announced over the weekend that his xAI team successfully brought the Colossus 100k H100 training cluster online—a feat completed in an astonishing 122 days. This achievement marks the arrival of what Musk is calling “the most powerful AI training system in the world,” with plans to double its capacity in the coming months.

The Birth of Colossus

The Colossus cluster, composed of 100,000 Nvidia H100 GPUs, represents a significant milestone not just for Musk’s xAI but for the AI industry at large. “This is not just another AI cluster; it’s a leap into the future,” Musk tweeted. The system’s scale and speed of deployment are unprecedented, demonstrating the power of a concerted effort between xAI, Nvidia, and a network of partners and suppliers.

Bringing such a massive system online in just 122 days is an accomplishment that has left many industry experts and tech titans in awe. “It’s amazing how fast this was done, and it’s an honor for Dell Technologies to be part of this important AI training system,” said Michael Dell, CEO of Dell Technologies, one of the key partners in the project. The speed and efficiency of this deployment reflect a new standard in AI infrastructure development, one that could reshape the competitive landscape in AI research and application.

A Technological Marvel

The Colossus system is designed to push the boundaries of what AI can achieve. The 100,000 H100 GPUs provide unparalleled computational power, enabling the training of highly complex AI models at speeds that were previously unimaginable. “Colossus isn’t just leading the pack; it’s rewriting what we thought was possible in AI training,” commented xAI’s official ² account, capturing the sentiment of many in the tech community.

The cluster is set to expand even further, with plans to integrate 50,000 H200 GPUs in the near future, effectively doubling its capacity. The H200, Nvidia’s next-generation GPU, is expected to bring enhancements in both performance and energy efficiency, further solidifying Colossus’s position at the forefront of AI development.

Collaboration on a Grand Scale

Colossus’s rapid deployment was made possible by a collaborative effort that included some of the biggest names in technology. Nvidia, Dell, and other partners provided the essential components and expertise necessary to bring this ambitious project to life. The success of Colossus is a testament to the power of collaboration in driving technological innovation.

“Elon Musk and the xAI team have truly outdone themselves,” said Patrick Moorhead, CEO of Moor Insights & Strategy, in response to the announcement. “This project sets a new benchmark for AI infrastructure, and it’s exciting to see what this will enable in terms of AI research and applications.”

Implications for AI Development

The completion of Colossus represents more than just a technical achievement; it has far-reaching implications for the future of AI. With such a powerful system at its disposal, xAI is poised to accelerate the development of advanced AI models, including those that will power applications like autonomous vehicles, robotics, and natural language processing.

The potential of Colossus extends beyond xAI’s immediate goals. As the system scales and evolves, it could become a critical resource for the broader AI community, offering unprecedented capabilities for research and innovation. “This isn’t just innovation; it’s a revolution,” tweeted one xAI supporter, highlighting the broader impact that Colossus could have on the industry.

What’s Next?

As Colossus comes online, the tech world is watching closely to see what comes next. The expansion to 200,000 GPUs is just the beginning, with Musk hinting at even more ambitious plans on the horizon. The speed and scale of this project have set a new standard in the industry, and it’s clear that xAI is not content to rest on its laurels.

For now, the focus will be on leveraging Colossus’s immense power to push the boundaries of AI. Whether it’s through the development of new AI models or the enhancement of existing ones, the possibilities are virtually limitless. As Musk put it, “The future is now, and it’s powered by xAI.”

Congrats to xAI on this massive achievement!

]]>
607303
Llama 3.1: A Massive Upgrade in Open Source AI Technology https://www.webpronews.com/llama-3-1-a-massive-upgrade-in-open-source-ai-technology/ Sun, 01 Sep 2024 16:09:11 +0000 https://www.webpronews.com/?p=607208 In the rapidly evolving landscape of artificial intelligence, Meta’s Llama models have emerged as formidable players, particularly in the open-source domain. The latest iteration, Llama 3.1, represents a significant leap forward, not just in terms of size and capability, but also in its impact on the AI community and industry adoption. With 405 billion parameters, Llama 3.1 is one of the most advanced large language models (LLMs) available today, marking a pivotal moment in the democratization of AI technology.

The Growth and Adoption of Llama

Since its initial release, the Llama series has experienced exponential growth, with downloads surpassing 350 million as of August 2024. This represents a 10x increase from the previous year, underscoring the model’s widespread acceptance and utility across various sectors. Notably, Llama 3.1 alone was downloaded more than 20 million times in just one month, a testament to its growing popularity among developers and enterprises alike.

Meta’s open-source approach with Llama has been instrumental in this rapid adoption. By making the models freely available, Meta has fostered a vibrant ecosystem where innovation thrives. “The success of Llama is made possible through the power of open source,” Meta announced, emphasizing their commitment to sharing cutting-edge AI technology in a responsible manner. This strategy has enabled a wide range of applications, from startups experimenting with new AI solutions to large enterprises integrating Llama into their operations.

Strategic Partnerships and Industry Impact

Llama’s influence extends beyond just the number of downloads. The model’s integration into major cloud platforms like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud has significantly boosted its usage, particularly in enterprise environments. From May to July 2024, Llama’s token volume across these cloud services doubled, and by August, the highest number of unique users on these platforms was for the 405B variant of Llama 3.1. This trend highlights the increasing reliance on Llama for high-performance AI applications.

Industry leaders have been quick to recognize the value that Llama 3.1 brings to the table. Swami Sivasubramanian, VP of AI and Data at AWS, noted, “Customers want access to the latest state-of-the-art models for building AI applications in the cloud, which is why we were the first to offer Llama 2 as a managed API and have continued to work closely with Meta as they released new models.” Similarly, Ali Ghodsi, CEO of Databricks, praised the model’s quality and flexibility, calling Llama 3.1 a “breakthrough for customers wanting to build high-quality AI applications.”

The adoption of Llama 3.1 by enterprises like AT&T, Goldman Sachs, DoorDash, and Accenture further underscores its growing importance. AT&T, for instance, reported a 33% improvement in search-related responses for customer service, attributing this success to the fine-tuning capabilities of Llama models. Accenture is using Llama 3.1 to build custom large language models for ESG reporting, expecting productivity gains of up to 70%.

Technical Advancements in Llama 3.1

The technical prowess of Llama 3.1 is evident in its advanced features and capabilities. The model’s context length has been expanded to 128,000 tokens, enabling it to handle much longer and more complex inputs than previous versions. This makes it particularly effective for tasks like long-form text summarization, multilingual conversational agents, and even complex mathematical reasoning.

Moreover, Llama 3.1 supports eight languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, reflecting Meta’s commitment to making AI more accessible globally. The model is also optimized for tool calling, with built-in support for mathematical reasoning and custom JSON functions, making it highly adaptable for a variety of use cases.

The engineering behind Llama 3.1 is as impressive as its features. Meta’s team has meticulously documented the training process, revealing a highly sophisticated approach that balances performance with efficiency. The model was trained on 15 trillion tokens and fine-tuned using over 10 million human-annotated examples, ensuring it performs exceptionally well across a range of tasks.

Open Source and the Future of AI

Meta’s open-source strategy with Llama has not only democratized access to advanced AI models but also set a new standard for transparency and collaboration in the AI community. The release of Llama 3.1, accompanied by a detailed research paper, provides a blueprint for AI developers and researchers to build upon. This move is expected to catalyze further innovation in the field, as developers can now create derivative models and applications with greater ease and lower costs.

Mark Zuckerberg, CEO of Meta, articulated the company’s vision in an open letter, stating, “Open source promotes a more competitive ecosystem that’s good for consumers, good for companies (including Meta), and ultimately good for the world.” This philosophy is already bearing fruit, as evidenced by the creation of over 60,000 derivative models on platforms like Hugging Face.

The open-source nature of Llama 3.1 also addresses some of the ethical concerns surrounding AI development. Meta has integrated robust safety features like Llama Guard 3 and Prompt Guard, designed to prevent data misuse and promote responsible AI deployment. This is particularly crucial as AI systems become more pervasive in industries like finance, healthcare, and customer service.

A Case Study in Open Source Success

One of the most compelling examples of Llama 3.1’s impact is its adoption by Niantic, the company behind the popular AR game Peridot. Niantic integrated Llama to enhance the game’s virtual pets, known as “Dots,” making them more responsive and lifelike. Llama generates each Dot’s reactions in real-time, creating a dynamic and unique experience for players. This use case exemplifies how Llama 3.1 can drive innovation in both consumer and enterprise applications.

Another significant case is Shopify, which uses LLaVA, a derivative of Llama, for product metadata and enrichment. Shopify processes between 40 million to 60 million inferences per day using LLaVA, highlighting the scalability and efficiency of the Llama 3.1 framework.

The Future of AI with Llama

Llama 3.1 is more than just an upgrade; it represents a paradigm shift in how AI models are developed, deployed, and utilized. With its unprecedented scale, performance, and accessibility, Llama 3.1 is poised to become a cornerstone of the AI ecosystem. As more enterprises and developers adopt Llama, the boundaries of what AI can achieve will continue to expand.

The success of Llama 3.1 also reinforces the importance of open-source AI in driving innovation and ensuring that the benefits of AI are widely distributed. As Meta continues to push the envelope with future releases, the AI landscape will undoubtedly become more dynamic, competitive, and inclusive. Whether in academia, industry, or beyond, Llama 3.1 is setting the stage for a new era of AI development.

]]>
607208
Grok-2 Large Beta: A Groundbreaking Leap in AI or Just More Hype? https://www.webpronews.com/grok-2-large-beta-a-groundbreaking-leap-in-ai-or-just-more-hype/ Mon, 26 Aug 2024 13:27:25 +0000 https://www.webpronews.com/?p=606887 The artificial intelligence (AI) landscape has been buzzing with excitement, skepticism, and intrigue since the quiet release of Grok-2 Large Beta, the latest large language model (LLM) from Elon Musk’s xAI. Unlike the typical high-profile launches that accompany such advanced models, Grok-2 slipped onto the scene without a research paper, model card, or academic validation, raising eyebrows across the AI community. But the mystery surrounding its debut has only fueled more interest, prompting many to ask: Is Grok-2 a true revolution in AI, or is it just another iteration in an already crowded field?

A Mysterious Entrance

In a field where transparency and documentation are highly valued, Grok-2’s introduction was unconventional, to say the least. Traditionally, new AI models are accompanied by detailed research papers that explain the model’s architecture, training data, benchmarks, and potential applications. Grok-2, however, arrived with none of these. Instead, it was quietly integrated into a chatbot on Twitter (or X.com), leaving many AI researchers puzzled.

“It’s unusual, almost unheard of, to release a model of this scale without any academic backing or explanation,” remarked an AI researcher. “It raises questions about the model’s capabilities and the motivations behind its release.”

Despite this unconventional launch, Grok-2 quickly demonstrated its potential, performing impressively on several key benchmarks, including the Google Proof Science Q&A Benchmark and the MLU Pro, where it secured a top position, second only to Claude 3.5 Sonic. These early results suggest that Grok-2 could be a serious contender in the LLM space. However, the lack of transparency has led to a mix of curiosity and skepticism within the AI community.

One commenter on the popular ‘AI Explained’ YouTube channel voiced the general sentiment: “No paper? Just a table with benchmarks. What are the performance claims for Grok-2 really based on? Benchmarks have been repeatedly proven meaningless by this point.”

The Scaling Debate: Beyond Just Bigger Models?

One of the most contentious topics in AI is the concept of scaling—expanding a model’s size, data intake, and computational power to enhance its performance. This debate has been reignited by Grok-2’s release, particularly in light of a recent paper from Epoch AI, which predicts that AI models could be scaled up by a factor of 10,000 by 2030. Such a leap could revolutionize the field, potentially bringing us closer to AI that can reason, plan, and interact with humans on a level akin to human cognition.

The Epoch AI paper suggests that scaling could lead to the development of “world models,” where AI systems develop sophisticated internal representations of the world, enabling them to understand and predict complex scenarios better. This could be a significant step toward achieving Artificial General Intelligence (AGI), where AI systems can perform any intellectual task that a human can.

However, this vision is not universally accepted. “We’ve seen time and time again that more data and more parameters don’t automatically lead to more intelligent or useful models,” cautioned an AI critic. “What we need is better data, better training techniques, and more transparency in how these models are built and evaluated.”

This skepticism is echoed by many in the AI field. As another user on the ‘AI Explained’ channel noted, “Does anybody really believe that scaling alone will push transformer-based ML up and over the final ridge before the arrival at the mythical summit that is AGI?” This highlights a broader concern that merely making models larger may not address the fundamental limitations of current AI architectures.

Testing Grok-2: Early Performance and Challenges

In the absence of official documentation, independent AI enthusiasts and researchers have taken it upon themselves to test Grok-2’s capabilities. The Simple Bench project, an independent benchmark designed to test reasoning and problem-solving abilities, has become a key tool in this effort. According to the creator of Simple Bench, who also runs the ‘AI Explained’ channel, Grok-2 has shown promise, though it still has room for improvement.

“Grok-2’s performance was pretty good, mostly in line with the other top models on traditional benchmarks,” the creator shared. “But it’s not just about scores—it’s about how these models handle more complex, real-world tasks.”

Simple Bench focuses on tasks requiring models to understand and navigate cause-and-effect relationships, which are often overlooked by traditional benchmarks. While Grok-2 performed well in many areas, it fell short in tasks where Claude 3.5 Sonic excelled, particularly those that required deeper reasoning and contextual understanding.

Reflecting on the importance of benchmarks like Simple Bench, one commenter observed, “What I like about Simple Bench is that it’s ball-busting. Too many of the recent benchmarks start off at 75-80% on the current models. A bench that last year got 80% and now gets 90% is not as interesting anymore for these kinds of bleeding-edge discussions on progress.” This sentiment underscores the need for benchmarks that challenge AI models to push beyond the easily achievable, testing their limits in more meaningful ways.

The Ethical Dilemmas: Deepfakes and Beyond

As AI models like Grok-2 become more sophisticated, they also introduce new ethical challenges, particularly concerning the generation of highly convincing deepfakes in real-time. With tools like Flux, Grok-2’s image-generating counterpart, the line between reality and digital fabrication is blurring at an alarming rate.

“We’re not far from a world where you won’t be able to trust anything you see online,” warned an AI enthusiast. “The line between reality and fabrication is blurring at an alarming rate.”

The potential for misuse is significant, ranging from spreading misinformation to manipulating public opinion. As one commenter on the ‘AI Explained’ channel noted, “We are mindlessly hurtling towards a world of noise where nothing can be trusted or makes any sense.” This dystopian vision highlights the urgent need for regulatory frameworks and technological solutions to address the risks posed by AI-generated content.

Some experts are calling for stricter regulations and the development of new technologies to help detect and counteract deepfakes. Demis Hassabis, CEO of Google DeepMind, recently emphasized the importance of proactive measures: “We need to be proactive in addressing these issues. The technology is advancing quickly, and if we’re not careful, it could outpace our ability to control it.”

A Turning Point or Just Another Step?

The debate over Grok-2’s significance is far from settled. Some view it as a harbinger of a new era of AI-driven innovation, while others see it as just another model in an increasingly crowded field. As one skeptic on the ‘AI Explained’ channel remarked, “How can we really judge the importance of Grok-2 when there’s no transparency about how it works or what it’s truly capable of? Without that, it’s just another black box.”

Despite these reservations, Grok-2’s release is undeniably a moment of interest in the AI landscape. The model’s capabilities, as demonstrated through early benchmark performances, suggest it could play a significant role in shaping the future of AI. However, this potential is tempered by the ongoing challenges in AI development, particularly around ethics, transparency, and the limits of scaling.

The ethical implications of models like Grok-2 cannot be overstated. As AI continues to advance, the line between reality and digital fabrication becomes increasingly blurred, raising concerns about trust and authenticity in the digital age. The potential for real-time deepfakes, coupled with the model’s capabilities, presents both opportunities and risks that society must grapple with sooner rather than later.

Ultimately, Grok-2’s legacy will depend on how these challenges are addressed. Will the AI community find ways to harness the power of large language models while ensuring they are used responsibly? Or will Grok-2 and its successors become symbols of an era where technological advancement outpaced our ability to manage its consequences?

As we stand at this crossroads, the future of AI remains uncertain. Grok-2 might just be one of many signposts along the way, pointing to the immense possibilities—and dangers—of what lies ahead.

]]>
606887
Zed Editor Adds Anthropic-Powered AI Features https://www.webpronews.com/zed-editor-adds-anthropic-powered-ai-features/ Wed, 21 Aug 2024 19:30:13 +0000 https://www.webpronews.com/?p=606716 Zed, the text editor taking the development world by storm, has announced new AI features powered by Anthropic’s Claude.

Zed is a new text editor written entirely in Rust, benefiting from the speed, security, and other features the language provides. Zed has been gaining in popularity, with a Linux version of the text editor recently being released.

The company is now working with Anthropic to bring AI-powered features to the text editor, according to a blog post. Nathan Sobo, Zed founder, said the company has been looking for ways to integrate LLMs in a way that enhanced productivity.

In the two years since LLMs came onto our radar, we’ve been focused on building out the core of Zed: a fast, reliable text editor with the features developers need. Meanwhile, we’ve been quietly experimenting with integrating LLMs into our own workflows. Not as a flashy gimmick, but as a practical tool to enhance our productivity working on a complex, real-world codebase.

It appears Anthropic is a Zed fan, approaching the company to discuss a integration.

As we refined our AI integration, we caught the attention of some unexpected allies. Engineers at Anthropic, one of the world’s leading AI companies, discovered Zed and quickly saw the value of our raw, text-centric interface that puts minimal separation between the user and the language model. Their enthusiasm was validating, and our conversations sparked a dialogue that quickly evolved into a collaboration.

Now, we’re ready to introduce Zed AI, a hosted service providing convenient and performant support for AI-enabled coding in Zed, powered by Anthropic’s Claude 3.5 Sonnet and accessible just by signing in. We also worked with Anthropic to optimize Zed for implement their new Prompt Caching beta, leading to lightning-fast responses even with thousands of lines of code included in the context window while reducing cost.

Zed AI is composed of two components, one of which is the assistant panel.

The assistant panel is where you interact with AI models in Zed, but it’s not your typical chat interface. It’s a full-fledged text editor that exposes the entire LLM request. Code snippets, conversation history, file contents—it’s all there, and it’s all just text. You can observe, edit, and refine any part of the request using familiar coding tools, giving you full transparency and control over every interaction.

The second component is inline transformations.

Inline transformations, activated with ctrl-enter, allow you to transform and generate code via natural language prompts. What sets them apart is their precision and responsiveness.

To give you fast feedback, we’ve implemented a custom streaming diff protocol that works with Zed’s CRDT-based buffers to deliver edits as soon as they’re streamed from the model. You see the model’s output token by token, allowing you to read and react to changes as they happen. This low-latency streaming creates a fluid, interactive coding experience that keeps you engaged and in control throughout the process.

Inline transformations in Zed use the context you’ve built in the assistant panel. There’s no hidden system prompt-you see and control every input shaping the model’s output. This transparency lets you fine-tune the model’s behavior and improve your skills in AI-assisted coding.

Sobo says the company is working on additional features for Zed AI, including workflows for complex transformations and tools to efficiently build context. Sobo invites developers to help craft the future of Zed AI.

Zed AI embodies our belief in open, collaborative software development. We’ve created a transparent, extensible environment that empowers you to harness AI on your own terms, keeping you firmly in control of your tools and workflows.

We invite you to try Zed AI and become part of this journey. Experiment with custom slash commands, fine-tune prompts, and push boundaries. Share your innovations as extensions or as contributions to the Zed repository.

With Zed AI, you’re in the driver’s seat, directing AI’s potential within the familiar realm of text. Together, we’ll build an AI-assisted development experience that amplifies your creativity and adapts to your unique coding style. We’re excited to see what our community will create.

Anthropic is also helping to further Zed development, with the AI firm’s Rust engineers are actively contributing to Zed’s open-source codebase.

Those interested in trying Zed can download versions for macOS and Linux here.

]]>
606716
California Partners with Nvidia to Revolutionize AI Training in Community Colleges https://www.webpronews.com/california-partners-with-nvidia-to-revolutionize-ai-training-in-community-colleges/ Fri, 09 Aug 2024 20:23:11 +0000 https://www.webpronews.com/?p=606359 In a groundbreaking move to fortify its position at the forefront of technological innovation, California has partnered with Nvidia to bring cutting-edge artificial intelligence (AI) resources to the state’s expansive community college system. The partnership, formalized by Governor Gavin Newsom and Nvidia CEO Jensen Huang, represents a significant stride in equipping students, educators, and workers with the skills necessary to thrive in an increasingly AI-driven world.

A Strategic Alliance for AI Education

The collaboration is set to transform the educational landscape across California’s 116 community colleges, which serve over two million students. Under the terms of the agreement, Nvidia will provide access to its state-of-the-art AI tools, including hardware, software, and specialized training materials. These resources will be integrated into college curriculums, focusing on the practical applications of AI in high-demand sectors such as technology, healthcare, and finance.

“California’s world-leading companies are pioneering AI breakthroughs, and it’s essential that we create more opportunities for Californians to get the skills to utilize this technology and advance their careers,” Governor Newsom said during the signing ceremony. This initiative aligns with the state’s broader goals of fostering innovation and ensuring that all Californians can benefit from advancements in AI.

Empowering the Workforce of Tomorrow

The partnership focuses on equipping the next generation of workers with the tools they need to succeed in a rapidly changing job market. Nvidia will offer AI-focused certifications, workshops, and boot camps to help students and faculty stay ahead of industry trends. Additionally, the company will support the development of AI laboratories across community colleges, enabling hands-on learning experiences that will prepare students for the future workforce.

“We’re in the early stages of a new industrial revolution that will transform trillion-dollar industries around the world,” Nvidia’s Jensen Huang said. “Together with California, Nvidia will train 100,000 students, college faculty, developers, and data scientists to harness this technology to prepare California for tomorrow’s challenges and unlock prosperity throughout the state.”

Addressing Equity and Inclusion

One of the key aspects of this initiative is its focus on equitable access to AI education. The collaboration aims to bridge the gap for underserved populations by ensuring that students from all backgrounds have the opportunity to gain industry-aligned AI skills. Sonya Christian, Chancellor of California Community Colleges, emphasized this commitment: “Our approach prioritizes equitable access to AI teaching and learning enhancements that will lift up underserved populations.”

This emphasis on inclusivity reflects California’s broader commitment to using technology for social and economic advancement. The partnership hopes to create a more inclusive workforce prepared to tackle future challenges by providing AI education and resources to a diverse student body.

A Vision for the Future

The California-Nvidia partnership is part of a larger vision to position the state as a global leader in AI innovation. The initiative builds on Governor Newsom’s 2023 executive order, which called for the responsible use of AI to benefit all Californians. This collaboration not only sets a new standard for public-private partnerships but also highlights the critical role that education will play in shaping the future of AI.

As AI continues to evolve, the importance of equipping the workforce with the necessary skills cannot be overstated. The California-Nvidia partnership is a bold step toward ensuring that the state remains at the cutting edge of technological advancement while also promoting equity and opportunity for all its residents.

With this initiative, California is preparing for the future and actively shaping it.

]]>
606359
FTC’s Lina Khan Sees Open AI Models As The Answer To AI Monopolies https://www.webpronews.com/ftcs-lina-khan-sees-open-ai-models-as-the-answer-to-ai-monopolies/ Fri, 26 Jul 2024 22:21:01 +0000 https://www.webpronews.com/?p=606015 Federal Trade Commission Chair Lina Khan has vocalized her support for open AI models, saying they could prove the key to preventing AI monopolies.

According to Bloomberg, Khan made the comments at Y Combinator in San Francisco.

“There’s tremendous potential for open-weight models to promote competition,” Khan said. “Open-weight models can liberate startups from the arbitrary whims of closed developers and cloud gatekeepers.”

Khan’s comments come at a time when regulators on both sides of the Atlantic are growing increasingly wary of Big Tech. AI companies have done little to stave off such concerns, with accusations they plagiarize content, throttle organizations’ servers as they scrape them, and show little regard for the potential danger AI may pose.

In view those issues, many lawmakers are concerned about a future where AI development and breakthroughs are largely controlled by a handful of companies.

One notable exception in the industry is Meta’s Llama AI model, which the company has made available as open-source software. The company explained its reasons in a blog post announcing Llama 3:

We’re committed to the continued growth and development of an open AI ecosystem for releasing our models responsibly. We have long believed that openness leads to better, safer products, faster innovation, and a healthier overall market. This is good for Meta, and it is good for society. We’re taking a community-first approach with Llama 3, and starting today, these models are available on the leading cloud, hosting, and hardware platforms with many more to come.

With Khan’s comments, LLama and other open models may see an uptick in use.

]]>
606015
Apple Signs Biden Administration’s AI Safety Guidelines https://www.webpronews.com/apple-signs-biden-administrations-ai-safety-guidelines/ Fri, 26 Jul 2024 19:51:18 +0000 https://www.webpronews.com/?p=606010 In preparation for the release of its Apple Intelligence, the iPhone make has voluntarily signed the Biden Administration’s AI safety guidelines.

The White House announced the news in a press release:

Nine months ago, President Biden issued a landmark Executive Order to ensure that America leads the way in seizing the promise and managing the risks of artificial intelligence (AI).

This Executive Order built on the voluntary commitments he and Vice President Harris received from 15 leading U.S. AI companies last year. Today, the administration announced that Apple has signed onto the voluntary commitments, further cementing these commitments as cornerstones of responsible AI innovation.

Apple is widely considered to be a significant factor in the AI industry, thanks largely to its penchant for making high-tech solutions approach to the average user, as well as the huge user base that it can leverage.

With the announcement of Apple Intelligence, many critics and experts say Apple has done more to make the case for AI’s usefulness to the average user than most other companies combined. In view of the role Apple will likely play, it’s good to see the company’s continued commitment to safe AI development and deployment.

]]>
606010
xAI Goes Its Own Way Instead of Depending On Oracle https://www.webpronews.com/xai-goes-its-own-way-instead-of-depending-on-oracle/ Wed, 10 Jul 2024 23:00:00 +0000 https://www.webpronews.com/?p=605601 Elon Musk announced that his AI startup, xAI, will deploy Nvidia H100 systems on its own rather than continuing to use Oracle.

Musk’s xAI originally tapped Oracle to help it deploy 24,000 H100s that were used to train its Grok 2 model. According to Musk, however, the company plans to go its own way, building out its own cluster containing some 100,000 H100s. Musk framed the decision in the context of needing to leapfrog its AI rivals, with controlling its own cluster being the key to doing so.

xAI contracted for 24k H100s from Oracle and Grok 2 trained on those. Grok 2 is going through finetuning and bug fixes. Probably ready to release next month.

xAI is building the 100k H100 system itself for fastest time to completion. Aiming to begin training later this month. It will be the most powerful training cluster in the world by a large margin.

The reason we decided to do the 100k H100 and next major system internally was that our fundamental competitiveness depends on being faster than any other AI company. This is the only way to catch up.

Oracle is a great company and there is another company that shows promise also involved in that OpenAI GB200 cluster, but, when our fate depends on being the fastest by far, we must have our own hands on the steering wheel, rather than be a backseat driver.

Elon Musk (@elonmusk) | July 9, 2024

The move is a blow to Oracle. As Investors.com points out, Oracle founder Larry Ellison touted its relationship with xAI in a recent quarterly earnings call, saying his company was working to secure more H100s for the startup.

“We gave them quite a few,” Ellison said at the time. “But they wanted more, and we are in the process of getting them more.”

]]>
605601
Anthropic Adds the Ability to Evaluate Prompts https://www.webpronews.com/anthropic-adds-the-ability-to-evaluate-prompts/ Wed, 10 Jul 2024 12:03:00 +0000 https://www.webpronews.com/?p=605606 Anthropic is making it easier for developers to generate high-quality prompts, adding prompt evaluation to the Anthropic Console.

Prompts are an important part of the AI development process, and can have a major impact on the results, as Anthropic says in a blog post announcing the new feature:

When building AI-powered applications, prompt quality significantly impacts results. But crafting high quality prompts is challenging, requiring deep knowledge of your application’s needs and expertise with large language models. To speed up development and improve outcomes, we’ve streamlined this process to make it easier for users to produce high quality prompts.

You can now generate, test, and evaluate your prompts in the Anthropic Console. We’ve added new features, including the ability to generate automatic test cases and compare outputs, that allow you to leverage Claude to generate the very best responses for your needs.

Anthropic says users can generate prompts simply by describing a task to Claude. Using the Claude 3.5 Sonnet engine, Claude will use the description its given to generate a high-quality prompt.

The new Evaluate feature makes it much easier to test prompts against real-world inputs.

Testing prompts against a range of real-world inputs can help you build confidence in the quality of your prompt before deploying it to production. With the new Evaluate feature you can do this directly in our Console instead of manually managing tests across spreadsheets or code.

Manually add or import new test cases from a CSV, or ask Claude to auto-generate test cases for you with the ‘Generate Test Case’ feature. Modify your test cases as needed, then run all of the test cases in one click. View and adjust Claude’s understanding of the generation requirements for each variable to get finer-grained control over the test cases Claude generates.

Anthropic is already the leading OpenAI competitor, with its Claude 3.5 besting OpenAI’s GPT-4o in a range of tests. With the new features aimed at improving the quality of prompts, Anthropic continues to push AI development forward.

]]>
605606
Microsoft Azure AI Skirts OpenAI’s China Ban https://www.webpronews.com/microsoft-azure-ai-skirts-openais-china-ban/ Mon, 08 Jul 2024 16:03:21 +0000 https://www.webpronews.com/?p=605564 Microsoft Azure has managed to avoid OpenAI’s ban on providing API access in China, a major win for the Redmond company’s cloud and AI efforts.

OpenAI announced in late June that it would block API traffic from countries that were not on its “supported countries and territories” list. Users on the company’s forums reported receiving emails from the company informing them of the policy.

China was conspicuously absent from the long list of supported countries, meaning that Chinese developers will not have access to the company’s API for development. According to The Information, however, there is a significant workaround that runs straight through Microsoft Azure AI.

Developers in China who want to take advantage of OpenAI’s models can still do so if they sign up for an Azure account. Because of Microsoft and OpenAI’s close relationship, this gives developers access to the AI firm’s AI models through Microsoft’s services.

According to the outlet, the exception works because Azure China is a joint venture with Chinese company 21Vianet. Multiple customers confirmed to The Information that they had full access to OpenAI models within Azure.

Given the importance of the Chinese market, the revelation is good news for Microsoft, OpenAI, and Chinese AI developers.

]]>
605564
Microsoft Details ‘Skeleton Key’ AI Model Jailbreak https://www.webpronews.com/microsoft-details-skeleton-key-ai-model-jailbreak/ Wed, 03 Jul 2024 17:55:54 +0000 https://www.webpronews.com/?p=605529 Microsoft is detailing a jailbreak, dubbed “Skeleton Key,” that can be used to trick an AI model into operation outside of its parameters.

AI models are designed to operate within strictly defined parameters that ensure the responses it gives are not offensive and do not cause harm. This is something AI firms have struggled with, with AI models sometimes going beyond their parameters and stirring up controversy in the process.

According to Microsoft Security, there is a newly discovered jailbreak attack—Skeleton Key—that impacts multiple AI models from various firms (hence the name).

This AI jailbreak technique works by using a multi-turn (or multiple step) strategy to cause a model to ignore its guardrails. Once guardrails are ignored, a model will not be able to determine malicious or unsanctioned requests from any other. Because of its full bypass abilities, we have named this jailbreak technique Skeleton Key.

This threat is in the jailbreak category, and therefore relies on the attacker already having legitimate access to the AI model. In bypassing safeguards, Skeleton Key allows the user to cause the model to produce ordinarily forbidden behaviors, which could range from production of harmful content to overriding its usual decision-making rules. Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do. As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.

Microsoft says it has already made a number of updates to its Copilot AI assistants and other LLM technology in an effort to mitigate the attack. The company says customers should consider the following actions to implement their own AI system design:

  • Input filtering: Azure AI Content Safety detects and blocks inputs that contain harmful or malicious intent leading to a jailbreak attack that could circumvent safeguards.
  • System message: Prompt engineering the system prompts to clearly instruct the large language model (LLM) on appropriate behavior and to provide additional safeguards. For instance, specify that any attempts to undermine the safety guardrail instructions should be prevented (read our guidance on building a system message framework here).
  • Output filtering: Azure AI Content Safety post-processing filter that identifies and prevents output generated by the model that breaches safety criteria.
  • Abuse monitoring: Deploying an AI-driven detection system trained on adversarial examples, and using content classification, abuse pattern capture, and other methods to detect and mitigate instances of recurring content and/or behaviors that suggest use of the service in a manner that may violate guardrails. As a separate AI system, it avoids being influenced by malicious instructions. Microsoft Azure OpenAI Service abuse monitoring is an example of this approach.

The company says its Azure AI tools already help customers protect against this type of attack as well:

Microsoft provides tools for customers developing their own applications on Azure. Azure AI Content Safety Prompt Shields are enabled by default for models hosted in the Azure AI model catalog as a service, and they are parameterized by a severity threshold. We recommend setting the most restrictive threshold to ensure the best protection against safety violations. These input and output filters act as a general defense not only against this particular jailbreak technique, but also a broad set of emerging techniques that attempt to generate harmful content. Azure also provides built-in tooling for model selection, prompt engineering, evaluation, and monitoring. For example, risk and safety evaluations in Azure AI Studio can assess a model and/or application for susceptibility to jailbreak attacks using synthetic adversarial datasets, while Microsoft Defender for Cloud can alert security operations teams to jailbreaks and other active threats.

With the integration of Azure AI and Microsoft Security (Microsoft Purview and Microsoft Defender for Cloud) security teams can also discover, protect, and govern these attacks. The new native integration of Microsoft Defender for Cloud with Azure OpenAI Service, enables contextual and actionable security alerts, driven by Azure AI Content Safety Prompt Shields and Microsoft Defender Threat Intelligence. Threat protection for AI workloads allows security teams to monitor their Azure OpenAI powered applications in runtime for malicious activity associated with direct and in-direct prompt injection attacks, sensitive data leaks and data poisoning, or denial of service attacks.

The Skeleton Key attack underscores the ongoing challenges facing companies as AI becomes more widely used. While it can be a valuable tool for cybersecurity, it can also open up entirely new attack vectors.

]]>
605529
Anthropic Calls For A New Way To Evaluate AI https://www.webpronews.com/anthropic-calls-for-a-new-way-to-evaluate-ai/ Wed, 03 Jul 2024 13:00:00 +0000 https://www.webpronews.com/?p=605519 Anthropic, one of the leaders in AI development, is calling for proposals to help “fund evaluations developed by third-party organizations.”

Properly evaluating AI’s potential is a growing challenge for AI firms as the technology evolves. Not only is it challenging to properly evaluate an AI’s capabilities, but there are also concerns with evaluating the various safety issues involved.

Anthropic has increasingly been setting itself apart in the AI field, not only for its powerful Claude model that is currently beating OpenAI’s GPT-4o, but also for its safety-first approach to AI. In fact, the company was founded by OpenAI executives that were concerned with the direction of OpenAI, and the company has continued to attract disillusioned OpenAI engineers. The most notable recent example is Jan Leike, who left OpenAI after the safety team he co-lead was disbanded.

With that background, it’s not surprising that Anthropic is interested in developing and discovering new and better ways to properly evaluate AI. The company outlines its highest priority areas of focus:

  • AI Safety Level assessments
  • Advanced capability and safety metrics
  • Infrastructure, tools, and methods for developing evaluations

The company outlines a number of AI Safety Levels (ASLs) that is concerned with, including cybersecurity; chemical, biological, radiological, and nuclear (CBRN) risks; model autonomy; national security risks; and misalignment risks. In all of these areas, the company is concerned with the risk that AI could be used to aid individuals in doing harm.

We’re particularly interested in capabilities that, if automated and scaled, could pose significant risks to critical infrastructure and economically valuable systems at levels approaching advanced persistent threat actors.

We’re prioritizing evaluations that assess two critical capabilities: a) the potential for models to significantly enhance the abilities of non-experts or experts in creating CBRN threats, and b) the capacity to design novel, more harmful CBRN threats.

AI systems have the potential to significantly impact national security, defense, and intelligence operations of both state and non-state actors. We’re committed to developing an early warning system to identify and assess these complex emerging risks.

Anthropic reveals a fascinating, and terrifying, observation about current AI models, what the company identifies as “misalignment risks.”

Our research shows that, under some circumstances, AI models can learn dangerous goals and motivations, retain them even after safety training, and deceive human users about actions taken in their pursuit.

The company says this represents a major danger moving forward as AI models become more advanced.

These abilities, in combination with the human-level persuasiveness and cyber capabilities of current AI models, increases our concern about the potential actions of future, more-capable models. For example, future models might be able to pursue sophisticated and hard-to-detect deception that bypasses or sabotages the security of an organization, either by causing humans to take actions they would not otherwise take or exfiltrating sensitive information.

Anthropic goes on to highlight its desire to improve evaluation methods to address bias issues, something that has been a significant challenge in training existing AI models.

Evaluations that provide sophisticated, nuanced assessments that go beyond surface-level metrics to create rigorous assessments targeting concepts like harmful biases, discrimination, over-reliance, dependence, attachment, psychological influence, economic impacts, homogenization, and other broad societal impacts.

The company also wants to ensure AI benchmarks support multiple languages, something that is not currently the case. New evaluation methods should also be able to “detect potentially harmful model outputs,” such as “attempts to automate cyber incidents.” The company also wants the new evaluation methods to better determine AI’s ability to learn, especially in the sciences.

Anthropic’s Criteria

Parties interested in submitting a proposal should keep the company’s 10 requirements in mind:

  1. Sufficiently difficult: Evaluations should be relevant for measuring the capabilities listed for levels ASL-3 or ASL-4 in our Responsible Scaling Policy, and/or human-expert level behavior.
  2. Not in the training data: Too often, evaluations end up measuring model memorization because the data is in its training set. Where possible and useful, make sure the model hasn’t seen the evaluation. This helps indicate that the evaluation is capturing behavior that generalizes beyond the training data.
  3. Efficient, scalable, ready-to-use: Evaluations should be optimized for efficient execution, leveraging automation where possible. They should be easily deployable using existing infrastructure with minimal setup.
  4. High volume where possible: All else equal, evaluations with 1,000 or 10,000 tasks or questions are preferable to those with 100. However, high-quality, low-volume evaluations are also valuable.
  5. Domain expertise: If the evaluation is about expert performance on a particular subject matter (e.g. science), make sure to use subject matter experts to develop or review the evaluation.
  6. Diversity of formats: Consider using formats that go beyond multiple choice, such as task-based evaluations (for example, seeing if code passes a test or a flag is captured in a CTF), model-graded evaluations, or human trials.
  7. Expert baselines for comparison: It is often useful to compare the model’s performance to the performance of human experts on that domain.
  8. Good documentation and reproducibility: We recommend documenting exactly how the evaluation was developed and any limitations or pitfalls it is likely to have. Use standards like the Inspect or the METR standard where possible.
  9. Start small, iterate, and scale: Start by writing just one to five questions or tasks, run a model on the evaluation, and read the model transcripts. Frequently, you’ll realize the evaluation doesn’t capture what you want to test, or it’s too easy.
  10. Realistic, safety-relevant threat modeling: Safety evaluations should ideally have the property that if a model scored highly, experts would believe that a major incident could be caused. Most of the time, when models have performed highly, experts have realized that high performance on that version of the evaluation is not sufficient to worry them.

Those interested in submitting a proposal, and possibly working long-term with Anthropic, should use this application form.

OpenAI has been criticized for for a lack of transparency that has led many to believe the company has lost its way and is no longer focused on its one-time goal of safe AI development. Anthropic’s willingness to engage the community and industry is a refreshing change of pace.

]]>
605519
Meta Changes Its Approach To AI Labels On Photos After Backlash https://www.webpronews.com/meta-changes-its-approach-to-ai-labels-on-photos-after-backlash/ Tue, 02 Jul 2024 15:10:58 +0000 https://www.webpronews.com/?p=605506 Meta announced it is changing its approach to its “Made with AI” labels on photos after it incorrectly identified photos taken by photographers as AI-generated.

Labeling AI content has become a growing concern for online platforms, as well as regulators, as AI-generated content has become so realistic that it could easily be used to create false narratives. Meta announced in April plans to label AI content with a “Made with AI” label. Unfortunately, it’s algorithm for identifying AI content had some issues, with photos taken by human photographers being improperly labeled.

The company says it has made changes to address the issue.

We want people to know when they see posts that have been made with AI. Earlier this year, we announced a new approach for labeling AI-generated content. An important part of this approach relies on industry standard indicators that other companies include in content created using their tools, which help us assess whether something is created using AI.

Like others across the industry, we’ve found that our labels based on these indicators weren’t always aligned with people’s expectations and didn’t always provide enough context. For example, some content that included minor modifications using AI, such as retouching tools, included industry standard indicators that were then labeled “Made with AI.” While we work with companies across the industry to improve the process so our labeling approach better matches our intent, we’re updating the “Made with AI” label to “AI info” across our apps, which people can click for more information.

According to CNET, photographer Pete Souza said cropping tools appear to be one of the culprits. Because such tools add information to images, it seems that Meta’s algorithm was incorrectly identifying that added information and taking it as an indication the images were AI-generated.

The entire issue demonstrates the growing challenges associated with correctly identifying AI-generated content. For years, experts have warned about the potential havoc deepfakes could cause, impacting everything from people’s personal lives to business to politics.

Interestingly, OpenAI shuttered its own AI-content detection tool in early 2024, saying at the time that such tools don’t work:

While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.

It remains to be seen if Meta will be able to reliably identify AI-generated images, or if it will suffer the same issues that led OpenAI to throw in the towel.

]]>
605506