Welcome to The AI Track's comprehensive monthly roundup of the latest AI news!
Each month, we compile significant news, trends, and happenings in AI, providing detailed summaries with key points in bullet form for concise yet complete understanding.
This page features AI News for July 2024. At the end, you will find links to our archives for previous months.
[30 July] Canva Acquired Leonardo AI to Boost Its Generative AI Efforts
Canva has acquired Leonardo AI to enhance its generative AI capabilities, aiming to provide more advanced design tools and improve user experiences.
Key Points:
- Acquisition Details: Canva’s acquisition of Leonardo AI marks its second acquisition of 2024, following its purchase of U.K. design company Affinity for an estimated $380 million.
- Generative AI Integration: Leonardo AI, an Australian-based generative AI platform, will be integrated into Canva’s design tools, offering users enhanced capabilities for creating AI-generated art and designs.
- Strategic Goals: This acquisition aims to bolster Canva’s AI offerings, allowing it to compete more effectively with other design platforms and providing users with innovative tools for content creation.
- Company Expansion: This acquisition is part of Canva’s broader strategy to expand its AI capabilities, which also includes previous acquisitions of companies like Zeetings, Pixabay, Pexels, and Smartmockups.
Why This Matters: The integration of Leonardo AI into Canva’s platform exemplifies the growing importance of generative AI in design and content creation. This move is expected to enhance Canva’s competitive edge, providing users with advanced tools for creative projects and furthering the democratization of design capabilities.
[30 July] Google Photos AI editing tools are now available to all users
Google Photos’ advanced AI editing tools, including Magic Editor, Magic Eraser, Photo Unblur, and Portrait Light, are now available to all users without a subscription, enabling enhanced photo editing capabilities.
Key Points:
- Layered Edits: Combine AI and traditional tools for optimal results.
- Tool Specificity: Use Magic Eraser for quick fixes and Magic Editor for complex edits.
- Selection Methods: Tap, brush, or circle to select items for editing.
- Strength Adjustment: Fine-tune the intensity of AI effects with a strength slider.
Why This Matters: These AI tools make advanced photo editing accessible to all users, enhancing creativity and improving photo quality effortlessly.
[30 July] OpenAI has launched an experimental model, GPT-4o Long Output
OpenAI has launched an experimental version of GPT-4o, called GPT-4o Long Output, with 16 times the output token capacity of the standard model. This enhancement allows for significantly longer outputs, enabling more complex and detailed text generation.
Key Points:
- Expanded Token Capacity: GPT-4o Long Output can generate up to 16 times more output tokens per response than the standard GPT-4o, allowing for more extensive and nuanced outputs.
- Applications: This model is particularly beneficial for tasks requiring long-form content, such as detailed reports, extensive dialogues, and complex programming code.
- Experimental Nature: The model is currently in the experimental phase, with OpenAI testing its capabilities and potential applications.
Why This Matters: The increased output capacity of GPT-4o Long Output enhances AI’s ability to generate comprehensive and detailed content, pushing the boundaries of what language models can achieve in various fields, from creative writing to technical documentation.
[29 July] Meta introduced Segment Anything Model 2 (SAM 2), for real-time segmentation of objects in images and videos
Meta’s new AI model, Segment Anything Model 2 (SAM 2), offers real-time, precise segmentation of objects in both images and videos, advancing applications in video editing and mixed reality.
Key Points:
- SAM 2 can segment objects and track them across video frames.
- Reduces operation time by up to one-third compared to previous models.
- Demonstrated applications in fields like marine science, disaster relief, and medicine.
- Meta provides a demo and open-sourced SAM 2 under the Apache 2.0 license.
Why This Matters: SAM 2’s capabilities enhance various industries by improving object tracking and segmentation in real-time video, promoting technological advancements.
[28 July] Apple's introduction of new AI features to be delayed
Apple’s introduction of new AI features has been delayed, according to Bloomberg News. The delays affect updates to key products and services, pushing back the planned advancements in artificial intelligence integration within Apple’s ecosystem.
Key Points:
- AI feature updates for Apple’s products and services are postponed.
- The delay impacts the timeline for enhancements across Apple’s ecosystem.
Why This Matters: These delays could affect Apple’s competitive position in the rapidly evolving AI landscape, impacting user experience and innovation pace.
[26 July] AI is set to revolutionize the 2024 Paris Olympics
The 2024 Paris Olympics will prominently feature AI technologies, enhancing both athlete performance and spectator experiences while navigating various challenges.
Key Points:
- Athlete Performance and Training: AI technologies like Intel’s 3DAT provide biomechanical insights to optimize athlete training, including custom-designed gear and nutrition plans.
- Refereeing and Real-time Data: AI aids in sports officiating, though real-time analysis and sport-specific challenges remain hurdles. Transparency in AI decisions is crucial for acceptance.
- Viewer Experience: AI enhances broadcasting, providing personalized highlights and detailed statistics, improving engagement for viewers.
Why This Matters: AI’s integration into the Olympics exemplifies its potential to transform sports, from training and performance to officiating and viewer engagement. This evolution highlights the broader implications of AI in enhancing efficiency, fairness, and enjoyment in global sporting events.
[25 July] OpenAI announces SearchGPT: A Competitive Edge in AI-Powered Search
OpenAI’s new SearchGPT prototype aims to revolutionize web searches by combining AI capabilities with real-time web information to provide fast, accurate answers with clear source attribution.
Key Points:
- Purpose and Launch: SearchGPT is designed to enhance web search by integrating AI with real-time web data, initially launching to a select group for feedback.
- User Experience: It offers immediate answers with up-to-date information and clear source links, allowing for conversational follow-up questions.
- Publisher Collaboration: OpenAI partners with publishers to ensure quality content and clear attribution, aiming to support a thriving ecosystem for publishers and creators.
- Future Integration: Feedback will shape future enhancements, potentially integrating the best features into ChatGPT.
Why This Matters: SearchGPT’s approach promises to streamline online searches, making information retrieval faster and more reliable while supporting content creators and maintaining high-quality standards.
[25 July] Google's AI has achieved a silver medal equivalent at the International Mathematical Olympiad (IMO)
Google DeepMind’s AI systems, AlphaProof and AlphaGeometry 2, achieved a performance equivalent to a silver medal in the International Mathematical Olympiad (IMO) by solving four out of six problems. This marks a significant milestone in AI’s ability to tackle complex mathematical problems.
Key Points:
- AI Performance: AlphaProof and AlphaGeometry 2 solved four out of six IMO problems, earning 28 out of 42 points, just shy of the gold medal threshold (29 points).
- Problem Types Solved: AlphaProof tackled two algebra problems and one number theory problem, while AlphaGeometry 2 solved the geometry problem.
- Speed and Process: The AI solved some problems within minutes and others in up to three days. Problems were translated into formal mathematical language for the AI to process.
- Historical Success: AlphaGeometry 2 solves 83% of the IMO geometry problems from the past 25 years, up from the 53% solved by its predecessor.
- Human Involvement: Prominent mathematicians scored the AI’s solutions. Humans translated problems into the formal language Lean before the AI’s processing.
- Expert Commentary: Sir Timothy Gowers noted the AI’s achievement but highlighted that it required significantly more time and faster processing than human competitors. He suggested the AI hasn’t “solved mathematics” but has the potential to become a valuable research tool.
Why This Matters: The achievement demonstrates AI’s growing capabilities in complex problem-solving and its potential to support mathematical research. However, human expertise and intervention remain crucial in translating problems and interpreting results.
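For readers unfamiliar with Lean, the following is a minimal illustration of what a formally stated, machine-checked result looks like. It is a toy statement chosen for this roundup, not one of the IMO problems and not output from AlphaProof. The theorem name is hypothetical, and the sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Toy statement (not an IMO problem): the sum of two even natural numbers is even.
-- `even_add_even` is a hypothetical name used only for this illustration.
theorem even_add_even (a b : Nat) (ha : a % 2 = 0) (hb : b % 2 = 0) :
    (a + b) % 2 = 0 := by
  omega  -- decision procedure for linear arithmetic over Nat and Int
```

The point of the formal language is that the proof checker either accepts the proof or rejects it, which is what makes the translation step described above necessary before the AI can work on a problem.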
[24 July] Mistral AI announces the release of Mistral Large 2
Mistral AI announces the release of Mistral Large 2, a highly capable AI model with enhanced performance in code generation, mathematics, multilingual support, and advanced function calling, setting new standards in cost efficiency and performance.
Key Points:
- Capabilities: Supports a 128k context window, dozens of languages, and 80+ programming languages.
- Performance: Achieves 84% accuracy on MMLU and excels in code and reasoning benchmarks.
- Features: Improved instruction-following, reduced hallucinations, and better multilingual performance.
- Availability: Accessible via la Plateforme, Google Cloud, Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai.
[23 July] Meta Launched Llama 3.1 - the Largest-ever Open-Source AI Model
Meta has released Llama 3.1, an open-source AI model with 405 billion parameters, claiming it surpasses top models like GPT-4o and Claude 3.5 Sonnet. This model is designed to improve general knowledge, math, multilingual translation, and more, supporting the development of custom AI applications.
Key Points:
- Advanced Capabilities: Llama 3.1 offers enhanced context length, multilingual support, and state-of-the-art performance.
- Open Source: Meta emphasizes open-source collaboration, similar to the success of Linux.
- Widespread Use: Available on platforms like WhatsApp and integrated into Meta’s apps, aiming to become the most-used AI assistant.
Why This Matters: Llama 3.1 democratizes access to advanced AI, fostering innovation and ensuring broader AI adoption across industries.
[23 July] Tesla Announces High Production of Optimus Robots for Consumer Use by 2026
Elon Musk announced that Tesla’s Optimus humanoid robot, initially planned for internal use in 2024, will hopefully be ready for broader deployment by 2026.
Key Points:
- Development Timeline: Originally announced in 2021, Optimus is set to be deployed in Tesla factories by 2024, with mass production expected by 2026.
- Capabilities and Design: The robot is intended to assist on production lines, weighing 56 kg and standing 170 cm tall, and aims to be cost-effective, priced under $20,000.
- Strategic Goals: Tesla aims to mass-produce these robots for various industrial applications, enhancing automation and efficiency within its operations.
Why This Matters: The deployment of Optimus could significantly advance automation in manufacturing, potentially reducing labor costs and increasing production efficiency. This aligns with broader trends in industrial automation and AI-driven productivity enhancements.
[22 July] Nvidia to Launch Chinese Version of Flagship AI Chip Amid US Export Restrictions
Nvidia is developing a version of its flagship Blackwell AI chip for the Chinese market, tentatively named the “B20,” that is designed to comply with U.S. export regulations.
Key Points:
- Blackwell Chip Series: Nvidia’s new “Blackwell” chips, unveiled in March, feature significant advancements, with the B200 being 30 times faster than its predecessor.
- B20 Chip for China: Nvidia plans to release the “B20” chip in China, collaborating with Inspur, with shipments expected to begin in Q2 2025.
- US Export Controls: Stricter U.S. export controls on advanced semiconductors aim to prevent Chinese military advancements, leading Nvidia to tailor products for compliance.
- Revenue Impact: China’s share of Nvidia’s revenue dropped to 17% from 26% due to sanctions, but the company aims to regain market share with the new chip.
- H20 Chip Performance: Following initial pricing struggles, Nvidia’s H20 chip sales surged in China, now expected to surpass 1 million units this year, valued over $12 billion.
- Future Restrictions: The U.S. is likely to further restrict semiconductor exports, potentially banning the H20 chip in China. Nvidia’s development of the B20 aims to mitigate these impacts.
Why This Matters: Nvidia’s adaptation to export controls highlights the geopolitical impact on technology markets. The development of the B20 chip reflects the company’s strategic efforts to maintain its presence in China while navigating international regulations. This move is crucial for Nvidia to counteract the competitive pressure from Chinese firms and sustain its revenue streams amidst tightening global trade policies.
[22 July] Generative AI startup Cohere has raised $500 million in new funding, bringing its valuation to $5.5 billion
Cohere, a generative AI startup, has raised $500 million in new funding, bringing its valuation to $5.5 billion. The funding round includes contributions from major tech companies such as Cisco, AMD, and Fujitsu. This investment aims to bolster Cohere’s capabilities in developing advanced AI models and expanding its market presence. The significant financial backing highlights the growing competition in the generative AI sector.
[18 July] Tech Giants Form Coalition for Secure AI (CoSAI)
Google, OpenAI, Microsoft, Amazon, Nvidia, Intel, and other AI leaders have formed the Coalition for Secure AI (CoSAI) to unify and enhance AI security practices through collaborative efforts.
Key Points:
- Founding Members: Leading AI companies, including Google, OpenAI, Microsoft, Amazon, Nvidia, Intel, IBM, PayPal, Cisco, and Anthropic.
- Objective:
  - Address Fragmented AI Security Landscape: CoSAI aims to tackle the disjointed nature of current AI security measures by offering accessible open-source methodologies, frameworks, and tools.
  - Promote Standardization: By standardizing AI security practices, CoSAI seeks to create a cohesive approach to AI development that organizations of all sizes can adopt, fostering a secure-by-design philosophy across the industry.
- Operational Structure:
  - Under OASIS: CoSAI will operate under the Organization for the Advancement of Structured Information Standards (OASIS), leveraging its established infrastructure to promote and enforce standardized security practices.
  - Collaborative Platform: The coalition will serve as a collaborative platform where members can share best practices, conduct AI security research, and develop open-source solutions that enhance the security and reliability of AI systems.
[18 July] Meta Withholds Multimodal AI Model from the EU Due to Regulatory Uncertainty
Meta will not release its upcoming multimodal AI model in the European Union, citing unpredictable regulatory conditions.
Key Points:
- Meta’s new multimodal AI model, capable of processing video, audio, images, and text, will not be available in the EU.
- The decision is driven by unclear regulations on data protection under the EU’s GDPR and the pending AI Act.
- Meta had previously notified EU regulators about its plans to use public social media data for model training.
- European companies and non-EU companies offering services in Europe will be affected.
- A text-only version of Meta’s Llama 3 model will still be released in the EU.
- Similar actions have been taken by Apple, withholding AI features from the EU market.
- The UK, despite having GDPR-like laws, will receive the new multimodal AI model due to clearer regulatory guidance.
[18 July] OpenAI Unveils Affordable and Efficient GPT-4o Mini Model
OpenAI has introduced GPT-4o Mini, a cost-effective and smaller version of its advanced AI model, designed to make powerful AI more accessible.
Key Points:
- Affordable AI: GPT-4o Mini offers a more budget-friendly option for users while maintaining robust AI capabilities.
- Efficiency: This model is optimized for performance, delivering high-quality results with lower computational requirements.
- Target Users: Aimed at small businesses, developers, and educational institutions, GPT-4o Mini provides an accessible entry point into advanced AI technology.
- Applications: Suitable for various applications, including natural language processing, customer support automation, and educational tools.
Why This Matters: The introduction of GPT-4o Mini democratizes access to cutting-edge AI, enabling more organizations and individuals to leverage AI technology for innovative solutions and enhanced productivity.
[18 July] OpenAI is in talks with Broadcom to develop a new AI chip
OpenAI is in discussions with Broadcom to develop a new AI chip, aiming to enhance the efficiency and capabilities of its AI models amid increasing competition and demand for advanced AI hardware.
Key Points:
- Objective: OpenAI seeks to create a specialized AI chip to improve the performance and reduce the operational costs of training and deploying large AI models like ChatGPT.
- Partnership: The collaboration with Broadcom, a major semiconductor manufacturer, could provide OpenAI with the necessary hardware advancements to maintain its competitive edge in the AI industry.
- Strategic Move: This development aligns with OpenAI’s strategy to manage the growing computational demands of its AI systems and to reduce dependency on third-party hardware suppliers.
Why This Matters: Developing a dedicated AI chip can significantly enhance the efficiency of AI operations, reduce costs, and support the development of more powerful and scalable AI models. This move highlights the ongoing evolution in AI infrastructure and the critical role of custom hardware in advancing AI capabilities.
[17 July] In a bid to curb China's tech advancements, the U.S. is weighing tougher trade rules on the chip industry
The U.S. is considering imposing stricter trade rules on companies as part of its ongoing efforts to limit China’s access to advanced semiconductor technology, reflecting heightened tensions in the tech competition between the two countries.
Key Points:
- Stricter Trade Rules: The U.S. is evaluating tougher regulations to prevent Chinese companies from obtaining cutting-edge semiconductor technologies.
- Strategic Response: This move is part of a broader strategy to curb China’s technological advancements and maintain U.S. leadership in critical tech sectors.
- Impact on Companies: The potential regulations could affect global semiconductor supply chains and the operations of multinational companies involved in the chip industry.
- Ongoing Competition: This step underscores the escalating tech rivalry between Washington and Beijing, particularly in the realm of AI and semiconductor technologies.
Why This Matters: The introduction of tougher trade rules could have significant implications for the global tech industry, affecting supply chains, market dynamics, and international relations.
[17 July] Microsoft Releases AI-Powered Designer App for Multiple Platforms
Microsoft has officially launched its AI-powered design tool, Designer, for Windows, iOS, and Android, making it widely accessible for creating and editing visuals using AI technology.
Key Points:
- App Availability: Microsoft Designer is now available on web, Windows, iOS, and Android, supporting over 80 languages.
- Features: The app offers templates for various image types and allows users to create or edit visuals using AI. Integration with OpenAI’s DALL-E enhances its capabilities.
- Integration with Microsoft Services: Designer works seamlessly with Microsoft Word and PowerPoint via Copilot, leveraging models like GPT-4 Turbo for enhanced functionality.
- Competitive Edge: Positioned as a competitor to Canva, Designer offers similar user experience and features, with added AI-powered options.
Why This Matters: The release of Microsoft Designer underscores the growing influence of AI in creative tools, providing users with powerful, accessible options for visual content creation across multiple platforms. This move enhances productivity and democratizes design capabilities.
[17 July] OpenAI has introduced "Prover-Verifier Games"
OpenAI has introduced “Prover-Verifier Games” to improve the legibility of AI-generated text. By training models to produce text that weaker models can verify, this method enhances human evaluators’ ability to assess AI outputs effectively.
Key Points:
- Optimization: Text is made clearer by requiring strong models to generate solutions verifiable by weaker models.
- Prover-Verifier Method: Involves a “prover” creating solutions and a “verifier” checking accuracy, improving clarity and correctness.
- Application: This approach balances performance with legibility, benefiting fields requiring precise communication.
Why This Matters: Enhancing the legibility of AI outputs can build trust and facilitate the broader adoption of AI in critical areas.
[17 July] Fei-Fei Li builds a $1 billion startup named "World Labs" in just four months
Fei-Fei Li, a prominent AI figure, has built a startup named “World Labs” that reached a $1 billion valuation in just four months, leveraging her extensive expertise and network in artificial intelligence.
Key Points:
- Founder: Fei-Fei Li, known as the “Godmother of AI.”
- Startup: Named World Labs, focusing on cutting-edge AI research and applications.
- Funding: Achieved a $1 billion valuation in just four months.
- Objective: To push the boundaries of AI technology and its applications.
Why This Matters: The rapid success of World Labs underscores the growing investment and interest in AI technologies and highlights Fei-Fei Li’s influence and capability in the field.
[16 July] Former OpenAI and Tesla engineer Andrej Karpathy starts an AI education platform
Former OpenAI and Tesla engineer Andrej Karpathy has launched an AI education platform aimed at democratizing AI learning and making advanced AI tools and knowledge accessible to a broader audience.
Key Points:
- Founder: Andrej Karpathy, a prominent AI engineer with experience at OpenAI and Tesla.
- Platform Goal: To provide accessible AI education and resources to a wide range of users, from beginners to advanced practitioners.
- Features: The platform will include interactive tutorials, hands-on projects, and access to cutting-edge AI tools and models.
- Impact: Aims to bridge the knowledge gap in AI, fostering a more inclusive and educated community of AI developers and enthusiasts.
Why This Matters: Making AI education more accessible can accelerate innovation, empower a diverse range of individuals, and contribute to the broader adoption and understanding of AI technologies.
[12 July] Samsung Unveils AI Innovations at Galaxy Unpacked 2024
At Galaxy Unpacked 2024, Samsung highlighted its commitment to collaborative and responsible AI innovation through expert panel discussions and new product releases, emphasizing the future impact of mobile AI.
Key Points:
- Product Launches: Samsung introduced Galaxy Z Fold6, Galaxy Z Flip6, Galaxy Watch Ultra, Galaxy Watch7, Galaxy Ring, and Galaxy Buds3 series, expanding its AI ecosystem.
- Expert Panel: Industry leaders from Samsung, Google, Qualcomm, and OECD discussed the future of mobile AI, stressing a human-centric approach and responsible AI development.
- Research Insights: A global study with Goldsmiths’ Institute of Management Studies revealed that frequent AI users report higher quality of life and enhanced creativity.
- Collaborative Innovation: Samsung’s hybrid AI approach combines on-device and cloud-based AI, developed in partnership with Google and Qualcomm.
- AI Integration: Google’s Gemini powers new features like ‘Circle to Search’, allowing users to search by circling words, images, or videos. Additional AI tools include voice recording transcription, translation, and summarization, a PDF translation tool, and a tone-reflecting writing assistant.
- Privacy and Inclusivity: Emphasis on privacy, fairness, transparency, and accountability in AI development, with features like the Galaxy AI Dashboard allowing user control over data usage.
- Additional Features: The Galaxy Watch Ultra and Watch7 include advanced health tracking capabilities, while the Galaxy Ring offers continuous wellness monitoring.
Why This Matters: Samsung’s commitment to responsible and collaborative AI innovation highlights the importance of balancing technological advancements with ethical considerations and user privacy, setting a standard for future AI developments in the mobile industry. This approach ensures that AI serves to enhance user experiences while safeguarding their data and well-being.
[11 July] OpenAI Develops System to Measure AI Progress Towards Human-Level Intelligence
OpenAI has implemented a five-tier system to evaluate and track the development of its AI models, aiming to reach human-level Artificial General Intelligence (AGI).
Key Points:
- Five-Tier AGI Ranking System: OpenAI’s new internal classification system categorizes AI development into five levels, from chatbots to full-fledged AGI.
  - Level 1: Chatbots: Current AI capabilities, similar to ChatGPT.
  - Level 2: Reasoners: AI capable of PhD-level problem-solving.
  - Level 3: Agents: AI systems performing multi-day tasks.
  - Level 4: Innovators: AI akin to inventors like Thomas Edison.
  - Level 5: Organizations: AI that can manage an entire company.
- Current Progress: OpenAI believes it is on the cusp of Level 2, with ongoing research projects demonstrating human-like reasoning capabilities.
- Potential Impact of AGI: Full AGI (Level 5) could perform any human task and potentially replace entire companies, from executives to operational roles.
- Comparison with Competitors: Google DeepMind has its own AGI scale, which measures AI capabilities against the percentiles of skilled adults.
- Speculation and Future Outlook: The roadmap indicates that future OpenAI products will progressively advance through these levels, leading to increasingly sophisticated AI applications.
Why This Matters: OpenAI’s structured approach to tracking AI development provides a transparent framework for understanding progress towards AGI. This is crucial for managing the ethical, technical, and societal impacts of increasingly powerful AI systems, ensuring responsible development and fostering public trust.
[10 July] AWS Launches App Studio for Easy and Fast Enterprise App Development
AWS has introduced App Studio, a generative AI-powered service that simplifies the creation of enterprise-grade applications using natural language prompts.
Key Points:
- Generative AI Integration: Users can describe their desired application, and App Studio generates it within minutes, bypassing the need for extensive coding.
- Target Users: Designed for technical professionals without deep software development skills, such as IT project managers and data engineers.
- Ease of Use: Features a point-and-click interface and provides AI-driven guidance for modifications.
- Scalability and Security: Ensures applications are secure, scalable, and compliant with company policies.
- Cost Efficiency: App Studio is free to build with; customers pay only for usage time, potentially saving up to 80% compared to other low-code tools.
Why This Matters: AWS App Studio democratizes app development, enabling a broader range of professionals to create custom, secure, and scalable applications. This tool can significantly enhance productivity and innovation by reducing dependency on specialized developers and allowing rapid deployment of tailored business solutions.
[10 July] Microsoft Withdraws Observer Seat on OpenAI Board Amid Regulatory Scrutiny
Microsoft has relinquished its observer seat on OpenAI’s board, citing confidence in OpenAI’s progress, amidst speculation that regulatory scrutiny prompted the decision.
Key Points:
- Reason for Withdrawal: Microsoft attributed the decision to OpenAI’s significant board progress, deeming their observer role unnecessary.
- Regulatory Pressures: Reports suggest that growing regulatory scrutiny of Microsoft’s $13B partnership with OpenAI, particularly from the FTC and CMA, influenced the decision.
- No New Observers: OpenAI confirmed there will be no additional observer seats, countering rumors about Apple.
- Apple’s Stance: Apple, rumored to take a similar observer position, has not made any official move or statement regarding joining OpenAI’s board.
Why This Matters: The withdrawal underscores the tension between strategic partnerships and regulatory compliance, emphasizing the importance of maintaining competitive fairness and transparency in the tech industry.
[9 July] Microsoft has developed a groundbreaking new AI speech generation system called VALL-E 2
Microsoft has developed a groundbreaking new AI speech generation system called VALL-E 2 that has reached a major milestone – achieving human parity in zero-shot text-to-speech synthesis. However, the researchers behind VALL-E 2 have decided not to release it to the public due to concerns over potential misuse.
Key Innovations in VALL-E 2
VALL-E 2 builds upon the previous VALL-E model and introduces two key innovations that enable its human-level performance:
- Repetition Aware Sampling: This method dynamically adapts the selection of audio codes during the decoding process to better handle repetitions in the output speech. It switches between “Nucleus Sampling” (focusing on the most likely codes) and random sampling to improve the stability and avoid issues like infinite loops.
- Grouped Code Modeling: VALL-E 2 groups multiple consecutive audio codes together and processes them as a single “frame”. This reduces the input sequence length for the language model, speeding up processing. It also helps manage the challenges of modeling very long sequences of audio.
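The two ideas above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Microsoft's implementation: the function names, the `window` and `max_ratio` parameters, their default values, and the rule "fall back to random sampling when the chosen code already dominates the recent window" are all assumptions made for this example.

```python
import random

def nucleus_sample(probs, top_p, rng):
    """Sample from the smallest set of most likely codes whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

def repetition_aware_sample(probs, history, rng, top_p=0.9, window=10, max_ratio=0.5):
    """Nucleus-sample a code; if that code already dominates the recent
    history window, resample from the full distribution instead.
    (window and max_ratio are illustrative assumptions.)"""
    code = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent and recent.count(code) / len(recent) > max_ratio:
        code = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return code

def group_codes(codes, group_size):
    """Pack consecutive codes into fixed-size frames; an incomplete tail is dropped."""
    usable = len(codes) - len(codes) % group_size
    return [tuple(codes[i:i + group_size]) for i in range(0, usable, group_size)]
```

In this sketch, grouping a sequence of 1,000 codes with `group_size=2` yields 500 frames, which is the sequence-length reduction the second bullet describes.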
Impressive Performance, But Too Dangerous to Release
Experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 significantly outperforms previous zero-shot text-to-speech systems in terms of speech robustness, naturalness, and speaker similarity. In fact, it is the first system to achieve human parity on these benchmarks.
With just a 3-second voice sample, VALL-E 2 can generate highly realistic and natural-sounding speech that is virtually indistinguishable from the original speaker. With longer 10-second samples, the quality and fidelity of the synthesized speech improve further.
However, the researchers acknowledge that VALL-E 2 carries significant risks of misuse, such as spoofing voice identification or impersonating specific speakers without their consent. As a result, Microsoft has decided not to release VALL-E 2 to the public, keeping it as a research project for now.
Potential Applications and the Need for Safeguards
The researchers believe VALL-E 2 could have many beneficial applications, such as in education, entertainment, accessibility features, translation, and chatbots. However, they emphasize that if the model is deployed in the real world, it should include a protocol to ensure speaker consent and a method for detecting synthesized speech.
Overall, VALL-E 2 represents a major breakthrough in AI-powered speech generation, but its potential for misuse has led Microsoft to withhold its public release for now. The development of robust safeguards and consent protocols will be crucial before such powerful voice cloning technology can be responsibly deployed.
[9 July] OpenAI and Thrive Global Launch Thrive AI Health for Personalized Health Coaching
OpenAI and Thrive Global, founded by Arianna Huffington, have launched Thrive AI Health, a new company focused on providing AI-driven personalized health coaching to improve health outcomes and reduce chronic disease.
Key Points:
- Company Formation: Thrive AI Health is a collaboration between the OpenAI Startup Fund and Thrive Global, aiming to expand access to personalized health coaching through generative AI.
- Health Focus: The AI-enabled coach will target five areas: connection, sleep, fitness, stress management, and nutrition.
- Technology Integration: The coach will use OpenAI’s technology and Thrive Global’s behavioral change methods, incorporating peer-reviewed science, user preferences, and biometric data.
- Leadership and Investment: DeCarlos Love, a former Google product leader, will serve as CEO. The Alice L. Walton Foundation is a strategic investor, with additional partnerships from prominent medical institutions.
- Challenges Ahead: The initiative must navigate potential business, technical, and regulatory challenges, particularly around user privacy and data security.
- Market Impact: Thrive AI Health aims to improve health outcomes, reduce healthcare costs, and address chronic diseases through proactive, data-driven coaching. The U.S. alone faces a significant burden from chronic illnesses, costing healthcare services approximately $3.6 trillion annually.
Why This Matters: The launch of Thrive AI Health represents a significant advancement in leveraging AI for health improvement. By combining AI technology with behavioral science, the initiative aims to make personalized health coaching accessible, fostering sustainable behavioral changes and enhancing overall well-being. This move also highlights the importance of addressing privacy concerns and regulatory challenges to ensure the success and acceptance of AI-driven health solutions.
[2 July] Meta's 3D Gen (3DGen) introduces a state-of-the-art, fast pipeline for text-to-3D asset generation
Meta’s 3D Gen (3DGen) is a state-of-the-art, fast pipeline for text-to-3D asset generation. It produces assets with high prompt fidelity and quality in under a minute, and it supports physically based rendering and generative retexturing.
Key Points:
- High Fidelity: Generates high-quality 3D shapes and textures quickly.
- Advanced Techniques: Integrates Meta 3D AssetGen and Meta 3D TextureGen.
- Versatility: Represents 3D objects in view, volumetric, and UV spaces.
- Performance: Outperforms industry baselines in prompt fidelity and visual quality.
Why This Matters: 3DGen enhances the creation of realistic 3D assets, advancing applications in gaming, virtual reality, and design.
[2 July] Apple will gain an observer role on OpenAI's board
Apple will gain an observer role on OpenAI’s board as part of the companies’ new AI partnership, under which Apple will integrate ChatGPT into its devices. The role gives Apple insight into OpenAI’s operations.
Key Points:
- Representation: Phil Schiller, Apple’s App Store chief, will represent Apple at OpenAI board meetings without voting rights.
- Background: Schiller, who was given the ‘Apple Fellow’ title in 2020, was chosen for his significant contributions.
- Implementation: The arrangement starts later this year, aligning with the integration of ChatGPT into iOS and macOS.
- Industry Context: This move follows Microsoft’s similar observer role on OpenAI’s board, potentially creating strategic tensions.
Why This Matters: This collaboration could enhance AI functionalities in Apple products and influence competitive dynamics in the tech industry.
[1 July] Apple launched the 4M AI model on Hugging Face, in partnership with EPFL
Apple’s 4M AI model, launched in partnership with EPFL and available on Hugging Face, allows users to create images, perform object detection, and manipulate 3D scenes via natural language. This release, which coincides with Apple’s strong market performance, marks a strategic shift toward openness in AI development.
Key Points:
- Capabilities: Text-to-image generation, object detection, 3D scene manipulation.
- Market Impact: Apple’s shares have surged 24% since May 1, adding over $600 billion in value.
- Future Applications: Enhanced Siri and automated video content creation.
- Ethical Considerations: Emphasis on maintaining privacy while utilizing data-intensive AI.
Why This Matters: Apple’s initiative demonstrates a commitment to leading AI innovation and fostering a developer ecosystem, balancing cutting-edge research with user privacy.
[1 July] ElevenLabs launches a tool to clone iconic voices
ElevenLabs has partnered with the estates of iconic stars like Judy Garland, James Dean, Burt Reynolds, and Sir Laurence Olivier to bring their voices to the Reader App, allowing users to listen to digital texts with these legendary voices.
Key Points:
- Partnerships: Collaborations with estates of late stars to recreate their voices.
- Reader App: Converts articles, PDFs, eBooks, and more into voiceovers.
- Exclusive Content: Iconic voices available only through the app, not for broader content creation.
- Legacy and Accessibility: Enhances content accessibility while honoring the legacies of celebrated actors.
Why This Matters: This initiative enhances content accessibility and provides a unique way to experience digital texts, preserving and celebrating the legacies of iconic actors.