Warsaw.AI News 1-7.07.2024
Hello AI Enthusiasts!
We invite you to check the AI news that we found for you in the week of 1-7.07.2024:
A Discussion of AI Bias - The article discusses how various AI models, including generative AI and LLMs, exhibit biases, analyzing examples from Playground AI and other models that incorrectly alter people's features based on stereotypes. The author emphasizes that AI bias is a deeply rooted issue and challenging to fix, despite increasing awareness and numerous discussions on the topic.
Scientists Create Robot Controlled by Blob of Human Brain Cells - Chinese researchers have developed a robot controlled by brain organoids made from human stem cells, enabling the study of brain-computer interfaces and potential future applications in brain damage repair. The organoids control the robot via a neural interface, offering new possibilities in neurobiology and robotics.
Segment Anything without Supervision (UnSAM): A New Image Segmentation Method - UnSAM, developed by UC Berkeley, uses a "divide and conquer" strategy to generate hierarchical image segmentation structures without manual labeling. It outperforms previous unsupervised segmentation methods by 11% in average recall (AR) and can enhance supervised models by integrating pseudo masks with real data.
Odd-One-Out: Anomaly Detection by Comparing with Neighbors - The paper presents a method for anomaly detection in scenes by identifying "odd-looking" objects through comparison with other objects in the same scene. The method generates 3D object-centric representations and compares them to detect anomalies, tested on new benchmarks ToysAD-8K and PartsAD-15K.
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems - LLMs and RAG systems can now handle millions of input tokens, but evaluating their output quality on long-context tasks remains difficult, with simpler tasks like Needle-in-a-Haystack lacking complexity. This study proposes using summarization for evaluation, introducing the "Summary of a Haystack" (SummHay) task, which requires systems to process synthesized document collections (Haystacks) and generate summaries that accurately identify relevant insights and cite source documents. Automatic evaluation of SummHay, based on Coverage and Citation, reveals that even top systems like GPT-4o and Claude 3 Opus score significantly below human performance, highlighting SummHay as a challenging benchmark for current models and a tool for studying enterprise RAG systems and position bias in long-context models.
Meta 3D Gen - The article introduces Meta 3D Gen (3DGen), a state-of-the-art, rapid pipeline for text-to-3D asset generation that creates high-quality 3D shapes and textures with high prompt fidelity in under a minute. Supporting physically-based rendering (PBR) and generative retexturing, 3DGen combines Meta 3D AssetGen and Meta 3D TextureGen to represent 3D objects in view space, volumetric space, and UV space, achieving a 68% win rate over single-stage models and outperforming industry baselines in both prompt fidelity and visual quality for complex textual prompts.
A new initiative for developing third-party model evaluations - A robust third-party evaluation ecosystem is crucial for assessing AI capabilities and risks, yet the current landscape is inadequate and struggling to meet growing demands. To address this, this blog post introduces an initiative to fund third-party organizations in developing high-quality, safety-relevant evaluations to measure advanced AI model capabilities, aiming to enhance AI safety and provide valuable tools for the entire ecosystem.
RouteLLM: Learning to Route LLMs with Preference Data - LLMs offer impressive capabilities but choosing between models often involves a trade-off between performance and cost. To address this, the authors propose efficient router models that dynamically select between stronger and weaker LLMs during inference, optimizing the balance between cost and response quality. The training framework for these routers uses human preference data and data augmentation techniques, significantly reducing costs—by over two times in some cases—without compromising response quality. Furthermore, these routers exhibit strong transfer learning capabilities, maintaining performance even when the strong and weak models are changed at test time, providing a cost-effective yet high-performance solution for deploying LLMs.
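The routing idea summarized above can be illustrated with a minimal sketch. This is not the RouteLLM implementation: the `ThresholdRouter` class, the `toy_score` heuristic, and the threshold value are all hypothetical, and the length-based score is only a toy stand-in for a router trained on preference data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdRouter:
    """Hypothetical router: send a prompt to the strong model only when needed."""

    # Assumption: score_fn estimates how likely the strong model is needed (0..1).
    score_fn: Callable[[str], float]
    # Raising the threshold sends more traffic to the cheaper, weaker model.
    threshold: float = 0.5

    def route(self, prompt: str) -> str:
        return "strong" if self.score_fn(prompt) >= self.threshold else "weak"


def toy_score(prompt: str) -> float:
    # Toy stand-in for a learned router: longer prompts are treated as harder.
    return min(len(prompt.split()) / 20.0, 1.0)


router = ThresholdRouter(score_fn=toy_score, threshold=0.5)

for prompt in ["Hi", " ".join(["word"] * 12)]:
    print(repr(prompt[:20]), "->", router.route(prompt))
```

In a setup like this, the threshold is the knob that trades cost against response quality: a lower value routes more prompts to the strong model, a higher value routes more to the weak one.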
BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream - BeNeRF is a method for reconstructing neural radiance fields (NeRF) from a single blurry image and its corresponding event stream, using a camera modeled with a cubic B-Spline in SE(3) space. The method enables obtaining sharp images from blurry data, eliminating the need for pre-computed camera poses.
AXIAL: Attention-based eXplainability for Interpretable Alzheimer’s Localized Diagnosis using 2D CNNs on 3D MRI brain scans - Accurate early diagnosis of Alzheimer’s disease (AD) is challenging, but this study introduces a 2D CNN-based method for 3D MRI classification that enhances model explainability through a soft attention mechanism, producing voxel-level attention maps. Using the ADNI dataset, the method significantly outperforms state-of-the-art techniques in distinguishing AD from cognitively normal (CN) and stable from progressive mild cognitive impairment (MCI), while consistently highlighting clinically relevant brain regions like the hippocampus and amygdala with high precision and robustness.
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention - MInference 1.0 accelerates pre-filling for long-context LLMs using dynamic sparse attention patterns for efficient GPU computation. This technique significantly reduces processing time while maintaining accuracy and has been tested on various tasks, achieving up to 10x faster pre-filling.
https://github.com/adithya-s-k/omniparse - OmniParse is a platform that ingests and parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Works with documents, tables, images, videos, audio files, or web pages.
https://github.com/codeintegrity-ai/mutahunter - Mutation testing, often used by large companies, injects small modifications into the code and checks whether the existing tests catch them. Mutahunter uses an LLM to generate these modifications, unlocking new possibilities in testing.
https://github.com/lostxine/llara - Code accompanying the paper "LLaRA: Supercharging Robot Learning Data for Vision-Language Policy", which formulates robot actions as conversations using an LLM.
https://github.com/InternLM/InternLM - A series of new LLMs with strong reasoning capability: state-of-the-art performance on math reasoning, surpassing models like Llama3 and Gemma2-9B, plus a 1M context window and strong tool-use capabilities.
https://github.com/mindsdb/mindsdb - Platform for customizing AI from enterprise data. It lets you create, serve, and fine-tune models in real time from your own database, vector store, and application data.
https://github.com/microsoft/DeepSpeed - DeepSpeed helps with many aspects of LLM training and inference featuring lots of useful components like ZeRO, 3D-Parallelism, DeepSpeed-MoE, ZeRO-Infinity, etc. Many popular models were trained using DeepSpeed.
https://github.com/Sinaptik-AI/pandas-ai - PandasAI helps non-technical users ask questions about their data in natural language and gives technical users an easier way to interact with Pandas. It can be accessed through a Jupyter notebook or a self-hosted app.
GraphRAG: New Tool for Advanced Data Discovery - GraphRAG is Microsoft's innovative tool for generating answers to complex data queries, using knowledge graphs and language models to create hierarchical data summaries. It enables more structured information retrieval and enhances performance in Retrieval Augmented Generation (RAG) tasks.
xLAM: Salesforce's Tool for Training and Evaluating Autonomous Agents - Salesforce AI Research's xLAM repository provides tools for training autonomous agents using large language models (LLMs), standardizing agent trajectories from diverse environments. xLAM enables consistent data loading, balancing different data sources, and maintaining independent randomness during model training.
https://github.com/jihaonew/mm-instruct - MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment.
https://huggingface.co/facebook/multi-token-prediction - Released models for Meta's paper "Better & Faster Large Language Models via Multi-token Prediction", in which LLMs predict multiple future tokens instead of one to increase sample efficiency. Four trained 7B-parameter models are available on HuggingFace.
New Features for Gemini 1.5 Pro API and Google AI Studio - Google announces new features for the Gemini API, including a 2-million token context window for Gemini 1.5 Pro, code execution capabilities, and the new Gemma 2 model in Google AI Studio, giving developers more advanced and efficient ways to use AI.
Meta Introduces '3D Gen': AI-Powered Rapid 3D Asset Creation - Meta announced a new tool, "3D Gen", that enables the rapid creation of 3D assets using artificial intelligence, aiming to revolutionize the 3D design and content production process by accelerating and simplifying the creation of high-quality 3D models.
Voice Isolator: Tool for Background Noise Removal and Voice Isolation - Voice Isolator by ElevenLabs is a tool for removing background noise from audio recordings, ideal for film, podcast, and interview post-production. Users can upload audio files or record directly to achieve clear speech audio.
Multimodal Canvas: A New Tool for Creative Work with Multimodal AI - Google Labs introduces Multimodal Canvas, a platform for creative experiments with multimodal AI models, integrating text, image, and sound in one interface. It allows users to interact with various AI modalities, supporting innovative projects and explorations in artificial intelligence.
GPT4All: Running Large Language Models Locally - GPT4All allows running large language models on local devices, ensuring privacy and no internet requirement. It supports various models and hardware, enabling access to local data and customization of user experiences.
GraphRAG: New Tool for Complex Data Discovery Now on GitHub - GraphRAG is a Microsoft tool for generating answers to questions about private or previously unseen datasets using language models and knowledge graphs. It enables hierarchical data summarization and provides more structured information retrieval and comprehensive response generation compared to traditional RAG methods.
How to Win at Enterprise AI — A Playbook - The article discusses the challenges and strategies for implementing AI in enterprises, emphasizing the importance of breaking down work into smaller tasks and rebundling them into automated workflows. Understanding how AI can enhance efficiency by taking over knowledge and managerial tasks is crucial for delivering services as software.
Why AI Infrastructure Startups Are Insanely Hard to Build - The article discusses the challenges AI infrastructure startups face, highlighting high costs, complex technical requirements, and intense competition. These startups need to invest substantial resources in developing advanced technologies to meet growing market demands.