Warsaw.AI News 9-15.09.2024
We invite you to check out the AI news we found for you in the week of 9-15.09.2024.
#LLAMA #OpenSourceAI
New Open-Source AI Leader Reflection 70B's Performance Questioned, Accused of Fraud - Reflection 70B, a new open-source AI model, has come under scrutiny as its performance claims are being questioned and accusations of fraud have surfaced. The controversy raises concerns about the reliability and transparency of AI performance metrics in the open-source community.
#AI #Warsaw
The Warsaw School of Computer Science is piloting a postgraduate course on AI as part of the EU-funded ARISA AI Skills project. Details can be found here (in Polish).
#LangGraph
LangChain Academy Introduces Introductory Course on LangGraph - LangChain Academy has launched a new course titled "Intro to LangGraph," aimed at providing foundational knowledge on LangGraph. The course covers essential concepts, practical applications, and hands-on exercises to help scientists and programmers effectively utilize LangGraph in their projects.
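LangGraph structures an agent as a graph of nodes that read and update a shared state, with edges deciding which node runs next. The core idea can be sketched in plain Python (the node names and routing scheme below are illustrative, not LangGraph's actual API):

```python
# Minimal sketch of the state-graph idea behind LangGraph: nodes are
# functions that mutate a shared state dict and return the name of the
# next node to run. Node names here are hypothetical.

def plan(state):
    state["plan"] = f"steps for: {state['task']}"
    return "execute"                # edge to the next node

def execute(state):
    state["result"] = f"done: {state['plan']}"
    return "end"                    # terminal edge

NODES = {"plan": plan, "execute": execute}

def run_graph(entry, state):
    node = entry
    while node != "end":            # walk edges until the terminal node
        node = NODES[node](state)
    return state

state = run_graph("plan", {"task": "summarize a paper"})
print(state["result"])              # -> done: steps for: summarize a paper
```

In LangGraph itself the same shape is expressed with a `StateGraph`, explicit `add_node`/`add_edge` calls, and conditional edges for branching.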
#LLMPerformance
“Planning In Natural Language Improves LLM Search For Code Generation” - While scaling training compute has boosted LLM performance, scaling inference compute hasn't achieved similar gains due to a lack of diverse outputs. PLANSEARCH, a new algorithm, addresses this by generating diverse natural language plans, significantly improving problem-solving performance on benchmarks like LiveCodeBench.
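The paper's core move is to search over natural-language plans rather than over code samples directly. A loose sketch of that search, with a stub standing in for the model call (`llm` and all prompts are hypothetical):

```python
from itertools import combinations

# Loose sketch of the PLANSEARCH idea: generate observations about the
# problem, combine them into diverse natural-language plans, and only then
# implement each plan as code. `llm` is a stand-in for a real model call.

def llm(prompt):
    return f"<output for: {prompt[:40]}...>"

def plansearch(problem, n_obs=4):
    # 1. First-order observations about the problem.
    observations = [llm(f"Observation {i} about: {problem}") for i in range(n_obs)]
    # 2. Subsets of observations become candidate plans in natural language.
    plans = [llm(f"Plan from: {a} + {b}") for a, b in combinations(observations, 2)]
    # 3. Implement each plan; output diversity comes from the distinct plans.
    return [llm(f"Code implementing: {plan}") for plan in plans]

candidates = plansearch("count inversions in an array")
```

With 4 observations, pairing them yields 6 distinct plans, and hence 6 deliberately different code candidates to filter or test.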
#LLMs #researcher
“Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers” - Recent advancements in LLMs show potential for accelerating scientific discovery, but until now, no evaluations have proven they can generate expert-level research ideas. In a first-of-its-kind study comparing LLMs to NLP experts, LLM-generated ideas were found to be more novel but slightly less feasible, highlighting key challenges in research ideation.
#optimization
“The AdEMAMix Optimizer: Better, Faster, Older” - Momentum-based optimizers, commonly using a single Exponential Moving Average (EMA) for gradients, can be sub-optimal as they struggle to balance weighting recent and older gradients. AdEMAMix, a modified Adam optimizer with a mixture of two EMAs, improves gradient utilization, leading to faster convergence and lower minima in tasks like language modeling and image classification.
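The mixture-of-EMAs idea can be sketched as a single-scalar update rule: Adam's fast EMA is kept, and a second, much slower EMA of the gradients is added to the numerator. A simplified sketch, with illustrative hyperparameter values:

```python
import math

# Simplified single-parameter sketch of an AdEMAMix-style step: the fast
# EMA (beta1) reacts to recent gradients, while a slow EMA (beta3) retains
# much older ones; their mixture drives the update. Values are illustrative.

def ademamix_step(theta, grad, state, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8):
    state["t"] += 1
    t = state["t"]
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad    # fast EMA
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad    # slow EMA
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m1_hat = state["m1"] / (1 - beta1 ** t)                   # bias correction
    v_hat = state["v"] / (1 - beta2 ** t)
    return theta - lr * (m1_hat + alpha * state["m2"]) / (math.sqrt(v_hat) + eps)

state = {"m1": 0.0, "m2": 0.0, "v": 0.0, "t": 0}
theta = 1.0
for _ in range(10):                 # minimize f(x) = x^2, gradient 2x
    theta = ademamix_step(theta, 2 * theta, state)
```

Setting `alpha = 0` recovers plain Adam, which makes the role of the slow EMA easy to isolate in experiments.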
#3DReconstruction
“Sources of Uncertainty in 3D Scene Reconstruction” - 3D scene reconstruction using Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (GS) lacks mechanisms to handle uncertainties from noise, occlusions, and imprecise inputs. This paper introduces a taxonomy of uncertainty sources and extends NeRF- and GS-based methods with uncertainty estimation techniques, highlighting the importance of incorporating uncertainty awareness in 3D reconstruction.
#RLHF #CodeGeneration
“Policy Filtration in RLHF to Fine-Tune LLM for Code Generation” - Reinforcement learning from human feedback (RLHF) enables large language models (LLMs) to generate helpful responses, but accuracy issues in reward models, especially in complex code generation tasks, pose challenges. This paper introduces Policy Filtration for Proximal Policy Optimization (PF-PPO), a method that filters unreliable reward samples, improving policy learning and achieving state-of-the-art results in code generation benchmarks.
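The filtering step rests on the observation that reward models tend to be most reliable at the extremes: clearly good and clearly bad samples. A sketch of that filtration (the reward values and keep fractions are illustrative, not the paper's exact scheme):

```python
# Sketch of the policy-filtration idea in PF-PPO: drop middling-reward
# samples, where the reward model is least reliable, before the PPO update.
# Keep fractions and rewards below are illustrative.

def filter_samples(samples, keep_top=0.3, keep_bottom=0.1):
    ranked = sorted(samples, key=lambda s: s["reward"], reverse=True)
    n = len(ranked)
    top = ranked[: int(n * keep_top)]            # confidently good
    bottom = ranked[n - int(n * keep_bottom):]   # confidently bad
    return top + bottom                          # train PPO only on these

batch = [{"code": f"cand_{i}", "reward": r}
         for i, r in enumerate([0.9, 0.1, 0.5, 0.8, 0.2, 0.6, 0.4, 0.7, 0.3, 0.0])]
kept = filter_samples(batch)   # keeps the 3 highest- and 1 lowest-reward samples
```

The PPO update itself is unchanged; only the batch it sees is pruned.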
#SaliencyPrediction #DataAugmentation
“Data Augmentation via Latent Diffusion for Saliency Prediction” - Saliency prediction models struggle due to limited labeled data, and traditional data augmentation techniques often disrupt scene composition. This paper proposes a novel data augmentation method that preserves real-world scene complexity by editing natural images using photometric and semantic features, along with a saliency-guided cross-attention mechanism. The approach improves performance across saliency models and aligns well with human visual attention patterns.
#AIAgents
“Agent Workflow Memory” - Language model-based agents often struggle with long-horizon tasks, unlike humans who learn reusable workflows from experience. To address this, the paper introduces Agent Workflow Memory (AWM), a method that induces reusable workflows and selectively provides them to guide future actions. AWM significantly improves performance on web navigation benchmarks like Mind2Web and WebArena, boosting success rates and reducing task completion steps.
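The induce-and-reuse loop can be sketched with a tiny memory class (the induction step below is a trivial stand-in; AWM actually abstracts common subroutines out of trajectories):

```python
# Sketch of the Agent Workflow Memory idea: workflows induced from past
# successful action sequences are stored and offered back to the agent on
# similar future tasks. Task names and actions are hypothetical.

class WorkflowMemory:
    def __init__(self):
        self.workflows = {}                 # task type -> action sequence

    def induce(self, task_type, trajectory):
        # Real AWM extracts reusable sub-workflows; here we just store steps.
        self.workflows[task_type] = trajectory

    def retrieve(self, task_type):
        return self.workflows.get(task_type, [])

memory = WorkflowMemory()
memory.induce("book_flight", ["open_site", "search", "select", "pay"])
# On a later, similar task the stored workflow guides the agent's actions:
guide = memory.retrieve("book_flight")
```

Because retrieval is selective, the agent's context only grows with workflows relevant to the task at hand, which is what keeps long-horizon runs tractable.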
#AIAgents
“WindowsAgentArena: Evaluating Multi-Modal OS Agents at Scale” - LLMs show promise as computer agents in multi-modal tasks, but evaluating them in realistic environments is challenging due to limitations in benchmarks and slow evaluation times. To address this, the paper introduces WindowsAgentArena, a scalable environment where agents operate freely within the Windows OS to solve diverse tasks. The new multi-modal agent Navi demonstrates a 19.5% success rate in the Windows domain, compared to 74.5% for humans, highlighting opportunities for future research.
#Speech #LLMs
“LLaMA-Omni: Seamless Speech Interaction with Large Language Models” - Models like GPT-4o have improved real-time speech interaction with LLMs, but open-source speech models remain underexplored. To address this, LLaMA-Omni is introduced, a novel architecture integrating a speech encoder, adaptor, LLM, and streaming decoder, which generates text and speech responses directly from speech instructions with low latency. Experimental results show that LLaMA-Omni offers superior responses and faster processing times (226ms) compared to previous models, with efficient training on limited hardware.
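The four-stage pipeline described above - speech encoder, adaptor, LLM, streaming decoder - can be sketched with generators, which capture why speech units can be emitted before the full response is finished. All components below are toy stand-ins:

```python
# Toy sketch of the LLaMA-Omni pipeline: audio is encoded, adapted into the
# LLM's input space, and the LLM's outputs drive a streaming speech decoder.
# Every stage here is a numeric stand-in for the real module.

def speech_encoder(audio):
    return [x * 0.5 for x in audio]         # pretend acoustic features

def adaptor(features):
    return [f + 1.0 for f in features]      # map into LLM input space

def llm(embeddings):
    for e in embeddings:                    # pretend hidden states, streamed
        yield e * 2

def streaming_decoder(hidden_states):
    for h in hidden_states:                 # emit speech units as they arrive
        yield f"unit({h})"

audio = [1.0, 2.0, 3.0]
units = list(streaming_decoder(llm(adaptor(speech_encoder(audio)))))
```

Because the last two stages are generators, each speech unit is produced as soon as its hidden state is available rather than after the whole sequence, which is the source of the low response latency.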
#LLM #OpenAI #o1
OpenAI introduced o1: a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers - it can produce a long internal chain of thought before responding to the user - which translates into markedly stronger performance on reasoning-heavy tasks in science, mathematics, and programming.
#MultimodalLLM #chemistry
Chai-1: a new multi-modal foundation model for molecular structure prediction that performs at the state-of-the-art across a variety of tasks relevant to drug discovery. Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, covalent modifications, and more. The authors claim that their solution can outperform AlphaFold3 on the PoseBusters benchmark.
#ComputerVision
Pixtral-12B: a 12-billion-parameter multimodal model from Mistral that pairs a vision encoder with a language backbone for image understanding tasks such as captioning, OCR, and visual question answering, making it a valuable tool for scientists and programmers working with vision-language data.
#LLM
Arcee-Llama-3.1-SuperNova: 70B and 8B language models intended as replacements for larger proprietary models, specifically in the context of instruction-following and human preference alignment.
#Text2Speech
Fish Speech V1.4: a text-to-speech model trained on 700k hours of audio data in multiple languages.
#ImageSegmentation #VisionTransformer
RobustSAM: a Vision Transformer (ViT) model designed for image segmentation tasks, providing enhanced robustness and accuracy. The model leverages the architecture of ViT-Huge to achieve state-of-the-art performance in segmenting complex images.
#ImageSegmentation
Finegrain Box Segmenter is a model designed for accurate object detection and segmentation. It leverages advanced machine-learning techniques to provide high-precision bounding boxes for various applications in computer vision.
#LLM #Google
DataGemma - open models from Google designed to help address the challenges of hallucination by grounding LLMs in the vast, real-world statistical data of Google's Data Commons.
#LLM
ESP32-LLM: Integrating Large Language Models with ESP32 - The repository provides a framework for integrating LLMs with the ESP32 microcontroller. It includes detailed instructions and code examples to facilitate the deployment of LLMs on resource-constrained devices.
#DataExtraction
DocAI is an open-source Python library designed for extracting structured data from unstructured documents. It leverages machine learning models to automate and enhance the efficiency of handling various document types.
#KnowledgeGraph
iText2KG is an open-source tool designed to extract knowledge graphs from unstructured text. It leverages natural language processing techniques to identify entities and relationships, facilitating the transformation of textual data into structured knowledge graphs.
#Text2X
Awesome Text2X Resources - this GitHub repository offers an extensive list of resources (papers, code, and datasets) for various Text-to-X methods, where X can be almost anything.
#Research #LangChain
ai-data-analysis-MulitAgent - AI-Driven Research Assistant: An advanced multi-agent system for automating complex research processes. Leveraging LangChain, OpenAI GPT, and LangGraph, this tool streamlines hypothesis generation, data analysis, visualization, and report writing. Perfect for researchers and data scientists seeking to enhance their workflow and productivity.
#AIPlatform
SambaNova Launches an AI Platform - it runs Llama 3.1 405B at 132 tokens per second at full precision.
#PromptEngineering
ell - a lightweight prompt engineering library treating prompts as functions.
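The prompts-as-functions idea can be sketched with a plain decorator: a Python function's return value is the prompt, and the decorator routes it through a model. This illustrates the concept only, not ell's actual API, and `fake_llm` is a stand-in for a real model call:

```python
import functools

# Concept sketch of "prompts as functions": the decorated function body
# builds the prompt, and the decorator sends it to a model. This is an
# illustration of the idea, not ell's real API.

def fake_llm(prompt, model):
    return f"[{model}] reply to: {prompt}"

def prompt_fn(model):
    def wrap(fn):
        @functools.wraps(fn)
        def call(*args, **kwargs):
            prompt = fn(*args, **kwargs)    # the function's output *is* the prompt
            return fake_llm(prompt, model)
        return call
    return wrap

@prompt_fn(model="demo-model")
def summarize(text):
    return f"Summarize in one sentence: {text}"

print(summarize("LLM news of the week"))
# -> [demo-model] reply to: Summarize in one sentence: LLM news of the week
```

Treating prompts as ordinary functions means they can be composed, versioned, and unit-tested like any other code.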
#RL #optimization
Simulator-Based Reinforcement Learning for Data Center Cooling Optimization - Meta engineers have developed a simulator-based reinforcement learning system to optimize data center cooling, achieving significant energy savings. The system uses a digital twin of the data center to train the RL model, which then provides real-time cooling recommendations.
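The overall setup - train a policy against a digital twin offline, then let it issue real setpoints - can be sketched with a crude thermal model. Everything below (the thermal model, the cost function, and random search standing in for RL) is illustrative:

```python
import random

# Toy sketch of simulator-based policy training: a crude "digital twin" of
# a cold aisle scores candidate cooling policies offline. The thermal model,
# costs, and the random search standing in for RL are all illustrative.

random.seed(0)

def twin_step(temp, airflow):
    heat_in = 5.0                    # servers add heat each step
    cooling = 0.8 * airflow          # airflow removes it
    return temp + heat_in - cooling

def energy_cost(airflow):
    return airflow ** 1.5            # fan power grows superlinearly

def evaluate(policy_airflow, target=25.0, steps=50):
    temp, cost = 30.0, 0.0
    for _ in range(steps):
        temp = twin_step(temp, policy_airflow)
        cost += energy_cost(policy_airflow) + abs(temp - target)
    return cost

# "Training": random search over constant-airflow policies stands in for RL.
best = min((random.uniform(4.0, 10.0) for _ in range(200)), key=evaluate)
```

The point of the twin is that thousands of candidate policies can be scored this way without ever risking a thermal excursion in the real facility.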
#Robotics #Google
Google DeepMind teaches a robot to autonomously tie its shoes and fix fellow robots.
#Waymo
Waymo Struggles to Achieve Profitability Despite Technological Advances - Despite significant advancements in autonomous vehicle technology, Waymo has yet to achieve profitability, facing high operational costs and regulatory challenges. The company continues to invest heavily in research and development to overcome these hurdles and establish a sustainable business model.
#FashionAI #GenerativeModels
Prompt2Fashion: An automatically generated fashion dataset - a fashion image dataset created with generative models and tailored to various occasions, styles, and body types, offering high-quality, personalized outfits as rated by both expert and non-expert evaluators.