Warsaw.AI News 15-21.04.2024
Hello AI Enthusiasts!
Please check out the AI news we found for you in the week of 15-21.04.2024:
“Is Attention All You Need?” - Researchers are building alternative architectures to overcome the drawbacks of Transformers in long-context learning, generation, and inference speed/cost. These models demonstrate competitive performance at smaller scales, but their scalability remains uncertain.
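Many of the contenders in this space (state-space models, linear-attention variants, and the like) replace quadratic-cost attention with a fixed-size recurrent state, which is why their per-token inference cost stays flat as the context grows. A toy linear state-space recurrence illustrating that property; all dimensions below are made up for illustration:

```python
import numpy as np

# Toy linear state-space recurrence: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# Unlike attention, the whole history is compressed into a fixed-size state h,
# so per-token inference cost does not grow with context length.
d_state, d_model = 16, 8                          # illustrative sizes
rng = np.random.default_rng(0)
A = rng.normal(size=(d_state, d_state)) * 0.1     # state transition
B = rng.normal(size=(d_state, d_model))           # input projection
C = rng.normal(size=(d_model, d_state))           # output projection

def step(h, x):
    """One token of inference: O(d_state^2) work, independent of position."""
    h = A @ h + B @ x
    return h, C @ h

h = np.zeros(d_state)
for x in rng.normal(size=(1000, d_model)):        # a 1000-token "context"
    h, y = step(h, x)                             # memory use stays constant
```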
“The Space Of Possible Minds” - The rise of advanced AI is reshaping our understanding of human identity and prompting us to ask how genuine comprehension and autonomy show up across diverse forms of intelligence. To navigate this evolving terrain, we need systematic frameworks for extending moral consideration to other kinds of minds, a clear-eyed view of the parallels and distinctions among different manifestations of intelligence, and ways to foster mutually advantageous interactions among vastly disparate entities.
“The Shifting Dynamics & Meta-Moats of AI” - Michael Dempsey examines the unique competitive advantages (moats) in AI, emphasizing the increasing importance of data and scaling in creating sustainable market edges.
“The ethics of advanced AI assistants” - The paper thoroughly examines the ethical and societal implications of advanced AI assistants, emphasizing their potential impact on individual and collective lives. It highlights the need for responsiveness to users' needs, robust safeguards against inappropriate influence, and comprehensive sociotechnical evaluations to support responsible decision-making and deployment in this domain.
“Call for High School Projects - Machine Learning for Social Impact” - This year, the NeurIPS conference invites high school students to submit research papers on machine learning for social impact. A subset of finalists will be selected to present their projects virtually and will have their work spotlighted on the NeurIPS homepage.
“Compression Represents Intelligence Linearly” - The study investigates the relationship between compression and intelligence in LLMs, treating LLMs as data compressors and using downstream benchmark scores as a measure of intelligence. It finds a nearly linear correlation between LLMs' ability to compress external text corpora and their intelligence as reflected by average benchmark scores, suggesting that superior compression efficiency indicates greater intelligence.
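Concretely, "compression" here is just the model's average negative log-likelihood on raw text, expressed as bits per character (BPC). A sketch of both sides of the correlation, with made-up numbers standing in for real models:

```python
import math
import numpy as np

def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    """Compression cost of a corpus under an LM (lower = better compressor).
    total_nll_nats is the summed negative log-likelihood, in nats, that the
    model assigns to the corpus."""
    return total_nll_nats / (num_chars * math.log(2))

# Hypothetical (BPC, average benchmark score) pairs for four imaginary models:
# lower BPC (better compression) lining up with higher scores.
bpc = np.array([0.95, 0.80, 0.72, 0.65])
scores = np.array([38.0, 52.0, 60.0, 67.0])

slope, intercept = np.polyfit(bpc, scores, deg=1)   # fit score ~ slope*BPC + intercept
r = np.corrcoef(bpc, scores)[0, 1]
print(f"score = {slope:.1f} * BPC + {intercept:.1f} (Pearson r = {r:.3f})")
```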
“VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time” - Microsoft introduces VASA - a framework designed to create lifelike talking faces for virtual characters, utilizing a single static image and a speech audio clip. The premier model, VASA-1, generates synchronized lip movements, captures diverse facial nuances, and incorporates natural head motions, resulting in authentic and lively expressions, outperforming previous methods across various dimensions and supporting real-time generation of high-quality videos.
“Long-form music generation with latent diffusion” - Recent audio-based generative models for music have focused on producing full-length tracks with coherent musical structure. By training on long temporal contexts and using a diffusion transformer on a highly downsampled continuous latent representation, the model achieves state-of-the-art results in generating long-form music, demonstrating coherent structure and high audio quality according to various metrics and subjective evaluations.
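For intuition, the core training step of such a model looks roughly like this: noise the downsampled latent sequence, then train a transformer to predict the noise. A toy PyTorch sketch under those assumptions; the sizes and the linear noising schedule are illustrative, not Stability AI's actual architecture:

```python
import torch
import torch.nn as nn

# Toy denoising step in the spirit of a diffusion transformer over heavily
# downsampled continuous latents.
seq_len, d_latent = 512, 64                  # short latent sequence for minutes of audio
denoiser = nn.TransformerEncoder(            # stand-in for the diffusion transformer
    nn.TransformerEncoderLayer(d_model=d_latent, nhead=4, batch_first=True),
    num_layers=2,
)

z0 = torch.randn(1, seq_len, d_latent)       # clean latents from an audio autoencoder
t = torch.rand(1, 1, 1)                      # random noise level in [0, 1]
noise = torch.randn_like(z0)
zt = (1 - t) * z0 + t * noise                # interpolate clean latents toward noise
pred = denoiser(zt)                          # model predicts the injected noise
loss = ((pred - noise) ** 2).mean()          # denoising MSE objective
loss.backward()
```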
“FedPFT: Federated Proxy Fine-Tuning of Foundation Models” - In the realm of Federated Learning, adapting Foundation Models (FMs) for downstream tasks has shown promise in preserving data privacy and FM integrity. Addressing the suboptimal performance stemming from existing methods, Federated Proxy Fine-Tuning (FedPFT) introduces two key modules: a sub-FM construction module for comprehensive fine-tuning and a sub-FM alignment module to reduce gradient errors, resulting in superior performance across various datasets in both text and vision tasks.
“In-Context Learning State Vector with Inner and Momentum Optimization” - LLMs have demonstrated impressive In-Context Learning (ICL) capabilities from limited examples, and recent studies indicate that the function learned in context can be represented by compressed vectors derived from the transformer. Building on this, the paper offers a comprehensive analysis of these vectors, introduces the concept of state vectors, and proposes inner and momentum optimization methods for their refinement, leading to state-of-the-art performance across various tasks with Llama-2 and GPT-J models.
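As a loose illustration of the idea (not the paper's actual algorithm), one can think of a state vector as an activation-space summary of the demonstrations that gets refined with a momentum-style update; everything below, including the injection step, is hypothetical:

```python
import torch

# Treat each demonstration group's transformer activations as a vector, then
# aggregate them into a single "state vector" with a momentum-style update.
hidden = 512
group_states = [torch.randn(hidden) for _ in range(8)]   # stand-in activations

state = torch.zeros(hidden)
velocity = torch.zeros(hidden)
beta, lr = 0.9, 0.1
for g in group_states:
    delta = g - state                 # inner correction toward new evidence
    velocity = beta * velocity + delta
    state = state + lr * velocity     # momentum-smoothed aggregation

# At test time the idea is to inject `state` into the forward pass instead of
# re-encoding the demonstrations on every query.
```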
“SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap” - This paper addresses the challenge of reconstructing the game state from football videos captured by a single camera, crucial for analyzing player movements and team tactics. Introducing the SoccerNet-GSR dataset and GS-HOTA metric, it provides a foundation for research in game state reconstruction, offering an end-to-end baseline and paving the way for future advancements in this area.
“Token-level Direct Preference Optimization” - This paper presents Token-level Direct Preference Optimization (TDPO) as a novel approach to fine-tune Large Language Models (LLMs) by optimizing policy at the token level, improving alignment with human preferences and generation diversity. Unlike previous methods, TDPO incorporates forward KL divergence constraints for each token, enhancing alignment while maintaining simplicity, resulting in superior performance across various text tasks compared to existing approaches.
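A hedged sketch of what a token-level DPO-style objective can look like: the usual sequence-level preference term plus a per-token forward KL penalty against the reference policy. This is a simplification for intuition, not TDPO's exact formulation:

```python
import torch
import torch.nn.functional as F

def tdpo_like_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                   pol_logits, ref_logits, beta=0.1, alpha=0.1):
    # logp_*: per-token log-probs of the chosen (w) / rejected (l) response
    ratio_w = (logp_w - ref_logp_w).sum()        # sequence log-ratio, chosen
    ratio_l = (logp_l - ref_logp_l).sum()        # sequence log-ratio, rejected
    pref = -F.logsigmoid(beta * (ratio_w - ratio_l))   # standard DPO term

    # Forward KL(ref || policy) at every token of the chosen response,
    # keeping the policy close to the reference token by token.
    ref_p = F.softmax(ref_logits, dim=-1)                       # [T, vocab]
    kl = (ref_p * (F.log_softmax(ref_logits, dim=-1)
                   - F.log_softmax(pol_logits, dim=-1))).sum(-1).mean()
    return pref + alpha * kl

T, V = 16, 100                                   # toy sequence and vocab sizes
per_tok = [torch.randn(T) for _ in range(4)]     # stand-in log-prob tensors
print(tdpo_like_loss(*per_tok, torch.randn(T, V), torch.randn(T, V)))
```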
https://qwenlm.github.io/blog/codeqwen1.5 - A new member of the Qwen1.5 open-source family: CodeQwen1.5-7B, a specialized code LLM built on the Qwen1.5 language model. It was pretrained on around 3 trillion tokens of code-related data, supports 92 programming languages, and has a 64k-token context window. Benchmarks show it is a strong competitor to GPT-4 Turbo.
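If you want to try it locally, the usual transformers recipe should work; the checkpoint id below is our assumption of the published name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the chat checkpoint is published as "Qwen/CodeQwen1.5-7B-Chat".
model_id = "Qwen/CodeQwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```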
Meta released Llama 3, currently in 8B and 70B versions; a 400B version is expected. The 8B version outperforms the baselines Mistral 7B and Gemma 7B, and the bigger model is a strong competitor to Gemini 1.5 Pro and Claude 3 Sonnet in benchmarks. In addition to Llama 3, Meta announced the Meta AI assistant (currently not yet available in Poland).
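A minimal way to try the 8B instruct model via the transformers pipeline; the gated checkpoint id below is our assumption, and access requires accepting Meta's license on the Hugging Face Hub:

```python
from transformers import pipeline

pipe = pipeline("text-generation",
                model="meta-llama/Meta-Llama-3-8B-Instruct",
                device_map="auto")
messages = [{"role": "user",
             "content": "Summarize this week's AI news in one sentence."}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])    # the assistant's reply
```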
OLMo 1.7–7B: an updated version of the 7-billion-parameter Open Language Model. It scores 52 on MMLU, sitting above Llama 2–7B and approaching Llama 2–13B, and outperforms Llama 2–13B on GSM8K.
Zamba: A Compact 7B SSM Hybrid Model. It approaches Mistral and Gemma levels of performance despite being trained on many times fewer tokens, all from open datasets, and notably outperforms LLaMA-2 7B and OLMo-7B on a wide array of benchmarks while requiring less than half of the training data.
https://mistral.ai/news/mixtral-8x22b - A new sparse Mixture-of-Experts (SMoE) model from Mistral. It uses only 39B active parameters out of 141B and is fluent in English, French, Italian, German, and Spanish. It supports a 64k-token context window, achieves strong results against baselines including Llama and Command R models, and maintains low cost.
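The "39B active out of 141B" arithmetic comes from top-k expert routing: each token is processed by only a couple of experts. A toy sparse MoE layer showing the mechanism (sizes are illustrative, nothing like 8x22B's):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                             # x: [tokens, d_model]
        weights, idx = self.router(x).topk(self.k, dim=-1)   # top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):                   # only k of n_experts run per token
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

y = TopKMoE()(torch.randn(4, 64))                     # 4 tokens through the layer
print(y.shape)                                        # torch.Size([4, 64])
```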
Effort Engine - A Novel Method for Language Model Inference - Effort Engine introduces an algorithm that allows the amount of computation to be adjusted dynamically during language model inference, enhancing speed while maintaining quality.
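The general idea, as we understand it, is to skip the least significant multiplications at inference time and let the caller choose how many. A toy numpy illustration of that trade-off, not Effort Engine's actual algorithm:

```python
import numpy as np

# Approximate W @ x using only the largest-magnitude input entries, letting the
# caller dial compute up or down at runtime.
def approx_matvec(W, x, effort=0.5):
    k = max(1, int(effort * x.size))              # fraction of multiplications kept
    keep = np.argsort(np.abs(x))[-k:]             # indices of dominant entries
    return W[:, keep] @ x[keep]

rng = np.random.default_rng(0)
W, x = rng.normal(size=(256, 1024)), rng.normal(size=1024)
exact = W @ x
for e in (1.0, 0.5, 0.25):
    err = np.linalg.norm(approx_matvec(W, x, e) - exact) / np.linalg.norm(exact)
    print(f"effort={e:.2f}  relative error={err:.3f}")
```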
LINGO-2: A New Era of Language-Driven Autonomous Vehicles - Wayve introduces LINGO-2, a model that integrates vision, language, and action to enhance driving autonomy, using language to make the model's decision-making easier to understand.
Google Maps Introduces AI Enhancements to EV Charging Navigation - Google Maps has updated its navigation features by incorporating artificial intelligence to refine directions to electric vehicle charging stations, potentially making it easier for drivers to locate available chargers and plan routes.
SAMMO: A General-Purpose Framework for Prompt Optimization - Microsoft Research has developed SAMMO, a framework for optimizing prompts for language models, facilitating adaptation and enhancing AI efficiency across various applications.
Stable Diffusion 3 API Now Available - Stability AI has released Stable Diffusion 3 and its Turbo version via API, featuring an improved architecture and better prompt adherence, while keeping its models open and available to developers.
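A hedged example of calling the new API from Python; the endpoint and field names follow Stability's stable-image API as we understand it and may differ:

```python
import requests

resp = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={"authorization": "Bearer YOUR_API_KEY", "accept": "image/*"},
    files={"none": ""},                                  # force multipart/form-data
    data={"prompt": "a watercolor map of Warsaw",
          "model": "sd3", "output_format": "png"},
)
resp.raise_for_status()
with open("warsaw.png", "wb") as f:                      # raw image bytes on success
    f.write(resp.content)
```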
Ensuring Security and Privacy in Slack AI - Slack has developed its AI features with a focus on secure and private data processing, utilizing language models without training on customer data.
https://github.com/pytorch/torchtune - A PyTorch-native library for easily authoring, fine-tuning, and experimenting with LLMs. Currently in alpha. It supports popular techniques such as LoRA and QLoRA, is driven by configs, and covers many models, e.g. Llama 3.
https://github.com/google-gemini/cookbook - A collection of guides and examples for the Gemini API.
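The cookbook starts from calls like this one, using the google-generativeai package; the model name below is an assumption and may change over time:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content(
    "Explain retrieval-augmented generation in two sentences.")
print(response.text)
```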
https://github.com/paul-gauthier/aider - Aider is AI pair programming in your terminal.
https://github.com/infiniflow/ragflow - RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
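As a refresher on what such an engine automates, here is the bare retrieval-augmented generation loop in miniature; this is a generic illustration, not RAGFlow's API:

```python
import numpy as np

# Embed chunks, retrieve nearest neighbours for a query, and stuff them
# into the LLM prompt.
def embed(text: str) -> np.ndarray:               # stand-in embedding function
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

chunks = ["Chunk about invoices.", "Chunk about contracts.", "Chunk about payroll."]
index = np.stack([embed(c) for c in chunks])

query = "What do the contracts say?"
scores = index @ embed(query)                     # cosine similarity (unit vectors)
top = [chunks[i] for i in np.argsort(scores)[-2:]]

prompt = ("Answer using only this context:\n"
          + "\n".join(top) + f"\n\nQ: {query}")
print(prompt)                                     # would be sent to an LLM next
```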
https://github.com/wangyuchi369/ladic - LaDiC: A Diffusion-based Image Captioning Model.
https://github.com/jafioti/luminal - Luminal - a deep learning library that uses composable compilers to achieve high performance.
https://github.com/madrylab/modelcomponents - Code to estimate and analyze component attributions.
https://github.com/vikhyat/moondream - Moondream is a computer-vision model that can answer real-world questions about images. It's tiny by today's standards, with only 1.6B parameters, which enables it to run on a variety of devices, including mobile phones and edge devices.
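Usage in the style of the repo's README; the checkpoint id and helper methods below follow our recollection and may drift as the project evolves:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("photo.jpg")                   # any local image
enc = model.encode_image(image)
print(model.answer_question(enc, "What is happening in this image?", tokenizer))
```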
https://github.com/Portkey-AI/gateway - AI Gateway which streamlines requests to 100+ open & closed source models with a unified API. It is also production-ready with support for caching, fallbacks, retries, timeouts, load balancing, and can be edge-deployed for minimum latency.
MLOps vs. Engineering: Misaligned Incentives and Failure to Launch?
The article discusses the misaligned incentives between MLOps and engineering teams and emphasizes the need for closer collaboration between data science and engineering.
Enjoy!
Warsaw.AI News Team
P.S.: This newsletter is free, but if you enjoy it, you are welcome to donate to help us make it even better (readers paying taxes in Poland can also contribute 1.5% of their tax to support us).