Warsaw.AI News 18-24.03.2024
Hello AI Enthusiasts!
Please find the AI news that we found for you in the week of 18-24.03.2024:
General:
“NVIDIA Blackwell Platform Arrives to Power a New Era of Computing” - NVIDIA introduces the Blackwell platform, which aims to usher in a new era of computing by offering a unified architecture for AI, HPC, and data analytics workloads. This platform promises to deliver breakthrough performance and efficiency to address the increasingly complex challenges in these fields.
Prompt Library - a comprehensive collection of prompts designed to guide AI models in various tasks.
“What I learned from looking at 900 most popular open source AI tools” - an interesting analysis by Chip Huyen.
“Plentiful, high-paying jobs in the age of AI” - The article explores the notion that despite concerns about automation and job displacement, there are still abundant high-paying job opportunities available in the U.S. labor market. It discusses the importance of investing in education and training programs to equip workers with the skills needed to access these lucrative positions and adapt to evolving job market demands.
Events:
The application for the Climate Change AI In-Person Summer School is now open (the deadline is April 14).
We’ve released the video recording of the Quantum Machine Learning Conference 2024 organized by the Quantum AI Foundation.
Education:
ARISA, an EU-funded AI skills project, is reaching out. The Warsaw School of Computer Science is a partner in this project to upskill Europe’s future AI workforce. The ARISA consortium has already published a needs analysis and is drawing up curricula with a view to launching pilot courses later this year – and industry input is welcome.
Code from Sebastian Raschka’s Build a Large Language Model (From Scratch) covering advanced additions to the training function used in typical pre-training and fine-tuning. Covers learning rate scheduling with warmup, cosine decay, and gradient clipping, among others.
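As a rough illustration of those three techniques (this is a generic sketch, not the book’s actual code), a warmup-then-cosine-decay schedule and global-norm gradient clipping can each be written in a few lines; the function names and default hyperparameters below are our own:

```python
import math

def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

def clip_grad_norm(grads, max_norm=1.0):
    """Rescale gradients so their global L2 norm does not exceed max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / (total + 1e-6)
        return [g * scale for g in grads]
    return grads
```

In a training loop, `lr_at_step` would be queried once per optimizer step and `clip_grad_norm` applied to the gradients just before the parameter update; deep learning frameworks ship equivalents (e.g. PyTorch’s `torch.nn.utils.clip_grad_norm_`).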
Science:
“A Conceptual Framework For White Box Neural Networks” - The paper proposes semantic features as a framework for fully explainable neural network layers, demonstrating its effectiveness with a proof-of-concept model for a subproblem of MNIST. This model, comprising four layers with 4.8K learnable parameters, achieves human-level adversarial test accuracy without adversarial training, and can be trained quickly on a single CPU, hinting at the potential for democratized and highly generalizable white box neural networks.
“Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation” - Latent Adversarial Diffusion Distillation (LADD) is introduced as a novel distillation method aimed at improving inference speed and performance in image and video synthesis. Unlike previous approaches like adversarial diffusion distillation (ADD), LADD leverages generative features from pretrained latent diffusion models, simplifying training while achieving high-resolution multi-aspect ratio image synthesis, as demonstrated by its application to Stable Diffusion 3 (8B) to produce SD3-Turbo, a fast model matching state-of-the-art text-to-image generators with just four unguided sampling steps.
“Distilling Datasets Into Less Than One Image” - The paper introduces Poster Dataset Distillation (PoDD), a novel approach that compresses entire datasets into single posters, pushing the boundaries of dataset distillation to achieve high accuracy with less than one image-per-class. By focusing on distilled pixels-per-dataset rather than images-per-class, PoDD achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and CUB200 datasets, demonstrating superior efficiency in dataset compression and model training.
“MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control” - MineDreamer, an embodied agent developed in the Minecraft simulator, addresses the challenge of following diverse instructions by integrating Multimodal Large Language Models and diffusion models. Utilizing a Chain-of-Imagination mechanism, MineDreamer envisions and executes step-by-step instructions, demonstrating superior performance in following both single and multi-step instructions compared to existing generalist agent baselines.
“DreamDA: Generative Data Augmentation with Diffusion Models” - This paper introduces DreamDA, a classification-oriented framework that leverages diffusion models for data synthesis and label generation, addressing the limitations of conventional Data Augmentation (DA) techniques. DreamDA generates diverse samples adhering to the original data distribution by perturbing the reverse diffusion process of training images, and employs a self-training paradigm to generate accurate labels for the synthesized data, resulting in consistent improvements across various tasks and datasets.
“Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models” - The paper introduces the Chain-of-Spot (CoS) method, enabling Interactive Reasoning in Large Vision-Language Models (LVLMs), which enhances feature extraction by focusing on key regions of interest (ROI) within the image, corresponding to the posed questions or instructions. Integrating CoS with instruction-following LLaVA-1.5 models consistently improves image reasoning performance across multimodal datasets and benchmarks, achieving new state-of-the-art results and enhancing LVLMs' ability to understand and reason about visual content.
“PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns” - the paper introduces PuzzleVQA, a dataset of puzzles based on abstract patterns, to evaluate large multimodal models' ability to reason abstractly. Despite advancements, even state-of-the-art models like GPT-4V struggle to generalize well to simple abstract patterns, with weaker visual perception and inductive reasoning identified as main bottlenecks, highlighting the need for improvement in emulating human cognitive processes.
“Adapting language model architectures for time series forecasting” - The article discusses adapting language model architectures for time series forecasting, highlighting the challenges and benefits of this approach. By incorporating techniques like autoregressive masking and temporal embeddings, language models can effectively capture temporal dependencies and generate accurate forecasts for time series data.
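The autoregressive masking mentioned above is the same lower-triangular attention mask used in language models: when forecasting step t, the model may only attend to observations up to t. A minimal sketch (our own illustration, not from the article):

```python
def causal_mask(seq_len):
    """Lower-triangular attention mask: position i may attend only to
    positions j <= i, so the model cannot peek at future time steps."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]
```

In practice this mask is added (as -inf on the zero entries) to the attention logits before the softmax, exactly as in a decoder-only language model.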
“Moirai: A Time Series Foundation Model for Universal Forecasting” - The blog post introduces Moirai, a foundation model for universal time series forecasting. Trained on a large, diverse corpus of time series spanning many domains, Moirai is designed to handle varying frequencies and numbers of variates and to produce probabilistic forecasts, enabling zero-shot forecasting on datasets it has never seen.
“TacticAI: an AI assistant for football tactics” - TacticAI is introduced as an AI assistant for football tactics, focusing on analyzing corner kicks in collaboration with Liverpool FC. It incorporates predictive and generative components to provide coaches with effective suggestions for player setups, validated through qualitative studies showing its indistinguishable suggestions from real tactics and high favorability among coaches.
“MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data” - The study demonstrates advancements in reconstructing visual perception from brain activity with improved practical utility by requiring only 1 hour of fMRI training data per subject. By pretraining the model across multiple subjects and employing a novel functional alignment procedure, the approach achieves high-quality reconstructions, outperforming single-subject approaches and showcasing the potential for accurate reconstructions from limited training data.
“Reverse Training to Nurse the Reversal Curse” - The Reversal Curse phenomenon in large language models (LLMs), where they struggle to generalize from "A has a feature B" to "B is a feature of A," is addressed by a proposed alternative training scheme called reverse training. By utilizing reverse training, which involves training the LLM in both forward and reverse directions while preserving chosen substrings like entities, the study demonstrates superior performance on standard tasks with data-matched reverse-trained models and significantly improved performance on reversal tasks with compute-matched reverse-trained models, effectively mitigating the Reversal Curse issue.
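The core data transformation is easy to picture: reverse the order of the training text while keeping chosen substrings (e.g. entity names) intact, so "Abraham Lincoln was the sixteenth president" also appears as "president sixteenth the was Abraham Lincoln". A word-level sketch of this entity-preserving reversal (our own illustration; the paper operates on tokens and several granularities):

```python
def reverse_with_entities(text, entities):
    """Reverse word order, but keep each listed entity's words contiguous
    and in their original internal order."""
    tokens = text.split()
    units, i = [], 0
    while i < len(tokens):
        for ent in entities:
            ent_toks = ent.split()
            if tokens[i:i + len(ent_toks)] == ent_toks:
                units.append(ent_toks)       # keep the entity as one unit
                i += len(ent_toks)
                break
        else:
            units.append([tokens[i]])        # ordinary single-word unit
            i += 1
    return " ".join(tok for unit in reversed(units) for tok in unit)
```

Training on both directions gives the model a chance to learn "B is a feature of A" facts from text that only states "A has a feature B".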
“A landmark moment”: scientists use AI to design antibodies from scratch - A modified protein-design tool could make it easier to tackle challenging drug targets, but AI antibodies are still a long way from reaching the clinic.
Interesting repositories:
https://github.com/hpcaitech/Open-Sora - Open-Sora, an initiative dedicated to efficiently producing high-quality video and making the model, tools, and content accessible to all. Currently in the early stages.
https://github.com/unslothai/unsloth - Collection of beginner-friendly notebooks to efficiently fine-tune LLMs provided by Unsloth. The notebooks support many open-source models and can easily be run for free on Google Colab.
https://github.com/imartinez/privateGPT - PrivateGPT is a production-ready AI project that allows you to ask questions about your documents using the power of LLMs, even in scenarios without an Internet connection. Built using FastAPI and LlamaIndex. Thanks to wrappers for a RAG pipeline, the project enables chatting with files too.
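The RAG pipeline behind tools like this boils down to: retrieve the document chunks most similar to the question, then stuff them into the LLM prompt as context. A toy dependency-free sketch of that retrieve-then-prompt step (our own illustration using bag-of-words cosine similarity; PrivateGPT itself uses proper embeddings via LlamaIndex):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    qv = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(qv, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    """Assemble the retrieved context and the question into an LLM prompt."""
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a real pipeline the bag-of-words similarity is replaced by dense vector embeddings and a vector store, but the retrieve-then-prompt shape is the same.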
https://github.com/albertan017/LLM4Decompile - Pioneering open-source LLM dedicated to decompilation. Its current version supports decompiling Linux x86_64 binaries, ranging from GCC's O0 to O3 optimization levels, into human-readable C source code. Comes with Decompile-Eval which is the first decompilation benchmark, similar to HumanEval (but in C language).
https://github.com/lavague-ai/LaVague - LaVague can automate interactions with the browser by executing commands given in natural language. The functionality is similar to Selenium, but natural language commands are much easier to write than code (though likely less reliable - a tradeoff to weigh).
https://github.com/mshumer/gpt-prompt-engineer - a system that helps engineer prompts: given a task description and test cases, it generates and evaluates candidate prompts. Various models are supported, e.g. GPT-4, GPT-3.5 Turbo, and Claude 3 Opus.
https://github.com/Skyvern-AI/skyvern - Skyvern automates browser-based workflows using LLMs and computer vision.
Interesting models:
https://github.com/xai-org/grok-1 - This repository contains JAX example code for loading and running the Grok-1 open-weights model.
Applied:
Suno: ChatGPT-Powered AI Music Creation - Suno is a ChatGPT-powered AI music generator that produces original songs from simple text prompts. Through a collaboration with Microsoft, Suno enables song generation via the Microsoft Copilot chatbot, offering both free and premium versions with varying rights to the tracks.
Introducing Stable Video 3D - Stability AI introduces Stable Video 3D, a new model that generates multi-view orbital videos from single images for 3D generation, offering significantly improved quality and novel-view consistency over previous solutions, with both commercial and non-commercial access to the model.
VLOGGER: Bringing Still Photos to Life with AI - Google researchers have developed an AI system called VLOGGER that can generate lifelike videos of people speaking and gesturing from a single photo, opening new possibilities while raising concerns about deepfakes.
RAG 2.0: Advancing Generative AI Systems - Contextual AI introduces RAG 2.0, enhancing generative AI systems through end-to-end optimization, achieving state-of-the-art performance across various industry benchmarks with new Contextual Language Models.
Fitbit Utilizes Google Gemini for New AI Fitness Coach - Fitbit is collaborating with Google to create a new LLM based on Gemini to provide Fitbit app users with personalized data and health recommendations, potentially revolutionizing fitness and health monitoring.
NVIDIA's "moonshot" for Human-Level AI in Robot Form -
NVIDIA announces Project GR00T, aiming to create a general-purpose foundation model for humanoid robots with AI, potentially enabling robots to learn skills and solve a variety of tasks on the fly.“Introducing SceneScript, a novel approach for 3D scene reconstruction” - Scenescript is a novel 3D scene reconstruction method developed by Meta Reality Labs Research. It utilizes deep learning techniques to reconstruct detailed 3D scenes from a single image, demonstrating promising results in generating realistic and accurate scene representations.
“Evolving New Foundation Models: Unleashing the Power of Automating Model Development” - Sakana AI introduces an evolutionary model merging approach aimed at enhancing large neural network models' performance. By iteratively combining and evolving smaller models, their method achieves superior performance on various tasks compared to individual models.
NVIDIA Launches Generative AI Microservices for Developers, enabling the creation and deployment of AI applications across NVIDIA's installed base of CUDA GPUs. These microservices accelerate data processing, LLM customization, inference, and generation, supported by a broad AI ecosystem.
Logarithm: Revolutionizing AI Logging - Meta introduces Logarithm, a logging engine optimized for AI training debugging, indexing over 100 GB of logs per second.
Intelligent monitoring: Towards AI-assisted monitoring for cloud services - Microsoft Research introduces an approach to intelligent cloud service monitoring, leveraging AI to enhance incident detection accuracy, reduce unnecessary alerts, and improve system reliability.
Business:
An interview with Sam Altman about OpenAI, GPT-5, Sora, Board Saga, Elon Musk, Ilya, Power & AGI.
Enjoy reading! :)
Warsaw.AI News Team
P.S.: This newsletter is free, but if you enjoy it, you are welcome to donate to help us make it even better (the readers paying taxes in Poland can also contribute 1.5% of their taxes to support us).