Hello AI Enthusiasts!
Please check the AI news that we found for you in the week of 22-28.04.2024:
“ML Workshop: Physics-Informed Neural Networks” - This workshop is an introduction to physics-informed neural networks (PINNs) and neural operators. It will take place online on May 11-18.
“A Visual Guide to Vision Transformers” - This guide will walk you through the key components of Vision Transformers in a scroll-story format, using visualizations and simple explanations to help you understand how these models work and how data flows through them.
“Getting Started with Mistral” - A beginner-friendly course suitable for anyone who wants to learn about and use Mistral AI’s collection of advanced open-source and commercial LLMs.
“Stanford CS25: Transformers United” - A series of seminars and lectures examining the details of how transformers work.
“Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone” - A technical report on phi-3-mini, a 3.8-billion-parameter language model trained on 3.3 trillion tokens that performs competitively with much larger models such as Mixtral 8x7B and GPT-3.5, achieving 69% on MMLU and 8.38 on MT-bench.
“SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation” - SEED-X combines comprehension of images of arbitrary size and aspect ratio with multi-granularity image generation, yielding a versatile model that handles real-world applications across domains, performs competitively on public benchmarks, and opens avenues for future research.
“MultiBooth: Towards Generating All Your Concepts in an Image from Text” - This paper presents MultiBooth, a novel technique for efficient multi-concept customization in image generation from text, addressing challenges faced by existing methods in handling multi-concept scenarios. MultiBooth divides the generation process into single-concept learning and multi-concept integration phases, employing a multi-modal image encoder and bounding boxes to improve concept fidelity and reduce inference cost, respectively. Evaluated against various baselines, MultiBooth demonstrates superior performance and computational efficiency in both qualitative and quantitative assessments.
“Taming Diffusion Probabilistic Models for Character Control” - This paper introduces a character control framework leveraging Conditional Autoregressive Motion Diffusion Model (CAMDM), enabling real-time generation of diverse character animations based on dynamic user-supplied control signals. Through innovative algorithmic designs such as separate condition tokenization and heuristic future trajectory extension, the framework addresses challenges associated with motion diffusion probabilistic models, achieving high-quality, diverse character animations across multiple styles with computational efficiency. Evaluation on various locomotion skills demonstrates the superiority of the proposed method over existing character controllers.
“CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method” - CutDiffusion simplifies and accelerates the process of diffusion extrapolation for transforming large pre-trained low-resolution diffusion models to higher resolutions. By dividing the diffusion process into initial comprehensive structure denoising and subsequent detail refinement phases, CutDiffusion achieves faster inference speeds, reduced GPU costs, and improved generation performance, making it a versatile and efficient solution for diffusion adaptability.
“Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation” - CFExplainer is a novel counterfactual explainer for GNN-based vulnerability detection that seeks minimal perturbations to input code graphs, addressing what-if questions for vulnerability detection by pinpointing the root causes of detected vulnerabilities. Unlike factual reasoning-based explainers, CFExplainer provides counterfactual explanations, enabling valuable insights for developers to take appropriate actions for fixing vulnerabilities, as demonstrated through extensive experiments on four GNN-based vulnerability detection models.
“Conformal Predictive Systems Under Covariate Shift” - This paper introduces Weighted CPS (WCPS), an extension of Conformal Predictive Systems (CPS) designed to handle scenarios with covariate shifts, departing from the Independent and Identically Distributed (IID) model assumption. WCPS leverages likelihood ratios between training and testing covariate distributions to construct nonparametric predictive distributions, facilitating calibrated inference and decision-making under covariate shifts. Empirical evaluations on synthetic and real-world datasets demonstrate the utility and probabilistic calibration of WCPS in handling scenarios characterized by covariate shifts.
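The likelihood-ratio weighting at the heart of this approach can be illustrated with a minimal weighted-conformal quantile computation (a conceptual sketch, not the paper's full WCPS; the function name and interface are our own):

```python
import numpy as np

def weighted_conformal_quantile(cal_scores, cal_weights, test_weight, alpha=0.1):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores.

    cal_scores:  nonconformity scores |y_i - yhat_i| on the calibration set
    cal_weights: likelihood ratios w(x_i) = dP_test(x) / dP_train(x)
                 evaluated at the calibration points
    test_weight: the likelihood ratio at the test point
    With all weights equal, this reduces to ordinary split conformal.
    """
    scores = np.asarray(cal_scores, dtype=float)
    weights = np.asarray(cal_weights, dtype=float)
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]
    # Normalize the weights, including the test point's own weight.
    p = np.append(weights, test_weight)
    p = p / p.sum()
    # Smallest score whose cumulative weight reaches 1 - alpha.
    cum = np.cumsum(p[:-1])
    idx = np.searchsorted(cum, 1 - alpha)
    if idx >= len(scores):
        return float("inf")  # not enough calibration mass
    return float(scores[idx])
```

Under covariate shift, calibration points that look like the test distribution receive more weight, which is what keeps the resulting predictive distribution calibrated.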
“FR-NAS: Forward-and-Reverse Graph Predictor for Efficient Neural Architecture Search” - This work proposes a novel Graph Neural Network (GNN) predictor for Neural Architecture Search (NAS) tasks, aiming to estimate the performance of deep neural network architectures. By combining conventional and inverse graph views, and introducing a customized training loss, the GNN predictor achieves improved prediction accuracy, as demonstrated through experiments on benchmark datasets such as NAS-Bench-101, NAS-Bench-201, and the DARTS search space. The results show a notable enhancement in prediction accuracy, with the Kendall-tau correlation increasing by 3% to 16% compared to leading GNN predictors.
“A Multimodal Automated Interpretability Agent” - Obtaining a comprehensive understanding of neural models involves various inquiries, including identifying reliance on specific features, recognizing prediction errors, and optimizing accuracy and robustness through data and architecture modifications. Acquiring this understanding currently demands substantial human effort: researchers must formalize questions, hypothesize model decision processes, design evaluation datasets, and validate hypotheses, making the process slow and expensive. MAIA addresses this by equipping a pretrained vision-language model with tools to automatically design and run interpretability experiments on other models.
“Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding” - LayerSkip offers an end-to-end solution to accelerate the inference of large language models (LLMs). By employing layer dropout during training, an early exit loss mechanism, and a self-speculative decoding approach during inference, LayerSkip achieves improved accuracy at early layers without adding auxiliary layers or modules, resulting in significant speed-ups of up to 2.16x for summarization, 1.82x for coding, and 2.0x for semantic parsing tasks, across various LLM sizes and training scenarios.
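The early-exit idea behind LayerSkip can be sketched in a few lines (a toy illustration, not Meta's implementation; the confidence-threshold exit rule and the abstract layer/head callables are our assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_forward(layers, head, x, threshold=0.9):
    """Run layers one at a time; exit as soon as the shared head is confident.

    layers: list of callables h -> h (transformer blocks, abstracted away)
    head:   callable h -> logits; the single output head is reused at every
            layer, so no auxiliary exit heads are added
    Returns (predicted_class, layers_used).
    """
    h = x
    for used, layer in enumerate(layers, start=1):
        h = layer(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:  # confident enough: skip remaining layers
            return int(probs.argmax()), used
    return int(probs.argmax()), len(layers)
```

Training with layer dropout and an early-exit loss is what makes intermediate representations accurate enough for such exits to be safe; self-speculative decoding then uses the early-exit draft and verifies it with the remaining layers.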
“Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings” - This study addresses the evaluation gap in text-to-image (T2I) generative models by introducing a comprehensive skills-based benchmark, gathering human ratings across multiple templates and models, and introducing a new QA-based auto-eval metric. With over 100K annotations, the research provides insights into the differences arising from prompt ambiguity and metric quality, enabling practitioners to pinpoint challenging skills and levels of complexity while evaluating T2I models effectively.
“OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework” - The release of OpenELM, a state-of-the-art open language model, aims to enhance the reproducibility and transparency of large language models. By employing a layer-wise scaling strategy and providing the complete framework for training and evaluation on publicly available datasets, OpenELM achieves improved accuracy and empowers the open research community for future endeavors.
“How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study” - The latest release of Meta's LLaMA series, particularly the LLaMA3 models, demonstrates remarkable performance across various tasks after large-scale pre-training on over 15T tokens of data. However, quantizing LLaMA3 to low bit-widths reveals significant performance degradation, especially in ultra-low bit-width scenarios, indicating a need for further advancements to bridge the performance gap in future developments of Large Language Models. You can also check the project’s repository.
Llama-3-8B-16K - An extended (16K) context version of LLaMA 3 8B, trained for five hours on 8x A6000 GPUs using the Yukang/LongAlpaca-16k-length dataset.
Meditron - An LLM suite especially suited for low-resource medical settings, leveraging Meta Llama.
https://github.com/mistralai/mistral-common - A set of tools to help work with Mistral models.
https://github.com/duangzhu/maexp - A generic platform for RL-based multi-agent exploration.
https://github.com/google-deepmind/penzai - A JAX research toolkit for building, editing, and visualizing neural networks.
https://github.com/jxnl/instructor - A Python library that makes it a breeze to work with structured outputs from LLMs. Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses.
https://github.com/hiyouga/LLaMA-Factory - Efficient Fine-Tuning of 100+ LLMs.
https://github.com/microsoft/BitBLAS - A library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
https://github.com/apple/corenet - A deep neural network toolkit that allows researchers and engineers to train standard and novel small- and large-scale models for a variety of tasks, including foundation models (e.g., CLIP and LLMs), object classification, object detection, and semantic segmentation.
https://github.com/google/maxtext - A high-performance, highly scalable, open-source LLM written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training and inference. MaxText achieves high MFUs and scales from a single host to very large clusters while staying simple and "optimization-free" thanks to the power of JAX and the XLA compiler.
https://github.com/Suyimu/WRV2 - Raformer: Redundancy-Aware Transformer for Video Wire Inpainting.
https://github.com/cohere-ai/cohere-toolkit - A collection of prebuilt components enabling users to quickly build and deploy RAG applications.
https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8 - A local LLM inference tool that lets you distribute and run LLMs with a single file.
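The validate-and-retry loop that instructor automates on top of Pydantic can be illustrated with the standard library alone (a conceptual sketch, not instructor's actual API; `UserInfo`, `parse_user`, and `ask_model` are hypothetical names):

```python
import json
from dataclasses import dataclass

@dataclass
class UserInfo:
    name: str
    age: int

def parse_user(raw: str, retries: int = 2, ask_model=None) -> UserInfo:
    """Parse and validate an LLM's JSON reply into a typed object.

    ask_model is a hypothetical callable that re-prompts the model with the
    validation error and returns its new reply; on each failure the error is
    fed back so the model can correct itself, mirroring what instructor does
    with Pydantic validation errors.
    """
    for attempt in range(retries + 1):
        try:
            data = json.loads(raw)
            info = UserInfo(name=str(data["name"]), age=int(data["age"]))
            if info.age < 0:
                raise ValueError("age must be non-negative")
            return info
        except (ValueError, KeyError) as err:  # JSONDecodeError subclasses ValueError
            if ask_model is None or attempt == retries:
                raise
            raw = ask_model(f"Invalid output ({err}); return valid JSON.")
```

With instructor, the typed schema, the validation, and the retry prompt are all derived from a single Pydantic model instead of being hand-written like this.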
US Air Force confirms first successful AI dogfight - An autonomously controlled aircraft faced off against a human pilot in a test last year.
Generative A.I. Arrives in the Gene Editing World of CRISPR - A new AI system devises blueprints for microscopic mechanisms that can edit your DNA.
Snowflake Arctic - an enterprise-focused LLM.
PyTorch 2.3 released.
Safely repairing broken builds with ML - Automatically repairing non-building code increases productivity as measured by overall task completion and appears to introduce no detectable negative impact on code safety, provided that high-quality training data and responsible monitoring are employed.
PepRank - A pioneering model and web service for predicting Major Histocompatibility Complex (MHC) and peptide elution. Its training methodology focuses on peptide ranking prediction, reducing the need for extensive experiments and cutting costs. Ideal for scenarios where researchers need to experimentally verify a small, targeted set of peptides.
AI in Practice: Real and Potential Applications - The article explores the use of large language models across various professions, discussing both current applications and future possibilities for task automation.
Enjoy!
Warsaw.AI News Team
P.S.: This newsletter is free, but if you enjoy it, you are welcome to donate to help us make it even better (readers paying taxes in Poland can also contribute 1.5% of their taxes to support us).
Here is my alternative to the Instructor library: https://github.com/zby/LLMEasyTools. It aims at a more agentic workflow, where Instructor's limitation to a single output type is a big disadvantage.