AI3-GenAI4Sci

GroupPost20250407-WorkAlign

2025-04-07T00:00:00+00:00

Internal Report - Meeting Minutes Link: feishu docs. To view or edit, please request access.

GroupPost20250325-WorkAlign

2025-03-25T00:00:00+00:00

Internal Report - Meeting Minutes Link: feishu docs. To view or edit, please request access.

GroupPost20250304-COT Compress

2025-03-04T00:00:00+00:00

Report Overview

This report explores Chain-of-Thought (COT) compression techniques aimed at reducing the computational cost and enhancing the efficiency of large language models (LLMs) in complex reasoning tasks. While COT improves LLM performance by generating intermediate reasoning steps, the length of these reasoning chains leads to increased computational demands. COT compression seeks to shorten these chains without significantly impacting model performance.

Key Strategies and Methods

The report covers several key strategies for COT compression:

Explicit Compression During Training:
- Knowledge Distillation: This involves transferring knowledge from a complex “System 2” model (which generates COT) to a more efficient “System 1” model (which directly outputs results), thereby speeding up the inference process.
  - Notable work: Meta’s “Distilling System 2 into System 1.”
- Step-by-Step Training: This method iteratively trains models to identify and skip redundant reasoning steps, effectively shortening the reasoning pathway.
  - Notable work: Research by Liu et al. in “Can Language Models Learn to Skip Steps?”.
- Data-Conditioned Training: Utilizes GPT-4 to create pairs of long and short COT data, training models to generate reasoning chains of varying lengths.
  - Notable work: Research by BeiKe, “C3OT: Generating Shorter Chain-of-Thought without Compromising Effectiveness.”
Hidden State Compression:
- Implicit COT: Gradually internalizes the explicit COT reasoning process into the model’s hidden states, eliminating the need for explicit reasoning chains.
  - Notable work: Research by Deng, Choi, and Shieber, “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step.”
- Compressed COT (CCOT): Employs semantically dense tokens to represent the reasoning, so that reasoning chains are compressed within the hidden space rather than written out as text.
  - Notable work: Research by JHU, “Compressed Chain of Thought: Efficient Reasoning Through Dense Representations.”
Dynamic Length Control:
- COT-Valve: Identifies an adjustable direction in the model’s parameter space that dynamically controls the length of the generated reasoning chains.
  - Notable work: Research by NUS, “COT-Valve: Length-Compressible Chain-of-Thought Tuning.”
Reinforcement Learning Compression:
- O1-Pruner: Designs a length-harmonizing reward function to guide models in generating shorter yet accurate reasoning sequences through reinforcement learning.
  - Notable work: Research by Sun Yat-sen University, “O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning.”
- Kimi k1.5: Explores several long2short strategies for shortening reasoning, including model merging, shortest rejection sampling, DPO, and long2short RL.
  - Notable work: Kimi k1.5 Technical Report.
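
Of the strategies listed above, shortest rejection sampling is the simplest to illustrate: sample several responses to the same question and keep the shortest one that is still correct, for example as fine-tuning data. The following is a minimal sketch under stated assumptions, not the implementation from any of the cited reports. The function name `shortest_correct`, the `is_correct` checker, and the use of character count as the length measure are all hypothetical; a real pipeline would more likely count tokens.

```python
from typing import Callable, List, Optional


def shortest_correct(
    samples: List[str],
    is_correct: Callable[[str], bool],
    length: Callable[[str], int] = len,
) -> Optional[str]:
    """Return the shortest sample that passes the correctness check.

    `length` defaults to character count (an assumption made for this sketch).
    Returns None when no sample is correct.
    """
    correct = [s for s in samples if is_correct(s)]
    if not correct:
        return None
    return min(correct, key=length)


# Toy usage: three sampled answers to "12 * 4 + 2 = ?"
samples = [
    "Step 1: 12 * 4 = 48. Step 2: 48 + 2 = 50. Answer: 50",
    "12 * 4 + 2 = 50. Answer: 50",
    "Answer: 52",  # shortest overall, but wrong, so it is rejected
]
best = shortest_correct(samples, lambda s: s.endswith("Answer: 50"))
print(best)
```

The correctness filter runs before the length comparison, so a short but wrong answer is never selected.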

Key Research Trends

A shift from explicit token compression to implicit representation compression, focusing on more efficient reasoning methods.
The growing prominence of reinforcement learning in reasoning chain compression.
The exploration of multi-agent frameworks for social reasoning.

Future Outlook

As models continue to scale, managing inference costs will become increasingly critical.
The development of more effective compression techniques is essential for deploying LLMs in resource-constrained environments.

Report Significance

Provides a comprehensive overview of COT compression technologies, highlighting the latest advancements in the field.
Offers practical insights for optimizing LLM performance and efficiency through appropriate compression strategies.

The specific files can be found here: Trend Report.pdf

GroupPost20250225-WorkAlign

2025-02-25T00:00:00+00:00

Internal Report - Meeting Minutes Link: feishu docs. To view or edit, please request access.

GroupPost20241119-RFDiffusion

2024-11-19T00:00:00+00:00

RFdiffusion is a general protein design framework based on diffusion models, capable of de novo design of binders, higher-order symmetric oligomers, and other types of proteins.

Mr. Shi presented the classic RFdiffusion algorithm and gave a brief introduction to:

the basics of proteins
RFD training methods
applications

The specific files can be found here: RFDiffusion.pdf

GroupPost20241105-AlphaFold2 & PeptideGPT

2024-11-05T00:00:00+00:00

AF2 (AlphaFold2), which won the CASP14 competition in 2020, is also the work recognized by the 2024 Nobel Prize in Chemistry, and it holds significant importance for protein engineering.

PeptideGPT is a pipeline to generate protein sequences with specific functions.

The following is an overview of the presentations by Bohao Lv:

PeptideGPT:

Fine-tuned ProtGPT2, an existing protein sequence generation model, on protein data with a specific function so that it generates sequences with that function.
Applied bioinformatics knowledge as a first screening round to check that the generated sequences are plausible.
Used the structure prediction model ESMFold as a second screening round to check that the generated sequences have plausible structures.
Employed a classifier to validate the functions of the generated protein sequences.
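
The generate-then-screen pipeline above can be sketched as a chain of filters. This is only an illustration under stated assumptions: the post does not give the screening rules or thresholds, so `sequence_is_plausible`, the length bounds, and the cut-offs `min_structure` and `min_function` are hypothetical. `structure_score` and `function_prob` are placeholder callables standing in for a structure-predictor confidence (for example from ESMFold) and a function classifier; no real model is called here.

```python
from typing import Callable, Iterable, List

VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def sequence_is_plausible(seq: str, min_len: int = 5, max_len: int = 50) -> bool:
    """Round 1 (hypothetical rule): standard residues only, bounded length."""
    return min_len <= len(seq) <= max_len and set(seq) <= VALID_AA


def screen(
    candidates: Iterable[str],
    structure_score: Callable[[str], float],
    function_prob: Callable[[str], float],
    min_structure: float = 0.7,  # hypothetical threshold
    min_function: float = 0.5,   # hypothetical threshold
) -> List[str]:
    """Keep sequences that pass all three screening rounds, in order."""
    kept = []
    for seq in candidates:
        if not sequence_is_plausible(seq):        # round 1: sequence check
            continue
        if structure_score(seq) < min_structure:  # round 2: structure check
            continue
        if function_prob(seq) < min_function:     # round 3: function classifier
            continue
        kept.append(seq)
    return kept


# Toy usage with stand-in scorers (not real models)
candidates = ["ACDEFGHIK", "ACDXFGHIK", "AC", "KKKKKKKKKK"]
result = screen(
    candidates,
    structure_score=lambda s: 0.5 if s == "KKKKKKKKKK" else 0.9,
    function_prob=lambda s: 0.8,
)
print(result)
```

Ordering the checks from cheapest to most expensive means the structure predictor only runs on sequences that already passed the sequence-level check.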

AF2

Decoder (structure module)
- IPA (Invariant Point Attention)
- Backbone prediction
- Atom prediction
Loss function

The specific files can be found here: AF2.pdf & PeptideGPT Blog

GroupPost20241029-Maple & AlphaFold2

2024-10-29T00:00:00+00:00

MAPLE can be used to predict methylation age and disease risk. It achieves stable and precise results by eliminating batch effects through contrastive learning methods.

AF2 (AlphaFold2), which won the CASP14 competition in 2020, is also the work recognized by the 2024 Nobel Prize in Chemistry, and it holds significant importance for protein engineering.

The following is an overview of the presentations by Yu Zhang & Bohao Lv:

MAPLE:

Predicting an individual’s age and disease probability through methylation data
Using contrastive learning methods to eliminate batch effects between methylation data from different sources
Capturing biological factors related to disease risk using MAPLE
Analyzing MAPLE results under the framework of aging biology
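
To make the contrastive idea concrete, here is a minimal sketch of the classic margin-based pairwise contrastive loss. The post does not state MAPLE's actual objective, so this is a generic illustration rather than MAPLE's method. The pairing rule (treating two samples as "same" when they should match biologically even though they come from different batches), the embedding vectors, and the `margin` value are all assumptions.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def contrastive_pair_loss(
    u: Sequence[float],
    v: Sequence[float],
    same: bool,
    margin: float = 1.0,  # hypothetical margin
) -> float:
    """Pull 'same' pairs together; push 'different' pairs at least `margin` apart."""
    d = euclidean(u, v)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy usage: a matching pair from two batches that is still far apart is penalized
print(contrastive_pair_loss([0.0, 0.0], [3.0, 4.0], same=True))
# A non-matching pair that is already far apart contributes no loss
print(contrastive_pair_loss([0.0, 0.0], [3.0, 4.0], same=False))
```

Minimizing such a loss over cross-batch pairs encourages embeddings to depend on the biological signal rather than on the data source.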

AF2

Input feature construction: Multiple Sequence Alignment (MSA) representation + pair representation
Encoder (Evoformer):
- MSA representation update: row-wise (per-sequence) self-attention with pair bias + column-wise (per-residue) self-attention
- Pair representation update: triangular multiplicative update + triangular self-attention
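
The core of the triangular multiplicative update can be sketched in a few lines. In this step, the edge (i, j) of the pair representation is updated from the two other edges of every triangle (i, j, k). The sketch below is simplified and rests on assumptions: it uses a single scalar channel and omits the layer normalization, linear projections, and gating of the full AF2 layer. The function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def triangle_multiply_outgoing(a: Matrix, b: Matrix) -> Matrix:
    """'Outgoing' edges: out[i][j] = sum_k a[i][k] * b[j][k]."""
    n = len(a)
    return [
        [sum(a[i][k] * b[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def triangle_multiply_incoming(a: Matrix, b: Matrix) -> Matrix:
    """'Incoming' edges: out[i][j] = sum_k a[k][i] * b[k][j]."""
    n = len(a)
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Toy usage on a 2-residue pair representation (single channel)
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(triangle_multiply_outgoing(a, b))  # [[17, 23], [39, 53]]
print(triangle_multiply_incoming(a, b))  # [[26, 30], [38, 44]]
```

In matrix terms, the outgoing variant computes a times the transpose of b and the incoming variant computes the transpose of a times b, which is why every third residue k contributes to the updated edge (i, j).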

The specific files can be found here: MAPLE.pdf & AF2.pdf