
Lavoisier

Only the extraordinary can beget the extraordinary



Lavoisier is a high-performance computing solution for mass spectrometry-based metabolomics data analysis pipelines. It combines traditional numerical methods with advanced visualization and AI-driven analytics to provide comprehensive insights from high-volume MS data.

Core Architecture

Lavoisier features a metacognitive orchestration layer that coordinates two main pipelines:

  1. Numerical Analysis Pipeline: Uses established computational methods to extract ion spectra and annotates ion peaks through database search, fragmentation rules, and natural language processing.

  2. Visual Analysis Pipeline: Converts spectra into video format and applies computer vision methods for annotation.
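As a rough illustration of the database-search step in the numerical pipeline, the sketch below matches observed peak m/z values against a small reference table within a parts-per-million (ppm) tolerance. This is a minimal sketch under stated assumptions: the `Reference` type, the `ppm_window` and `annotate_peaks` functions, and the 10 ppm default are hypothetical and not taken from Lavoisier's code; fragmentation rules and NLP-based annotation are not covered.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical reference entry: a compound name and its theoretical ion m/z."""
    name: str
    mz: float


def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def annotate_peaks(peaks: list[float], refs: list[Reference], ppm: float = 10.0) -> dict[float, list[str]]:
    """Match each observed peak to every reference whose m/z lies inside the ppm window."""
    refs_sorted = sorted(refs, key=lambda r: r.mz)
    ref_mz = [r.mz for r in refs_sorted]
    matches: dict[float, list[str]] = {}
    for mz in peaks:
        low, high = ppm_window(mz, ppm)
        # Binary search on the sorted reference m/z values keeps each lookup O(log n).
        lo_idx = bisect_left(ref_mz, low)
        hi_idx = bisect_right(ref_mz, high)
        matches[mz] = [refs_sorted[i].name for i in range(lo_idx, hi_idx)]
    return matches
```

For example, with glucose [M+H]+ listed at m/z 181.0707, `annotate_peaks([181.0712, 250.0], refs)` returns that name for the first peak (0.0005 apart, inside the roughly 0.0018 window at 10 ppm) and an empty list for the second.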

The orchestration layer manages workflow execution and resource allocation, and integrates LLM-powered intelligence into analysis and decision-making.

┌────────────────────────────────────────────────────────────────┐
│                  Metacognitive Orchestration                   │
│                                                                │
│  ┌──────────────────────┐          ┌───────────────────────┐   │
│  │                      │          │                       │   │
│  │  Numerical Pipeline  │◄────────►│  Visual Pipeline      │   │
│  │                      │          │                       │   │
│  └──────────────────────┘          └───────────────────────┘   │
│                 ▲                              ▲               │
│                 │                              │               │
│                 ▼                              ▼               │
│  ┌──────────────────────┐          ┌───────────────────────┐   │
│  │                      │          │                       │   │
│  │  Model Repository    │◄────────►│  LLM Integration      │   │
│  │                      │          │                       │   │
│  └──────────────────────┘          └───────────────────────┘   │
│                                                                │
└────────────────────────────────────────────────────────────────┘
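The coordination in the diagram can be sketched as a small dispatcher that hands the same input to both pipelines and collects their results. This is a minimal sketch under stated assumptions: the `Orchestrator` class, the stub pipelines, and the thread-based concurrency are hypothetical and are not taken from `lavoisier/core/metacognition.py`; resource allocation, the model repository, and LLM integration are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical pipeline signature: takes an input file path, returns a result dict.
Pipeline = Callable[[str], dict[str, Any]]


@dataclass
class Orchestrator:
    """Hypothetical coordinator that runs every registered pipeline on one input file."""
    pipelines: dict[str, Pipeline] = field(default_factory=dict)

    def register(self, name: str, pipeline: Pipeline) -> None:
        self.pipelines[name] = pipeline

    def run(self, input_path: str) -> dict[str, Any]:
        results: dict[str, Any] = {}
        # Run the pipelines concurrently; each one receives the same input path.
        with ThreadPoolExecutor(max_workers=max(1, len(self.pipelines))) as pool:
            futures = {name: pool.submit(fn, input_path) for name, fn in self.pipelines.items()}
            for name, future in futures.items():
                try:
                    results[name] = future.result()
                except Exception as exc:
                    # Record the failure instead of letting one pipeline abort the others.
                    results[name] = {"error": str(exc)}
        return results


def numerical_stub(path: str) -> dict[str, Any]:
    """Placeholder for the numerical pipeline."""
    return {"source": path, "features": 3}


def visual_stub(path: str) -> dict[str, Any]:
    """Placeholder for the visual pipeline."""
    return {"source": path, "frames": 12}
```

Registering both stubs and calling `run("sample.mzML")` returns one entry per pipeline, which is the point where a comparison of numerical and visual results could take place.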

Command Line Interface

Lavoisier provides a high-performance command-line interface (CLI) for interacting with all system components.

Numerical Processing Pipeline

The numerical pipeline processes raw mass spectrometry data on a distributed computing architecture designed specifically for large-scale MS datasets:
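One common way to spread a large dataset across workers is to split the spectra into fixed-size chunks, process each chunk independently, and concatenate the results in order. The sketch below assumes that pattern. It uses the standard library's `concurrent.futures` rather than the Ray and Dask integrations listed in the project structure, and the total-ion-current computation, noise threshold, and function names are hypothetical placeholders for real processing steps.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import partial
from typing import Iterator

Peak = tuple[float, float]  # (m/z, intensity)
Spectrum = list[Peak]


def chunked(items: list[Spectrum], size: int) -> Iterator[list[Spectrum]]:
    """Yield consecutive slices of at most `size` spectra."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_chunk(chunk: list[Spectrum], min_intensity: float) -> list[float]:
    """Drop peaks below the threshold and return the total ion current (TIC) of each spectrum."""
    return [sum((i for _, i in spectrum if i >= min_intensity), 0.0) for spectrum in chunk]


def process_in_chunks(
    spectra: list[Spectrum],
    chunk_size: int = 1000,
    min_intensity: float = 100.0,
    executor_cls: type[Executor] = ThreadPoolExecutor,
    max_workers: int = 4,
) -> list[float]:
    """Process spectra chunk by chunk in parallel; the output keeps the input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    # partial() over a module-level function stays picklable for process pools.
    worker = partial(process_chunk, min_intensity=min_intensity)
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, so chunks are reassembled correctly.
        per_chunk = pool.map(worker, chunked(spectra, chunk_size))
        return [tic for chunk_result in per_chunk for tic in chunk_result]
```

For CPU-bound work, passing `executor_cls=ProcessPoolExecutor` sidesteps the GIL; on platforms that spawn worker processes, the call must then sit under an `if __name__ == "__main__":` guard.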

Raw Data Processing

Comprehensive MS2 Annotation

Enhanced MS2 Analysis

Distributed Computing

Data Management

Processing Features

Visual Analysis Pipeline

The visualization pipeline transforms processed MS data into interpretable visual formats:
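The README does not specify how spectra become video frames. One plausible approach, assumed here purely for illustration, is to bin each scan's m/z axis into a fixed number of columns, stack consecutive scans as rows of a grayscale image, and slide that window along retention time to produce successive frames. The function names, log scaling, and per-frame normalization are hypothetical; encoding the frames into an actual video file is omitted.

```python
import math

Peak = tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: list[Peak], mz_min: float, mz_max: float, n_bins: int) -> list[float]:
    """Sum intensities into n_bins equal-width m/z bins over [mz_min, mz_max)."""
    row = [0.0] * n_bins
    width = (mz_max - mz_min) / n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            idx = min(int((mz - mz_min) / width), n_bins - 1)
            row[idx] += intensity
    return row


def to_frame(scans: list[list[Peak]], mz_min: float, mz_max: float, n_bins: int) -> list[list[int]]:
    """Stack binned scans into a grayscale frame (rows = scans, columns = m/z bins).

    Intensities are log-scaled and normalized to 0-255 so that low-abundance
    ions stay visible next to dominant peaks.
    """
    rows = [[math.log1p(v) for v in bin_spectrum(s, mz_min, mz_max, n_bins)] for s in scans]
    peak = max((v for row in rows for v in row), default=0.0)
    if peak == 0.0:
        return [[0] * n_bins for _ in rows]
    return [[round(255 * v / peak) for v in row] for row in rows]


def to_video(scans: list[list[Peak]], window: int, mz_min: float, mz_max: float,
             n_bins: int) -> list[list[list[int]]]:
    """Slide a window of `window` consecutive scans to produce successive frames."""
    return [to_frame(scans[i:i + window], mz_min, mz_max, n_bins)
            for i in range(len(scans) - window + 1)]
```

Normalizing each frame independently maximizes contrast within a frame; normalizing against a global maximum instead would keep brightness comparable across frames, which may matter more for computer vision models.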

Spectrum Analysis

Visualization Generation

Data Integration

Output Formats

LLM Integration & Continuous Learning

Lavoisier integrates commercial and open-source LLMs to enhance analytical capabilities and enable continuous learning:
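Supporting both commercial and open-source models suggests a backend-agnostic interface. The sketch below assumes one: any object with a `complete(prompt)` method can serve as a backend, and backends are tried in order so that a local model can be preferred with a commercial API as fallback. `LLMBackend`, `EchoBackend`, `build_annotation_prompt`, and `ask` are hypothetical names; real Ollama or commercial clients (the project tree lists `ollama.py` and `commercial.py`) are not shown, and `EchoBackend` is an offline stand-in so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMBackend(Protocol):
    """Anything that turns a prompt into text, e.g. a local model server or a commercial API client."""
    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Offline stand-in backend: echoes the first prompt line with a prefix."""
    prefix: str = "LLM:"

    def complete(self, prompt: str) -> str:
        return f"{self.prefix} {prompt.splitlines()[0]}"


def build_annotation_prompt(mz: float, candidates: list[str]) -> str:
    """Format a question about ambiguous candidate annotations for an ion at the given m/z."""
    options = "\n".join(f"- {c}" for c in candidates)
    return f"Which compound best explains an ion at m/z {mz:.4f}?\nCandidates:\n{options}"


def ask(backends: list[LLMBackend], prompt: str) -> str:
    """Try each backend in order and return the first successful completion."""
    errors: list[str] = []
    for backend in backends:
        try:
            return backend.complete(prompt)
        except Exception as exc:
            errors.append(repr(exc))
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))
```

Keeping prompt construction separate from the backends means the same annotation question can be sent to any model, and the fallback order becomes a configuration choice rather than code.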

Assistive Intelligence

Solver Architecture

Continuous Learning System

Metacognitive Query Generation

Specialized Models Integration

Lavoisier incorporates domain-specific models for advanced analysis tasks:

Biomedical Language Models

Scientific Text Encoders

Chemical Named Entity Recognition

Proteomics Analysis

Key Capabilities

Performance

Data Handling

Annotation Capabilities

Quality Control

Analysis Features

Use Cases

Proteomics Research

Metabolomics Studies

Quality Control

Data Visualization

Project Structure

lavoisier/
├── pyproject.toml            # Project metadata and dependencies
├── LICENSE                   # Project license
├── README.md                 # This file
├── docs/                     # Documentation
│   ├── user_guide.md         # User documentation
│   └── developer_guide.md    # Developer documentation
├── lavoisier/                # Main package
│   ├── __init__.py           # Package initialization
│   ├── cli/                  # Command-line interface
│   │   ├── __init__.py
│   │   ├── app.py            # CLI application entry point
│   │   ├── commands/         # CLI command implementations
│   │   └── ui/               # Terminal UI components
│   ├── core/                 # Core functionality
│   │   ├── __init__.py
│   │   ├── metacognition.py  # Orchestration layer
│   │   ├── config.py         # Configuration management
│   │   ├── logging.py        # Logging utilities
│   │   └── ml/               # Machine learning components
│   │       ├── __init__.py
│   │       ├── models.py     # ML model implementations
│   │       └── MSAnnotator.py # MS2 annotation engine
│   ├── numerical/            # Numerical pipeline
│   │   ├── __init__.py
│   │   ├── processing.py     # Data processing functions
│   │   ├── pipeline.py       # Main pipeline implementation
│   │   ├── ms1.py            # MS1 spectra analysis
│   │   ├── ms2.py            # MS2 spectra analysis
│   │   ├── ml/               # Machine learning components
│   │   │   ├── __init__.py
│   │   │   ├── models.py     # ML model definitions
│   │   │   └── training.py   # Training utilities
│   │   ├── distributed/      # Distributed computing
│   │   │   ├── __init__.py
│   │   │   ├── ray_utils.py  # Ray integration
│   │   │   └── dask_utils.py # Dask integration
│   │   └── io/               # Input/output operations
│   │       ├── __init__.py
│   │       ├── readers.py    # File format readers
│   │       └── writers.py    # File format writers
│   ├── visual/               # Visual pipeline
│   │   ├── __init__.py
│   │   ├── conversion.py     # Spectra to visual conversion
│   │   ├── processing.py     # Visual processing
│   │   ├── video.py          # Video generation
│   │   └── analysis.py       # Visual analysis
│   ├── llm/                  # LLM integration
│   │   ├── __init__.py
│   │   ├── api.py            # API for LLM communication
│   │   ├── ollama.py         # Ollama integration
│   │   ├── commercial.py     # Commercial LLM integrations
│   │   └── query_gen.py      # Query generation
│   ├── models/               # Model repository
│   │   ├── __init__.py
│   │   ├── repository.py     # Model management
│   │   ├── distillation.py   # Knowledge distillation
│   │   └── versioning.py     # Model versioning
│   └── utils/                # Utility functions
│       ├── __init__.py
│       ├── helpers.py        # General helpers
│       └── validation.py     # Validation utilities
├── tests/                    # Tests
│   ├── __init__.py
│   ├── test_numerical.py
│   ├── test_visual.py
│   ├── test_llm.py
│   └── test_cli.py
└── examples/                 # Example workflows
    ├── basic_analysis.py
    ├── distributed_processing.py
    ├── llm_assisted_analysis.py
    └── visual_analysis.py

Installation & Usage

Installation

pip install lavoisier

For development installation:

git clone https://github.com/username/lavoisier.git
cd lavoisier
pip install -e ".[dev]"

Basic Usage

Process a single MS file:

lavoisier process --input sample.mzML --output results/

Run with LLM assistance:

lavoisier analyze --input sample.mzML --llm-assist

Perform comprehensive annotation:

lavoisier annotate --input sample.mzML --databases all --pathway-analysis

Compare numerical and visual pipelines:

lavoisier compare --input sample.mzML --output comparison/
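The four commands above could be declared with Python's standard `argparse` subcommands, as sketched below. Only the command and flag names come from the examples; whether each flag is required, the `all` default for `--databases`, and the `build_parser`/`main` functions are assumptions, and this is not the implementation in `lavoisier/cli/app.py`.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the commands shown in Basic Usage."""
    parser = argparse.ArgumentParser(prog="lavoisier")
    sub = parser.add_subparsers(dest="command", required=True)

    process = sub.add_parser("process", help="process a single MS file")
    process.add_argument("--input", required=True)
    process.add_argument("--output", required=True)

    analyze = sub.add_parser("analyze", help="analyze with optional LLM assistance")
    analyze.add_argument("--input", required=True)
    analyze.add_argument("--llm-assist", action="store_true")

    annotate = sub.add_parser("annotate", help="annotate against reference databases")
    annotate.add_argument("--input", required=True)
    annotate.add_argument("--databases", default="all")  # assumed default
    annotate.add_argument("--pathway-analysis", action="store_true")

    compare = sub.add_parser("compare", help="compare numerical and visual pipelines")
    compare.add_argument("--input", required=True)
    compare.add_argument("--output", required=True)
    return parser


def main(argv: Optional[list[str]] = None) -> int:
    """Parse arguments; argv=None makes argparse read sys.argv."""
    args = build_parser().parse_args(argv)
    # A real implementation would dispatch to the pipelines here; this sketch only echoes the options.
    print(vars(args))
    return 0
```

For instance, `main(["analyze", "--input", "sample.mzML", "--llm-assist"])` prints the parsed options with `llm_assist` set to `True`; note that argparse converts dashes in flag names to underscores in attribute names.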

Development Roadmap

  1. Phase 1: CLI Interface & Core Architecture
    • Implement high-performance CLI
    • Establish metacognitive orchestration layer
    • Integrate basic LLM capabilities
  2. Phase 2: Enhanced ML Integration
    • Deep learning for MS2 analysis
    • Transfer learning implementation
    • Model serialization standard
    • Comprehensive annotation system with multiple databases
  3. Phase 3: Advanced LLM & Continuous Learning
    • Commercial LLM integration
    • Knowledge distillation framework
    • Automated query generation
  4. Phase 4: Comparison & Validation
    • Numerical vs. visual pipeline comparison tools
    • Performance benchmarking framework
    • Validation suite

Contributing

Contributions are welcome! Please see our contributing guidelines for details.

License

This project is licensed under the MIT License; see the LICENSE file for details.

References

  1. Sachikonye, K. (2025). Lavoisier: A High-Performance Computing Solution for Mass Spectrometry-Based Metabolomics with Novel Video Analysis Pipeline.

Key Features

High-Performance Data Processing

Comprehensive Data Analysis

Specialized Models Integration

Flexible Visualization Suite