Lavoisier

Only the extraordinary can beget the extraordinary

License: MIT

Lavoisier is a high-performance computing solution for mass spectrometry-based metabolomics data analysis. It combines traditional numerical methods with advanced visualization and AI-driven analytics to extract insights from high-volume MS data.

Core Architecture

Lavoisier features a sophisticated AI-driven architecture that combines multiple specialized modules for mass spectrometry analysis:

  1. Diadochi Framework: Multi-domain LLM orchestration system for intelligent query routing and expert collaboration
  2. Mzekezeke: Bayesian Evidence Network with Fuzzy Logic for probabilistic MS annotations
  3. Hatata: Markov Decision Process verification layer for stochastic validation
  4. Zengeza: Intelligent noise reduction using statistical analysis and machine learning
  5. Nicotine: Context verification system with cryptographic puzzles for AI integrity
  6. Diggiden: Adversarial testing system for evidence network vulnerability assessment
┌─────────────────────────────────────────────────────────────────────────────┐
│                           Lavoisier AI Architecture                           │
│                                                                             │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐        │
│  │                 │    │                 │    │                 │        │
│  │   Diadochi      │◄──►│   Mzekezeke     │◄──►│    Hatata       │        │
│  │   (LLM Routing) │    │ (Bayesian Net)  │    │ (MDP Verify)    │        │
│  │                 │    │                 │    │                 │        │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘        │
│           ▲                        ▲                        ▲              │
│           │                        │                        │              │
│           ▼                        ▼                        ▼              │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐        │
│  │                 │    │                 │    │                 │        │
│  │    Zengeza      │◄──►│    Nicotine     │◄──►│    Diggiden     │        │
│  │ (Noise Reduce)  │    │ (Context Verify)│    │ (Adversarial)   │        │
│  │                 │    │                 │    │                 │        │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Command Line Interface

Lavoisier provides a high-performance command-line interface (CLI) for interacting with all system components:

Numerical Processing Pipeline

The numerical pipeline processes raw mass spectrometry data through a distributed computing architecture, specifically designed for handling large-scale MS datasets:

Raw Data Processing

Comprehensive MS2 Annotation

Enhanced MS2 Analysis

Distributed Computing

Data Management

Processing Features

Visual Analysis Pipeline

The visualization pipeline transforms processed MS data into interpretable visual formats:

Spectrum Analysis

Visualization Generation

Data Integration

Output Formats

LLM Integration & Continuous Learning

Lavoisier integrates commercial and open-source LLMs to enhance analytical capabilities and enable continuous learning:

Assistive Intelligence

Solver Architecture

Continuous Learning System

Metacognitive Query Generation

Specialized Models Integration

Lavoisier incorporates domain-specific models for advanced analysis tasks:

Biomedical Language Models

Scientific Text Encoders

Chemical Named Entity Recognition

Proteomics Analysis

Advanced Model Architecture

Lavoisier features a comprehensive multi-tier model architecture that integrates cutting-edge AI technologies:

1. Models Module (lavoisier.models)

The models module provides a complete framework for managing, versioning, and deploying specialized AI models:

Chemical Language Models (chemical_language_models.py)

Spectral Transformer Models (spectral_transformers.py)

Embedding Models (embedding_models.py)

Model Repository System (repository.py)

Knowledge Distillation (distillation.py)

Model Registry (registry.py)

Version Management (versioning.py)

2. LLM Integration Module (lavoisier.llm)

The LLM module provides comprehensive integration with large language models for enhanced analytical capabilities:

LLM Service Architecture (service.py)

API Client Layer (api.py)

Commercial LLM Proxy (commercial.py)

Local LLM Support (ollama.py)

Query Generation System (query_gen.py)

Chemical NER (chemical_ner.py)

Text Encoders (text_encoders.py)

Specialized LLM (specialized_llm.py)

3. AI Integration Module (lavoisier.ai_modules.integration)

The integration module orchestrates all AI components into a cohesive analytical system:

Advanced MS Analysis System

Analysis Pipeline

  1. Stage 1 - Noise Reduction: Zengeza intelligent noise removal
  2. Stage 2 - Evidence Networks: Mzekezeke Bayesian network construction
  3. Stage 3 - Context Verification: Nicotine cryptographic puzzle validation
  4. Stage 4 - MDP Validation: Hatata stochastic verification
  5. Stage 5 - Security Assessment: Diggiden adversarial testing
  6. Stage 6 - Integration: Unified result compilation and confidence scoring
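
The six stages above run strictly in order, each one consuming the output of the previous stage. The following is a minimal sketch of that ordering only. `StagedPipeline` and the placeholder stage functions are hypothetical names made up for illustration; they are not the API of `lavoisier.ai_modules.integration`, and the real stages do far more than tag a dictionary.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the running context and returns an updated copy of it.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


class StagedPipeline:
    """Hypothetical helper that runs named stages sequentially."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def add(self, name: str, stage: Stage) -> "StagedPipeline":
        self.stages.append((name, stage))
        return self

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        for name, stage in self.stages:
            context = stage(dict(context))  # each stage works on a copy
            context["completed"] = context.get("completed", []) + [name]
        return context


def placeholder(tag: str) -> Stage:
    """Build a stand-in stage that only records that it ran (illustration only)."""
    def stage(context: Dict[str, Any]) -> Dict[str, Any]:
        context[tag] = True
        return context
    return stage


pipeline = StagedPipeline()
for stage_name in [
    "noise_reduction",       # Stage 1: Zengeza
    "evidence_network",      # Stage 2: Mzekezeke
    "context_verification",  # Stage 3: Nicotine
    "mdp_validation",        # Stage 4: Hatata
    "security_assessment",   # Stage 5: Diggiden
    "integration",           # Stage 6: result compilation
]:
    pipeline.add(stage_name, placeholder(stage_name))

result = pipeline.run({"spectrum_id": "example"})
```

Keeping the stage list as data makes it easy to see, and to test, that verification stages never run before noise reduction and evidence construction have completed.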

System Health Monitoring

Export and Reporting

Enhanced System Architecture

The complete Lavoisier architecture now includes these additional layers:

┌─────────────────────────────────────────────────────────────────────────────┐
│                      Enhanced Lavoisier Architecture                          │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                          LLM Integration Layer                          │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │ │
│  │  │ Commercial  │  │    Local    │  │   Query     │  │  Chemical   │   │ │
│  │  │     LLMs    │  │    LLMs     │  │  Generator  │  │     NER     │   │ │
│  │  │ (GPT/Claude)│  │  (Ollama)   │  │             │  │             │   │ │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                      ▲                                       │
│                                      │                                       │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                          Models Management Layer                        │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │ │
│  │  │  Chemical   │  │  Spectral   │  │ Embedding   │  │ Knowledge   │   │ │
│  │  │  Language   │  │Transformers │  │   Models    │  │Distillation │   │ │
│  │  │   Models    │  │             │  │             │  │             │   │ │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                      ▲                                       │
│                                      │                                       │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                        AI Integration Layer                             │ │
│  │                    ┌─────────────────────────┐                         │ │
│  │                    │  Advanced MS Analysis   │                         │ │
│  │                    │        System           │                         │ │
│  │                    └─────────────────────────┘                         │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                      ▲                                       │
│                                      │                                       │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                           Core AI Modules                              │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │ │
│  │  │  Diadochi   │  │  Mzekezeke  │  │   Hatata    │  │   Zengeza   │   │ │
│  │  │ (LLM Route) │  │(Bayes Net)  │  │(MDP Verify) │  │ (Denoising) │   │ │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │ │
│  │  ┌─────────────┐  ┌─────────────┐                                      │ │
│  │  │  Nicotine   │  │  Diggiden   │                                      │ │
│  │  │(Context Ver)│  │(Adversarial)│                                      │ │
│  │  └─────────────┘  └─────────────┘                                      │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                      ▲                                       │
│                                      │                                       │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                      Processing Pipelines                              │ │
│  │  ┌─────────────────┐                    ┌─────────────────┐            │ │
│  │  │    Numerical    │                    │     Visual      │            │ │
│  │  │    Pipeline     │                    │    Pipeline     │            │ │
│  │  │                 │                    │                 │            │ │
│  │  └─────────────────┘                    └─────────────────┘            │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘

Key Capabilities

AI-Driven Analysis

Advanced Model Management

LLM-Enhanced Analysis

Advanced Annotation System

Quality Assurance

Performance Optimization

Performance

Data Handling

Annotation Capabilities

Analysis Features

Use Cases

Advanced MS Analysis

AI Research Applications

Security and Robustness

Proteomics Research

Metabolomics Studies

Quality Control

Data Visualization

Results & Validation

Our comprehensive validation demonstrates the effectiveness of Lavoisier’s dual-pipeline approach through rigorous statistical analysis and performance metrics:

Core Performance Metrics

Example Analysis Results

Mass Spectrometry Analysis

Full MS Scan: Full-scan mass spectrum showing the comprehensive metabolite profile with high mass accuracy and resolution.

MS/MS Analysis: MS/MS fragmentation pattern analysis for glucose, demonstrating detailed structural elucidation.

Feature Comparison: Comparison of feature extraction between the numerical and visual pipelines, showing high concordance and complementarity.

Visual Pipeline Output

The following video demonstrates our novel computer vision approach to mass spectrometry analysis:

The video above shows real-time mass spectrometry data analysis through our computer vision pipeline, demonstrating:

Technical Details: Novel Visual Analysis Method

The visual pipeline represents a groundbreaking approach to mass spectrometry data analysis through computer vision techniques. This section details the mathematical foundations and implementation of the method demonstrated in the video above.

Mathematical Formulation

  1. Spectrum-to-Image Transformation: The conversion of mass spectra to visual representations follows:
    F: (m/z, I) → R^(n×n)
    

    where:

    • m/z ∈ R^k: mass-to-charge ratio vector
    • I ∈ R^k: intensity vector
    • n: resolution dimension (default: 1024)

    The transformation is defined by:

    P(x,y) = G(σ) * ∑[δ(x - φ(m/z)) · ψ(I)]
    

    where:

    • P(x,y): pixel intensity at coordinates (x,y)
    • G(σ): Gaussian kernel with σ=1, applied by convolution (*)
    • φ: m/z mapping function to x-coordinate
    • ψ: intensity scaling function (log1p transform)
    • δ: Dirac delta function
  2. Temporal Integration: Sequential frames are processed using a sliding window approach:
    B_t = {F_i | i ∈ [t-w, t]}
    

    where:

    • B_t: frame buffer at time t
    • w: window size (default: 30 frames)
    • F_i: transformed frame at time i
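
The two steps above can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the code in `lavoisier/visual/conversion.py`. The function and class names are hypothetical. φ is taken to be a linear map from m/z to a column index. The formula does not say how the y coordinate is chosen, so the sketch assumes the row index is derived from the scaled intensity. The window size w is treated as the number of frames kept in the buffer.

```python
from collections import deque

import numpy as np


def gaussian_kernel_1d(sigma: float = 1.0) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at three sigma."""
    radius = max(1, int(round(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def spectrum_to_image(mz, intensity, n=1024, sigma=1.0, mz_range=None):
    """Rasterize one spectrum into an n x n float image, then blur it."""
    mz = np.asarray(mz, dtype=float)
    scaled = np.log1p(np.asarray(intensity, dtype=float))  # psi: log1p transform
    lo, hi = mz_range if mz_range is not None else (mz.min(), mz.max())
    span = max(hi - lo, 1e-12)
    # phi (assumed linear): m/z -> column index in [0, n - 1]
    cols = np.clip(np.round((mz - lo) / span * (n - 1)), 0, n - 1).astype(int)
    # Assumed y mapping: row 0 holds the strongest peak, row n - 1 the weakest
    top = max(scaled.max(), 1e-12)
    rows = np.clip(np.round((1.0 - scaled / top) * (n - 1)), 0, n - 1).astype(int)
    image = np.zeros((n, n))
    # Sum of delta impulses weighted by psi(I); add.at accumulates repeated pixels
    np.add.at(image, (rows, cols), scaled)
    # G(sigma) * (...): separable Gaussian convolution, rows first, then columns
    kernel = gaussian_kernel_1d(sigma)
    image = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    image = np.apply_along_axis(np.convolve, 0, image, kernel, mode="same")
    return image


class FrameBuffer:
    """Sliding window B_t that keeps only the most recent `window` frames."""

    def __init__(self, window: int = 30):
        self.frames = deque(maxlen=window)

    def push(self, frame: np.ndarray) -> np.ndarray:
        """Append a frame and return the current window as one stacked array."""
        self.frames.append(frame)
        return np.stack(self.frames)
```

Passing a fixed `mz_range` for every scan keeps the m/z-to-column mapping identical across frames, which matters once consecutive frames are compared against each other.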

Feature Detection and Tracking

  1. Scale-Invariant Feature Transform (SIFT)
    • Keypoint detection using DoG (Difference of Gaussians)
    • Local extrema detection in scale space
    • Keypoint localization and filtering
    • Orientation assignment
    • Feature descriptor generation
  2. Temporal Pattern Analysis
    • Optical flow computation using Farneback method
    • Flow magnitude and direction analysis:
      M(x,y) = √(fx² + fy²)
      θ(x,y) = arctan(fy/fx)
      

      where:

      • M: flow magnitude
      • θ: flow direction
      • fx, fy: horizontal and vertical components of the flow vector
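
As a rough illustration of the magnitude and direction formulas, the sketch below splits a dense flow field into M and θ. It assumes the flow is an array of shape (H, W, 2) holding per-pixel (fx, fy) displacements, which is the layout produced by dense estimators such as OpenCV's `cv2.calcOpticalFlowFarneback`. The function name is hypothetical. It uses `arctan2` rather than a plain `arctan(fy/fx)` so that the direction stays defined when fx is zero and the quadrant is preserved.

```python
import numpy as np


def flow_magnitude_direction(flow: np.ndarray):
    """Split a dense flow field of shape (H, W, 2) into magnitude and angle."""
    fx, fy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(fx, fy)    # M = sqrt(fx^2 + fy^2)
    direction = np.arctan2(fy, fx)  # theta in radians, range [-pi, pi]
    return magnitude, direction


# Tiny synthetic flow field: one pixel moves by (3, 4), one by (-1, 0)
flow = np.zeros((2, 2, 2))
flow[0, 0] = (3.0, 4.0)
flow[1, 1] = (-1.0, 0.0)
magnitude, direction = flow_magnitude_direction(flow)
```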

Pattern Recognition

  1. Feature Correlation: Temporal patterns are analyzed using frame-to-frame correlation:
    C(i,j) = corr(F_i, F_j)
    

    where C(i,j) is the correlation coefficient between frames i and j.

  2. Significant Movement Detection: Features are tracked using a statistical threshold:
    T = μ(M) + 2σ(M)
    

    where:

    • T: movement threshold
    • μ(M): mean flow magnitude
    • σ(M): standard deviation of flow magnitude
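
Both quantities above are short to compute with NumPy. The sketch below assumes that corr is the Pearson correlation over flattened pixel values and that μ and σ are taken over all pixels of one magnitude map. The source does not spell out either choice, and the function names are hypothetical.

```python
import numpy as np


def frame_correlation_matrix(frames) -> np.ndarray:
    """C[i, j] = Pearson correlation between flattened frames i and j."""
    flat = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    # A frame with constant pixel values has zero variance and yields NaN entries
    return np.corrcoef(flat)


def significant_movement_mask(magnitude: np.ndarray, k: float = 2.0):
    """Flag pixels whose flow magnitude exceeds T = mean(M) + k * std(M)."""
    threshold = magnitude.mean() + k * magnitude.std()
    return magnitude > threshold, threshold


rng = np.random.default_rng(0)
base = rng.random((8, 8))
frames = [base, 2.0 * base + 1.0, -base]  # scaled copy and inverted copy
C = frame_correlation_matrix(frames)

magnitude = np.zeros((10, 10))
magnitude[4, 7] = 10.0  # a single strongly moving pixel
mask, T = significant_movement_mask(magnitude)
```

With k = 2 the threshold matches the μ(M) + 2σ(M) rule above. Exposing k as a parameter only makes it easier to see how sensitive the detection is to that choice.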

Implementation Details

  1. Resolution and Parameters
    • Frame resolution: 1024×1024 pixels
    • Feature vector dimension: 128
    • Gaussian blur σ: 1.0
    • Frame rate: 30 fps
    • Window size: 30 frames
  2. Processing Pipeline
    a. Raw spectrum acquisition
    b. m/z and intensity normalization
    c. Coordinate mapping
    d. Gaussian smoothing
    e. Feature detection
    f. Temporal integration
    g. Video generation

  3. Quality Metrics
    • Structural Similarity Index (SSIM)
    • Peak Signal-to-Noise Ratio (PSNR)
    • Feature stability across frames
    • Temporal consistency measures
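
Of the quality metrics listed, PSNR is the simplest to state: 10·log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below is a plain NumPy version under the assumption that frames are float arrays scaled to a known `data_range`. The function name is hypothetical, and the source does not say which implementation Lavoisier uses. For SSIM, an existing implementation such as scikit-image's `skimage.metrics.structural_similarity` is likely a better starting point than a hand-written one.

```python
import numpy as np


def psnr(reference, test, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally shaped frames."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(data_range ** 2 / mse))


clean = np.zeros((16, 16))
noisy = clean + 0.1  # constant error of 0.1, so MSE = 0.01
value = psnr(clean, noisy)
```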

This novel approach enables:

Analysis Outputs

The system generates comprehensive analytical outputs organized in:

  1. Time Series Analysis (time_series/)
    • Chromatographic peak tracking
    • Retention time alignment
    • Intensity variation monitoring
  2. Feature Analysis (feature_analysis/)
    • Principal component analysis
    • Feature clustering
    • Pattern recognition results
  3. Interactive Dashboards (interactive_dashboards/)
    • Real-time data exploration
    • Dynamic filtering capabilities
    • Interactive peak annotation
  4. Publication Quality Figures (publication_figures/)
    • High-resolution spectral plots
    • Statistical analysis visualizations
    • Comparative analysis figures

Pipeline Complementarity

The dual-pipeline approach shows strong synergistic effects:

Validation Methodology

For detailed information about our validation approach and complete results, please refer to:

Project Structure

lavoisier/
├── pyproject.toml            # Project metadata and dependencies
├── LICENSE                   # Project license
├── README.md                 # This file
├── docs/                     # Documentation
│   ├── ai-modules.md         # Comprehensive AI modules documentation
│   ├── user_guide.md         # User documentation
│   ├── developer_guide.md    # Developer documentation
│   ├── architecture.md       # System architecture details
│   └── performance.md        # Performance benchmarking
├── lavoisier/                # Main package
│   ├── __init__.py           # Package initialization
│   ├── diadochi/             # Multi-domain LLM framework
│   │   ├── __init__.py
│   │   ├── core.py           # Core framework components
│   │   ├── routers.py        # Query routing strategies
│   │   ├── chains.py         # Sequential processing chains
│   │   └── mixers.py         # Response mixing strategies
│   ├── ai_modules/           # Specialized AI modules
│   │   ├── __init__.py
│   │   ├── integration.py    # AI system orchestration
│   │   ├── mzekezeke.py      # Bayesian Evidence Network
│   │   ├── hatata.py         # MDP Verification Layer
│   │   ├── zengeza.py        # Intelligent Noise Reduction
│   │   ├── nicotine.py       # Context Verification System
│   │   └── diggiden.py       # Adversarial Testing Framework
│   ├── models/               # AI Model Management
│   │   ├── __init__.py
│   │   ├── chemical_language_models.py  # ChemBERTa, MoLFormer, PubChemDeBERTa
│   │   ├── spectral_transformers.py     # SpecTUS model
│   │   ├── embedding_models.py          # CMSSP model
│   │   ├── huggingface_models.py        # HuggingFace integration
│   │   ├── distillation.py              # Knowledge distillation
│   │   ├── registry.py                  # Model registry system
│   │   ├── repository.py                # Model repository
│   │   ├── versioning.py                # Model versioning
│   │   └── papers.py                    # Research papers integration
│   ├── llm/                  # LLM Integration Layer
│   │   ├── __init__.py
│   │   ├── service.py        # LLM service architecture
│   │   ├── api.py            # API client layer
│   │   ├── query_gen.py      # Query generation system
│   │   ├── commercial.py     # Commercial LLM proxy
│   │   ├── ollama.py         # Local LLM support
│   │   ├── chemical_ner.py   # Chemical NER
│   │   ├── text_encoders.py  # Scientific text encoders
│   │   └── specialized_llm.py # Specialized LLM implementations
│   ├── core/                 # Core functionality
│   │   ├── __init__.py
│   │   ├── config.py         # Configuration management
│   │   ├── logging.py        # Logging utilities
│   │   └── registry.py       # Component registry
│   ├── numerical/            # Traditional MS analysis pipeline
│   │   ├── __init__.py
│   │   ├── numeric.py        # Main numerical analysis
│   │   ├── ms1.py            # MS1 spectra analysis
│   │   ├── ms2.py            # MS2 spectra analysis
│   │   └── io/               # Input/output operations
│   │       ├── __init__.py
│   │       ├── readers.py    # File format readers
│   │       └── writers.py    # File format writers
│   ├── visual/               # Computer vision pipeline
│   │   ├── __init__.py
│   │   ├── conversion.py     # Spectra to visual conversion
│   │   ├── processing.py     # Visual processing
│   │   ├── video.py          # Video generation
│   │   └── analysis.py       # Visual analysis
│   ├── proteomics/           # Proteomics analysis
│   │   └── __init__.py       # Proteomics module initialization
│   ├── cli/                  # Command-line interface
│   │   ├── __init__.py
│   │   ├── app.py            # CLI application entry point
│   │   ├── commands/         # CLI command implementations
│   │   └── ui/               # Terminal UI components
│   └── utils/                # Utility functions
│       ├── __init__.py
│       ├── helpers.py        # General helpers
│       └── validation.py     # Validation utilities
├── tests/                    # Tests
│   ├── __init__.py
│   ├── test_ai_modules.py    # AI modules tests
│   ├── test_models.py        # Models module tests
│   ├── test_llm.py           # LLM integration tests
│   ├── test_diadochi.py      # Diadochi framework tests
│   ├── test_numerical.py     # Numerical pipeline tests
│   └── test_cli.py           # CLI tests
├── scripts/                  # Analysis scripts
│   ├── run_mtbls1707_analysis.py # MTBLS1707 benchmark
│   └── benchmark_analysis.py     # Performance benchmarking
└── examples/                 # Example workflows
    ├── ai_assisted_analysis.py   # AI-driven analysis
    ├── adversarial_testing.py    # Security testing
    ├── bayesian_annotation.py    # Bayesian network annotation
    ├── model_distillation.py     # Knowledge distillation example
    ├── llm_integration.py        # LLM integration example
    └── complete_pipeline.py      # Full pipeline example

Installation & Usage

Installation

pip install lavoisier

For development installation:

git clone https://github.com/username/lavoisier.git
cd lavoisier
pip install -e ".[dev]"

Basic Usage

Process a single MS file:

lavoisier process --input sample.mzML --output results/

Run with LLM assistance:

lavoisier analyze --input sample.mzML --llm-assist
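
To process a whole directory rather than a single file, the `process` command shown above can be driven from a short Python script. This is a sketch, not a documented feature. It relies only on the `--input` and `--output` flags from the example above. The helper names and the choice of one output subdirectory per input file are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_process_command(input_file, output_dir):
    """Build the argument list for one `lavoisier process` call."""
    return [
        "lavoisier", "process",
        "--input", str(input_file),
        "--output", str(output_dir),
    ]


def process_directory(data_dir, output_dir, dry_run=True):
    """Build (and optionally run) one command per .mzML file in data_dir."""
    commands = [
        # Assumed layout: results for sample.mzML go to <output_dir>/sample
        build_process_command(path, Path(output_dir) / path.stem)
        for path in sorted(Path(data_dir).glob("*.mzML"))
    ]
    if not dry_run:
        for command in commands:
            # check=True stops the batch at the first failing file
            subprocess.run(command, check=True)
    return commands
```

Calling `process_directory("data", "results")` with the default `dry_run=True` only returns the commands, so they can be inspected before anything is run.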