Lavoisier Documentation
Welcome to the comprehensive documentation for the Lavoisier mass spectrometry analysis framework. Lavoisier is a high-performance computing framework that combines numerical and visual processing methods with integrated artificial intelligence modules for automated compound identification and structural elucidation.
NEW: Precursor Framework
The Precursor module introduces an approach to mass spectrometry analysis built on S-Entropy Coordinates and Virtual Instruments.
Key Innovations
- S-Entropy Coordinates: 3D categorical coordinate system (S_knowledge, S_time, S_entropy) for platform-independent spectral representation
- Ion-to-Droplet Computer Vision: Bijective transformation of mass spectra into thermodynamic droplet images
- Virtual Instrument Ensemble: Hardware-grounded virtual mass spectrometers with phase-lock networks
- Molecular Maxwell Demon: Information-theoretic fragmentation analysis using categorical states
- Categorical Completion: Gap-filling and annotation through S-entropy trajectory analysis
Precursor Pipeline
Spectral Acquisition → S-Entropy Transform → Computer Vision → BMD Grounding → Categorical Completion → Virtual Instruments
Validated on UC Davis Metabolomics Dataset
The Precursor framework has been validated on the UC Davis metabolomics dataset (10 mzML files, ~16,000 spectra total), demonstrating:
- S-Entropy transformation: 800+ spectra/second
- Physics-validated droplet conversion: 50-100 spectra/second
- Cross-platform categorical consistency: Coherence > 0.85
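As a rough back-of-the-envelope check, the throughput figures above can be turned into wall-clock estimates for the ~16,000-spectrum dataset. This is plain arithmetic on the quoted rates, assuming steady single-process throughput and ignoring file I/O; the function name `estimated_seconds` is illustrative and not part of Lavoisier.

```python
def estimated_seconds(n_spectra: int, spectra_per_second: float) -> float:
    """Return the time needed to process n_spectra at a steady rate."""
    if spectra_per_second <= 0:
        raise ValueError("spectra_per_second must be positive")
    return n_spectra / spectra_per_second


N_SPECTRA = 16_000  # approximate total quoted for the UC Davis dataset

# S-Entropy transformation at the quoted lower bound of 800 spectra/second
sentropy_s = estimated_seconds(N_SPECTRA, 800)

# Droplet conversion at the quoted range of 50-100 spectra/second
droplet_slow_s = estimated_seconds(N_SPECTRA, 50)
droplet_fast_s = estimated_seconds(N_SPECTRA, 100)

print(f"S-Entropy transform: at most about {sentropy_s:.0f} s")
print(f"Droplet conversion: about {droplet_fast_s:.0f}-{droplet_slow_s:.0f} s")
```

Under these assumptions the droplet conversion (roughly 160 to 320 seconds) dominates the S-Entropy transform (about 20 seconds).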
Buhera Scripting Language
Lavoisier includes Buhera, a domain-specific scripting language that transforms mass spectrometry analysis by encoding the scientific method as executable scripts.
Buhera Documentation
- Buhera Overview - Complete introduction to the Buhera scripting language
- Language Reference - Comprehensive syntax and semantics reference
- Integration Guide - Detailed guide to Buhera-Lavoisier integration
- Tutorials - Step-by-step tutorials from beginner to advanced
- Script Examples - Practical examples for various applications
Key Buhera Features
- Objective-First Analysis: Scripts declare explicit scientific goals before execution
- Pre-flight Validation: Catch experimental flaws before wasting time and resources
- Goal-Directed AI: Bayesian evidence networks optimized for specific objectives
- Scientific Rigor: Enforced statistical requirements and biological coherence
Core Framework
System Architecture & Installation
- Architecture Overview - System design and component relationships
- Installation Guide - Setup instructions and requirements
- Performance Benchmarks - System performance characteristics
AI Modules & Intelligence
- AI Modules Overview - Comprehensive guide to all AI modules
- Specialized Intelligence - Domain-specific AI capabilities
- HuggingFace Integration - Machine learning model integration
- Embodied Understanding - 3D molecular reconstruction validation
Analysis Pipelines
- Numerical Analysis - Mathematical foundations and algorithms
- Visual Processing - Computer vision and image analysis
- Results & Validation - Analysis outputs and validation metrics
Development & Integration
- Implementation Roadmap - Development planning and milestones
- Rust Integration - High-performance Rust components
- Python Integration - Python module organization
- Autobahn Integration - Probabilistic reasoning integration
Quick Start Guide
1. Precursor Analysis (NEW!)
from precursor.src.core.SpectraReader import extract_mzml
from precursor.src.core.EntropyTransformation import SEntropyTransformer
from precursor.src.core.IonToDropletConverter import IonToDropletConverter
# Load data
scan_info, spectra, xic = extract_mzml("your_data.mzML")
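# Transform to S-Entropy coordinates
# NOTE: mz and intensity are the m/z and intensity arrays of a single spectrum
# taken from `spectra`; how to index `spectra` is not shown in this guide.
transformer = SEntropyTransformer()
coords, matrix = transformer.transform_spectrum(mz, intensity)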
# Generate droplet images with physics validation
converter = IonToDropletConverter(resolution=(512, 512))
image, droplets = converter.convert_spectrum_to_image(mz, intensity)
# Access categorical coordinates
for droplet in droplets:
    s_k = droplet.s_entropy_coords.s_knowledge  # Structural knowledge
    s_t = droplet.s_entropy_coords.s_time       # Temporal position
    s_e = droplet.s_entropy_coords.s_entropy    # Thermodynamic entropy
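One way to work with the per-droplet coordinates is to collect them into plain tuples for tabulation or plotting. The sketch below relies only on the three attribute names used in the loop above; `collect_coords` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins with made-up values so the snippet runs without Lavoisier installed.

```python
from types import SimpleNamespace


def collect_coords(droplets):
    """Return a list of (s_knowledge, s_time, s_entropy) tuples, one per droplet."""
    rows = []
    for droplet in droplets:
        c = droplet.s_entropy_coords
        rows.append((c.s_knowledge, c.s_time, c.s_entropy))
    return rows


# Stand-in droplets with made-up values, for illustration only.
fake_droplets = [
    SimpleNamespace(s_entropy_coords=SimpleNamespace(
        s_knowledge=0.2, s_time=0.5, s_entropy=0.9)),
    SimpleNamespace(s_entropy_coords=SimpleNamespace(
        s_knowledge=0.4, s_time=0.1, s_entropy=0.3)),
]

rows = collect_coords(fake_droplets)
print(rows)
```

With real data, the `droplets` list returned by `convert_spectrum_to_image` would presumably be passed in place of `fake_droplets`.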
2. Virtual Instrument Ensemble
from precursor.src.virtual import VirtualMassSpecEnsemble
# Create ensemble with all instruments
ensemble = VirtualMassSpecEnsemble(
    enable_all_instruments=True,
    enable_hardware_grounding=True
)
# Measure with cross-platform consensus
# (mz and intensity as in the previous example; rt is the spectrum's retention time)
result = ensemble.measure_spectrum(mz, intensity, rt)
print(f"Phase-locks: {result.total_phase_locks}")
print(f"Convergence: {result.convergence_nodes_count}")
3. Complete Pipeline
cd precursor
# Run complete analysis on UC Davis dataset
python run_ucdavis_complete_analysis.py
# Or resume from Stage 2B (faster)
python run_ucdavis_resume.py
4. Buhera Script Analysis
# Build Buhera language
cd lavoisier-buhera && cargo build --release
# Create and execute a script
buhera validate biomarker_discovery.bh
buhera execute biomarker_discovery.bh
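The validate-then-execute sequence above can also be driven from Python. The following is a minimal sketch that uses only the two subcommands shown; it assumes, without confirmation from the Buhera documentation, that `buhera validate` exits with a non-zero status when validation fails. The helper names `buhera_commands` and `run_script` are hypothetical.

```python
import subprocess


def buhera_commands(script_path: str) -> list[list[str]]:
    """Build the validate and execute invocations for one script."""
    return [
        ["buhera", "validate", script_path],
        ["buhera", "execute", script_path],
    ]


def run_script(script_path: str, runner=subprocess.run) -> bool:
    """Run validation first; execute only if validation exits with code 0.

    Assumption: a failed validation is signalled by a non-zero exit code.
    """
    validate_cmd, execute_cmd = buhera_commands(script_path)
    if runner(validate_cmd).returncode != 0:
        return False  # skip execution when validation fails
    return runner(execute_cmd).returncode == 0
```

Calling `run_script("biomarker_discovery.bh")` requires the `buhera` binary to be on `PATH`. The `runner` parameter exists so the gating logic can be exercised without it.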
Pipeline Results Structure
After running Precursor analysis:
results/
├── ucdavis_complete_analysis/
│   ├── {file_name}/
│   │   ├── stage_01_preprocessing/
│   │   │   ├── scan_info.csv
│   │   │   └── spectra/
│   │   ├── stage_02_sentropy/
│   │   │   ├── sentropy_features.csv
│   │   │   └── matrices/
│   │   ├── stage_02_cv/
│   │   │   ├── images/      # Droplet images
│   │   │   └── droplets/    # Physics-validated data
│   │   ├── stage_02_5_fragmentation/
│   │   ├── stage_03_bmd/
│   │   ├── stage_04_completion/
│   │   └── stage_05_virtual/
│   └── analysis_summary.csv
└── visualizations/
    ├── entropy_space/
    ├── molecular_language/
    └── phase_lock/
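To check which stages have produced output for each input file, the layout above can be walked with the standard library. This is a minimal sketch that assumes the directory names shown in the tree; `list_stage_dirs` is a hypothetical helper, not part of Lavoisier.

```python
from pathlib import Path


def list_stage_dirs(results_root):
    """Map each per-file directory to its alphabetically sorted stage_* subdirectories."""
    analysis_dir = Path(results_root) / "ucdavis_complete_analysis"
    if not analysis_dir.is_dir():
        return {}  # nothing has been written yet
    stages = {}
    for file_dir in sorted(p for p in analysis_dir.iterdir() if p.is_dir()):
        stages[file_dir.name] = sorted(
            d.name for d in file_dir.iterdir()
            if d.is_dir() and d.name.startswith("stage_")
        )
    return stages


# Prints nothing if the results directory does not exist yet.
for file_name, stage_names in list_stage_dirs("results").items():
    print(file_name, stage_names)
```

Note that the names are sorted alphabetically, which is not exactly pipeline order: `stage_02_5_fragmentation` sorts before `stage_02_cv` and `stage_02_sentropy`.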
Use Cases
Scientific Research
- Metabolomics: S-Entropy coordinate analysis for metabolite identification
- Proteomics: Fragmentation pattern analysis with categorical completion
- Biomarker Discovery: Virtual instrument consensus for robust markers
- Cross-Platform Studies: Platform-independent categorical representation
Computer Vision
- Ion-to-Droplet Conversion: Thermodynamic image generation
- Physics Validation: Navier-Stokes constrained droplet parameters
- Multi-Modal Analysis: Spectral + visual feature fusion
Virtual Instruments
- Ensemble Consensus: Multi-instrument agreement scoring
- Hardware Grounding: Reality validation through oscillation harvesting
- Phase-Lock Networks: Molecular ensemble detection
Publications
The framework is documented in several publications under precursor/publication/:
- S-Entropy Coordinates: Categorical coordinate system for mass spectrometry
- Ion-to-Droplet Computer Vision: Bijective thermodynamic image generation
- Virtual Instruments: Hardware-grounded virtual mass spectrometers
- Molecular Language: Categorical amino acid alphabet and fragmentation grammar
Contributing
We welcome contributions to:
- Precursor Framework: S-Entropy, virtual instruments, computer vision
- Buhera Language: Rust-based language implementation
- Documentation: Tutorials, examples, and best practices
- Validation: Test cases and benchmarking datasets
See our implementation roadmap for current development priorities.
Community
- GitHub: lavoisier
- Issues: Report bugs and request features
- Discussions: Share use cases and get help
License
Lavoisier is released under the MIT License. See LICENSE file for details.
“Only the extraordinary can beget the extraordinary” - Antoine Lavoisier
Transform your mass spectrometry analysis with S-Entropy coordinates and virtual instruments.