Buhera Language Reference
This document provides a comprehensive reference for the Buhera scripting language syntax, semantics, and built-in functions.
Language Grammar
Lexical Elements
Keywords
objective, validate, phase, import, if, else, abort, warn, check_*
Identifiers
[a-zA-Z_][a-zA-Z0-9_]*
Literals
- String literals: "string content"
- Numeric literals: 123, 45.67, 1e-6
- Boolean literals: true, false
- Array literals: ["item1", "item2", "item3"]
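The literal forms above map directly onto host-language values. The following Python sketch is illustrative only and is not Buhera's lexer. It assumes that string and array literals follow JSON quoting rules (this reference documents no escape syntax) and that plain digit runs should stay integers.

```python
import json
import re

# Numeric forms shown above: 123, 45.67, 1e-6 (no sign, no unit suffix).
_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def parse_literal(text: str):
    """Convert the source text of one literal into a Python value (sketch)."""
    text = text.strip()
    if text in ("true", "false"):
        return text == "true"
    if _NUMBER.fullmatch(text):
        # Plain digit runs become int; decimals and exponents become float.
        return int(text) if text.isdigit() else float(text)
    if text.startswith(('"', "[")):
        # Assumption: JSON quoting. strict=False permits raw newlines, so
        # multi-line strings (see Data Types) parse as well.
        return json.loads(text, strict=False)
    raise ValueError(f"not a literal: {text!r}")


print(parse_literal('["item1", "item2", "item3"]'))  # ['item1', 'item2', 'item3']
```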
Operators
=, ==, !=, <, >, <=, >=, AND, OR, NOT
Delimiters
:, ;, ,, (, ), [, ], {, }
Grammar Rules
Script Structure
script ::= import_list objective validation_list phase_list
import_list ::= import_statement*
import_statement ::= "import" module_path
objective ::= "objective" identifier ":" objective_fields
validation_list ::= validation_rule*
validation_rule ::= "validate" identifier ":" validation_body
phase_list ::= phase_definition*
phase_definition ::= "phase" identifier ":" phase_body
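The script production fixes the order of top-level blocks: any imports, exactly one objective, then validation rules, then phases. A minimal Python sketch of that ordering check follows. It assumes, beyond what the grammar states, that block headers start in column 0 and block bodies are indented.

```python
import re

# One letter per top-level block: i=import, o=objective, v=validate, p=phase.
_KIND = {"import": "i", "objective": "o", "validate": "v", "phase": "p"}
# script ::= import_list objective validation_list phase_list
_ORDER = re.compile(r"i*ov*p*")


def top_level_order_ok(source: str) -> bool:
    """True if top-level blocks appear as imports, one objective, validations, phases."""
    kinds = []
    for line in source.splitlines():
        if not line.strip() or line[0].isspace():
            continue  # blank lines and indented block bodies are skipped
        keyword = line.split(maxsplit=1)[0]
        if keyword in _KIND:
            kinds.append(_KIND[keyword])
    return _ORDER.fullmatch("".join(kinds)) is not None
```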
Objective Definition
objective_fields ::= objective_field+
objective_field ::= field_name ":" string_literal
field_name ::= "target" | "success_criteria" | "evidence_priorities"
| "biological_constraints" | "statistical_requirements"
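The objective grammar accepts only the five listed field names, each bound to a string literal. A Python sketch of that rule; it is illustrative only, receives the field lines already split, and uses a regular expression as a stand-in for a real parser.

```python
import re

ALLOWED_FIELDS = {
    "target", "success_criteria", "evidence_priorities",
    "biological_constraints", "statistical_requirements",
}
# field_name ":" string_literal, e.g.  target: "identify biomarkers"
_FIELD = re.compile(r'\s*([a-z_]+)\s*:\s*"([^"]*)"\s*')


def parse_objective_fields(lines):
    """Return {field_name: value}; reject unknown names and empty objectives."""
    fields = {}
    for line in lines:
        match = _FIELD.fullmatch(line)
        if match is None:
            raise SyntaxError(f'expected field_name: "...", got {line!r}')
        name, value = match.groups()
        if name not in ALLOWED_FIELDS:
            raise SyntaxError(f"unknown objective field: {name}")
        fields[name] = value
    if not fields:
        # objective_fields ::= objective_field+  (at least one field)
        raise SyntaxError("an objective needs at least one field")
    return fields
```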
Validation Rules
validation_body ::= statement_list
statement_list ::= statement+
statement ::= assignment | function_call | conditional | action
action ::= abort_statement | warn_statement
abort_statement ::= "abort" "(" string_literal ")"
warn_statement ::= "warn" "(" string_literal ")"
Phase Definitions
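phase_body ::= statement_list
statement ::= assignment | function_call | conditional
In phase bodies, statement has no action alternative: abort and warn are valid only inside validate blocks.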
assignment ::= identifier "=" expression
function_call ::= module_path "." function_name "(" argument_list ")"
conditional ::= "if" condition ":" statement_list ["else" ":" statement_list]
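The function_call production qualifies each call with a module path, which the last dot separates from the function name. A small Python sketch of that split (illustrative, not Buhera's parser). Note that built-ins such as load_dataset are written elsewhere in this reference without a module path, so a real parser presumably handles them separately.

```python
def split_call_target(target: str) -> tuple:
    """Split 'module.path.function' at the last dot into (module_path, function_name)."""
    module_path, dot, function_name = target.rpartition(".")
    if not dot or not module_path or not function_name:
        raise ValueError(f"expected module_path.function_name, got {target!r}")
    return module_path, function_name


print(split_call_target("lavoisier.mzekezeke.build_evidence_network"))
# ('lavoisier.mzekezeke', 'build_evidence_network')
```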
Built-in Functions
Validation Functions
check_instrument_capability
Validates that the instrument can achieve the required analytical performance.
Usage:
validate InstrumentCheck:
check_instrument_capability
if target_concentration < instrument_detection_limit:
abort("Instrument cannot detect target concentrations")
Checks:
- Detection limits vs. target concentrations
- Mass accuracy requirements
- Chromatographic resolution needs
- Scan rate compatibility
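Two of these checks reduce to simple arithmetic. The Python sketch below compares the target concentration with the detection limit and computes the resolving power R = m/Δm needed to separate two neighbouring ions, taking m as their mean m/z (one of several conventions in use). The function names and arguments are hypothetical; the mass-accuracy and scan-rate checks are omitted.

```python
def required_resolving_power(mz_a: float, mz_b: float) -> float:
    """R = m / delta_m needed to separate two ions, using their mean m/z as m."""
    delta = abs(mz_a - mz_b)
    if delta == 0:
        raise ValueError("identical m/z values cannot be resolved")
    return (mz_a + mz_b) / 2 / delta


def instrument_capability_problems(target_concentration, detection_limit,
                                   closest_pair, resolving_power):
    """List the failing checks; an empty list means all passed.

    Concentrations must share units; closest_pair is the two nearest m/z values
    that have to be distinguished (hypothetical inputs).
    """
    problems = []
    if target_concentration < detection_limit:
        problems.append("Instrument cannot detect target concentrations")
    needed = required_resolving_power(*closest_pair)
    if resolving_power < needed:
        problems.append(
            f"Resolving power {resolving_power:.0f} is below the required {needed:.0f}")
    return problems
```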
check_sample_size
Validates statistical power based on sample size.
Usage:
validate StatisticalPower:
check_sample_size
if sample_size < required_minimum:
warn("Sample size may be insufficient for robust analysis")
Parameters:
- effect_size: Expected effect size
- alpha_level: Significance level (default: 0.05)
- power_requirement: Required statistical power (default: 0.8)
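How check_sample_size derives required_minimum is not documented here. A standard choice for a two-group comparison is the normal-approximation formula n = 2((z_{1-α/2} + z_{power}) / d)² per group, where d is a standardized effect size. The Python sketch below assumes that formula and maps effect_size, alpha_level and power_requirement onto it.

```python
import math
from statistics import NormalDist


def required_sample_size(effect_size: float, alpha_level: float = 0.05,
                         power_requirement: float = 0.8) -> int:
    """Per-group n for a two-sided, two-sample comparison (normal approximation)."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha_level / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power_requirement)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(required_sample_size(0.5))  # 63
```

Under this assumption, a script with sample_size below required_sample_size(effect_size) would reach the warn branch.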
check_pathway_consistency
Validates biological coherence of expected metabolic changes.
Usage:
validate BiologicalCoherence:
check_pathway_consistency
if expected_pathway_disruption AND missing_key_metabolites:
warn("Missing expected pathway disruption markers")
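One way to implement the missing-key-metabolites test is a set difference between each expected pathway's marker metabolites and the metabolites actually detected. In this Python sketch the PATHWAY_MARKERS table is hypothetical and far smaller than any real pathway database.

```python
# Hypothetical marker table; the pathway knowledge base Buhera uses is not described here.
PATHWAY_MARKERS = {
    "glycolysis": {"glucose", "pyruvate", "lactate"},
    "tca_cycle": {"citrate", "succinate", "malate"},
}


def missing_pathway_markers(expected_pathways, detected_metabolites):
    """Map each expected pathway to its marker metabolites that were not detected."""
    detected = set(detected_metabolites)
    missing = {}
    for pathway in expected_pathways:
        absent = PATHWAY_MARKERS.get(pathway, set()) - detected
        if absent:
            missing[pathway] = sorted(absent)
    return missing
```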
Data Loading Functions
load_dataset
Loads mass spectrometry data with metadata.
Syntax:
dataset = load_dataset(
file_path: "path/to/data.mzML",
metadata: "path/to/metadata.csv",
groups: ["control", "treatment"],
focus: "objective_context"
)
Parameters:
- file_path: Path to the mzML file
- metadata: Optional metadata file
- groups: Sample groups for comparison
- focus: Objective context for optimized loading
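Reading mzML requires a dedicated parser and is out of scope here, but the groups parameter implies selecting samples by group from the metadata file. A Python sketch using the standard csv module; the column names sample_id and group are hypothetical.

```python
import csv
import io


def samples_by_group(metadata_csv: str, groups):
    """Map each requested group to its sample IDs, skipping all other groups.

    Assumes hypothetical 'sample_id' and 'group' columns.
    """
    by_group = {group: [] for group in groups}
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        if row["group"] in by_group:
            by_group[row["group"]].append(row["sample_id"])
    return by_group
```

With a file on disk, the text would come from something like open("path/to/metadata.csv").read().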
Lavoisier Integration Functions
lavoisier.mzekezeke.build_evidence_network
Builds an objective-focused Bayesian evidence network.
Syntax:
evidence_network = lavoisier.mzekezeke.build_evidence_network(
data: dataset,
objective: "research_objective",
evidence_types: ["mass_match", "ms2_fragmentation"],
pathway_focus: ["glycolysis", "tca_cycle"],
evidence_weights: {"pathway_membership": 1.3}
)
Parameters:
- data: Input dataset
- objective: Research objective string
- evidence_types: Types of evidence to collect
- pathway_focus: Biological pathways to prioritize
- evidence_weights: Custom evidence weighting
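Mzekezeke's internal model is not specified in this reference. One common way to combine independent evidence in a Bayesian setting is to add log-likelihood ratios to the prior log-odds. The Python sketch below does that and treats evidence_weights as multipliers on each log-likelihood ratio, which is an assumption about what a weight of 1.3 means. It is a swapped-in naive-Bayes-style combination, not the library's algorithm.

```python
import math


def combine_evidence(prior, likelihood_ratios, evidence_weights=None):
    """Posterior probability from a prior and per-evidence likelihood ratios.

    Each weight scales that evidence type's log-likelihood ratio; unlisted types use 1.0.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    weights = evidence_weights or {}
    log_odds = math.log(prior / (1 - prior))
    for evidence_type, ratio in likelihood_ratios.items():
        if ratio <= 0:
            raise ValueError(f"likelihood ratio for {evidence_type} must be positive")
        log_odds += weights.get(evidence_type, 1.0) * math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))
```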
lavoisier.hatata.validate_with_objective
Validates analysis results against objective criteria.
Syntax:
annotations = lavoisier.hatata.validate_with_objective(
evidence_network: evidence_network,
objective: "research_objective",
confidence_threshold: 0.85,
success_criteria: {"sensitivity": 0.85}
)
Parameters:
- evidence_network: Evidence network to validate
- objective: Research objective
- confidence_threshold: Minimum confidence threshold
- success_criteria: Success criteria dictionary
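The validation described above reduces to two comparisons: annotation confidence against confidence_threshold, and observed metrics against success_criteria. A Python sketch, assuming each success criterion is a minimum value; check_against_objective and its arguments are hypothetical and are not Hatata's API.

```python
def check_against_objective(annotations, confidence_threshold, success_criteria, metrics):
    """Return (accepted annotations, unmet criteria).

    annotations: list of (name, confidence) pairs.
    success_criteria, metrics: dicts such as {"sensitivity": 0.85}.
    unmet maps each failing criterion to (observed value or None, required minimum).
    """
    accepted = [(name, conf) for name, conf in annotations if conf >= confidence_threshold]
    unmet = {
        key: (metrics.get(key), minimum)
        for key, minimum in success_criteria.items()
        if metrics.get(key) is None or metrics[key] < minimum
    }
    return accepted, unmet
```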
lavoisier.zengeza.noise_reduction
Performs objective-aware noise reduction that preserves patterns relevant to the objective.
Syntax:
clean_data = lavoisier.zengeza.noise_reduction(
data: raw_data,
objective_context: "biomarker_discovery",
preserve_patterns: ["glucose_pathway", "lipid_metabolism"]
)
Parameters:
- data: Raw data to process
- objective_context: Objective context for preservation
- preserve_patterns: Patterns to preserve during cleaning
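How Zengeza decides what counts as noise is not described here. As a stand-in, this Python sketch uses a simple median-based intensity threshold and always keeps peaks near protected m/z values. preserve_mz is a hypothetical, already-resolved form of preserve_patterns, and noise_factor and tolerance are illustrative defaults.

```python
from statistics import median


def reduce_noise(peaks, preserve_mz=(), noise_factor=3.0, tolerance=0.01):
    """Keep peaks at or above noise_factor x median intensity, plus protected m/z values.

    peaks: list of (mz, intensity) pairs; preserve_mz: m/z values never dropped.
    """
    if not peaks:
        return []
    threshold = noise_factor * median(intensity for _, intensity in peaks)

    def protected(mz):
        return any(abs(mz - target) <= tolerance for target in preserve_mz)

    return [(mz, intensity) for mz, intensity in peaks
            if intensity >= threshold or protected(mz)]
```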
Data Types
Basic Types
String
name = "diabetes_biomarker_discovery"
description = "Multi-line string content
can span multiple lines"
Number
threshold = 0.85
sample_size = 100
mass_accuracy = 1e-6 // Scientific notation
Boolean
validation_passed = true
analysis_complete = false
Array
evidence_types = ["mass_match", "ms2_fragmentation", "pathway_membership"]
pathway_focus = ["glycolysis", "gluconeogenesis", "tca_cycle"]
Complex Types
Dataset
Represents loaded mass spectrometry data.
Properties:
- spectra_count: Number of spectra
- mass_range: Mass range coverage
- retention_time_range: Chromatographic range
- metadata: Associated metadata
EvidenceNetwork
Represents a Bayesian evidence network.
Properties:
- confidence: Overall confidence score
- evidence_scores: Individual evidence scores
- objective: Associated objective
- recommendations: Analysis recommendations
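The two complex types can be pictured as plain records. A Python sketch using dataclasses; the field types and units (for example, minutes for retention time) are assumptions, and strongest_evidence is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    spectra_count: int
    mass_range: tuple             # (lowest m/z, highest m/z)
    retention_time_range: tuple   # (start, end); minutes assumed
    metadata: dict = field(default_factory=dict)


@dataclass
class EvidenceNetwork:
    confidence: float
    evidence_scores: dict         # evidence type -> score
    objective: str
    recommendations: list = field(default_factory=list)

    def strongest_evidence(self) -> str:
        """Evidence type with the highest individual score."""
        return max(self.evidence_scores, key=self.evidence_scores.get)
```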
Control Flow
Conditional Statements
Basic If Statement
if condition:
statement_list
If-Else Statement
if condition:
statement_list
else:
alternative_statement_list
Complex Conditions
if sample_size >= 30 AND effect_size > 0.5:
proceed_with_analysis()
else:
recommend_sample_size_increase()
Logical Operators
AND Operator
if sensitivity >= 0.85 AND specificity >= 0.85:
validation_passed = true
OR Operator
if high_confidence_ms1 OR high_confidence_ms2:
accept_annotation()
NOT Operator
if NOT pathway_coherence_check:
warn("Pathway coherence validation failed")
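This reference does not state operator precedence. The Python sketch below evaluates conditions assuming the common convention NOT > AND > OR, with comparisons binding tightest and parentheses for grouping. It is a hypothetical recursive-descent evaluator, not the Buhera runtime.

```python
import operator
import re

# Two-character comparisons before one-character ones, then parentheses,
# numbers, and words (identifiers, true/false, AND/OR/NOT).
_TOKEN = re.compile(
    r"\s*(<=|>=|==|!=|<|>|\(|\)"
    r"|[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
    r"|[A-Za-z_][A-Za-z0-9_]*)"
)
_COMPARE = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
            ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def evaluate(condition: str, env: dict) -> bool:
    """Evaluate a condition, assuming precedence NOT > AND > OR."""
    text, pos, tokens = condition.rstrip(), 0, []
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    tokens.append(None)  # end-of-input marker
    i = 0

    def peek():
        return tokens[i]

    def take():
        nonlocal i
        i += 1
        return tokens[i - 1]

    def operand():
        tok = take()
        if tok == "(":
            value = or_expr()
            if take() != ")":
                raise SyntaxError("missing ')'")
            return value
        if tok in ("true", "false"):
            return tok == "true"
        if tok is not None and tok[0].isdigit():
            return float(tok)
        if tok in env:
            return env[tok]
        raise NameError(f"undefined name: {tok!r}")

    def comparison():
        left = operand()
        if peek() in _COMPARE:
            return _COMPARE[take()](left, operand())
        return bool(left)

    def not_expr():
        if peek() == "NOT":
            take()
            return not not_expr()
        return comparison()

    def and_expr():
        value = not_expr()
        while peek() == "AND":
            take()
            right = not_expr()  # always parsed, so every token is consumed
            value = value and right
        return value

    def or_expr():
        value = and_expr()
        while peek() == "OR":
            take()
            right = and_expr()
            value = value or right
        return value

    result = or_expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()!r}")
    return result
```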
Comparison Operators
// Equality
if confidence == 1.0:
perfect_match()
// Inequality
if error_rate != 0.0:
investigate_errors()
// Relational
if sample_size > minimum_required:
sufficient_power = true
if mass_error <= 5e-6: // 5 ppm, expressed as a fraction
acceptable_accuracy = true
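A mass-error threshold of 5 ppm corresponds to a relative error of 5e-6, the same value as MAX_MASS_ERROR under Naming Conventions. A Python sketch of the conversion; relative_mass_error and to_ppm are hypothetical helpers.

```python
def relative_mass_error(observed_mz: float, theoretical_mz: float) -> float:
    """Absolute mass error as a fraction of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz


def to_ppm(fraction: float) -> float:
    """Express a fractional error in parts per million."""
    return fraction * 1e6


MAX_MASS_ERROR = 5e-6  # 5 ppm as a fraction

error = relative_mass_error(180.0005, 180.0)  # roughly 2.8 ppm
acceptable_accuracy = error <= MAX_MASS_ERROR
```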
Comments
Single-line Comments
// This is a single-line comment
objective BiomarkerDiscovery: // End-of-line comment
target: "identify biomarkers"
Multi-line Comments
/*
* This is a multi-line comment
* Used for detailed explanations
* of complex analysis logic
*/
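Comment removal has to skip // and /* sequences that appear inside string literals, such as a URL in a target field. A Python sketch, assuming strings contain no escaped quotes (no escape syntax is documented here) and that /* */ comments do not nest; documentation comments are handled as ordinary /* */ comments.

```python
def strip_comments(source: str) -> str:
    """Remove // and /* */ comments while leaving "..." string contents intact."""
    out = []
    i, n = 0, len(source)
    in_string = False
    while i < n:
        ch = source[i]
        if in_string:
            out.append(ch)
            in_string = ch != '"'  # a closing quote ends the string
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif source.startswith("//", i):
            end = source.find("\n", i)
            i = n if end == -1 else end  # keep the newline itself
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("unterminated /* comment")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```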
Documentation Comments
/**
* This phase builds an evidence network optimized for biomarker discovery.
* It prioritizes pathway membership evidence because biological relevance
* is critical for clinical biomarker validation.
*/
phase EvidenceBuilding:
evidence_network = lavoisier.mzekezeke.build_evidence_network(
objective: "biomarker_discovery"
)
Error Handling
Validation Errors
validate InstrumentCapability:
if target_concentration < detection_limit:
abort("Instrument cannot detect target concentrations")
// Script execution stops here
Warnings
validate SampleSize:
if sample_size < optimal_size:
warn("Sample size below optimal threshold")
// Script continues with warning
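The difference between abort and warn can be modelled as an exception versus a collected message. This Python sketch assumes, since the reference does not say, that warnings are gathered and returned once validation finishes; ScriptAborted, ValidationContext and run_validations are hypothetical names.

```python
class ScriptAborted(Exception):
    """Raised by abort(): execution stops at this point."""


class ValidationContext:
    """Collects warnings while validation rules run."""

    def __init__(self):
        self.warnings = []

    def warn(self, message: str) -> None:
        self.warnings.append(message)  # record it and keep going

    def abort(self, message: str) -> None:
        raise ScriptAborted(message)  # nothing after this call runs


def run_validations(rules):
    """Run each rule(ctx) in order and return the warnings, unless one aborts."""
    ctx = ValidationContext()
    for rule in rules:
        rule(ctx)
    return ctx.warnings
```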
Runtime Errors
Runtime errors are handled by the Buhera runtime and Lavoisier integration:
- File not found: Data files missing
- Integration errors: Lavoisier module communication failures
- Analysis failures: Statistical or scientific validation failures
Reserved Words
The following identifiers are reserved and cannot be used as variable names:
objective, validate, phase, import, if, else, abort, warn, true, false,
AND, OR, NOT, check_instrument_capability, check_sample_size,
check_pathway_consistency, load_dataset
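Combining the identifier pattern from Lexical Elements with this list gives a simple name check. A Python sketch, assuming names are case-sensitive and, following the check_* entry in the keyword list, treating every name with a check_ prefix as reserved.

```python
import re

IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
RESERVED = {
    "objective", "validate", "phase", "import", "if", "else", "abort", "warn",
    "true", "false", "AND", "OR", "NOT", "check_instrument_capability",
    "check_sample_size", "check_pathway_consistency", "load_dataset",
}


def is_valid_variable_name(name: str) -> bool:
    """True if name matches the identifier pattern and is not reserved."""
    return (IDENTIFIER.fullmatch(name) is not None
            and name not in RESERVED
            and not name.startswith("check_"))  # assumption: whole check_* family reserved
```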
Naming Conventions
Objectives
Use PascalCase for objective names:
objective DiabetesBiomarkerDiscovery:
objective DrugMetabolismCharacterization:
Variables
Use snake_case for variable names:
evidence_network = build_network()
sample_metadata = load_metadata()
Functions
Use snake_case for function names:
build_evidence_network()
validate_with_objective()
Constants
Use UPPER_SNAKE_CASE for constants:
MIN_SAMPLE_SIZE = 30
MAX_MASS_ERROR = 5e-6
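These conventions can be expressed as regular expressions, for example in a linter. A Python sketch; the exact patterns (such as whether digits are allowed) are assumptions.

```python
import re

CONVENTIONS = {
    "objective": re.compile(r"[A-Z][a-zA-Z0-9]*"),             # PascalCase
    "variable": re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*"),    # snake_case
    "function": re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*"),    # snake_case
    "constant": re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*"),    # UPPER_SNAKE_CASE
}


def follows_convention(kind: str, name: str) -> bool:
    """True if name matches the naming convention for kind."""
    return CONVENTIONS[kind].fullmatch(name) is not None
```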
Best Practices
1. Objective-First Design
Always start with a clear, measurable objective:
// Good: Specific and measurable
objective DiabetesBiomarkerDiscovery:
target: "identify metabolites predictive of diabetes progression"
success_criteria: "sensitivity >= 0.85 AND specificity >= 0.85"
// Bad: Vague and unmeasurable
objective GeneralAnalysis:
target: "analyze some data"
success_criteria: "get results"
2. Comprehensive Validation
Include validation for all critical assumptions:
// Validate instrument capabilities
validate InstrumentCapability:
check_instrument_capability
// Validate statistical power
validate StatisticalPower:
check_sample_size
// Validate biological coherence
validate BiologicalCoherence:
check_pathway_consistency
3. Document Scientific Reasoning
Explain why specific choices were made:
phase EvidenceBuilding:
// Prioritize pathway membership for biomarker discovery
// because biological relevance is critical for clinical translation
evidence_network = lavoisier.mzekezeke.build_evidence_network(
evidence_weights: {"pathway_membership": 1.3}
)
4. Use Meaningful Names
Choose descriptive names that reflect scientific intent:
// Good: Descriptive and scientific
diabetes_progression_markers = load_dataset("diabetes_cohort.mzML")
glycolysis_focused_network = build_evidence_network(
pathway_focus: ["glycolysis"]
)
// Bad: Generic and unclear
data = load_dataset("file.mzML")
result = build_network()
This reference provides the foundation for writing effective Buhera scripts that encode scientific reasoning as executable, validatable programs.