Heihachi Neural Audio Analysis Framework

A revolutionary audio analysis framework featuring fire-based emotional querying, Autobahn biological intelligence integration, and Rust-powered performance, combining neurological models with consciousness-aware audio generation.

"What makes a tiger so strong is that it lacks humanity"
Heihachi Framework Capabilities

🔥 Revolutionary Fire-Based Emotion Interface

Groundbreaking fire-based emotional querying system that taps into humanity's deepest cognitive patterns. Users create and manipulate digital fire through an intuitive WebGL interface, which the system "understands" using advanced AI reconstruction techniques.

The Science Behind Fire-Emotion Mapping

The interface draws on research into human consciousness and fire recognition: fire represents humanity's first and most fundamental abstraction, deeply embedded in our neural architecture, and fire recognition activates the same neural networks as consciousness itself.

  • 🔥 Digital Fire Creation: Intuitive WebGL interface for creating and manipulating fire with real-time physics simulation
  • 🧠 Pakati Understanding Engine: AI system that "learns" fire patterns by reconstructing them from partial information
  • 🎵 Direct Audio Generation: Converting understood fire patterns into music that matches the emotional content

How It Works

  1. Fire Creation: Users interact with the WebGL interface to create, maintain, and modify digital fire
  2. Pattern Capture: The system captures fire characteristics (intensity, color, movement, structure)
  3. AI Understanding: Pakati reconstructs the fire from partial information to prove comprehension
  4. Audio Generation: Understood patterns drive Heihachi's audio synthesis engines
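
As a sketch of this data flow, the captured fire characteristics can be treated as a small feature record that is later mapped to synthesis parameters. The FirePattern fields and mapping weights below are illustrative placeholders, not the framework's actual schema:

from dataclasses import dataclass

@dataclass
class FirePattern:
    """Illustrative container for captured fire characteristics (step 2)."""
    intensity: float    # 0.0-1.0 overall flame energy
    color_temp: float   # 0.0 (deep red) to 1.0 (blue-white)
    movement: float     # 0.0 (still) to 1.0 (chaotic flicker)
    structure: float    # 0.0 (diffuse) to 1.0 (tightly shaped)

def fire_to_audio_params(p: FirePattern) -> dict:
    """Map fire characteristics to example synthesis parameters (step 4)."""
    return {
        "energy": p.intensity,
        "brightness": 0.5 * p.color_temp + 0.5 * p.intensity,
        "rhythmic_density": p.movement,
        "reverb_amount": 1.0 - p.structure,
    }

params = fire_to_audio_params(FirePattern(intensity=0.8, color_temp=0.6, movement=0.7, structure=0.4))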

🧠 Autobahn Integration: Delegated Probabilistic Reasoning

A revolutionary delegation architecture in which all probabilistic tasks, Bayesian inference, biological intelligence, and consciousness modeling are handed off to the Autobahn oscillatory bio-metabolic RAG system.

Autobahn System Overview

  • 🧬 12 Theoretical Frameworks: Including fire-evolved consciousness substrate and biological intelligence architectures
  • ⚡ Oscillatory Bio-Metabolic Processing: 3-layer architecture with ATP-driven metabolic computation
  • 🧘 Consciousness Emergence Modeling: Real-time IIT Φ calculation for consciousness quantification
  • 📊 Advanced Uncertainty Quantification: Sophisticated Bayesian inference and fuzzy logic processing

Integration Benefits

Performance Optimization

  • Fire Pattern Analysis: <10ms (Autobahn oscillatory processing)
  • Audio Optimization: <20ms (Autobahn Bayesian inference)
  • Consciousness Calculation: <15ms (Autobahn IIT Φ)
  • End-to-End Latency: <50ms (Rust + Autobahn delegation)

Scientific Foundation

  • Biological Intelligence: Membrane processing with ion channel coherence
  • Consciousness Modeling: IIT-based Φ calculation for awareness quantification
  • Metabolic Computation: ATP-driven processing with multiple metabolic modes
  • Uncertainty Handling: Explicit modeling in all probabilistic operations

Delegation Architecture

Heihachi → Autobahn → Audio Output

  • Heihachi: Fire Interface, Pakati Engine, Rust Audio Core
  • Autobahn: Fire Pattern Analysis, Consciousness Modeling (IIT Φ), Bayesian Optimization
  • Audio Output: Optimized Audio Generation, Consciousness-Informed Synthesis, Real-time Performance
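
In code, the delegation boundary can be as thin as an HTTP call from the Heihachi front end to a running Autobahn service. The endpoint path, payload fields, and response keys below are hypothetical placeholders for whatever interface a given Autobahn deployment actually exposes:

import requests

AUTOBAHN_URL = "http://localhost:8080"  # hypothetical address of the Autobahn service

def delegate_fire_analysis(fire_pattern: dict, timeout: float = 0.05) -> dict:
    """Send a captured fire pattern to Autobahn and return its probabilistic analysis.

    The /analyze route and the response fields are assumptions for illustration;
    the timeout is chosen to respect the <50ms end-to-end latency target.
    """
    response = requests.post(
        f"{AUTOBAHN_URL}/analyze",
        json={"fire_pattern": fire_pattern},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"phi": ..., "audio_params": ...}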

FeltBeats: Music Discovery by Feeling

Transforming Heihachi into a revolutionary music listening application where users discover music by describing emotions and feelings - powered by academic research and continuous learning.

Discover Music by Feeling

Instead of searching by genre or artist, describe how you want to feel: "I want something dark and atmospheric with building tension" or "Find me energetic tracks with complex rhythms and heavy bass."

"I want to feel mysterious and anticipatory"
→ Atmospheric intro sections with filtered breaks and sparse percussion
"Find energetic sections with aggressive basslines"
→ Peak moments with layered percussion and bass stacking
"Something technical but spacious"
→ Complex drum patterns with reverb-heavy atmospheric elements

Dual LLM Architecture

Academic Knowledge LLM

Trained on ~100 scientific publications covering music perception, emotion, and drum & bass production. Provides deep theoretical understanding of how music affects emotions and neural processing.

Scientific Foundation, Music Perception, Emotional Response

Continuous Learning LLM

Builds domain expertise by continuously analyzing new mixes. Each analysis becomes training data, creating an ever-growing understanding of electronic music patterns and emotional characteristics.

Adaptive Learning, Mix Analysis, Pattern Recognition

REST API

Comprehensive REST API for integrating Heihachi's audio analysis capabilities into web applications, mobile apps, and other systems. Supports both synchronous and asynchronous processing.

  • 🚀 Fast & Scalable: Asynchronous job processing with configurable concurrency limits and rate limiting
  • 🔧 Easy Integration: RESTful endpoints with comprehensive documentation and client examples
  • 🎯 Specialized Analysis: Dedicated endpoints for beats, drums, stems, emotions, and semantic search

Quick Start

# Install API dependencies
pip install flask flask-cors flask-limiter

# Start the API server
python api_server.py --host 0.0.0.0 --port 5000

# Or with production settings
python api_server.py --production --config-path configs/production.yaml

# Analyze audio file with emotion mapping
curl -X POST http://localhost:5000/api/v1/semantic/analyze \
  -F "file=@track.wav" \
  -F "include_emotions=true" \
  -F "index_for_search=true"

# Extract beats from audio
curl -X POST http://localhost:5000/api/v1/beats \
  -F "file=@track.mp3"

# Search indexed tracks semantically
curl -X POST http://localhost:5000/api/v1/semantic/search \
  -H "Content-Type: application/json" \
  -d '{"query": "dark aggressive neurofunk", "top_k": 5}'

import requests

# Extract emotional features
def extract_emotions(file_path):
    url = "http://localhost:5000/api/v1/semantic/emotions"
    with open(file_path, 'rb') as f:
        files = {'file': f}
        response = requests.post(url, files=files)
        return response.json()

# Example usage
emotions = extract_emotions("track.wav")
print(f"Dominant emotion: {emotions['summary']['dominant_emotion']}")
print(f"Energy level: {emotions['emotions']['energy']:.1f}/10")

// Analyze audio file
async function analyzeAudio(file) {
    const formData = new FormData();
    formData.append('file', file);
    formData.append('include_emotions', 'true');
    
    const response = await fetch('/api/v1/semantic/analyze', {
        method: 'POST',
        body: formData
    });
    
    return await response.json();
}

// Usage with file input
const fileInput = document.getElementById('audio-file');
fileInput.addEventListener('change', async (e) => {
    const result = await analyzeAudio(e.target.files[0]);
    console.log('Analysis result:', result);
});

Available Endpoints

Audio Analysis

  • POST /api/v1/analyze: Full audio analysis pipeline
  • POST /api/v1/features: Extract audio features
  • POST /api/v1/beats: Beat and tempo detection
  • POST /api/v1/drums: Drum pattern analysis
  • POST /api/v1/stems: Audio stem separation

Semantic Analysis

  • POST /api/v1/semantic/analyze: Semantic analysis with emotions
  • POST /api/v1/semantic/emotions: Extract emotional features
  • POST /api/v1/semantic/search: Semantic track search
  • POST /api/v1/semantic/text-analysis: Text sentiment analysis

Job Management

  • POST /api/v1/batch-analyze: Batch process multiple files
  • GET /api/v1/jobs/{id}: Get job status and results
  • GET /api/v1/jobs: List all jobs (paginated)
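
For long-running work, batch jobs can be submitted and then polled via the job endpoints above. The snippet below is a minimal sketch; the multipart field name, job_id key, and status values are assumptions about the response payload rather than documented fields:

import time
import requests

BASE = "http://localhost:5000/api/v1"

def submit_batch(paths):
    """Submit several audio files for batch analysis and return the job id."""
    files = [("files", open(p, "rb")) for p in paths]  # assumed field name
    try:
        resp = requests.post(f"{BASE}/batch-analyze", files=files)
        resp.raise_for_status()
        return resp.json()["job_id"]  # assumed response key
    finally:
        for _, fh in files:
            fh.close()

def wait_for_job(job_id, poll_interval=2.0):
    """Poll GET /jobs/{id} until the job finishes, then return the payload."""
    while True:
        job = requests.get(f"{BASE}/jobs/{job_id}").json()
        if job.get("status") in ("completed", "failed"):  # assumed status values
            return job
        time.sleep(poll_interval)

result = wait_for_job(submit_batch(["track1.wav", "track2.wav"]))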

Analysis Results Visualization

The API returns detailed analysis data that can be visualized to understand track characteristics and emotional profiles. These visualizations help developers integrate meaningful insights into their applications.

Drum Element Distribution

Shows the proportion of different drum elements, contributing to groove and rhythm characteristics.

Velocity-Confidence Analysis

Correlates drum hit confidence with velocity, indicating playing dynamics and energy levels.

Semantic Analysis

Transform raw audio features into meaningful emotional dimensions and enable intelligent music discovery through semantic search and natural language queries.

Emotional Feature Mapping

Heihachi maps technical audio features to 9 distinct emotional dimensions using scientifically grounded algorithms that correlate spectral, rhythmic, and temporal characteristics with human emotional perception.

  • Energy (8.5): Loudness, tempo, and drum intensity
  • Brightness (6.5): Spectral centroid and high-frequency content
  • Tension (7.5): Dissonance and rhythmic complexity
  • Warmth (4.5): Low-mid energy and harmonic richness
  • Groove (9.0): Microtiming and syncopation quality
  • Aggression (8.0): Transient sharpness and distortion
  • Atmosphere (7.0): Reverb amount and stereo width
  • Melancholy (3.5): Minor key and sparse arrangement
  • Euphoria (5.5): Major key and uplifting progressions
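
As an illustration of how such a mapping can work, here is a toy scoring function for the Energy dimension. The weights and normalisation ranges are placeholders rather than Heihachi's calibrated coefficients:

import numpy as np

def energy_score(rms_loudness, tempo_bpm, drum_onset_rate):
    """Toy 0-10 energy score combining loudness, tempo, and drum intensity."""
    loudness = np.clip(rms_loudness, 0.0, 1.0)              # normalised RMS level
    tempo = np.clip((tempo_bpm - 60.0) / 120.0, 0.0, 1.0)   # map 60-180 BPM to 0-1
    drums = np.clip(drum_onset_rate / 10.0, 0.0, 1.0)       # drum onsets per second
    return 10.0 * (0.4 * loudness + 0.3 * tempo + 0.3 * drums)

print(round(energy_score(0.8, 174, 9.0), 1))  # dense neurofunk section -> high score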

Technical Foundation

Semantic analysis builds upon detailed drum pattern analysis and feature extraction to create comprehensive emotional profiles. The system processes complex rhythmic patterns and translates them into meaningful emotional dimensions.

Drum Pattern Analysis Heatmap

Drum pattern analysis visualization showing the temporal distribution of different drum elements, which feeds into the emotional mapping algorithms to determine groove, energy, and tension characteristics.

Drum Density Over Time

Drum density analysis over time - high-density regions contribute to energy and aggression scores, while sparse sections indicate atmospheric or melancholic characteristics.

Command Line Integration

Semantic analysis capabilities are fully integrated into the Heihachi CLI for streamlined workflows.

Extract emotions: python -m src.main semantic emotions track.wav
Index for search: python -m src.main semantic index audio_folder/ --artist "Artist" --title "Track"
Search tracks: python -m src.main semantic search "atmospheric intro with tension building"
View statistics: python -m src.main semantic stats

Emotional Analysis Output

Transform raw audio analysis into structured, emotion-focused data that powers feeling-based music discovery and LLM training.

Analysis Structure

mix_analysis/
  mix_001/
    metadata.json              # Basic mix info
    summary.txt                # Human-readable summary
    segments.json              # Track segments with timestamps
    emotional_profile.json     # Emotional characteristics
    technical_features.jsonl   # LLM-friendly features

emotional_profile.json

{
  "overall_mood": ["dark", "energetic", "technical"],
  "intensity_curve": [0.4, 0.5, 0.7, 0.8, 0.75, 0.9, 0.85, 0.7],
  "emotional_segments": [
    {
      "start_time": 0,
      "end_time": 390.0,
      "primary_emotion": "atmospheric",
      "tension_level": 0.4,
      "descriptors": ["spacious", "anticipatory", "mysterious"]
    }
  ],
  "peak_moments": [
    {
      "time": 870.5,
      "intensity": 0.92,
      "description": "Maximum energy with layered percussion and aggressive bassline",
      "key_elements": ["double_drops", "bass_stacking", "drum_fills"]
    }
  ]
}
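
Since the profile is plain JSON, downstream tools can consume it directly. A minimal example that loads the file and prints the peak moments (assuming the directory layout shown above):

import json
from pathlib import Path

profile = json.loads(Path("mix_analysis/mix_001/emotional_profile.json").read_text())

for peak in profile["peak_moments"]:
    minutes, seconds = divmod(int(peak["time"]), 60)
    print(f"{minutes}:{seconds:02d}  intensity {peak['intensity']:.2f}  {peak['description']}")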

segments.json

[
  {
    "segment_id": "s001",
    "start_time": 0,
    "end_time": 198.5,
    "type": "intro",
    "energy_level": 0.45,
    "key_elements": ["atmospheric_pads", "filtered_breaks", "sparse_percussion"],
    "description": "Atmospheric intro with filtered breaks and sparse percussion"
  },
  {
    "segment_id": "s002", 
    "start_time": 198.5,
    "end_time": 390.0,
    "type": "build",
    "energy_level": 0.68,
    "key_elements": ["rolling_bassline", "amen_break", "rising_synths"],
    "description": "Energy building section with rolling bassline and classic amen breaks"
  }
]

technical_features.jsonl

{"time": 0, "feature_type": "bass", "description": "Sub-heavy reese bass with moderate distortion and 120Hz fundamental", "characteristics": {"distortion": 0.35, "width": 0.7, "sub_weight": 0.8}}
{"time": 0, "feature_type": "drums", "description": "Broken beat pattern with ghost notes and 16th hi-hats", "characteristics": {"complexity": 0.65, "velocity_variation": 0.4, "swing": 0.2}}
{"time": 0, "feature_type": "atmosphere", "description": "Reverb-heavy pads with 6-8kHz air frequencies", "characteristics": {"reverb_size": 0.85, "density": 0.3, "brightness": 0.5}}
{"time": 198.5, "feature_type": "transition", "description": "Filter sweep transition with drum roll buildup", "characteristics": {"length_bars": 8, "smoothness": 0.7, "energy_change": 0.25}}

summary.txt

This 60-minute neurofunk mix features 24 tracks with consistent energy throughout. 
The mix begins with atmospheric elements at 174 BPM before transitioning to 
heavier sections at 6:30. Notable sections include an extended bass sequence 
from 18:20-22:45 featuring time-stretched Amen breaks and layered Reese basses. 
The final third introduces more percussive elements with complex drum patterns 
and syncopated rhythms. Energy peaks occur at 14:30, 28:15, and 52:40.

Overview

  • 🧠 Neural Foundation: Built upon established neuroscientific research on rhythm processing and motor-auditory coupling
  • 🎵 Genre Specialization: Optimized for electronic music analysis with a focus on neurofunk and drum & bass
  • ⚡ High Performance: Memory-optimized processing, parallel execution, and GPU acceleration
  • 🤖 AI Integration: HuggingFace models for advanced feature extraction and neural processing

Theoretical Foundation

Neural Basis of Rhythm Processing

The framework is built upon established neuroscientific research demonstrating that humans possess an inherent ability to synchronize motor responses with external rhythmic stimuli. This phenomenon, known as beat-based timing, involves complex interactions between auditory and motor systems in the brain.

Key Neural Mechanisms

  • Beat-based Timing Networks: Basal ganglia-thalamocortical circuits, supplementary motor area (SMA), premotor cortex (PMC)
  • Temporal Processing Systems: Duration-based timing mechanisms, beat-based timing mechanisms, motor-auditory feedback loops

Motor-Auditory Coupling

Research has shown that low-frequency neural oscillations from motor planning areas guide auditory sampling, expressed through coherence measures:

$$C_{xy}(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f)S_{yy}(f)}$$

Where:

  • $C_{xy}(f)$ represents coherence at frequency $f$
  • $S_{xy}(f)$ is the cross-spectral density
  • $S_{xx}(f)$ and $S_{yy}(f)$ are auto-spectral densities
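
For reference, the same coherence measure can be estimated from two recorded signals with Welch's method. The snippet below uses scipy on synthetic data and is purely illustrative; it is not part of the Heihachi API:

import numpy as np
from scipy.signal import coherence

fs = 1000.0                          # sample rate in Hz
t = np.arange(0, 10, 1 / fs)
motor = np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.random.randn(t.size)
audio = np.sin(2 * np.pi * 2.0 * t + 0.3) + 0.5 * np.random.randn(t.size)

# C_xy(f) = |S_xy(f)|^2 / (S_xx(f) S_yy(f)), estimated with Welch periodograms
f, Cxy = coherence(motor, audio, fs=fs, nperseg=1024)
print(f"Peak coherence {Cxy.max():.2f} at {f[Cxy.argmax()]:.2f} Hz")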

Mathematical Framework

Spectral Decomposition

$$X(k) = \sum_{n=0}^{N-1} x(n)e^{-j2\pi kn/N}$$

Groove Pattern Analysis

$$MT(n) = \frac{1}{K}\sum_{k=1}^{K} |t_k(n) - t_{ref}(n)|$$

Amen Break Detection

$$S_{amen}(t) = \sum_{f} w(f)|X(f,t) - A(f)|^2$$

Reese Bass Analysis

$$R(t,f) = \left|\sum_{k=1}^{K} A_k(t)e^{j\phi_k(t)}\right|^2$$
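
As a worked example of the groove formula above, the microtiming deviation MT(n) is simply the mean absolute offset between detected onsets and their reference grid positions. A toy computation (not Heihachi's internal implementation):

import numpy as np

def microtiming_deviation(onsets, grid):
    """MT = (1/K) * sum_k |t_k - t_ref|, the mean absolute deviation from the grid."""
    return float(np.mean(np.abs(np.asarray(onsets) - np.asarray(grid))))

# Hi-hat onsets (seconds) versus a 16th-note grid at 174 BPM
sixteenth = 60.0 / 174.0 / 4.0
grid = np.arange(8) * sixteenth
played = grid + np.array([0.0, 4e-3, -2e-3, 6e-3, 1e-3, 5e-3, -3e-3, 2e-3])
print(f"Average microtiming deviation: {microtiming_deviation(played, grid) * 1000:.1f} ms")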

Core Features

Rhythmic Analysis

  • Automated drum pattern recognition
  • Groove quantification
  • Microtiming analysis
  • Syncopation detection

Spectral Analysis

  • Multi-band decomposition
  • Harmonic tracking
  • Timbral feature extraction
  • Sub-bass characterization

Component Analysis

  • Sound source separation
  • Transformation detection
  • Energy distribution analysis
  • Component relationship mapping

Detailed Capabilities Breakdown

Comprehensive breakdown of Heihachi's multi-dimensional analysis capabilities

Amen Break Analysis

  • Pattern matching and variation detection
  • Transformation identification
  • Groove characteristic extraction
  • VIP/Dubplate classification
  • Robust onset envelope extraction

Prior Subspace Analysis

  • Neurofunk-specific component separation
  • Bass sound design analysis
  • Effect chain detection
  • Temporal structure analysis

Composite Similarity

  • Multi-band similarity computation
  • Transformation-aware comparison
  • Groove-based alignment
  • Confidence scoring

Peak Detection

  • Multi-band onset detection
  • Adaptive thresholding
  • Feature-based peak classification
  • Confidence scoring

Segment Clustering

  • Pattern-based segmentation
  • Hierarchical clustering
  • Relationship analysis
  • Transition detection

Transition Detection

  • Mix point identification
  • Blend type classification
  • Energy flow analysis
  • Structure boundary detection

Memory Management

  • Streaming processing for large files
  • Efficient cache utilization
  • GPU memory optimization
  • Automatic garbage collection

Parallel Processing

  • Multi-threaded feature extraction
  • Batch processing capabilities
  • Distributed analysis support
  • Adaptive resource allocation

Storage Efficiency

  • Compressed result storage
  • Metadata indexing
  • Version control for analysis results
  • Scalable parallel execution
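
The streaming approach mentioned under Memory Management can be illustrated with a block-wise pass over a long mix. This sketch uses the soundfile library as an assumption for I/O; it is not the framework's actual streaming pipeline:

import numpy as np
import soundfile as sf

def stream_rms(path, blocksize=262144):
    """Yield per-block RMS levels without loading the whole file into memory."""
    for block in sf.blocks(path, blocksize=blocksize, dtype="float32"):
        mono = block.mean(axis=1) if block.ndim > 1 else block
        yield float(np.sqrt(np.mean(mono ** 2)))

levels = list(stream_rms("long_mix.wav"))  # coarse energy envelope of the mix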

AI Model Integration

Heihachi integrates specialized HuggingFace models for advanced neural audio processing, each selected for electronic music analysis tasks.

Core Feature Extraction

  • Microsoft BEATs (High Priority): Bidirectional ViT-style encoder trained with acoustic tokenizers, providing 768-d latent embeddings at ~20 ms hop length (Spectral Analysis, Temporal Analysis)
  • OpenAI Whisper (High Priority): Trained on >5M hours of audio; the encoder provides 1280-d features tracking energy, voicing, and language (Robust Features, Energy Tracking)

Rhythm & Beat Analysis

  • Beat-Transformer (High Priority): Dilated self-attention encoder with F-measure ~0.86 for beat and downbeat detection (Beat Detection, Downbeat Detection)
  • BEAST (Medium Priority): 50 ms latency with causal attention; ideal for real-time DJ analysis (Real-time, Low Latency)

Audio Separation & Component Analysis

  • Demucs v4 (High Priority): Returns 4-stem or 6-stem tensors (drums, bass, vocals, other) for component-level analysis (Stem Separation, Component Analysis)

Multimodal & Similarity

  • LAION CLAP (Medium Priority): Query with free text and compute cosine similarity on 512-d embeddings (Multimodal, Text-Audio)
  • UniMus OpenJMLA (Medium Priority): Score arbitrary tag strings for effect-chain heuristics (Zero-shot Tagging)

Usage Example

from heihachi.huggingface import FeatureExtractor, StemSeparator, BeatDetector

# Extract features
extractor = FeatureExtractor(model="microsoft/BEATs-base")
features = extractor.extract(audio_path="track.mp3")

# Separate stems
separator = StemSeparator()
stems = separator.separate(audio_path="track.mp3")
drums = stems["drums"]
bass = stems["bass"]

# Detect beats
detector = BeatDetector()
beats = detector.detect(audio_path="track.mp3", visualize=True, output_path="beats.png")
print(f"Tempo: {beats['tempo']} BPM")

Academic Knowledge Pipeline

Extract, process, and structure knowledge from ~100 scientific publications on music perception, emotion, and drum & bass production to build a comprehensive academic knowledge base.

Processing Pipeline

  1. 📄 PDF Extraction: Extract structured text from academic PDFs with layout preservation and section detection
  2. 🧠 Knowledge Extraction: Use LLMs to extract concepts, findings, and relationships from research papers
  3. 🔗 Knowledge Graph: Build an interconnected knowledge base linking concepts across papers
  4. ⚡ LLM Training: Generate training examples and fine-tune models for music expertise

Extracted Knowledge Types

Concepts

Key concepts related to music perception, emotion, and production techniques

Example: "Beat-based timing networks involve basal ganglia-thalamocortical circuits that enable synchronization with rhythmic stimuli"

Findings

Research conclusions and evidence about music and emotional responses

Example: "Low-frequency neural oscillations from motor planning areas guide auditory sampling (Chen et al., 2008)"

Relationships

Connections between concepts across different research domains

Example: "Motor-auditory coupling → enables → rhythm perception"

Implementation Overview

from pathlib import Path

class AcademicKnowledgeProcessor:
    def process_papers(self, papers_directory):
        """Extract and structure knowledge from academic PDFs"""
        processed_papers = []
        
        for pdf_file in sorted(Path(papers_directory).glob("*.pdf")):
            # Extract structured text with section preservation
            sections = self.extract_structured_text(pdf_file)
            metadata = self.extract_paper_metadata(pdf_file)
            
            # Use LLM to extract knowledge
            concepts = self.extract_concepts(sections)
            findings = self.extract_findings(sections)
            relationships = self.extract_relationships(concepts)
            
            processed_papers.append({
                "metadata": metadata,
                "concepts": concepts,
                "findings": findings,
                "relationships": relationships
            })
        
        return self.create_knowledge_base(processed_papers)
    
    def generate_training_examples(self, knowledge_base):
        """Create LLM training examples from extracted knowledge"""
        examples = []
        
        # Concept explanation examples
        for concept in knowledge_base['concepts']:
            examples.append({
                "input": f"What is {concept['name']} in music perception?",
                "output": concept['explanation']
            })
        
        # Application examples
        for concept in knowledge_base['concepts']:
            examples.append({
                "input": f"How can I apply {concept['name']} in drum and bass production?",
                "output": self.generate_application_example(concept)
            })
        
        return examples

Experimental Results

Demonstration of Heihachi's capabilities through comprehensive analysis of a 33-minute electronic music mix, showcasing advanced drum pattern recognition and temporal structure analysis.

  • Drum hits detected: 91,179
  • Analysis duration: 33 min
  • Drum categories: 5
  • Average confidence score: 0.385

Drum Hit Analysis

Advanced multi-stage analysis employing onset detection, neural network classification, confidence scoring, and temporal pattern recognition identified 91,179 percussion events across five primary categories.

  • Drum Hit Types Distribution: Distribution of the 91,179 detected drum hits by type
  • Drum Hit Types Bar Chart: Comparative analysis of drum type frequencies
  • Drum Hits Timeline: Temporal distribution of drum events throughout the mix
  • Confidence vs Velocity Analysis: Relationship between detection confidence and velocity
  • Drum Density Analysis: Rhythmic density patterns across the entire mix
  • Drum Pattern Heatmap: Heatmap visualization of drum pattern intensity

Classification Performance

  • Toms: 0.385 confidence
  • Snares: 0.381 confidence
  • Kicks: 0.370 confidence
  • Cymbals: 0.284 confidence
  • Hi-hats: 0.223 confidence

Key Findings

Microtiming Variations

Subtle deviations from the quantized grid, detected particularly in hi-hats and snares, contribute to human feel

Structural Markers

Clear delineation of musical sections through changes in drum event density and type distribution

Layering Techniques

Overlapping drum hits at key points create impact moments through stacked percussion events

Rhythmic Motifs

Recurring patterns serve as stylistic identifiers throughout the mix structure

Documentation

Installation

Quick Install (Recommended)

# Clone the repository
git clone https://github.com/fullscreen-triangle/heihachi.git
cd heihachi

# Run the setup script
python scripts/setup.py

Manual Installation

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

Installation Options

  • --dev: Install development dependencies
  • --no-gpu: Skip GPU acceleration dependencies
  • --no-interactive: Skip interactive mode dependencies
  • --shell-completion: Install shell completion scripts

Quick Usage

# Process a single audio file
heihachi process audio.wav --output results/

# Extract emotional features from audio
python -m src.main semantic emotions track.wav

# Start the REST API server
python api_server.py --host 0.0.0.0 --port 5000

# Index tracks for semantic search
python -m src.main semantic index audio_dir/ --artist "Artist"

# Search indexed tracks semantically
python -m src.main semantic search "dark atmospheric neurofunk"

# Extract features using AI models
heihachi hf extract audio.wav --model microsoft/BEATs-base