Kwasa-Kwasa Domain Expansions
This document explains how to use the new domain expansions in Kwasa-Kwasa, which extend the framework beyond text processing to handle genomic data and pattern-based meaning extraction.
Overview
The core philosophy of Kwasa-Kwasa is to work with arbitrarily defined units and their relationships, regardless of their semantic meaning to humans. The domain expansions take this concept further by applying the same powerful abstractions to:
- Genomic Sequence Analysis - Treating DNA/RNA sequences as units that can be manipulated with the same operators as text
- Pattern-Based Meaning Extraction - Finding meaning in fundamental patterns of symbols, independent of their conventional semantic meaning
Getting Started with Genomic Analysis
Setting Up
First, add the genomic module to your Turbulance imports:
import genomic
Basic Genomic Operations
Creating a Sequence
// Create a new DNA sequence
item dna = genomic.NucleotideSequence.new("ATGCTAGCTAGCTAGCTA", "gene_123")
// Access properties
print("GC content: {:.2f}%".format(dna.gc_content() * 100))
print("Length: {} bp".format(len(dna.content())))
Manipulating Sequences with Mathematical Operators
// Split into fragments using division operator
item motifs = dna / "GCT"
// Combine sequences with multiplication (recombination)
item exon1 = genomic.NucleotideSequence.new("ATGCCC", "exon1")
item exon2 = genomic.NucleotideSequence.new("GGGTGA", "exon2")
item joined = exon1 * exon2
// Concatenate sequences with addition
item concatenated = exon1 + exon2
// Remove pattern with subtraction
item filtered = dna - "GCT"
Using Within Blocks for Genomic Processing
within dna:
// Process the sequence
given contains("ATG"):
print("Found start codon at position {}".format(index_of("ATG")))
given gc_content() > 0.5:
print("High GC content region detected")
Processing Sequences with Different Unit Types
// Convert sequence to codons
within dna as codons:
for each codon:
given codon == "ATG":
print("Start codon found")
given codon in ["TAA", "TAG", "TGA"]:
print("Stop codon found")
Working with Genomic Propositions
proposition GeneRegulation:
motion Activation("Gene X activates Gene Y")
motion Inhibition("Gene Z inhibits Gene X")
within dna:
given contains("TATAAA"):
print("Found TATA box promoter")
ensure_follows("Gene body")
Pipeline Processing for Genomic Data
// Create a genomic analysis pipeline
item result = dna |>
find_open_reading_frames() |>
filter_by_length(min_length=100) |>
translate_to_protein() |>
predict_secondary_structure()
Getting Started with Pattern Analysis
Setting Up
First, add the pattern module to your Turbulance imports:
import pattern
Basic Pattern Operations
Creating Pattern Analyzers
// Create analyzers
item analyzer = pattern.PatternAnalyzer.new()
item ortho_analyzer = pattern.OrthographicAnalyzer.new()
Analyzing Character Distributions
// Analyze n-gram frequencies
item text = "The quick brown fox jumps over the lazy dog"
item trigrams = analyzer.analyze_ngrams(text, 3)
// Calculate entropy
item entropy = analyzer.shannon_entropy(text)
print("Text entropy: {:.2f} bits".format(entropy))
Finding Significant Patterns
// Detect statistically significant patterns
item patterns = analyzer.detect_significant_patterns(text, 2, 5)
for each p in patterns:
print("Pattern '{}' occurs {} times (significance: {:.2f})".format(
p.content(), p.occurrences(), p.significance()
))
Analyzing Visual Patterns
// Generate visual density map
item density_map = ortho_analyzer.visual_density(text, 40)
print("Average density: {:.2f}".format(density_map.average_density()))
// Extract visual rhythm
item rhythm = ortho_analyzer.visual_rhythm(text)
Mathematical Operators for Patterns
// Division: Split text by pattern type
item visual_units = text / "visual_class"
// Multiplication: Combine based on pattern similarity
item combined = text1 * text2
// Addition: Concatenate with pattern-aware joining
item joined = text1 + text2
// Subtraction: Remove common patterns
item uncommon = text - common_patterns
Working with Pattern Propositions
proposition TextPatternAnalysis:
motion FrequencyDistribution("Letter distribution shows specific patterns")
motion VisualDensity("Text has visual density anomalies")
within text:
given entropy > 4.5:
print("High information density detected")
given contains_unusual_patterns():
print("Text contains statistically unusual patterns")
Advanced Usage
Combining Genomic and Pattern Analysis
One powerful aspect of Kwasa-Kwasa is the ability to apply pattern analysis techniques to genomic data:
// Analyze patterns in a genomic sequence
item dna = genomic.NucleotideSequence.new("ATGCTAGCTAGCTAGCTA", "gene_123")
item pattern_analyzer = pattern.PatternAnalyzer.new()
// Find repeating patterns in DNA
item significant_patterns = pattern_analyzer.detect_significant_patterns(dna.content(), 3, 7)
for each p in significant_patterns:
print("DNA pattern '{}' occurs {} times".format(p.content(), p.occurrences()))
Creating Custom Unit Types
You can define your own unit types for specialized analysis:
struct CustomUnit:
content: bytes
metadata: any
funxn new(content, name):
return CustomUnit {
content: content,
metadata: { "name": name }
}
// Implement Unit trait
funxn content(self):
return self.content
funxn display(self):
return string(self.content)
funxn metadata(self):
return self.metadata
Example Projects
Check out the examples directory for complete projects that demonstrate these capabilities:
examples/genomic_analysis.turb
- Demonstrates genomic sequence analysisexamples/pattern_analysis.turb
- Shows pattern-based meaning extractionexamples/combined_analysis.turb
- Combines both approaches
Next Steps
To learn more about these domain expansions, refer to the following documents:
domain_expansion_plan.md
- Detailed implementation plan- API documentation under
docs/api/genomic/
anddocs/api/pattern/
- The source code in
src/genomic/
andsrc/pattern/