Pipeline
VIDEO PROCESSING PIPELINE
Distributed Processing for Sports Video Analysis
CORE FEATURES
Distributed video processing with
intelligent memory management
The pipeline processes videos using distributed computing to extract pose data, generate annotated videos, and train LLMs. It combines Ray for distributed analysis, Dask for parallel processing, and MediaPipe for pose estimation.
SYSTEM ARCHITECTURE
Distributed components working
together for optimal performance
Video Pipeline
MediaPipe Processor
Memory Monitor
LLM Trainer
RayDistributed Analysis
Parallel processing
Distributed analysis of pose data with automatic load balancing and fault tolerance.
DaskFrame Processing
Parallel video frames
Efficient batch processing of video frames with intelligent memory management.
MediaPipePose Estimation
Real-time landmarks
High-accuracy pose estimation with confidence scoring and temporal consistency.
Memory MonitorResource Control
Intelligent limits
Prevents crashes by limiting memory usage to 40% of system RAM with dynamic scaling.
USAGE EXAMPLES
Simple commands for powerful processing
# Process a single video
python pipeline.py --video public/your_video.mp4
# Generate both annotated video and pose model
python pipeline.py --video public/basketball_game.mp4 --sport_type basketball
# Output to custom directory
python pipeline.py --video public/training.mp4 --output results
Single video processing with automatic pose detection and annotation generation.
# Process all videos in a folder
python pipeline.py --input public
# Batch process with custom workers
python pipeline.py --input videos --workers 6 --batch_size 8
# Generate only pose models (no videos)
python pipeline.py --input public --no_video
Batch processing multiple videos with distributed workers and customizable parameters.
# Limit memory usage to 30% of system RAM
python pipeline.py --memory_limit 0.3
# Conservative processing for low-spec machines
python pipeline.py --memory_limit 0.2 --workers 2 --batch_size 3
# High-performance processing
python pipeline.py --memory_limit 0.6 --workers 8 --batch_size 10
Intelligent memory management prevents crashes and optimizes performance for your hardware.
# Train LLM with generated pose data
python pipeline.py --input public --train_llm
# Use OpenAI API for synthetic data generation
python pipeline.py --train_llm --use_openai
# Use Claude API for enhanced analysis
python pipeline.py --train_llm --use_claude
Integrate with LLM training and API services for advanced AI-powered analysis.
COMMAND LINE OPTIONS
Complete parameter reference
-
Input/Output Options
--input, -i: Input folder containing videos (default: public)
--video, -v: Process a single video file
--output, -o: Output folder for processed videos (default: output)
--models, -m: Folder for pose models (default: models)
-
Processing Control
--memory_limit: Memory limit as fraction of total (default: 0.4)
--workers: Number of worker processes (default: auto)
--batch_size: Frames to process per batch (default: 5)
--no_video: Skip generating annotated videos (models only)
-
AI Integration
--train_llm: Train LLM using the generated pose models
--use_openai: Use OpenAI API for synthetic data generation
--use_claude: Use Claude API for synthetic data generation
--sport_type: Type of sport for context (e.g., basketball, soccer)
-
Storage Options
--llm_data, -l: Folder for LLM training data (default: llm_training_data)
--llm_models: Folder for trained LLM models (default: llm_models)
All paths can be absolute or relative to the current working directory.
PROCESSING FLOW
Step-by-step pipeline execution
Video Loading
Batch Processing
Pose Analysis
LLM Training
Step 1Video Loading & Batch Split
Intelligent preprocessing
Videos are loaded and split into manageable batches for distributed processing with memory monitoring.
Step 2MediaPipe Processing
Pose estimation
Batches are processed with MediaPipe via Dask workers for real-time pose landmark detection.
Step 3Ray Analysis
Distributed computation
Pose landmarks are analyzed with Ray for parallel processing and biomechanical calculations.
Step 4Results Combination
Output generation
Results are combined and saved as annotated videos and pose model JSON files.
Step 5LLM Training
AI integration
Pose data is converted to training examples and used for LLM training or synthetic data generation.
"The distributed pipeline processes our training videos 5x faster than traditional methods while maintaining research-grade accuracy."
Coach Martinez
Olympic Training Center
"Memory management is flawless - we can process hours of footage on standard hardware without crashes or performance issues."
Dr. Kim
Sports Science Institute