Lab 4.3: Capstone Project - Autonomous Voice-Controlled Humanoid
Lab Objective
Goal: Integrate Modules 1-4 into a single voice-controlled system that demonstrates the capstone requirements.
Skills: System integration, debugging, performance optimization, documentation.
Time: 20+ hours (intensive capstone project)
Capstone Deliverables
Your final submission must include:
- Code (GitHub repo with documentation)
- Video (5-8 min demonstration)
- Report (8-12 pages technical document)
Part 1: System Integration (8 hours)
1.1: Launch Stack
Create capstone.launch.xml:
```xml
<?xml version="1.0"?>
<launch>
  <!-- Module 2: Gazebo simulation.
       "capstone_bringup" is a placeholder; point the includes at the
       packages that actually own these launch files in your workspace. -->
  <include file="$(find-pkg-share capstone_bringup)/launch/gazebo.launch.xml"/>

  <!-- Module 3: SLAM + Navigation -->
  <include file="$(find-pkg-share capstone_bringup)/launch/slam.launch.xml"/>
  <include file="$(find-pkg-share capstone_bringup)/launch/navigation.launch.xml"/>

  <!-- Module 4: Voice -->
  <node pkg="voice_control" exec="voice_input_node.py" name="voice"/>
  <node pkg="voice_control" exec="voice_vla_integration.py" name="vla"/>
</launch>
```
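With the workspace built and sourced, the whole stack can then be brought up with a single `ros2 launch` invocation, e.g. `ros2 launch capstone_bringup capstone.launch.xml` (substitute your own bringup package name).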
1.2: Create Unified Controller
```python
import whisper  # openai-whisper, used for speech-to-text


class CapstoneHumanoid:
    def __init__(self):
        # One interface object per module
        self.perception = SLAMNav2()          # Module 3: SLAM + Nav2
        self.language = LanguageAgent()       # Module 4: LLM planner
        self.voice = VoiceNode()              # Module 4: microphone input
        self.controller = RobotController()   # Modules 1-2: actuation
        self.stt = whisper.load_model("base")  # speech-to-text model
        self.logger = setup_logging()

    def execute(self):
        """Main capstone loop: listen, then act."""
        while True:
            cmd = self.voice.listen()
            if cmd:
                self.process_voice_command(cmd)

    def process_voice_command(self, voice_input):
        """End-to-end: voice → action."""
        # Transcribe (Whisper)
        text = self.stt.transcribe(voice_input)["text"]
        self.logger.info(f"Heard: {text}")

        # Understand (LLM) given the robot's current world state
        state = self.perception.get_state()
        plan = self.language.parse(text, state)

        # Execute each planned action with closed-loop feedback
        for action in plan['actions']:
            self.controller.execute_with_feedback(action)

        self.speak(f"Completed: {text[:30]}...")
```
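The controller's `execute_with_feedback` is where the troubleshooting advice later in this lab (timeouts, error handling) lands in practice. A minimal sketch, assuming the controller dispatches each action to an action server it can poll; `send_action`, `check_status`, and `cancel` are hypothetical helper names, not a fixed API:

```python
import time


class RobotController:
    ACTION_TIMEOUT_S = 30.0  # assumption: per-action budget; tune per action type

    def execute_with_feedback(self, action):
        """Dispatch one action, then poll until success, failure, or timeout."""
        goal = self.send_action(action)        # hypothetical: forward to Nav2/gripper
        deadline = time.monotonic() + self.ACTION_TIMEOUT_S
        while time.monotonic() < deadline:
            status = self.check_status(goal)   # hypothetical: poll the action server
            if status == "succeeded":
                return
            if status == "aborted":
                raise RuntimeError(f"Action failed: {action}")
            time.sleep(0.2)                    # avoid busy-waiting
        self.cancel(goal)                      # hypothetical: cancel the stale goal
        raise TimeoutError(f"Action timed out: {action}")
```

Raising on failure (rather than returning silently) lets the main loop log the error and move on instead of leaving the robot stuck mid-plan.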
Part 2: Testing 3+ Voice Commands (6 hours)
Test these required commands:
1. "Walk to the kitchen"
→ Uses: SLAM localization + Nav2 navigation
2. "Find and pick up the blue cup"
→ Uses: Object detection + grasping + perception
3. "Bring it back here"
→ Uses: Navigation + grasp execution + return
Additional commands (for demonstrating capability):
- "Look around"
- "Show me what you see"
- "Describe the room"
Part 3: Video Recording (3 hours)
Video Requirements
Format:
- Duration: 5-8 minutes
- Resolution: 1080p (1920x1080)
- Frame rate: 30 fps
- Audio: Clear narration + robot sounds
Content:
1. Introduction (0-1 min)
   - What is this system?
   - Components overview
2. Demo 1 (1-3 min)
   - "Walk to kitchen" command
   - Show SLAM map building
   - Show navigation path
3. Demo 2 (3-5 min)
   - "Find blue cup" command
   - Show object detection
   - Show grasping
4. Demo 3 (5-6 min)
   - "Bring it back" command
   - Return navigation
   - Success confirmation
5. Conclusion (6-8 min)
   - System capabilities
   - Lessons learned
   - Future improvements
Recording Script Example
[Intro]
"This is our voice-controlled humanoid robot.
It integrates ROS 2 communication, Gazebo simulation,
Isaac perception, and LLM-based voice control."
[Demo]
"Robot, walk to the kitchen"
[Wait for execution]
"The robot used SLAM to localize itself, Nav2 to plan a path,
and actuators to walk to the kitchen."
Part 4: Technical Report (3+ hours)
Report Structure (8-12 pages):
1. Executive Summary (1 page)
- Problem statement
- Solution overview
- Key results
2. System Architecture (2 pages)
- Module 1: ROS 2 communication
- Module 2: Gazebo simulation
- Module 3: SLAM and perception
- Module 4: VLA and voice control
- Integration diagram
3. Implementation Details (3 pages)
- Language-to-action mapping
- Voice recognition (Whisper)
- Closed-loop control
- Safety mechanisms
4. Results & Validation (2 pages)
- Command success rates
- Latency measurements
- Perception accuracy
- Tables and graphs
5. Challenges & Solutions (1 page)
- What went wrong?
- How did we fix it?
- Trade-offs made
6. Lessons Learned (1 page)
- Key insights
- Future work
- Recommendations
Part 5: Code Documentation (2 hours)
Requirements:
- Well-commented code
- README with setup instructions
- Architecture documentation
- Deployment guide
Submission Checklist
- Code on GitHub with clean commits
- README with: setup, usage, architecture
- 3+ voice commands working
- Video recorded and uploaded
- Technical report (8-12 pages)
- Grading rubric self-assessment
Evaluation Criteria
Your system will be graded on:
Code (30%):
- Correctness (does it work?)
- Quality (is it well-written?)
- Documentation (is it understandable?)
Video (30%):
- Clarity (can we see what's happening?)
- Completeness (are all 3 commands demonstrated?)
- Production quality
Report (20%):
- Technical depth
- Results and measurements
- Problem analysis
Integration (20%):
- All modules working together
- System stability
- Feature completeness
Success Criteria
Your capstone is successful if:
- ✓ Robot responds to 3+ voice commands
- ✓ Commands execute correctly (>90% success)
- ✓ System runs without crashes for 30+ minutes
- ✓ Latency acceptable (under 10s per command)
- ✓ Code is well-documented
- ✓ Video demonstrates all capabilities
- ✓ Report shows deep technical understanding
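The 30-minute stability criterion can be evidenced with a soak test that keeps the command loop busy and counts failures over time. A minimal sketch reusing the `run_trials` harness from Part 2; the duration and idle interval are assumptions:

```python
import time


def soak_test(robot, duration_s=30 * 60):
    """Run commands continuously for `duration_s` seconds; report failures."""
    failures = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        rates = run_trials(robot, trials=1)  # one pass over all commands
        failures += sum(1 for rate in rates.values() if rate < 1.0)
        time.sleep(5.0)                      # brief idle between passes
    print(f"Soak test finished with {failures} failed command(s)")
    return failures == 0
```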
Troubleshooting
Common issues:
| Problem | Solution |
|---|---|
| Voice latency too high | Use smaller Whisper model |
| Robot crashes | Add error handling, timeouts |
| LLM errors | Improve prompt engineering |
| Perception fails | Add fallback behaviors |
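If voice latency is the bottleneck, measure before switching models. A quick benchmark sketch using the open-source openai-whisper package; `test_command.wav` is a placeholder for any recorded test clip:

```python
import time

import whisper  # pip install openai-whisper

# Compare transcription latency across Whisper model sizes on one clip.
# Smaller models trade some accuracy for substantially lower latency.
for size in ["tiny", "base", "small"]:
    model = whisper.load_model(size)  # downloads weights on first use
    start = time.perf_counter()
    result = model.transcribe("test_command.wav")
    elapsed = time.perf_counter() - start
    print(f"{size:>5}: {elapsed:5.2f}s -> {result['text']!r}")
```

Pick the smallest model whose transcripts are still reliable for your command vocabulary; this is usually the single biggest latency win.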
Timeline
- Week 11: Integrate modules, basic testing
- Week 12: Refine voice commands, record video
- Week 13: Finalize report, submit
Summary
Lab 4.3 accomplishes:
- Complete system integration ✓
- 3+ working voice commands ✓
- Professional video demonstration ✓
- Technical documentation ✓
This is it! Your capstone project demonstrates mastery of all 4 modules.
Navigation
- Previous Lab: Lab 4.2: Voice Commands
- Capstone Guide: Capstone Requirements