Lab 4.3: Capstone Project - Autonomous Voice-Controlled Humanoid
Lab Objective
Goal: Integrate Modules 1-4 into a single voice-controlled system that demonstrates the capstone requirements.
Skills: System integration, debugging, performance optimization, documentation.
Time: 20+ hours (intensive capstone project)
Capstone Deliverables
Your final submission must include:
- Code (GitHub repo with documentation)
- Video (5-8 min demonstration)
- Report (8-12 pages technical document)
Part 1: System Integration (8 hours)
1.1: Launch Stack
Create capstone.launch.xml:
```xml
<?xml version="1.0"?>
<launch>
  <!-- Module 2: Gazebo simulation.
       "capstone_bringup" is a placeholder; point the includes at the
       packages that actually own these launch files in your workspace. -->
  <include file="$(find-pkg-share capstone_bringup)/launch/gazebo.launch.xml"/>

  <!-- Module 3: SLAM + Navigation -->
  <include file="$(find-pkg-share capstone_bringup)/launch/slam.launch.xml"/>
  <include file="$(find-pkg-share capstone_bringup)/launch/navigation.launch.xml"/>

  <!-- Module 4: Voice -->
  <node pkg="voice_control" exec="voice_input_node.py" name="voice"/>
  <node pkg="voice_control" exec="voice_vla_integration.py" name="vla"/>
</launch>
```
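With the workspace built and sourced, the whole stack can then be brought up with a single `ros2 launch` invocation, e.g. `ros2 launch capstone_bringup capstone.launch.xml` (substitute your own bringup package name).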
1.2: Create Unified Controller
```python
import whisper  # openai-whisper, used for speech-to-text


class CapstoneHumanoid:
    def __init__(self):
        # One interface object per module
        self.perception = SLAMNav2()          # Module 3: SLAM + Nav2
        self.language = LanguageAgent()       # Module 4: LLM planner
        self.voice = VoiceNode()              # Module 4: microphone input
        self.controller = RobotController()   # Modules 1-2: actuation
        self.stt = whisper.load_model("base")  # speech-to-text model
        self.logger = setup_logging()

    def execute(self):
        """Main capstone loop: listen, then act."""
        while True:
            cmd = self.voice.listen()
            if cmd:
                self.process_voice_command(cmd)

    def process_voice_command(self, voice_input):
        """End-to-end: voice → action."""
        # Transcribe (Whisper)
        text = self.stt.transcribe(voice_input)["text"]
        self.logger.info(f"Heard: {text}")

        # Understand (LLM) given the robot's current world state
        state = self.perception.get_state()
        plan = self.language.parse(text, state)

        # Execute each planned action with closed-loop feedback
        for action in plan['actions']:
            self.controller.execute_with_feedback(action)

        self.speak(f"Completed: {text[:30]}...")
```
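The controller's `execute_with_feedback` is where the troubleshooting advice later in this lab (timeouts, error handling) lands in practice. A minimal sketch, assuming the controller dispatches each action to an action server it can poll; `send_action`, `check_status`, and `cancel` are hypothetical helper names, not a fixed API:

```python
import time


class RobotController:
    ACTION_TIMEOUT_S = 30.0  # assumption: per-action budget; tune per action type

    def execute_with_feedback(self, action):
        """Dispatch one action, then poll until success, failure, or timeout."""
        goal = self.send_action(action)        # hypothetical: forward to Nav2/gripper
        deadline = time.monotonic() + self.ACTION_TIMEOUT_S
        while time.monotonic() < deadline:
            status = self.check_status(goal)   # hypothetical: poll the action server
            if status == "succeeded":
                return
            if status == "aborted":
                raise RuntimeError(f"Action failed: {action}")
            time.sleep(0.2)                    # avoid busy-waiting
        self.cancel(goal)                      # hypothetical: cancel the stale goal
        raise TimeoutError(f"Action timed out: {action}")
```

Raising on failure (rather than returning silently) lets the main loop log the error and move on instead of leaving the robot stuck mid-plan.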
Part 2: Testing 3+ Voice Commands (6 hours)
Test these required commands:
1. "Walk to the kitchen"
→ Uses: SLAM localization + Nav2 navigation
2. "Find and pick up the blue cup"
→ Uses: Object detection + grasping + perception
3. "Bring it back here"
→ Uses: Navigation + grasp execution + return
Additional commands (for demonstrating capability):
- "Look around"
- "Show me what you see"
- "Describe the room"
Part 3: Video Recording (3 hours)
Video Requirements
Format:
- Duration: 5-8 minutes
- Resolution: 1080p (1920x1080)
- Frame rate: 30 fps
- Audio: Clear narration + robot sounds
Content:
1. Introduction (0-1 min)
   - What is this system?
   - Components overview
2. Demo 1 (1-3 min)
   - "Walk to kitchen" command
   - Show SLAM map building
   - Show navigation path
3. Demo 2 (3-5 min)
   - "Find blue cup" command
   - Show object detection
   - Show grasping
4. Demo 3 (5-6 min)
   - "Bring it back" command
   - Return navigation
   - Success confirmation
5. Conclusion (6-8 min)
   - System capabilities
   - Lessons learned
   - Future improvements
Recording Script Example
[Intro]
"This is our voice-controlled humanoid robot.
It integrates ROS 2 communication, Gazebo simulation,
Isaac perception, and LLM-based voice control."
[Demo]
"Robot, walk to the kitchen"
[Wait for execution]
"The robot used SLAM to localize itself, Nav2 to plan a path,
and actuators to walk to the kitchen."
Part 4: Technical Report (3+ hours)
Report Structure (8-12 pages):
1. Executive Summary (1 page)
- Problem statement
- Solution overview
- Key results
2. System Architecture (2 pages)
- Module 1: ROS 2 communication
- Module 2: Gazebo simulation
- Module 3: SLAM and perception
- Module 4: VLA and voice control
- Integration diagram
3. Implementation Details (3 pages)
- Language-to-action mapping
- Voice recognition (Whisper)
- Closed-loop control
- Safety mechanisms
4. Results & Validation (2 pages)
- Command success rates
- Latency measurements
- Perception accuracy
- Tables and graphs
5. Challenges & Solutions (1 page)
- What went wrong?
- How did we fix it?
- Trade-offs made
6. Lessons Learned (1 page)
- Key insights
- Future work
- Recommendations
Part 5: Code Documentation (2 hours)
Requirements:
- Well-commented code
- README with setup instructions
- Architecture documentation
- Deployment guide
Submission Checklist
- Code on GitHub with clean commits
- README with: setup, usage, architecture
- 3+ voice commands working
- Video recorded and uploaded
- Technical report (8-12 pages)
- Grading rubric self-assessment
Evaluation Criteria
Your system will be graded on:
Code (30%):
- Correctness (does it work?)
- Quality (is it well-written?)
- Documentation (is it understandable?)
Video (30%):
- Clarity (can we see what's happening?)
- Completeness (are all 3 commands demonstrated?)
- Production quality
Report (20%):
- Technical depth
- Results and measurements
- Problem analysis
Integration (20%):
- All modules working together
- System stability
- Feature completeness
Success Criteria
Your capstone is successful if:
- ✓ Robot responds to 3+ voice commands
- ✓ Commands execute correctly (>90% success)
- ✓ System runs without crashes for 30+ minutes
- ✓ Latency acceptable (under 10s per command)
- ✓ Code is well-documented
- ✓ Video demonstrates all capabilities
- ✓ Report shows deep technical understanding
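The 30-minute stability criterion can be evidenced with a soak test that keeps the command loop busy and counts failures over time. A minimal sketch reusing the `run_trials` harness from Part 2; the duration and idle interval are assumptions:

```python
import time


def soak_test(robot, duration_s=30 * 60):
    """Run commands continuously for `duration_s` seconds; report failures."""
    failures = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        rates = run_trials(robot, trials=1)  # one pass over all commands
        failures += sum(1 for rate in rates.values() if rate < 1.0)
        time.sleep(5.0)                      # brief idle between passes
    print(f"Soak test finished with {failures} failed command(s)")
    return failures == 0
```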
Troubleshooting
Common issues:
| Problem | Solution |
|---|---|
| Voice latency too high | Use smaller Whisper model |
| Robot crashes | Add error handling, timeouts |
| LLM errors | Improve prompt engineering |
| Perception fails | Add fallback behaviors |
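If voice latency is the bottleneck, measure before switching models. A quick benchmark sketch using the open-source openai-whisper package; `test_command.wav` is a placeholder for any recorded test clip:

```python
import time

import whisper  # pip install openai-whisper

# Compare transcription latency across Whisper model sizes on one clip.
# Smaller models trade some accuracy for substantially lower latency.
for size in ["tiny", "base", "small"]:
    model = whisper.load_model(size)  # downloads weights on first use
    start = time.perf_counter()
    result = model.transcribe("test_command.wav")
    elapsed = time.perf_counter() - start
    print(f"{size:>5}: {elapsed:5.2f}s -> {result['text']!r}")
```

Pick the smallest model whose transcripts are still reliable for your command vocabulary; this is usually the single biggest latency win.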
Timeline
- Week 11: Integrate modules, basic testing
- Week 12: Refine voice commands, record video
- Week 13: Finalize report, submit
Summary
Lab 4.3 accomplishes:
- Complete system integration ✓
- 3+ working voice commands ✓
- Professional video demonstration ✓
- Technical documentation ✓
This is it! Your capstone project demonstrates mastery of all 4 modules.
Navigation
- Previous Lab: Lab 4.2: Voice Commands
- Capstone Guide: Capstone Requirements