Lab 4.3: Capstone Project - Autonomous Voice-Controlled Humanoid

Lab Objective

Goal: Integrate Modules 1-4 into a single voice-controlled system that demonstrates the capstone requirements.

Skills: System integration, debugging, performance optimization, documentation.

Time: 20+ hours (intensive capstone project)


Capstone Deliverables

Your final submission must include:

  1. Code (GitHub repo with documentation)
  2. Video (5-8 min demonstration)
  3. Report (8-12 pages technical document)

Part 1: System Integration (8 hours)

1.1: Launch Stack

Create capstone.launch.xml:

<?xml version="1.0"?>
<launch>
  <!-- Module 2: Gazebo -->
  <include file="gazebo.launch.xml"/>

  <!-- Module 3: SLAM + Navigation -->
  <include file="slam.launch.xml"/>
  <include file="navigation.launch.xml"/>

  <!-- Module 4: Voice -->
  <node pkg="voice_control" exec="voice_input_node.py" name="voice"/>
  <node pkg="voice_control" exec="voice_vla_integration.py" name="vla"/>
</launch>
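
With the file installed in your package's launch/ directory, the whole stack comes up with a single command (the package name here is a placeholder for your own):

ros2 launch my_capstone_pkg capstone.launch.xml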

1.2: Create Unified Controller

class CapstoneHumanoid:
    def __init__(self):
        # All modules
        self.perception = SLAMNav2()
        self.language = LanguageAgent()
        self.voice = VoiceNode()
        self.controller = RobotController()
        self.logger = setup_logging()

    def execute(self):
        """Main capstone loop"""
        while True:
            cmd = self.voice.listen()
            if cmd:
                self.process_voice_command(cmd)

    def process_voice_command(self, voice_input):
        """End-to-end: voice → action"""
        # Transcribe (Whisper, wrapped by the voice node)
        text = self.voice.transcribe(voice_input)
        self.logger.info(f"Heard: {text}")

        # Understand (LLM)
        state = self.perception.get_state()
        plan = self.language.parse(text, state)

        # Execute (with feedback)
        for action in plan['actions']:
            self.controller.execute_with_feedback(action)

        # TTS confirmation back to the user
        self.speak(f"Completed: {text[:30]}...")

Part 2: Testing 3+ Voice Commands (6 hours)

Test these required commands (a test-harness sketch follows the list):

1. "Walk to the kitchen"
→ Uses: SLAM localization + Nav2 navigation

2. "Find and pick up the blue cup"
→ Uses: Object detection + grasping + perception

3. "Bring it back here"
→ Uses: Navigation + grasp execution + return
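
A scripted harness makes these runs repeatable and produces the success-rate numbers the report asks for. A sketch, assuming you keep one prerecorded audio clip per command (file paths are placeholders) and reuse the CapstoneHumanoid class from Part 1:

# One prerecorded clip per required command (paths are placeholders)
TEST_CLIPS = {
    "Walk to the kitchen": "clips/walk_kitchen.wav",
    "Find and pick up the blue cup": "clips/find_blue_cup.wav",
    "Bring it back here": "clips/bring_back.wav",
}

def run_test_suite(robot, trials=5):
    """Replay each clip through the full pipeline and tally success rates."""
    results = {}
    for cmd, clip in TEST_CLIPS.items():
        passed = 0
        for _ in range(trials):
            try:
                robot.process_voice_command(clip)
                passed += 1
            except Exception as exc:
                robot.logger.warning(f"'{cmd}' failed: {exc}")
        results[cmd] = passed / trials
    return results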

Additional commands (for demonstrating capability):

  • "Look around"
  • "Show me what you see"
  • "Describe the room"

Part 3: Video Recording (3 hours)

Video Requirements

Format:

  • Duration: 5-8 minutes
  • Resolution: 1080p (1920x1080)
  • Frame rate: 30 fps
  • Audio: Clear narration + robot sounds

Content:

  1. Introduction (0-1 min)

    • What is this system?
    • Components overview
  2. Demo 1 (1-3 min)

    • "Walk to kitchen" command
    • Show SLAM map building
    • Show navigation path
  3. Demo 2 (3-5 min)

    • "Find blue cup" command
    • Show object detection
    • Show grasping
  4. Demo 3 (5-6 min)

    • "Bring it back" command
    • Return navigation
    • Success confirmation
  5. Conclusion (6-8 min)

    • System capabilities
    • Lessons learned
    • Future improvements

Recording Script Example

[Intro]
"This is our voice-controlled humanoid robot.
It integrates ROS 2 communication, Gazebo simulation,
Isaac perception, and LLM-based voice control."

[Demo]
"Robot, walk to the kitchen"
[Wait for execution]
"The robot used SLAM to localize itself, Nav2 to plan a path,
and actuators to walk to the kitchen."

Part 4: Technical Report (3+ hours)

Report Structure (8-12 pages):

1. Executive Summary (1 page)
- Problem statement
- Solution overview
- Key results

2. System Architecture (2 pages)
- Module 1: ROS 2 communication
- Module 2: Gazebo simulation
- Module 3: SLAM and perception
- Module 4: VLA and voice control
- Integration diagram

3. Implementation Details (3 pages)
- Language-to-action mapping
- Voice recognition (Whisper)
- Closed-loop control
- Safety mechanisms

4. Results & Validation (2 pages)
- Command success rates
- Latency measurements (see the timing sketch after this outline)
- Perception accuracy
- Tables and graphs

5. Challenges & Solutions (1 page)
- What went wrong?
- How did we fix it?
- Trade-offs

6. Lessons Learned (1 page)
- Key insights
- Future work
- Recommendations
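
For the latency measurements flagged above, a thin timing wrapper around the command pipeline is enough (a sketch reusing CapstoneHumanoid from Part 1):

import time

def timed_command(robot, voice_input):
    """Measure end-to-end latency (voice in → action complete) in seconds."""
    start = time.perf_counter()
    robot.process_voice_command(voice_input)
    latency = time.perf_counter() - start
    robot.logger.info(f"End-to-end latency: {latency:.2f} s")
    return latency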

Part 5: Code Documentation (2 hours)

Requirements:

  • Well-commented code
  • README with setup instructions (see the skeleton below)
  • Architecture documentation
  • Deployment guide
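
One way to lay out the README so it covers all four points (section names are a suggestion, not a required template):

# Voice-Controlled Humanoid (Capstone)
## Overview      - what the system does, which module provides what
## Setup         - ROS 2, Gazebo, and model dependencies; install steps
## Usage         - launch command plus example voice commands
## Architecture  - node graph and data flow between Modules 1-4
## Deployment    - simulation vs. hardware notes, known limitations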

Submission Checklist

  • Code on GitHub with clean commits
  • README with: setup, usage, architecture
  • 3+ voice commands working
  • Video recorded and uploaded
  • Technical report (8-12 pages)
  • Grading rubric self-assessment

Evaluation Criteria

Your system will be graded on:

Code (30%):

  • Correctness (does it work?)
  • Quality (is it well-written?)
  • Documentation (is it understandable?)

Video (30%):

  • Clarity (can we see what's happening?)
  • Completeness (are all 3 commands demonstrated?)
  • Production quality

Report (20%):

  • Technical depth
  • Results and measurements
  • Problem analysis

Integration (20%):

  • All modules working together
  • System stability
  • Feature completeness

Success Criteria

Your capstone is successful if:

✓ Robot responds to 3+ voice commands
✓ Commands execute correctly (>90% success)
✓ System runs without crashes for 30+ minutes
✓ Latency is acceptable (under 10 s per command)
✓ Code is well-documented
✓ Video demonstrates all capabilities
✓ Report shows deep technical understanding


Troubleshooting

Common issues:

Problem → Solution:

  • Voice latency too high → use a smaller Whisper model
  • Robot crashes → add error handling and timeouts
  • LLM errors → improve prompt engineering
  • Perception fails → add fallback behaviors
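
If voice latency is the bottleneck, dropping to a smaller Whisper checkpoint is the quickest fix. A sketch using the openai-whisper package ("tiny" trades accuracy for speed; "command.wav" is a placeholder path):

import whisper

# Checkpoints from fastest to slowest: tiny, base, small, medium, large
model = whisper.load_model("tiny")
result = model.transcribe("command.wav")  # path to the recorded utterance
print(result["text"])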

Timeline

  • Week 11: Integrate modules, basic testing
  • Week 12: Refine voice commands, record video
  • Week 13: Finalize report, submit

Summary

Lab 4.3 accomplishes:

  • Complete system integration ✓
  • 3+ working voice commands ✓
  • Professional video demonstration ✓
  • Technical documentation ✓

This is it! Your capstone project demonstrates mastery of all 4 modules.

