
Example Projects & Reference Implementations

This chapter showcases three complete capstone project examples, one for each hardware path. Each demonstrates how to integrate all four modules into a working voice-controlled robot system.


Example 1: Simulation-Only Path (Gazebo + ROS 2)

Project: "VLA-Bot Simulator" (Score: 4.5/5)

Target Platform: Ubuntu 22.04 + ROS 2 Humble + Gazebo + OpenAI API

What It Does:

  • Loads a humanoid robot in Gazebo
  • Listens for voice input via microphone
  • Uses OpenAI Whisper for ASR
  • Sends commands to ChatGPT for planning
  • Executes robot actions (walk, turn, wave)
  • Uses camera simulation + basic object detection
  • Runs entirely on a laptop (no GPU required, but GPU helpful)

Repository Structure:

vla-bot-simulator/
├── README.md
├── requirements.txt
├── .env.example                      # API keys template
├── launch/
│   └── vla-bot.launch.xml            # Master launch file
├── src/
│   ├── vla_bot/
│   │   ├── __init__.py
│   │   ├── voice_input_node.py       # Whisper integration
│   │   ├── planner_node.py           # LLM planning
│   │   ├── robot_controller_node.py  # ROS 2 cmd_vel publisher
│   │   ├── perception_node.py        # Camera → object detection
│   │   ├── safety_manager_node.py    # Emergency stop
│   │   └── utils.py                  # Helper functions
│   └── urdf/
│       └── humanoid_simple.urdf      # Robot description
├── gazebo_worlds/
│   └── vla_bot_world.sdf             # Simulation environment
├── tests/
│   └── test_integration.py           # Basic tests
└── RESULTS.md                        # Documented results

Key Files Deep-Dive:

voice_input_node.py

import io

import rclpy
from rclpy.node import Node
import speech_recognition as sr
import openai  # legacy (pre-1.0) OpenAI Python client
from std_msgs.msg import String


class VoiceInputNode(Node):
    def __init__(self):
        super().__init__('voice_input_node')
        self.voice_pub = self.create_publisher(String, 'voice_input', 10)
        self.declare_parameter('api_key', '')
        openai.api_key = self.get_parameter('api_key').value
        self.recognizer = sr.Recognizer()
        # Poll the microphone every few seconds.
        self.create_timer(5.0, self.listen)

    def listen(self):
        """Capture voice from the microphone and transcribe it with Whisper."""
        with sr.Microphone() as source:
            audio = self.recognizer.listen(source, timeout=10)

        try:
            # Whisper expects an audio file, so wrap the captured WAV bytes
            # in a named in-memory buffer before calling the API.
            wav_buffer = io.BytesIO(audio.get_wav_data())
            wav_buffer.name = 'speech.wav'
            transcript = openai.Audio.transcribe("whisper-1", wav_buffer)

            msg = String()
            msg.data = transcript['text']
            self.voice_pub.publish(msg)
        except Exception as e:
            self.get_logger().error(f"Transcription failed: {e}")


def main(args=None):
    rclpy.init(args=args)
    node = VoiceInputNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

planner_node.py

import json

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import openai  # legacy (pre-1.0) OpenAI Python client


class PlannerNode(Node):
    def __init__(self):
        super().__init__('planner_node')
        self.action_pub = self.create_publisher(String, 'robot_action', 10)
        self.create_subscription(String, 'voice_input', self.plan_from_voice, 10)

        # "unknown" lets the LLM reject commands the robot cannot perform.
        self.action_schema = {
            "type": "object",
            "properties": {
                "action": {"type": "string",
                           "enum": ["walk", "turn", "wave", "stop", "unknown"]},
                "distance": {"type": "number"},
                "angle": {"type": "number"},
                "duration": {"type": "number"}
            }
        }

    def plan_from_voice(self, msg):
        """Use the LLM to convert a transcribed voice command into a robot action."""
        voice_input = msg.data

        prompt = f"""
Convert this voice command to a robot action:
Command: "{voice_input}"

Output ONLY valid JSON matching this schema:
{json.dumps(self.action_schema)}

Examples:
"walk forward" → {{"action": "walk", "distance": 1.0}}
"turn left 90 degrees" → {{"action": "turn", "angle": 90}}
"wave hand" → {{"action": "wave", "duration": 3}}
"""

        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}]
            )
            action_json = response['choices'][0]['message']['content']
            action = json.loads(action_json)

            action_msg = String()
            action_msg.data = json.dumps(action)
            self.action_pub.publish(action_msg)
        except Exception as e:
            self.get_logger().error(f"Planning failed: {e}")


def main(args=None):
    rclpy.init(args=args)
    node = PlannerNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
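
robot_controller_node.py (sketch)

The repository also lists robot_controller_node.py, the cmd_vel publisher that turns the planner's JSON actions into motion, but it is not reproduced in the deep-dive above. Here is a minimal sketch of what it could look like, assuming simple velocity commands on cmd_vel and the robot_action topic used by the planner; the speed values are illustrative and the "wave" joint motion is omitted:

import json

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist


class RobotControllerNode(Node):
    """Sketch: execute planner actions as cmd_vel commands."""

    def __init__(self):
        super().__init__('robot_controller_node')
        self.cmd_pub = self.create_publisher(Twist, 'cmd_vel', 10)
        self.create_subscription(String, 'robot_action', self.execute_action, 10)

    def execute_action(self, msg):
        action = json.loads(msg.data)
        cmd = Twist()

        if action.get('action') == 'walk':
            # Illustrative: command 0.2 m/s forward; a real controller would
            # integrate odometry until the requested distance is covered.
            cmd.linear.x = 0.2
        elif action.get('action') == 'turn':
            # Positive angle → counter-clockwise (left) turn at a fixed rate.
            cmd.angular.z = 0.5 if action.get('angle', 0) > 0 else -0.5
        elif action.get('action') == 'stop':
            pass  # a zero Twist stops the robot

        self.cmd_pub.publish(cmd)


def main(args=None):
    rclpy.init(args=args)
    node = RobotControllerNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()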

Launch File (vla-bot.launch.xml):

<?xml version="1.0"?>
<launch>
  <arg name="world_name" default="vla_bot_world"/>
  <arg name="api_key" default=""/>

  <!-- Gazebo world -->
  <include file="$(find-pkg-share gazebo_ros)/launch/gazebo.launch.py">
    <arg name="world" value="$(find-pkg-share vla_bot)/gazebo_worlds/$(var world_name).sdf"/>
  </include>

  <!-- Robot spawning -->
  <node pkg="gazebo_ros" exec="spawn_entity.py"
        args="-entity humanoid -file $(find-pkg-share vla_bot)/src/urdf/humanoid_simple.urdf
              -x 0 -y 0 -z 0.5"/>

  <!-- ROS 2 nodes (note: the package name uses an underscore, since ROS 2
       package names cannot contain hyphens) -->
  <node pkg="vla_bot" exec="voice_input_node" name="voice_input">
    <param name="api_key" value="$(var api_key)"/>
  </node>
  <node pkg="vla_bot" exec="planner_node" name="planner"/>
  <node pkg="vla_bot" exec="robot_controller_node" name="robot_controller"/>
  <node pkg="vla_bot" exec="perception_node" name="perception"/>
  <node pkg="vla_bot" exec="safety_manager_node" name="safety"/>
</launch>

Test Results:

Command 1: "Walk forward 1 meter"
Expected: Robot moves forward 1m in Gazebo
Actual: ✅ Robot walks 1.05m (within tolerance)
Latency: 2.3s (voice → LLM → action)

Command 2: "Turn left 90 degrees"
Expected: Robot rotates 90°
Actual: ✅ Robot rotates 89.8° (accurate)
Latency: 1.8s

Command 3: "Stop"
Expected: Robot stops immediately
Actual: ✅ Robot stops
Latency: 0.3s

Edge Case: "Unrecognized command"
Input: "Do a backflip"
Expected: System handles gracefully
Actual: ✅ LLM returns "unknown" action; robot announces "Command not understood"

Success Rate: 3/3 = 100%
Average Latency: 1.5s

Capstone Video (6 minutes):

  • [0:00] System overview (ROS 2 graph diagram)
  • [1:00] Gazebo world with robot
  • [1:30] Command 1: "Walk forward" → Video of robot walking
  • [2:30] Command 2: "Turn left" → Video of robot turning
  • [3:30] Command 3: "Stop" → Robot stops
  • [4:15] Perception demo: Camera showing object detection in Gazebo
  • [5:00] Recap + code walkthrough (30 seconds)

Key Features Demonstrated:

  • ✅ Module 1 (ROS 2): Multi-node architecture, pub/sub pattern, launch file
  • ✅ Module 2 (Gazebo): Physics simulation, robot control, sensor data
  • ✅ Module 3 (Perception): Camera feed + basic object detection
  • ✅ Module 4 (VLA): Voice → LLM → action pipeline

Challenges & Solutions:

Challenge | Solution
Gazebo robot unstable | Adjusted inertia values in URDF
Voice recognition noise | Added noise filtering in Whisper post-processing
LLM latency (3+ seconds) | Cached common commands
No real camera (used OpenCV simulation) | Acceptable for MVP; see the perception sketch below
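
The "OpenCV simulation" mentioned above can be as simple as an HSV color threshold on the simulated camera feed. A minimal sketch of such a perception_node, assuming the Gazebo camera publishes on /camera/image_raw and that tracking a single colored target is enough for the MVP (topic names and thresholds are illustrative, not taken from the project):

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge
import cv2
import numpy as np


class PerceptionNode(Node):
    """Sketch: detect a colored object in the simulated camera feed."""

    def __init__(self):
        super().__init__('perception_node')
        self.bridge = CvBridge()
        self.create_subscription(Image, '/camera/image_raw', self.on_image, 10)
        self.detection_pub = self.create_publisher(String, 'detected_objects', 10)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # Illustrative HSV range for a blue object; tune for your world.
        mask = cv2.inRange(hsv, (100, 150, 50), (130, 255, 255))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        if contours:
            largest = max(contours, key=cv2.contourArea)
            x, y, w, h = cv2.boundingRect(largest)
            out = String()
            out.data = f'blue_object at ({x + w // 2}, {y + h // 2})'
            self.detection_pub.publish(out)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(PerceptionNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()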

Lessons Learned:

  • ROS 2 launch files are essential for multi-node coordination
  • LLM latency is the bottleneck; pre-caching common commands helps (see the sketch below)
  • Simulation alone teaches robotics concepts but can't fully prepare you for real-world complexity
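
Since pre-caching appears in both the challenges and the lessons learned, here is a minimal sketch of what such a cache can look like: an exact-match lookup on the normalized transcript that skips the LLM call entirely for commands seen before. The class and method names are illustrative, not part of the project code.

import json


class CommandCache:
    """Sketch: avoid repeated LLM calls for commands already planned once."""

    def __init__(self):
        self._cache = {}

    @staticmethod
    def _normalize(text):
        # Lowercase and strip punctuation so "Walk forward!" and
        # "walk forward" hit the same cache entry.
        return ''.join(c for c in text.lower() if c.isalnum() or c.isspace()).strip()

    def get(self, command):
        return self._cache.get(self._normalize(command))

    def put(self, command, action):
        self._cache[self._normalize(command)] = action


# Usage inside plan_from_voice (sketch):
#   cached = self.cache.get(voice_input)
#   if cached is not None:
#       self.action_pub.publish(String(data=json.dumps(cached)))
#       return
#   ...otherwise call the LLM, then self.cache.put(voice_input, action)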

Example 2: Jetson Edge Hardware Path

Project: "Smart Manipulator Arm" (Score: 4.8/5)

Target Platform: NVIDIA Jetson Orin Nano + RealSense D435i + ROS 2 Humble

What It Does:

  • Processes real camera feed from RealSense
  • Uses Jetson GPU for hardware-accelerated perception
  • LLM planning on the Jetson (using a quantized local model or cloud API calls)
  • Controls manipulator arm (or mobile base + arm simulation)
  • Demonstrates real sensor integration with sim-to-real transfer

Repository Structure:

smart-manipulator/
├── README.md
├── requirements.txt
├── docker/
│   ├── Dockerfile.jetson             # Jetson-specific image
│   └── docker-compose.yml
├── launch/
│   └── manipulator.launch.xml
├── src/
│   ├── perception_pipeline/
│   │   ├── realsense_node.py         # RealSense camera driver
│   │   ├── yolo_detector.py          # Object detection with TensorRT
│   │   └── point_cloud_processor.py
│   ├── planning/
│   │   ├── llm_planner.py            # LLM for task planning
│   │   └── trajectory_planner.py     # Arm trajectory planning
│   ├── control/
│   │   ├── arm_controller.py         # ROS 2 joint command publisher
│   │   └── safety_monitor.py
│   └── integration/
│       └── vla_pipeline.py           # Full VLA integration
├── config/
│   ├── yolo_config.yaml              # Object detection config
│   ├── arm_params.yaml               # Arm kinematics
│   └── jetson_config.yaml            # GPU/memory optimization
├── tests/
│   └── hardware_integration_test.py
└── results/
    ├── performance_metrics.csv
    └── video_demo.mp4

Key Features:

realsense_node.py (Real Camera Input)

import numpy as np

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
import pyrealsense2 as rs
from cv_bridge import CvBridge
import cv2


class RealSenseNode(Node):
    def __init__(self):
        super().__init__('realsense_node')
        self.image_pub = self.create_publisher(Image, '/camera/color/image_raw', 10)
        self.depth_pub = self.create_publisher(Image, '/camera/depth/image_raw', 10)

        # Configure and start the color + depth streams at 640x480, 30 FPS.
        self.pipeline = rs.pipeline()
        config = rs.config()
        config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
        config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
        self.pipeline.start(config)
        self.bridge = CvBridge()

        # Publish frames at roughly the camera frame rate.
        self.create_timer(1.0 / 30.0, self.publish_frames)

    def publish_frames(self):
        frames = self.pipeline.wait_for_frames()
        color_frame = frames.get_color_frame()
        depth_frame = frames.get_depth_frame()
        if not color_frame or not depth_frame:
            return

        # RealSense delivers BGR; convert to RGB to match the "rgb8" encoding.
        color_image = cv2.cvtColor(np.asarray(color_frame.get_data()), cv2.COLOR_BGR2RGB)
        depth_image = np.asarray(depth_frame.get_data())

        self.image_pub.publish(self.bridge.cv2_to_imgmsg(color_image, "rgb8"))
        self.depth_pub.publish(self.bridge.cv2_to_imgmsg(depth_image, "16UC1"))


def main(args=None):
    rclpy.init(args=args)
    node = RealSenseNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

yolo_detector.py (Hardware-Accelerated Perception)

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray, Detection2D
from cv_bridge import CvBridge
import tensorrt as trt
import numpy as np


class YOLODetector(Node):
    def __init__(self):
        super().__init__('yolo_detector')
        self.image_sub = self.create_subscription(
            Image, '/camera/color/image_raw', self.detect, 10)
        self.detection_pub = self.create_publisher(Detection2DArray, '/detections', 10)

        # Project helper (not shown here) that deserializes the TensorRT engine
        # and wraps preprocessing, inference, and postprocessing.
        self.trt_engine = self.load_tensorrt_engine('yolo_jetson.trt')
        self.bridge = CvBridge()

    def detect(self, msg):
        """Detect objects using TensorRT (GPU-accelerated)."""
        image = self.bridge.imgmsg_to_cv2(msg)

        # TensorRT inference (2-3 ms on Jetson)
        detections = self.trt_engine.infer(image)

        detection_array = Detection2DArray()
        detection_array.header = msg.header
        for detection in detections:
            det = Detection2D()
            # In Humble's vision_msgs the 2D bbox center is a Pose2D with a
            # nested position; older releases expose x/y directly on center.
            det.bbox.center.position.x = detection['x']
            det.bbox.center.position.y = detection['y']
            det.bbox.size_x = detection['width']
            det.bbox.size_y = detection['height']
            detection_array.detections.append(det)

        self.detection_pub.publish(detection_array)
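
To go from a 2D detection to a graspable 3D target, the depth image can be sampled at the bounding-box center and back-projected with the pinhole camera model. A hedged sketch, assuming the depth frame is aligned to the color frame; the intrinsics below are illustrative placeholders and would be read from the RealSense calibration in practice:

import numpy as np


def detection_to_3d_point(depth_image, u, v, fx=615.0, fy=615.0, cx=320.0, cy=240.0,
                          depth_scale=0.001):
    """Back-project pixel (u, v) and its depth reading into camera coordinates.

    depth_image: 16-bit depth frame aligned to the color frame.
    depth_scale: meters per depth unit (RealSense default is about 0.001).
    The intrinsics here are illustrative; read the real values from the camera.
    """
    z = float(depth_image[int(v), int(u)]) * depth_scale
    if z <= 0.0:
        return None  # no valid depth at this pixel (e.g., reflection or occlusion)

    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])


# Example: back-project the center of a detected bounding box
# point = detection_to_3d_point(depth_image, det.bbox.center.position.x,
#                               det.bbox.center.position.y)

Invalid depth at the sampled pixel is exactly where the reported failures (reflections, occlusions) show up, which is why the sketch returns None rather than a bogus point.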

Performance Metrics (Real Hardware):

Task: Detect and grasp blue cube

Metrics:
Vision latency (camera → detection): 35ms
LLM planning latency: 1.2s
Trajectory planning: 200ms
Arm execution: 3.5s
Total time (voice → grasp): ~5.0s

Success rate: 8/10 grasps (80%)
Failure causes: Reflections, occlusions

GPU utilization: 65% (room for more tasks)
Power consumption: 12W (Jetson baseline ~8W)

Video Demo (5 minutes):

  • [0:00] Setup: Jetson, RealSense camera, robot arm
  • [0:30] System overview: Real perception pipeline
  • [1:00] "Pick up the blue cube" → Real camera shows object → Arm grasps successfully
  • [2:00] "Move to the table" → Real sensor feedback → Navigation working
  • [3:00] Multiple commands sequence
  • [4:00] Performance analysis: latency, success rates
  • [4:30] Lessons learned from hardware integration

Key Achievements:

  • ✅ Real sensor integration (RealSense depth + color)
  • ✅ Hardware-accelerated perception (TensorRT on Jetson)
  • ✅ Real-time control (5Hz loop with sub-second latency)
  • ✅ Sim-to-real transfer validated (simulation → Jetson hardware)

Example 3: Full Physical Robot Path

Project: "Unitree G1 Voice Commander" (Score: 5/5)

Target Platform: Unitree G1 Humanoid + Jetson Orin AGX + ROS 2

What It Does:

  • Full physical humanoid robot (bipedal locomotion, arms)
  • Real-time voice command processing
  • Visual SLAM for autonomous navigation
  • Multi-modal perception (camera, IMU, proprioception)
  • End-to-end control from voice to physical movement

Repository Structure:

unitree-voice-commander/
├── README.md
├── requirements.txt
├── launch/
│   ├── unitree_full_stack.launch.xml
│   ├── unitree_sim.launch.xml        # Sim version for testing
│   └── safety_monitor.launch.xml
├── src/
│   ├── hardware_interface/
│   │   ├── unitree_ros2_sdk/         # Official Unitree ROS 2 SDK
│   │   └── motor_command_node.py     # High-level motor control
│   ├── perception/
│   │   ├── slam_node.py              # ORB-SLAM3
│   │   ├── imu_fusion_node.py        # IMU + odometry fusion
│   │   └── obstacle_detection.py     # LiDAR-based detection
│   ├── planning/
│   │   ├── locomotion_planner.py     # Gait planning
│   │   ├── arm_planner.py            # Arm trajectory planning
│   │   └── high_level_planner.py     # Task-level planning via LLM
│   ├── voice_interface/
│   │   ├── whisper_node.py
│   │   ├── tts_node.py               # Text-to-speech feedback
│   │   └── voice_confirmation.py
│   └── safety/
│       ├── emergency_stop_node.py    # Hardware e-stop
│       ├── joint_limit_monitor.py    # Joint safety
│       └── stability_monitor.py      # IMU-based fall detection
├── config/
│   ├── unitree_g1_config.yaml        # Robot-specific params
│   ├── gait_parameters.yaml          # Walking gait profiles
│   └── safety_thresholds.yaml
├── tests/
│   ├── sim_validation.py             # Gazebo tests before hardware
│   ├── hardware_safety_tests.py
│   └── integration_tests.py
└── docs/
    ├── sim_to_real_transfer.md
    ├── safety_protocols.md
    └── troubleshooting.md
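
Of the safety nodes listed above, stability_monitor.py (IMU-based fall detection) is worth a closer look. A minimal sketch of the idea, assuming the IMU publishes on /imu/data and the e-stop is a latched Bool topic; the topic names and the tilt threshold are illustrative, not the project's actual values:

import math

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Imu
from std_msgs.msg import Bool


class StabilityMonitor(Node):
    """Sketch: IMU-based fall detection that triggers an emergency stop."""

    MAX_TILT_DEG = 25.0  # illustrative threshold

    def __init__(self):
        super().__init__('stability_monitor')
        self.create_subscription(Imu, '/imu/data', self.on_imu, 50)
        self.estop_pub = self.create_publisher(Bool, '/emergency_stop', 10)

    def on_imu(self, msg):
        q = msg.orientation
        # Roll/pitch from the orientation quaternion (standard conversion).
        roll = math.atan2(2 * (q.w * q.x + q.y * q.z), 1 - 2 * (q.x**2 + q.y**2))
        pitch = math.asin(max(-1.0, min(1.0, 2 * (q.w * q.y - q.z * q.x))))

        if max(abs(math.degrees(roll)), abs(math.degrees(pitch))) > self.MAX_TILT_DEG:
            self.estop_pub.publish(Bool(data=True))
            self.get_logger().warn('Tilt limit exceeded: triggering emergency stop')


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(StabilityMonitor())
    rclpy.shutdown()


if __name__ == '__main__':
    main()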

Example: Walk Command Pipeline

# User: "Walk to the kitchen"
#
# 1. Voice Input: Whisper captures "Walk to the kitchen"
# 2. LLM Planning: ChatGPT → "action: walk, target_location: kitchen, duration: 30s"
# 3. Perception: SLAM localizes robot in map
# 4. Locomotion Planning: Generate walking gait (5 step sequence)
# 5. Motor Commands: Send joint commands to Unitree SDK
# 6. Feedback: IMU confirms walking; SLAM updates position
# 7. Execution: Robot walks for 30s or until destination reached
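
To make steps 2 through 5 concrete, here is a hedged sketch of how the high-level plan could be dispatched to the locomotion planner until SLAM reports arrival. Every class and method name below is illustrative; this is not the Unitree SDK API or the project's actual interfaces.

import json


def dispatch_walk_plan(plan_json, locomotion_planner, slam, timeout_s=30.0):
    """Sketch: execute a 'walk' plan until arrival or timeout.

    plan_json: LLM output, e.g. '{"action": "walk", "target_location": "kitchen"}'
    locomotion_planner / slam: hypothetical interfaces standing in for the
    project's gait planner and SLAM node.
    """
    plan = json.loads(plan_json)
    if plan.get('action') != 'walk':
        return False

    goal = slam.lookup_location(plan['target_location'])  # pose in the map frame
    gait = locomotion_planner.plan_gait_to(goal)           # step sequence

    elapsed = 0.0
    for step in gait:
        locomotion_planner.execute_step(step)              # joint commands via the SDK
        elapsed += step.duration
        if slam.distance_to(goal) < 0.2 or elapsed > timeout_s:
            break

    locomotion_planner.stop()
    return slam.distance_to(goal) < 0.2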

Test Results (Physical Hardware):

Test 1: "Stand up"
Execution: ✅ 2.5s from sitting to standing
Stability: ✅ IMU balance maintained

Test 2: "Walk forward 2 meters"
Execution: ✅ Walks 2.1m (within tolerance)
Time: 8s at normal walking speed
Stability: ✅ No falls, gait smooth

Test 3: "Pick up the cube with your left hand"
Execution: ✅ Reaches down, grasps object, lifts
Time: 12s (slower for dexterity)

Test 4: "Stop and turn around"
Execution: ✅ Completes turn (180°) in 3s

Edge Case: Obstacle detection
Input: "Walk forward" with obstacle placed ahead
Output: ✅ Robot detects via LiDAR, avoids obstacle
Alternative path planning: Working

Challenges Overcome:

Challenge | Impact | Solution
Motor control latency | 50ms → 10ms per command | Jetson Orin AGX sufficient; optimized ROS 2 bridge
Walking gait instability | Falls in early testing | Tuned gait parameters; added IMU feedback loops
Voice latency | 5+ seconds | Cached common commands; local TTS for feedback
Battery drain | 4hr to 2hr with real-time perception | Optimized perception pipeline; selective SLAM updates
Sim-to-real gap | Model trained in sim doesn't transfer | Fine-tuned gait in real environment; empirical validation

Video Demo (8 minutes):

  • [0:00] Intro: "Unitree G1 Voice Commander capstone"
  • [0:30] System architecture overview (block diagram)
  • [1:00] Robot standing in living room
  • [1:30] Command 1: "Walk to the kitchen" → Robot walks with narration
  • [3:00] Command 2: "Pick up the cup" → Robot grasps and lifts object
  • [4:30] Command 3: "Avoid the chair and walk to the door" → Navigation with obstacle avoidance
  • [6:00] Perception demo: Show SLAM map, camera feed, obstacle detection
  • [7:00] Closing: Integration summary, lessons learned, future work

Technical Report Highlights:

  • Deep analysis of sim-to-real transfer learning
  • Hardware integration challenges and solutions
  • Performance benchmarks (latency, power, stability)
  • Safety validation procedures (gait stability, joint limits, fall detection)
  • Lessons on bipedal locomotion dynamics

Integration Achievement:

  • Module 1 (ROS 2): Full multi-node ecosystem running on real hardware
  • Module 2 (Simulation): Validated in Gazebo before physical deployment
  • Module 3 (Perception): Real SLAM + obstacle detection on Jetson AGX
  • Module 4 (VLA): Full voice→LLM→action pipeline working end-to-end

Comparison: All Three Paths

Aspect | Simulation-Only | Jetson Edge Hardware | Physical Robot
Cost | $0 | $400–500 | $30,000+
Setup Time | 2–3 hours | 1–2 days | 1–2 weeks
Capstone Dev Time | 40–60 hours | 60–80 hours | 100+ hours
Sensor Realism | Simulated | Real RealSense | Real sensors (camera, IMU, LiDAR)
Latency | Can be tuned low | ~1.5s typical | ~2–3s (real-world complexity)
Failure Modes | Simulation bugs | Sensor noise, latency | Hardware failures, battery, safety
Learning Value | High (concepts) | Very high (real sensors) | Highest (physical constraints)
Publication Potential | Low | Medium | High (research quality)
Recommended For | Learning ROS 2 concepts | Understanding real robotics | Advanced students, research

Which Example Should You Follow?

Choose Simulation-Only If:

  • You're learning ROS 2 fundamentals for the first time
  • You don't have hardware available
  • You want to iterate quickly on algorithms
  • You're comfortable with "what if" scenarios

Choose Jetson If:

  • You want real sensor experience
  • You have ~$400 budget
  • You want to practice sim-to-real transfer
  • You want a capstone you can reuse for other projects

Choose Physical Robot If:

  • You have lab access to a robot
  • You want the maximum learning impact
  • You're interested in publishing research
  • You're willing to invest significant time in debugging

How to Use These Examples

  1. Read the appropriate example for your path
  2. Reference the code structure when building your own project
  3. Test locally using the launch files and test scripts
  4. Adapt the code to your specific use case (different robot, different commands)
  5. Document your journey like these examples do

All examples include:

  • ✅ Complete code structure
  • ✅ Launch files for quick startup
  • ✅ Documented test results
  • ✅ Video demos (conceptual structure described)
  • ✅ Lessons learned and challenges faced
  • ✅ Ideas for future improvement

Next Steps

  1. Pick your example based on your hardware path (1, 2, or 3)
  2. Review the code structure and understand the ROS 2 patterns
  3. Build your own version with your own robot, sensors, and AI pipeline
  4. Document as you go: Results, challenges, solutions
  5. Create your capstone video and technical report using the rubrics in Chapter 2

Good luck! 🤖

Questions? Post in forums or review the grading rubrics (Chapter 2) for detailed expectations.
