Project · course-project · 2022

Speech Bot

A Jetson Nano robot combining lightweight ASR, OpenAI conversation API, TTS, and hand pose detection for multimodal interaction.

Role: Course project team member
Stack: Jetson Nano · TensorRT · OpenAI API · TTS · hand pose detection
Links: Code →Demo →

Overview

Speech Bot is a voice and gesture-controlled robot built on Jetson Nano 2GB. It combines a lightweight ASR model (optimized with TensorRT), OpenAI’s conversation API, Google TTS, and hand pose detection to create an interactive assistant that can chat, play music from YouTube, check live weather, and tell time. The system supports both Mandarin Chinese and English commands.

System flowchart: voice input, ASR, OpenAI conversation API, TTS output, and hand pose detection modules running on Jetson Nano.

Interaction modes

Speech interaction: The robot converts voice to text via ASR, processes requests through OpenAI’s conversation API, and responds with text-to-speech synthesis.

Music control via gestures: When music is playing and voice detection is unreliable, six hand gestures provide an alternative: peace (pause), pan (play), stop (end playback), fist and OK (volume up/down).

Hand gesture controls — Supported hand gestures for music control: peace (pause), pan (play), stop (end playback), fist and OK (volume adjustment).

Smart assistant functions: Mentioning keywords like “天氣” or “weather” triggers the weather module, after which the user can specify any city for live web-crawled data.

Overview

Interaction modes

Demo