# PHASE 4: VOICE & VIDEO CALLS - DOCUMENTATION

## Overview

Phase 4 implements WebRTC-based voice and video calling with support for:

- 1-on-1 audio and video calls
- Group calls with up to 20 participants
- Screen sharing
- TURN/STUN servers for NAT traversal
- Real-time media controls (mute, video toggle)
- Connection quality monitoring
- Call recording support (infrastructure)

## Architecture

### WebRTC Topology

**1-on-1 Calls**: Mesh topology with direct peer-to-peer connections

**Group Calls**: SFU (Selective Forwarding Unit) using Mediasoup (placeholder for future implementation)

### Components

```
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│  Client A   │◄───────►│   Server    │◄───────►│  Client B   │
│  (Browser)  │ WebRTC  │  Socket.io  │ WebRTC  │  (Browser)  │
│             │ Signals │  Signaling  │ Signals │             │
└─────────────┘         └─────────────┘         └─────────────┘
       ▲                       │                       ▲
       │                       │                       │
       │            ┌──────────▼──────────┐            │
       └───────────►│  TURN/STUN Server   │◄───────────┘
                    │   (NAT Traversal)   │
                    └─────────────────────┘
```

## Database Schema

### Calls Table

```sql
CREATE TABLE calls (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
    type VARCHAR(20) NOT NULL DEFAULT 'audio', -- 'audio', 'video', 'screen'
    status VARCHAR(20) NOT NULL DEFAULT 'initiated',
    -- Status: 'initiated', 'ringing', 'active', 'ended', 'missed', 'rejected', 'failed'
    initiated_by UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    started_at TIMESTAMPTZ,
    ended_at TIMESTAMPTZ,
    duration_seconds INTEGER,
    end_reason VARCHAR(50),
    sfu_room_id VARCHAR(255),
    recording_url TEXT,
    quality_stats JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

### Call Participants Table

```sql
CREATE TABLE call_participants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    call_id UUID NOT NULL REFERENCES calls(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    status VARCHAR(20) NOT NULL DEFAULT 'invited', -- Status:
    -- 'invited', 'ringing', 'joined', 'left', 'rejected', 'missed'
    joined_at TIMESTAMPTZ,
    left_at TIMESTAMPTZ,
    ice_candidates JSONB,
    media_state JSONB DEFAULT '{"audioEnabled": true, "videoEnabled": true, "screenSharing": false}',
    media_stats JSONB,
    connection_quality VARCHAR(20),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

### TURN Credentials Table

```sql
CREATE TABLE turn_credentials (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    username VARCHAR(255) NOT NULL,
    credential VARCHAR(255) NOT NULL,
    expires_at TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Auto-cleanup function
CREATE OR REPLACE FUNCTION cleanup_expired_turn_credentials()
RETURNS void AS $$
BEGIN
    DELETE FROM turn_credentials WHERE expires_at < NOW();
END;
$$ LANGUAGE plpgsql;
```

## API Endpoints

### 1. POST `/api/calls/initiate`

Initiate a new call.

**Request:**
```json
{
  "conversationId": "uuid",
  "type": "video",
  "participantIds": ["uuid1", "uuid2"]
}
```

`type` accepts `"video"` or `"audio"`.

**Response:**
```json
{
  "callId": "uuid",
  "status": "initiated",
  "participants": [
    {
      "userId": "uuid",
      "userName": "John Doe",
      "userIdentifier": "john@example.com",
      "status": "invited"
    }
  ]
}
```

### 2. POST `/api/calls/:callId/answer`

Answer an incoming call.

**Response:**
```json
{
  "callId": "uuid",
  "status": "active",
  "startedAt": "2025-01-10T14:30:00Z"
}
```

### 3. POST `/api/calls/:callId/reject`

Reject an incoming call.

**Response:**
```json
{
  "callId": "uuid",
  "status": "rejected"
}
```

### 4. POST `/api/calls/:callId/end`

End an active call.

**Response:**
```json
{
  "callId": "uuid",
  "status": "ended",
  "duration": 120,
  "endReason": "ended-by-user"
}
```

### 5. PATCH `/api/calls/:callId/media`

Update media state (mute/unmute, video on/off).

**Request:**
```json
{
  "audioEnabled": true,
  "videoEnabled": false,
  "screenSharing": false
}
```

**Response:**
```json
{
  "success": true,
  "mediaState": {
    "audioEnabled": true,
    "videoEnabled": false,
    "screenSharing": false
  }
}
```

### 6. GET `/api/calls/turn-credentials`

Get temporary TURN server credentials.

**Response:**
```json
{
  "credentials": {
    "urls": ["turn:turn.example.com:3000"],
    "username": "1736517600:username",
    "credential": "hmac-sha1-hash"
  },
  "expiresAt": "2025-01-11T14:00:00Z"
}
```

### 7. GET `/api/calls/:callId`

Get call details.

**Response:**
```json
{
  "call": {
    "id": "uuid",
    "conversationId": "uuid",
    "type": "video",
    "status": "active",
    "initiatedBy": "uuid",
    "startedAt": "2025-01-10T14:30:00Z",
    "participants": [...]
  }
}
```

## WebSocket Events

### Client → Server

#### `call:offer`

Send WebRTC offer to peer.

```javascript
socket.emit('call:offer', {
  callId: 'uuid',
  targetUserId: 'uuid',
  offer: RTCSessionDescription
});
```

#### `call:answer`

Send WebRTC answer to peer.

```javascript
socket.emit('call:answer', {
  callId: 'uuid',
  targetUserId: 'uuid',
  answer: RTCSessionDescription
});
```

#### `call:ice-candidate`

Send ICE candidate to peer.

```javascript
socket.emit('call:ice-candidate', {
  callId: 'uuid',
  targetUserId: 'uuid',
  candidate: RTCIceCandidate
});
```

### Server → Client

#### `call:incoming`

Notify user of incoming call.

```javascript
socket.on('call:incoming', (data) => {
  // data: { callId, conversationId, type, initiatedBy, participants }
});
```

#### `call:offer`

Receive WebRTC offer from peer.

```javascript
socket.on('call:offer', (data) => {
  // data: { callId, fromUserId, offer }
});
```

#### `call:answer`

Receive WebRTC answer from peer.

```javascript
socket.on('call:answer', (data) => {
  // data: { callId, fromUserId, answer }
});
```

#### `call:ice-candidate`

Receive ICE candidate from peer.

```javascript
socket.on('call:ice-candidate', (data) => {
  // data: { callId, fromUserId, candidate }
});
```

#### `call:ended`

Notify that the call has ended.

```javascript
socket.on('call:ended', (data) => {
  // data: { callId, reason, endedBy }
});
```

#### `call:participant-joined`

Notify that a participant joined a group call.

```javascript
socket.on('call:participant-joined', (data) => {
  // data: { callId, userId, userName, userIdentifier }
});
```

#### `call:participant-left`

Notify that a participant left a group call.

```javascript
socket.on('call:participant-left', (data) => {
  // data: { callId, userId }
});
```

#### `call:media-state-changed`

Notify that a participant's media state changed.

```javascript
socket.on('call:media-state-changed', (data) => {
  // data: { callId, userId, mediaState }
});
```

## Frontend Integration

### WebRTC Manager Usage

```javascript
import WebRTCManager from './utils/webrtc';

// Initialize
const webrtcManager = new WebRTCManager(socket);

// Set TURN credentials (fetch() resolves to a Response, so parse the JSON body first)
const turnResponse = await fetch('/api/calls/turn-credentials');
const turnCreds = await turnResponse.json();
await webrtcManager.setTurnCredentials(turnCreds.credentials);

// Get local media stream
const localStream = await webrtcManager.initializeLocalStream(true, true);
localVideoRef.current.srcObject = localStream;

// Setup event handlers
webrtcManager.onRemoteStream = (userId, stream) => {
  remoteVideoRef.current.srcObject = stream;
};

// Initiate call
webrtcManager.currentCallId = callId;
webrtcManager.isInitiator = true;
await webrtcManager.initiateCallToUser(targetUserId);

// Toggle audio/video
webrtcManager.toggleAudio(false); // mute
webrtcManager.toggleVideo(false); // video off

// Screen sharing
await webrtcManager.startScreenShare();
webrtcManager.stopScreenShare();

// Cleanup
webrtcManager.cleanup();
```

### Call Component Usage

```javascript
import { useState } from 'react';
import Call from './components/Call';

function App() {
  const [showCall, setShowCall] = useState(false);

  return (
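    <div>
      {/* `onCallEnd` is an assumed prop name; check the Call component for its actual props. */}
      {showCall && (
        <Call
          onCallEnd={(data) => {
            console.log('Call ended:', data);
            setShowCall(false);
          }}
        />
      )}
    </div>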
  );
}
```

## TURN Server Setup (Coturn)

### Installation

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install coturn

# Enable service
sudo systemctl enable coturn
```

### Configuration

Edit `/etc/turnserver.conf`:

```conf
# Listening port
listening-port=3000
tls-listening-port=5349

# External IP (replace with your server IP)
external-ip=YOUR_SERVER_IP

# Relay IPs
relay-ip=YOUR_SERVER_IP

# Realm
realm=turn.yourdomain.com

# Authentication
use-auth-secret
static-auth-secret=YOUR_TURN_SECRET

# Logging
verbose
log-file=/var/log/turnserver.log

# Security
no-multicast-peers
no-cli
no-loopback-peers
no-tlsv1
no-tlsv1_1

# Quotas
max-bps=1000000
user-quota=12
total-quota=1200
```

### Environment Variables

Add to `.env`:

```env
# TURN Server Configuration
TURN_SERVER_HOST=turn.yourdomain.com
TURN_SERVER_PORT=3000
TURN_SECRET=your-turn-secret-key
TURN_TTL=86400
```

### Firewall Rules

```bash
# Allow TURN ports
sudo ufw allow 3000/tcp
sudo ufw allow 3000/udp
sudo ufw allow 5349/tcp
sudo ufw allow 5349/udp

# Allow UDP relay ports
sudo ufw allow 49152:65535/udp
```

### Start Service

```bash
sudo systemctl start coturn
sudo systemctl status coturn
```

### Testing TURN Server

Use the [Trickle ICE](https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/) test page:

1. Add your TURN server URL: `turn:YOUR_SERVER_IP:3000`
2. Generate TURN credentials using the HMAC method
3. Click "Gather candidates"
4.
   Verify `relay` candidates appear

## Media Codecs

### Audio

- **Codec**: Opus
- **Sample Rate**: 48kHz
- **Bitrate**: 32-128 kbps (adaptive)
- **Features**: Echo cancellation, noise suppression, auto gain control

### Video

- **Codecs**: VP8, VP9, H.264 (fallback)
- **Clock Rate**: 90kHz
- **Resolutions**:
  - 1280x720 (HD) - default
  - 640x480 (SD) - low bandwidth
  - 320x240 (LD) - very low bandwidth
- **Frame Rate**: 30 fps (ideal), 15-60 fps range
- **Bitrate**: 500kbps-2Mbps (adaptive)

## Connection Quality Monitoring

The system monitors connection quality based on:

1. **Round Trip Time (RTT)**
   - Good: < 100ms
   - Fair: 100-300ms
   - Poor: > 300ms

2. **Packet Loss**
   - Good: < 2%
   - Fair: 2-5%
   - Poor: > 5%

3. **Available Bitrate**
   - Good: > 500kbps
   - Fair: 200-500kbps
   - Poor: < 200kbps

Quality is checked every 3 seconds and displayed to users.

## Error Handling

### Common Errors

1. **Media Access Denied**
   ```
   Failed to access camera/microphone: NotAllowedError
   ```
   - User denied browser permission
   - Solution: Request permission again, show help dialog

2. **ICE Connection Failed**
   ```
   Connection failed with user: ICE connection failed
   ```
   - NAT/firewall blocking connection
   - Solution: Ensure TURN server is configured and reachable

3. **Peer Connection Closed**
   ```
   Connection closed with user: Connection lost
   ```
   - Network interruption or user disconnected
   - Solution: Notify user, attempt reconnection

4. **TURN Credentials Expired**
   ```
   TURN credentials expired
   ```
   - Credentials have 24-hour TTL
   - Solution: Fetch new credentials automatically

## Security Considerations

1. **TURN Authentication**: Time-limited credentials using HMAC-SHA1
2. **DTLS**: WebRTC encrypts all media streams with DTLS-SRTP
3. **JWT Auth**: All API calls require valid JWT token
4. **Rate Limiting**: Protect against DoS attacks
5.
   **User Verification**: Verify users are in the conversation before allowing calls

## Testing Checklist

- [ ] 1-on-1 audio call works
- [ ] 1-on-1 video call works
- [ ] Mute/unmute audio works
- [ ] Toggle video on/off works
- [ ] Screen sharing works
- [ ] Call can be answered
- [ ] Call can be rejected
- [ ] Call can be ended
- [ ] Connection quality indicator updates
- [ ] Call duration displays correctly
- [ ] Multiple participants can join (group call)
- [ ] Participant join/leave notifications work
- [ ] Media state changes propagate
- [ ] TURN server fallback works (test behind NAT)
- [ ] Call persists after page refresh (reconnection)
- [ ] Missed call notifications work
- [ ] Call history is recorded

## Performance Optimization

### Bandwidth Usage

**Audio Only (per participant)**:
- Opus @ 32kbps: ~15 MB/hour
- Opus @ 64kbps: ~30 MB/hour

**Video + Audio (per participant)**:
- 480p @ 500kbps: ~225 MB/hour
- 720p @ 1Mbps: ~450 MB/hour
- 1080p @ 2Mbps: ~900 MB/hour

### Recommendations

1. **Start with audio only** for low bandwidth users
2. **Use VP9** if supported (better compression than VP8)
3. **Enable simulcast** for group calls (SFU)
4. **Adaptive bitrate** based on network conditions
5. **Limit group calls** to 20 participants max

## Future Enhancements

- [ ] **Mediasoup SFU**: Implement actual SFU for efficient group calls
- [ ] **Call Recording**: Record and store calls in cloud storage
- [ ] **Background Blur**: Virtual backgrounds using ML
- [ ] **Noise Cancellation**: Advanced audio processing
- [ ] **Grid/Speaker View**: Different layouts for group calls
- [ ] **Reactions**: Emoji reactions during calls
- [ ] **Hand Raise**: Signal to speak in large calls
- [ ] **Breakout Rooms**: Split large calls into smaller groups
- [ ] **Call Scheduling**: Schedule calls in advance
- [ ] **Call Analytics**: Detailed quality metrics and reports

## Troubleshooting

### No Audio/Video

1. Check browser permissions
2.
   Verify camera/microphone is not used by another app
3. Test with `navigator.mediaDevices.enumerateDevices()`
4. Check browser console for errors

### Connection Fails

1. Test TURN server with Trickle ICE
2. Verify firewall allows UDP ports 49152-65535
3. Check TURN credentials are not expired
4. Ensure both users are online

### Poor Quality

1. Check network bandwidth
2. Monitor packet loss and RTT
3. Reduce video resolution
4. Switch to audio-only mode

### Echo/Feedback

1. Ensure `echoCancellation: true` in audio constraints
2. Use headphones instead of speakers
3. Reduce microphone gain
4. Check for multiple audio sources

## Support

For issues or questions:

- Check logs in browser console
- Review `/var/log/turnserver.log` for TURN issues
- Monitor backend logs for signaling errors
- Test with multiple browsers (Chrome, Firefox, Safari)

---

**Phase 4 Complete** ✓
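
As a supplement to the Poor Quality troubleshooting steps, here is a minimal sketch of how the RTT, packet loss, and bitrate thresholds from Connection Quality Monitoring could be combined into a single rating. The function name `classifyConnectionQuality`, the input field names, and the rule that the worst individual metric decides the overall rating are assumptions for illustration; the actual implementation may combine the metrics differently.

```javascript
// Thresholds are taken from the "Connection Quality Monitoring" section.
// Everything else (names, input shape, worst-metric-wins rule) is assumed.
const QUALITY_LEVELS = ['good', 'fair', 'poor'];

// Each helper returns an index into QUALITY_LEVELS (0 = good, 2 = poor).
function rateRtt(rttMs) {
  if (rttMs < 100) return 0;
  if (rttMs <= 300) return 1;
  return 2;
}

function ratePacketLoss(lossPercent) {
  if (lossPercent < 2) return 0;
  if (lossPercent <= 5) return 1;
  return 2;
}

function rateBitrate(bitrateKbps) {
  if (bitrateKbps > 500) return 0;
  if (bitrateKbps >= 200) return 1;
  return 2;
}

// Overall quality is the worst of the three per-metric ratings.
function classifyConnectionQuality({ rttMs, packetLossPercent, availableBitrateKbps }) {
  const worst = Math.max(
    rateRtt(rttMs),
    ratePacketLoss(packetLossPercent),
    rateBitrate(availableBitrateKbps)
  );
  return QUALITY_LEVELS[worst];
}
```

In a browser, the inputs could plausibly be derived from `RTCPeerConnection.getStats()` on the 3-second interval mentioned above, for example from the `currentRoundTripTime` (seconds) and `availableOutgoingBitrate` (bits per second) fields of the active candidate pair, converted to milliseconds and kbps before calling the function.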