Continuously Stream and Play Audio Chunks in Flutter Web?

I’ve been trying to implement streaming audio chunks (Uint8List) from an OpenAI tts-1 model in a Flutter Web app and playing them in sequence. The idea is to progressively fetch and buffer the audio as it’s generated (e.g., from a TTS endpoint) and then play each buffered portion as soon as the previous one finishes.

What’s Happening:

  • The first chunk plays successfully.
  • Subsequent chunks fail to start, leading to errors like DEMUXER_ERROR_COULD_NOT_OPEN and NotSupportedError.

Errors:

Playing audio... PlayerState.completed
AudioPlayers Exception: AudioPlayerException(
    BytesSource(bytes: 7b444, mimeType: audio/mpeg), 
    PlatformException(WebAudioError, Failed to set source. For troubleshooting, see https://github.com/bluefireteam/audioplayers/blob/main/troubleshooting.md, 
    MediaError: DEMUXER_ERROR_COULD_NOT_OPEN: FFmpegDemuxer: open context failed (Code: 4), null)
Error setting audio source: PlatformException(WebAudioError, Failed to set source. For troubleshooting, see https://github.com/bluefireteam/audioplayers/blob/main/troubleshooting.md, 
    MediaError: DEMUXER_ERROR_COULD_NOT_OPEN: FFmpegDemuxer: open context failed (Code: 4), null)
Error: PlatformException(WebAudioError, Failed to set source.
    NotSupportedError: The element has no supported sources.

What I’ve Tried:

  • audioplayers: Using setSourceBytes on each buffered chunk works for the first chunk but fails on subsequent chunks.
  • just_audio: I attempted to use just_audio, but a streaming source is not available for the web.
  • JS Interop for Streaming: On the web, I’m not using http or dio for fetching. Instead, I rely on the browser’s Fetch API via JS interop (getReader()) to continuously read chunks as they become available. These chunks are then added to a queue and played in sequence.

Relevant Code:

import 'dart:async';
import 'dart:collection';
import 'dart:typed_data';

import 'package:example/audio_player_controller.dart';
import 'package:example/tts_service_web.dart';
import 'package:flutter/material.dart';

void main() {
  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({super.key});
  @override
  Widget build(BuildContext context) {
    return const MaterialApp(
      home: AudioStreamScreen(),
    );
  }
}

class AudioStreamScreen extends StatefulWidget {
  const AudioStreamScreen({super.key});

  @override
  State<AudioStreamScreen> createState() => _AudioStreamScreenState();
}

class _AudioStreamScreenState extends State<AudioStreamScreen> {
  // TODO: Add your API key
  final openAIKey = 'YOUR_OPENAI_API_KEY';

  final Queue<Uint8List> _bufferQueue = Queue();
  final BytesBuilder _currentBuffer = BytesBuilder();
  bool _isPlaying = false;
  final int _bufferSize = 64 * 1024; // Adjust this as needed
  AudioPlayerController? _controller;

  @override
  void initState() {
    super.initState();
    _controller = AudioPlayerController(onError: (e, s) {
      debugPrint('Error: $e');
    });
  }

  @override
  void dispose() {
    _controller?.dispose();
    super.dispose();
  }

  Future<void> _fetchAndPlayAudio() async {
    final stream = TTSServiceWeb(openAIKey).tts(
      'https://api.openai.com/v1/audio/speech',
      {
        'model': 'tts-1',
        'voice': 'alloy',
        'speed': 1,
        'input': 'Lorem ipsum ...',
        'response_format': 'opus',
        'stream': true,
      },
    );

    try {
      await for (final chunk in stream) {
        _addToBuffer(chunk);
        if (_currentBuffer.length >= _bufferSize) {
          debugPrint('New Buffer: ${_currentBuffer.toBytes().lengthInBytes} / $_bufferSize');
          _flushBufferToQueue();
        }
        debugPrint('Last chunk: ${chunk.lengthInBytes / 1024} KB');
        _playNextInQueue();
      }
      _flushBufferToQueue(finalFlush: true);
    } catch (e) {
      debugPrint('Error fetching audio: $e');
    }
  }

  void _addToBuffer(Uint8List chunk) {
    _currentBuffer.add(chunk);
  }

  void _flushBufferToQueue({bool finalFlush = false}) {
    if (_currentBuffer.isNotEmpty) {
      _bufferQueue.add(_currentBuffer.toBytes());
      _currentBuffer.clear();
    }
    if (finalFlush) {
      _playNextInQueue();
    }
  }

  Future<void> _playNextInQueue() async {
    if (_isPlaying || _bufferQueue.isEmpty) return;

    final nextChunk = _bufferQueue.removeFirst();
    _isPlaying = true;

    try {
      debugPrint('Playing chunk: ${nextChunk.lengthInBytes / 1024} KB');
      await _controller?.play(nextChunk);
    } catch (e) {
      debugPrint('Error playing chunk: $e');
    } finally {
      _isPlaying = false;
      if (_bufferQueue.isNotEmpty) {
        _playNextInQueue();
      }
    }
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: const Text('Audio Stream Example'),
      ),
      body: Center(
        child: ElevatedButton(
          onPressed: _fetchAndPlayAudio,
          child: const Text('Play Audio'),
        ),
      ),
    );
  }
}
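
For reference, audio_player_controller.dart is a thin wrapper around an audioplayers AudioPlayer; a simplified sketch of what it does (not the exact file):

import 'dart:typed_data';

import 'package:audioplayers/audioplayers.dart';

/// Simplified sketch of the wrapper used above: it plays one byte buffer
/// at a time and completes when playback finishes, so callers can chain
/// chunks back to back.
class AudioPlayerController {
  AudioPlayerController({required this.onError});

  final void Function(Object error, StackTrace stack) onError;
  final AudioPlayer _player = AudioPlayer();

  Future<void> play(Uint8List bytes) async {
    // Subscribe before starting playback so the completion event isn't missed.
    final completed = _player.onPlayerComplete.first;
    try {
      await _player.play(BytesSource(bytes, mimeType: 'audio/mpeg'));
      await completed;
    } catch (e, s) {
      onError(e, s);
      rethrow;
    }
  }

  void dispose() => _player.dispose();
}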

Video Demo:
Vimeo link:

GitHub Repository:
flutter_audio_streaming_prototype

Questions:

  1. Has anyone successfully implemented real-time streaming and playback of audio chunks on Flutter Web?
  2. Are there alternative libraries or approaches that can handle a continuous stream of audio data on both web & mobile platforms?

Any insights, suggestions, or code examples would be greatly appreciated!


I have used the flutter_soloud package to write an example that uses the Cartesia AI APIs.

It sends a text and then receives the audio data through their WebSocket API. The audio data is stored in an AudioBuffer and either plays immediately or keeps buffering if the connection doesn’t provide enough bandwidth.

It works on all platforms.

The downside of flutter_soloud is that AudioBuffer can only handle raw PCM data (for now).
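
The flow is roughly this (a simplified sketch: the endpoint and JSON payload below are placeholders, not Cartesia’s real API; the SoLoud calls are the package’s buffer-stream API with some parameters omitted):

import 'dart:convert';
import 'dart:typed_data';

import 'package:flutter_soloud/flutter_soloud.dart';
import 'package:web_socket_channel/web_socket_channel.dart';

/// Assumes SoLoud.instance.init() was already awaited at startup.
Future<void> streamTtsOverWebSocket(String text) async {
  // Placeholder URL and payload; the real service needs its own auth
  // and message shape.
  final channel =
      WebSocketChannel.connect(Uri.parse('wss://example.com/tts'));

  // Open a buffer stream that gets filled as audio arrives.
  final sound = SoLoud.instance.setBufferStream(
    sampleRate: 16000, // whatever rate the service actually produces
    channels: Channels.mono,
    pcmFormat: BufferPcmType.s16le,
  );

  channel.sink.add(jsonEncode({'text': text}));

  var started = false;
  channel.stream.listen(
    (message) async {
      if (message is List<int>) {
        // Binary frames carry raw PCM: append them to the buffer.
        SoLoud.instance
            .addAudioDataStream(sound, Uint8List.fromList(message));
        if (!started) {
          started = true;
          // Playback starts as soon as the first data is in; SoLoud keeps
          // buffering behind it.
          await SoLoud.instance.play(sound);
        }
      }
    },
    onDone: () => SoLoud.instance.setDataIsEnded(sound),
  );
}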


Awesome, I had no idea about SoLoud; there are so many packages nowadays that it’s hard to find the gems. OpenAI supports PCM, so I’ll give it a shot ASAP and let you know.

Thanks for sharing much appreciated!


Forgot to mention that the AudioBuffer feature is not yet on pub.dev. It will be soon; meanwhile, you could try using the GitHub sources in your pubspec.yaml:

dependencies:
  flutter_soloud:
    git:
      url: git@github.com:alnitak/flutter_soloud.git
      ref: main

Yep, I figured; the first thing I do is look at pubspec.yaml and check the deps :wink:

Sadly, it looks like there’s an issue: the OpenAI PCM data seems “corrupted”. I’ll ask on the OpenAI forum; maybe someone has experienced that.

Demo: Stream Audio - OpenAI - Flutter

Maybe an option could be to convert Opus to PCM, or to add support for Opus or WAV in SoLoud…

I managed to split the long text into multiple API calls for now, but it isn’t that reliable and is a bit more costly.

In the end, it looks like Cartesia.ai’s privacy policy doesn’t fit our needs, as they use data to train their models (we handle private names, addresses, etc.).

import 'dart:async';

import 'package:example/tts_service_web.dart';
import 'package:flutter/material.dart';
import 'package:flutter_soloud/flutter_soloud.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();

  /// Initialize the player.
  await SoLoud.instance.init();

  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({super.key});
  @override
  Widget build(BuildContext context) {
    return const MaterialApp(
      home: AudioStreamScreen(),
    );
  }
}

class AudioStreamScreen extends StatefulWidget {
  const AudioStreamScreen({super.key});

  @override
  State<AudioStreamScreen> createState() => _AudioStreamScreenState();
}

class _AudioStreamScreenState extends State<AudioStreamScreen> {
  final openAIKey = 'OPEN_AI_KEY';

  @override
  void initState() {
    super.initState();
  }

  @override
  void dispose() {
    unawaited(SoLoud.instance.disposeAllSources());
    super.dispose();
  }

  Future<void> _fetchAndPlayAudio() async {
    final stream = TTSServiceWeb(openAIKey).tts(
      'https://api.openai.com/v1/audio/speech',
      {
        'model': 'tts-1',
        'voice': 'alloy',
        'speed': 1,
        'input':
            '''1. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.''',
        'response_format': 'pcm',
        "sample_rate": 16000,
        'stream': true,
      },
    );

    final currentSound = SoLoud.instance.setBufferStream(
      maxBufferSize: 1024 * 1024 * 5, // 5 MB
      sampleRate: 16000,
      channels: Channels.mono,
      pcmFormat: BufferPcmType.s16le,
      onBuffering: (isBuffering, handle, time) async {
        debugPrint('buffering');
      },
    );

    int chunkNumber = 0;
    stream.listen((chunk) async {
      try {
        SoLoud.instance.addAudioDataStream(
          currentSound,
          chunk,
        );
        if (chunkNumber == 0) {
          await SoLoud.instance.play(currentSound);
        }
        chunkNumber++;
        debugPrint('chunk number: $chunkNumber');
        debugPrint('chunk length: ${chunk.length}');
      } on SoLoudPcmBufferFullCppException {
        debugPrint('pcm buffer full or stream already set '
            'to be ended');
      } catch (e) {
        debugPrint(e.toString());
      }
    }, onDone: () {
      SoLoud.instance.setDataIsEnded(currentSound);
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: const Text('Audio Stream Example'),
      ),
      body: Center(
        child: ElevatedButton(
          onPressed: _fetchAndPlayAudio,
          child: const Text('Play Audio'),
        ),
      ),
    );
  }
}

I am not familiar with the OpenAI APIs, nor do I have an API key.

Such a great diversity in chunk lengths seems strange to me. Anyway, that noise makes it look like the data you are receiving is compressed, or encoded to base64?

I have no idea about base64, but the transfer encoding is set to chunked.

If I play around with the sample rate, the audio gets a bit clearer, but it still has that noise.

I used the js package to implement the fetch call from JavaScript, since http/dio don’t support response streaming on the web:

import 'dart:convert';
import 'dart:typed_data';

import 'package:js/js.dart';
import 'package:js/js_util.dart' as js_util;

@JS('fetch')
external dynamic fetchJs(dynamic url, dynamic options);

class TTSServiceWeb {
  final String apiKey;

  TTSServiceWeb(this.apiKey);

  Stream<Uint8List> tts(String url, Map<String, dynamic> payload) async* {
    final options = js_util.jsify({
      'method': 'POST',
      'headers': {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer $apiKey',
        'Transfer-Encoding': 'chunked',
      },
      'body': jsonEncode(payload),
    });

    final response = await js_util.promiseToFuture(fetchJs(url, options));
    final status = js_util.getProperty(response, 'status') as int;
    if (status != 200) {
      throw Exception('Failed to fetch stream. Status code: $status');
    }

    final body = js_util.getProperty(response, 'body');
    final reader = js_util.callMethod(body, 'getReader', []);

    while (true) {
      final result =
          await js_util.promiseToFuture(js_util.callMethod(reader, 'read', []));
      final done = js_util.getProperty(result, 'done') as bool;
      if (done) break;

      final chunk = js_util.getProperty(result, 'value');
      yield Uint8List.fromList(List<int>.from(chunk));
    }
  }
}

PS: If you have some time and feel like helping further, I can share a key with you in PM.

I sent you a PM.

I think the problem was the size of the chunks coming from the stream.

Since the chunks coming from OpenAI can be really small, and their length can be odd, I used a buffer. When the buffer reaches the [chunkSize] size, we yield the bytes, so we are sure we deliver an even number of bytes of a consistent size (s16le samples are 2 bytes each, so an odd split would shear a sample in half and produce noise).
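
Distilled, the idea is just this (a minimal sketch; the alignChunks name and the 2 KB default are illustrative, not from the PR):

import 'dart:typed_data';

/// Re-slices an incoming byte stream into fixed-size chunks so that every
/// chunk handed to the player has an even, consistent length.
Stream<Uint8List> alignChunks(
  Stream<Uint8List> source, {
  int chunkSize = 2048, // must be even for s16le
}) async* {
  final buffer = BytesBuilder(copy: false);
  await for (final chunk in source) {
    buffer.add(chunk);
    while (buffer.length >= chunkSize) {
      final bytes = buffer.takeBytes();
      yield Uint8List.sublistView(bytes, 0, chunkSize);
      // Put the tail back and keep slicing on the next pass.
      buffer.add(Uint8List.sublistView(bytes, chunkSize));
    }
  }
  // Flush whatever is left when the stream ends.
  if (buffer.isNotEmpty) yield buffer.takeBytes();
}

Wired between the fetch reader and addAudioDataStream, every chunk the player sees is sample-aligned.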

Seems to work just fine!! I submitted a PR.

Really love this stuff and this forum!


Oof! You’re a genius, man :slight_smile:

I see you used the http package, but it doesn’t support chunked responses on the web, so the audio only plays once the whole download finishes (you can verify this by replacing the text with a very long one).

So I modified the fetch method I wrote, and it works like a charm!

  Stream<Uint8List> tts(String url, Map<String, dynamic> payload) async* {
    final options = js_util.jsify({
      'method': 'POST',
      'headers': {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer $apiKey',
        'Transfer-Encoding': 'chunked',
      },
      'body': jsonEncode(payload),
    });

    final response = await js_util.promiseToFuture(fetchJs(url, options));
    final status = js_util.getProperty(response, 'status') as int;
    if (status != 200) {
      throw Exception('Failed to fetch stream. Status code: $status');
    }

    final body = js_util.getProperty(response, 'body');
    final reader = js_util.callMethod(body, 'getReader', []);

    /// Since the chunks size coming from OpenAI could be really small and they
    /// can be odd, here we are using a buffer. When the buffer reaches the
    /// [chunkSize] size, we yield the bytes so we are sure that we deliver
    /// an even number of bytes of a consistent size.
    final buffer = BytesBuilder();
    var remainder = Uint8List(0);
    const chunkSize = 1024 * 2; // 2 KB of audio data
    var count = 0;

    while (true) {
      final result =
          await js_util.promiseToFuture(js_util.callMethod(reader, 'read', []));
      final done = js_util.getProperty(result, 'done') as bool;
      if (done) break;

      final chunk = js_util.getProperty(result, 'value');
      buffer.add(List<int>.from(chunk));
      count++;
      debugPrint('YIELD count: $count  buffer: ${buffer.length} bytes');

      while (buffer.length >= chunkSize) {
        final bufferBytes = buffer.toBytes();
        final chunk = Uint8List.sublistView(bufferBytes, 0, chunkSize);
        debugPrint('Chunk: ${chunk.length} bytes');
        yield chunk;

        remainder = Uint8List.sublistView(bufferBytes, chunkSize);
        buffer
          ..clear()
          ..add(remainder);
      }
    }
    // Flush any leftover bytes once the stream ends: the buffer can still
    // hold a final chunk smaller than [chunkSize].
    if (buffer.isNotEmpty) yield buffer.takeBytes();
  }

This is the best thread I’ve read so far… you two make a good team!

My question: I was working with WASAPI (the Windows device-audio API), and I wanted to capture that audio stream and broadcast it to other people connected to the server.

The host, i.e. the Windows PC, is running Flutter Web (on Chrome, Edge, or any browser).

And the listeners can be anyone, because they are just listening to the audio.

I tried working on it and it got out of hand… there weren’t many packages that could even help me capture the device’s (internal) audio… so I sadly put the project away until I could get back to it.

Now your conversation has inspired me. Any advice before I approach this again?


I’m not sure I understand your requirements: is it a shared audio session (like playing music for multiple clients), or more of a communication one? If it’s the second, you probably want to look at VoIP solutions like WebRTC; they support Flutter.

In my case, it’s not production-ready yet. The SoLoud package only supports PCM for streaming at the moment, and that format is too heavy for streaming over unstable connections: it sometimes hangs, and sometimes loses packets on long texts. It is, however, much quicker than waiting for the whole audio to be generated (and we’re talking a 40+ second difference).

So I recommend waiting until this gets sorted out if you need something similar.


I was experimenting with sending and receiving audio through a WebSocket, if that’s something you are looking for.

Here’s the repo, which implements a walkie-talkie-like app with a WS client and server. It uses flutter_recorder to listen to the mic and flutter_soloud to output the received audio.

It’s just an experiment that sends audio to all devices connected to the same network, but it could be a start!


Is this general enough to work with protocols other than WS?


Wherever the audio data comes from, you can use it, as long as it is raw PCM (for now). For example, you can generate the sound locally in software and then listen to the result.
@Callmephil’s repository uses an HTTP POST to get the audio data from OpenAI, while my walkie-talkie uses a WS, but it could be rewritten to use other methods.

It will probably be possible to implement adding audio data with the Opus codec, which would make the data transfer cheaper.
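
To give a rough idea of the savings: raw s16le mono PCM at 16 kHz is 16,000 samples/s × 2 bytes = 32 kB/s (256 kbps), while Opus handles speech well at roughly 24–32 kbps, so on the order of 8–10x less data.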

I hope I answered your question!


Thank you @MarcoB, you addressed every point… I just need to read more on the subject :smiley: but you know I already use your libs!


Yes I know and I’m honored!! :slightly_smiling_face:

One disadvantage for now is that the streamed data is all saved in memory, so continuous streaming is not possible at the moment.


I just discovered https://livekit.io/, in case someone is more inclined to use a third party. I’m looking forward to seeing an Opus implementation for streaming less data :slight_smile:
