I have now added support for streaming responses for Completion and ChatCompletion endpoints.
At this point, the entire current OpenAI API is supported, and the library has somewhat stabilized. So I’ve bumped the minor version to 0.2.
Keep the feedback and suggestions coming! Shoutout to @hubertlepicki for pointing me at multipart, and @neilberkman and @zachallaun for their samples on how to stream. Those ideas have already made it into the library.
Some of the background information on langchain, prompt engineering, etc. has also been very informative.
It’s really quite remarkable that you can build something that can carry on complex interactions with such a simple set of instructions.
Warning: I ran into a bunch of rate limit errors while running the notebook (likely because I have a free account). There's a commented-out IO.inspect call that will show you what's happening if you're getting errors.
I have added a new Livebook demonstrating the details of streaming chat completion. It is functionally the same as the Deeplearning.AI Order Bot from openai_ex v0.2.0, but uses the streaming version of the API.
I was pleasantly surprised by how easy it was to integrate the streaming api into Livebook, using Kino to create the UI.
Going by my own user experience, I think the streaming API is pretty much a requirement in any production scenario, and I now understand why it exists. I'm happy to provide this sample to help others get started with it.
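To give a sense of how simple the Kino side of this is, here is a minimal sketch of rendering streamed tokens into a Livebook frame. The `fetch_stream/1` function is a hypothetical stand-in for the library's streaming call (not its actual API); only the `Kino.Frame` and `Kino.Markdown` calls are real Kino functions.

```elixir
# Sketch: accumulate streamed tokens and re-render a Kino frame as they arrive.
# `fetch_stream/1` is a placeholder for the actual streaming completion call,
# assumed to yield the token text as strings.
frame = Kino.Frame.new()
Kino.render(frame)

"Tell me a joke"
|> fetch_stream()
|> Enum.reduce("", fn token, text_so_far ->
  text = text_so_far <> token
  # Replace the frame's contents with the text accumulated so far.
  Kino.Frame.render(frame, Kino.Markdown.new(text))
  text
end)
```

The reduce keeps the full text as the accumulator, so each render shows the complete response so far rather than just the latest token.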
Regarding how Elixir can "integrate" (loosely) with the broader set of LLM tools out there:
Integrating with Langchain - Langchain is a set of tools for working with LLMs that provides some common APIs, but those APIs are changing really, really fast. I could see an Elixir wrapper happening at some point, but right now it's way too early. To say "integrate with Langchain" or "support Langchain" is too broad to be meaningful, at least at this point.
Integration with "other LLMs" - I would love to have some way to integrate local LLMs into Elixir projects. Instead of focusing on Langchain as an integration target, it probably makes more sense to look at Hugging Face's "transformers" library, which is what most LLMs use. Transformers is pretty huge, though. Reimplementing Transformers in Elixir is probably a bad idea.
Integration with Python in general - Python is so different from Elixir. Elixir can make a bunch of guarantees because it's functional, uses the Actor model, is based on Erlang, etc. Python is notorious for problems with concurrency, memory management, etc. (Forgive me if this is a naive evaluation.) Trying to bridge the gap is probably also a losing proposition.
I checked the documentation, and it appears the rate limits apply to paid accounts as well. Most likely that was causing the problem. Uncommenting the IO.inspect call will show you the error JSON, so you can verify whether that was the case.
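If it helps, here is a minimal sketch of surfacing that error JSON instead of silently dropping it. The `response` variable is assumed to be the decoded response body as a map; this is illustrative, not the library's internal code.

```elixir
# Sketch: pattern-match the decoded OpenAI response body and print any
# error payload (rate-limit errors show up here) before continuing.
case response do
  %{"error" => error} ->
    IO.inspect(error, label: "openai error")
    response

  ok ->
    ok
end
```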
I’m happy you’re getting some use out of the example.
I’m trying to figure out what other functionality belongs in this library, or whether it’s ok as is. The current incarnation is a very thin wrapper around endpoint calls.
There's room for higher-level support, such as utilities to pick out pieces of content (especially from streams), keep track of token counts, rate limits, cost estimates, etc., and possibly to break long requests into batches, and so on. These are the kinds of things that I think @darwin67 was referring to earlier in his mentions of Langchain.
Whether to include that in this library, provide it as another library, or just sample notebooks is an open question. I’m leaning towards including frequently used functionality in the library, and more ad-hoc use cases as sample notebooks. Anyone have any thoughts on this?
@zachallaun one feature (that you alluded to in your code) that would indeed be handy would be a way to cancel a currently executing streaming request. For real apps, I think it would be a requirement to allow the user to stop completion generation part way through.
I don't know if you were planning on adding it. I could take a crack at it, but given the elegance of your PR code, you would likely come up with something much prettier.
The PR to Req is just to support the option that would allow a custom plugin to support streaming, but the FinchStream module is just an example. If/when the PR is merged, I will likely work on a ReqStream package, but I haven’t put too much thought into it yet.
That’s all to say: you’re welcome to add/change whatever you see fit, and I’m happy to help where possible, but don’t have immediate plans.
It looks like the way to abort a request is to throw inside the stream function, see here.
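As a self-contained illustration of the throw-to-abort pattern (nothing here is specific to openai_ex or the Req PR), consuming a stream can be halted partway by throwing a value and catching it outside:

```elixir
# Abort consumption of an (in principle infinite) stream via throw/catch.
result =
  try do
    Stream.iterate(0, &(&1 + 1))
    |> Enum.reduce([], fn n, acc ->
      # Bail out once we've seen enough elements.
      if n >= 5, do: throw({:halted, Enum.reverse(acc)})
      [n | acc]
    end)
  catch
    {:halted, acc} -> acc
  end

# result is [0, 1, 2, 3, 4]
```

In a real streaming request the throw would also need to tear down the underlying connection, which is the part the PR machinery handles.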
Thanks for the pointer. Yours is (naturally) a much more general solution for SSE events than what I put together specifically for OpenAI.
What I’m looking for at the moment is a way to allow user cancellation while the request is streaming. I couldn’t see an easy way to do that in your PR. Am I missing something?
I know just about enough of Elixir and Python to agree with you, but there are some interesting developments in that space that gave me pause for thought on this idea.
Thanks for your work on this! I'm very much looking forward to using it "in anger" ASAP.
@neilberkman thanks for the heads up. My intuition is that this may be the way to go for “real” apps, rather than the langchain approach of composable “models”.
I have just released v0.2.2 with a fix for the streaming Completion implementation. The parsing routine made assumptions about the format of the SSE data which turned out to be incomplete in some (heretofore rare) situations. It has now been fixed.
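For anyone curious about the class of bug, the gotcha with SSE is that HTTP chunks don't align with event boundaries, so a parser must buffer partial data across chunks. Here is a simplified sketch of that idea (the `data:` field name and `[DONE]` sentinel follow the SSE convention OpenAI uses, but this is not the library's actual parser):

```elixir
defmodule SseSketch do
  # Parse one incoming chunk, carrying over any incomplete trailing event.
  # Returns {list_of_data_payloads, leftover_buffer}.
  def parse(buffer, chunk) do
    parts = String.split(buffer <> chunk, "\n\n")
    # The last split segment may be an incomplete event; keep it buffered.
    {complete, [rest]} = Enum.split(parts, length(parts) - 1)

    payloads =
      complete
      |> Enum.flat_map(&String.split(&1, "\n"))
      |> Enum.filter(&String.starts_with?(&1, "data: "))
      |> Enum.map(&String.replace_prefix(&1, "data: ", ""))
      |> Enum.reject(&(&1 == "[DONE]"))

    {payloads, rest}
  end
end
```

A parser that assumes each chunk is exactly one complete `data:` line works most of the time, which is exactly why this kind of bug only shows up in rare situations.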
The streaming Completion samples in the User Guide Livebook and the Completions Bot Livebook, both of which broke when "text-davinci-003" was deprecated, are working again.
I have released v0.2.3 to gracefully handle errors during streaming (chat) completion requests. Basically, an empty stream is returned and the logger records a warning. Since the use case is Livebook, the user immediately sees the logger warning in the Debug output pane.
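The shape of that behavior can be sketched roughly as follows. The module and function names here are illustrative only, not the library's actual internals:

```elixir
defmodule StreamFallback do
  require Logger

  # On success, hand back the stream; on error, warn and return an
  # empty enumerable so downstream code can consume it unchanged.
  def handle({:ok, stream}), do: stream

  def handle({:error, reason}) do
    Logger.warning("streaming request failed: #{inspect(reason)}")
    []
  end
end
```

Returning an empty enumerable rather than raising means existing notebook code that pipes the stream into `Enum` functions keeps working, with the warning visible in Livebook's Debug pane.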
I took the opportunity to sync to the latest (as of Oct 27, 2023) API reference. This involved dropping the deprecated Edit endpoint and replacing the deprecated FineTune endpoint with the FineTuning.Job endpoint.
I also fixed a couple of overlooked bugs in endpoints that I wasn't really using myself.
Suggestions for improvement and PRs are always welcome.
In particular, the beta Assistants API is implemented, although examples have not yet made it into the user guide and the documentation is incomplete in places. I hope to have some examples soon, and perhaps more complete documentation as the beta stabilizes.
In addition, the deprecated 'function' call parameters in the chat completion API have been replaced with the new 'tool' based parameters (shoutout to @TomBers for doing that, helping out with the documentation and testing, and generally being on the ball).