Openai_ex - OpenAI API client library

I have now added support for streaming responses for Completion and ChatCompletion endpoints.

At this point, the entire current OpenAI API is supported, and the library has somewhat stabilized. So I’ve bumped the minor version to 0.2.

Keep the feedback and suggestions coming! Shoutout to @hubertlepicki for pointing me at multipart, and @neilberkman and @zachallaun for their samples on how to stream. Those ideas have already made it into the library.

Some of the background information on langchain, prompt engineering, etc. has also been very informative.

Many thanks :pray::pray::pray: to everyone.

4 Likes

Thanks for the pointer. It was a short but interesting series of videos.

As an exercise, I translated the Python notebook from Lesson 8 of the course, which builds a restaurant OrderBot.

The Elixir/Kino Livebook is available at Deeplearning.AI Order Bot — openai_ex v0.2.0.

It’s really quite remarkable that you can build something that can carry on complex interactions with such a simple set of instructions.

Warning: I ran into a bunch of rate limits while running the notebook (likely because I have a free account). There's a commented-out IO.inspect that will show you what's happening if you're getting errors.

1 Like

I have added a new Livebook demonstrating the details of streaming chat completion. It is functionally the same as Deeplearning.AI Order Bot — openai_ex v0.2.0, but using the streaming version of the API.

The sample is at Streaming Orderbot — openai_ex v0.2.0 and can also be deployed as a Livebook app.

I was pleasantly surprised by how easy it was to integrate the streaming API into Livebook, using Kino to create the UI.

Going by my own user experience, I think the streaming API is pretty much a requirement in any production scenario, and I now understand why it’s even there. I’m happy to provide this sample to help others get started with it.
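For anyone curious, the Kino side boils down to something like this (a minimal sketch; `chat_stream` stands in for an enumerable of content deltas and is not part of the openai_ex API):

```elixir
# Minimal sketch: render streamed completion text into a Livebook UI with Kino.
# `chat_stream` is a placeholder for an enumerable of content deltas (strings);
# it is not part of the openai_ex API.
frame = Kino.Frame.new()
Kino.render(frame)

chat_stream
|> Enum.reduce("", fn delta, acc ->
  text = acc <> delta
  Kino.Frame.render(frame, Kino.Markdown.new(text))
  text
end)
```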

1 Like

I feel this version works well. Nice! I'm taking inspiration from it.

(Actually I got Finch errors with your previous version - I do have a paid OpenAI account)

Regarding how Elixir can "integrate" (loosely) with the broader set of LLM tools out there:

  • Integrating with Langchain - Langchain is a set of tools for working with LLMs that provides some common APIs, but those APIs are changing really, really, really fast. I could see an Elixir wrapper happening at some point, but right now it's way too early. To say "integrate with Langchain" or "support Langchain" is too broad to be meaningful, at least at this point.
  • Integration with "other LLMs" - I would love to have some way to integrate local LLMs into Elixir projects. Instead of focusing on Langchain as an integration target, it probably makes more sense to look at Hugging Face's "transformers" library, which is what most LLMs use. Transformers is pretty huge, though, and implementing it in Elixir is probably a bad idea.
  • Integration with Python in general - Python is so different from Elixir. Elixir can make a bunch of guarantees because it's functional, uses the Actor model, is based on Erlang, etc. Python is notorious for its issues with concurrency, memory management, etc. (forgive me if this is a naive evaluation). Trying to bridge the gap is probably also a losing proposition.

So, what’s left?

Kinda exactly what openai_ex already is - an API for integrating with OpenAI-compatible APIs (both standard HTTP and streaming). Since OpenAI is the de facto standard, Langchain supports OpenAI's API, and most new projects (AutoGPT, AgentGPT, BabyAGI, etc.) all integrate with OpenAI first and foremost. I think very soon we'll see more projects (text-generation-webui, etc.) providing OpenAI-compatible APIs for HTTP and streaming. There's already GitHub - hyperonym/basaran, an open-source alternative to the OpenAI text completion API that provides a compatible streaming API for Hugging Face Transformers-based text generation models, which allows you to run LLMs locally. So,

Probably your best bet for using Elixir with local LLMs is to use openai_ex to talk to a Basaran server, and run your models under Basaran.
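To illustrate at the wire level, here is a hedged sketch using Req directly against a local OpenAI-compatible server; the URL, port, and model name are assumptions, not verified Basaran defaults:

```elixir
# Sketch: talking to a local OpenAI-compatible server (e.g. Basaran) over HTTP.
# The URL, port, and model name below are assumptions, not documented values.
Req.post!(
  "http://localhost:8080/v1/completions",
  json: %{model: "bigscience/bloomz-560m", prompt: "Hello", max_tokens: 16},
  headers: [{"authorization", "Bearer not-needed-locally"}]
)
```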

I checked the documentation, and it appears the rate limits apply to paid accounts as well. Most likely that was what caused the problem. Uncommenting the IO.inspect will show you the error JSON, so you can verify whether that was the case.

I’m happy you’re getting some use out of the example.

@tensiondriven I largely agree with what you’re saying.

I’m trying to figure out what other functionality belongs in this library, or whether it’s ok as is. The current incarnation is a very thin wrapper around endpoint calls.

There's room for higher-level support, such as utilities to pick out pieces of content (especially from streams), keep track of token counts, rate limits, and cost estimates, possibly break long requests into batches, and so on. These are the kinds of things that I think @darwin67 was referring to earlier in his mentions of langchain.
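As one example of the kind of utility I mean, here is a rough sketch of a content picker for streamed chunks (it assumes the chunks are already decoded into maps in the documented chat-completion streaming shape; nothing like this exists in the library yet):

```elixir
# Rough sketch: pull the text deltas out of decoded chat-completion chunks.
# Assumes each chunk is a map in the documented streaming response shape.
defmodule ContentPicker do
  def content_deltas(chunks) do
    chunks
    |> Stream.flat_map(fn %{"choices" => choices} -> choices end)
    |> Stream.map(fn choice -> get_in(choice, ["delta", "content"]) end)
    |> Stream.reject(&is_nil/1)
  end
end
```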

Whether to include that in this library, provide it as another library, or just sample notebooks is an open question. I’m leaning towards including frequently used functionality in the library, and more ad-hoc use cases as sample notebooks. Anyone have any thoughts on this?

@zachallaun one feature (that you alluded to in your code) that would indeed be handy is a way to cancel a currently executing streaming request. For real apps, I think it would be a requirement to allow the user to stop completion generation partway through.

I don’t know if you were planning on adding it. I could take a crack at it, but given the elegance of your PR code, you would likely come up with something much prettier :slight_smile:

The PR to Req is just to support the option that would allow a custom plugin to support streaming, but the FinchStream module is just an example. If/when the PR is merged, I will likely work on a ReqStream package, but I haven’t put too much thought into it yet.

That’s all to say: you’re welcome to add/change whatever you see fit, and I’m happy to help where possible, but don’t have immediate plans.

It looks like the way to abort a request is to throw inside the stream function; see here.
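Roughly, that pattern looks like this (a sketch, not the library's implementation; `url`, `headers`, `body`, `MyFinch`, and `cancelled?/0` are hypothetical placeholders):

```elixir
# Sketch: abort an in-flight Finch stream by throwing from the callback.
# `url`, `headers`, `body`, `MyFinch`, and `cancelled?/0` are placeholders.
request = Finch.build(:post, url, headers, body)

try do
  Finch.stream(request, MyFinch, [], fn
    {:data, chunk}, acc ->
      if cancelled?(), do: throw(:cancelled), else: [chunk | acc]

    _status_or_headers, acc ->
      acc
  end)
catch
  :cancelled -> :aborted
end
```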

1 Like

If you need streaming, see Request/Response streaming for Finch adapter, SSE middleware by teamon · Pull Request #540 · elixir-tesla/tesla · GitHub

If you give it a shot and confirm it’s working fine I’d be happy to merge it and release a new version of tesla.

1 Like

Thanks for the pointer. Yours is (naturally) a much more general solution for SSE events than what I put together specifically for OpenAI.

What I’m looking for at the moment is a way to allow user cancellation while the request is streaming. I couldn’t see an easy way to do that in your PR. Am I missing something?

I know just about enough of Elixir and Python to agree with you :slight_smile: But there are some interesting developments in that space that gave me pause for thought on this idea.

Thanks for your work on this… I'm very much looking forward to using it 'in anger' ASAP.

1 Like

@restlessronin FYI GitHub - microsoft/guidance: A guidance language for controlling large language models.

3 Likes

@neilberkman thanks for the heads up. My intuition is that this may be the way to go for “real” apps, rather than the langchain approach of composable “models”.

1 Like

Released v0.2.1 with an increased timeout for the HTTP request, to address Hitting Finch Timeout · Issue #48 · restlessronin/openai_ex · GitHub

I have just released v0.2.2 with a fix for the streaming Completion implementation. The parsing routine made assumptions about the format of the SSE data which turned out to be incomplete in some (heretofore rare :wink:) situations. It has now been fixed.
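For context, this is roughly the kind of parsing involved (a simplified sketch, not the library's actual code):

```elixir
# Simplified sketch of SSE data parsing (not the library's actual code):
# take the `data:` lines of each event, stop at [DONE], decode the rest as JSON.
defmodule SseSketch do
  def parse(body) do
    body
    |> String.split("\n\n", trim: true)
    |> Enum.flat_map(&String.split(&1, "\n", trim: true))
    |> Enum.filter(&String.starts_with?(&1, "data:"))
    |> Enum.map(&(&1 |> String.trim_leading("data:") |> String.trim()))
    |> Enum.reject(&(&1 == "[DONE]"))
    |> Enum.map(&Jason.decode!/1)
  end
end
```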

The streaming Completion samples in the User Guide Livebook and in the Completions Bot Livebook, both of which broke when "text-davinci-003" was deprecated, are working again.

1 Like

I have released v0.2.3 to gracefully handle errors during streaming (chat) completion requests. Basically, an empty stream is returned and the logger records a warning. Since the use case is Livebook, the user immediately sees the logger warning in the Debug output pane.
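The pattern, roughly (a sketch of the behavior described above, not the actual library code; `stream_result` is a placeholder):

```elixir
# Sketch of the pattern (not the actual library code): on an API error, log a
# warning and hand the caller an empty stream so downstream code keeps working.
require Logger

case stream_result do
  {:ok, chunk_stream} ->
    chunk_stream

  {:error, reason} ->
    Logger.warning("streaming request failed: #{inspect(reason)}")
    []
end
```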

This is a fix for Bug: Unhandled streaming API Error when max tokens are exceeded. · Issue #50 · restlessronin/openai_ex · GitHub

1 Like

I have released v0.3.0 to fix Add functions to the chat_completion api_fields list · Issue #53 · restlessronin/openai_ex · GitHub. Thanks to @TomBers for pointing it out. I also added a section on function calling to the user guide.
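For reference, the function-calling request fields look roughly like this (the shape mirrors the OpenAI API documentation; the model name and the specific function are just illustrative):

```elixir
# Sketch of the function-calling request fields (mirrors the OpenAI API docs);
# this map would be sent as the chat completion request body.
chat_req = %{
  model: "gpt-3.5-turbo",
  messages: [%{role: "user", content: "What's the weather in Boston?"}],
  functions: [
    %{
      name: "get_current_weather",
      description: "Get the current weather for a given city",
      parameters: %{
        type: "object",
        properties: %{city: %{type: "string"}},
        required: ["city"]
      }
    }
  ]
}
```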

I took the opportunity to sync to the latest (as of Oct 27, 2023) API reference. This involved dropping the deprecated Edit endpoint and replacing the deprecated FineTune endpoint with the FineTuning.Job endpoint.

I also fixed a couple of overlooked bugs in endpoints that I wasn't really using myself.

Suggestions for improvement and PRs are always welcome.

1 Like

I have just published v0.4.0 with all the new and updated API features announced at the OpenAI DevDay.

In particular, the beta Assistants API is implemented, although examples have not yet made it into the user guide and the documentation is incomplete in places. I hope to have some examples soon, and perhaps more complete documentation as the beta stabilizes.

In addition, the deprecated 'function' call parameters in the chat completion API have been replaced with the new 'tool'-based parameters (shoutout to @TomBers for doing that, helping out with the documentation and testing, and generally being on the ball).
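The replacement shape nests each function under a `tools` entry (again a sketch mirroring the OpenAI API docs, not library-specific code; the model name is illustrative):

```elixir
# Sketch of the 'tool'-based request fields that replace 'functions'
# (mirrors the OpenAI API docs; not library-specific code).
chat_req = %{
  model: "gpt-3.5-turbo-1106",
  messages: [%{role: "user", content: "What's the weather in Boston?"}],
  tools: [
    %{
      type: "function",
      function: %{
        name: "get_current_weather",
        parameters: %{
          type: "object",
          properties: %{city: %{type: "string"}},
          required: ["city"]
        }
      }
    }
  ],
  tool_choice: "auto"
}
```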

Enjoy playing with the new functionality!

5 Likes

Great work, especially given the speed at which the docs and API are changing. :mechanical_arm:

1 Like