I have just published v0.4.1 with
- A first cut at adding all the new beta api calls to the user guide.
- DALL-E-3 support in the image API.
- Changes to the FQN for some modules (although I did not bump the major version).
If there’s anything missing or not working as it should, please file an issue (or better yet, a PR).
Still catching up with the new features.
I just published v0.4.2
- Added the Text-To-Speech endpoint
- Changed the FQN of the Audio and Image functions to match the equivalent Python functions (and yet, I left the major and minor versions unchanged).
- Updated the docs.
Has anybody built on top of the new assistants/threads API yet? What’s the sequence of calls you should be making to get the threads use case working? It’s not linear anymore, since it requires you to periodically check whether a run is complete. OpenAI can now suggest that you make multiple function calls, and these can be executed in parallel. Would using a GenServer for every thread, which keeps track of all this state and acts as a bridge between UI ↔ Phoenix App ↔ OpenAI, be a good idea?
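To make the idea concrete, here’s a rough sketch of such a per-thread GenServer. The polling loop and parallel tool execution are the point; all of the API-facing helpers at the bottom (`retrieve_run/1`, `required_tool_calls/1`, etc.) are hypothetical placeholders, not actual openai_ex functions.

```elixir
defmodule ThreadRunner do
  # Sketch: one GenServer per assistant thread, polling until the run
  # completes and executing any requested tool calls in parallel.
  use GenServer

  @poll_interval_ms 1_000

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(args) do
    schedule_poll()
    {:ok, args}
  end

  @impl true
  def handle_info(:poll, state) do
    case retrieve_run(state) do
      %{"status" => "completed"} = run ->
        notify_ui(state, run)
        {:stop, :normal, state}

      %{"status" => "requires_action"} = run ->
        # the model may request several tool calls; run them in parallel
        run
        |> required_tool_calls()
        |> Task.async_stream(&execute_tool_call/1)
        |> Enum.to_list()

        schedule_poll()
        {:noreply, state}

      _in_progress ->
        schedule_poll()
        {:noreply, state}
    end
  end

  defp schedule_poll, do: Process.send_after(self(), :poll, @poll_interval_ms)

  # ---- placeholders: wire these to the actual beta endpoints / your UI ----
  defp retrieve_run(_state), do: %{"status" => "in_progress"}
  defp required_tool_calls(_run), do: []
  defp execute_tool_call(_call), do: :ok
  defp notify_ui(_state, _run), do: :ok
end
```

One process per thread keeps the run state isolated and lets a LiveView subscribe to just its own thread’s updates.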
I’ve been hearing good things about the Zephyr model created by HF, and was curious whether something that capable could be run locally on my machine.
Since there are multiple apps that proxy the OpenAI API to other models (including local models), this seems to be quite straightforward. The only thing that needs to change is the API base URL.
I have done this now with the function OpenaiEx.with_base_url/2, which parameterizes the API base URL. I tested it against the llama-cpp-python library, and the non-streaming versions of the completion and chat-completion APIs worked on the first try. The streaming versions seem to be causing some difficulty still. I will try the streaming calls against another proxy when I get a chance, to check whether it’s a bug in the proxy.
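For anyone wanting to try the same thing, the setup looks roughly like this. The base URL, API key, and model name are placeholders for whatever your local proxy expects:

```elixir
# Point the client at a local proxy that speaks the OpenAI API.
# Most local proxies ignore the API key, but the client still wants one.
openai =
  "sk-ignored-by-local-proxy"
  |> OpenaiEx.new()
  |> OpenaiEx.with_base_url("http://localhost:8000/v1")

chat_req =
  OpenaiEx.ChatCompletion.new(
    model: "zephyr-7b-beta",
    messages: [OpenaiEx.ChatMessage.user("Say hello in one sentence.")]
  )

response = OpenaiEx.ChatCompletion.create(openai, chat_req)
```

Everything else in the request pipeline stays exactly as it would for the hosted OpenAI endpoint.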
My current dev machine is a Mac with 8 GB of RAM, and it turns out that’s not enough to run a 7B model as well as a Docker dev container (and assorted other apps). I suspect 16 GB would work.
If anyone wants to try local LLMs via an OpenAI API proxy, please give the library a whirl and let me know what you think.
I have not published this to Hex yet, so the mix entry has to point to the GitHub main branch for now.
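Concretely, that mix entry would look something like this (the repo path and branch name here are assumptions; use the project’s actual repository):

```elixir
# mix.exs — pending a Hex release, point at the GitHub main branch.
defp deps do
  [
    {:openai_ex, github: "restlessronin/openai_ex", branch: "main"}
  ]
end
```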
Thanks for the work on this library. I have a couple of questions.
- Do you think openai_ex is a good fit for handling requests in a production app, instead of just Livebooks?
- Is exponential backoff supported via Finch, to handle rate-limiting issues?
@arton sorry for the delay, I’m not on the forum every day, so I didn’t realize you had asked the questions. Here are my 2 cents.
- I don’t see why it couldn’t be used for production. It’s just a very thin wrapper over the HTTP JSON API. If there are changes that need to be made, I’m happy to make them. For instance, there may need to be some way to handle Finch pools, but these design decisions are best driven by an actual use case, so I haven’t tried to add them ex ante.
- The library doesn’t do any exponential backoff. I’d prefer to leave those kinds of decisions to the library user and/or whatever Finch does.
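If you do want backoff on the caller’s side, a minimal sketch might look like this. Nothing like it is built into openai_ex; this simply retries on raised errors, doubling the delay each attempt:

```elixir
defmodule Backoff do
  # Caller-side exponential backoff: retry `fun` up to `attempts` times,
  # sleeping `delay_ms` before the first retry and doubling it each time.
  def retry(fun, attempts \\ 5, delay_ms \\ 500)

  def retry(fun, 1, _delay_ms), do: fun.()

  def retry(fun, attempts, delay_ms) when attempts > 1 do
    fun.()
  rescue
    _error ->
      Process.sleep(delay_ms)
      retry(fun, attempts - 1, delay_ms * 2)
  end
end

# usage, wrapping any openai_ex call:
# Backoff.retry(fn -> OpenaiEx.ChatCompletion.create(openai, chat_req) end)
```

A production version would want to match only on rate-limit responses (HTTP 429) rather than rescuing everything, and add jitter.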