Hello, I just published openai_responses, a very simple wrapper around OpenAI’s new Responses API. From what I understand, this is what they want developers to use going forward, and the old Chat Completions API is now considered “legacy”.
Granted, it’s v0.1.0, so bugs and rough edges are to be expected at this point.
Here is an X thread with some usage examples. Please let me know what you think!
@vkryukov Nice! Any reason you didn’t use the openai_ex client (disclaimer: I’m the maintainer)? I added the Responses endpoint a few days ago (although I see from the documentation that they’ve already updated some things there, so I’ll need to do one more pass over it).
@restlessronin The main reason was that I needed something quick, and I assumed that the major libraries (such as LangChain, which I use in production) would take a while to implement it. I did check openai_ex’s GitHub homepage, but since it didn’t mention Responses, I assumed it wasn’t implemented yet (and I failed to check the git log).
But I also wanted something lightweight, in the “SDKs with Req” fashion. For example, I use @brainlid’s LangChain in production because it supports many providers (OpenAI, Anthropic, Google, Groq, xAI, and many others) with just a parameter change, and it is mature and well tested; but the simplest usage example is something like this (adapted from the LangChain README; the exact return shape varies a bit across versions):
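```elixir
alias LangChain.Chains.LLMChain
alias LangChain.ChatModels.ChatOpenAI
alias LangChain.Message

# Build a chain, add a user message, and run it against OpenAI.
{:ok, updated_chain} =
  %{llm: ChatOpenAI.new!(%{model: "gpt-4o"})}
  |> LLMChain.new!()
  |> LLMChain.add_message(Message.new_user!("Hello!"))
  |> LLMChain.run()

updated_chain.last_message.content
```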
Or another example (and this is not a ding at LangChain): to get the number of tokens used, you need to define a callback function. I understand how that might be useful in some contexts, but it can also be a bit cumbersome in others. Roughly like this (the on_llm_token_usage callback and its arity follow the LangChain docs, though both have shifted between versions):
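```elixir
alias LangChain.Chains.LLMChain
alias LangChain.ChatModels.ChatOpenAI
alias LangChain.Message

# Handler map: on_llm_token_usage fires when the model reports usage data.
# (Callback name/arity per the LangChain docs; check your installed version.)
handler = %{
  on_llm_token_usage: fn _model, usage ->
    IO.inspect(usage, label: "token usage")
  end
}

{:ok, _chain} =
  %{llm: ChatOpenAI.new!(%{model: "gpt-4o", callbacks: [handler]})}
  |> LLMChain.new!()
  |> LLMChain.add_message(Message.new_user!("Hello!"))
  |> LLMChain.run()
```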
I found that I almost always end up creating simple wrappers anyway, so I wanted to design a new library from scratch, without any legacy baggage (like the need to support Chat Completions or other providers), to be simple and delightful to use.
And, last but not least, it was an excuse to try Claude Code. I am very satisfied with the result of this experiment: it can create something quite useful with minimal guidance.
Here’s my subjective experience comparing Claude Code to Cursor, which I use daily as my main tool. (I say “subjective” because, even when the underlying models are the same, like Claude 3.7 Sonnet, these tools behave differently in ways that are hard to measure.)
Claude Code feels a bit smarter and gets to the “right answer” more quickly, with fewer revisions. In my opinion, it’s about 2-3 times faster, based on the time from when I give a prompt to when I get a mostly working solution.
The trade-off is cost. I suspect that, as in the early days of Uber or Lyft, venture capital money is being spent to keep prices low for AI code editors. For example, I spent around $5 on Claude Code credits (mostly trying to get streaming to work; more on that later). At Cursor’s $20 monthly subscription for 500 fast requests, that’s the equivalent of 125 requests. If I’d done the same task in Cursor, I probably wouldn’t have used more than 10-15 requests, a huge difference. (Also, someone on X recommended trae.ai, which is currently free because ByteDance is paying for your tokens.)
I had two main challenges when creating openai_responses:
Streaming did not work initially: Claude didn’t know about Req’s :into parameter and hallucinated that it needed hackney to make it work. The solution, after many attempts (including asking Grok 3 for help), was to just drop in instructor_ex’s source file that implements streaming and tell Claude, “do it this way” (see the sketch after this list).
The Kino.Frame streaming example in Livebook was originally wrapped in an extra spawn and didn’t work (some weird interaction between Elixir processes, I guess). Neither Claude nor Grok knew how to fix it until I decided to just remove the enclosing spawn, which was not actually needed.
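On the first point, here is a minimal sketch of Req-native streaming, assuming the standard Responses endpoint and payload shape (SSE event parsing is omitted):

```elixir
Mix.install([{:req, "~> 0.5"}])

api_key = System.fetch_env!("OPENAI_API_KEY")

Req.post!("https://api.openai.com/v1/responses",
  auth: {:bearer, api_key},
  json: %{model: "gpt-4o-mini", input: "Say hello!", stream: true},
  # Req hands each raw SSE chunk to this fun as it arrives; no hackney needed.
  into: fn {:data, chunk}, {req, resp} ->
    IO.write(chunk)
    {:cont, {req, resp}}
  end
)
```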
Also, the examples it wrote for the Livebook tutorial were overly complex; that’s the only part of the library I decided to write myself.
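For comparison, a frame-streaming cell can be as simple as this, running directly in the evaluating process with no extra spawn (the list of deltas is a stand-in for the library’s output, not its actual API):

```elixir
# Livebook cell: render accumulating text into a frame as chunks arrive.
frame = Kino.Frame.new() |> Kino.render()

# Stand-in for a stream of text deltas coming from the API.
deltas = ["Once ", "upon ", "a ", "time..."]

deltas
|> Enum.reduce("", fn delta, text ->
  text = text <> delta
  Kino.Frame.render(frame, Kino.Markdown.new(text))
  text
end)
```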
btw, I suspect that the Enum should be Stream in your tutorial, otherwise it will convert everything into a list before iterating and thus you won’t observe any streaming behavior.
My bad. I should have done a better job of setting up some kind of changelog.
Fair point. Perhaps at some point OpenAI will start doing this and provide multiple lightweight SDKs themselves. In the meantime, my goal was to mirror the complete OpenAI SDK in as lightweight a manner as possible.
Based on user feedback and PRs, openai_ex has layered on functionality that I myself was not using or testing (Azure support, Finch pools, local LLM streaming tweaks, Portkey support, deviations from the SSE standard, API key log redaction, etc.). A lot of knowledge from actual use has been baked into the library at this point.
OTOH, it’s unclear if any of this will be important for the Responses API, so perhaps it is a good decision to keep it separate and lightweight.
Another option might be to use openai_ex to do the actual call and have a library that layers functionality on top, such as your text_deltas function. I considered adding these helpers at some point but decided that I didn’t understand the individual use cases well enough to determine what was appropriate for everyone.
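For concreteness, such a layered helper might look something like this (a sketch only: the event type follows OpenAI’s published SSE format, and the exact events openai_ex yields may differ):

```elixir
defmodule ResponsesHelpers do
  # Hypothetical layering: openai_ex performs the HTTP call and yields a
  # stream of decoded SSE events; helpers like this refine that stream.

  # Keep only the text-delta events and extract their payloads.
  def text_deltas(event_stream) do
    event_stream
    |> Stream.filter(&(&1["type"] == "response.output_text.delta"))
    |> Stream.map(& &1["delta"])
  end
end
```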
Ah no. It works correctly. Enum works on enumerables, not lists. In any case, the user guide / tutorial is basically my test suite. It gets run after every change to the API, so something like this would get caught pretty early. Doing it this way also ensures that the documentation is always up to date with the library.
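A quick way to see this: Enum consumes a lazy stream one element at a time, interleaving production and consumption instead of materializing a list first:

```elixir
1..3
|> Stream.map(fn i ->
  IO.puts("producing #{i}")
  i
end)
|> Enum.each(&IO.puts("consuming #{&1}"))
# Prints: producing 1, consuming 1, producing 2, consuming 2, ...
```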
Thanks for taking the time to recount your experience in such detail.
As another data point about workflow/costs for those who are interested:
I tend to use Claude Desktop for most of my coding work, and that’s a fixed-price subscription. Once in a while I get rate-limited and have to take a break, so I switch to Grok 3 (which is pretty good) and Gemini 2.0 Pro (much improved; the first genuinely useful Google model), also in the chat window (hence free for the moment). No MCP, so it’s a little less fluid.
I also have API keys for OpenRouter (I use their chat window as well), Claude, and OpenAI (for use in Zed or Cline). I don’t like AI IDE interfaces in general, and they seem to burn through credits much faster, in ways I don’t understand.
My initial impression of Claude Code is that it enables more agentic workflows, where it goes off and does a bunch of stuff for you. I still have to get used to the UI, though. I’m not crazy about the cost, though it seems, if anything, a little lower than using an API key via the IDEs. I have the impression that not all IDEs use cached context (= lower price), but that’s just speculation.