Openai_responses - A client for OpenAI's new Responses API

Hello, I just published openai_responses, a very simple wrapper around OpenAI’s new Responses API. From what I understand, this is what they want developers to use going forward, and the old Chat Completions API is now considered “legacy”.

Granted, it’s v0.1.0, so bugs and rough edges are to be expected at this point.

Here is an X thread with some usage examples. Please let me know what you think!


Why did I create yet another Elixir library for working with LLMs?

I have to confess: I initially developed OpenAI.Responses just to explore the (then newly released) Responses API and experiment with Elixir code generation using LLM agents (Claude Code with Sonnet 3.7 and Cursor + Sonnet 3.7). Since then, I’ve refined it, releasing version 0.4.0 with improved API wrapping.

Ecosystem fragmentation is a known issue, especially in smaller communities like Elixir. So why create another library instead of using existing ones? Two reasons:

  1. Focus on Cutting-Edge APIs: I target OpenAI’s latest, advanced API, prioritizing innovation over supporting a broad range of LLM providers.
  2. Minimalist SDK Approach: Inspired by Dashbit’s SDK philosophy, I aim for minimal abstraction, avoiding heavy frameworks.

No existing solution aligns with these goals, justifying a new library.

Existing Solutions

Four notable libraries exist for LLM integration in Elixir (GitHub stars indicate relative popularity):

  1. LangChain (897 GitHub stars): The most popular, supporting numerous providers with a unified abstraction. Example:

    LLMChain.add_message(Message.new_user!("Where is the hairbrush located?"))
    

    It smooths out provider differences but prioritizes broad compatibility over advanced features. Responses API support is in progress.

  2. Instructor (720 GitHub stars): Unique for enabling structured outputs via Ecto schemas. Revolutionary 1.5 years ago, it’s less critical now that OpenAI and Gemini natively support structured outputs with JSON schema compliance. That said, Anthropic’s support for structured output is inconsistent and smaller providers vary, so Instructor still has its use cases.

  3. OpenAI.Ex (345 GitHub stars): No longer actively maintained (last commit ~11 months ago).

  4. OpenAI_Ex (178 GitHub stars): Actively developed with Responses API support, but primarily focused on the older Chat Completions API.

Why OpenAI.Responses?

For my startup, finding product-market fit is critical. Success won’t hinge on using the cheapest or fastest LLM, but on leveraging a reliable, feature-rich provider like OpenAI. Targeting OpenAI exclusively allows access to cutting-edge features (e.g., image generation, web search, script execution) without worrying about cross-provider portability.

Portability across LLM providers is impractical anyway! Even switching models within OpenAI (e.g., gpt-4o to gpt-4.1) can alter behavior, and different providers require their own optimizations and careful prompt refinement.

I also prioritize minimal overhead. Unlike LangChain’s Message.new_user!, I use simple structures like %{role: :user, content: "message"}, which are cleaner and more flexible (e.g., supporting dynamic inputs or YAML). OpenAI.Responses keeps abstraction to a minimum; users can inspect response.body for a well-documented structure. For complex use cases, OpenAI’s documentation is essential anyway, and I avoid adding unnecessary layers.
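
As a quick sketch of what that looks like in practice (using the 0.4+ keyword API shown later in this thread; the exact keys inside body follow OpenAI’s own documentation):

alias OpenAI.Responses

# Plain maps for messages; no wrapper structs required
{:ok, response} =
  Responses.create(
    input: [%{role: :user, content: "Say hello in one short sentence."}]
  )

# The raw, documented API response is right there when you need it
IO.inspect(response.body)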

Finally, by focusing on a single provider, I can realistically support features such as automatic cost calculation; it would simply be impractical to keep in sync with pricing changes across the full ecosystem. I can also experiment with API design, such as chaining create/2 calls to maintain conversation state, a cleaner interface for JSON schema definitions, or automatic resolution of tool calls.

Conclusion

By focusing on OpenAI’s latest API and minimizing overhead, OpenAI.Responses offers a lean, modern solution for Elixir developers. I welcome feedback—please share your thoughts!

4 Likes

@vkryukov Nice! Any reason you didn’t use the openai_ex client (disclaimer I’m the maintainer :innocent:)? I added the responses endpoint a few days ago (although I see from the documentation that they’ve already updated some stuff there, and I’ll need to do one more pass over it).

@vkryukov On a separate note, I noticed that you used Claude Code with this repo. What has been your experience with it?

I have just started looking into it, and I hope they bring some of the features into Claude Desktop (my daily driver).

@restlessronin the main reason was that I needed something quick, and I assumed that major libraries (such as LangChain, which I use in production) would take a while to implement it. I did check openai_ex’s GitHub homepage, but since it didn’t mention Responses, I thought it wasn’t implemented yet (and I failed to check the git log).

But also, I wanted something lightweight, in the “SDKs with Req” fashion. For example, I use @brainlid’s LangChain in production because it supports many providers (OpenAI, Anthropic, Google, Groq, xAI, and many others) with just a parameter change, and it is mature and well tested; but even the simplest usage example looks like this:

{:ok, updated_chain} =
  %{llm: ChatOpenAI.new!(%{model: "gpt-4o"})}
  |> LLMChain.new!()
  |> LLMChain.add_message(Message.new_user!("Testing, testing!"))
  |> LLMChain.run()

Compare this to just

{:ok, response} = Responses.create("gpt-4o", "Testing, testing!")

Or another example (and this is not a ding to LangChain): to get the number of tokens used, you need to define a callback function. I understand how that might be useful in some contexts, but it can also be a bit cumbersome in others.

I found that I almost always end up creating simple wrappers, and I wanted to design a new library from scratch - without any legacy baggage, like the need to support Chat Completions or other providers - to be simple and delightful to use.

And of course, last but not least, it was an excuse to try Claude Code. I am very satisfied with the result of this experiment: it can create something quite useful with minimal guidance.

2 Likes

Here’s my subjective experience comparing Claude Code to Cursor, which I use daily as my main tool. (I say “subjective” because, even when the underlying models—like Claude Sonnet 3.7—are the same, these tools differ in their behaviors in ways that are hard to measure.)

Claude Code feels a bit smarter and gets to the “right answer” more quickly, with fewer revisions. In my opinion, it’s about 2-3 times faster, based on the time from when I give a prompt to when I get a mostly working solution.

The trade-off is the cost. I suspect that, like the early days of Uber or Lyft, some venture capital money is being spent to keep prices low for AI code editors. For example, I spent around $5 on Claude Code credits (mostly trying to get streaming to work—more on that later). With Cursor’s $20 monthly subscription for 500 fast requests, that’s like using 125 requests. If I’d done the same task in Cursor, I probably wouldn’t have used more than 10-15 requests—a huge difference. (Also, someone on X recommended trae.ai, which is currently free, because Alibaba or some other Chinese internet giant is paying for your tokens.)

I had two main challenges when creating openai_responses:

  1. Streaming did not work initially, because Claude didn’t know about Req’s :into parameter and hallucinated that it needed hackney to make it work. The solution, after many attempts (including asking Grok 3 for help), was to drop in instructor_ex’s source file that implements streaming and tell Claude, “do it this way” (see the sketch after this list).

  2. The Kino.Frame streaming example in Livebook was originally wrapped in another spawn and didn’t work (some weird interaction between Elixir processes, I guess). Neither Claude nor Grok knew how to fix it until I simply removed the enclosing spawn, which wasn’t needed in the first place.
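
For the curious, here is a rough sketch of the Req approach that ended up working: streaming SSE chunks via the :into option instead of pulling in hackney. This is my own illustration, not the library’s actual code; the endpoint and payload shape are only indicative of OpenAI’s Responses API.

Req.post!("https://api.openai.com/v1/responses",
  json: %{model: "gpt-4o-mini", input: "Say hi", stream: true},
  auth: {:bearer, System.fetch_env!("OPENAI_API_KEY")},
  into: fn {:data, chunk}, {req, resp} ->
    # Each chunk carries one or more "data: {...}" SSE lines from the API
    IO.write(chunk)
    {:cont, {req, resp}}
  end
)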

Also, the examples it wrote for me to include in the Livebook tutorial were overly complex - that’s the only part of the library I decided to write myself.

1 Like

btw, I suspect that the Enum should be Stream in your tutorial, otherwise it will convert everything into a list before iterating and thus you won’t observe any streaming behavior.

stream.body_stream
|> Stream.flat_map(& &1)
|> Enum.each(fn event ->

My bad. I should have done a better job with setting up some kind of changelog :slightly_frowning_face:

Fair point. Perhaps at some point OpenAI will start doing this and provide multiple lightweight SDKs themselves. In the meantime, my goal was to mirror the complete OpenAI SDK in as lightweight a manner as possible.

Based on user feedback and PRs, the openai_ex library has layered on functionality that I myself was not using or testing (Azure support, Finch pools, local LLM streaming tweaks, Portkey support, deviations from SSE standards, API key log redaction, etc.). A lot of knowledge from actual use has been baked into the library at this point.

OTOH, it’s unclear if any of this will be important for the Responses API, so perhaps it is a good decision to keep it separate and lightweight.

Another option might be to use openai_ex to do the actual call and have a library on top that layers on functionality, such as your text_deltas function. I considered adding these helpers at some point, but decided that I didn’t understand the individual use cases well enough to know what was appropriate for everyone.

Good luck with the project in any case :+1:

1 Like

Ah no. It works correctly. Enum works on enumerables, not just lists, so the stream is consumed lazily, one event at a time. In any case, the user guide / tutorial is basically my test suite: it gets run after every change to the API, so something like this would get caught pretty early. Doing it this way also ensures that the documentation is always up to date with the library.
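
A quick way to convince yourself of this (my own illustration, independent of the library): the numbers below are printed one at a time, roughly 200 ms apart, rather than all at once after the stream has been turned into a list.

1..3
|> Stream.map(fn i ->
  # Simulate a slow producer, like tokens arriving from the API
  Process.sleep(200)
  i
end)
|> Enum.each(&IO.inspect/1)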

2 Likes

Thanks for taking the time to recount your experience in such detail.

As another data point about workflow/costs for those who are interested:

I tend to use Claude Desktop for most of my coding work, and that’s a fixed-price subscription. Once in a while I get rate-limited and have to take a break, so I just switch to Grok-3 (which is pretty good) and gemini-2-pro (much improved, the first useful Google model), also in the chat window (hence free for the moment). No MCP, so it’s a little less fluid.

I also have API keys for OpenRouter (I use their chat window as well), Claude, and OpenAI (for use in Zed or Cline). I don’t like the AI IDE interfaces in general, and they seem to burn through credits much faster in ways that I don’t understand.

My initial impression of Claude Code is that it enables more agentic workflows, where it goes off and does a bunch of stuff for you. I still have to get used to the UI, though. Not crazy about the cost, though it seems, if anything, a little lower than using an API key via the IDEs. I have the impression that not all IDEs use cached context (= lower price), but that’s just speculation.

I have added a CHANGELOG now. Thanks for bringing this to my attention.

Updated first post with “Why did I create yet another Elixir library for working with LLMs?”

1 Like

I know the original comment is about using Claude Code to develop the code base, but I’m wondering if Responses are exclusively an OpenAI thing.

I was under the impression that, say, Claude’s API was compatible with OpenAI’s API, but maybe that is not true for Responses.

I’d just rather give money to Anthropic than OpenAI if paying for credits outside of my Claude Max subscription for Claude Code usage.

OpenAI.Responses is indeed an OpenAI-only library, but I develop it using Claude Code, which I think is the current SOTA in terms of code generation quality.

The openai_responses library has been updated to v0.4.0. This release cleans up both the API and the implementation details, and adds improved support for streaming (including streaming JSON events), structured outputs, and function calling. Additional details can be found in the Changelog for v0.4.0.

Here are some examples of using the new API from the Livebook Tutorial:

  1. You can use a previous response to continue the conversation; OpenAI takes care of keeping the state:
alias OpenAI.Responses

Responses.create!(
  input: [
    %{role: :developer, content: "Talk like a pirate."},
    %{role: :user, content: "Write me a haiku about Elixir"}
  ]
)
|> Responses.create!(input: "Which programming language is this haiku about?")
|> Map.get(:text)
|> IO.puts()
# Output contains "Elixir"; still talks like a pirate
  2. Costs are calculated automatically:
{:ok, response} = Responses.create("Explain quantum computing")

# All cost values are Decimal for precision
IO.inspect(response.cost)
# => %{
#      input_cost: #Decimal<0.0004>,
#      output_cost: #Decimal<0.0008>,
#      total_cost: #Decimal<0.0012>,
#      cached_discount: #Decimal<0>
#    }
  3. You can request a Structured Output and stream JSON events:
Responses.stream(
  input: "List 3 programming languages with their year of creation",
  model: "gpt-4o-mini",
  schema: %{
    languages: {:array, %{
      name: :string,
      year: :integer,
      paradigm: {:string, description: "Main programming paradigm"}
    }}
  }
)
|> Responses.Stream.json_events()
|> Enum.each(&IO.puts/1)
  4. You can provide access to your own functions and have function calls resolved automatically (notice it uses run/2 instead of create/1):
# Define available functions
functions = %{
  "get_weather" => fn %{"location" => location} ->
    # In a real app, this would call a weather API
    case location do
      "Paris" -> "15°C, partly cloudy"
      "London" -> "12°C, rainy"
      "New York" -> "8°C, sunny"
      _ -> "Weather data not available"
    end
  end,
  "get_time" => fn %{"timezone" => timezone} ->
    # In a real app, this would get actual time for timezone
    case timezone do
      "Europe/Paris" -> "14:30"
      "Europe/London" -> "13:30" 
      "America/New_York" -> "08:30"
      _ -> "Unknown timezone"
    end
  end
}

# Define function tools
weather_tool = Responses.Schema.build_function(
  "get_weather",
  "Get current weather for a location",
  %{location: {:string, description: "City name"}}
)

time_tool = Responses.Schema.build_function(
  "get_time", 
  "Get current time in a timezone",
  %{timezone: {:string, description: "Timezone like Europe/Paris"}}
)

# Run the conversation with automatic function calling
responses = Responses.run(
  [
    input: "What's the weather and time in Paris?",
    tools: [weather_tool, time_tool]
  ],
  functions
)
1 Like

The openai_responses library has been updated to v0.5.1, bringing significant enhancements since v0.4.0. This release adds union type support, manual function calling control, built-in error handling with retry logic, and flexible API options. See the full changelog for complete details.

Union Types with anyOf

Define properties that can be multiple types:

alias OpenAI.Responses

Responses.create!(
  input: "Generate a product listing",
  schema: %{
    price: {:anyOf, [:number, :string]},  # Can be 29.99 or "$29.99"
    tags: {:anyOf, [:string, {:array, :string}]}  # Can be "electronics" or ["laptop", "gaming"]
  }
)

Manual Function Calling

Take control of function execution with the new call_functions/2:

  # Get a response with function calls
  {:ok, response} = Responses.create(
    input: "What's the weather in Paris and London?",
    tools: [weather_tool]
  )

  # Manually execute functions with custom logic
  outputs = Responses.call_functions(response.function_calls, %{
    "get_weather" => fn %{"location" => city} ->
      # Add logging, caching, or modify results
      Logger.info("Weather requested for #{city}")

      weather = fetch_weather_from_api(city)
      %{temperature: weather.temp, conditions: weather.desc, cached_at: DateTime.utc_now()}
    end
  })

  # Continue conversation with enriched results
  {:ok, final} = Responses.create(response, input: outputs)

Error Handling with Retry Support

The new OpenAI.Responses.Error module provides intelligent error handling:

  case Responses.create(input: "Hello") do
    {:ok, response} ->
      IO.puts(response.text)

    {:error, error} ->
      if OpenAI.Responses.Error.retryable?(error) do
        # Retry with exponential backoff for 429, 500, 503, or timeout errors
        :timer.sleep(1000)
        # retry the request
      else
        # Handle non-retryable errors
        Logger.error("API error: #{error.message}")
      end
  end
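
To make the “retry the request” branch concrete, here is one way it could be filled in. This helper is a hypothetical sketch of my own, not part of the library; it only assumes the create/1 and Error.retryable?/1 calls shown above.

  defmodule MyApp.WithRetry do
    # Hypothetical wrapper: retries retryable errors with exponential backoff
    def create(options, retries_left \\ 3, delay_ms \\ 1_000) do
      case OpenAI.Responses.create(options) do
        {:ok, response} ->
          {:ok, response}

        {:error, error} = failure ->
          if retries_left > 0 and OpenAI.Responses.Error.retryable?(error) do
            :timer.sleep(delay_ms)
            # Double the delay on each attempt (1s, 2s, 4s, ...)
            create(options, retries_left - 1, delay_ms * 2)
          else
            failure
          end
      end
    end
  end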

Flexible API Options

All major functions now accept both keyword lists and maps:

  # Traditional keyword list
  Responses.create(input: "Hello", model: "gpt-4o", temperature: 0.7)

  # Using maps (great for dynamic options)
  options = %{input: "Hello", model: "gpt-4o", temperature: 0.7}
  Responses.create(options)

  # Works with streaming too
  Responses.stream(%{
    input: "Write a story",
    stream: Responses.Stream.delta(&IO.write/1)
  })

Additional Improvements

  • Model preservation in follow-up responses (no more accidentally switching models)
  • Function calls now properly handle JSON-encodable return values
  • Support for string keys in schema definitions (database-friendly)
  • Cost calculation for new models including o3
  • Enhanced documentation and LLM usage guide

The API remains backward compatible while providing more flexibility and control over your AI interactions.

1 Like

Hi everyone! I’m excited to announce the release of OpenAI.Responses 0.6.0, which includes several improvements and bug fixes since version 0.5.1.

What’s New

Array Schemas at Root Level (0.6.0)

The biggest feature in this release is automatic support for arrays at the root level of structured output schemas. OpenAI’s API requires the root level to be an object, which previously meant you had to wrap arrays manually. Now the library handles this transparently:

# You can now do this directly!
{:ok, response} = Responses.create(
  input: "List 3 interesting facts about space exploration",
  schema: {:array, %{
    fact: :string,
    year: {:integer, description: "Year of the event"},
    significance: {:string, description: "Why this fact is important"}
  }}
)

# response.parsed is an array directly:
[
  %{"fact" => "First satellite launch", "year" => 1957, "significance" => "Started the space age"},
  %{"fact" => "Moon landing", "year" => 1969, "significance" => "First humans on another celestial body"},
  %{"fact" => "ISS construction", "year" => 1998, "significance" => "Permanent human presence in space"}
]

The library automatically wraps arrays in an object before sending to the API and unwraps them in the response, making the developer experience seamless.
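
For comparison, here is roughly what the manual workaround looked like before 0.6.0. This is a sketch of my own; the facts wrapper key is arbitrary and only there for illustration.

# Pre-0.6.0 style: wrap the array in an object yourself...
{:ok, response} = Responses.create(
  input: "List 3 interesting facts about space exploration",
  schema: %{facts: {:array, %{fact: :string, year: :integer}}}
)

# ...and unwrap it by hand from the parsed result
facts = response.parsed["facts"]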

Bug Fixes

Duplicate Assistant Response Handling (0.5.3)

Fixed an edge case where the OpenAI API sometimes returns duplicate assistant responses. The library now only processes the first assistant response in Response.extract_text/1, preventing duplicate content in your results.

Documentation Improvements (0.5.2)

  • Added comprehensive documentation for the :schema option
  • Improved examples throughout the codebase
  • Better explanation of structured output features

Internal Improvements (0.5.2)

  • Refactored input handling to consistently accept both maps and keyword lists with atom or string keys
  • Fixed all Dialyzer issues for better type safety
  • Resolved Credo warnings for improved code quality

Upgrading

To upgrade to the latest version, update your mix.exs:

def deps do
  [
    {:openai_responses, "~> 0.6.0"}
  ]
end

Then run:

mix deps.update openai_responses
1 Like