Foundation - Elixir Infrastructure and Observability Library

I’m excited to announce foundation, a new Elixir library born out of the development of ElixirScope, my debugging and code intelligence platform. foundation aims to provide the “boring but essential” infrastructure pieces that many Elixir applications need, saving you from building them yourself.


The Motivation Behind foundation

During the development of ElixirScope, I consistently encountered the need for robust infrastructure components. What started as simple configuration management quickly evolved into building custom event systems, telemetry handlers, circuit breakers, and more. This was infrastructure code that, while critical, diverted focus from the core product.

About fifty prompts into the project, I realized I had inadvertently created a comprehensive infrastructure library. It included:

  • A configuration system with dynamic updates and subscriber notifications.
  • An event store to track analysis pipeline activities.
  • Intuitive telemetry integration.
  • Circuit breakers and rate limiting for interacting with temperamental AI APIs.
  • Service discovery for managing analysis components.
  • Error handling that preserved context throughout the analysis chain.

The tipping point was debugging intermittent AI analysis failures, which turned out to be a classic distributed systems problem. Instead of piecing together disparate libraries, I found I already had a cohesive solution. This led to the extraction of foundation as a standalone library.


What Foundation Offers

foundation provides infrastructure components with a consistent API. Its core pieces include:

  • Configuration management: Beyond environment variables and Application.get_env, it handles dynamic updates.
  • Event system: Structured events with correlation tracking for easier debugging.
  • Telemetry & monitoring: Actionable metrics to understand application behavior.
  • Infrastructure protection: A unified API for circuit breakers, rate limiting, and connection pooling.
  • Service discovery: A simple registry with health checking.
  • Error handling: Structured errors that retain context.
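To make the configuration bullet concrete, here is a minimal sketch of runtime updates with subscriber notifications, built only on the standard library's Agent. The module name ConfigSketch, its functions, and the {:config_changed, path, value} message shape are all assumptions made for illustration; this is not foundation's implementation or API.

```elixir
defmodule ConfigSketch do
  # Hypothetical module for illustration only; not foundation's implementation.

  # Holds the config map and the list of subscriber pids in one Agent.
  def start_link(initial) when is_map(initial) do
    Agent.start_link(fn -> %{config: initial, subscribers: []} end)
  end

  # Registers the calling process to receive change messages.
  def subscribe(server) do
    pid = self()
    Agent.update(server, fn s -> %{s | subscribers: [pid | s.subscribers]} end)
  end

  # Writes a value at a nested key path, then notifies every subscriber.
  # Intermediate keys in the path must already exist (put_in/3 requirement).
  def update(server, path, value) do
    Agent.update(server, fn s ->
      new_config = put_in(s.config, path, value)
      Enum.each(s.subscribers, &send(&1, {:config_changed, path, value}))
      %{s | config: new_config}
    end)
  end

  # Reads the value stored at a nested key path.
  def get(server, path) do
    Agent.get(server, fn s -> get_in(s.config, path) end)
  end
end

{:ok, config} = ConfigSketch.start_link(%{ai: %{rate_limit: 10}})
:ok = ConfigSketch.subscribe(config)
:ok = ConfigSketch.update(config, [:ai, :rate_limit], 100)

receive do
  {:config_changed, path, value} -> IO.inspect({path, value}, label: "changed")
after
  1_000 -> IO.puts("no notification received")
end
```

The sketch never removes subscribers that have exited; a real implementation would likely monitor them.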

API Examples

Here’s a glimpse of foundation’s API:

# Dynamic configuration with change notifications
:ok = Foundation.Config.update([:ai, :rate_limit], 100)
:ok = Foundation.Config.subscribe()

# Events with correlation tracking
correlation_id = Foundation.Utils.generate_correlation_id()
{:ok, event} = Foundation.Events.new_event(:ai_request, %{prompt_size: 1500},
  correlation_id: correlation_id)

# Infrastructure protection that actually works together
result = Foundation.Infrastructure.execute_protected(
  :ai_api_call,
  [circuit_breaker: :openai_fuse, rate_limiter: {:ai_user, user_id}],
  fn -> OpenAI.completion(prompt) end
)
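For readers unfamiliar with the pattern behind the last call, the following is a rough sketch of what a circuit breaker does: it counts consecutive failures and rejects calls once a threshold is reached. BreakerSketch and its functions are hypothetical names, and the sketch says nothing about how foundation implements execute_protected; it also omits the reset timer and half-open state a production breaker needs.

```elixir
defmodule BreakerSketch do
  # Hypothetical module for illustration only; not part of foundation.

  # Starts a breaker that opens after `threshold` consecutive failures.
  def start_link(opts \\ []) do
    threshold = Keyword.get(opts, :threshold, 3)
    Agent.start_link(fn -> %{failures: 0, threshold: threshold} end)
  end

  # True once the consecutive failure count has reached the threshold.
  def open?(breaker) do
    Agent.get(breaker, fn s -> s.failures >= s.threshold end)
  end

  # Runs `fun` unless the breaker is open. `fun` must return
  # {:ok, value} or {:error, reason}; anything else raises CaseClauseError.
  def call(breaker, fun) when is_function(fun, 0) do
    if open?(breaker) do
      {:error, :circuit_open}
    else
      case fun.() do
        {:ok, _value} = ok ->
          # A success resets the consecutive failure count.
          Agent.update(breaker, fn s -> %{s | failures: 0} end)
          ok

        {:error, _reason} = error ->
          Agent.update(breaker, fn s -> %{s | failures: s.failures + 1} end)
          error
      end
    end
  end
end

{:ok, breaker} = BreakerSketch.start_link(threshold: 2)
failing = fn -> {:error, :timeout} end

BreakerSketch.call(breaker, failing)
BreakerSketch.call(breaker, failing)

# The breaker is now open, so this function is never invoked.
BreakerSketch.call(breaker, fn -> {:ok, :done} end)
|> IO.inspect(label: "third call")
```

After two failures with a threshold of 2, the third call returns {:error, :circuit_open} without running the function.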

Why foundation is Different

foundation is designed for practical use cases rather than theoretical perfection. Each component addresses a real-world need encountered during ElixirScope’s development. For example, the event system focuses on practical correlation IDs for debugging, not complex event sourcing. The configuration system prioritizes runtime updates and notifications.
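As an illustration of why correlation IDs help with debugging, here is a small sketch that tags events with a shared ID and later pulls out everything belonging to one request. CorrelationSketch, its function names, and the plain-map event shape are assumptions for this example, not foundation's event structs or API.

```elixir
defmodule CorrelationSketch do
  # Hypothetical helpers for illustration only; not foundation's API.

  # Random 16-byte ID rendered as 32 lowercase hex characters.
  def new_id do
    :crypto.strong_rand_bytes(16) |> Base.encode16(case: :lower)
  end

  # Builds a plain-map event tagged with the correlation ID.
  def event(type, data, correlation_id) do
    %{type: type, data: data, correlation_id: correlation_id}
  end

  # Returns the events that share one correlation ID, in their original order.
  def trace(events, correlation_id) do
    Enum.filter(events, fn e -> e.correlation_id == correlation_id end)
  end
end

request_a = CorrelationSketch.new_id()
request_b = CorrelationSketch.new_id()

events = [
  CorrelationSketch.event(:ai_request, %{prompt_size: 1500}, request_a),
  CorrelationSketch.event(:ai_request, %{prompt_size: 200}, request_b),
  CorrelationSketch.event(:ai_response, %{status: :timeout}, request_a)
]

# Only the two events from request_a remain, interleaving removed.
events
|> CorrelationSketch.trace(request_a)
|> Enum.map(& &1.type)
|> IO.inspect(label: "request_a")
```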

It’s also built for incremental adoption, allowing you to integrate parts of foundation without a complete rewrite of your existing application.


Current Status and Future Plans

foundation is currently at v0.1.0. All tests pass, and comprehensive documentation, including Mermaid diagrams, is available.

I’m particularly interested in feedback on the API design and the utility of its abstractions for different use cases. The infrastructure protection layer, while effective for ElixirScope, is the most opinionated part and I’m keen to hear if it fits broader needs.

Future considerations include adding distributed coordination primitives like leader election and distributed locks, as these are recurring requirements in ElixirScope’s development.


Installation and Links

To install foundation, add it to your mix.exs dependencies:

{:foundation, "~> 0.1.0"}



foundation has morphed. I’ve been building around jido, jido_action, and jido_signal.

Still a long ways to go but I think the core arch is finalized, finally. The goal is to build a true batteries-included agent framework.

I am super grateful for any critical feedback (it’s needed).

The lib code tree:

$ tree
.
├── foundation
│   ├── application.ex
│   ├── atomic_transaction.ex
│   ├── batch_operations.ex
│   ├── cache.ex
│   ├── circuit_breaker.ex
│   ├── config_validator.ex
│   ├── error.ex
│   ├── error_context.ex
│   ├── error_handler.ex
│   ├── ets_helpers
│   │   └── match_spec_compiler.ex
│   ├── ets_helpers.ex
│   ├── infrastructure
│   │   ├── cache.ex
│   │   └── circuit_breaker.ex
│   ├── jido_config
│   │   └── helpers.ex
│   ├── jido_config.ex
│   ├── performance_monitor.ex
│   ├── protocols
│   │   ├── coordination.ex
│   │   ├── infrastructure.ex
│   │   ├── registry.ex
│   │   └── registry_any.ex
│   ├── resource_manager.ex
│   ├── service_integration
│   │   ├── contract_evolution.ex
│   │   ├── contract_validator.ex
│   │   ├── dependency_manager.ex
│   │   ├── health_checker.ex
│   │   └── signal_coordinator.ex
│   ├── service_integration.ex
│   ├── services
│   │   ├── connection_manager.ex
│   │   ├── rate_limiter.ex
│   │   ├── retry_service.ex
│   │   ├── signal_bus.ex
│   │   └── supervisor.ex
│   ├── task_helper.ex
│   ├── telemetry.ex
│   └── telemetry_handlers.ex
├── foundation.ex
├── jido_foundation
│   ├── bridge.ex
│   ├── examples.ex
│   └── signal_router.ex
├── jido_system
│   ├── actions
│   │   ├── get_performance_metrics.ex
│   │   ├── get_task_status.ex
│   │   ├── pause_processing.ex
│   │   ├── process_task.ex
│   │   ├── queue_task.ex
│   │   ├── resume_processing.ex
│   │   ├── update_error_count.ex
│   │   └── validate_task.ex
│   ├── agents
│   │   ├── coordinator_agent.ex
│   │   ├── foundation_agent.ex
│   │   ├── monitor_agent.ex
│   │   └── task_agent.ex
│   ├── application.ex
│   ├── error_store.ex
│   ├── health_monitor.ex
│   ├── sensors
│   │   ├── agent_performance_sensor.ex
│   │   └── system_health_sensor.ex
│   ├── skills
│   └── workflows
├── jido_system.ex
├── mabeam
│   ├── agent_coordination.ex
│   ├── agent_coordination_impl.ex
│   ├── agent_infrastructure.ex
│   ├── agent_infrastructure_impl.ex
│   ├── agent_registry.ex
│   ├── agent_registry_impl.ex
│   ├── application.ex
│   ├── coordination.ex
│   ├── coordination_patterns.ex
│   └── discovery.ex
└── ml_foundation
    ├── agent_patterns.ex
    ├── distributed_optimization.ex
    ├── team_orchestration.ex
    └── variable_primitives.ex

Looks interesting! I would suggest paying more attention to the ability to:

  • use it alongside other apps/libs; the current implementation would blow up if both app and some dependency use it
  • use it in the distributed environment (currently it wouldn’t run smoothly in the cluster, AFAICT)
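One common way the first problem arises in OTP is a process registered under a fixed, VM-wide name. The thread does not confirm that this is the exact issue in foundation, so treat the following as an assumed illustration; NameClashSketch and the atom names are hypothetical.

```elixir
defmodule NameClashSketch do
  # Hypothetical illustration; these names are not taken from foundation.

  # Starts an Agent under a fixed name that is unique per VM.
  def start_global do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Starts an Agent under a caller-supplied name, so each user gets its own.
  def start_scoped(name) when is_atom(name) do
    Agent.start_link(fn -> %{} end, name: name)
  end
end

{:ok, _pid} = NameClashSketch.start_global()

# A second start under the same name fails instead of creating a new instance,
# which is what happens when an app and one of its dependencies both start it.
{:error, {:already_started, _pid}} = NameClashSketch.start_global()

# Letting the caller choose the name avoids the collision.
{:ok, _} = NameClashSketch.start_scoped(:app_config)
{:ok, _} = NameClashSketch.start_scoped(:dep_config)
```

Locally registered names are also per node, which is one reason the same design tends to need rethinking before it runs in a cluster.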

Very grateful. The line you noted is from an older prototype. The latest code is in lib. Codebase is disorganized currently given dev stage. Good find, I’ll keep that in mind going forward.

Distribution is on the horizon. In the weeds still. Thanks for bringing that up. Makes much more sense to build in distribution initially. I was hoping to add distribution later, but perhaps that isn’t a good idea.


I’ve been meaning to return to this for three weeks now.

FYI: it would really help if you included a short example for each of the library's features. My brain is finding some patterns here that I might have use for, but I'm very unwilling to do open-ended exploration for the moment.


Indeed, I have had the same inclination. That’s absolutely on the horizon, but the codebase is still in poor shape with all sorts of basic OTP design issues. I'm working through them a bit at a time.

Thanks for the heads up. Will get the code stable with examples. Edit: Ahh - key point - short examples for each feature.


Yes, please, if you don’t mind. I used an LLM to summarize my upcoming SQLite NIF Rust library (well, it’s not upcoming anymore, it’s actually ready but I have to fight with Docker and Earthly to remove a weird out-of-memory problem during tests… on machines with 7GB RAM that have 58% memory free at the time of the failures :roll_eyes:) and you can relatively quickly find if what you need is there…

…and I still think my README is too big.

In general, what I would try to do for my own libraries, and what I want from any that I stumble upon, is this: make it crystal clear to any insanely busy programmer why they would want to use your thing.