BEAM Devs: Asking for Feedback on the Software Architecture Draft

In my first two UK roles, software architecture always included a failover system, an independent, exact copy of production running in another cloud or on-premises data center. This differs from redundancy within the same provider.

In this approach, switching from production to the failover is done manually by changing the server’s IP in a DNS record that has a very short TTL. If the cloud provider has an outage, or there is a catastrophic production incident that cannot be solved or rolled back quickly, we can switch the DNS and use the failover system. Alternatively, clients can switch to the failover automatically when production doesn’t respond within a certain timeout.
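To make the automatic client-side switch concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the function name `call_with_failover`, the injected `send` callable, and the example URLs are hypothetical and not taken from the systems described above.

```python
from typing import Callable, Optional, Sequence, Tuple


def call_with_failover(
    endpoints: Sequence[str],
    send: Callable[[str, float], str],
    timeout_s: float = 2.0,
) -> Tuple[str, str]:
    """Try each endpoint in order and return (endpoint, response)
    from the first one that answers within the timeout."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        try:
            # `send` is a hypothetical transport function supplied by the caller;
            # it is expected to raise TimeoutError or ConnectionError on failure.
            return endpoint, send(endpoint, timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc  # unreachable or too slow: fall through to the next endpoint
    raise RuntimeError("all endpoints failed") from last_error
```

Listing production first and the failover second gives the behaviour described: the failover is only contacted when production does not answer in time.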

:warning: Failover is not the same as a blue-green deployment. While a blue-green deployment replaces an older version with a new release, a failover runs continuously alongside production. Ideally, both strategies should be used together when possible.

In my second role, we also implemented a request duplicator. This tool allowed stress testing of new releases by amplifying live requests (e.g., x2, x4) to find breaking points. It also helped validate major architecture changes before going live by running them in parallel with production.

The request duplicator relied only on production responses, but in my case it could be coded to use the first response from either production or the failover. For strong consistency guarantees, it could wait for both before returning a response, backed by a TTL and a strategy for handling failed requests.
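The two response strategies can be sketched in Python as follows. This is a rough illustration, not the duplicator from my previous role: `duplicate_request`, the `backends` mapping, and the `wait_for_all` flag are hypothetical names, and threads stand in for real network calls.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed, wait
from typing import Callable, Dict


def duplicate_request(
    request: str,
    backends: Dict[str, Callable[[str], str]],
    wait_for_all: bool = False,
    timeout_s: float = 2.0,
) -> Dict[str, str]:
    """Send the same request to every backend and return {backend_name: response}."""
    pool = ThreadPoolExecutor(max_workers=len(backends))
    futures = {pool.submit(handler, request): name for name, handler in backends.items()}
    try:
        if wait_for_all:
            # Strict mode: every backend must answer before the deadline.
            done, pending = wait(futures, timeout=timeout_s)
            if pending:
                raise TimeoutError("not every backend answered in time")
            # result() re-raises a backend's exception, so one failure fails the request.
            return {futures[f]: f.result() for f in done}
        # Fast mode: return the first successful response, skipping failed backends.
        for future in as_completed(futures, timeout=timeout_s):
            if future.exception() is None:
                return {futures[future]: future.result()}
        raise RuntimeError("every backend failed")
    finally:
        pool.shutdown(wait=False)  # do not block the caller on slow backends
```

In strict mode the caller can compare the two responses before answering the client, which is where a consistency check between production and failover would go.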

:wrench: Key Consideration: Applications using this approach must ensure side effects (e.g., emails, billing) only occur in production. A flag-based system is required to enforce this.
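One possible shape for such a flag, sketched in Python under assumptions: the `SideEffectGate` class and the `role` values are hypothetical, and in a real deployment the role would come from configuration rather than a constructor argument.

```python
from typing import Callable, List


class SideEffectGate:
    """Runs side effects only when this node's role is 'production'."""

    def __init__(self, role: str) -> None:
        self.enabled = role == "production"
        self.skipped: List[str] = []  # names of suppressed effects, kept for auditing

    def run(self, name: str, effect: Callable[[], None]) -> bool:
        """Execute `effect` on production; elsewhere record it and skip it."""
        if not self.enabled:
            self.skipped.append(name)
            return False
        effect()
        return True
```

For example, `SideEffectGate("failover").run("welcome_email", send_email)` would record the skipped email instead of sending it, while the same call on a production gate would send it. Routing every email, charge, or webhook through one gate keeps the rule in a single place.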

Bear in mind that I wasn’t in the DevOps team, nor did I have input on the architecture. Thus, the diagram reflects only what I was aware of and can recall.

I am thinking of also using this approach for BEAM Devs, as per the diagram image. However, in my case, I have a CRUD application from the user perspective, whereas in my previous roles the applications were read-only for external users and CRUD only internally, driven by background jobs or by request metadata collection and analytics.

As with everything in software architecture, it’s about trade-offs. This approach has its own, such as the added complexity of ensuring that no side effects occur in the non-production systems and of guaranteeing that production and failover are in the same state (strong consistency).

So, my challenge is to be able to use the failover and request duplicator approach in conjunction with blue-green deployments and keep strong consistency guarantees for my CRUD application.

I could start with a non-distributed traditional Phoenix app, but I want to use this project as an opportunity to use distribution for real, and to start with a good base for building a very resilient architecture.

What would you do differently?

Feel free to ask any questions.

If this project resonates with you, then don’t forget to subscribe now for updates and/or early access at:


If I may (and I hope this does not come across as harsh; that is not my intent, but English is not my native language and intent is sometimes lost in translation): since you wrote multiple times that you are looking to build your own job with BeamDEVS (and that’s a superb goal), I would focus on finding a market fit first before spending too many architecture tokens.

Paying customers / users are vital to keep you working on this platform. You can always start with a single node. The needs of the various parties that your system wants to engage (recruitment agencies, companies, devs looking for work, and maybe others) will influence your software, and maybe how you architect it, both in terms of application code, and in the roles your nodes take and how they talk to each other.

If you have extensive DevOps experience and infra isn’t consuming too many work tokens for you, then for sure, start with your infrastructure design. Otherwise, maybe start with customers or users.

I hope you’ll succeed!


No offence taken at all, and I appreciate that you took the time to give me your feedback.

I know the usual approach is to just dive in and write code, but all projects pay a very high price for doing so. Businesses are so used to this approach that they don’t even notice the price and simply accept the consequences as business as usual. However, I understand that sometimes it cannot be done any other way, because time constraints and speed to market are very important factors.

While it is important for me to find paying customers as fast as possible, I also need to ensure that I start with a good architectural base to avoid the mess I have seen a lot of projects drown in.

In software development, you can choose the fast lane and build up tech debt to ship faster, or you can choose the middle lane to balance speed with quality. In the fast lane, new features and bug fixes become increasingly difficult to work on as the project grows.

That being said, I will not spend my initial time trying to have a perfect base solution as per the diagram, but I will spend some time ensuring I can build it progressively. I will start with the skeleton and slowly add all the bits required to fully flesh it out.


I cannot agree more. Debt is something that must be paid back, but it is too often left alone, and then velocity can only decrease…

Often through user research, we discover a huge gap between what we thought we would be building and what actually gets built as we understand users better. At the start of a project this gap is maximal and if everything goes well, our understanding improves and reduces this gap.


I agree completely about user research, and that’s one reason I am sharing my journey to build this project and collect at least 1,000 subscribers for early access. This will help them understand what I am building for them, and help me align with their feedback and expectations.

When possible, the best approach is to have Event Storming or Event Modelling sessions with users, but this is only practical when building software for a client or for the company we work for.

One thing I have done is to use AI to help me define and flesh out architectures. (As mentioned earlier, I would not want to do this with any of the pay-per-token billing options. So, I only do this with ChatGPT, for which I pay a fixed monthly amount.) With ChatGPT, at least in the web UI, you can create a project and add PDF files that describe in depth the architecture methodology you wish to use. Then, in the chat, you can upload your current architecture document and have ChatGPT analyze it against the methodology documents and give you feedback on it. That may help you find gaps in your design or places where it deviates from the design standards you wish to use. You can also have it list the pros and cons of different design choices so you can decide which way to go. The clearer and more concise your instructions are, the less the AI will need to guess and the better it can help you.

This doesn’t replace a detailed review by an experienced dev, but it can give you yet another way to get a review.

Edit: I just expounded on this with a blog post.


Thanks for your suggestions and insights. I really appreciate the fact that you took the time to give me an alternative way of validating my ideas.

So far, my use of AI code assistance has shown me that it only works, and with limited output quality, when we follow the path everyone else follows, because that’s where the AI training dataset comes from. From the moment you try to do things in an unusual way, it hallucinates and becomes a time sink.

I may use AI as an advanced code auto-completer, but not as a tool to ask it to code for me, unless I am coding in an unfamiliar language. Otherwise, I will be building tech debt and spaghetti code from day one.

As I said previously, you can choose the fast lane or the middle lane in software development, and with AI, the fast lane is even more problematic than when it was done only by humans to ship faster.

Nowadays, AI is the best way to “kill” your project in the long run because the mess will be so big in a few years that fixing bugs and adding new features to projects will slow down enormously.
