Apple silicon and cross-platform Docker fails Minikube

jgallinari · January 1, 2024, 7:17am

Hello,
I made the switch recently from an x86 MacBook to Apple silicon (M3) and it seems to fail my Docker/Minikube workflow that perfectly worked under my old machine.

My app is a Phoenix app that is started in Minikube with a sidecar container (same pod) doing the migrations before the real app is started.
I’m using Elixir v1.15.7 with OTP v26.

FYI, the migrations are run in Dockerfile via

command: ['/home/elixir/app/bin/oss', 'eval', 'Oss.Migrator.migrate']

First thing, I found out there is no way to Docker build the app unless I add this ERL flag in the Dockerfile (as pointed out by Mix deps.get memory explosion when doing cross-platform Docker build]):

ENV ERL_FLAGS="+JPperf true"

or

ENV ERL_FLAGS="+JMsingle true"

I also had to use docker build --platform linux/amd64 to make it work in Minikube.

Then the rest of it:

Run Docker Desktop.
Start Minikube with

minikube start --extra-config=apiserver.service-node-port-range=1-50000 --memory=6g --cpus=4 --kubernetes-version=v1.22.15 --cni=calico --driver=docker

Docker build
Run the app in Minikube (via terraform apply), which causes this weird error while doing the migrations:

** (ArgumentError) could not call Module.put_attribute/3 because the module Oss.Repo.Migrations.CreatePopsTable is already compiled                          │
│     (elixir 1.15.7) lib/module.ex:2310: Module.assert_not_readonly!/2                                                                                        │
│     (elixir 1.15.7) lib/module.ex:2007: Module.__put_attribute__/5                                                                                           │
│     lib/oss-1.14.0/priv/repo/migrations/20210000000300_create_pois_table.exs:2: (module)

This error makes absolutely no sense to me, especially because it does not occur with my x86 MacBook.

If you think of anything I could try to sort this out, I’d be grateful for that .

Tyson · January 3, 2024, 12:41am

I can’t speak to your specific migration error, but I’m surprised that you need to target x86 to build/run a Docker image in your local environment.

I’ve had to wrestle with cross-arch Elixir+Docker quirks when building images on my M1 Macbook to run on x86 hosts, but have had no issues building ARM images to run locally on my Macbook.

Tho I’m also using the k8s “cluster” built into Docker Desktop instead of minikube.

jgallinari · January 3, 2024, 10:07am

You’re absolutely right, I actually don’t need to build my Docker image with --platform linux/amd64.
I can now reproduce the same error with an arm64 image .

jhogberg · January 3, 2024, 10:47am

Cross-architecture support in docker is iffy at best since it uses QEMU in user-mode emulation mode, which:

Cannot handle JIT’ed code gracefully in all cases, as it is unaware of dual-mapped pages and cannot detect that code mapped that way has been modified. The +JMsingle true flag (implied by +JPperf true) works around the most egregious bugs, but not all.
Keeps multi-threaded code generation enabled even when this is known not to work due to differences in the target and host memory orders. Emulating ARM (weakly ordered) on x86 (strongly ordered) is fine, but the opposite is a recipe for disaster.

In short, don’t use cross-architecture docker unless you want to spice up your life with super-mysterious bugs.

Does the problem happen outside of a container? Does it work when disabling the JIT (--disable-jit configure flag)?

Tyson · January 3, 2024, 2:19pm

Seems like other folks who have run into this were inadvertently using erlang/elixir installations (or macOS system libs) originally compiled for x86 and/or have Rosetta emulation enabled in their terminal. When you switched to your M3 Macbook, did you use Apple’s migration tool to move things over from your Intel machine? Could be some lingering x86 packages from that.

I believe @jhogberg is referring to using the --disable-jit flag when configuring the Erlang build, which I have had good luck with when building cross-arch, but if you’re now using an ARM Docker image, I don’t think that’s the issue here.

[edit: I believe Rosetta emulation only comes into play while installing elixir/erlang packages, as it may result in x86 versions being installed rather than native ARM]

[edit: my suggestions about macOS sys libs is probably irrelevant to your Docker build ]

jgallinari · January 3, 2024, 6:48pm

Not sure where to add --disable-jit in my Dockerfile, I tried

ENV KERL_CONFIGURE_OPTIONS="--disable-jit"

with not much effect in the error happening.

To answer your questions:

No, the problem does not happen outside of a container.
And no, I did not use Apple’s migration tool because I feared it could cause some x86 vs arm issues

In addition, I also did some trials by replacing Docker Desktop with Colima run like this:

colima start --memory 6 --cpu 4 --dns 100.100.100.100 --arch aarch64 --vm-type=vz --vz-rosetta

which required adding the --base-image gcr.io/k8s-minikube/kicbase:v0.0.40 option to minikube start to make it work, like reported in Starting Minikube messes up binfmt config for Docker, making it unable to run cross-architecture images · Issue #17700 · kubernetes/minikube · GitHub because Minikube v1.32.0 which is based on kicbase:v0.0.42 causes issues with Colima (not only OrbStack as mentioned in the Minikube issue).

That to say that since I use the Apple Virtualization Framework + Rosetta enabled in Colima, QEMU is not in use in my workflow.
But using Colima stills results in the exact same error

Tyson · January 3, 2024, 8:57pm

I’m still struggling with this part:

I’ve never needed to do any extra config to build ARM images to run on my local ARM machine. I’ve only had to mess with that while building on ARM for x86 platform. Are you sure the Elixir base image you’re using is also ARM? For instance:

$ docker run --rm elixir:1.15-alpine uname -m
aarch64

But in case this is helpful (and maybe there’s a more elegant way to do this), you can copy the Erlang Dockerfile and update it to read:

...
    && ./configure --build="$gnuArch" --disable-jit \
...

…then build the Erlang image, then create a local copy of the Elixir Dockerfile to build FROM ... that local Erlang image in order to create your own JIT-disabled Elixir base image.

Tyson · January 3, 2024, 9:10pm

One more thing I’ll add: in some circumstances I’ve found it necessary to specify the platform in the Dockerfile (rather than with the docker build flag):

FROM --platform=linux/arm64 <IMAGE_NAME>

Not sure why that sometimes makes a difference.

jgallinari · January 3, 2024, 9:41pm

That’s the output I get indeed.

If I don’t use

ENV ERL_FLAGS="+JMsingle true"

in the Dockerfile, then

if I use Docker Desktop, I get

 => ERROR [build  5/10] RUN mix deps.get --only prod                                                                                                                                                             1.0s
------
 > [build  5/10] RUN mix deps.get --only prod:
1.000
1.000 21:33:37.562 [notice] Application ssl exited: exited in: :ssl_app.start(:normal, [])
1.000     ** (EXIT) an exception was raised:
1.000         ** (MatchError) no match of right hand side value: []
1.000             (elixir 1.15.7) src/elixir_def.erl:134: :elixir_def.store_definition/3
1.000             (ssl 11.0.3) ssl_app.erl:47: :ssl_app.start_logger/0
1.000             (ssl 11.0.3) ssl_app.erl:32: :ssl_app.start/2
1.000             (kernel 9.1) application_master.erl:293: :application_master.start_it_old/4

and if I use Colima, it seemingly hangs forever on the same step.

And this I get while calling docker build without specifying a plaftorm, but using FROM --platform=linux/amd64 <custom-elixir-alpine-image>

Tyson · January 3, 2024, 9:58pm

Why specify AMD platform for something you’re going to run locally on ARM?

Tyson · January 3, 2024, 10:07pm

This doesn’t really sound like an issue with the architecture. Sounds like :ssl_app.start(:normal, []) is not a valid invocation with empty list.

[edit: but I guess you’re not calling that, so could be some cross-arch fallout]

jgallinari · January 3, 2024, 10:28pm

If I use FROM --platform=linux/arm64 ... instead of FROM --platform=linux/amd64 then the container fails with

rosetta error: failed to open elf at /lib/ld-musl-x86_64.so.1

Indeed, I’m not.

Tyson · January 3, 2024, 11:03pm

Okay, this feels like the real problem. Is it possible that there is a base image further down the stack which is x86? I believe this can come up when a base image doesn’t offer an ARM version.

jgallinari · January 4, 2024, 3:44pm

Yes, the hexpm/elixir base image that I was using is x86 based.
Once I could try it with the linux/arm64 version of the image, the problem was gone, can you believe it?

Thanks a lot, I really appreciate your support!

[edit: and no more need for the ERL_FLAGS, as you said]

dimitarvp · January 4, 2024, 3:56pm

So what’s the takeaway? Don’t emulate x86 on ARM?

Tyson · January 4, 2024, 4:13pm

I’ve had good luck building/running x86 images from ARM host by configuring the base Erlang image to --disable-jit.

I haven’t yet tried the ERL_FLAGS approach that was giving jgallinari trouble here.