Gist of my recent Nerves activities - expanding a partition, burning .fw on another machine, scp issues, mTLS emqtt

tomazbracic · November 10, 2024, 12:27pm

Hi,

Last week I worked with Nerves all week and ran into some interesting problems, which I solved with the help of @lawik, @lostkobrakai, @damirados, @amclain and @fhunleth. I really appreciate your help! @lawik suggested that I share my findings here as well, so that others can benefit from them.

1st problem

It started with the CM4, where I hit this issue again. After I boot a firmware on it, the filesystem has not been expanded. The eMMC is physically ~32 GB, but when I check I see something like this:

iex(20)> cmd("df -h")
Filesystem                Size      Used Available Use% Mounted on
/dev/root                35.3M     35.3M         0 100% /
devtmpfs                  1.0M         0      1.0M   0% /dev
tmpfs                   368.9M     16.0K    368.8M   0% /tmp
tmpfs                   184.4M     16.0K    184.4M   0% /run
/dev/mmcblk0p1           18.7M     14.4M      4.3M  77% /boot
/dev/mmcblk0p3          510.0M     64.1M    445.9M  13% /root
tmpfs                     1.0M         0      1.0M   0% /sys/fs/cgroup
0
iex(21)> cmd("ls -l /data")
lrwxrwxrwx    1 root     root             4 Nov  5 08:40 /data -> root

On a regular Zero 2 W, for instance, where I use SD cards, I get this instead:

iex(2)> cmd("df -h")
Filesystem                Size      Used Available Use% Mounted on
/dev/root                27.9M     27.9M         0 100% /
devtmpfs                  1.0M         0      1.0M   0% /dev
tmpfs                    30.6M      8.0K     30.6M   0% /tmp
tmpfs                    15.3M      4.0K     15.3M   0% /run
/dev/mmcblk0p1           18.7M      6.6M     12.1M  35% /boot
/dev/mmcblk0p3           28.8G     92.0K     27.3G   0% /root
tmpfs                     1.0M         0      1.0M   0% /sys/fs/cgroup

And I had this issue before (Flashing Nerves to CM4's eMMC directly - wrong size partitions?).

The whole problem turned out to be how I got the firmware image onto the device. In both cases I had devices with an RPi CM4, which I had to put into “boot” mode with a switch, so the storage was eMMC and not a classic SD card. For some reason I started to complicate things. TL;DR: I converted my .fw file to .img and used balenaEtcher to burn the .img onto the device, which also required the rpiboot tool. It actually worked great, BUT once I logged into the system, most of my space was missing. The CM4 has 32 GB of eMMC, yet as you can see above, only around 500 MB of it was listed.

After some research and great help on Slack, it turned out I had this problem because I converted the firmware to .img instead of using the fwup utility. If you check your fwup.conf, you will find all sorts of directives in it. One of them is something like this:

    partition 2 {
        block-offset = ${APP_PART_OFFSET}
        block-count = ${APP_PART_COUNT}
        type = 0x83 # Linux
        expand = true
    }

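I switched back to fwup and everything was as it should be. As I understand it, the expand = true directive is what lets fwup grow the application partition to fill the actual device, and that step is lost when the firmware is burned as a plain .img. What I didn’t know is that you can take a firmware built on another machine (a build machine in my case) and run fwup on a developer laptop without any project files present. So I just copied myfirmware.fw to my MacBook and ran `fwup myfirmware.fw` with rpiboot running, and it worked great. The partition expanded and I had all the space.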
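The laptop-side step can be sketched as a small shell function. This is only a sketch: it assumes fwup is installed and, for a CM4, that rpiboot has already exposed the eMMC as a disk. `flash_prebuilt_fw` is a hypothetical helper name, not part of fwup or Nerves.

```shell
# Sketch: flash a prebuilt .fw archive from a machine without project files.
# Assumption: fwup is on the PATH and the target disk is already attached.
# flash_prebuilt_fw is a hypothetical helper name.
flash_prebuilt_fw() {
  fw_file="$1"

  # Refuse anything that is not a .fw archive (for example a converted .img).
  case "$fw_file" in
    *.fw) ;;
    *) echo "expected a .fw archive, got: $fw_file" >&2; return 2 ;;
  esac

  # Refuse files that do not exist or cannot be read.
  if [ ! -r "$fw_file" ]; then
    echo "cannot read $fw_file" >&2
    return 2
  fi

  # With only a file argument, fwup looks for an attached disk and asks for
  # confirmation before writing, so the layout in fwup.conf is applied by
  # fwup itself rather than copied from a fixed-size image.
  fwup "$fw_file"
}
```

Usage would be `flash_prebuilt_fw myfirmware.fw` while rpiboot is running. The two guards only catch the obvious mistakes (wrong file type, missing file) before fwup is invoked.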

2nd problem

How do you create a folder on the /root partition over ssh? I can run `cmd("mkdir -p /root/test")` in IEx once I am connected to the Nerves device, but if I try to execute the command over ssh, it fails. I get this:

**Error** ** (CompileError) nofile: cannot compile file (errors have been logged)
    (elixir 1.17.2) src/elixir.erl:455: :elixir.quoted_to_erl/4
    (elixir 1.17.2) src/elixir.erl:332: :elixir.eval_forms/4
    (elixir 1.17.2) lib/module/parallel_checker.ex:112: Module.ParallelChecker.verify/1
    (elixir 1.17.2) lib/code.ex:572: Code.validated_eval_string/3
    (nerves_ssh 1.0.0) lib/nerves_ssh/exec.ex:23: NervesSSH.Exec.run/1
    (ssh 5.2.1) ssh_cli.erl:828: anonymous fn/4 in :ssh_cli.exec_in_self_group/5
Failed to create /root/certificates/ directory on the device.

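@fhunleth suggested I run `ssh nerves.local 'File.mkdir_p("/root/test")'`, and that worked great. As the NervesSSH.Exec line in the stack trace shows, the command sent over ssh is evaluated as Elixir code rather than run in a shell.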

Another suggestion was to do it like this: `ssh nerves@192.168.1.195 'use Toolshed; cmd("mkdir -p /root/test/ka1")'`
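In a provisioning script, this comes down to sending an Elixir expression instead of a shell command. A minimal sketch, assuming the device evaluates the ssh command as Elixir (as the stack trace above suggests); `mkdir_expr` and `remote_mkdir` are hypothetical helper names.

```shell
# Build the Elixir expression that creates a directory (and its parents).
# The double quotes in the output belong to Elixir, not to the shell.
# Assumption: the path contains no double quotes or backslashes.
mkdir_expr() {
  printf 'File.mkdir_p!("%s")' "$1"
}

# Send the expression to the device as a single ssh argument.
remote_mkdir() {
  host="$1"
  dir="$2"
  ssh "$host" "$(mkdir_expr "$dir")"
}

# Example (not run here):
#   remote_mkdir nerves.local /root/certificates
```

Keeping the expression builder separate from the ssh call makes it easy to print and check the exact string that will be evaluated on the device.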

3rd problem

I needed my :emqtt client to use mTLS, which means client and server certificates, CA chain certs, etc. My EMQX broker sits behind Nginx, which terminates mTLS on the cloud side. I tried all sorts of configuration variants, since I didn’t find much help on the official web page, but in the end this combination worked for me.

 config :myapp, :emqtt,
   host: "myprefix.mydomain.com",
   port: 8883,
   clientid: "nemo_gw_1",
   clean_start: false,
   ssl: true,
   ssl_opts: [
     cacertfile: ~c"/root/certificates/ca-chain.cert.pem",
     certfile: ~c"/root/certificates/client-nemo_gw_1.cert.pem",
     keyfile: ~c"/root/certificates/client-nemo_gw_1.key.pem",
     tls_versions: [:"tlsv1.2", :"tlsv1.3"],
     verify: :verify_peer,
     server_name_indication: ~c"myprefix.mydomain.com"
   ],
   name: :emqtt,
   reconnect: true,
   reconnect_interval: 10000

Key points: you need to use ~c (charlists) wherever you see it in my configuration, which is related to how Erlang’s SSL works, and I had to set server_name_indication.
This now works great as well.
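When a combination like this does not connect, it can help to try the same certificate files outside of Elixir first. The following is only a sketch, assuming openssl is installed and the certificates are available on the machine where it runs; `check_mtls` is a hypothetical helper name and the file naming simply follows the configuration above.

```shell
# Sketch: attempt a mutual-TLS handshake with the same files used in ssl_opts.
# check_mtls is a hypothetical helper name.
check_mtls() {
  host="$1"; port="$2"; cert_dir="$3"; client_id="$4"

  ca="$cert_dir/ca-chain.cert.pem"
  cert="$cert_dir/client-$client_id.cert.pem"
  key="$cert_dir/client-$client_id.key.pem"

  # Stop early if any of the three files is missing or unreadable.
  for f in "$ca" "$cert" "$key"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 2
    fi
  done

  # -servername sets SNI (server_name_indication in ssl_opts);
  # -CAfile / -cert / -key mirror cacertfile / certfile / keyfile;
  # -verify_return_error aborts if the server certificate does not verify.
  openssl s_client -connect "$host:$port" -servername "$host" \
    -CAfile "$ca" -cert "$cert" -key "$key" \
    -verify_return_error </dev/null
}

# Example (not run here):
#   check_mtls myprefix.mydomain.com 8883 ./certificates nemo_gw_1
```

If the handshake succeeds here but not from the device, the problem is more likely in the client options than in the certificates or the Nginx side.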

I ended up with a nice setup: a working reComputer with an LTE modem. I am using a custom image for it. I set up my own CA and created client, server, intermediate and root certificates. Everything is now configured and connected, and it works great. We created a basic UI where we can see which devices are online, request device information from the cloud service, and upload firmware to the cloud, where it waits for devices to pull it down. Devices get an MQTT message that new firmware is waiting for them. They then download it to the /data partition, do an on-device upgrade, switch partitions, and report back their current status.

Still to be developed: we will put a c37 parser on the device, which will parse data from local (same LAN) PMUs (Phasor Measurement Units). I am developing a local processing flow for both 1 Hz and 50 Hz streams, plus some other logic.

So some big steps are behind me and more are still to come, for sure.

Thanks again for all the help!!

Tomaz Bracic

tomazbracic · November 10, 2024, 12:38pm

4th problem

For provisioning I needed to copy my key, cert and ca-chain files to a device, and I ran into really strange behaviour. I have a provisioning Bash script which includes some scp one-liners to transfer the files to the device.

This is the output that I got.

Enter the IP address of the device: 192.168.1.231
ca-chain.cert.pem                                                                                                                            100% 4317   424.2KB/s   00:00    
client-nemo_gw_1.cert.pem                                                                                                                    100% 1862   220.6KB/s   00:00    
client-nemo_gw_1.key.pem                                                                                                                     100% 1704   188.1KB/s   00:00    
provisioning.env                                                                                                                             100%  284    69.9KB/s   00:00    
Provisioning completed.

But on my device I got this.

iex(1)> cmd("ls -l /root/certificates")
-rw-rw-r--    1 root     root          1704 Nov  8 13:01 client-nemo_gw_1.key.pem
-rw-rw-r--    1 root     root          1862 Nov  8 13:01 client-nemo_gw_1.cert.pem
-rw-rw-r--    1 root     root           221 Nov  8 13:13 ca-chain.cert.pem

So the original ca-chain.cert.pem was 4317 bytes, but the copy on the device was only 221 bytes.
The file is a bit bigger than the rest of them, and it looks like it was cut off. At first I thought the Bash script finished too quickly and didn’t “wait to fully copy files”, but when I used scp without the script, the result was the same.

I was stuck for an hour for sure, then @lawik suggested I try sftp instead, since scp is less reliable. I didn’t know that before. I thought scp was up to every task you throw at it. I was wrong.

Using sftp solved my problem.
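In the provisioning script, the scp one-liners can be replaced with sftp in batch mode, and the local byte count can be printed for comparison with what `ls -l` shows on the device. A minimal sketch, assuming key-based (non-interactive) ssh authentication and paths without spaces; `local_size` and `upload_file` are hypothetical helper names.

```shell
# Size of a local file in bytes. The arithmetic expansion strips the
# padding that some wc implementations put in front of the number.
local_size() {
  echo $(( $(wc -c < "$1") ))
}

# Upload one file with sftp. "-b -" makes sftp read its commands from
# stdin, and in batch mode sftp exits non-zero if the put fails.
upload_file() {
  host="$1"; src="$2"; dest_dir="$3"
  printf 'put %s %s/\n' "$src" "$dest_dir" | sftp -b - "$host"
}

# Example (not run here):
#   upload_file nerves@192.168.1.231 ca-chain.cert.pem /root/certificates
#   echo "expected size: $(local_size ca-chain.cert.pem) bytes"
```

Printing the expected size next to each upload makes a truncated file, like the 221-byte ca-chain.cert.pem above, easy to spot.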

lawik · November 11, 2024, 12:04pm

The sftp/scp thing is related to how much of the respective protocol is supported and how compatible it is in the Nerves+Erlang version of the server. They should both be fine for most SSH servers I imagine. The one we run supports a subset. Seem to recall Connor having built that out?