Error when using `:os.cmd`: "not a list of characters" Panic : (kernel 8.4.2) os.erl:434

NaN · October 18, 2022, 5:21am

I keep getting the following error.

  * 1st argument: not a list of characters

    (kernel 8.4.2) os.erl:434: :os.cmd('wc -L ./lib/parser/jobs/1666070156816596389_970650/log_dir/0023')

iex(1)> is_list('wc -L ./lib/parser/jobs/1666070156816596389_970650/log_dir/0023')
true

It originates from here:

"wc -L #{path}"
      |> :binary.bin_to_list()
      |> :os.cmd()

any ideas???

Sebb · October 18, 2022, 7:03am

That's odd (but not a kernel panic).

What happens when you do:

iex()> "ls" |> :binary.bin_to_list |> :os.cmd()

What happens when you use System.cmd instead?

NaN · October 18, 2022, 3:38pm

** (stop) :emfile
    :erlang.open_port({:spawn, 'sh -c "wc -m ./jobs/1666106990098114339_7600/log_dir/0124"'}, [{:parallelism, true}, :use_stdio, :exit_status, :binary, :hide])

from

 System.shell("wc -m #{path}", parallelism: true)
      |> elem(0)

and

** (stop) :emfile
    :erlang.open_port({:spawn_executable, '/usr/bin/wc'}, [{:parallelism, true}, :use_stdio, :exit_status, :binary, :hide, {:args, ["-m", "./jobs/1666107183821118452_212556/log_dir/0065"]}])

from

System.cmd("wc", ["-m", path], parallelism: true)
      |> elem(0)

Something to bear in mind is that I'm calling this thousands of times in a few seconds. What's odd is that :os.cmd() intermittently had issues when I used String.to_charlist() and Kernel.to_charlist(), so I switched to :binary.bin_to_list(), which was more stable, with no errors until last night.

NaN · October 18, 2022, 4:00pm

user@terminal:~/path/$ cat /proc/sys/fs/file-nr
21728	0	9223372036854775807

NaN · October 18, 2022, 4:02pm

System.cmd and System.shell both also produce:

/usr/bin/wc: write error
erl_child_setup: failed with error 32 on line 281
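
:emfile is the POSIX "too many open files" error, so one plausible reading is that too many ports are alive at the same moment. The following is only a minimal sketch of capping that, assuming the calls are issued concurrently. The module name BoundedCmd, its function names, and the limit of 8 are hypothetical and do not come from this thread.

```elixir
defmodule BoundedCmd do
  # Hypothetical helper: applies `fun` to every path, but never runs more
  # than `max` calls at once, so only a handful of ports are open together.
  def map_bounded(paths, fun, max \\ 8) do
    paths
    |> Task.async_stream(fun, max_concurrency: max, timeout: :infinity)
    |> Enum.map(fn {:ok, result} -> result end)
  end

  # Character count via `wc -m`, using System.cmd so that no `sh` is started.
  def wc_chars(path) do
    {output, 0} = System.cmd("wc", ["-m", path])
    # Some wc implementations pad the number with leading spaces.
    {count, _rest} = output |> String.trim_leading() |> Integer.parse()
    count
  end
end
```

Usage would look like BoundedCmd.map_bounded(paths, &BoundedCmd.wc_chars/1), where paths is whatever list of files is being processed.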

al2o3cr · October 18, 2022, 6:18pm

Like the joke about the patient and the doctor says, have you considered… not doing that? There are other ways to check the size of a file that don’t involve starting up an sh for every individual file.

My suspicion is that :os.cmd is swallowing these specific errors, but it’s hard to say for sure.

That shows you’ve got file-max pumped way up, but what do ulimit -n and ulimit -n -H show? file-max is a system-wide limit, but individual processes like the BEAM are going to hit the ulimit value first.

What kind of “issues”? I don’t see any immediately relevant differences between those functions and bin_to_list, other than String.to_charlist and to_charlist both respecting Unicode:

iex(1)> :binary.bin_to_list("Zoë")
[90, 111, 195, 171]

iex(2)> String.to_charlist("Zoë")
[90, 111, 235]

iex(3)> to_charlist("Zoë")
[90, 111, 235]

but the binaries in your example don’t seem to have any Unicode characters anywhere…
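
As one example of checking a file's size without starting a shell, the byte size can be read from the file's metadata. This is a minimal sketch. The module name FileSize is hypothetical, and the result is in bytes, which only equals the character count when every character is a single byte.

```elixir
defmodule FileSize do
  # Reads the size in bytes from the file's metadata.
  # No external process is spawned and the file content is not read.
  def bytes(path), do: File.stat!(path).size
end
```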

NaN · October 18, 2022, 10:25pm

I've tried other ways to determine max line length:

file = File.read!(file_path)

def find_longest(list, greatest \\ 0, n \\ 1) do
  cond do
    n <= length(list) ->
      len = :lists.nth(n, list) |> String.length()
      # keep whichever is larger: the running maximum or this line's length
      find_longest(list, max(greatest, len), n + 1)

    true ->
      greatest
  end
end

longest_line = file |> :binary.split("\n", [:global]) |> find_longest()

and file length:

file_len = file |> String.length()

but they are not reliable: there is often, though not always, some unknown deviation from the correct number, i.e. the number wc returns. It's breaking the parser.

As for the issues with String.to_charlist() and to_charlist(): simply, they would cause the error with greater frequency.

ulimit -n -H
1048576

ulimit -n
1024
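
Since the file content is already in memory, the recursive helper above can be reduced to a split and a max. This is a sketch only: the module name LongestLine is hypothetical, and it counts graphemes per line, which is not necessarily what wc -L reports (GNU wc documents -L as the maximum display width).

```elixir
defmodule LongestLine do
  # Length, in graphemes, of the longest line in an in-memory binary.
  # String.split/2 always returns at least one element, so Enum.max/1
  # never sees an empty list here.
  def of(content) when is_binary(content) do
    content
    |> String.split("\n")
    |> Enum.map(&String.length/1)
    |> Enum.max()
  end
end
```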

fmn · October 18, 2022, 11:06pm

very crude, but how about this?

iex> file = "parallel_compiler.ex"
iex> for line <- File.stream!(file, [encoding: :utf8], :line), reduce: {0, 0, 1} do
...>   {index, length, count} -> String.length(line) > length
...>     && {count, String.length(line), count + 1}
...>     || {index, length, count + 1}
...> end
{177, 99, 731}

$ sed -n -e '177p' parallel_compiler.ex | wc -m
99

dimitarvp · October 18, 2022, 11:12pm

EDIT: Ah, I misunderstood and just made a function to return the file with most lines. Still leaving it in and will make a next post with the actual solution.

I too would strongly advise against just invoking external programs. Here’s how I’d naively approach your problem if I only had 10 minutes to get it done:

defmodule Files do
  def count_lines(file_path) do
    file_path
    |> File.stream!()
    |> Enum.reduce(0, fn _line, lines -> lines + 1 end)
  end

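  def file_with_most_lines(dir_path) do
    dir_path
    |> File.ls!()
    # File.ls!/1 returns bare names, so join them with the directory first
    |> Enum.map(&Path.join(dir_path, &1))
    |> Enum.filter(&(not File.dir?(&1)))
    |> Enum.max_by(&count_lines/1)
  end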
end

Then just invoke e.g. Files.file_with_most_lines(".")

Et voila.

dimitarvp · October 18, 2022, 11:38pm

Sorry that my previous post operated under misunderstood requirements.

Here’s the version that only returns a full file path of the file with the longest line inside a directory (not recursive) and said longest line length:

defmodule Files do
  def longest_line(file_path) do
    file_path
    |> File.stream!()
    |> Stream.map(&String.length/1)
    |> Enum.max()
  end

  def file_with_longest_line(dir_path) do
    dir_path
    |> File.ls!()
    # Join first: File.ls!/1 returns bare names, and File.dir?/1 needs a real path
    |> Stream.map(&Path.join(dir_path, &1))
    |> Stream.filter(&(not File.dir?(&1)))
    |> Task.async_stream(
      fn file_path -> {file_path, longest_line(file_path)} end,
      ordered: false,
      max_concurrency: 5
    )
    |> Stream.map(fn {:ok, result} -> result end)
    |> Enum.max_by(fn {_file_path, longest_line} -> longest_line end)
  end
end

Basically, "~/data" |> Path.expand() |> Files.file_with_longest_line() will give you a tuple with a file path and the line length of the file with the longest line of all in the directory. It’s NOT a recursive directory check; only checks files in the specified directory and does not descend downwards.

It’s also parallel – allows up to 5 files to have their lines counted at a time (controlled by the max_concurrency: 5 option).
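
One caveat with the streaming approach: each line yielded by File.stream!/1 still carries its trailing "\n", so String.length/1 counts it. A variant that strips the newline first, and that also tolerates empty files, could look like the sketch below. The module name StreamedLines is hypothetical.

```elixir
defmodule StreamedLines do
  # Longest line in graphemes, not counting the trailing newline.
  # Returns 0 for an empty file instead of raising Enum.EmptyError.
  def longest_line(file_path) do
    file_path
    |> File.stream!()
    |> Stream.map(fn line ->
      line |> String.trim_trailing("\n") |> String.length()
    end)
    |> Enum.max(fn -> 0 end)
  end
end
```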

NaN · October 19, 2022, 12:08am

Two things: 1) I already have the file open. 2) Here's the output when applying the same method to char count:

def char_count(file_path) do
  file_path
  |> File.stream!()
  |> Stream.map(&String.length/1)
  |> Enum.sum()
end

{file_len_wc, _} =
  System.shell("wc -m #{file_path}")
  |> elem(0)
  |> Integer.parse()

file_len_fn = file_path |> char_count()

{file_len_wc, file_len_fn} |> IO.inspect()

Here's a sample of the output:

{790, 785}
{1285, 1280}
{767, 762}
{898, 893}
{636, 631}
{800, 795}
{720, 715}
{911, 906}
{990, 985}
{773, 768}
{596, 591}
{651, 646}
{906, 901}
{632, 627}
{953, 948}
{1256, 1251}
{1163, 1158}
{1011, 1006}
{734, 729}
{685, 680}
{744, 739}
{639, 634}
{620, 615}
{806, 801}
{649, 644}
{641, 636}
{1078, 1073}
{760, 755}
{801, 796}
{656, 651}
{905, 900}
{649, 644}
{1116, 1111}
{838, 833}
{731, 726}
{611, 606}
{649, 644}
{585, 580}
{744, 739}
{968, 963}
{923, 918}
{620, 615}
{806, 801}
{1205, 1200}
{1124, 1119}
{760, 755}
{603, 598}
{597, 592}
{1197, 1192}
{790, 785}
{585, 580}
{768, 763}
{610, 605}
{962, 957}
{585, 580}
{859, 854}
{575, 570}
{843, 838}
{1139, 1134}
{834, 829}
{1031, 1026}
{968, 963}
{818, 813}
{642, 637}
{552, 547}
{1116, 1111}
{603, 598}
{642, 637}
{1056, 1051}
{576, 571}
{734, 729}
{811, 806}
{790, 785}
{595, 590}
{648, 643}
{717, 712}
{1094, 1089}
{849, 844}
{1162, 1157}
{691, 686}
{698, 693}
{896, 891}
{901, 896}
{1007, 1002}
{768, 763}
{922, 917}
{1081, 1076}
{969, 964}
{1121, 1116}
{825, 820}
{821, 816}
{765, 760}
{955, 950}
{1193, 1188}
{1102, 1097}
{576, 571}
{1275, 1270}
{919, 914}
{1206, 1201}
{699, 694}
{571, 566}
{835, 830}
{600, 595}
{576, 571}
{1086, 1081}
{949, 944}
{585, 580}
{1092, 1087}
{695, 690}
{617, 612}
{715, 710}
{1145, 1140}
{1151, 1146}
{971, 966}
{779, 774}
{638, 633}
{1020, 1015}
{570, 565}
{622, 617}
{968, 963}
{1055, 1050}
{571, 566}
{849, 844}
{1012, 1007}
{1302, 1297}
{576, 571}
{897, 892}
{841, 836}
{1134, 1129}
{1041, 1036}
{620, 615}
{1059, 1054}
{645, 640}
{885, 880}
{576, 571}
{1224, 1219}
{1189, 1184}
{887, 882}
{622, 617}
{1265, 1260}
{1235, 1230}
{993, 988}
{1103, 1098}
{1012, 1007}
{848, 843}
{985, 980}
{921, 916}
{685, 680}
{595, 590}
{1150, 1145}
{1045, 1040}
{515, 510}
{1015, 1010}
{785, 780}
{619, 614}
{609, 604}
{740, 735}
{864, 859}
{619, 614}
{976, 971}
{1134, 1129}
{619, 614}
{907, 902}
{1292, 1287}
{912, 907}
{775, 770}
{870, 865}
{732, 727}
{638, 633}
{647, 642}
{923, 918}
{931, 926}
{751, 746}
{869, 864}
{1082, 1077}
{767, 762}
{586, 581}
{667, 662}
{678, 673}
{617, 612}
{576, 571}
{741, 736}
{875, 870}
{823, 818}
{618, 613}
{798, 793}
{770, 765}
{641, 636}
{818, 813}
{652, 647}
{590, 585}
{1369, 1364}
{805, 800}
{1090, 1085}
{1294, 1289}
{1151, 1146}
{1040, 1035}
{615, 610}
{837, 832}
{1036, 1031}
{707, 702}
{1400, 1395}
{784, 779}
{1252, 1247}
{1110, 1105}
{603, 598}
{1132, 1127}
{585, 580}
{683, 678}
{1035, 1030}
{1120, 1115}
{588, 583}
{618, 613}
{583, 578}
{741, 736}
{801, 796}
{559, 554}
{658, 653}
{1136, 1131}
{636, 631}
{576, 571}
{700, 695}
{881, 876}
{1027, 1022}
{1171, 1166}
{515, 510}
{1080, 1075}
{907, 902}
{1011, 1006}
{835, 830}
{585, 580}
{835, 830}
{595, 590}
{771, 766}
{805, 800}
{724, 719}

It's not reliable.

dimitarvp · October 19, 2022, 12:10am

Not sure what you mean.

NaN · October 19, 2022, 12:12am

The char_count function's number deviates from the wc -m count, and it matters. Calling wc works; using the char_count function breaks things.

jerdew · October 19, 2022, 12:16am

Perhaps this is String.length’s Unicode friendliness getting in your way. Does &byte_size/1 in its place give the same answer as wc?

dimitarvp · October 19, 2022, 12:17am

Then try something else and not String.length(text), like:

String.codepoints(text) |> length()
String.graphemes(text) |> length()
byte_size(text)

Basically try to replicate what wc -m does. You very likely have all the tools at your disposal already available, you have to find the right one.
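
To make the difference between those three concrete, here is a small sketch that computes all of them for one string. The module name Counts is hypothetical. The sample string is "Zoë" followed by a Windows-style line ending, chosen because "\r\n" is a single grapheme but two codepoints; whether the files in question actually contain such sequences is not known from this thread.

```elixir
defmodule Counts do
  # Returns the three candidate "lengths" of a binary side by side.
  def of(text) when is_binary(text) do
    %{
      bytes: byte_size(text),
      codepoints: text |> String.codepoints() |> length(),
      graphemes: text |> String.graphemes() |> length()
    }
  end
end

# "Zoë" plus "\r\n": ë is 2 bytes in UTF-8, and "\r\n" is 1 grapheme
IO.inspect(Counts.of("Zo\u00EB\r\n"))
# => %{bytes: 6, codepoints: 5, graphemes: 4}
```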

NaN · October 19, 2022, 12:34am

That did not work, unfortunately

NaN · October 19, 2022, 12:35am

None of those work, unfortunately

kip · October 19, 2022, 12:40am

wc -m is dependent on the locale you have set, if the file content is other than LATIN1. Have you taken that into account?
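
One way to take the locale out of the equation is to pin it for the wc call itself. This is a sketch under assumptions: it relies on the env option of System.cmd/3, the locale name passed in must exist on the machine, and the module WcLocale with its function names is hypothetical.

```elixir
defmodule WcLocale do
  # Pure helper: pulls the leading integer out of wc output
  # such as "790 ./some/file\n" (leading padding is tolerated).
  def parse_count(output) do
    {count, _rest} = output |> String.trim_leading() |> Integer.parse()
    count
  end

  # Runs `wc -m` with LC_ALL set explicitly for that one call.
  # The locale name is supplied by the caller and is an assumption.
  def chars(path, locale) do
    {output, 0} = System.cmd("wc", ["-m", path], env: [{"LC_ALL", locale}])
    parse_count(output)
  end
end
```

Comparing WcLocale.chars(path, "C") with the result under a UTF-8 locale such as "en_US.UTF-8" (if installed) on one of the failing files would show whether the locale changes the count.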

fmn · October 19, 2022, 12:49am

Interesting, String.length() counts Unicode graphemes which map to characters almost universally. Is there a locale which causes character/glyph count change?

fmn · October 19, 2022, 12:53am

How about trying to pinpoint and share the lines that are the culprit? That could help in understanding what’s going on.
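
A rough sketch of how such lines could be found, assuming the file content is already in memory. The module name Culprits is hypothetical. It flags lines whose grapheme count differs from their byte count, plus lines ending in a carriage return, and shows them as raw bytes so nothing is hidden by the terminal.

```elixir
defmodule Culprits do
  # Returns {line_number, raw_bytes_as_text} for every line that contains
  # multi-byte characters or ends with "\r" (a CRLF line ending).
  def suspicious_lines(content) when is_binary(content) do
    content
    |> String.split("\n")
    |> Enum.with_index(1)
    |> Enum.filter(fn {line, _number} ->
      String.length(line) != byte_size(line) or String.ends_with?(line, "\r")
    end)
    |> Enum.map(fn {line, number} ->
      {number, inspect(line, binaries: :as_binaries)}
    end)
  end
end
```

For example, Culprits.suspicious_lines(File.read!(path)) on one of the files where the counts disagree would list candidate lines to share.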