Error when using `:os.cmd`: "not a list of characters" Panic : (kernel 8.4.2) os.erl:434

I keep getting the following error.

  * 1st argument: not a list of characters

    (kernel 8.4.2) os.erl:434: :os.cmd('wc -L ./lib/parser/jobs/1666070156816596389_970650/log_dir/0023')
iex(1)> is_list('wc -L ./lib/parser/jobs/1666070156816596389_970650/log_dir/0023')
true

It originates from here:

"wc -L #{path}"
      |> :binary.bin_to_list()
      |> :os.cmd()

Any ideas?

That's odd (but not a kernel panic).

What happens when you do:

iex()> "ls" |> :binary.bin_to_list |> :os.cmd()  

What happens when you use System.cmd instead?

** (stop) :emfile
    :erlang.open_port({:spawn, 'sh -c "wc -m ./jobs/1666106990098114339_7600/log_dir/0124"'}, [{:parallelism, true}, :use_stdio, :exit_status, :binary, :hide])

from

 System.shell("wc -m #{path}", parallelism: true)
      |> elem(0)

and

** (stop) :emfile
    :erlang.open_port({:spawn_executable, '/usr/bin/wc'}, [{:parallelism, true}, :use_stdio, :exit_status, :binary, :hide, {:args, ["-m", "./jobs/1666107183821118452_212556/log_dir/0065"]}])

from

System.cmd("wc", ["-m", path], parallelism: true)
      |> elem(0)

Something to bear in mind is that I'm calling this thousands of times in a few seconds. What's odd is that :os.cmd() intermittently had issues when I used String.to_charlist() and Kernel.to_charlist(), so I switched to :binary.bin_to_list(), which was more stable, with no errors until last night.

user@terminal:~/path/$ cat /proc/sys/fs/file-nr
21728	0	9223372036854775807

System.cmd and System.shell both also give:

/usr/bin/wc: write error
erl_child_setup: failed with error 32 on line 281

Like the joke about the patient and the doctor says, have you considered… not doing that? There are other ways to check the size of a file that don’t involve starting up an sh for every individual file.

My suspicion is that :os.cmd is swallowing these specific errors, but it’s hard to say for sure.

That shows you’ve got file-max pumped way up, but what do ulimit -n and ulimit -n -H show? file-max is a system-wide limit, but individual processes like the BEAM are going to hit the ulimit value first.
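If shelling out has to stay for the moment, one way to stay under whatever `ulimit -n` reports is to cap how many `wc` processes are alive at once. This is only a rough sketch: `BoundedWc`, `char_counts/2` and the default of 8 are names and values I made up for illustration, not anything from your code.

```elixir
defmodule BoundedWc do
  # Hypothetical helper: runs `wc -m` for each path, never more than
  # `max_concurrency` at a time, and returns [{path, count}] in input order.
  def char_counts(paths, max_concurrency \\ 8) do
    paths
    |> Task.async_stream(
      fn path -> {path, wc_m(path)} end,
      max_concurrency: max_concurrency,
      timeout: 30_000
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end

  # Runs `wc -m` on one file and parses the leading integer from its output.
  defp wc_m(path) do
    {output, 0} = System.cmd("wc", ["-m", path])
    {count, _rest} = output |> String.trim_leading() |> Integer.parse()
    count
  end
end
```

Each port costs the BEAM a few file descriptors, so a small `max_concurrency` keeps the total bounded no matter how many paths are queued.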

What kind of “issues”? I don’t see any immediately relevant differences between those functions and bin_to_list, other than the latter two both respecting Unicode:

iex(1)> :binary.bin_to_list("Zoë")
[90, 111, 195, 171]

iex(2)> String.to_charlist("Zoë")
[90, 111, 235]

iex(3)> to_charlist("Zoë")
[90, 111, 235]

but the binaries in your example don’t seem to have any Unicode characters anywhere…


I've tried other ways to determine the max line length:
file = File.read!(file_path)

 def find_longest(list, greatest \\ 0, n \\ 1) do
    cond do
      n <= list |> length ->
        len = :lists.nth(n, list) |> String.length()

        new =
          cond do
            greatest > len -> greatest
            true -> len
          end

        find_longest(list, new, n + 1)

      true ->
        greatest
    end
  end

longest_line = file |> :binary.split("\n", [:global]) |> find_longest()

and for the file length: file_len = file |> String.length()

But they are not reliable: there is often, though not always, some unknown deviation from the correct number, i.e. the number wc returns. It's breaking the parser.

Simply that they would cause the error with greater frequency.

ulimit -n -H
1048576

ulimit -n
1024

Very crude, but how about this?

iex> file = "parallel_compiler.ex"
iex> for line <- File.stream!(file, [encoding: :utf8], :line), reduce: {0, 0, 1} do
...>   {index, length, count} ->
...>     String.length(line) > length
...>       && {count, String.length(line), count + 1}
...>       || {index, length, count + 1}
...> end
{177, 99, 731}
$ sed -n -e '177p' parallel_compiler.ex | wc -m
99

EDIT: Ah, I misunderstood and just made a function to return the file with most lines. Still leaving it in and will make a next post with the actual solution. :smiley:

I too would strongly advise against just invoking external programs. Here’s how I’d naively approach your problem if I only had 10 minutes to get it done:

defmodule Files do
  def count_lines(file_path) do
    file_path
    |> File.stream!()
    |> Enum.reduce(0, fn _line, lines -> lines + 1 end)
  end

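  def file_with_most_lines(dir_path) do
    dir_path
    |> File.ls!()
    # File.ls!/1 returns bare names, so join them with dir_path first
    |> Enum.map(&Path.join(dir_path, &1))
    |> Enum.filter(&(not File.dir?(&1)))
    |> Enum.max_by(&count_lines/1)
  end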
end

Then just invoke e.g. Files.file_with_most_lines(".")

Et voila.


Sorry that my previous post operated under misunderstood requirements.

Here’s the version that only returns a full file path of the file with the longest line inside a directory (not recursive) and said longest line length:

defmodule Files do
  def longest_line(file_path) do
    file_path
    |> File.stream!()
    |> Stream.map(&String.length/1)
    |> Enum.max()
  end

  def file_with_longest_line(dir_path) do
    dir_path
    |> File.ls!()
    # File.ls!/1 returns bare names, so build full paths before touching the files
    |> Stream.map(&Path.join(dir_path, &1))
    |> Stream.filter(&(not File.dir?(&1)))
    |> Task.async_stream(
      fn file_path -> {file_path, longest_line(file_path)} end,
      ordered: false,
      max_concurrency: 5
    )
    |> Stream.map(fn {:ok, result} -> result end)
    |> Enum.max_by(fn {_file_path, longest_line} -> longest_line end)
  end
end

Basically, "~/data" |> Path.expand() |> Files.file_with_longest_line() will give you a tuple with a file path and the line length of the file with the longest line of all in the directory. It’s NOT a recursive directory check; only checks files in the specified directory and does not descend downwards.

It’s also parallel – allows up to 5 files to have their lines counted at a time (controlled by the max_concurrency: 5 option).

Two things: 1) I already have the file open. 2) Here's the output when applying the same method to the char count:

def char_count(file_path) do
  file_path
  |> File.stream!()
  |> Stream.map(&String.length/1)
  |> Enum.sum()
end

{file_len_wc, _} =
  System.shell("wc -m #{file_path}")
  |> elem(0)
  |> Integer.parse()

file_len_fn = file_path |> char_count()

{file_len_wc, file_len_fn} |> IO.inspect()

Here's a sample of the output:

{790, 785}
{1285, 1280}
{767, 762}
{898, 893}
{636, 631}
{800, 795}
{720, 715}
{911, 906}
{990, 985}
{773, 768}
{596, 591}
{651, 646}
{906, 901}
{632, 627}
{953, 948}
{1256, 1251}
{1163, 1158}
{1011, 1006}
{734, 729}
{685, 680}
{744, 739}
{639, 634}
{620, 615}
{806, 801}
{649, 644}
{641, 636}
{1078, 1073}
{760, 755}
{801, 796}
{656, 651}
{905, 900}
{649, 644}
{1116, 1111}
{838, 833}
{731, 726}
{611, 606}
{649, 644}
{585, 580}
{744, 739}
{968, 963}
{923, 918}
{620, 615}
{806, 801}
{1205, 1200}
{1124, 1119}
{760, 755}
{603, 598}
{597, 592}
{1197, 1192}
{790, 785}
{585, 580}
{768, 763}
{610, 605}
{962, 957}
{585, 580}
{859, 854}
{575, 570}
{843, 838}
{1139, 1134}
{834, 829}
{1031, 1026}
{968, 963}
{818, 813}
{642, 637}
{552, 547}
{1116, 1111}
{603, 598}
{642, 637}
{1056, 1051}
{576, 571}
{734, 729}
{811, 806}
{790, 785}
{595, 590}
{648, 643}
{717, 712}
{1094, 1089}
{849, 844}
{1162, 1157}
{691, 686}
{698, 693}
{896, 891}
{901, 896}
{1007, 1002}
{768, 763}
{922, 917}
{1081, 1076}
{969, 964}
{1121, 1116}
{825, 820}
{821, 816}
{765, 760}
{955, 950}
{1193, 1188}
{1102, 1097}
{576, 571}
{1275, 1270}
{919, 914}
{1206, 1201}
{699, 694}
{571, 566}
{835, 830}
{600, 595}
{576, 571}
{1086, 1081}
{949, 944}
{585, 580}
{1092, 1087}
{695, 690}
{617, 612}
{715, 710}
{1145, 1140}
{1151, 1146}
{971, 966}
{779, 774}
{638, 633}
{1020, 1015}
{570, 565}
{622, 617}
{968, 963}
{1055, 1050}
{571, 566}
{849, 844}
{1012, 1007}
{1302, 1297}
{576, 571}
{897, 892}
{841, 836}
{1134, 1129}
{1041, 1036}
{620, 615}
{1059, 1054}
{645, 640}
{885, 880}
{576, 571}
{1224, 1219}
{1189, 1184}
{887, 882}
{622, 617}
{1265, 1260}
{1235, 1230}
{993, 988}
{1103, 1098}
{1012, 1007}
{848, 843}
{985, 980}
{921, 916}
{685, 680}
{595, 590}
{1150, 1145}
{1045, 1040}
{515, 510}
{1015, 1010}
{785, 780}
{619, 614}
{609, 604}
{740, 735}
{864, 859}
{619, 614}
{976, 971}
{1134, 1129}
{619, 614}
{907, 902}
{1292, 1287}
{912, 907}
{775, 770}
{870, 865}
{732, 727}
{638, 633}
{647, 642}
{923, 918}
{931, 926}
{751, 746}
{869, 864}
{1082, 1077}
{767, 762}
{586, 581}
{667, 662}
{678, 673}
{617, 612}
{576, 571}
{741, 736}
{875, 870}
{823, 818}
{618, 613}
{798, 793}
{770, 765}
{641, 636}
{818, 813}
{652, 647}
{590, 585}
{1369, 1364}
{805, 800}
{1090, 1085}
{1294, 1289}
{1151, 1146}
{1040, 1035}
{615, 610}
{837, 832}
{1036, 1031}
{707, 702}
{1400, 1395}
{784, 779}
{1252, 1247}
{1110, 1105}
{603, 598}
{1132, 1127}
{585, 580}
{683, 678}
{1035, 1030}
{1120, 1115}
{588, 583}
{618, 613}
{583, 578}
{741, 736}
{801, 796}
{559, 554}
{658, 653}
{1136, 1131}
{636, 631}
{576, 571}
{700, 695}
{881, 876}
{1027, 1022}
{1171, 1166}
{515, 510}
{1080, 1075}
{907, 902}
{1011, 1006}
{835, 830}
{585, 580}
{835, 830}
{595, 590}
{771, 766}
{805, 800}
{724, 719}

It's not reliable.

Not sure what you mean.

The number from the char_count function deviates from the wc -m count, and it matters. Calling wc works; using the char_count function breaks things.

Perhaps this is String.length's Unicode friendliness getting in your way. Does &byte_size/1 in its place give the same answer as wc?

Then try something else and not String.length(text), like:

  • String.codepoints(text) |> length()
  • String.graphemes(text) |> length()
  • byte_size(text)

Basically try to replicate what wc -m does. :person_shrugging: You very likely have all the tools at your disposal already available, you have to find the right one.
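To make the difference between those three concrete, here is a tiny sketch on a made-up sample string (not taken from your files). It assumes nothing beyond the standard library; the only point is that graphemes, codepoints and bytes can all disagree on the same text.

```elixir
# "Zoë" followed by a Windows-style line ending; \u00EB is the precomposed ë.
text = "Zo\u00EB\r\n"

# Graphemes: Z, o, ë, and "\r\n" (CRLF is a single grapheme cluster).
IO.inspect(String.length(text), label: "graphemes")
# → graphemes: 4

# Codepoints: Z, o, ë, \r, \n.
IO.inspect(length(String.codepoints(text)), label: "codepoints")
# → codepoints: 5

# Bytes: ë takes two bytes in UTF-8, everything else one.
IO.inspect(byte_size(text), label: "bytes")
# → bytes: 6
```

Whichever of these lines up with `wc -m` on a small sample file is the one to use in `char_count`.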

That did not work, unfortunately :confused:

None of those work, unfortunately :confused:

wc -m is dependent on the locale you have set, if the file content is other than LATIN1. Have you taken that into account?
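A quick way to check this from Elixir is to force the locale when calling `wc`. The sketch below assumes `wc` is on the PATH; `WcLocale` and `chars/2` are hypothetical names used only for this illustration.

```elixir
defmodule WcLocale do
  # Hypothetical helper: runs `wc -m` on `path` with LC_ALL forced to `locale`
  # and returns the count as an integer.
  def chars(path, locale) do
    {output, 0} = System.cmd("wc", ["-m", path], env: [{"LC_ALL", locale}])
    {count, _rest} = output |> String.trim_leading() |> Integer.parse()
    count
  end
end
```

Comparing `WcLocale.chars(path, "C")` with the same call under a UTF-8 locale installed on your machine (for example `"en_US.UTF-8"`, if you have it) shows whether the locale changes the number for a given file.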

Interesting, String.length() counts Unicode graphemes which map to characters almost universally. Is there a locale which causes character/glyph count change?

How about trying to pinpoint and share lines which are a culprit? That could help understanding what’s going on.
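Something along these lines could narrow it down. It is a sketch under assumptions: it works on contents you have already read into memory, it splits on "\n" only, and it also flags carriage returns purely as a guess at one thing worth looking at. `LineDiff` and `suspicious_lines/1` are made-up names.

```elixir
defmodule LineDiff do
  # Hypothetical helper. Takes file contents already in memory and returns
  # {line_number, graphemes, codepoints, bytes, line} for each line where the
  # three measures disagree or a carriage return is present.
  def suspicious_lines(contents) do
    contents
    |> :binary.split("\n", [:global])
    |> Enum.with_index(1)
    |> Enum.map(fn {line, n} ->
      {n, String.length(line), length(String.codepoints(line)), byte_size(line), line}
    end)
    |> Enum.filter(fn {_n, graphemes, codepoints, bytes, line} ->
      graphemes != codepoints or codepoints != bytes or String.contains?(line, "\r")
    end)
  end
end
```

Running `file_path |> File.read!() |> LineDiff.suspicious_lines() |> IO.inspect()` on one of the files that is off by 5 should list the lines worth sharing here.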