Streams for processing a tree of folder and file paths?

I’m looking to learn the File module and was trying to think of a pet project for doing so.

I have a huge dir full of mp3 files that I want to organize.

Given the nature of music (Artist → Album → Track), I figured I would end up building a possibly big list of folders and files in memory.
So naturally the first thing that comes to mind is streams.

Is there something like File.ls but for streams?

Or maybe a better question: what limits are there on the number of folders and files that can be created in any one folder on today’s operating systems?

In other words, how does File.ls handle large listings?
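For context, File.ls/1 returns every name in a directory as one list, so a single huge directory is read into memory in one go. For an Artist → Album → Track layout, though, a tree-level stream can still be built on top of it: each directory is listed eagerly, but the tree as a whole is only explored as the consumer asks for more paths. A minimal sketch using only the standard library; the module name TreeWalk is hypothetical, and symlinks are deliberately not followed to avoid loops:

```elixir
defmodule TreeWalk do
  # Hypothetical helper: lazy, depth-first walk of a directory tree.
  # Each directory is still read in full by File.ls/1; laziness is
  # across directories, not within one.
  def stream(root) do
    Stream.resource(
      # State: stack of directories still to visit.
      fn -> [root] end,
      &next/1,
      fn _stack -> :ok end
    )
  end

  defp next([]), do: {:halt, []}

  defp next([dir | rest]) do
    case File.ls(dir) do
      {:ok, names} ->
        children = names |> Enum.sort() |> Enum.map(&Path.join(dir, &1))
        # Emit every child path; push real subdirectories onto the stack.
        subdirs = Enum.filter(children, &real_dir?/1)
        {children, subdirs ++ rest}

      {:error, reason} ->
        # Report the unreadable directory and keep walking the rest.
        {[{:error, dir, reason}], rest}
    end
  end

  # lstat does not follow symlinks, so a symlinked directory is not descended into.
  defp real_dir?(path) do
    match?({:ok, %File.Stat{type: :directory}}, File.lstat(path))
  end
end
```

Usage would look like `TreeWalk.stream("/path/to/music") |> Stream.filter(&String.ends_with?(&1, ".mp3")) |> Enum.take(10)`, which stops reading directories once ten paths have been found.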

I’m not sure about that, but I remember reading about the possibility of implementing a custom file server, where you should be able to run custom logic.

I’m not aware of anything built-in, but there’s erlang-dirent:

You’d likely want to wrap this in a module that does a couple things:

  • translates the input from a binary to a charlist

  • uses Stream.resource etc. to produce a stream of matches

  • transforms the elements of the stream back into binaries from charlists
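The three steps above could be wrapped roughly like this. It is a sketch, not the library's own API: the module name DirentStream and the `backend` parameter are hypothetical (the parameter defaults to :dirent and exists only so the sketch can be exercised with a stand-in module), and it assumes readdir_type returns `{name, type}`, `:finished`, or `{:error, reason}` with charlist names, as in the demo further down. Unlike that demo, it stops after reporting the first read error instead of continuing to read.

```elixir
defmodule DirentStream do
  # Hypothetical wrapper around erlang-dirent. `backend` defaults to
  # :dirent; it is a parameter only so a stand-in can be swapped in.
  def ls(path, backend \\ :dirent) when is_binary(path) do
    Stream.resource(
      fn ->
        # Step 1: binary -> charlist before calling into the Erlang API.
        {:ok, dir_ref} = backend.opendir(String.to_charlist(path))
        dir_ref
      end,
      fn
        :done ->
          {:halt, :done}

        dir_ref ->
          case backend.readdir_type(dir_ref) do
            :finished ->
              {:halt, dir_ref}

            # Report a read error once, then stop reading.
            {:error, reason} ->
              {[{:error, reason}], :done}

            # Step 3: charlist -> binary on the way out.
            {name, type} ->
              {[{List.to_string(name), type}], dir_ref}
          end
      end,
      # Nothing to close explicitly; the demo notes :dirent cleans up on GC.
      fn _ -> :ok end
    )
  end
end
```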


This is great thank you!

Forgive me for the naive question.

If I understand this correctly, instead of giving you the full list of folders/files at once, this lib gives you each folder/file as readdir reads it (which I assume is how the filesystem builds the full list in the first place)? Thus I could chunk that out via Stream.resource.
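The key property of Stream.resource is that its "next" function only runs when the consumer asks for another element. The sketch below replaces readdir with a hypothetical counter-based stand-in (no dependency needed) and counts how many reads happen when only the first three entries are taken. Whether erlang-dirent itself buffers entries internally is not something this sketch shows.

```elixir
defmodule LazyDemo do
  # Hypothetical demo: counts how many times the "read next entry" step runs.
  def run(take_n) do
    {:ok, calls} = Agent.start_link(fn -> 0 end)

    taken =
      Stream.resource(
        # State: index of the next fake entry (a real reader would hold a dir handle).
        fn -> 1 end,
        fn
          i when i > 1_000_000 ->
            {:halt, i}

          i ->
            # One "readdir" call produces one entry.
            Agent.update(calls, &(&1 + 1))
            {["track_#{i}.mp3"], i + 1}
        end,
        fn _ -> :ok end
      )
      |> Enum.take(take_n)

    read_count = Agent.get(calls, & &1)
    Agent.stop(calls)
    {taken, read_count}
  end
end
```

`LazyDemo.run(3)` only performs three reads, even though the stand-in "directory" has a million entries.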

Here’s a standalone demo:

Mix.install([
  {:dirent, git: "https://github.com/team-telnyx/erlang-dirent.git", branch: "master"}
])

target_path = "."

Stream.resource(
  fn ->
    {:ok, dir_ref} =
      target_path
      |> String.to_charlist()
      |> :dirent.opendir()

    dir_ref
  end,
  fn dir_ref ->
    case :dirent.readdir_type(dir_ref) do
      :finished ->
        {:halt, dir_ref}

      {:error, reason} ->
        {[{:error, reason}], dir_ref}

      {name, type} ->
        {[{List.to_string(name), type}], dir_ref}
    end
  end,
  fn _ -> :ok end # not used because :dirent cleans up on GC
)
|> Stream.each(&IO.inspect/1)
|> Stream.run()

Followup thoughts in no particular order:

  • This uses :dirent.readdir_type since the first question that code consuming the stream is likely to ask is “is this a directory?”

  • The error handling is somewhat inconsistent; a failure in opendir will crash but a failure when iterating through the results will put {:error, reason} in the output stream. Your application may have different needs.

  • The situation with filenames that can’t be represented in UTF-8 is complicated, so this code ignores it completely. That may not be sufficient, depending on your specific filesystem.
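Following on from the first point, a consumer of the `{name, type}` stream could split it into directories to descend into and mp3 files to organize. A minimal sketch; the module name EntryFilter is hypothetical, and the :directory and :regular atoms are an assumption about what :dirent.readdir_type reports, so check them against the library. Error tuples are simply dropped here.

```elixir
defmodule EntryFilter do
  # Hypothetical helpers for a stream of {name, type} tuples.
  # ASSUMPTION: readdir_type reports :directory and :regular.

  # Names of subdirectories (e.g. artists or albums) to descend into.
  def dirs(entries) do
    Stream.flat_map(entries, fn
      {name, :directory} -> [name]
      _other -> []
    end)
  end

  # Regular files with a .mp3 extension (case-insensitive).
  def mp3s(entries) do
    Stream.flat_map(entries, fn
      {name, :regular} ->
        if String.downcase(Path.extname(name)) == ".mp3", do: [name], else: []

      _other ->
        []
    end)
  end
end
```

Note that not every filesystem fills in the entry type (POSIX readdir can report DT_UNKNOWN), so if the library reports an "unknown" type, a fallback such as File.lstat/1 on the full path is worth adding.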


I ran into a small issue compiling on OTP 26, so I dropped back to 25 and it worked fine.
Something to do with IO something-or-other at build time 🤷

Anyhow, this is great and works nicely. Thank you.
I can see myself using this along with Flow to do the job I’m looking at.