I’m looking to learn the File module and was trying to think of a pet project for doing so.
I have a huge dir full of mp3 files that I want to organize.
Given the nature of music (Artist → Album → Track), I figured I would be building a possibly big list of folders and files in memory.
So naturally the first thing that comes to mind is streams.
Is there something like File.ls but for streams?
Or maybe a better question: what limits are there on the number of folders and files that can be created in any one folder on today’s operating systems?
In other words, how does File.ls handle large lists?
I’m not sure about that. However, I remember reading something about the possibility of implementing a custom file server, where you should be able to run custom logic.
If I understand this correctly, this lib, rather than handing you the full list of folders/files up front, can give you each folder/file as readdir reads it, which I assume is how the filesystem builds the full list in the first place? Thus I could chunk that out via Stream.resource.
Mix.install([
  {:dirent, git: "https://github.com/team-telnyx/erlang-dirent.git", branch: "master"}
])

target_path = "."

Stream.resource(
  fn ->
    {:ok, dir_ref} =
      target_path
      |> String.to_charlist()
      |> :dirent.opendir()

    dir_ref
  end,
  fn dir_ref ->
    case :dirent.readdir_type(dir_ref) do
      :finished ->
        {:halt, dir_ref}

      {:error, reason} ->
        {[{:error, reason}], dir_ref}

      {name, type} ->
        {[{List.to_string(name), type}], dir_ref}
    end
  end,
  fn _ -> :ok end # not used because :dirent cleans up on GC
)
|> Stream.each(&IO.inspect/1)
|> Stream.run()
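For comparison, here is a rough stdlib-only sketch of the Artist → Album → Track walk that avoids the :dirent dependency. It is not equivalent to the code above: each directory is still listed eagerly with File.ls/1, and only the walk across directories is lazy. The module name LazyTree and its walk/1 function are hypothetical names made up for this illustration.

```elixir
defmodule LazyTree do
  # Hypothetical helper, not part of Elixir or :dirent.
  # Lazily walks a directory tree breadth-first. Each directory is listed
  # eagerly with File.ls/1, but only one listing is built per step; the
  # accumulator is just the queue of directories still to visit.
  def walk(root) do
    Stream.resource(
      fn -> [root] end,
      fn
        [] ->
          {:halt, []}

        [dir | rest] ->
          case File.ls(dir) do
            {:ok, names} ->
              paths = Enum.map(names, &Path.join(dir, &1))
              # Emit every path, and queue real subdirectories for later.
              {paths, rest ++ Enum.filter(paths, &real_dir?/1)}

            {:error, reason} ->
              # Unreadable directories show up in the stream instead of crashing.
              {[{:error, dir, reason}], rest}
          end
      end,
      fn _ -> :ok end
    )
  end

  # File.lstat/1 does not follow symlinks, so a symlink loop cannot
  # make the walk run forever.
  defp real_dir?(path) do
    match?({:ok, %File.Stat{type: :directory}}, File.lstat(path))
  end
end
```

Something like LazyTree.walk("music") |> Stream.filter(&is_binary/1) |> Enum.take(10) would then pull only as many directories as needed to produce ten paths. If a single folder really holds millions of entries, this sketch does not help, since File.ls/1 still builds that one list in full.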
Followup thoughts in no particular order:
This uses :dirent.readdir_type because the first question the consuming code is likely to ask is “is this a directory?”
The error handling is somewhat inconsistent: a failure in opendir will crash, but a failure while iterating through the results will put {:error, reason} in the output stream. Your application may have different needs.
The situation with filenames that can’t be represented in UTF-8 is complicated, so this code ignores it completely. That may not be sufficient depending on your specific filesystem.
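As a sketch of how a consumer might deal with the last two points, the helpers below work on any stream shaped like the one above, i.e. elements that are either {name, type} or {:error, reason}. Everything here is an assumption for illustration: the module name EntryFilter and its functions are made up, and tag_names/1 only makes sense if names reach you as raw binaries rather than already-converted strings.

```elixir
defmodule EntryFilter do
  # Hypothetical consumer-side helpers; not part of :dirent or Elixir.

  # Drop {:error, reason} elements, keeping only {name, type} entries.
  def skip_errors(stream) do
    Stream.reject(stream, &match?({:error, _}, &1))
  end

  # Raise on the first {:error, reason}, so iteration failures crash
  # the same way a failed open does.
  def fail_fast(stream) do
    Stream.map(stream, fn
      {:error, reason} -> raise "directory read failed: #{inspect(reason)}"
      entry -> entry
    end)
  end

  # Tag each entry by whether its name is valid UTF-8.
  # Assumes names are binaries holding the raw bytes from the filesystem.
  def tag_names(stream) do
    Stream.map(stream, fn
      {name, type} when is_binary(name) ->
        if String.valid?(name), do: {:utf8, name, type}, else: {:raw, name, type}

      other ->
        other
    end)
  end
end
```

Picking skip_errors/1 or fail_fast/1 at the consuming end makes the error policy explicit without changing the stream itself. The type atoms you pattern match on afterwards should be whatever :dirent actually returns; check its docs rather than trusting any placeholder values.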