Generating fake files

blksheephw · November 6, 2017, 5:35pm

Hi,

I’m practicing some basic CS search stuff with Elixir, so I want to create a few thousand .txt files with some random information inside them.
What would you guys recommend for generating a whole bunch of dummy files, and for creating the dummy info inside them? (just a bunch of random numbers or names would work fine)

I found the Faker library for generating fake data, but I want to be able to generate these dummy files from an .exs script file, not a Mix project. Should I still use Faker for that?

Thanks

NobbZ · November 6, 2017, 9:19pm

One of my favorite answers applies here:

The best way to generate such fake data depends on your requirements for the fake data…

If you just need simple numbers, you can generate them easily. But yes, for everything beyond that, I’d probably go for Faker or a similar package.
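For the simple-numbers case nothing beyond the standard library is needed, so it runs fine as a plain .exs script. A minimal sketch, assuming a made-up list of names, a hypothetical fake_data output directory and a two-line "name/id" file format (none of these come from the thread):

```elixir
# gen_files.exs: stdlib-only sketch, no Faker needed.
# The names list, directory and file format are illustrative assumptions.
defmodule FakeFiles do
  @names ~w(alice bob carol dave erin frank grace heidi)

  # Writes `count` files (1.txt .. count.txt) into `dir`, each holding
  # one random name and one random ID between 1 and 1_000_000.
  def generate(dir, count) do
    File.mkdir_p!(dir)

    for i <- 1..count do
      name = Enum.random(@names)
      id = :rand.uniform(1_000_000)
      File.write!(Path.join(dir, "#{i}.txt"), "name: #{name}\nid: #{id}\n")
    end

    :ok
  end
end

FakeFiles.generate("fake_data", 3_000)
```

Run it with elixir gen_files.exs; Enum.random/1 picks the name and :rand.uniform/1 produces the ID.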

Add that package as a dependency for the dev environment only (only: :dev), create a tasks folder and change your mix file a bit:

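def project do
  [
    …,
    # Extra source directories depend on the current Mix environment
    elixirc_paths: elixirc_paths(Mix.env())
  ]
end

# test/support is only compiled in :test, tasks/ only in :dev; lib/ always
defp elixirc_paths(:test), do: ["test/support"] ++ elixirc_paths()
defp elixirc_paths(:dev),  do: ["tasks"] ++ elixirc_paths()
defp elixirc_paths(_),     do: elixirc_paths()

defp elixirc_paths(), do: ["lib"]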

Implement the generator inside tasks. It can call your other code as necessary (but not vice versa!). With this strategy you can easily regenerate fake data whenever you need it, and you don’t need to commit the data itself to the repository, only the code that knows how to create it.
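A minimal sketch of such a generator as a Mix task living in tasks/. The module name Mix.Tasks.Gen.FakeFiles, the --count and --dir options and the one-line file format are all hypothetical; Mix derives the task name gen.fake_files from the module name.

```elixir
# tasks/gen_fake_files.ex: hypothetical Mix task sketch.
defmodule Mix.Tasks.Gen.FakeFiles do
  use Mix.Task

  @shortdoc "Generates dummy .txt files for search experiments"

  @impl Mix.Task
  def run(args) do
    # Only accept the two (hypothetical) options; ignore anything else.
    {opts, _rest, _invalid} =
      OptionParser.parse(args, strict: [count: :integer, dir: :string])

    count = Keyword.get(opts, :count, 1_000)
    dir = Keyword.get(opts, :dir, "fake_data")

    File.mkdir_p!(dir)

    # One file per index, each holding a single random ID.
    Enum.each(1..count, fn i ->
      content = "id: #{:rand.uniform(1_000_000)}\n"
      File.write!(Path.join(dir, "#{i}.txt"), content)
    end)

    Mix.shell().info("Wrote #{count} files to #{dir}")
  end
end
```

Usage would be mix gen.fake_files --count 5000 --dir fake_data. With the mix.exs above, tasks/ is only compiled in :dev, so the task only exists in that environment.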

The Mix task could even take a seed value as an argument…
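One way to honour such a seed, sketched under the assumption of a hypothetical --seed option: :rand keeps its state per process, so seeding the process that does the generating makes the whole sequence reproducible. The SeededGen module is illustrative, and the :exsss algorithm is just one choice (it needs OTP 22 or later).

```elixir
# Hypothetical helper showing reproducible generation from a seed.
defmodule SeededGen do
  # Returns `count` pseudo-random IDs; the same seed always yields the same list.
  # :rand state lives in the calling process, so seed and generate in one process.
  def ids(seed, count) do
    :rand.seed(:exsss, {seed, seed, seed})
    Enum.map(1..count, fn _ -> :rand.uniform(1_000_000) end)
  end

  # Reads --seed from the task's argv, falling back to a random seed.
  def seed_from_args(args) do
    {opts, _rest, _invalid} = OptionParser.parse(args, strict: [seed: :integer])
    Keyword.get_lazy(opts, :seed, fn -> :rand.uniform(1_000_000_000) end)
  end
end
```

For example, SeededGen.ids(42, 3) gives the same three IDs on every run, so a whole fake data set can be recreated from nothing but the seed.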

OvermindDL1 · November 6, 2017, 9:24pm

You could even generate fake data with the new StreamData library. That way the data is quite varied and you can test tons of different cases; you just have to figure out how to format it correctly.
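A rough sketch of what that could look like. It assumes the stream_data package and uses Mix.install (Elixir 1.12+, so newer than this thread) to pull it into a standalone script; the record shape with an id and a name is a made-up example. In a Mix project you would add stream_data to your deps instead.

```elixir
# Assumes network access so Mix.install can fetch stream_data.
Mix.install([:stream_data])

# Generator for one hypothetical record: a positive ID and a short alphanumeric name.
record =
  StreamData.fixed_map(%{
    id: StreamData.positive_integer(),
    name: StreamData.string(:alphanumeric, min_length: 1, max_length: 12)
  })

# Generators implement Enumerable, so Enum.take/2 pulls concrete values out.
records = Enum.take(record, 5)

Enum.each(records, fn %{id: id, name: name} ->
  IO.puts("id: #{id} name: #{name}")
end)
```

Formatting each record into file contents is then just string interpolation, as in the IO.puts call.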

NobbZ · November 6, 2017, 9:30pm

Well, maybe even combine Faker and StreamData? I’m not sure whether Faker’s API allows using StreamData for seeding, though…

StreamData generates data that suits the type, but Faker actually produces data that is also semantically meaningful. In a test file that holds address data, I prefer data that looks like real street and city names over garbage strings.

blksheephw · November 6, 2017, 10:04pm

Thanks for the answers!

I’m practicing working with processes, so I thought I’d generate a whole bunch of files and put some sort of info in each of them, like a “name” or “ID”.
Then I’d spawn a process for each file and search it for an “ID” or “name”.
My goal is to run this search with a whole bunch of processes (concurrently?) to see how fast it finishes, and then have it return all of the files in which the search criteria were met.
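A minimal sketch of that plan with plain spawn/send/receive, assuming one small text file per record in a single directory and a plain substring search. FileSearch, search/2 and the 5-second timeout are hypothetical choices, not an established API.

```elixir
# One process per file: each reads its file and reports back to the caller.
defmodule FileSearch do
  # Returns the sorted list of paths in `dir` whose contents include `term`.
  def search(dir, term) do
    parent = self()

    paths =
      dir
      |> File.ls!()
      |> Enum.map(&Path.join(dir, &1))

    # Spawn one process per file; each replies with {:result, path, found}.
    Enum.each(paths, fn path ->
      spawn(fn ->
        found = path |> File.read!() |> String.contains?(term)
        send(parent, {:result, path, found})
      end)
    end)

    # Collect exactly one reply per spawned process, giving up if one never arrives.
    paths
    |> Enum.map(fn _ ->
      receive do
        {:result, path, found} -> {path, found}
      after
        5_000 -> exit(:search_timeout)
      end
    end)
    |> Enum.filter(fn {_path, found} -> found end)
    |> Enum.map(fn {path, _found} -> path end)
    |> Enum.sort()
  end
end
```

To time a run: {micros, hits} = :timer.tc(fn -> FileSearch.search("fake_data", "alice") end). BEAM processes are lightweight, so one per file is fine for a few thousand files; Task.async_stream/3 with the :max_concurrency option is the more idiomatic way to cap how many run at once.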

I realize this is a different topic than the original question, but I’m trying hard to learn about processes and concurrency, so I’d be interested to know whether you think this is a useful exercise or whether there’s a better way to learn this stuff.

I will try your suggestions, thanks again!