File.read/1 content - is it possible to convert this into a human-readable string?

supermasno · February 19, 2021, 2:37pm

Hi, I’m using File.read/1 which returns: <<4, 0, 0, 0, 0, 0, 0, 0, 97, 51, 56, 101, 53, 101, 101, 97, 0, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...>> (Truncated version)

Is it possible to convert this into a human-readable string?

SOLVED:

File.stream!("a38e5eea.dbt") |> Stream.into(File.stream!("output.txt")) |> Stream.run

{:ok, file} = File.read("output.txt")

String.replace(file, ~r/[^\x9\xA\xD\x20-\x7F]/, "")

al2o3cr · February 19, 2021, 2:41pm

That’s a lot of 0 - this doesn’t look like a file that’s supposed to be human-readable. What kind of file is it?

supermasno · February 19, 2021, 2:42pm

It’s a DBF file. Here it is:

supermasno · February 19, 2021, 2:44pm

Here is what I’m trying to replicate from our legacy PHP app:

        {
          $dbt_data = file_get_contents($full_path);
          $dbt_data = trim(clean_string($dbt_data));
          if($dbt_data)
            $ems_data['dbt']['DBT_A_CONTENT'] = $dbt_data;
        }

if ( ! function_exists('clean_string'))
{
  function clean_string ($string) {
      $table = array(
          'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
          'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
          'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
          'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
          'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
          'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
          'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
          'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', '`'=>"'", "’"=>"'", '„'=>',', "‘"=>"'", "’"=>"'", "´"=>"'", '“'=>'"',
          "”"=>'"'
      );

      return preg_replace("/[^\x9\xA\xD\x20-\x7F]/", "",strtr($string, $table));
  }
}

al2o3cr · February 19, 2021, 3:05pm

String.replace is a good place to start.

For instance, that last line is responsible for removing all the 0s, and translates to Elixir as:

String.replace(input, ~r/[^\x9\xA\xD\x20-\x7F]/, "")
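To see the effect on the data from the first post, here is a minimal sketch that runs the same replacement on the first 20 bytes quoted there. The variable names `input` and `cleaned` are only for illustration.

```elixir
# First 20 bytes of the binary quoted at the top of the thread.
input = <<4, 0, 0, 0, 0, 0, 0, 0, 97, 51, 56, 101, 53, 101, 101, 97, 0, 0, 2, 1>>

# The character class is negated, so the regex matches every byte that is NOT
# tab (0x09), line feed (0x0A), carriage return (0x0D) or in the range 0x20-0x7F.
# Each match is replaced with the empty string.
cleaned = String.replace(input, ~r/[^\x9\xA\xD\x20-\x7F]/, "")

IO.inspect(cleaned)
# prints "a38e5eea"
```

The bytes 97, 51, 56, 101, 53, 101, 101, 97 are the ASCII codes for "a38e5eea"; the 0s and the stray 4, 2 and 1 are control bytes and get dropped.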

You may also need to consider character encodings, as the file may not be UTF-8.
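As a rough sketch of what handling the encoding could look like: the PHP `strtr` table only makes sense once the accented characters are decoded correctly. The example below assumes the file is Latin-1 (ISO-8859-1), which nothing in this thread confirms. The module name `CleanString` and the heavily abbreviated table are hypothetical; `:unicode.characters_to_binary/3` is from the Erlang standard library.

```elixir
defmodule CleanString do
  # Hypothetical, abbreviated transliteration table.
  # The PHP original lists many more pairs.
  @table %{"é" => "e", "É" => "E", "ñ" => "n", "ü" => "u", "ß" => "Ss"}

  # ASSUMPTION: the raw bytes are Latin-1. Swap the source encoding
  # if the real files turn out to use something else.
  def from_latin1(bytes) when is_binary(bytes) do
    :unicode.characters_to_binary(bytes, :latin1, :utf8)
  end

  # Rough equivalent of trim(clean_string($string)) from the PHP snippet:
  # transliterate known characters, drop everything outside the allowed
  # ASCII set, then trim surrounding whitespace.
  def clean(utf8) when is_binary(utf8) do
    utf8
    |> String.graphemes()
    |> Enum.map_join(&Map.get(@table, &1, &1))
    |> String.replace(~r/[^\x9\xA\xD\x20-\x7F]/, "")
    |> String.trim()
  end
end

# "caf" followed by Latin-1 é (0xE9) and a trailing zero byte.
<<0x63, 0x61, 0x66, 0xE9, 0x00>>
|> CleanString.from_latin1()
|> CleanString.clean()
|> IO.inspect()
# prints "cafe"
```

Characters that are not in the table and not plain ASCII are simply removed by the final replacement, which matches what the PHP `preg_replace` call does.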

supermasno · February 19, 2021, 3:11pm

Thank you kind sir. Here is what looks like it’s working.

File.stream!("a38e5eea.dbt") |> Stream.into(File.stream!("output.txt")) |> Stream.run

{:ok, file} = File.read("output.txt")

String.replace(file, ~r/[^\x9\xA\xD\x20-\x7F]/, "")

"a38e5eea0 Instruction to Estimator Instructions:  Estimate and photos, CCC if total loss \nDamage:  Drivers side front and rear door \n\nDO NOT RELEASE THE ESTIMATE. Please submit quality photos and a detailed estimate of repairs for the damage to this vehicle. Provide repair days and add time for weekends. (1 day per 5 labor hours) \nPlease provide a photo of the vehicle registration if possible. If this is a total  loss, please submit a CCC valuation with the correct vehicle options and conditioning. Please document relevant unrelated damages. Thank You.IVD HIT OV WHILE EXITING A PARKING LOT"
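A possible simplification, offered as a sketch rather than a tested replacement: the stream step appears to only copy the file byte for byte into output.txt, so reading the .dbt file directly should give the same result. The module and function names (`DbtText.read_clean/1`) are made up for this example.

```elixir
defmodule DbtText do
  # Reads the file in one go and strips every byte that is not
  # tab, LF, CR or in the range 0x20-0x7F.
  # File.read!/1 raises if the file cannot be read.
  def read_clean(path) do
    path
    |> File.read!()
    |> String.replace(~r/[^\x9\xA\xD\x20-\x7F]/, "")
  end
end
```

Usage would be `DbtText.read_clean("a38e5eea.dbt")`, with no intermediate output.txt.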

dimitarvp · February 19, 2021, 6:29pm

How big are your .dbf files? Your solution can be very slow if they go beyond a few dozen kilobytes. I see the one you have uploaded is very small – only 512 bytes – but what about others you will process in the future?

Also, just to be crystal clear: what’s the goal? To delete every single byte that’s 09, 0A, 0D and everything between 20 and 7F?
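For reference, the character class in the regex is negated (`[^...]`), so it keeps tab (09), line feed (0A), carriage return (0D) and everything from 20 to 7F, and deletes all other bytes. A sketch that spells this out as an explicit byte filter follows; `ByteFilter` is a hypothetical name, and no timings were taken, so whether it beats the regex on larger files would need to be measured.

```elixir
defmodule ByteFilter do
  # Walks the binary once, byte by byte, and collects only the bytes
  # that the regex would leave in place.
  def keep_printable(data) when is_binary(data) do
    for <<byte <- data>>, keep?(byte), into: <<>>, do: <<byte>>
  end

  # Allowed set: tab, LF, CR, and 0x20 through 0x7F inclusive.
  defp keep?(byte), do: byte in [0x09, 0x0A, 0x0D] or byte in 0x20..0x7F
end

ByteFilter.keep_printable(<<4, 0, 0, 0, 97, 51, 56, 101, 0, 10, 200>>)
|> IO.inspect()
# prints "a38e\n"
```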