Reading files / data structures from other platforms into Elixir/Erlang

I’ve built an Elixir application that needs to read files produced by a legacy Java platform.

Looking at the Java source shows clearly how the files are being constructed:

    HashMap localHashMap = null;
    List localList1 = null;
    ...
    ByteArrayOutputStream localByteArrayOutputStream = null;
    ObjectOutputStream localObjectOutputStream = null;

    localByteArrayOutputStream = new ByteArrayOutputStream();
    localObjectOutputStream = new ObjectOutputStream(localByteArrayOutputStream);
    localObjectOutputStream.flush();
    localHashMap = new HashMap();
    localHashMap.put("SecretKey", someBytes);
    localHashMap.put("EncryptedData", localList1);
    localHashMap.put("IVSpec", someMoreBytes);
    localObjectOutputStream.writeObject(localHashMap);
    arrayOfByte3 = localByteArrayOutputStream.toByteArray();

At the end of the day it’s really just a Java HashMap with known string keys that’s being written to file…

I tried to grok the link below, but I feel there’s got to be a much simpler way to exchange common data structures with other platforms.

https://medium.com/@mr.anmolsehgal/java-hashmap-internal-implementation-21597e1efec3

Any suggestions please?

Samples:
http://paperlesssolutionsltd.com.ng/sample1.zip
http://paperlesssolutionsltd.com.ng/sample2.zip

Hey @CharlesO, your file.io links do not work for me, I get a 404 for both.

To make sure I understand what you’re trying to do, you’re trying to read these files in Elixir and build maps out of them?


Sorry, I’ve moved them to my personal server:
http://paperlesssolutionsltd.com.ng/sample1.zip
http://paperlesssolutionsltd.com.ng/sample2.zip

I want to read the content into Elixir.
Looking at the Java source, it’s clear how they are being constructed.

@CharlesO Unzipping those files just produces sample1.zip.cpgz; they do not unzip.

Sure, but it isn’t clear at all what the format of the files is. The Java source has nothing to do with the structure of the file.

Oh, those zip files were encrypted.

Decrypting them is not the problem; reading the Java HashMap is the issue.

The Java source shows how the file was constructed … if I can read back the parts, I can easily decrypt the files. I have the keys and all.

What does a Java HashMap written to a file look like? What is its format?

Wouldn’t it be easier to read them back in a Java application and save them as JSON or another interop format?

There is no magical solution for reading files in an internal Java binary format. You could always try to understand and parse the Java binary format, but that sounds like a lot of work and is possibly error-prone.


That’s my question exactly. See what the code is doing:

It reads the zipped file, encrypts it, and overwrites the same zipped file name with the encrypted content.

    byte[] arrayOfByte = getByteArray(this.compressedFile);
    arrayOfByte = SecurityService.performEnc(SecMgrConstants.ASYM_ALGORITHM, arrayOfByte);
    localFileOutputStream = new FileOutputStream(this.compressedFile);
    localFileOutputStream.write(arrayOfByte);
    localFileOutputStream.close();

Basically this: https://paperlesssolutionsltd.com.ng/RSASecImpl.java.txt

But once Java has written the bits to a physical binary file, shouldn’t any programming language be able to read back the binary content?

Sure, every language can read the physical bits, but how do you know what these bits represent? I guess (because you didn’t specify in your original question) that you need to do something with the files in Elixir, so how do you know what the contents of the file mean?

How the bits are stored is defined by Java’s ObjectOutputStream. If you want to read them, you can, but you will have to parse them yourself, basically implementing an ObjectInputStream in Elixir.
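
To make that concrete, here is a minimal Java sketch (not taken from the legacy source) that serializes a small HashMap the same way and inspects the first bytes of the result. Every stream written by ObjectOutputStream starts with the magic number 0xACED followed by version 5, so checking those four bytes is a cheap way to confirm that a file such as sample1.zip really is a serialized object stream. The class name SerializedHeaderDemo, its helper methods, and the placeholder byte values are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.util.HashMap;

// Hypothetical demo class; not part of the legacy application.
class SerializedHeaderDemo {

    // Serialize an object into a byte array with ObjectOutputStream.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // True when the data starts with the stream header: magic 0xACED, version 5.
    static boolean hasStreamHeader(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        int magic = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
        int version = ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
                && version == ObjectStreamConstants.STREAM_VERSION;
    }

    // Hex dump of the first `count` bytes, for eyeballing a file.
    static String hex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            sb.append(String.format("%02x ", data[i] & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, Object> map = new HashMap<>();
        map.put("SecretKey", new byte[] {1, 2, 3}); // placeholder bytes
        map.put("IVSpec", new byte[] {4, 5, 6});    // placeholder bytes
        byte[] data = serialize(map);
        System.out.println(hex(data, 16));
        System.out.println("stream header present: " + hasStreamHeader(data));
    }
}
```

Everything after those four bytes (class descriptors, field lists, handles, block data) is what an Elixir parser would have to understand, which is where most of the work would go.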

So, it would be a lot easier if, instead of using ObjectOutputStream, you wrote the files in a JSON or XML format in Java. Then you wouldn’t have to implement the binary parsing yourself.

Sadly, it’s a third-party legacy app. We just have the files it generates and parts of its source.

so how do you know what the contents of the file mean?

We can see how the files are constructed from the legacy java source.

The content layout is a Java HashMap, with binary data stored under three string keys, written to file.

If I can read back the parts of the HashMap … then I’ve effectively solved this problem.

Java’s FileOutputStream writes raw bytes. What you write is what you get.

https://groups.google.com/d/msg/erlang-programming/eo9HUGagIv4/LjgRLENBQk0J

Would you consider publishing 1-2 unencrypted files with the raw Java HashMaps as their contents?

I’d be interested in trying to reverse-engineer them.

Sure, see the links:

http://paperlesssolutionsltd.com.ng/sample1.zip
http://paperlesssolutionsltd.com.ng/sample2.zip

The zip files are encrypted, then the encrypted bits are packed together as a HashMap and written to file.

If I can simply read back the HashMap from the given files, then getting them decrypted is the easier part.

Would I be able to read those files without a password or a key right now?

Yes.

We should be able to read the binary content; it has three parts:

  localHashMap.put("SecretKey", someBytes);
  localHashMap.put("EncryptedData", localList1);
  localHashMap.put("IVSpec", someMoreBytes);

The sample files are just a Java HashMap written to files saved as “sample1.zip” etc.

Can we read back the “sample1.zip” binary and read out the three parts under each of the keys, as described in the sample code?

https://paperlesssolutionsltd.com.ng/RSASecImpl.java.txt

That’s the source

We’re having some miscommunication here.

If I open the sample1.zip it looks like those are not a zip format at all? It’s really just binary output but named .zip?

Also, by googling a bit I found: https://www.javaworld.com/article/2072752/the-java-serialization-algorithm-revealed.html and this indeed matches the binary format of the sample1.zip file provided.

You could have a look at this site to check how to reverse engineer the binary format. But I still think it would be simpler to write a basic Java utility that reads the sample1.zip file and writes it to JSON/…


Correct, they are not zip files at all.

The original zip file was encrypted and put under the key “EncryptedData”.

Some additional data was then put under two additional keys in a HashMap, as follows:

  localHashMap.put("SecretKey", someBytes);
  localHashMap.put("EncryptedData", localList1);
  localHashMap.put("IVSpec", someMoreBytes);

The entire content was then saved as a binary file with the name “sample1.zip”.

sample1.zip is not a .zip file.

I’ve looked at the article @tcoopman linked, and while it’s very possible to make a relatively tidy Erlang/Elixir module that uses binary pattern matching, I believe it would be brittle and would surely not cover all cases.

I’d recommend, just as @tcoopman said, that you write a small Java utility that reads those serialized HashMaps and dumps them to JSON. From then on, every normal language could work with them.

I am usually very interested in inter-operability between languages but in this case you’d be taking the much longer road if you don’t translate the Java serialized objects to JSON using Java itself.
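
A minimal sketch of such a utility could look like the following. It assumes the values under "SecretKey" and "IVSpec" are byte arrays and that "EncryptedData" is a list whose elements are byte arrays or other plain JDK types; the thread does not confirm the element type, so anything unrecognized falls back to its toString() form. The class name SerializedMapToJson and its helpers are hypothetical, and byte arrays are emitted as Base64 strings because JSON has no binary type.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical converter; not part of the legacy application.
class SerializedMapToJson {

    // Deserialize whatever object graph the byte stream holds.
    static Object read(byte[] serialized) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            return in.readObject();
        }
    }

    // Render maps, lists, byte arrays, numbers and strings as JSON text.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof byte[]) {
            return quote(Base64.getEncoder().encodeToString((byte[]) value));
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        if (value instanceof Map) {
            // Sort by key so the output is stable; HashMap order is unspecified.
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sorted.put(String.valueOf(e.getKey()), e.getValue());
            }
            StringJoiner parts = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                parts.add(quote(e.getKey()) + ":" + toJson(e.getValue()));
            }
            return parts.toString();
        }
        if (value instanceof List) {
            StringJoiner parts = new StringJoiner(",", "[", "]");
            for (Object item : (List<?>) value) {
                parts.add(toJson(item));
            }
            return parts.toString();
        }
        // Assumption: unknown types are good enough as their string form.
        return quote(value.toString());
    }

    // Quote a string and escape the characters JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SerializedMapToJson.java <serialized-file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(toJson(read(bytes)));
    }
}
```

On Java 11 or later this can be run directly with `java SerializedMapToJson.java sample1.zip > sample1.json`. On the Elixir side, any JSON library plus `Base.decode64!/1` should then recover the raw bytes. If the list turns out to contain custom classes from the legacy app, those classes would need to be on the classpath, otherwise readObject throws ClassNotFoundException.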
