patryk.it

File encoding detection library

Hi guys!

My question is about file encodings. I’m working on a feature that lets users upload a CSV file that might be in a different charset. It’s not only UTF-8, so I need to detect the encoding and then parse the data.

In Golang, Python, Java, C++ and Objective-C there are UniversalDetector/chardet libraries. Is there something similar for Elixir/Erlang? I’m stuck on this.

How do you handle different charsets/encodings in Elixir applications? I know that’s not the problem this language was designed for, but I don’t want to use erlport or AWS Lambda just to detect a charset. :frowning:

NobbZ

There is no way to get what you want.

Ask your clients to upload using a specified charset only.

If you see the single byte 0xC4, it could be a Latin-1 encoded Ä, but it is just as valid in ISO-8859-5 (Cyrillic), where it encodes a different character: Ф.

Therefore it is impossible to detect the encoding without knowing the content in advance.
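
To make the 0xC4 example concrete, here is a small Elixir sketch that decodes the same one-byte input both ways. Latin-1 decoding uses Erlang’s built-in `:unicode.characters_to_binary/2`; the `:unicode` module has no ISO-8859-5 codec, so `AmbiguousByte` and its `decode_iso8859_5/1` are hypothetical, hand-written from that charset’s code ranges.

```elixir
defmodule AmbiguousByte do
  # Latin-1 (ISO-8859-1): every byte maps to the code point with the same value.
  def decode_latin1(bin), do: :unicode.characters_to_binary(bin, :latin1)

  # Hypothetical hand-written ISO-8859-5 decoder (Erlang's :unicode module has
  # no ISO-8859-5 codec). Each byte becomes one code point, emitted as UTF-8.
  def decode_iso8859_5(bin) do
    for <<byte <- bin>>, into: "", do: <<iso8859_5(byte)::utf8>>
  end

  # 0x00..0xA0: ASCII, C1 controls and NBSP keep their own value.
  defp iso8859_5(byte) when byte <= 0xA0, do: byte
  # A few non-Cyrillic symbols sit inside the upper half.
  defp iso8859_5(0xAD), do: 0x00AD
  defp iso8859_5(0xF0), do: 0x2116
  defp iso8859_5(0xFD), do: 0x00A7
  # Everything else is a contiguous run of Cyrillic letters.
  defp iso8859_5(byte) when byte in 0xA1..0xAC, do: byte - 0xA1 + 0x0401
  defp iso8859_5(byte) when byte in 0xAE..0xEF, do: byte - 0xAE + 0x040E
  defp iso8859_5(byte) when byte in 0xF1..0xFC, do: byte - 0xF1 + 0x0451
  defp iso8859_5(byte) when byte in 0xFE..0xFF, do: byte - 0xFE + 0x045E
end

# prints Ä
IO.puts(AmbiguousByte.decode_latin1(<<0xC4>>))
# prints Ф
IO.puts(AmbiguousByte.decode_iso8859_5(<<0xC4>>))
```

Both calls succeed, so nothing in the bytes themselves tells you which reading the uploader meant.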

What you describe from the other languages is usually a very dumb heuristic:

  1. Check whether the input is valid UTF-8/16/32 using the corresponding validators, perhaps helped by a BOM
  2. If not, fall back to a single-byte encoding derived from the host’s locale settings

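
The two steps of that heuristic can be sketched in Elixir with only OTP and standard-library calls. `:unicode.bom_to_encoding/1` recognises UTF-8, UTF-16 and UTF-32 byte-order marks; without a BOM this sketch only tries UTF-8 (via `String.valid?/1`), and the single-byte fallback is hard-coded to Latin-1 as an assumption standing in for the host’s locale settings. The module name `NaiveCharsetGuess` is hypothetical.

```elixir
defmodule NaiveCharsetGuess do
  # Assumption: Latin-1 stands in for a locale-derived single-byte fallback.
  @fallback :latin1

  # Returns {:ok, encoding_used, utf8_binary} or {:error, encoding_tried}.
  def to_utf8(bin) when is_binary(bin) do
    case :unicode.bom_to_encoding(bin) do
      # Step 1 without a BOM: accept the input as-is if it is valid UTF-8,
      # otherwise go to step 2 and decode it with the single-byte fallback.
      {:latin1, 0} ->
        if String.valid?(bin), do: {:ok, :utf8, bin}, else: convert(bin, @fallback)

      # Step 1 with a BOM: drop the mark and decode with the encoding it announces.
      {encoding, bom_length} ->
        rest = binary_part(bin, bom_length, byte_size(bin) - bom_length)
        convert(rest, encoding)
    end
  end

  # Converts to UTF-8 with Erlang's :unicode module; any {:error, ...} or
  # {:incomplete, ...} result is reported as a failure for that encoding.
  defp convert(bin, encoding) do
    case :unicode.characters_to_binary(bin, encoding) do
      utf8 when is_binary(utf8) -> {:ok, encoding, utf8}
      _error_or_incomplete -> {:error, encoding}
    end
  end
end
```

Feeding it the single byte `<<0xC4>>` returns `{:ok, :latin1, "Ä"}` even if the uploader meant the Cyrillic Ф, which is exactly the ambiguity NobbZ describes: the heuristic always produces an answer, not necessarily the right one.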
LostKobrakai

nimble_csv supports converting data to UTF-8 before parsing and from UTF-8 before dumping. It even ships NimbleCSV.Spreadsheet by default, which uses UTF-16 LE to work with Excel CSVs. But you’ll need to set up a module per encoding.
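
A per-encoding parser might look like the sketch below, assuming nimble_csv’s `encoding:` option to `NimbleCSV.define/2` (the same mechanism NimbleCSV.Spreadsheet uses with `{:utf16, :little}`). `MyApp.Latin1CSV` is a hypothetical module name and Latin-1 is only an example encoding; `Mix.install/1` is there so the snippet runs as a standalone script (it does not work inside an existing Mix project, where you would add the dependency to `mix.exs` instead).

```elixir
Mix.install([{:nimble_csv, "~> 1.2"}])

# Hypothetical parser module: input is converted from Latin-1 to UTF-8 before
# parsing, and output is converted back to Latin-1 when dumping.
NimbleCSV.define(MyApp.Latin1CSV, separator: ",", escape: "\"", encoding: :latin1)

# "name\nÄnne\n" encoded as Latin-1 bytes (0xC4 is Ä in Latin-1).
latin1_bytes = <<"name\n", 0xC4, "nne\n">>

# parse_string/1 skips the header row by default.
rows = MyApp.Latin1CSV.parse_string(latin1_bytes)
IO.inspect(rows)
```

You still have to know, or guess, which module to call for a given upload; the library handles the conversion, not the detection.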

evadne

The libmagic route is definitely worth considering. I have a library for that: low profile, but working.

https://github.com/evadne/gen_magic
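
To get a quick feel for what libmagic reports without a native dependency, a sketch can shell out to the `file` command-line tool (itself built on libmagic) with its `--brief` and `--mime-encoding` flags. This swaps gen_magic for the CLI and does not show gen_magic’s own API; it assumes `file` is installed and on `PATH`, and `LibmagicEncoding` is a hypothetical module name.

```elixir
defmodule LibmagicEncoding do
  # Hypothetical helper: asks libmagic (through the `file` CLI) for its guess at
  # a file's character encoding, e.g. "utf-8", "us-ascii" or "iso-8859-1".
  def detect(path) do
    case System.cmd("file", ["--brief", "--mime-encoding", path]) do
      {output, 0} -> {:ok, String.trim(output)}
      {output, status} -> {:error, {status, String.trim(output)}}
    end
  end
end

path = Path.join(System.tmp_dir!(), "upload.csv")
File.write!(path, "name,city\nZoë,Kraków\n")
IO.inspect(LibmagicEncoding.detect(path))
```

libmagic’s answer is still a heuristic guess, so a single-byte file can be reported with a charset other than the one the uploader actually used.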
