Representation of big integer in heap - connection between the header and data words

lykos · August 24, 2023, 7:30pm

Hey all,
I was looking into the representation of big integers as boxed terms in heap. I understood how the header word and data words contain information in bits. But I couldn’t figure out how the header word is connected to the data words. Will the immediate term containing the pointer in stack refer to both header word and the first data word or will the data words be always contiguously stored after the header word. Any help would be appreciated.

Thank you.

garazdawi · August 28, 2023, 10:20am

The data words will always be contiguously stored after the header word. This is the way it works for all boxed terms.

lykos · September 1, 2023, 5:39am

Hello,
I need help in understanding how the big integer bits are stored across multiple data words. I understand that the header word will contain only the arity of the data words and the bits of the big integer will be distributed across the data words which are stored contiguously after the header word.

Let’s say for example in a 32-bit system, the big integer 1_095_233_372_330, whose binary value is
11111111 00000000 11111111 00000000 10101010
is stored across two data words. May I know how these bits will be distributed across the two 32-bit data words.

will it be
00000000 00000000 00000000 11111111
00000000 11111111 00000000 10101010

Or does erlang have some other encoding that stores the total length of the actual data in bits?

Thank you.

al2o3cr · September 1, 2023, 10:05pm

The code stores “digits” the size of a machine word, and keeps the count for those in the upper bits of the header.

Some good places to read in the OTP source:

otp/erts/emulator/beam/erl_term.h: a nice diagram of how tags are used in a term’s header. A bignum’s sign is stored in the tag.
otp/erts/emulator/beam/big.h: some C macros that access different parts of a bignum:
- the sign is in the lowest bit of the header
- the digits all immediately follow the header (the +1 in BIG_V)
- the arity (number of digits) is stored in the header
otp/erts/emulator/beam/big.c: the implementation of most bignum functions. The link goes to I_comp which tells us about how digits are stored
- the first two cases dispose of the easy parts, when one number is more digits than the other
- the final case does the digit-by-digit comparison, starting from the digit farthest from the header as indicated by setting up the loop with x += (xl-1)

One final thing - normally I’d point people towards the BEAM Book which has an epic amount of internal detail, but alas the section on bignum representation is marked TODO

lykos · September 2, 2023, 5:48am

@al2o3cr Thanks for your explanation. I actually found this neat tool that prints out information about each word along with its address and bit information for different datatypes. For the example I just provided above, this is how the bits are represented as per the tool.

11111111 00000000 11111111 00000000 10101010 - positive big integer 1_095_233_372_330 in 32 bit system.

xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx10 boxed term with pointer to header in heap

00000000_00000000_00000000_10001000 header with arity 10(2), 0010 secondary tag, 00 pri tag
00000000_11111111_00000000_10101010 data word 1
00000000_00000000_00000000_11111111 data word 2

saleyn · January 5, 2024, 2:33pm

Is there a reason why this bignum to string function is not exposed as a NIF?

jhogberg · January 5, 2024, 3:12pm

Yes: we have a desire to keep the NIF API reasonably small. You’re free to use integer_to_binary/2 and pass the result of that to your NIF.

saleyn · March 18, 2024, 6:41am

I actually have a need in the opposite conversion in a NIF - a string representation of an integer in C (e.g. coming from a protocol like JSON) to a big integer that would return ERL_NIF_TERM. Currently the only way (that doesn’t involve passing it as binary and doing conversion on the Erlang side) would be to encode that string to an external erlang format, and then call enif_binary_to_term(), but this is very slow for large integers. So it would be great if ERL_NIF_TERM enif_chars_to_integer(ErlNifEnv*, const char* str, size_t) or enif_string_to_integer(...) would be added. Without it (and possibly also enif_integer_to_chars(...) the API seems to be incomplete.