[m-dev.] question about 32 bit machines

Zoltan Somogyi

2018-03-12 18:49:02 UTC

At the moment, an argument in a memory cell representing a term
will occupy two words in exactly one case:

- a 32 bit machine
- with 64 bit floats (i.e. floats are not single precision)
- in a non-pregen grade
- and a function symbol argument whose type is explicitly given as float.

On 32 bit machines,

- we box all 64 bit entities (both floats and int64/uint64s) *everywhere*
in pregen grades;
- we box all 64 bit entities *everywhere* where the statically known type
is a type variable; and
- we box all int64s and uint64s everywhere regardless of anything else.

The only exception from boxing is the one given above.
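
For illustration only, the rule above can be written as a single predicate.
This is a sketch, not compiler code: the type names, field names and the
function occupies_two_words are all hypothetical, and the only logic it
encodes is the four conditions and three boxing rules listed in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; these are not the Mercury compiler's data structures. */
typedef enum { ARG_FLOAT, ARG_INT64, ARG_UINT64, ARG_OTHER } arg_type;

typedef struct {
    bool word_is_32_bits;     /* target machine has 32 bit words */
    bool floats_are_64_bits;  /* floats are not single precision */
    bool pregen_grade;        /* pregen grades box every 64 bit entity */
} target;

/*
 * Returns true iff a function symbol argument is stored unboxed in two
 * consecutive words of the memory cell. type_is_variable says that the
 * statically known type of the argument is a type variable, in which
 * case `declared` is not consulted.
 */
bool occupies_two_words(target t, arg_type declared, bool type_is_variable)
{
    if (!t.word_is_32_bits) return false;     /* a 64 bit value fits in one word */
    if (t.pregen_grade) return false;         /* boxed everywhere in pregen grades */
    if (type_is_variable) return false;       /* boxed when the type is a type variable */
    if (declared != ARG_FLOAT) return false;  /* int64/uint64 are currently always boxed */
    return t.floats_are_64_bits;              /* single precision floats need one word */
}
```

With a 32 bit, double precision, non-pregen target, only an argument declared
as float makes the predicate true; changing any one condition makes it false.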

The different treatment of int64/uint64s from floats is just because
we haven't implemented their unpacked representation yet. However, I have
a question: should we have unpacked 64 bit arguments on 32 bit systems at all?
Given the relative rarity of 32 bit systems *and* the relative rarity of code
that not only uses floats, int64s or uint64s but also spends a significant fraction
of its runtime on operations on them, I am wondering whether a policy of
"always box 64 bit primitive values into 32 bit words on 32 bit systems when
putting them into a structure" would be good enough. It would certainly
simplify the relevant parts of the compiler.

There is another issue here. The existing support for unpacked 64 bit floats
on 32 bit machines has always allocated the next two available words for
any such arguments. This guaranteed that the float field's address would be
a multiple of 4, but did not guarantee that it would be a multiple of 8.
(Cells allocated by boehm are always at an address that is a multiple of 8.)
This works on x86s, but it does NOT work on any of the RISC instruction sets,
which always required primitive type values whose size is 8 bytes to have
an address that is a multiple of 8. (The x86 and x86-64 are almost uniquely lax
in their alignment requirements.)
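
The arithmetic behind this can be sketched as follows. The helper names and
the example cell address are made up for illustration; the only assumptions
are 4 byte words and a cell that starts on an 8 byte boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u   /* word size on a 32 bit machine */

/* Byte address of word number word_index (counting from 0) of a cell. */
uintptr_t field_address(uintptr_t cell_addr, size_t word_index)
{
    return cell_addr + word_index * WORD_BYTES;
}

/* True iff addr is a multiple of alignment. */
bool is_aligned(uintptr_t addr, unsigned alignment)
{
    return addr % alignment == 0;
}
```

For a cell at the hypothetical address 0x1000, a float that starts at word 1
lives at 0x1004, which is a multiple of 4 but not of 8, while a float that
starts at word 2 lives at 0x1008 and is 8 byte aligned.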

Any opinions?

Zoltan.

Michael Day

2018-03-12 22:28:49 UTC

Hi Zoltan,

Post by Zoltan Somogyi
The different treatment of int64/uint64s from floats is just because
we haven't implemented their unpacked representation yet. However, I have
a question: should we have unpacked 64 bit arguments on 32 bit systems at all?
Given the relative rarity of 32 bit systems *and* the relative rarity of code
that not only uses floats, int64s or uint64s but also spends a significant fraction
of its runtime on operations on them, I am wondering whether a policy of
"always box 64 bit primitive values into 32 bit words on 32 bit systems when
putting them into a structure" would be good enough. It would certainly
simplify the relevant parts of the compiler.

One program that spends a significant fraction of its runtime in
operations on floats is Prince, which is why we were interested in the
unboxing optimisation. Although 64-bit machines have become the new
standard, we still have many customers using 32-bit machines, and they
have the tightest address space requirements, so it would be nice to
keep this optimisation if we can.

Michael

--
Prince: Print with CSS!
http://www.princexml.com

Zoltan Somogyi

2018-03-13 18:43:42 UTC

Post by Michael Day
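Although 64-bit machines have become the new
standard, we still have many customers using 32-bit machines, and they
have the tightest address space requirements, so it would be nice to
keep this optimisation if we can.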

Of course we can; the question was only whether doing so is worthwhile.
Your reply says it is, so I will instead implement the other way to resolve
the inconsistency, which is to implement unboxing of 64 bit ints and uints
on 32 bit machines.

Zoltan.

Peter Wang

2018-03-12 23:23:17 UTC

Post by Zoltan Somogyi
At the moment, an argument in a memory cell representing a term
will occupy two words in exactly one case:
- a 32 bit machine
- with 64 bit floats (i.e. floats are not single precision)
- in a non-pregen grade
- and a function symbol argument whose type is explicitly given as float.
On 32 bit machines,
- we box all 64 bit entities (both floats and int64/uint64s) *everywhere*
in pregen grades;
- we box all 64 bit entities *everywhere* where the statically known type
is a type variable; and
- we box all int64s and uint64s everywhere regardless of anything else.
The only exception from boxing is the one given above.
The different treatment of int64/uint64s from floats is just because
we haven't implemented their unpacked representation yet. However, I have
a question: should we have unpacked 64 bit arguments on 32 bit systems at all?
Given the relative rarity of 32 bit systems *and* the relative rarity of code
that not only uses floats, int64s or uint64s but also spends a significant fraction
of its runtime on operations on them, I am wondering whether a policy of
"always box 64 bit primitive values into 32 bit words on 32 bit systems when
putting them into a structure" would be good enough. It would certainly
simplify the relevant parts of the compiler.

I think we can do without unpacked int64/uint64s.

Post by Zoltan Somogyi
There is another issue here. The existing support for unpacked 64 bit floats
on 32 bit machines has always allocated the next two available words for
any such arguments. This guaranteed that the float field's address would be
a multiple of 4, but did not guarantee that it would be a multiple of 8.
(Cells allocated by boehm are always at an address that is a multiple of 8.)
This works on x86s, but it does NOT work on any of the RISC instruction sets,
which always required primitive type values whose size is 8 bytes to have
an address that is a multiple of 8. (The x86 and x86/64 are almost uniquely lax
in their alignment requirements.)

We don't operate on the float fields directly but extract them into
temporary variables with MR_float_from_dword.
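
The general technique of extracting a double from two 32 bit words without
ever dereferencing a double pointer can be sketched as below. This is not the
runtime's definition of MR_float_from_dword; float_from_two_words and
float_to_two_words are hypothetical helpers that only show why a copy is safe
on machines that trap on misaligned 8 byte loads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Reads a double stored in two consecutive 32 bit words. The words only
 * need 4 byte alignment, because memcpy makes no alignment assumption
 * about its source.
 */
double float_from_two_words(const uint32_t *field)
{
    double d;
    assert(sizeof d == 2 * sizeof field[0]);  /* assumes 64 bit doubles */
    memcpy(&d, field, sizeof d);
    return d;
}

/* Stores a double into two consecutive 32 bit words of a cell. */
void float_to_two_words(double d, uint32_t *field)
{
    assert(sizeof d == 2 * sizeof field[0]);
    memcpy(field, &d, sizeof d);
}
```

A float written at word 1 of a three word cell reads back unchanged, whether
or not that word happens to sit on an 8 byte boundary.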

Mercury (and Prince) were previously confirmed working on 32-bit SPARC
after resolving bug #299 in commit 1094f42cc.

Peter

Zoltan Somogyi

2018-03-13 19:06:32 UTC

Post by Peter Wang
I think we can do without unpacked int64/uint64s.

Actually, I think it is simpler to treat 64 bit integers the same as
64 bit floats on all platforms, including 32 bit ones.

Post by Peter Wang

We don't operate on the float fields directly but extract them into
temporary variables with MR_float_from_dword.

Yes, I know. However, accessing a known-to-be-properly aligned float
without copying it would be faster.

The downside would be that aligning 64 bit floats (and ints and uints)
on 64 bit boundaries on 32 bit machines may require the compiler
to insert 32 bits of padding.

*If* you control the definition of a type whose functors have float arguments,
and *if* you know about this issue, then you can choose to order the
arguments of such functors in a way that avoids such padding.

I see three possible approaches here.

1: the status quo. Never insert any padding before 64 bit entities,
and always refer to them via MR_float_from_word and related macros.

2: the speed demon approach: always align 64 bit entities on 64 bit
boundaries, and refer to them directly, without going through MR_float_from_word.

3: the compromise approach: never insert any padding before
64 bit entities, but refer to them directly, without MR_float_from_word,
whenever they *happen* to start on a 64 bit boundary.

Approach 3 is clearly superior to approach 1, but 2 is simpler to implement,
though I don't know by how much. I will try to implement 3.
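
The layout difference between the approaches can be sketched with a toy
model. Everything here is hypothetical (the arg type, lay_out and
can_access_directly), and the sketch assumes that word offset 0 is the start
of an 8 byte aligned cell and that every argument takes either one 32 bit
word or two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a functor argument: one 32 bit word, or two. */
typedef struct { bool is_64_bit; } arg;

/*
 * Assigns a word offset to each of the n arguments and returns the total
 * number of words used. With pad_to_8 set (approach 2), a word of padding
 * is inserted wherever a 64 bit argument would start at an odd offset.
 * Without it (approaches 1 and 3), arguments are packed back to back.
 */
size_t lay_out(const arg *args, size_t n, bool pad_to_8, size_t *offsets)
{
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        if (args[i].is_64_bit && pad_to_8 && next % 2 != 0) {
            next++;  /* 32 bits of padding */
        }
        offsets[i] = next;
        next += args[i].is_64_bit ? 2 : 1;
    }
    return next;
}

/*
 * Approach 3: a 64 bit argument may be referred to directly only when it
 * happens to start at an even word offset, i.e. on a 64 bit boundary.
 */
bool can_access_directly(size_t word_offset)
{
    return word_offset % 2 == 0;
}
```

For a functor f(int, float, int), packing gives offsets 0, 1, 3 in four
words, with the float not directly accessible, while padding gives offsets
0, 2, 4 in five words. Reordering the arguments to f(float, int, int) gives
offsets 0, 2, 3 in four words under either setting, which is the padding-free
ordering described above.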

Post by Peter Wang
Mercury (and Prince) were previously confirmed working on 32-bit SPARC
after resolving bug #299 in commit 1094f42cc.

How long ago was that?

That Mercury has worked on 32 bit SPARC in the past is *far* from surprising,
given that Mercury *first bootstrapped* on a 32-bit SPARC machine :-),
after starting development on a 32-bit MIPS machine.

Zoltan.

Peter Wang

2018-03-13 23:56:35 UTC

Post by Zoltan Somogyi

Post by Peter Wang
I think we can do without unpacked int64/uint64s.

Actually, I think it is simpler to treat 64 bit integers the same as
64 bit floats on all platforms, including 32 bit ones.

Post by Peter Wang

We don't operate on the float fields directly but extract them into
temporary variables with MR_float_from_dword.

Yes, I know. However, accessing a known-to-be-properly aligned float
without copying it would be faster.

Right, though I assumed that copying into float registers / the FPU
register stack would be necessary either way, and (somehow) it would all
be equivalent to the C compiler in the end. It seems not, or not always.

Post by Zoltan Somogyi
The downside would be that aligning 64 bit floats (and ints and uints)
on 64 bit boundaries on 32 bit machines may require the compiler
to insert 32 bits of padding.
*If* you control the definition of a type whose functors have float arguments,
and *if* you know about this issue, then you can choose to order the
arguments of such functors in a way that avoids such padding.
I see three possible approaches here.
1: the status quo. Never insert any padding before 64 bit entities,
and always refer to them via MR_float_from_word and related macros.
2: the speed demon approach: always align 64 bit entities on 64 bit
boundaries, and refer to them directly, without going through MR_float_from_word.
3: the compromise approach: never insert any padding before
64 bit entities, but refer to them directly, without MR_float_from_word,
whenever they *happen* to start on a 64 bit boundary.
Approach 3 is clearly superior to approach 1, but 2 is simpler to implement,
though I don't know by how much. I will try to implement 3.

I think approach 3 is fine. Once (if) we have argument reordering
we can make all 64 bit entities aligned to 64 bit boundaries
(unless there is some constraint that I'm not aware of).

Post by Zoltan Somogyi

Post by Peter Wang
Mercury (and Prince) were previously confirmed working on 32-bit SPARC
after resolving bug #299 in commit 1094f42cc.

How long ago was that?
That Mercury has worked on 32 bit SPARC in the past is *far* from surprising,
given that Mercury *first bootstrapped* on a 32-bit SPARC machine :-),
after starting development on a 32-bit MIPS machine.

This was AFTER the introduction of double word floats.
git log 1094f42cc says Mon Oct 7 15:03:16 2013

Peter

Peter Wang

2018-03-14 00:39:52 UTC

Post by Peter Wang
I think approach 3 is fine. Once (if) we have argument reordering
we can make all 64 bit entities aligned to 64 bit boundaries
(unless there is some constraint that I'm not aware of).

Oh yeah, in memory profiling grades we allocate an extra word before the
object for the allocation id, so the start of the object would not be 64
bit aligned. We would need to allocate an extra two words, or find
another solution.
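
A small sketch of why one extra word breaks the alignment and two extra
words restore it. The function name and the block address in the usage note
are hypothetical; the sketch assumes 4 byte words and an allocator that
returns 8 byte aligned blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u   /* word size on a 32 bit machine */

/*
 * block_addr is the start of the allocated block, assumed to be 8 byte
 * aligned. header_words is the number of extra words placed before the
 * object, such as the allocation id in memory profiling grades.
 * Returns true iff the object itself starts on an 8 byte boundary.
 */
bool object_start_is_8_aligned(uintptr_t block_addr, size_t header_words)
{
    assert(block_addr % 8 == 0);
    return (block_addr + header_words * WORD_BYTES) % 8 == 0;
}
```

For a block at 0x2000, zero or two header words leave the object 8 byte
aligned, while a single header word moves it to 0x2004.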

The old (now useless?) term size profiling stuff used the same trick.
In handle_options.m we disable double-word fields in term size profiling
grades, but curiously not in memory profiling grades. I'll check it when
I get a chance.

Peter

Michael Day

2018-03-14 02:03:56 UTC

Post by Peter Wang
The old (now useless?) term size profiling stuff used the same trick.
In handle_options.m we disable double-word fields in term size profiling
grades, but curiously not in memory profiling grades. I'll check it when
I get a chance.

Surely in memory profiling grades we want a more accurate profile of the
memory cells actually required, and boxing the fields would change this.

Michael

Peter Wang

2018-03-14 02:24:26 UTC

Post by Michael Day

Surely in memory profiling grades we want a more accurate profile of the
memory cells actually required, and boxing the fields would change this.

Yeah, and in practice developers will use profiling grades on their
Intel (x86_64) machines anyway.

Peter