Discussion:
[m-dev.] a conditional field update operator
Zoltan Somogyi
2017-03-08 03:41:19 UTC
One of the changes I made to simplify_info operations was to
replace code like this:

simplify_info_set_varset(X, !Info) :-
    !Info ^ simp_varset := X.

with code like this:

simplify_info_set_varset(X, !Info) :-
    ( if private_builtin.pointer_equal(X, !.Info ^ simp_varset) then
        true
    else
        !Info ^ simp_varset := X
    ).

The two are semantically equivalent, but different in performance.
The first never pays the cost of a test, but always pays the cost of
allocating a new structure, while for the second, things are the other
way around. This means that performance-wise, which one you want
depends on the probability of the new value of the field being different
from its old value, and on the size of the structure as a whole.
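The same tradeoff can be sketched outside Mercury. Here is a rough Python
analogy (the Info type and field names are purely illustrative, with
Python's `is` standing in for pointer_equal):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Info:
    varset: tuple
    other: int

def set_varset_always(info, varset):
    # First variant: no test, but always allocates a new Info.
    return replace(info, varset=varset)

def set_varset_cond(info, varset):
    # Second variant: pays a pointer-equality test, but reuses the
    # old structure when the field value has not changed.
    if varset is info.varset:
        return info
    return replace(info, varset=varset)

info0 = Info(varset=("A", "B"), other=7)
assert set_varset_cond(info0, info0.varset) is info0   # no new structure
assert set_varset_always(info0, info0.varset) is not info0
```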

In many cases, the second approach is appropriate, but at the moment,
it takes five lines of code instead of one. I was thinking that we could
have a new operator, maybe ::= or :?=, that would be like :=, but
would create a new structure only if the new value of the field was
different from the old value. Then the compiler would expand the one line
to the five-line version internally.

Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?

Zoltan.
Michael Day
2017-03-08 03:45:53 UTC
Hi Zoltan,
Post by Zoltan Somogyi
In many cases, the second approach is appropriate, but at the moment,
it takes five lines of code instead of one. I was thinking that we could
have a new operator, maybe ::= or :?=, that would be like :=, but
would create a new structure only if the new value of the field was
different from the old value. Then the compiler would expand the one line
to the five-line version internally.
I would be tempted to just make := always do the pointer equality test
and omit it if it's obviously going to fail, e.g. if the right-hand side
is a newly constructed cell that can't possibly be equal.

Michael
--
Prince: Print with CSS!
http://www.princexml.com
Zoltan Somogyi
2017-03-08 04:13:16 UTC
Post by Michael Day
Hi Zoltan,
Post by Zoltan Somogyi
In many cases, the second approach is appropriate, but at the moment,
it takes five lines of code instead of one. I was thinking that we could
have a new operator, maybe ::= or :?=, that would be like :=, but
would create a new structure only if the new value of the field was
different from the old value. Then the compiler would expand the one line
to the five-line version internally.
I would be tempted to just make := always do the pointer equality test
and omit it if it's obviously going to fail, e.g. if the right-hand side
is a newly constructed cell that can't possibly be equal.
In most of the cases in which we now use the
test-and-only-assign-if-the-old-and-new-values-are-different technique,
or conditional-assign for short, the new value is constructed somewhere
else: either in the caller
of the current procedure, or in one of its callees. In procedures that *do*
decide the new value of the field themselves, the code already tends to
follow the pattern of

( if ... some test ... then
    compute new value
    assign the new value to !Info ^ fieldname (unconditionally)
else
    don't assign to !Info ^ fieldname at all
)

In this case, the old and new values of the field will be the same only
by accident, and in many cases we (humans) can rule out the accident,
because often the new value is computed from the old value by an
operation whose output is necessarily different from its input (e.g.
appending an element to the front of a list). Having the computer
arrive at the same decision would, in many cases, require nontrivial
program analysis.

The reason why I want to preserve the ability of programmers
to ask for the current behavior of := is that in cases like this,
where you know that the "are the old and new values the same?"
test won't succeed often, or at all, the cost of the test is effectively
"distributed fat": a cost imposed on one part of the program
just because *another* part of the program may benefit from it.
Avoiding distributed fat wherever possible is one of the core principles
of the project. In this case, avoiding it is trivial.

One other consideration is that code that updates more than one field,
such as

!Info ^ field1 := F1,
!Info ^ field2 := F2

can be expanded one goal at a time. The parser will expand this
into code that constructs Info1 from Info0, and then Info2 from Info1,
and the common struct optimization pass will come along later,
process the resulting conjunction of unifications, and construct
Info2 directly from Info0, optimizing away the materialization of Info1.
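The effect of the common struct optimization can be mimicked by hand in
most languages: instead of two single-field copies, construct the final
value in one step. A hypothetical Python illustration (the field names
are invented):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Info:
    field1: int
    field2: str
    field3: float

info0 = Info(1, "x", 2.5)

# Naive one-goal-at-a-time expansion: materializes an intermediate Info1.
info1 = replace(info0, field1=42)
info2 = replace(info1, field2="y")

# What the optimization produces: Info2 built directly from Info0,
# with no intermediate structure.
info2_direct = replace(info0, field1=42, field2="y")
assert info2 == info2_direct
```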

For a sequence of two or more *conditional* assignments to fields,
this won't work, because the expansion of a conditional assignment
is not a flat conjunction of unifications. Either the parser must expand
adjacent field updates to the same state variable together (even if
the code does not use state variable syntax), or we need to make
the common struct optimization understand the more complex
code structures that result from their one-by-one expansion.

Does Prince make extensive use of conditional assignment?
Was that the motivation behind your proposal?

Zoltan.
Michael Day
2017-03-08 05:46:38 UTC
Hi Zoltan,
Post by Zoltan Somogyi
The reason why I want to preserve the ability of programmers
to ask for the current behavior of := is that in cases like this,
where you know that the "are the old and new values the same?"
test won't succeed often, or at all, the cost of the test is effectively
"distributed fat": a cost imposed on one part of the program
just because *another* part of the program may benefit from it.
Avoiding distributed fat wherever possible is one of the core principles
of the project. In this case, avoiding it is trivial.
That seems reasonable.
Post by Zoltan Somogyi
One other consideration is that code that updates more than one field,
such as
!Info ^ field1 := F1,
!Info ^ field2 := F2
can be expanded one goal at a time. The parser will expand this
into code that constructs Info1 from Info0, and then Info2 from Info1,
and the common struct optimization pass will come along later,
process the resulting conjunction of unifications, and construct
Info2 directly from Info0, optimizing away the materialization of Info1.
Also very reasonable.
Post by Zoltan Somogyi
Does Prince make extensive use of conditional assignment?
Was that the motivation behind your proposal?
Some of our conditional assignments would need an explicit test anyway
as they have to do other operations if the test fails:

( if LineJoin0 = LineJoin then
    Gs = Gs0
else
    Gs = Gs0 ^ gs0 ^ stroke_linejoin := LineJoin,
    (
        LineJoin = miter,
        write_op(Buf, [int(0)], "j", !IO)
    ;
        LineJoin = round,
        write_op(Buf, [int(1)], "j", !IO)
    ;
        LineJoin = bevel,
        write_op(Buf, [int(2)], "j", !IO)
    )
)

Our CSS style processing code goes to some effort to avoid allocating
structures that haven't changed, but I think it wouldn't get any value
from this pointer equality test unless we canonicalised more values
deeper in the struct; currently we only canonicalise the whole thing at
the top level instead of each subfield.

There may be other places where we do conditional assignment (mostly on
float values, not pointers, I suspect), but probably not a sufficient
number to justify doing the test for every assignment.

Michael
Zoltan Somogyi
2017-03-08 05:57:56 UTC
Post by Michael Day
Our CSS style processing code goes to some effort to avoid allocating
structures that haven't changed, but I think it wouldn't get any value
from this pointer equality test unless we canonicalised more values
deeper in the struct; currently we only canonicalise the whole thing at
the top level instead of each subfield.
I understand; there are several analogous situations in the Mercury compiler.

That is part of what I was trying to get at. As you say, conditional assignment
at any level of a data structure works only if you do it consistently at all the
levels below it; if you don't, then the new value of a field may be semantically
equal to the old value, but bitwise different from it. So to apply conditional
update to the transformation of a whole data structure, you have to do it
for the transformation of *every part*. This means that at the moment,
if you have code that does a traversal with unconditional update, your only
choices are

- replace every line where a part of the data structure is updated with *five*
lines of code, greatly increasing the size of the code and making it harder
to read, or

- give up on conditional update.

With the operator I propose, you would have a third option:

- replace every use of := with the new operator, leaving the code effectively
the same size, and with the same readability.
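The point about doing conditional update for *every part* can be seen in
a small sketch: a tree transformation preserves sharing only if every
recursive level returns the original node when nothing below it changed.
All names here are illustrative, with Python's `is` again standing in
for pointer_equal:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def map_tree(node, f):
    # Conditional update at every level: if f and both recursive calls
    # leave their inputs unchanged, reuse the old node as-is.
    if node is None:
        return None
    left = map_tree(node.left, f)
    right = map_tree(node.right, f)
    value = f(node.value)
    if value == node.value and left is node.left and right is node.right:
        return node
    return Node(value, left, right)

t = Node(1, Node(2), Node(3))
assert map_tree(t, lambda v: v) is t          # whole tree shared
t2 = map_tree(t, lambda v: v * 10 if v == 3 else v)
assert t2 is not t and t2.left is t.left      # only the changed spine rebuilt
```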

Zoltan.
Mark Brown
2017-03-08 07:01:50 UTC
Hi everyone,

On Wed, Mar 8, 2017 at 4:57 PM, Zoltan Somogyi
Post by Zoltan Somogyi
That is part of what I was trying to get at. As you say, conditional assignment
at any level of a data structure works only if you do it consistently at all the
levels below it; if you don't, then the new value of a field may be semantically
equal to the old value, but bitwise different from it. So to apply conditional
update to the transformation of a whole data structure, you have to do it
for the transformation of *every part*.
For me, this is the clincher.

The field update syntax can already be redefined to do any checks that
the user wants, including checking a condition before updating. You
can even pass extra parameters in that determine what kind of test you
want and use multiple modes to remove tests at compile time, as in the
example code below. But it won't do the right thing for updates of
nested fields. So I think some new syntax to support conditional
updates of nested fields is warranted.

One question: is this syntax something that can be redefined, like the
existing field syntax, or does it always compile down to calls to the
unconditional field updates (which may themselves be redefined), or
does it always compile to an ordinary construction like the default
field update functions? I vote that it compiles down to unconditional
field updates, viz:

!Term ^ field_list ?= FieldValue

becomes

( if private_builtin.pointer_equal(!.Term ^ field_list, FieldValue) then
    true
else
    !Term ^ field_list := FieldValue
)

Then this code should call the user defined unconditional field
update, if present.

Mark

--
Example code mentioned above:

:- type s ---> s(
        f1 :: int,
        f2 :: string
    ).

:- type update_when
    --->    always
    ;       ptr_equal.

:- inst always ---> always.
:- inst ptr_equal ---> ptr_equal.

:- func s ^ f2(update_when) := string = s.
:- mode in ^ f2(in) := in = out is det.
:- mode in ^ f2(in(always)) := in = out is det.
:- mode in ^ f2(in(ptr_equal)) := in = out is det.

S ^ f2(When) := Name =
    ( if test_when(When, S ^ f2, Name) then S ^ f2 := Name else S ).

    % test_when(When, OldValue, NewValue): succeed iff the update
    % should be performed.
:- pred test_when(update_when, T, T).
:- mode test_when(in(always), in, in) is det.
:- mode test_when(in(ptr_equal), in, in) is semidet.
:- mode test_when(in, in, in) is semidet.

test_when(always, _, _).
test_when(ptr_equal, X, Y) :- not private_builtin.pointer_equal(X, Y).
Zoltan Somogyi
2017-03-08 07:24:47 UTC
Post by Mark Brown
One question: is this syntax something that can be redefined, like the
existing field syntax, or does it always compile down to calls to the
unconditional field updates (which may themselves be redefined), or
does it always compile to an ordinary construction like the default
field update functions?
Let me reformat that to number the alternatives:

1: is this syntax something that can be redefined, like the existing field syntax, or

2: does it always compile down to calls to the unconditional field updates (which may
themselves be redefined), or

3: does it always compile to an ordinary construction like the default
field update functions?

I agree with your vote: I also prefer 2. I think the only advantage of 1 over
the status quo is that you can avoid the When parameter in your example,
which I think is too minor an advantage to justify the work needed for
any change, while 3 unnecessarily throws away existing generality
whose implementation should be readily reusable (though I haven't looked
at the relevant code in a long time).
Post by Mark Brown
I vote that it compiles down to unconditional
!Term ^ field_list ?= FieldValue
becomes
( if private_builtin.pointer_equal(!.Term ^ field_list, FieldValue) then
    true
else
    !Term ^ field_list := FieldValue
)
Then this code should call the user defined unconditional field
update, if present.
Agreed for the transformation, though we may need to do something,
such as wrapping a new scope around the if-then-else, to simplify
the optimization of several consecutive conditional field updates
of the same structure, in the usual case of no user-defined field
update function.

I would however vote against using ?= as the conditional update operator.
Its form conveys "condition" well to the reader, but to me, it does NOT
suggest "update".

Is this preexisting code, or did you write it for this post?

Zoltan.
Mark Brown
2017-03-08 08:37:33 UTC
Hi,

On Wed, Mar 8, 2017 at 6:24 PM, Zoltan Somogyi
Post by Zoltan Somogyi
I would however vote against using ?= as the conditional update operator.
Its form conveys "condition" well to the reader, but to me, it does NOT
suggest "update".
I misread it as ?= in the OP :-/
Post by Zoltan Somogyi
Is this preexisting code, or did you write it for this post?
It was preexisting but never used, precisely because it doesn't
completely work for nested fields.

Mark
Paul Bone
2017-03-08 05:02:11 UTC
Post by Zoltan Somogyi
One of the changes I made to simplify_info operations was to
simplify_info_set_varset(X, !Info) :-
    !Info ^ simp_varset := X.
simplify_info_set_varset(X, !Info) :-
    ( if private_builtin.pointer_equal(X, !.Info ^ simp_varset) then
        true
    else
        !Info ^ simp_varset := X
    ).
Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.

Since this would be very rarely used, I'd hesitate to give it a
symbol/operator. I'd prefer to give it a name; it may make it more awkward
to use, but it leaves the symbol free to be defined for something else later
(if necessary).

update_field_if_not_pointer_equal(!Info ^ simp_varset, X),

Hrm, it's a mouthful, just an idea.
--
Paul Bone
http://paul.bone.id.au
Zoltan Somogyi
2017-03-08 05:38:52 UTC
Post by Paul Bone
Post by Zoltan Somogyi
Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
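In Python terms (an analogy only), pointer_equal corresponds to `is` and
full semantic equality to `==`:

```python
# Two separately allocated lists with the same contents.
a = ["p", "q"]
b = ["p", "q"]

assert a == b          # deep (semantic) equality: True
assert a is not b      # pointer equality: False, distinct allocations
assert a is a          # pointer equality on the very same structure: True
```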
Post by Paul Bone
Since this would be very rarely used I'd hesitate to give it a
symbol/operator.
The reason why I am proposing a new operator is precisely because

- this is the nth time we have found a need for it in the compiler,
for largish values of n, and

- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.

Is your argument that you think such "rebuild with some rare modifications"
operations are rare in general? Mike's previous mail seems to imply that
he thinks just the opposite.
Post by Paul Bone
I'd prefer to give it a name, it may make it more awkward
to use but it's an operator that can be defined for something else later (if
necessary).
update_field_if_not_pointer_equal(!Info ^ simp_varset, X),
I don't think "not adding another operator" is a worthwhile goal,
and I don't see what else your proposal would accomplish.

If you make conditional-update use a procedure call as its syntax,
then either

- its implementation is also a procedure call, in which case
it is unnecessarily slow; or

- its implementation is not a procedure call but inline code,
it which case the compiler would have to have completely new code
to make the transformation.

Using the same overall syntax as !Info ^ field := X but with another
operator is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
conditional-update a one-character change from the existing := operator.

Zoltan.
Paul Bone
2017-03-08 05:55:20 UTC
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
Yes, it is the most conservative test.

However, an equality test will catch items that are semantically equal, even
if not structurally or pointer equal, and avoid the memory allocation in
those situations as well. This may sometimes be what people want, though
probably less often, and it might not be correct in some cases.
Post by Zoltan Somogyi
Post by Paul Bone
Since this would be very rarely used I'd hesitate to give it a
symbol/operator.
The reason why I am proposing a new operator is precisely because
- this is the nth time we have found a need for it in the compiler,
for largish values of n, and
- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.
Mmm, true. I hadn't considered the cases where I _don't_ use such a
pattern, even though I could/should.
Post by Zoltan Somogyi
Is your argument that you think such "rebuild with some rare modifications"
operations are rare in general? Mike's previous mail seems to imply that
he thinks just the opposite.
I had _assumed_ that they were rare. But your above example of the HLDS
reminded me of all the times I had done the same, and wished for something
like this. I now think that making it a symbolic operator such as :?= is
best.

I assumed that Mike based his message on the idea that, at least in
Mercury's C grades, a memory allocation is quite expensive while a branch is
much cheaper. Even if these updates are rare, I guessed that he was thinking
it might be worthwhile anyway. If they are rare, I think it wouldn't be
worthwhile. And the cost to other transformations, such as coalescing
multiple structure updates together, would be at best annoying.

However, if we go to the effort of adding an operator such as :?=, it would
be interesting to see what happens if we made it the default.
Post by Zoltan Somogyi
Post by Paul Bone
I'd prefer to give it a name, it may make it more awkward
to use but it's an operator that can be defined for something else later (if
necessary).
update_field_if_not_pointer_equal(!Info ^ simp_varset, X),
I don't think "not adding another operator" is a worthwhile goal,
and I don't see what else your proposal would accomplish.
If you make conditional-update use a procedure call as its syntax,
then either
- its implementation is also a procedure call, in which case
it is unnecessarily slow; or
- its implementation is not a procedure call but inline code,
in which case the compiler would have to have completely new code
to make the transformation.
I no longer think this is a good idea. But wouldn't it be recognised
directly by the parser, like a reserved word?
Post by Zoltan Somogyi
Using the same overall syntax as !Info ^ field := X but with another
operator is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
conditional-update a one-character change from the existing := operator.
These are good goals.
Zoltan Somogyi
2017-03-08 06:53:30 UTC
Post by Paul Bone
Post by Zoltan Somogyi
Post by Paul Bone
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
Yes, it is the most conservative test.
However an equality test will catch items that are semantically equal, even
if not structurally or pointer equal, and avoid the memory allocation in
those situations also. This may be what people want, but possibly less
often, although it might not be correct in some cases.
I thought you meant using normal equality tests only for atomic types,
the way we hand-implement conditional update in the Mercury compiler now.
I now see you are proposing it for nonatomic types as well.

For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can possibly be justified by the amount of allocation
being saved.

In general, the cost of an allocation is sort-of linear in the amount being
allocated, and therefore the fixed size of the cell that the update applies to
is a sort-of natural bound for the cost of allocation. On the other hand,
the cost of a deep equality test HAS no bound. Paying an unbounded cost
for a bounded gain is not a good general strategy.
Post by Paul Bone
Post by Zoltan Somogyi
The reason why I am proposing a new operator is precisely because
- this is the nth time we have found a need for it in the compiler,
for largish values of n, and
- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.
Mmm, true. I hadn't considered the cases where I _don't_ use such a
pattern, even though I could/should.
I intended my original mail specifically to ask you guys to think about
*potential future* uses of conditional update, not just actual, current uses.
I see I should have made that more explicit.
Post by Paul Bone
Post by Zoltan Somogyi
Is your argument that you think such "rebuild with some rare modifications"
operations are rare in general? Mike's previous mail seems to imply that
he thinks just the opposite.
I had _assumed_ that they were rare. But your above example of the HLDS
reminded me of all the times I had done the same, and wished for something
like this. I now think that making it a symbolic operator such as :?= is
best.
Good; then we are on the same wavelength.
Post by Paul Bone
I assumed that Mike based his message on the idea that, at least in
Mercury's C grades, a memory allocation is quite expensive while a branch is
much cheaper.
On current (and foreseeable) architectures, the cost of a branch can range
from effectively zero, if the branch is perfectly predictable, to huge (several
tens of cycles, maybe more than a hundred cycles) if the branch is perfectly
unpredictable, i.e. effectively random. Call these costs epsilon and huge
respectively. For a branch for which the CPU's branch predictor gets things
right N% of the time, its average cost will be (N * epsilon + (100-N) * huge)/100.
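That formula is easy to experiment with numerically. A tiny sketch, where
the cycle costs epsilon and huge are made-up illustrative numbers, not
measurements of any real CPU:

```python
def avg_branch_cost(n_percent, epsilon=0.5, huge=20.0):
    """Average cost of a branch predicted correctly N% of the time:
    (N * epsilon + (100 - N) * huge) / 100."""
    return (n_percent * epsilon + (100 - n_percent) * huge) / 100

assert avg_branch_cost(100) == 0.5    # perfectly predictable: ~epsilon
assert avg_branch_cost(0) == 20.0     # always mispredicted: huge
assert avg_branch_cost(50) == 10.25   # coin flip: about half of huge
```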

The value of N depends not just on the branch but on lots of other things,
because modern branch predictors take several aspects of the history
leading up to the execution of the branch into account when they make
their prediction. (For example, they can learn that this branch is very likely
to be taken if either of the previous two branches were not taken, if that
happens to be true.) However, they are tuned to exploit *local* history,
history within e.g. a C function or Java method, not the significantly less
regular cross-procedural-boundary history that would be useful for Mercury code.

There is a branch in GC_malloc, in the test for whether the freelist for
the selected block size is empty, but it is almost perfectly predictable.

The branch in conditional update, in cases where you want to use it,
will be significantly less predictable. (If it is perfectly predictable that
the new field value is not the same as the old, you would use unconditional
update; if it is perfectly predictable that the new value is the same as the old,
you won't have any update at all.)

The obvious extra cost of allocating a new structure even when you don't need to
is the work involved in allocating the structure and filling it in. However,
both are reasonably cheap as long as the filled-in structure stays in the cache.
The problem is that almost certainly it won't, because even if the new structure
becomes dead soon, gc is rare enough that it is very unlikely to come along
and collect the dead structure before it is flushed from the cache.

So basically, the tradeoff is between the two costliest operations
in modern CPUs: mispredicted branches and memory accesses.
Neither is obviously costlier than the other; which one is costlier
depends on the details.

By the way, Nick Nethercote wrote a paper a few years ago
where he showed that counts of these two operations, mispredicted branches
and memory accesses, allowed him to predict the runtimes of a selected
set of Haskell programs with about 85% accuracy. One can take this
to mean that to a first approximation, all other operations the program does
are effectively free.
Post by Paul Bone
Post by Zoltan Somogyi
If you make conditional-update use a procedure call as its syntax,
then either
- its implementation is also a procedure call, in which case
it is unnecessarily slow; or
- its implementation is not a procedure call but inline code,
in which case the compiler would have to have completely new code
to make the transformation.
I no longer think this is a good idea. But wouldn't it be recognised
directly by the parser, like a reserved word?
Yes, but "parsing conditional update using procedure call syntax"
would still require more new code than "parsing conditional update
as a minor variation of unconditional update syntax", because
unlike the latter case, the former case would NOT be able to reuse
most of the existing code for parsing field updates. This is why I wrote
"completely new code" above.
Post by Paul Bone
Post by Zoltan Somogyi
Using the same overall syntax as !Info ^ field := X but with another
operator is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
conditional-update a one-character change from the existing := operator.
These are good goals.
It seems we are converging on consensus.

Zoltan.
Paul Bone
2017-03-08 23:55:32 UTC
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
Post by Paul Bone
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
Yes, it is the most conservative test.
However an equality test will catch items that are semantically equal, even
if not structurally or pointer equal, and avoid the memory allocation in
those situations also. This may be what people want, but possibly less
often, although it might not be correct in some cases.
I thought you meant using normal equality tests only for atomic types,
the way we hand-implement conditional update in the Mercury compiler now.
I now see you are proposing it for nonatomic types as well.
For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can possibly be justified by the amount of allocation
being saved.
Perhaps I'm wrong, but I had assumed that a normal equality test (such as
unification) will first compare pointers, then do the deep tests only if
the pointers are unequal, and short-circuit as soon as two fields are
unequal, and that it will do this recursively at every level, so that the
actual, amortized cost is low or trivial.
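That assumed behaviour can be written out as a hand-rolled equality that
short-circuits on identity before recursing. This is only a sketch of the
assumption, not a claim about how any Mercury grade implements unification:

```python
def eq(x, y):
    # Short-circuit: identical pointers are necessarily equal.
    if x is y:
        return True
    # Otherwise compare field by field, stopping at the first mismatch,
    # and apply the same short-circuit recursively at every level.
    if isinstance(x, tuple) and isinstance(y, tuple):
        return len(x) == len(y) and all(eq(a, b) for a, b in zip(x, y))
    return x == y

big = tuple(range(1_000_000))
assert eq(big, big)                      # one pointer test, no traversal
assert not eq((1, (2, 3)), (1, (2, 4)))  # stops at the first unequal field
```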

Either way, I don't think this should be the normal comparison used by a :?=
operator, but it might make sense to make it available to developers who
want it. Of course, they could also write it manually; I don't think it would
be used very often relative to your proposal with pointer equality.
Post by Zoltan Somogyi
In general, the cost of an allocation is sort-of linear in the amount being
allocated, and therefore the fixed size of the cell that the update applies to
is a sort-of natural bound for the cost of allocation. On the other hand,
the cost of a deep equality test HAS no bound. Paying an unbounded cost
for a bounded gain is not a good general strategy.
Even if my assumption holds that the cost is amortized or that large unbounded
comparisons are pathological, it is still more difficult to predict than the
bounded cost of allocation. I agree with you.
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
The reasons why I am proposing a new operator is precisely because
- this is the nth time we have found a need for it in the compiler,
for largish values of n, and
- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.
Mmm, true. I hadn't considered the cases where I _don't_ use such a
pattern, even though I could/should.
I intended my original mail specifically to ask you guys to think about
*potential future* uses of conditional update, not just actual, current uses.
I see I should have made that more explicit.
I also thought that part of the intent is to ask what uses may occur in
closed code bases such as Prince. The other developers will know more than
me about this, but AIUI there's less need for this in Prince than the
Mercury compiler. Also sometimes when we manipulate tree-like structures
(the DOM) we do so using mutvars, because the structures have cycles that
are easier to manipulate with mutvars.
Post by Zoltan Somogyi
Post by Paul Bone
I assumed that Mike based his message off the idea that, at least in
Mercury's C grades, a memory allocation is quite expensive while a branch is
much cheaper.
On current (and foreseeable) architectures, the cost of a branch can range
from effectively zero, if the branch is perfectly predictable, to huge (several
tens of cycles, maybe more than a hundred cycles) if the branch is perfectly
unpredictable, i.e. effectively random. Call these costs epsilon and huge
respectively. For a branch for which the CPU's branch predictor gets things
right N% of the time, its average cost will be (N * epsilon + (100-N) * huge)/100.
The value of N depends not just on the branch but on lots of other things,
because modern branch predictors take several aspects of the history
leading up to the execution of the branch into account when they make
their prediction. (For example, they can learn that this branch is very likely
to be taken if either of the previous two branches were not taken, if that
happens to be true.) However, they are tuned to exploit *local* history,
history within e.g. a C function or Java method, not the significantly less
regular cross-procedural-boundary history that would be useful for Mercury code.
Each additional branch also affects the predictability of other branches; it
may cause them to be evicted from the cache.
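As a sanity check, Zoltan's formula can be evaluated for a few values of N. The cycle counts below (epsilon = 0.5, huge = 100) are illustrative assumptions for this sketch, not measurements of any particular CPU:

```python
def avg_branch_cost(n_percent, epsilon=0.5, huge=100.0):
    """Expected branch cost (N * epsilon + (100 - N) * huge) / 100,
    where N% of executions of the branch are correctly predicted."""
    return (n_percent * epsilon + (100 - n_percent) * huge) / 100

# A perfectly predicted branch versus an effectively random one:
print(avg_branch_cost(100))   # 0.5 cycles
print(avg_branch_cost(50))    # 50.25 cycles: dominated by mispredictions
```

Even at 90% accuracy the average cost is about 10.45 cycles under these assumptions, which is why a branch's predictability, not its mere presence, determines whether conditional update pays off.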
Post by Zoltan Somogyi
There is a branch in GC_malloc, in the test for whether the freelist for
the selected block size is empty, but it is almost perfectly predictable.
The branch in conditional update, in cases where you want to use it,
will be significantly less predictable. (If it is perfectly predictable that
the new field value is not the same as the old, you would use unconditional
update; if it is perfectly predictable that the new value is the same as the old,
you won't have any update at all.)
The obvious extra cost of allocating a new structure even when you don't need to
is the work involved in allocating the structure and filling it in. However,
both are reasonably cheap as long as the filled-in structure stays in the cache.
The problem is that almost certainly it won't, because even if the new structure
becomes dead soon, GC is rare enough that it is very unlikely to come along
and collect the dead structure before it is flushed from the cache.
So basically, the tradeoff is between the two costliest operations
in modern CPUs: mispredicted branches and memory accesses.
Neither is obviously costlier than the other; which one is costlier
depends on the details.
That makes sense.

Maybe I'm biased against Boehm GC, knowing the other problems that it has
created for us, particularly poor multi-threaded performance.

From what I actually know about Boehm GC, checking the free lists should be a
few (dependent) memory reads and a well-predicted branch on the fast path.
It's important that they are dependent reads (it does pointer following) so
it may create stalls. Each time a free list is populated it is done so by
lazy scanning of a memory block. Depending on the utilisation of a block
that could populate the list with a varying number of items. Say we have
16-byte objects and 4096-byte pages that are half-full: that would add 128
items to the free list (I haven't accounted for the page header). The
branch will be mispredicted at least 1 in 128 times. But since that's when it
executes the slow path anyway it might not be a problem. The slow path does
a lot more work. It must synchronize with other threads, get a page, sweep
the page and build the free lists. This is amortised, but it's difficult to
say how well.

To summarise, exactly what this cost is and how it trades off against a
mispredicted branch is an open question; we'd need to measure it.
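The arithmetic in the example above can be made explicit. This sketch only reproduces the back-of-the-envelope numbers (16-byte objects, 4096-byte pages, 50% utilisation, page header ignored); it is not Boehm GC's actual sweep code:

```python
def freelist_items(page_bytes=4096, obj_bytes=16, utilisation=0.5):
    """Items added to a Boehm-style free list when a lazily swept page
    is scanned, ignoring the page header as in the estimate above."""
    objects_per_page = page_bytes // obj_bytes        # 256 slots per page
    free_slots = int(objects_per_page * (1 - utilisation))
    return free_slots

items = freelist_items()
print(items)        # 128 items per sweep of a half-full page
print(1 / items)    # slow path taken on roughly 0.8% of allocations
```

So under these assumptions the branch into the slow path is well predicted, and the open question is the amortised cost of the sweep itself.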
Post by Zoltan Somogyi
By the way, Nick Nethercote wrote a paper a few years ago
in which he showed that counts of these two operations, mispredicted branches
and memory accesses, allowed him to predict the runtimes of a selected
set of Haskell programs with about 85% accuracy. One can take this
to mean that to a first approximation, all other operations the program does
are effectively free.
That's been my understanding for a while. But I wasn't aware of Nick's
paper; is this the one? http://dotat.at/tmp/msp02.pdf
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
If you make conditional-update use a procedure call as its syntax,
then either
- its implementation is also a procedure call, in which case
it is unnecessarily slow; or
- its implementation is not a procedure call but inline code,
it which case the compiler would have to have completely new code
to make the transformation.
I no longer think this is a good idea. But wouldn't it be recognised
directly by the parser, like a reserved word?
Yes, but "parsing conditional update using procedure call syntax"
would still require more new code than "parsing conditional update
as a minor variation of unconditional update syntax", because
unlike the latter case, the former case would NOT be able to reuse
"completely new code" above.
Ah, okay.
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
Using the same overall syntax as !Info ^ field := X but with another
operators is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
condition-update a one-character change from the existing := operator.
These are good goals.
It seems we are converging on consensus.
Yes, I think so. My tangents are mostly just tangents that may be
interesting but don't really affect this design decision.
--
Paul Bone
http://paul.bone.id.au
Zoltan Somogyi
2017-03-09 02:32:39 UTC
Permalink
Post by Paul Bone
Post by Zoltan Somogyi
For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can be possibly justified by the amount of allocation
being saved.
Perhaps I'm wrong but I had assumed that a normal equality test (such as
unification) will first compare pointers, and then do the deep tests only if
the pointers are unequal, and short-circuit as soon as two fields are
unequal. And that it will do this transitively.
That is correct.
Post by Paul Bone
So that the actual /
amortized cost is low or trivial.
Unfortunately, that does not follow. The bitwise equality tests on pointers
may reduce the cost of the equality test from traversing 10 Mb to traversing
(say) 10 Kb, but that cost is still *much* bigger than the benefit you are trying to gain,
which is comparable to the cost of traversing a few tens of bytes. And
in the absence of a conditional update operator, those pointer equality tests
will tend to be less effective, because programmers will be more likely to write
traversals that don't preserve bitwise equality when the "updated" version
of a sub data structure is semantically equal to its old version.
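The proposed :?= semantics, and why it keeps later pointer-equality tests effective, can be sketched in Python. The `Info` type and `set_varset` helper below are hypothetical stand-ins for the simplify_info code in the original post, using identity (`is`) in place of `private_builtin.pointer_equal`:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Info:
    varset: object
    other: int

def set_varset(new, info):
    """Conditional field update: reuse the old structure when the new
    field value is pointer-equal to the old one (the :?= behaviour)."""
    if new is info.varset:
        return info                   # no allocation; sharing preserved
    return replace(info, varset=new)  # genuinely new value: allocate

vs = ["x", "y"]
i0 = Info(varset=vs, other=7)
i1 = set_varset(vs, i0)       # "update" with the same field value
print(i1 is i0)               # True: downstream `is` checks stay cheap
i2 = set_varset(["z"], i0)
print(i2 is i0)               # False: a new value really was stored
```

Because `set_varset` returns the original structure unchanged when nothing changed, a later pointer-equality test further up the traversal can still succeed, which is exactly the sharing that unconditional update destroys.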
Post by Paul Bone
Either way, I don't think this should be the normal comparison used by a :?=
operator. But it might make sense that this is available to developers who
want it. Of course they could write it manually, I don't think it would be
used very often relative to your proposal with pointer equality.
In the absence of a convincing use case, I won't implement it.
Post by Paul Bone
Post by Zoltan Somogyi
I intended my original mail specifically to ask you guys to think about
*potential future* uses of conditional update, not just actual, current uses.
I see I should have made that more explicit.
I also thought that part of the intent is to ask what uses may occur in
closed code bases such as Prince.
I was giving you an opportunity to put your two cents in, yes.
Post by Paul Bone
The other developers will know more than
me about this, but AIUI there's less need for this in Prince than the
Mercury compiler. Also sometimes when we manipulate tree-like structures
(the DOM) we do so using mutvars, because the structures have cycles that
are easier to manipulate with mutvars.
I see. This proposal will do nothing to help code using mutvars, but you know that.
Post by Paul Bone
Each additional branch also affects the predictability of other branches; it
may cause them to be evicted from the cache.
Yes. That is one reason why predicting the performance of any piece of code
from first principles (i.e. without measurement) has gone from perfectly possible
if quite tedious in the 1960s, to effectively impossible in the last two or three decades.
Post by Paul Bone
From what I actually know about Boehm GC checking the free lists should be a
few (dependent) memory reads and a well-predicted branch on the fast path.
It's important that they are dependent reads (it does pointer following) so
it may create stalls. Each time a free list is populated it is done so by
lazy scanning of a memory block.
That is sequential access. Modern CPUs have circuits to detect sequential access
patterns, even when the program is following several independent sequential
access patterns at once (e.g. when reading from two vectors to add them),
and they try to prefetch the contents of the next several addresses in each stream
so that it is in the cache (preferably L1) by the time the program asks for it.

In this case, for each of the most popular allocation sizes, such prefetches will likely
pull into the cache the next several words after the current allocation point
in the page at the head of the freelist. So while these reads may be dependent,
they are not that likely to be slow.
Post by Paul Bone
That's been my understanding for a while. But I wasn't aware of Nick's
paper, is this the one? http://dotat.at/tmp/msp02.pdf
The paper I remember was similar but different. I don't know if that means
I was thinking of a different paper, or remembered this one incorrectly.
Or it may have been a post on his blog; he has some good ones.

Zoltan.
Paul Bone
2017-03-09 02:49:55 UTC
Permalink
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can be possibly justified by the amount of allocation
being saved.
Perhaps I'm wrong but I had assumed that a normal equality test (such as
unification) will first compare pointers, and then do the deep tests only if
the pointers are unequal, and short-circuit as soon as two fields are
unequal. And that it will do this transitively.
That is correct.
Post by Paul Bone
So that the actual /
amortized cost is low or trivial.
Unfortunately, that does not follow. The bitwise equality tests on pointers
may reduce the cost of the equality test from traversing 10 Mb to traversing
(say) 10 Kb, but that cost is still *much* bigger than the benefit you are trying to gain,
which is comparable to the cost of traversing a few tens of bytes. And
in the absence of a conditional update operator, those pointer equality tests
will tend to be less effective, because programmers will be more likely to write
traversals that don't preserve bitwise equality when the "updated" version
of a sub data structure is semantically equal to its old version.
That's about the order of magnitude that I would have guessed. But my guess
is that most structures aren't that big. But this is far too many guesses
already; I'm not prepared to make a bet or any further guesses.
Post by Zoltan Somogyi
Post by Paul Bone
Either way, I don't think this should be the normal comparison used by a :?=
operator. But it might make sense that this is available to developers who
want it. Of course they could write it manually, I don't think it would be
used very often relative to your proposal with pointer equality.
In the absence of a convincing use case, I won't implement it.
It makes sense when some inner part of the traversal uses unconditional
update and most of the time returns semantically identical data. Then one
or more conditional updates in an outer part of the traversal use pointer
equality and allocate memory when they didn't need to, because the pointers
are unequal when the data is equal.

However, if conditional update is as easy as s/:=/:?=/, then that's not
likely: the inner traversal should have used a conditional update.
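Paul's scenario can be illustrated with a toy traversal. A hypothetical inner pass that rebuilds its node unconditionally destroys the sharing that an outer pointer-equality test relies on, while a conditional version preserves it:

```python
def normalise_unconditional(node):
    # Inner pass: always builds a fresh node, even when the result
    # is semantically identical to the input.
    return (max(node[0], 0), node[1])

def normalise_conditional(node):
    # Inner pass with a hand-written conditional update: reuse the old
    # node when the rebuilt one is equal, preserving pointer equality
    # for any outer pass that tests `is` before allocating.
    new = (max(node[0], 0), node[1])
    return node if new == node else new

leaf = (42, "payload")                        # already normalised
print(normalise_unconditional(leaf) is leaf)  # False: outer `is` test now fails
print(normalise_conditional(leaf) is leaf)    # True: sharing preserved
```

This is the case where a cheap one-character s/:=/:?=/ in the inner pass would make the outer pass's pointer-equality test effective again.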
Post by Zoltan Somogyi
Post by Paul Bone
The other developers will know more than
me about this, but AIUI there's less need for this in Prince than the
Mercury compiler. Also sometimes when we manipulate tree-like structures
(the DOM) we do so using mutvars, because the structures have cycles that
are easier to manipulate with mutvars.
I see. This proposal will do nothing to help code using mutvars, but you know that.
Yep, I mention it because it's one way that Prince is unlike Mercury with
respect to this kind of code.
Post by Zoltan Somogyi
Post by Paul Bone
From what I actually know about Boehm GC checking the free lists should be a
few (dependent) memory reads and a well-predicted branch on the fast path.
It's important that they are dependent reads (it does pointer following) so
it may create stalls. Each time a free list is populated it is done so by
lazy scanning of a memory block.
That is sequential access. Modern CPUs have circuits to detect sequential access
patterns, even when the program is following several independent sequential
access patterns at once (e.g. when reading from two vectors to add them),
and they try to prefetch the contents of the next several addresses in each stream
so that it is in the cache (preferably L1) by the time the program asks for it.
In this case, for each of the most popular allocation sizes, such prefetches will likely
pull into the cache the next several words after the current allocation point
in the page at the head of the freelist. So while these reads may be dependent,
they are not that likely to be slow.
That makes sense.
Post by Zoltan Somogyi
Post by Paul Bone
That's been my understanding for a while. But I wasn't aware of Nick's
paper, is this the one? http://dotat.at/tmp/msp02.pdf
The paper I remember was similar but different. I don't know if that means
I was thinking of a different paper, or remembered this one incorrectly.
Or it may have been a post on his blog; he has some good ones.
Cool.
--
Paul Bone
http://paul.bone.id.au