Discussion:
[m-dev.] a conditional field update operator
Zoltan Somogyi
2017-03-08 03:41:19 UTC
One of the changes I made to simplify_info operations was to
replace code like this:

simplify_info_set_varset(X, !Info) :-
    !Info ^ simp_varset := X.

with code like this:

simplify_info_set_varset(X, !Info) :-
    ( if private_builtin.pointer_equal(X, !.Info ^ simp_varset) then
        true
    else
        !Info ^ simp_varset := X
    ).

The two are semantically equivalent, but different in performance.
The first never pays the cost of a test, but always pays the cost of
allocating a new structure, while for the second, things are the other
way around. This means that performance-wise, which one you want
depends on the probability of the new value of the field being different
from its old value, and on the size of the structure as a whole.
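The same tradeoff can be sketched outside Mercury. Here is a rough Python
analogy (the Info type and field names are purely illustrative, with
Python's `is` standing in for pointer_equal):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Info:
    varset: tuple
    other: int

def set_varset_always(info, varset):
    # First variant: no test, but always allocates a new Info.
    return replace(info, varset=varset)

def set_varset_cond(info, varset):
    # Second variant: pays a pointer-equality test, but reuses the
    # old structure when the field value has not changed.
    if varset is info.varset:
        return info
    return replace(info, varset=varset)

info0 = Info(varset=("A", "B"), other=7)
assert set_varset_cond(info0, info0.varset) is info0   # no new structure
assert set_varset_always(info0, info0.varset) is not info0
```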

In many cases, the second approach is appropriate, but at the moment,
it takes five lines of code instead of one. I was thinking that we could
have a new operator, maybe ::= or :?=, that would be like :=, but
would create a new structure only if the new value of the field was
different from the old value. Then the compiler would expand the one line
to the five-line version internally.

Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?

Zoltan.
Michael Day
2017-03-08 03:45:53 UTC
Hi Zoltan,
Post by Zoltan Somogyi
In many cases, the second approach is appropriate, but at the moment,
it takes five lines of code instead of one. I was thinking that we could
have a new operator, maybe ::= or :?=, that would be like :=, but
would create a new structure only if the new value of the field was
different from the old value. Then the compiler would expand the one line
to the five-line version internally.
I would be tempted to just make := always do the pointer equality test
and omit it if it's obviously going to fail, e.g. if the right-hand side
is a newly constructed cell that can't possibly be equal.

Michael
--
Prince: Print with CSS!
http://www.princexml.com
Zoltan Somogyi
2017-03-08 04:13:16 UTC
Post by Michael Day
Hi Zoltan,
Post by Zoltan Somogyi
In many cases, the second approach is appropriate, but at the moment,
it takes five lines of code instead of one. I was thinking that we could
have a new operator, maybe ::= or :?=, that would be like :=, but
would create a new structure only if the new value of the field was
different from the old value. Then the compiler would expand the one line
to the five-line version internally.
I would be tempted to just make := always do the pointer equality test
and omit it if it's obviously going to fail, e.g. if the right-hand side
is a newly constructed cell that can't possibly be equal.
In most of the cases in which we now use the
test-and-only-assign-if-the-old-and-new-values-are-different technique,
or conditional-assign for short, the new value is constructed somewhere
else: either in the caller
of the current procedure, or in one of its callees. In procedures that *do*
decide the new value of the field themselves, the code already tends to
follow the pattern of

( if ... some test ... then
    compute new value
    assign the new value to !Info ^ fieldname (unconditionally)
else
    don't assign to !Info ^ fieldname at all
)

In this case, the old and new values of the field will be the same only
by accident, and in many cases we (humans) can rule out the accident,
because often the new value is computed from the old value by an
operation whose output is necessarily different from its input (e.g.
appending an element to the front of a list). Having the computer
arrive at the same decision would, in many cases, require nontrivial
program analysis.

The reason why I want to preserve the ability of programmers
to ask for the current behavior of := is that in cases like this,
where you know that the "are the old and new values the same?"
test won't succeed often, or at all, the cost of the test is effectively
"distributed fat": a cost imposed on one part of the program
just because *another* part of the program may benefit from it.
Avoiding distributed fat wherever possible is one of the core principles
of the project. In this case, avoiding it is trivial.

One other consideration is that code that updates more than one field,
such as

!Info ^ field1 := F1,
!Info ^ field2 := F2

can be expanded one goal at a time. The parser will expand this
into code that constructs Info1 from Info0, and then Info2 from Info1,
and the common struct optimization pass will come along later,
process the resulting conjunction of unifications, and construct
Info2 directly from Info0, optimizing away the materialization of Info1.
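The effect of the common struct optimization can be mimicked by hand in
most languages: instead of two single-field copies, construct the final
value in one step. A hypothetical Python illustration (the field names
are invented):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Info:
    field1: int
    field2: str
    field3: float

info0 = Info(1, "x", 2.5)

# Naive one-goal-at-a-time expansion: materializes an intermediate Info1.
info1 = replace(info0, field1=42)
info2 = replace(info1, field2="y")

# What the optimization produces: Info2 built directly from Info0,
# with no intermediate structure.
info2_direct = replace(info0, field1=42, field2="y")
assert info2 == info2_direct
```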

For a sequence of two or more *conditional* assignments to fields,
this won't work, because the expansion of a conditional assignment
is not a flat conjunction of unifications. Either the parser must expand
adjacent field updates to the same state variable together (even if
the code does not use state variable syntax), or we need to make
the common struct optimization understand the more complex
code structures that result from their one-by-one expansion.

Does Prince make extensive use of conditional assignment?
Was that the motivation behind your proposal?

Zoltan.
Michael Day
2017-03-08 05:46:38 UTC
Hi Zoltan,
Post by Zoltan Somogyi
The reason why I want to preserve the ability of programmers
to ask for the current behavior of := is that in cases like this,
where you know that the "are the old and new values the same?"
test won't succeed often, or at all, the cost of the test is effectively
"distributed fat": a cost imposed on one part of the program
just because *another* part of the program may benefit from it.
Avoiding distributed fat wherever possible is one of the core principles
of the project. In this case, avoiding it is trivial.
That seems reasonable.
Post by Zoltan Somogyi
One other consideration is that code that updates more than one field,
such as
!Info ^ field1 := F1,
!Info ^ field2 := F2
can be expanded one goal at a time. The parser will expand this
into code that constructs Info1 from Info0, and then Info2 from Info1,
and the common struct optimization pass will come along later,
process the resulting conjunction of unifications, and construct
Info2 directly from Info0, optimizing away the materialization of Info1.
Also very reasonable.
Post by Zoltan Somogyi
Does Prince make extensive use of conditional assignment?
Was that the motivation behind your proposal?
Some of our conditional assignments would need an explicit test anyway
as they have to do other operations if the test fails:

( if LineJoin0 = LineJoin then
    Gs = Gs0
else
    Gs = Gs0 ^ gs0 ^ stroke_linejoin := LineJoin,
    (
        LineJoin = miter,
        write_op(Buf, [int(0)], "j", !IO)
    ;
        LineJoin = round,
        write_op(Buf, [int(1)], "j", !IO)
    ;
        LineJoin = bevel,
        write_op(Buf, [int(2)], "j", !IO)
    )
)

Our CSS style processing code goes to some effort to avoid allocating
structures that haven't changed, but I think it wouldn't get any value
from this pointer equality test unless we canonicalised more values
deeper in the struct; currently we only canonicalise the whole thing at
the top level instead of each subfield.

There may be other places where we do conditional assignment (mostly on
float values, not pointers, I suspect), but probably not a sufficient
number to justify doing the test for every assignment.

Michael
Zoltan Somogyi
2017-03-08 05:57:56 UTC
Post by Michael Day
Our CSS style processing code goes to some effort to avoid allocating
structures that haven't changed, but I think it wouldn't get any value
from this pointer equality test unless we canonicalised more values
deeper in the struct; currently we only canonicalise the whole thing at
the top level instead of each subfield.
I understand; there are several analogous situations in the Mercury compiler.

That is part of what I was trying to get at. As you say, conditional assignment
at any level of a data structure works only if you do it consistently at all the
levels below it; if you don't, then the new value of a field may be semantically
equal to the old value, but bitwise different from it. So to apply conditional
update to the transformation of a whole data structure, you have to do it
for the transformation of *every part*. This means that at the moment,
if you have code that does a traversal with unconditional update, your only
choices are

- replace every line where a part of the data structure is updated with *five*
lines of code, greatly increasing the size of the code and making it harder
to read, or

- give up on conditional update.

With the operator I propose, you would have a third option:

- replace every use of := with the new operator, leaving the code effectively
the same size, and with the same readability.
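The point about doing conditional update for *every part* can be seen in
a small sketch: a tree transformation preserves sharing only if every
recursive level returns the original node when nothing below it changed.
All names here are illustrative, with Python's `is` again standing in
for pointer_equal:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def map_tree(node, f):
    # Conditional update at every level: if f and both recursive calls
    # leave their inputs unchanged, reuse the old node as-is.
    if node is None:
        return None
    left = map_tree(node.left, f)
    right = map_tree(node.right, f)
    value = f(node.value)
    if value == node.value and left is node.left and right is node.right:
        return node
    return Node(value, left, right)

t = Node(1, Node(2), Node(3))
assert map_tree(t, lambda v: v) is t          # whole tree shared
t2 = map_tree(t, lambda v: v * 10 if v == 3 else v)
assert t2 is not t and t2.left is t.left      # only the changed spine rebuilt
```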

Zoltan.
Mark Brown
2017-03-08 07:01:50 UTC
Hi everyone,

On Wed, Mar 8, 2017 at 4:57 PM, Zoltan Somogyi
Post by Zoltan Somogyi
That is part of what I was trying to get at. As you say, conditional assignment
at any level of a data structure works only if you do it consistently at all the
levels below it; if you don't, then the new value of a field may be semantically
equal to the old value, but bitwise different from it. So to apply conditional
update to the transformation of a whole data structure, you have to do it
for the transformation of *every part*.
For me, this is the clincher.

The field update syntax can already be redefined to do any checks that
the user wants, including checking a condition before updating. You
can even pass extra parameters in that determine what kind of test you
want and use multiple modes to remove tests at compile time, as in the
example code below. But it won't do the right thing for updates of
nested fields. So I think some new syntax to support conditional
updates of nested fields is warranted.

One question: is this syntax something that can be redefined, like the
existing field syntax, or does it always compile down to calls to the
unconditional field updates (which may themselves be redefined), or
does it always compile to an ordinary construction like the default
field update functions? I vote that it compiles down to unconditional
field updates, viz:

!Term ^ field_list ?= FieldValue

becomes

( if private_builtin.pointer_equal(!.Term ^ field_list, FieldValue) then
    true
else
    !Term ^ field_list := FieldValue
)

Then this code should call the user defined unconditional field
update, if present.

Mark

--
Example code mentioned above:

:- type s ---> s(
        f1 :: int,
        f2 :: string
    ).

:- type update_when
    --->    always
    ;       ptr_equal.

:- inst always ---> always.
:- inst ptr_equal ---> ptr_equal.

:- func s ^ f2(update_when) := string = s.
:- mode in ^ f2(in) := in = out is det.
:- mode in ^ f2(in(always)) := in = out is det.
:- mode in ^ f2(in(ptr_equal)) := in = out is det.

S ^ f2(When) := Name =
    ( if test_when(When, S ^ f2, Name) then S ^ f2 := Name else S ).

    % test_when(When, OldValue, NewValue): succeed iff the update
    % should be performed.
:- pred test_when(update_when, T, T).
:- mode test_when(in(always), in, in) is det.
:- mode test_when(in(ptr_equal), in, in) is semidet.
:- mode test_when(in, in, in) is semidet.

test_when(always, _, _).
test_when(ptr_equal, X, Y) :- not private_builtin.pointer_equal(X, Y).
Zoltan Somogyi
2017-03-08 07:24:47 UTC
Post by Mark Brown
One question: is this syntax something that can be redefined, like the
existing field syntax, or does it always compile down to calls to the
unconditional field updates (which may themselves be redefined), or
does it always compile to an ordinary construction like the default
field update functions?
Let me reformat that to number the alternatives:

1: is this syntax something that can be redefined, like the existing field syntax, or

2: does it always compile down to calls to the unconditional field updates (which may
themselves be redefined), or

3: does it always compile to an ordinary construction like the default
field update functions?

I agree with your vote: I also prefer 2. I think the only advantage of 1 over
the status quo is that you can avoid the When parameter in your example,
which I think is too minor an advantage to justify the work needed for
any change, while 3 unnecessarily throws away existing generality
whose implementation should be readily reusable (though I haven't looked
at the relevant code in a long time).
Post by Mark Brown
I vote that it compiles down to unconditional
!Term ^ field_list ?= FieldValue
becomes
( if private_builtin.pointer_equal(!.Term ^ field_list, FieldValue) then
    true
else
    !Term ^ field_list := FieldValue
)
Then this code should call the user defined unconditional field
update, if present.
Agreed for the transformation, though we may need to do something,
such as wrapping a new scope around the if-then-else, to simplify
the optimization of several consecutive conditional field updates
of the same structure, in the usual case of no user-defined field
update function.

I would however vote against using ?= as the conditional update operator.
Its form conveys "condition" well to the reader, but to me, it does NOT
suggest "update".

Is this preexisting code, or did you write it for this post?

Zoltan.
Mark Brown
2017-03-08 08:37:33 UTC
Hi,

On Wed, Mar 8, 2017 at 6:24 PM, Zoltan Somogyi
Post by Zoltan Somogyi
I would however vote against using ?= as the conditional update operator.
Its form conveys "condition" well to the reader, but to me, it does NOT
suggest "update".
I misread it as ?= in the OP :-/
Post by Zoltan Somogyi
Is this preexisting code, or did you write it for this post?
It was preexisting but never used, precisely because it doesn't
completely work for nested fields.

Mark
Paul Bone
2017-03-08 05:02:11 UTC
Post by Zoltan Somogyi
One of the changes I made to simplify_info operations was to
simplify_info_set_varset(X, !Info) :-
    !Info ^ simp_varset := X.
simplify_info_set_varset(X, !Info) :-
    ( if private_builtin.pointer_equal(X, !.Info ^ simp_varset) then
        true
    else
        !Info ^ simp_varset := X
    ).
Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.

Since this would be very rarely used, I'd hesitate to give it a
symbol/operator. I'd prefer to give it a name; it may make it more awkward
to use, but it leaves the symbol free to be defined for something else later
(if necessary).

update_field_if_not_pointer_equal(!Info ^ simp_varset, X),

Hrm, it's a mouthful, just an idea.
--
Paul Bone
http://paul.bone.id.au
Zoltan Somogyi
2017-03-08 05:38:52 UTC
Post by Paul Bone
Post by Zoltan Somogyi
Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
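In Python terms (an analogy only), pointer_equal corresponds to `is` and
full semantic equality to `==`:

```python
# Two separately allocated lists with the same contents.
a = ["p", "q"]
b = ["p", "q"]

assert a == b          # deep (semantic) equality: True
assert a is not b      # pointer equality: False, distinct allocations
assert a is a          # pointer equality on the very same structure: True
```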
Post by Paul Bone
Since this would be very rarely used I'd hesitate to give it a
symbol/operator.
The reason why I am proposing a new operator is precisely because

- this is the nth time we have found a need for it in the compiler,
for largish values of n, and

- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.

Is your argument that you think such "rebuild with some rare modifications"
operations are rare in general? Mike's previous mail seems to imply that
he thinks just the opposite.
Post by Paul Bone
I'd prefer to give it a name, it may make it more awkward
to use but it's an operator that can be defined for something else later (if
necessary).
update_field_if_not_pointer_equal(!Info ^ simp_varset, X),
I don't think "not adding another operator" is a worthwhile goal,
and I don't see what else your proposal would accomplish.

If you make conditional-update use a procedure call as its syntax,
then either

- its implementation is also a procedure call, in which case
it is unnecessarily slow; or

- its implementation is not a procedure call but inline code,
it which case the compiler would have to have completely new code
to make the transformation.

Using the same overall syntax as !Info ^ field := X but with another
operator is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
conditional-update a one-character change from the existing := operator.

Zoltan.
Paul Bone
2017-03-08 05:55:20 UTC
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
Do people think such an extension to field syntax would be useful
more generally than just in the Mercury compiler? If so, what should
the syntax be?
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
Yes, it is the most conservative test.

However, an equality test will catch items that are semantically equal, even
if not structurally or pointer equal, and avoid the memory allocation in
those situations as well. This may sometimes be what people want, though
probably less often, and it might not be correct in some cases.
Post by Zoltan Somogyi
Post by Paul Bone
Since this would be very rarely used I'd hesitate to give it a
symbol/operator.
The reason why I am proposing a new operator is precisely because
- this is the nth time we have found a need for it in the compiler,
for largish values of n, and
- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.
Mmm, true. I hadn't considered the cases where I _don't_ use such a
pattern, even though I could/should.
Post by Zoltan Somogyi
Is your argument that you think such "rebuild with some rare modifications"
operations are rare in general? Mike's previous mail seems to imply that
he thinks just the opposite.
I had _assumed_ that they were rare. But your above example of the HLDS
reminded me of all the times I had done the same, and wished for something
like this. I now think that making it a symbolic operator such as :?= is
best.

I assumed that Mike based his message on the idea that, at least in
Mercury's C grades, a memory allocation is quite expensive while a branch is
much cheaper. Even if these updates are rare, I guessed that he was thinking
it might be worthwhile anyway. If they are rare, I think it wouldn't be
worthwhile. And the cost to other transformations, such as coalescing
multiple structure updates together, would be at best annoying.

However, if we go to the effort of adding an operator such as :?=, it would
be interesting to see what happens if we made it the default.
Post by Zoltan Somogyi
Post by Paul Bone
I'd prefer to give it a name, it may make it more awkward
to use but it's an operator that can be defined for something else later (if
necessary).
update_field_if_not_pointer_equal(!Info ^ simp_varset, X),
I don't think "not adding another operator" is a worthwhile goal,
and I don't see what else your proposal would accomplish.
If you make conditional-update use a procedure call as its syntax,
then either
- its implementation is also a procedure call, in which case
it is unnecessarily slow; or
- its implementation is not a procedure call but inline code,
in which case the compiler would have to have completely new code
to make the transformation.
I no longer think this is a good idea. But wouldn't it be recognised
directly by the parser, like a reserved word?
Post by Zoltan Somogyi
Using the same overall syntax as !Info ^ field := X but with another
operator is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
conditional-update a one-character change from the existing := operator.
These are good goals.
Zoltan Somogyi
2017-03-08 06:53:30 UTC
Post by Paul Bone
Post by Zoltan Somogyi
Post by Paul Bone
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
Yes, it is the most conservative test.
However an equality test will catch items that are semantically equal, even
if not structurally or pointer equal, and avoid the memory allocation in
those situations also. This may be what people want, but possibly less
often, although it might not be correct in some cases.
I thought you meant using normal equality tests only for atomic types,
the way we hand-implement conditional update in the Mercury compiler now.
I now see you are proposing it for nonatomic types as well.

For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can possibly be justified by the amount of allocation
being saved.

In general, the cost of an allocation is sort-of linear in the amount being
allocated, and therefore the fixed size of the cell that the update applies to
is a sort-of natural bound for the cost of allocation. On the other hand,
the cost of a deep equality test HAS no bound. Paying an unbounded cost
for a bounded gain is not a good general strategy.
Post by Paul Bone
Post by Zoltan Somogyi
The reason why I am proposing a new operator is precisely because
- this is the nth time we have found a need for it in the compiler,
for largish values of n, and
- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.
Mmm, true. I hadn't considered the cases where I _don't_ use such a
pattern, even though I could/should.
I intended my original mail specifically to ask you guys to think about
*potential future* uses of conditional update, not just actual, current uses.
I see I should have made that more explicit.
Post by Paul Bone
Post by Zoltan Somogyi
Is your argument that you think such "rebuild with some rare modifications"
operations are rare in general? Mike's previous mail seems to imply that
he thinks just the opposite.
I had _assumed_ that they were rare. But your above example of the HLDS
reminded me of all the times I had done the same, and wished for something
like this. I now think that making it a symbolic operator such as :?= is
best.
Good; then we are on the same wavelength.
Post by Paul Bone
I assumed that Mike based his message on the idea that, at least in
Mercury's C grades, a memory allocation is quite expensive while a branch is
much cheaper.
On current (and foreseeable) architectures, the cost of a branch can range
from effectively zero, if the branch is perfectly predictable, to huge (several
tens of cycles, maybe more than a hundred cycles) if the branch is perfectly
unpredictable, i.e. effectively random. Call these costs epsilon and huge
respectively. For a branch for which the CPU's branch predictor gets things
right N% of the time, its average cost will be (N * epsilon + (100-N) * huge)/100.
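That formula is easy to experiment with numerically. A tiny sketch, where
the cycle costs epsilon and huge are made-up illustrative numbers, not
measurements of any real CPU:

```python
def avg_branch_cost(n_percent, epsilon=0.5, huge=20.0):
    """Average cost of a branch predicted correctly N% of the time:
    (N * epsilon + (100 - N) * huge) / 100."""
    return (n_percent * epsilon + (100 - n_percent) * huge) / 100

assert avg_branch_cost(100) == 0.5    # perfectly predictable: ~epsilon
assert avg_branch_cost(0) == 20.0     # always mispredicted: huge
assert avg_branch_cost(50) == 10.25   # coin flip: about half of huge
```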

The value of N depends not just on the branch but on lots of other things,
because modern branch predictors take several aspects of the history
leading up to the execution of the branch into account when they make
their prediction. (For example, they can learn that this branch is very likely
to be taken if either of the previous two branches were not taken, if that
happens to be true.) However, they are tuned to exploit *local* history,
history within e.g. a C function or Java method, not the significantly less
regular cross-procedural-boundary history that would be useful for Mercury code.

There is a branch in GC_malloc, in the test for whether the freelist for
the selected block size is empty, but it is almost perfectly predictable.

The branch in conditional update, in cases where you want to use it,
will be significantly less predictable. (If it is perfectly predictable that
the new field value is not the same as the old, you would use unconditional
update; if it is perfectly predictable that the new value is the same as the old,
you won't have any update at all.)

The obvious extra cost of allocating a new structure even when you don't need to
is the work involved in allocating the structure and filling it in. However,
both are reasonably cheap as long as the filled-in structure stays in the cache.
The problem is that almost certainly it won't, because even if the new structure
becomes dead soon, gc is rare enough that it is very unlikely to come along
and collect the dead structure before it is flushed from the cache.

So basically, the tradeoff is between the two costliest operations
in modern CPUs: mispredicted branches and memory accesses.
Neither is obviously costlier than the other; which one is costlier
depends on the details.

By the way, Nick Nethercote wrote a paper a few years ago
where he showed that counts of these two operations, mispredicted branches
and memory accesses, allowed him to predict the runtimes of a selected
set of Haskell programs with about 85% accuracy. One can take this
to mean that to a first approximation, all other operations the program does
are effectively free.
Post by Paul Bone
Post by Zoltan Somogyi
If you make conditional-update use a procedure call as its syntax,
then either
- its implementation is also a procedure call, in which case
it is unnecessarily slow; or
- its implementation is not a procedure call but inline code,
in which case the compiler would have to have completely new code
to make the transformation.
I no longer think this is a good idea. But wouldn't it be recognised
directly by the parser, like a reserved word?
Yes, but "parsing conditional update using procedure call syntax"
would still require more new code than "parsing conditional update
as a minor variation of unconditional update syntax", because
unlike the latter case, the former case would NOT be able to reuse
most of the existing code for parsing field updates. This is why I wrote
"completely new code" above.
Post by Paul Bone
Post by Zoltan Somogyi
Using the same overall syntax as !Info ^ field := X but with another
operator is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
conditional-update a one-character change from the existing := operator.
These are good goals.
It seems we are converging on consensus.

Zoltan.
Paul Bone
2017-03-08 23:55:32 UTC
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
Post by Paul Bone
Depending on the code that creates the new value (X), it might be more
appropriate to use a normal equality test. It probably makes sense to
support both equality tests.
pointer_equal is a builtin that compares its operands as bit vectors;
basically, it checks if the update would yield a new structure that is
bit-identical to the old one. I don't see any situation in which it is *not*
a correct test to use. (For integer and boolean fields, we often use
New = Old as the test, but only because it is easier to type and to read;
it is semantically identical to the pointer_equal test.)
Yes, it is the most conservative test.
However an equality test will catch items that are semantically equal, even
if not structurally or pointer equal, and avoid the memory allocation in
those situations also. This may be what people want, but possibly less
often, although it might not be correct in some cases.
I thought you meant using normal equality tests only for atomic types,
the way we hand-implement conditional update in the Mercury compiler now.
I now see you are proposing it for nonatomic types as well.
For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can possibly be justified by the amount of allocation
being saved.
Perhaps I'm wrong, but I had assumed that a normal equality test (such as
unification) will first compare pointers, then do the deep tests only if
the pointers are unequal, and short-circuit as soon as two fields are
unequal, and that it will do this recursively at every level, so that the
actual, amortized cost is low or trivial.
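That assumed behaviour can be written out as a hand-rolled equality that
short-circuits on identity before recursing. This is only a sketch of the
assumption, not a claim about how any Mercury grade implements unification:

```python
def eq(x, y):
    # Short-circuit: identical pointers are necessarily equal.
    if x is y:
        return True
    # Otherwise compare field by field, stopping at the first mismatch,
    # and apply the same short-circuit recursively at every level.
    if isinstance(x, tuple) and isinstance(y, tuple):
        return len(x) == len(y) and all(eq(a, b) for a, b in zip(x, y))
    return x == y

big = tuple(range(1_000_000))
assert eq(big, big)                      # one pointer test, no traversal
assert not eq((1, (2, 3)), (1, (2, 4)))  # stops at the first unequal field
```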

Either way, I don't think this should be the normal comparison used by a :?=
operator, but it might make sense to make it available to developers who
want it. Of course, they could also write it manually; I don't think it would
be used very often relative to your proposal with pointer equality.
Post by Zoltan Somogyi
In general, the cost of an allocation is sort-of linear in the amount being
allocated, and therefore the fixed size of the cell that the update applies to
is a sort-of natural bound for the cost of allocation. On the other hand,
the cost of a deep equality test HAS no bound. Paying an unbounded cost
for a bounded gain is not a good general strategy.
Even if my assumption holds that the cost is amortized or that large unbounded
comparisons are pathological, it is still more difficult to predict than the
bounded cost of allocation. I agree with you.
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
The reasons why I am proposing a new operator is precisely because
- this is the nth time we have found a need for it in the compiler,
for largish values of n, and
- I think it is likely that in many cases, it is only the size and (at the moment)
relatively hard-to-read nature of the five-line version that prevents it
from being used instead of the one-line, always-allocate-a-new-structure
version. For example, the compiler has many passes that transform
parts of the HLDS *slightly*, leaving many or most parts of it intact.
If we had the new operator, we could use it to significantly reduce
the amount of memory being allocated by such passes.
Mmm, true. I hadn't considered the cases where I _don't_ use such a
pattern, even though I could/should.
I intended my original mail specifically to ask you guys to think about
*potential future* uses of conditional update, not just actual, current uses.
I see I should have made that more explicit.
I also thought that part of the intent is to ask what uses may occur in
closed code bases such as Prince. The other developers will know more than
me about this, but AIUI there's less need for this in Prince than the
Mercury compiler. Also sometimes when we manipulate tree-like structures
(the DOM) we do so using mutvars, because the structures have cycles that
are easier to manipulate with mutvars.
Post by Zoltan Somogyi
Post by Paul Bone
I assumed that Mike based his message off the idea that, at least in
Mercury's C grades, a memory allocation is quite expensive while a branch is
much cheaper.
On current (and foreseeable) architectures, the cost of a branch can range
from effectively zero, if the branch is perfectly predictable, to huge (several
tens of cycles, maybe more than a hundred cycles) if the branch is perfectly
unpredictable, i.e. effectively random. Call these costs epsilon and huge
respectively. For a branch for which the CPU's branch predictor gets things
right N% of the time, its average cost will be (N * epsilon + (100-N) * huge)/100.
The value of N depends not just on the branch but on lots of other things,
because modern branch predictors take several aspects of the history
leading up to the execution of the branch into account when they make
their prediction. (For example, they can learn that this branch is very likely
to be taken if either of the previous two branches were not taken, if that
happens to be true.) However, they are tuned to exploit *local* history,
history within e.g. a C function or Java method, not the significantly less
regular cross-procedural-boundary history that would be useful for Mercury code.
Each additional branch also affects the predictability of other branches; it
may cause them to be evicted from the cache.
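As a sanity check, Zoltan's formula can be evaluated for a few values of N. The cycle counts below (epsilon = 0.5, huge = 100) are illustrative assumptions for this sketch, not measurements of any particular CPU:

```python
def avg_branch_cost(n_percent, epsilon=0.5, huge=100.0):
    """Expected branch cost (N * epsilon + (100 - N) * huge) / 100,
    where N% of executions of the branch are correctly predicted."""
    return (n_percent * epsilon + (100 - n_percent) * huge) / 100

# A perfectly predicted branch versus an effectively random one:
print(avg_branch_cost(100))   # 0.5 cycles
print(avg_branch_cost(50))    # 50.25 cycles: dominated by mispredictions
```

Even at 90% accuracy the average cost is about 10.45 cycles under these assumptions, which is why a branch's predictability, not its mere presence, determines whether conditional update pays off.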
Post by Zoltan Somogyi
There is a branch in GC_malloc, in the test for whether the freelist for
the selected block size is empty, but it is almost perfectly predictable.
The branch in conditional update, in cases where you want to use it,
will be significantly less predictable. (If it is perfectly predictable that
the new field value is not the same as the old, you would use unconditional
update; if it is perfectly predictable that the new value is the same as the old,
you won't have any update at all.)
The obvious extra cost of allocating a new structure even when you don't need to
is the work involved in allocating the structure and filling it in. However,
both are reasonably cheap as long as the filled-in structure stays in the cache.
The problem is that almost certainly it won't, because even if the new structure
becomes dead soon, GC is rare enough that it is very unlikely to come along
and collect the dead structure before it is flushed from the cache.
So basically, the tradeoff is between the two costliest operations
in modern CPUs: mispredicted branches and memory accesses.
Neither is obviously costlier than the other; which one is costlier
depends on the details.
That makes sense.

Maybe I'm biased against Boehm GC, knowing the other problems that it has
created for us, particularly poor multi-threaded performance.

From what I actually know about Boehm GC, checking the free lists should be a
few (dependent) memory reads and a well-predicted branch on the fast path.
It's important that they are dependent reads (it does pointer following) so
it may create stalls. Each time a free list is populated it is done so by
lazy scanning of a memory block. Depending on the utilisation of a block
that could populate the list with a varying number of items. Say we have
16-byte objects and 4096-byte pages that are half-full: that would add 128
items to the free list (I haven't accounted for the page header). The
branch will be mispredicted at least 1 in 128 times. But since that's when it
executes the slow path anyway it might not be a problem. The slow path does
a lot more work. It must synchronize with other threads, get a page, sweep
the page and build the free lists. This is amortised, but it's difficult to
say how well.

To summarise, exactly what this cost is and how it trades off against a
mispredicted branch is an open question; we'd need to measure it.
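The arithmetic in the example above can be made explicit. This sketch only reproduces the back-of-the-envelope numbers (16-byte objects, 4096-byte pages, 50% utilisation, page header ignored); it is not Boehm GC's actual sweep code:

```python
def freelist_items(page_bytes=4096, obj_bytes=16, utilisation=0.5):
    """Items added to a Boehm-style free list when a lazily swept page
    is scanned, ignoring the page header as in the estimate above."""
    objects_per_page = page_bytes // obj_bytes        # 256 slots per page
    free_slots = int(objects_per_page * (1 - utilisation))
    return free_slots

items = freelist_items()
print(items)        # 128 items per sweep of a half-full page
print(1 / items)    # slow path taken on roughly 0.8% of allocations
```

So under these assumptions the branch into the slow path is well predicted, and the open question is the amortised cost of the sweep itself.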
Post by Zoltan Somogyi
By the way, Nick Nethercote wrote a paper a few years ago
in which he showed that counts of these two operations, mispredicted branches
and memory accesses, allowed him to predict the runtimes of a selected
set of Haskell programs with about 85% accuracy. One can take this
to mean that to a first approximation, all other operations the program does
are effectively free.
That's been my understanding for a while. But I wasn't aware of Nick's
paper; is this the one? http://dotat.at/tmp/msp02.pdf
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
If you make conditional-update use a procedure call as its syntax,
then either
- its implementation is also a procedure call, in which case
it is unnecessarily slow; or
- its implementation is not a procedure call but inline code,
it which case the compiler would have to have completely new code
to make the transformation.
I no longer think this is a good idea. But wouldn't it be recognised
directly by the parser, like a reserved word?
Yes, but "parsing conditional update using procedure call syntax"
would still require more new code than "parsing conditional update
as a minor variation of unconditional update syntax", because
unlike the latter case, the former case would NOT be able to reuse
"completely new code" above.
Ah, okay.
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
Using the same overall syntax as !Info ^ field := X but with another
operators is easier for us as implementors. And for users, the easiest
way to make conditional-update easy to use, and to make switching
from always-update to conditional-update, or vice versa, is to make
condition-update a one-character change from the existing := operator.
These are good goals.
It seems we are converging on consensus.
Yes, I think so. My tangents are mostly just tangents that may be
interesting but don't really affect this design decision.
--
Paul Bone
http://paul.bone.id.au
Zoltan Somogyi
2017-03-09 02:32:39 UTC
Permalink
Post by Paul Bone
Post by Zoltan Somogyi
For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can be possibly justified by the amount of allocation
being saved.
Perhaps I'm wrong but I had assumed that a normal equality test (such as
unification) will first compare pointers, and then do the deep tests only if
the pointers are unequal, and short-circuit as soon as two fields are
unequal. And that it will do this transitively.
That is correct.
Post by Paul Bone
So that the actual /
amortized cost is low or trivial.
Unfortunately, that does not follow. The bitwise equality tests on pointers
may reduce the cost of the equality test from traversing 10 Mb to traversing
(say) 10 Kb, but that cost is still *much* bigger than the benefit you are trying to gain,
which is comparable to the cost of traversing a few tens of bytes. And
in the absence of a conditional update operator, those pointer equality tests
will tend to be less effective, because programmers will be more likely to write
traversals that don't preserve bitwise equality when the "updated" version
of a sub data structure is semantically equal to its old version.
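The proposed :?= semantics, and why it keeps later pointer-equality tests effective, can be sketched in Python. The `Info` type and `set_varset` helper below are hypothetical stand-ins for the simplify_info code in the original post, using identity (`is`) in place of `private_builtin.pointer_equal`:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Info:
    varset: object
    other: int

def set_varset(new, info):
    """Conditional field update: reuse the old structure when the new
    field value is pointer-equal to the old one (the :?= behaviour)."""
    if new is info.varset:
        return info                   # no allocation; sharing preserved
    return replace(info, varset=new)  # genuinely new value: allocate

vs = ["x", "y"]
i0 = Info(varset=vs, other=7)
i1 = set_varset(vs, i0)       # "update" with the same field value
print(i1 is i0)               # True: downstream `is` checks stay cheap
i2 = set_varset(["z"], i0)
print(i2 is i0)               # False: a new value really was stored
```

Because `set_varset` returns the original structure unchanged when nothing changed, a later pointer-equality test further up the traversal can still succeed, which is exactly the sharing that unconditional update destroys.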
Post by Paul Bone
Either way, I don't think this should be the normal comparison used by a :?=
operator. But it might make sense that this is available to developers who
want it. Of course they could write it manually, I don't think it would be
used very often relative to your proposal with pointer equality.
In the absence of a convincing use case, I won't implement it.
Post by Paul Bone
Post by Zoltan Somogyi
I intended my original mail specifically to ask you guys to think about
*potential future* uses of conditional update, not just actual, current uses.
I see I should have made that more explicit.
I also thought that part of the intent is to ask what uses may occur in
closed code bases such as Prince.
I was giving you an opportunity to put your two cents in, yes.
Post by Paul Bone
The other developers will know more than
me about this, but AIUI there's less need for this in Prince than the
Mercury compiler. Also sometimes when we manipulate tree-like structures
(the DOM) we do so using mutvars, because the structures have cycles that
are easier to manipulate with mutvars.
I see. This proposal will do nothing to help code using mutvars, but you know that.
Post by Paul Bone
Each additional branch also affects the predictability of other branches; it
may cause them to be evicted from the cache.
Yes. That is one reason why predicting the performance of any piece of code
from first principles (i.e. without measurement) has gone from perfectly possible
if quite tedious in the 1960s, to effectively impossible in the last two or three decades.
Post by Paul Bone
From what I actually know about Boehm GC checking the free lists should be a
few (dependent) memory reads and a well-predicted branch on the fast path.
It's important that they are dependent reads (it does pointer following) so
it may create stalls. Each time a free list is populated it is done so by
lazy scanning of a memory block.
That is sequential access. Modern CPUs have circuits to detect sequential access
patterns, even when the program is following several independent sequential
access patterns at once (e.g. when reading from two vectors to add them),
and they try to prefetch the contents of the next several addresses in each stream
so that it is in the cache (preferably L1) by the time the program asks for it.

In this case, for each of the most popular allocation sizes, such prefetches will likely
pull into the cache the next several words after the current allocation point
in the page at the head of the freelist. So while these reads may be dependent,
they are not that likely to be slow.
Post by Paul Bone
That's been my understanding for a while. But I wasn't aware of Nick's
paper, is this the one? http://dotat.at/tmp/msp02.pdf
The paper I remember was similar but different. I don't know if that means
I was thinking of a different paper, or remembered this one incorrectly.
Or it may have been a post on his blog; he has some good ones.

Zoltan.
Paul Bone
2017-03-09 02:49:55 UTC
Permalink
Post by Zoltan Somogyi
Post by Paul Bone
Post by Zoltan Somogyi
For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what can be possibly justified by the amount of allocation
being saved.
Perhaps I'm wrong but I had assumed that a normal equality test (such as
unification) will first compare pointers, and then do the deep tests only if
the pointers are unequal, and short-circuit as soon as two fields are
unequal. And that it will do this transitively.
That is correct.
Post by Paul Bone
So that the actual /
amortized cost is low or trivial.
Unfortunately, that does not follow. The bitwise equality tests on pointers
may reduce the cost of the equality test from traversing 10 Mb to traversing
(say) 10 Kb, but that cost is still *much* bigger than the benefit you are trying to gain,
which is comparable to the cost of traversing a few tens of bytes. And
in the absence of a conditional update operator, those pointer equality tests
will tend to be less effective, because programmers will be more likely to write
traversals that don't preserve bitwise equality when the "updated" version
of a sub data structure is semantically equal to its old version.
That's about the order of magnitude that I would have guessed. But my guess
is that most structures aren't that big. But this is far too many guesses
already; I'm not prepared to make a bet or any further guesses.
Post by Zoltan Somogyi
Post by Paul Bone
Either way, I don't think this should be the normal comparison used by a :?=
operator. But it might make sense that this is available to developers who
want it. Of course they could write it manually, I don't think it would be
used very often relative to your proposal with pointer equality.
In the absence of a convincing use case, I won't implement it.
It makes sense when some inner part of the traversal uses unconditional
update and most of the time returns semantically identical data. Then one
or more conditional updates in an outer part of the traversal use pointer
equality and allocate memory when they didn't need to, because the pointers
are unequal when the data is equal.

However, if conditional update is as easy as s/:=/:?=/, then that's not
likely: the inner traversal should have used a conditional update.
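Paul's scenario can be illustrated with a toy traversal. A hypothetical inner pass that rebuilds its node unconditionally destroys the sharing that an outer pointer-equality test relies on, while a conditional version preserves it:

```python
def normalise_unconditional(node):
    # Inner pass: always builds a fresh node, even when the result
    # is semantically identical to the input.
    return (max(node[0], 0), node[1])

def normalise_conditional(node):
    # Inner pass with a hand-written conditional update: reuse the old
    # node when the rebuilt one is equal, preserving pointer equality
    # for any outer pass that tests `is` before allocating.
    new = (max(node[0], 0), node[1])
    return node if new == node else new

leaf = (42, "payload")                        # already normalised
print(normalise_unconditional(leaf) is leaf)  # False: outer `is` test now fails
print(normalise_conditional(leaf) is leaf)    # True: sharing preserved
```

This is the case where a cheap one-character s/:=/:?=/ in the inner pass would make the outer pass's pointer-equality test effective again.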
Post by Zoltan Somogyi
Post by Paul Bone
The other developers will know more than
me about this, but AIUI there's less need for this in Prince than the
Mercury compiler. Also sometimes when we manipulate tree-like structures
(the DOM) we do so using mutvars, because the structures have cycles that
are easier to manipulate with mutvars.
I see. This proposal will do nothing to help code using mutvars, but you know that.
Yep, I mention it because it's one way that Prince is unlike Mercury with
respect to this kind of code.
Post by Zoltan Somogyi
Post by Paul Bone
From what I actually know about Boehm GC checking the free lists should be a
few (dependent) memory reads and a well-predicted branch on the fast path.
It's important that they are dependent reads (it does pointer following) so
it may create stalls. Each time a free list is populated it is done so by
lazy scanning of a memory block.
That is sequential access. Modern CPUs have circuits to detect sequential access
patterns, even when the program is following several independent sequential
access patterns at once (e.g. when reading from two vectors to add them),
and they try to prefetch the contents of the next several addresses in each stream
so that it is in the cache (preferably L1) by the time the program asks for it.
In this case, for each of the most popular allocation sizes, such prefetches will likely
pull into the cache the next several words after the current allocation point
in the page at the head of the freelist. So while these reads may be dependent,
they are not that likely to be slow.
That makes sense.
Post by Zoltan Somogyi
Post by Paul Bone
That's been my understanding for a while. But I wasn't aware of Nick's
paper, is this the one? http://dotat.at/tmp/msp02.pdf
The paper I remember was similar but different. I don't know if that means
I was thinking of a different paper, or remembered this one incorrectly.
Or it may have been a post on his blog; he has some good ones.
Cool.
--
Paul Bone
http://paul.bone.id.au