[m-dev.] Array modes

Discussion:

Peter Wang

2014-10-24 05:06:50 UTC

Hi,

What do you think about changing the modes in the array module along the
lines shown in the attached file? Note that we don't need to use
constrained inst vars; they can be replaced by `ground' and `unique'.

The first change is to bring back the bogus insts while (hopefully)
avoiding the problem that caused them to be removed in commit 67bcaf3.
See also
http://www.mercurylang.org/list-archives/developers/2007-February/014642.html

The [intended] modes in the existing array.m don't make sense to me,
e.g.

:- pred lookup(array(T), int, T).
%:- mode lookup(array_ui, in, out) is det.

At the end of the call, the array is not unique in its elements as the
third argument shares a reference with one of its elements.

:- pred set(int, T, array(T), array(T)).
:- mode set(in, in, array_di, array_uo) is det.

Placing the non-unique value into the array makes it not unique in its
elements.

Peter

Zoltan Somogyi

2014-10-24 08:52:26 UTC

Permalink

Post by Peter Wang
The [intended] modes in the existing array.m don't make sense to me,
e.g.
:- pred lookup(array(T), int, T).
%:- mode lookup(array_ui, in, out) is det.
At the end of the call, the array is not unique in its elements as the
third argument shares a reference with one of its elements.
:- pred set(int, T, array(T), array(T)).
:- mode set(in, in, array_di, array_uo) is det.
Placing the non-unique value into the array makes it not unique in its
elements.

In the insts array_{ui,di,uo}, the uniqueness was always only intended
to refer to the memory of the array itself, not to the uniqueness of
the elements. The main concern of the array operations is: do they
have to copy an array if you want a modified version of it, or can they
do the update in place?

Preserving information about the uniqueness of the array elements
was not a design concern at the time, simply because even if you could
preserve that information, the compiler could not exploit it.

Given this view, I don't see anything wrong with the current modes
in array.m. It does seem that the documentation could explain the above
better.

Zoltan.

Mark Brown

2014-10-25 04:35:25 UTC

Permalink

On Fri, Oct 24, 2014 at 7:52 PM, Zoltan Somogyi

Post by Zoltan Somogyi

In the insts array_{ui,di,uo}, the uniqueness was always only intended
to refer to the memory of the array itself, not to the uniqueness of
the elements. The main concern of the array operations is: do they
have to copy an array if you want a modified version of it, or can they
do the update in place?
Preserving information about the uniqueness of the array elements
was not a design concern at the time, simply because even if you could
preserve that information, the compiler could not exploit it.
Given this view, I don't see anything wrong with the current modes
in array.m. It does seem that the documentation could explain the above
better.

The comments here are particularly misleading, I think:

% :- inst uniq_array == uniq_array(unique).
:- inst uniq_array == uniq_array(ground). % XXX work-around

The latter is what is intended; it is not a workaround.

Cheers,
Mark.

Mark Brown

2014-10-25 04:32:10 UTC

Permalink

Hi Peter,

Post by Peter Wang
Hi,
What do you think about changing the modes in the array module along the
lines shown in the attached file? Note that we don't need to use
constrained inst vars; they can be replaced by `ground' and `unique'.

The `unique' versions would be useful for arrays of arrays. For three
dimensions, `uniq_array(unique)' would be what you want, and so on. In
practice you could get away with a small number, but constrained inst
vars would be a neater and more general solution. They would also
allow arrays of predicates to be conveniently used. See further
comments below, though.

BTW, I noticed your recent changes to better support constrained inst
vars (I don't presently subscribe to the reviews list, but I scan the
archives from time to time). The changes looked the right idea to me
(good work!), but I can review it in more detail if anyone is still
interested.

Post by Peter Wang
The first change is to bring back the bogus insts while (hopefully)
avoiding the problem that caused them to be removed in commit 67bcaf3.
See also
http://www.mercurylang.org/list-archives/developers/2007-February/014642.html

I think these "bogus" constructors are really useful for foreign types
that act as containers for Mercury terms, of which arrays are one
example, so I don't agree with disallowing them entirely as is
suggested in that thread. But the objection, which is reasonable, is
that what the insts are saying isn't really true, and the compiler
quite rightfully draws its bad conclusions from the faulty premises it
is given.

So can bogus constructors be made a formal part of the language? The
meaning of them would be that they represent foreign constructors that
can contain zero or more Mercury terms matching each argument inst.
The compiler would need to avoid making assumptions about the
determinism of unifications involving these constructors, but nothing
special would need to be declared - just treat a inst's constructor as
a bogus constructor whenever it doesn't match a constructor from a
visible type. If any of an inst's constructors is a bogus constructor
then they all must be (that would catch typo errors).

If this is practical to implement I think it would be a better
solution to do this, and go back to the original array inst
definition. Also think of a better name than bogus.

Excellent point. Although, as Zoltan explained, the mode is correct,
this interface is simply not suitable for arrays of arrays or other
unique terms, for the reasons you mention. Anybody using it like that
will have to rewrite their code if the array module is changed to use
the proposed modes.

I would suggest adding appropriate support for arrays of arrays as
soon as possible, even though it isn't needed in practice right away.
For example:

% replace(I, NewT, OldT, !Array):
%
% Replace the (possibly unique) element OldT with NewT at index I.
% This can be used to update a unique element without losing the
% uniqueness of the other elements, by temporarily replacing it with
% a dummy value.
%
:- pred replace(int, T, T, array(T), array(T)).
:- mode replace(in, in, out, array_di, array_uo) is det.
:- mode replace(in, di, uo,
uniq_array(unique) >> dead, free >> uniq_array(unique)) is det.

% lookup2(Array, I, J) = Elem:
%
% Lookup an element in a 2D ragged array.
%
:- func lookup2(array(array(T))::in(uniq_array(uniq_array)),
int::in, int::in)
= (T::out) is det.

Similarly for lookup3, etc, if desired. A more general solution again
requires inst vars:

% extract_array(I, Elem, !Array):
%
% Lookup an element which is itself an array. The element is replaced
% with an empty array by the call, so it will generally need
to be set again
% after it is finished with. This can be applied repeatedly to
look up arrays
% of any dimension.
%
:- pred extract_array(int, array(T), array(array(T)), array(array(T))).
:- mode extract_array(in, free >> uniq_array(I),
uniq_array(uniq_array(I)) >> dead, free >>
uniq_array(uniq_array(I))) is det.

Post by Peter Wang
:- pred set(int, T, array(T), array(T)).
:- mode set(in, in, array_di, array_uo) is det.

This would need more modes but could otherwise stay the same.

Talking of inst 'unique', it seems to be confusing whether this refers
a term whose nodes are all unique, or a term whose top node is unique
and all other nodes are ground. The reference manual says:

Mercury also provides “unique” insts ‘unique’ and ‘unique(…)’ which are like
‘ground’ and ‘bound(…)’ respectively, except that they carry the additional
constraint that there can only be one reference to the corresponding value.

I read this as saying that only the top node is unique. But it used to
be commonly understood as meaning that all nodes are unique, and that
is what is implemented in compiler/inst_match.m. Which is the correct
meaning of 'unique'?

I think it ought to mean just that the top node is unique. The
implementation change could be something as simple as the diff below
(not fully tested). The only purpose I can think of for the other
meaning of `unique' is as the mode for unsafe_promise_unique, but
maybe that's another thing better done with inst vars anyway:

:- pred unsafe_promise_inst(T::in, T::out(I)) is det.

Opinions?

Cheers,
Mark.

diff --git a/compiler/inst_match.m b/compiler/inst_match.m
index 8c551cc..075d464 100644
--- a/compiler/inst_match.m
+++ b/compiler/inst_match.m
@@ -700,9 +700,7 @@ inst_matches_initial_4(InstA, InstB, MaybeType, !Info) :-
InstB = ground(UniqB, none),
compare_uniqueness(!.Info ^ imi_uniqueness_comparison, UniqA, UniqB),
inst_results_bound_inst_list_is_ground_mt(InstResultsA, BoundInstsA,
- MaybeType, !.Info ^ imi_module_info),
- compare_bound_inst_list_uniq(!.Info ^ imi_uniqueness_comparison,
- BoundInstsA, UniqB, !.Info ^ imi_module_info)
+ MaybeType, !.Info ^ imi_module_info)
;
InstA = bound(Uniq, InstResultsA, BoundInstsA),
InstB = abstract_inst(_,_),
@@ -734,7 +732,7 @@ inst_matches_initial_4(InstA, InstB, MaybeType, !Info) :-
compare_uniqueness(!.Info ^ imi_uniqueness_comparison, UniqA, UniqB),
bound_inst_list_is_complete_for_type(set.init,
!.Info ^ imi_module_info, BoundInstsB, Type),
- ground_matches_initial_bound_inst_list(UniqA, BoundInstsB, yes(Type),
+ ground_matches_initial_bound_inst_list(shared, BoundInstsB, yes(Type),
!Info)
;
InstA = ground(UniqA, HOInstInfoA),
@@ -1168,9 +1166,7 @@ inst_matches_final_3(InstA, InstB, MaybeType, !Info) :-
InstB = ground(UniqB, none),
unique_matches_final(UniqA, UniqB),
inst_results_bound_inst_list_is_ground_mt(InstResultsA, BoundInstsA,
- MaybeType, !.Info ^ imi_module_info),
- bound_inst_list_matches_uniq(BoundInstsA, UniqB,
- !.Info ^ imi_module_info)
+ MaybeType, !.Info ^ imi_module_info)
;
InstA = ground(UniqA, HOInstInfoA),
InstB = any(UniqB, HOInstInfoB),
@@ -1184,7 +1180,7 @@ inst_matches_final_3(InstA, InstB, MaybeType, !Info) :-
unique_matches_final(UniqA, UniqB),
inst_results_bound_inst_list_is_ground_mt(InstResultsB, BoundInstsB,
MaybeType, ModuleInfo),
- uniq_matches_bound_inst_list(UniqA, BoundInstsB, ModuleInfo),
+ uniq_matches_bound_inst_list(shared, BoundInstsB, ModuleInfo),
(
MaybeType = yes(Type),
% We can only do this check if the type is known.

Zoltan Somogyi

2014-10-25 05:20:52 UTC

Permalink

Post by Mark Brown
I think these "bogus" constructors are really useful for foreign types
that act as containers for Mercury terms, of which arrays are one
example, so I don't agree with disallowing them entirely as is
suggested in that thread.

I have long thought that having insts that are not bound to a specific type
is a bad idea. I would even support a change in the language to require
each inst definition to say what type it is for.

Post by Mark Brown
So can bogus constructors be made a formal part of the language?

I would need to see a more compelling use case than this.

Post by Mark Brown
Excellent point. Although, as Zoltan explained, the mode is correct,
this interface is simply not suitable for arrays of arrays or other
unique terms, for the reasons you mention.

Arrays of unique terms and N-dimensional arrays for N>1 may look like
they have similar requirements, but they don't. For one thing, operations
on array slices apply only to the latter.

There are several languages out there that have only 1D arrays,
and implement 2D, 3D etc arrays as nested 1D arrays. Depending
on the language, these usually have problems such as

- dictating implementation choices (e.g. dope vectors), which often
turn out to be unsuited to high performance on modern hardware,

- posing semantic problems, such as an inability to ensure that
different rows have the same number of columns.

Post by Mark Brown
I would suggest adding appropriate support for arrays of arrays as
soon as possible, even though it isn't needed in practice right away.

We already have array2d.m in the library. If it is missing some
functionality you need, adding it to array2d.m to should be simpler
than using nested 1D arrays. Similarly, I wouldn't object to adding
array3d.m., if needed.

Post by Mark Brown
Talking of inst 'unique', it seems to be confusing whether this refers
a term whose nodes are all unique, or a term whose top node is unique
Mercury also provides “unique” insts ‘unique’ and ‘unique(…)’ which are like
‘ground’ and ‘bound(…)’ respectively, except that they carry the additional
constraint that there can only be one reference to the corresponding value.
I read this as saying that only the top node is unique. But it used to
be commonly understood as meaning that all nodes are unique, and that
is what is implemented in compiler/inst_match.m. Which is the correct
meaning of 'unique'?
I think it ought to mean just that the top node is unique.

You can control what goes into the manual, but you cannot control
how people speak. Even the same person will often use "unique"
to mean different things in different contexts.

In the original research on mode systems that inspired Mercury,
I talked about types representing AND-OR trees: types (the OR nodes)
representing choices between function symbols, and function symbols
(the AND nodes) having several arguments. Each mode says,
which each OR node: is it the caller or the callee that makes the
choice here?

Uniqueness can be thought of the same way: it is a choice that
can be made separately for each OR node. The choice is: is the memory
occupied by the arguments of the chosen function symbol referred to
through more than one pointer? Or, in the presence of alias tracking,
can the compiler can see all the pointers?

If the type tree you are talking about has only one OR node,
then the notions of "all OR nodes are unique" and "this OR node
is unique" coincide. If it is has two or more OR nodes, they don't.

Zoltan.

Julien Fischer

2014-10-25 05:28:27 UTC

Permalink

Hi,

Post by Zoltan Somogyi

I have long thought that having insts that are not bound to a specific type
is a bad idea. I would even support a change in the language to require
each inst definition to say what type it is for.

As would I. I don't think it's going to be possible to ever clear up
<https://www.mercurylang.org/bugs/view.php?id=89> without such a change.

Cheers,
Julien.

Mark Brown

2014-10-25 07:26:47 UTC

Permalink

Post by Julien Fischer
Hi,

Post by Zoltan Somogyi

I have long thought that having insts that are not bound to a specific type
is a bad idea. I would even support a change in the language to require
each inst definition to say what type it is for.

That's fine. In this case the inst would be bound to the foreign type
array/1, and be immediately recognised as a "bogus" constructor.

Post by Julien Fischer
As would I. I don't think it's going to be possible to ever clear up
<https://www.mercurylang.org/bugs/view.php?id=89> without such a change.

Foreign types never have type_infos inserted anywhere. So you won't
hit this bug if there is an inst for a foreign type.

Cheers,
Mark.

Peter Wang

2014-10-27 01:05:01 UTC

Permalink

Post by Mark Brown

Hi,

Post by Zoltan Somogyi

I have long thought that having insts that are not bound to a specific type
is a bad idea. I would even support a change in the language to require
each inst definition to say what type it is for.

That's fine. In this case the inst would be bound to the foreign type
array/1, and be immediately recognised as a "bogus" constructor.

I agree with requiring types on insts as well.

An interesting use for bogus insts may be to track the state of
foreign type handles, e.g.

:- type db.
:- pragma foreign_type("C", db, "DB *").

:- type db_state
---> open
; txn.

:- pred open(db::out(unique(open))) is det.
:- pred begin_transaction(db::di(unique(open)), db::out(unique(txn)))
is det.
:- pred write(db::di(unique(txn)), db::out(unique(txn))) is det.
:- pred end_transaction(db::di(unique(txn)), db::out(unique(open)))
is det.
:- pred close(db::di(unique(open))) is det.

With nested uniqueness you could wrap the handle with a Mercury type
instead.

Peter

Julien Fischer

2014-10-27 12:10:19 UTC

Permalink

Hi Mark,

Post by Mark Brown

Post by Julien Fischer
Hi,

Post by Zoltan Somogyi

I have long thought that having insts that are not bound to a specific type
is a bad idea. I would even support a change in the language to require
each inst definition to say what type it is for.

That's fine. In this case the inst would be bound to the foreign type
array/1, and be immediately recognised as a "bogus" constructor.

Post by Julien Fischer
As would I. I don't think it's going to be possible to ever clear up
<https://www.mercurylang.org/bugs/view.php?id=89> without such a change.

Foreign types never have type_infos inserted anywhere. So you won't
hit this bug if there is an inst for a foreign type.

I know. I was just mentioning this as another reason for requiring
insts to be typed.

Cheers,
Julien.

Mark Brown

2014-10-25 07:06:10 UTC

Permalink

On Sat, Oct 25, 2014 at 4:20 PM, Zoltan Somogyi

Post by Zoltan Somogyi

I have long thought that having insts that are not bound to a specific type
is a bad idea. I would even support a change in the language to require
each inst definition to say what type it is for.

Post by Mark Brown
So can bogus constructors be made a formal part of the language?

I would need to see a more compelling use case than this.

I can't think of anything more compelling than arrays of arrays (i.e.,
ragged arrays).

Post by Zoltan Somogyi

Post by Mark Brown
I would suggest adding appropriate support for arrays of arrays as
soon as possible, even though it isn't needed in practice right away.

Those are rectangular arrays. I don't think ragged arrays belong there.

Post by Zoltan Somogyi

You can control what goes into the manual, but you cannot control
how people speak. Even the same person will often use "unique"
to mean different things in different contexts.
In the original research on mode systems that inspired Mercury,
I talked about types representing AND-OR trees: types (the OR nodes)
representing choices between function symbols, and function symbols
(the AND nodes) having several arguments. Each mode says,
which each OR node: is it the caller or the callee that makes the
choice here?
Uniqueness can be thought of the same way: it is a choice that
can be made separately for each OR node. The choice is: is the memory
occupied by the arguments of the chosen function symbol referred to
through more than one pointer? Or, in the presence of alias tracking,
can the compiler can see all the pointers?
If the type tree you are talking about has only one OR node,
then the notions of "all OR nodes are unique" and "this OR node
is unique" coincide. If it is has two or more OR nodes, they don't.

Yes, I think I understand. When I said "node" above, I was referring
to OR nodes in this sense. For the types io and store, there is only
one OR node, so my question would make no difference. Likewise, with
arrays a black box as far as the mode system is concerned, the
question doesn't make any difference either.

But for Peter's proposal it does make a difference, in particular with
the meaning of the I =< unique constraint. So what I'm asking is, is
it true (according to the intent of the reference manual) that
uniq_array(ground) =< unique? Yes if 'unique' means only the top node
is unique, no if it means all nodes are unique.

Cheers,
Mark.

Peter Wang

2014-10-25 08:33:13 UTC

Permalink

Post by Mark Brown
But for Peter's proposal it does make a difference, in particular with
the meaning of the I =< unique constraint. So what I'm asking is, is
it true (according to the intent of the reference manual) that
uniq_array(ground) =< unique? Yes if 'unique' means only the top node
is unique, no if it means all nodes are unique.

I understood `unique' to mean unique at every node, by the analogy with
`ground' meaning bound at every node. Page 59 of dmo's thesis supports
this; at least, I can't read it any other way.

If that's so, I agree that uniq_array(ground) =< unique is false,
and the constraint I =< unique is overly restrictive.

Peter

Zoltan Somogyi

2014-10-25 09:24:50 UTC

Permalink

Post by Peter Wang
I understood `unique' to mean unique at every node, by the analogy with
`ground' meaning bound at every node. Page 59 of dmo's thesis supports
this; at least, I can't read it any other way.

People who want to know what "unique" means won't be looking
in a thesis.

The problem is that I don't think the original question has
a clearcut answer either way. Because we don't have alias
tracking, uniqueness has only really been useful for values
of types which have only one OR node. For these, as I said,
the two notions of uniqueness coincide, and there simply
hasn't been a need to distinguish between them. The only
exception to this that I can think of right now is the work
on ctgc, which manipulates insts it discovers and creates
automatically, not insts written by a programmer. The
mapping between compiler-internal insts and how insts
appear in source code therefore isn't very relevant
there either.

Zoltan.

Peter Wang

2014-10-27 00:09:27 UTC

Permalink

Post by Zoltan Somogyi

People who want to know what "unique" means won't be looking
in a thesis.

It should be stated in the reference manual.

Post by Zoltan Somogyi
The problem is that I don't think the original question has
a clearcut answer either way. Because we don't have alias
tracking, uniqueness has only really been useful for values
of types which have only one OR node. For these, as I said,
the two notions of uniqueness coincide, and there simply
hasn't been a need to distinguish between them. The only
exception to this that I can think of right now is the work
on ctgc, which manipulates insts it discovers and creates
automatically, not insts written by a programmer. The
mapping between compiler-internal insts and how insts
appear in source code therefore isn't very relevant
there either.

I don't know how relevant these are either, then. Nonetheless, it
answers Mark's question about what is the intended meaning of unique:

http://www.mercurylang.org/list-archives/users/2002-May/001749.html
http://www.mercurylang.org/list-archives/users/2005-September/003264.html

Peter