winterkoninkje: shadowcrane (clean) (Default)

There's a conceptual problem that's been plaguing me with Eng. For a first writing on the language, this isn't the area I would normally choose. But the most logical starting points — e.g. lexical aliasing, code metastructure — I've a pretty good handle on. Those are topics I will delve into in the future, but for now, this is my problem.

One of the design considerations for Eng is to be able to provide crossplatform natively parallel code. [1] When I say this, one should bear in mind that this goal alone is as large or larger than all the other goals for the language combined. There's a specific facet to a specific part of my attempt to do this which is causing me troubles, but before that, some history.

Back in the 1980s parallelism was very much in vogue. Then, come the late-1980s/early-1990s when microprocessor speeds started bounding through the roof, everyone decided parallelism was stupid and abandoned most research on it. (Though of course, there are always a few diehards for any technology.) These days, as we're starting to reach the limits of current fabrication technologies, and with the shape of the curve between speed and cost and how it's evolved over time, the field of parallelism is starting to simmer again with a passion fit to boil over the coming decade.

To see this, just take a look at the current market: basic home computers are shipping with 2~4 processors, many of those are dual- or (will soon be) quad-core processors, or have hyperthreading. And if that home market doesn't convince you, then take a look at console game platforms like the xbox360 or the ps3 (or their previous counterparts). And this doesn't even get into the server market which is growing ever faster as computers become persistantly more ubiquitous tools for every endeavor and every business. Nor does it get into distributed systems like the microchip in your keyboard and mouse. Let alone the scientific markets for weather modelling, VLSI, and AI — i.e. those diehards who never left.

...

Now that you're convinced that a natively-parallel language is essential, the first part of my problem. There are two basic varieties of data in the universe: scalar data, and aggregate data; though the lines between them may not always be clear (cf. records or objects). We can clear that line up a bit if we discount records, or call them scalars, and only consider serialized aggregate data (e.g. arrays, matricies, etc). For this discussion we'll ignore the differences between different kinds of scalars, and just consider them unary widgets which can be put into serialized aggregate forms.

For reasons immediately apparent to any who've worked on SIMD systems, and perhaps even to those who've worked sufficiently on highly data-parallel programs on conventional SISD (aka non-parallel) systems, the normal presentation of arrays and other serialized aggregations is woefully simplistic and insufficient for dealing with massively data-parallel programs, and even for many other programs. Furthermore, that some languages have separate datatypes/syntaxes for first-order aggregations (arrays, vectors, etc) and second-order ones (matricies, 2D arrays, etc) is a kludge at best.

Luckily, there's a readily conceivable solution, though I don't know that anyone else has come up with it. The reasons built-in matrices, or the equally infamous arrays-of-arrays, and their ilk seem so kludgey is because as we look at them we can see that they're just an extended case of our basic dependable array. So we ask ourselves, why then do we treat them so differently? The ready solution is: we don't.

I've already started using some terminology which does this. In mathematics there's a concept called a tensor which is a generalization and abstraction of matrices and the like. A tensor is a multi-dimensional "thing" where the minimum number of dimensions needed to describe it is it's order (or rank). So when we're talking about arrays or vectors, we can also describe them as first-order tensors. And matrices are just second-order tensors. We can of course extend this upwards for 3rd-order tensors (cubes), 4th-order tensors (hypercubes), all the way up to infinity — though it becomes hard to visualize after four orders or so. We can also extend this downwards: a scalar, is simply a zeroth-order tensor.

In Eng, I propose that the built-in serialized datastructure is not an array nor a matrix, but rather that it is a tensor in all its abstraction. Which of course means, that we have no scalars, since scalars are just one extremum of a tensor. Now, maybe we don't want to call them tensors — maybe another name which doesn't bring to mind the horrors of linear algebra would be desirable — but the idea is the same, whatever you call it. For a fuller argument on the complexities of having tensors as your basic data type, what the benefits are, and why I've been motivated to head that direction, I will write another essay at some point.

Some more terminology is helpful for talking about tensors. The order of a tensor can also be described as the count of its axes. And every axis has a length or dimension describing how many elements it has along that axis. [2] The shape of a tensor is a full description of it's axes and each of their dimensions; this is helpful because we can have two kth-order tensors which still have a different shape (i.e. arrays can be of different lengths, matrices can be square or long or wide, etc). When performing operations on tensors their shapes must be compatible, not necessarily the same shape but shapes which "fit" together for the operation in question. For an example of different but compatible shapes, consider matrix multiplication which takes an AxB input and a BxA input.

...

Another topic I'll need to justify the importance of in a separate essay is, there's an irony in making a natively-parallel mid-level language. The idea behind MLLs is to minimize the abstraction in syntax (when compared to HLLs; they provide plenty of abstraction when compared to LLLs). And yet, say we have an array of elements that we want to perform some function on. With a conventional MLL we'd write a for-loop and iterate over all the elements. The problem is for parallel programs, the compiler has no way to look at that loop and determine that those operations could be done in parallel. And so, for parallelism, it's helpful to abstract the parallelism into the syntax.

In APL there's the notion of "verbs" and "adverbs" [3]. Verbs include functions, methods, and operators — they're things you do, operations to perform. Adverbs instead are things which affect how you perform those operations. I like the notion of adverbs because they allow for some very sophisticated code (cf. APL's generalized dot-product), and so I'd like to have them in some form or another. [4]

For an example, in most languages addition is a binary infix operator, so it takes arguments like a + b. If we want to get the summation of a list, we could use more than one addition as so: a + b +...+ z. Unfortunately this isn't always possible and seems over-redundant. In a language with adverbs we could use something like a "list-valence adverb" to modify the addition operator so that it takes a list of arguments instead of just two. So we could change the example to something like: +\ a, b, ..., z. [5]

Another nice thing that this sort of list-valence adverb provides is an abstraction for parallelism. It's easy for the compiler to look at one operation taking a large number of arguments and to decide that this should be done in parallel. With many individual calls where associativity and precedence and the like come into play, it's much harder.

...

An aside so the quandry presented later on will make sense. Now that we have these tensors, we need some way to extract elements from them to perform actions on. We'd have to be able to pull out individual elements, as a trivial requirement. But it'd also be nice to be able to pull out "slices" of data as well. For example, say we have a 3x3x3 cube of elements (that is, a third-order tensor with all axes having a dimension of 3). Perhaps this represents the contents of a room. Now, say we want to find out something about the room based on height, e.g. to discover that there are more objects in the lower portion of the room than the upper. It'd be helpful to be able to slice that cube up by its height into a number of "stacked plates" [6] so that we can find the number of objects in each "plate" to compare them. Naturally, each "plate" can be manipulated in parallel so it'd be nice to return a list of them to operate on at once, rather than needing a loop to iterate over them. Additionally, it'd be nice not to need to specify anything about any of the other dimensions than the one we're slicing along, perhaps trivial for our cube, but what if we were working with a 256th-order tensor?

Here's some working syntax I've been using, though like any working syntax the exact symbols could be changed to something better later on. So first we have the tensor in it's entirety, which could be referred to with A. To specify things about the dimensions of the tensor we use a suffix like [pos:axis] where pos is the position along the axis, and axis specifies which axis we're referring to. For example, we could get a corner element from A by typing A[0:0][0:1][0:2]. Since most tensors will be of low order and their lowest axes will be accessed the most frequently, the :axis portion can be optional with the implicit dimension being 0 for the first one, then 1 for the second, 2 for the third, etc; thus simplifying that corner element to A[0][0][0]. [7]

So far so good. Now, what about our example of wanting to get a stack of plates from a cube (that is, a list of second-order tensors corresponding to the third-order tensor). Well, what if we use the normal syntax for specifying details about an axis, but leave off the position, as in A[:0]? That seems intuitive enough. Now, if we want just one of those slices instead of all of them we could use A[0] (or A[0:0]). Note the difference between this and the fully-qualified A[0][0][0]; the first returns a 2nd-order tensor, whereas the latter returns a 0th-order tensor; we could also do A[0][0] to return a 1st-order tensor. And if we wanted more than one slice but not all of them, we could use A[0,1] (or A[0,1:0]) to return the first and the second (but discard the third).

For convenience of programming, we can also take a note from Perl. In Perl, since the length of arrays is known, we can use negative indices to count from the end of a dimension of a tensor. So -1 would be the last element, -2 the next to last, etc. For statically allocated tensors this ability is trivial to add, but even for dynamically allocated ones it should not be too difficult to implement.

...

Enter my quandry. So we have this list-valence adverb, \, which changes our binary infix operator for addition, + and turns it into an n-ary prefix operator for the sum of a list, +\. [8] And we have this notion of slices which returns a list of lower order tensors. So, if we had an array of numbers and wanted their sum, we could do something like +\ A[:0]. Or if we had a 2nd-order tensor and wanted to get a 1st-order tensor with the sums of the columns, we could do +\ A[:1] with the addition operator performing "straight-through" addition. (More on the issue of dot-products and tensor-products in a later essay.)

Now, at a first glance, a list isn't anything different than an array, and so shouldn't be any different than an axis in a tensor. Conceptually that's a very elegant idea, and in my minimalistic mindset I like it. However, there are some problems it raises. We introduced the list-valence adverb to simplify getting the sum (or product, or...) of a number of elements, both to improve parallelism and optimization, but also for easing clarity and code maintenance. So what if we wanted to do that column sum for a large number of tensors? We can't just do +\ A[:1], B[:1],... since that would give us the sum of the whole list (that is, lists are flat). We could use a sequence of +\ A[:1], +\ B[:1], etc or put it in a loop, but that's disgustingly kludgey. Maybe we could introduce a new slicewise adverb so we could do +:1 A, B,... to return a list of the 1st-order tensors.

But if a list is the same thing as a tensor's dimension, then the list-valence adverb is really just turns a binary operation which takes kth-order tensors into a unary operator which takes a single (k+1)th-order tensor with the new axis having a dimension equal to the length of the list. If that's the case, then the list-valence adverb is nearly identical to this new adverb. But that also throws off which number we use to refer to which axis, and what number do we use for the list axis? Zero and positive numbers are taken for indices, and the negatives are used for the convenient count-from-the-end indices so we can't use them either.

They both look like reduction adverbs. List-valence takes a list of kth-order tensors (or a (k+1)th-order tensor), and returns a single kth-order tensor. This new slicewise adverb takes a list of kth-order tensors (or one (k+1)th-order), and returns a list of (k-1)th-order tensors (or a kth-order tensor adding a new axis with dimenstion equal to the length of the input list, and removing the axis specified [9]). Whether we consider a list the same as an axis or not, there's a subtle distinction there, and I'm not sure whether it's relevant or not.

Of course, if we consider a list and an axis the same, then there's also potential for confusion with the list-valence adverb. What does +\ A mean? Is it the straight-through summation of a list of tensors (of which there happens to only be one and so the output is A), or does it slice A by it's first axis? This confusion is largely predicated on whether we consider the commata delimiting a list to be a constructor for this new larger tensor or not.

And then there's always the issue that in a tensor, the dimension of an axis is the same regardless of the position on other axes, i.e. any way we slice a tensor we'll get a list of lower order tensors each of which has the same shape (though shapes may differ between different lists). While this is handy for many purposes it's inconvenient for others, such as times when we may need a list of tensors with different shapes.

I suppose, in the end, it's probably not an issue where there's a "right" and a "wrong" answer. But rather I'll just have to pick one or the other — either a list is an axis, or it is not — and stick to it. Having talked through the complications raised by the question, namely the last one about lists of different shaped tensors, I'm thinking it's probably best to treat them as separate. But just as we have the slicing mechanism to convert an axis to a list, we should have some construction mechanism for turning a list into an axis.

[1] Specifically for "typical" home machines and servers rather than scientific machines, but good solutions to the former will doubtless lead to good solutions for the latter, even if they are not the same, as scientific computers are often a "simpler" form of parallelism than home machines, though there's much more of it.

[2] Of course it's possible to think of continuous tensors rather than discrete ones, in which case the axis' dimension would be a description of the range of values that exist along that axis. And we can have non-numerical or unordered axes as well, axes whose positions are names rather than integers.

[3] That's J terminology. In APL they're called "functions" and "operators" respectively, an unfortunate choice of words since it confuses those familiar with the conventional definitions, and doesn't distinguish user-defined functions like foo() and built-in 'functions' like +. Because of this confusion, J which is a modern successor to APL changed the jargon, and I'll use theirs to minimize the confusion.

[4] Aficionados of functional programming languages may point out that these sound simply like higher-order functions. And, I suppose, in a sense they are, and that may be a good way to implement the functionality. However I think that just as it's helpful to distinguish user-defined functions and operators, it's helpful to distinguish user-defined higher-order functions and adverbs.

[5] We could also just change the addition operator to be a prefix operator which takes a list, as it is in Lisp. However, there's something about the infix form which seems more intuitive and natural, and so it'd be nice to keep it around if possible.

[6] That is — depending on how we have the tensor organized — slicing along the first axis. We could also slice along the other axes as well. Since the original tensor was a cube, slicing along any axis will create a list of the same length comprised of 2nd-order tensors of the same shape (square). However, as we visualize this, it's apparent that there's a subtle difference between the items in these lists — namely, their orientation. Capturing this will require that we conceptualize all kth-order tensors as existing in an infinite-order universe within which they can be rotated. Yet again, the details of this are fodder for another essay.

[7] This simple explanation could also be extrapolated. So say we have that 256th-order tensor and we want to specify the 70th through 80th axes. We could use something like A[0:70][0]...[0], where the lack of :axis means use the previous axis plus one, or use the 0th axis if there are no previously specified axes.

[8] We could do similar things with -\, *\, or even some user-defined function foo\.

[9] Or possibly it returns a list of kth-order tensors or a single (k+1)th-order tensor, with the specified axis having a dimension of 1, to preserve sanity of axis numbering.

Date: 2006-04-05 04:55 pm (UTC)From: [identity profile] silmaril.livejournal.com
I sort of stopped understanding this when you started talking about verbs and adverbs---language design is so not my thing, I have trouble coding in C---but it is interesting stuff nonetheless. I hope you'll keep writing these here.

Date: 2006-04-06 08:53 am (UTC)From: [identity profile] winterkoninkje.livejournal.com
The functional programming explanation, which may make a little more sense than the quasi-linguistic one, is that an adverb such as "\" is a function which takes a function as its first argument and a list of operands as its second argument. So the pseudocode (sans parallelism) would be something like this:

define function "\" taking function "f", list "l"
    temp = identity(f) /* where the identity() function returns the identity of f() */
    for each i in elements of l
        temp = f(temp,i)
    end for
end define


Of course, for "\" to work right as described above, the function argument must be one which takes two operands. We might be able to broaden the definition to take more complex functions, but that's the basic idea.

Date: 2006-04-06 08:05 pm (UTC)From: [identity profile] winterkoninkje.livejournal.com
I'm glad you liked it. (As I mentioned in the water-testing post) This is one of the two main topics I'm considering for my thesis, and even if I choose the other one I'll likely do this as a major side project, so more ramblings are guaranteed :)

I'd sort of prefer to do the other topic since it'd serve me better insofar as a doctorate (since the other topic is linguistics based and concerns a theory I'm quite interested in), but both are interesting, and frankly either could be expanded to a doctorate-level topic. I haven't much background in ai yet so I can't start prepping for the other topic, though I'll be taking a basic (grad) course on it in the fall from the person I'd be working with if I did do it.

I'm thinking I'll start work on getting Eng implemented over the summer, at least the more basic front-end stuff. I'll probably try to get a working non-parallel version of the language (with handles for future parallelism) in place before diving into implementing the parallel aspects.

I forget, you do more of the ee side of things right?

Date: 2006-04-06 08:14 pm (UTC)From: [identity profile] silmaril.livejournal.com
If by "more" you mean "exclusively," yes. I only program when I need to model things and simulate physical processes, and I use the highest level language that I can---for a long time I've been writing exclusively in MATLAB, for instance. So this is all a bit alien to me. But it's interesting alien.

Date: 2006-04-06 08:20 pm (UTC)From: [identity profile] winterkoninkje.livejournal.com
If by "more" you mean "exclusively," yes.

Heh :)

I seem to recall hearing about some programming languages (Verilog (http://en.wikipedia.org/wiki/Verilog)?) which when compiled produce circuits as their output rather than machine code. Have you ever messed with any of those?

Date: 2006-04-05 06:12 pm (UTC)From: [identity profile] jes5199.livejournal.com
I think the spacial visualization may be screwing you.
but i don't want to get into that right now.
and personally, i think hash-like tensors are more useful than array-like tensors

also, talking about + muddies things. + is a special case: its parameters are symmetric, and that is not the norm.

i'd be happy to have + be a special list constructor, so that
A + B + C + D
is sugar for
sum(A, B, C, D)
but let's keep going
R = (A, B, C, D)
sum(R)
well? at this point
R[0] == A
sure looks true. i think that the list-constructor is secretly creating a new 1st axis.
leading to:
R[0:1] == (A[0:0], B[0:0], C[0:0], D[0,0])
which isn't terribly crazy
uh
oh yeah, so sum() seems to pop axis :0, and returns a (k-1)th order tensor

Date: 2006-04-05 06:13 pm (UTC)From: [identity profile] jes5199.livejournal.com
"axis stack" was the concept i didn't quite express.

Date: 2006-04-06 10:00 am (UTC)From: [identity profile] winterkoninkje.livejournal.com
I think the spacial visualization may be screwing you.
but i don't want to get into that right now.


How so? I was mainly using the visualization for folks who aren't familiar with the concept so it's easier to grasp.

and personally, i think hash-like tensors are more useful than array-like tensors

I agree. And there's a footnote later on about that. The only point where they become different in a meaningful way is when you start allowing mutable constructs though.

i'd be happy to have + be a special list constructor

I'm not sure that I'd be. From a functional perspective it'd make sense, but I'm not sure how well that'd work for a MLL. Basically, it'd be nice to avoid (functional style) lists as they have high overhead compared to known-memory addressing. Not to say that lists don't have their purposes, just that for a language with any hope of overthrowing C/C++, they can't be relied upon by the language.

In the back end of things, after compilation A + B is identical to sum(A, B), the "+" is just a special symbolic notation for us wee minded programmers. Of course the underlying summation function could conceivably take more than two operands (though the two arg version converts more nicely to SISD machine code). The problem isn't with the back end notation so much as what the coder has to write. Having something like A + B +... implicitly turned into +\ A, B,... by the compiler isn't a bad idea[1], but not having a source representation of the summation of a list makes it difficult to capture the idea "sum everything in this array" or "sum the first column" without enumerating all of them, either literally, or with a for loop; neither of which is desirable. So having the "\" allows the source code to signal to the parser that there's a change from the normal infix syntax, even though the underlying meaning in the AST is the same. (And since the parser/compiler deals with it is part of the reason I think it should be distinguished from a higher-order function.)

I agree with your point about the fact that it looks like the list constructor is secretly creating a new axis which the list summation is then destroying. That's the observation that got me into this whole quandary in the first place :) The question is whether they are just conceptually the same (serialized aggregation), or whether they are in fact identical. I think the fact that we'd want lists of elements with different shapes seems to imply that they're different (or that the "list" is a 1st-order tensor containing the other tensors, or references to them to make them the same size for known-memory purposes), as does the issue of maintaining axis numbering[2].

Of course this gets into another issue, namely embedding tensors within others. In Perl, which has no higher order constructs, we can write $A[0][13]{'foo'}[6]{'bar'} just fine because we know that each step along the line we're only dealing with one thing (grab the 0 of an array. grab the 13 of an array. grab the foo of a hash. ...). But what if we have, say, a bunch of 1st-order tensors as the elements of a 2nd-order tensor. Code like A[0][0] is ambiguous. Is it 1st-order tensor of the corner element, or is it all of the first elements of the tensors in the first axis?[3] We could clear that up by having A[0]->[0] or the like, but then that exposes the underlying representation to the caller in a way that I'm not entirely comfortable with: what if we decided to replace that construct with a simple 3rd-order tensor? Then we'd have to go and change all that code.

Date: 2006-04-06 10:00 am (UTC)From: [identity profile] winterkoninkje.livejournal.com
[1] One which'd take some work to implement appropriately, but not an insurmountable one I don't think. Of course, that'd put a lot of burden on the compiler to know which functions/operators have properties like associativity, transitivity, commutativity[4], and the like. Otherwise it'd be hard to turn A - B + C - D + E -... into +\ (A, C, E,...) + -\ (B, D,...).

[2] To which the numbering just provides a label. Much like having hash-like tensors, the fact that we use numbers to name the different axes (or positions along them) isn't terribly important; we could have our cube tensor have axes with names like "length", "width", and "height" instead of "0", "1", and "2". Provided that tensors are nonmutable, the programming convenience would be compiled away anyways so there wouldn't even be the overhead of a hash function. Of course, mutability is nice, though it may be something relegated to core or standard libraries instead of being built-in. But more on that when I talk about basic types and the hyperminimalism associated with them.

[3] The problem here is that the names of the axes in the "parent" and "child" tensors have some overlap. And whether we use hash-like or array-like naming of axes, there's no easy way around that. One simple rule would be that you can't embed a tensor in another tensor directly, i.e. the whole memory chunk as in a C array of struct literals. You could only do that if all the tensors were the same size (though not necessarily the same shape) anyways, so I don't think there's much loss. This would force you to store references or some other simple type as the elements of a tensor. So it may be that the A[0]->[0] syntax is unavoidable and should just be considered poor style. Of course that also gets into the notion (if not the actuality) of parallel references, which have been shied away from in all the parallel languages I'm familiar with. (But foo on them ;)

[4] As for commutativity, I had another idea I'm not sure whether I'll use about the notion of having an unordered aggregation (i.e. a set, or more accurately a bag). It'd be serialized (i.e. ordered) after compilation, but for the source-level code it would capture a subtlety that few languages do. If such a thing exists, then we could just define a commutative operation as taking an unordered list of two arguments. This becomes especially nice for issues of OOP where in languages like C++ you're forced to define every possible permutation of your class and others (e.g. sum(A,A); sum(A,int); sum(int,A);). Of course, for that particular issue, we could also use lexical aliasing to be rid of it. I've yet to decide whether this unordered list is a good idea or not. I'd need to know more about what it'd take to implement and the complications it raises/

Date: 2006-04-06 11:52 am (UTC)From: [identity profile] winterkoninkje.livejournal.com
Doh, that should've been +\ (A, C, E,...) + -\ (0, B, D,...) since -\ A, B... == A - +\ B...

Date: 2006-04-06 09:28 pm (UTC)From: [identity profile] jes5199.livejournal.com
kudos on [4] ... good luck with the syntax.

Date: 2006-04-06 09:27 pm (UTC)From: [identity profile] jes5199.livejournal.com
it's pretty clear that if you're going to have a REFERENCE as a scalar inside a tensor's cell that you cannot use the perl-style psuedo-multidimensional kludge. of course, it isn't immediately clear to me why you'd want to do that: a subtensor can be represented by a tensor whose dimensions are orthagonal to the supertensor.

Date: 2006-04-07 11:44 pm (UTC)From: [identity profile] winterkoninkje.livejournal.com
of course, it isn't immediately clear to me why you'd want to do that:

Largely in case the user wants the subtensor's axes to be of different dimensions, e.g. if A[0][0..5] but A[1][0..10]. I'm not sure how well such a thing could be dealt with using known-memory addressing, and how easy it'd be to do bounds checking et al. at compile time. One could always pad the shorter ones with empty elements to make them the same dimension, but that's not always a good solution.

Date: 2006-04-08 12:08 am (UTC)From: [identity profile] jes5199.livejournal.com
these structures are the very definition of geometric growth. it is hard for me to imagine them being usable if they are not sparce* [*in the filesystem sense]

Date: 2006-04-08 12:08 am (UTC)From: [identity profile] jes5199.livejournal.com
or rather "sparse"

June 2017

S M T W T F S
    123
45678910
11121314151617
18192021 222324
252627282930 

Tags

Page generated 21 Sep 2017 02:02 pm
Powered by Dreamwidth Studios