winterkoninkje: shadowcrane (clean) (Default)

It would seem over the last year or two my blog has lapsed from obscurity into death. Not being one to let things rest, I figure this horse still has some beating left in it. About, what, a month ago I handed in the final project for my MSE and so I am now a masterful computer scientist. This means, in short, that I now know enough to bore even other computer scientists on at least one topic.

The funny thing is that both topics of my project —category theory and unification— are topics I knew essentially nothing about when I transferred to JHU from PSU a year ago. Of course, now I know enough to consider myself a researcher in both fields, and hence know more than all but my peers within them. I know enough to feel I know so little only because I have a stack of theses on my desk that I haven't finished reading yet. I'm thinking I should finish reading those before recasting my project into a submission to a conference or journal. Since the project is more in the vein of figuring out how a specific language should work, rather than general theoretical work, I'm not sure exactly how that casting into publishable form should go; it seems too... particular to be worth publishing. But then maybe I'm just succumbing to the academic demon that tells me my work is obvious to everyone since it is to me.

One thing that still disappoints me is that, much as I do indeed love programming languages and type theory, when I transferred here my goal was to move from programming languages and more towards computational linguistics. (If I were to stick with PL, I could have been working with the eminent Mark Jones or Tim Sheard back at PSU.) To be fair, I've also learned an enormous amount about computational linguistics, but I worry that my final project does not convey that learning to the admissions committees for the PhD programs I'll be applying to over the next few months. Another problem that has me worried about those applications is, once again, in the demesne of internecine politics. For those who aren't aware, years ago a line was drawn in the dirt between computationally-oriented linguists and linguistically-oriented computer scientists, and over the years that line has evolved into trenches and concertina wire. To be fair, the concertina seems to have been taken down over the last decade, though there are still bundles of it lying around for the unwary (such as myself) to stumble into. There are individuals on both sides who are willing to reach across the divide, but from what I've seen the division is still ingrained for the majority of both camps.

My ultimate interests lie precisely along that division, but given the choice between the two I'd rather be thrown in with the linguists. On the CS side of things, what interests me most has always been the math: type theory, automata theory, etc. These are foundational to all of CS and so everyone at least dabbles, but the NLP and MT folks (in the States, less so in Europe) seem to focus instead on probabilistic models for natural language. I don't like statistics. I can do them, but I'm not fond of them. Back in my undergraduate days this is part of why I loved anthropology but couldn't stand sociology (again, barring the exceptional individual who crosses state lines). While in some sense stats are math too, they're an entirely different kind of math than the discrete and algebraic structures that entertain me. I can talk categories and grammars and algebra and models and logic, but the terminology and symbology of stats are Greek to me. Tied in somehow with the probabilistic models is a general tendency towards topics like data mining, information extraction, and text classification. And while I enjoy machine learning, once again, I prefer artificial intelligence. None of these tendencies strikes me as meaningfully linguistic.

Baroque obfuscatory terminology aside, my distaste for statistics is more a symptom than a cause. A unifying theme among all these different axes —computational linguistics vs NLP, anthropology vs sociology, mathematics vs statistics, AI vs machine learning— is that I prefer deep theoretical explanations of the universe over attempts to model observations about the universe. Sociology can tell you that some trend exists in a population, but it can make no predictions about an individual's behavior. Machine learning can generate correct classifications, but it rarely explains anything about category boundaries or human learning. An n-gram language model for machine translation can generate output that looks at least passably like the language, but it can't generalize to new lexemes or to complex dependencies.

My latest pleasure reading is Karen Armstrong's The Battle for God: A History of Fundamentalism. In the first few chapters Armstrong presents a religious lens on the history of the late-fifteenth through nineteenth centuries. Towards the beginning of this history the concepts of mythos and logos are considered complementary forces, each with separate spheres of prevalence. However, as Western culture is constructed over these centuries, logos becomes ascendant and mythos is cast aside and denigrated as falsity and nonsense. Her thesis is that this division is the origin of fundamentalist movements in the three branches of the Abrahamic tradition. It's an excellent book and you should read it, but I mention it more because it seems to me that my academic interests have a similar formulation.

One of the reasons I've been recalcitrant about joining the ranks of computer scientists is that, while I love the domain, I've always been skeptical of the people. When you take a group of students from the humanities they're often vibrant and interesting; multifaceted, whether you like them or not. But when you take a group of students from engineering and mathematical sciences, there tends to be a certain... soullessness. Some of this can be attributed to purely financial concerns: students go into engineering to make money, not because they love it; students go into humanities to do something interesting before becoming a bartender. When pitting workplace drudgery against passionate curiosity, it's no wonder the personalities are different. But I think there's a deeper difference. The mathematical sciences place a very high premium on logos and have little if any room for mythos, whereas the humanities place great importance on mythos (yet they still rely on logos as a complementary force). In the open source movement, the jargon file, and other esoterica we can see that geeks have undeniably constructed countless mythoi. And yet the average computer geek is an entirely different beast than the average computer scientist or electrical engineer. I love computer geeks like I love humanists and humanitarians, so they're not the ones I'm skeptical of, though they seem to be sparse in academia.

I've always felt that it is important to have Renaissance men and women, and that modern science's focus on hyperspecialization is an impediment to the advancement of knowledge. This is one of the reasons I love systems theory (at least as Martin Zwick teaches it). While I think it's an orthogonal consideration, this breadth seems to be somewhat at odds with logocentric (pure) computer science. The disciplines that welcome diversity —artificial intelligence/life, cognitive science, systems theory, computational linguistics— seem to constantly become marginalized, even within the multidisciplinary spectrum of linguistics, computer science, et al. Non-coincidentally these are the same disciplines I'm most attracted to. It seems to me that the Renaissance spirit requires the complementary fusion of mythos and logos, which is why it's so rare in logocentric Western society.


I've been thinking recently about the free monoid, in particular about why it is what it is. Before you run off in fear of the terminology, read on. The rest of this post is in English, a rare thing for discussions of fundamentals of mathematics. It's that rareness which led me to muse on what all that abstract nonsense actually means.

For those who don't know what a monoid is, it's a triple <S, ⋅, ε> where S is a set, ⋅ is a binary operation over that set which is associative (i.e. (x ⋅ y) ⋅ z == x ⋅ (y ⋅ z)), and ε is the left and right identity of ⋅ (i.e. x ⋅ ε == x == ε ⋅ x). These structures are incredibly common. Semirings, which are also incredibly common, each have two. For example: addition with 0 and multiplication with 1 over the natural numbers; disjunction with False and conjunction with True over the booleans; union with the empty set and intersection with the universal set over the subsets of some universal set. Given how common they are, sometimes we'd like to construct an arbitrary one as cheaply as possible: for free.
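To ground the abstraction in code (my own illustration, not part of the post proper): in Haskell the free monoid over a set of generators is just the list type over those generators, with concatenation as ⋅ and the empty list as ε, and any function on generators extends uniquely to a monoid homomorphism (this is what foldMap does).

```haskell
-- A sketch: the free monoid over a is [a], with (++) as the operation
-- and [] as the identity. A function f from generators into a monoid M
-- extends uniquely to a homomorphism [a] -> M, namely foldMap f.
import Data.Monoid (Sum (..))

-- Interpret the free monoid over Int into the additive monoid (Int, +, 0):
total :: [Int] -> Int
total = getSum . foldMap Sum

main :: IO ()
main = do
  print (([1, 2] ++ [3]) ++ [])  -- associativity and identity hold on the nose
  print (total [1, 2, 3])        -- the homomorphism sends concatenation to (+)
```

The "for free" part is that lists impose no equations beyond the monoid laws themselves, which is exactly why any other monoid can be reached from them.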

Read more... )

I used to drink the kool aid, and it had a nice taste, but the more time passes the more I find myself agreeing with Bart, my mentor of old. Objects are a big pile of fail. The Rubyists and the Pythonistas are coming now, with their pitchforks and baling wire. But they need not worry: they will be last against the wall. But to the wall they still will go.

In which I (drunkenly) tell everyone where they can go )

As the good Tom Waits would say, I want to pull on your coat about something. As I've been revamping my cv and hunting for advisors for the next round of PhD applications, I've begun once again lamenting the fragmentation of my field. I suppose I should tell you what my field is but, y'see, that's where all the problems lie: there's no such field. As diverse and Renaissance as my interests are, they're all three sides of the same coin: language, sociality, and intelligence.

So, first things first. Evidently language is a diverse topic, but I mean to focus on formal and theoretical matters, the quintessence of what makes what we call "language". The early work of Chomsky to the contrary, there's an unfortunate —though entirely understandable— break between the study of formal languages and natural languages. On the natural side I'm interested in morphology and its interfaces with other components of language (morphophonology, morphosyntax & scrambling, morphosemantics & nuance). On the formal side I'm interested in the design of programming languages, ontologies, and interfaces. And on the middle side I'm interested in grammar formalisms like TAG and CCG, as well as the automata theory that drives them, parsers, and machine translation.

Sociality is also a diverse topic, without even accounting for the fact that I'm abusing the term to cover both the structure of societies and the interactions within and between them. Here too there's an unfortunate —though entirely understandable— break between the humanities and the sciences. In the humanities I'm interested in anthropology, gender/sexuality studies, performativity, the body as media, urban neo-tribalism, and online communities. More scientifically I'm interested in nonlinear systems theory, information theory, chaos theory, catastrophe theory, scale-free networks, and theoretical genetics. And again, on the middle side there are issues of sociolinguistics: code switching, emotional particles, uses of prosody, politeness and group-formation; and evolution: both evolutionary computation, and also cultural and linguistic evolution.

As you have no doubt been gathering, studies of intelligence too are vast and harshly divided— between wetware and hardware, or between cognition and computation if you prefer. Language is often pegged as a fundamental component of humanity's ability for higher thought, and yet despite this the majority of linguistic formalisms neglect questions of how cognitively realistic they are as models of actual human linguistic performance. Over on the side of artificial intelligence and artificial life there's a rift between those studying complexity, adaptation, and emergence versus those trying to hammer thought and knowledge into the rigid formalisms of logic and probability. Sandwiched between these conflicts are the war-torn battle grounds of machine translation, language learning, and language acquisition.

So how many fields are involved in this tripartite Janus of interfaces, systems, and agency? To make a short list: linguistics, mathematics, computer science, cultural anthropology, gender/queer/feminist studies, women's lit, systems science/systems theory, cognitive science, social psychology, computational biology, artificial intelligence/artificial life/machine learning, and given the vagaries of universities often electrical engineering and philosophy for good measure. How many is that? Too goddamned many, that's how many. And to top it off, all of them are interdisciplinary to boot. Now you may be saying to yourself that I'm trying too hard to unify too many disparate discourses, and perhaps it's true, but there is a cohesion there which should be evident by the extent to which each of those many fields crosscut these three seemingly simple categories.

Systems theory gets it right when they say that the current state of science is burdened by its focus on fundamentalism. )

So I got to thinking about monads for computation, in particular about the "no escape" clause. In practice you actually can escape from most monads (ST, State, List, Logic (for backtracking search), Maybe, Cont, even IO) and these monads all have some "run" function for doing so (though it may be "unsafe"). These functions all have different signatures because there are different ways of catamorphing the computation away in arriving at your result, but in principle it's possible to subtype these computations into having appropriate generic execution functions.

The simplest computation is a constant computation which successfully returns a value; ST and Identity are just such computations. One way of extending this is to say that a computation is nondeterministic and can return many answers[1]. A different way of extending it is to say that the computation might fail to return any value; Maybe is just such a monad. And of course we could take both those extensions and either fail to return any answers or return nondeterministically many answers; List and Logic are our examples here. Another extension is that a computation may have different kinds of failure, like (Either e), and this is where things start to get interesting.
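As a concrete illustration of those differing escape hatches (the names constant, defaulted, answers, and handled are my own, using only base-library functions): each run function demands exactly the extra information that its monad's effect requires.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.Maybe (fromMaybe)

-- A constant computation escapes trivially:
constant :: Int
constant = runIdentity (pure 1)

-- Maybe escapes only if you say what failure should become:
defaulted :: Int
defaulted = fromMaybe 0 Nothing

-- List escapes to all of its (possibly zero) answers at once:
answers :: [Int]
answers = [1, 2] >>= \x -> [x, x * 10]

-- Either escapes by handling each kind of result separately:
handled :: String
handled = either ("error: " ++) show (Right 42 :: Either String Int)

main :: IO ()
main = print (constant, defaulted, answers, handled)
```

Note how fromMaybe, either, and the list itself are all catamorphisms in the sense above: each folds the computation's structure away into a plain value.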

Read more... )

One of the classes I'm taking this term alternates between Haskell and Smalltalk in trying to teach a bunch of seniors and graduate students "extreme programming" and how coding in the real world is different from in school. In one of the exercises we were working with an attempt to formulate Haskell-like tuples and lists in Smalltalk, in particular trying to debug the implementation we were given. We found numerous issues with the implementation, but one in particular has been nagging me. It indicates inherent limitations in the syntax of Smalltalk, but mulling it over, it seems to be an even deeper issue than that.

Part of the implementation for tuples[1] was overloading the comma operator (normally string concatenation) to create pairs like (a, b) or triples like (a, b, c) etc. The problem was this: how do we do tuples of tuples? Using code like ((a, b), (c, d)) does not give us a pair of pairs, but rather is equivalent to (a, b, (c, d)). I thought, at first, it was a problem of associativity; when the parser sees the second comma, the one after the b, it takes the preceding object and combines it with what follows; in effect it's acting like an operator for constructing lists. Reversing the associativity of the operator just gives us the same problem in the other direction, yielding ((a, b), c, d). This is not an issue for Haskell because the parentheses are required and so they let us know for certain when we're done making a tuple. But in Smalltalk, as with most languages, the parentheses are only there as suggestions on how to create a parse tree.
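To see the flattening mechanically, here's a hypothetical Haskell model of the overloaded comma (Val, Tup, and comma are my inventions, not the code from class): as in Smalltalk, the receiver decides what comma means, so an existing tuple simply grows, and parenthesization on the left is invisible to the operator.

```haskell
-- A model of Smalltalk's comma: a "tuple" is a flat list of slots, and
-- comma appends to its left operand whenever that operand is already a
-- tuple, rather than nesting it.
data Val = Atom String | Tup [Val] deriving (Eq, Show)

comma :: Val -> Val -> Val
comma (Tup xs) y = Tup (xs ++ [y])  -- an existing tuple just grows
comma x        y = Tup [x, y]       -- anything else starts a pair
infixl 5 `comma`

-- ((a, b), (c, d)) parses as ((a `comma` b) `comma` (c `comma` d)):
example :: Val
example = (Atom "a" `comma` Atom "b") `comma` (Atom "c" `comma` Atom "d")

main :: IO ()
main = print example
-- => Tup [Atom "a", Atom "b", Tup [Atom "c", Atom "d"]], i.e. (a, b, (c, d))
```

The left-hand pair is flattened while the right-hand pair survives, reproducing exactly the asymmetry described above: the operator can't distinguish "a tuple I'm still building" from "a finished tuple handed to me."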

All this diagnosis I did for the exercise, but I've just struck something. There is a deep seated difference between "destructive" and "constructive" operations in any language. )


17 Mar 2006 06:30 am

As some of you may know, I've been considering the design of a new programming language lately. (And for those who haven't yet heard me talking about it, I will... at great lengths... whether you want me to or not.) This has been one of the things which has kept me busy and away from blogging of late. I'm not sure how much I should ramble on about it here though.

In working through my thoughts on the language, I'm writing an essay in three parts which mainly aims to justify why I'm doing this. (Before the abundance of circumlocutions becomes overwhelming, let us call this new language "Eng".) The first part is basically a laundry list of little features or attributes that Eng — or any language — would need to thrive against the current competitors. The second covers what I deem the fundamental Laws of Language Design, and these perhaps are the crux of the piece; certainly they are the part of the essay which is most liable to be useful outside of organizing my thoughts on Eng. And the third section of the essay is intended to introduce some ideas I have which would make this new language special. But there's a problem.

... )

So I was reading an article on Pyramid recently, and the attendant discussion on the Pyramid discussion boards, which got me thinking. I often don't realize how much the internet has evolved in my lifetime, or how immersed I've been in that change. True, I'm no Great Old One who had the honor of dealing with punch-cards or actual terminals. But if those venerable souls were the ones who wrought the net from the unliving stone, then I am of the first progeny that genesis has spawned. Not even an early-adopter, for that would presume something to be adopted, but rather an embodiment of the time from which sprang the devices for adoption.

Looking back over the net, even just over my meagre life, I have seen the invention and death of countless technologies, technologies which have at their core the one simple truth about the internet. The net, as with the whole of human endeavor, is concerned solely with the production and consumption of meaning, with the conveyance of thought and information.

To that end we have invented countless ways to communicate that meaning, from the very fundament of language itself, to books and radio and television, to bulletin boards, email, usenet, IRC, webpages, instant messaging, weblogs, newsfeeds, podcasts, voice-over-IP, wikis, and countless others. And over time we've seen the revival of a number of these forgotten souls in the new era from bulletin boards like Craigslist, to the reinvention of terminals with lessdisks, to the purported renaissance of IRC.

And yet, when e'er these ideas return, they are thought of as novel, and all too often they fear to look at their predecessors to learn from the faults of times past. An interesting thing is that the difficulties with each of these technologies are remarkable in their similarity despite their disparate implementations. The problem of spam originated on Usenet if not before, and since then it has spread to email, IM, and even wikis. And so it is with the myriad of other difficulties.

Upon reflection, one thing which I find lacking is a unified system which categorizes these different technologies, a single language with which to discuss and compare them: a language which could be almost deceptive in its simplicity. There are a small number of axes on which these forms of communication can be rated.

One possible ontology follows )
Page generated 23 Sep 2017 12:57 pm
Powered by Dreamwidth Studios