winterkoninkje: shadowcrane (clean) (Default)

I'm sick. I shouldn't be online. But just wanted to prattle on about a thing that'd take too long on twitter. A day or two ago I came across a linguist being cited somewhere in some article about celebrity couple name blends. In it they noted how certain syllables like "klol" and "prar" are forbidden in English. They phrased the restriction as forbidding CRVR (where C means a consonant, R means a liquid/sonorant —I forget how they phrased it—, and V a vowel).

There's something of merit going on here, but the specifics are far more complicated. Note that "slur" is perfectly acceptable. So, well maybe it's just that the C has to be a plosive. But then, "blur" is perfectly fine too. So, well maybe it's something special about how "-ur" forms a stand-alone rhotic vowel. But then "trill" and "drill" are just fine. So, well maybe...
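For the curious, here is a rough sketch of why the restriction as stated is too broad. It checks words against a naive, spelling-based version of the CRVR template. The letter classes (`VOWELS`, `LIQUIDS`) and the function name `matches_crvr` are my own assumptions for illustration, and spelling is only a crude stand-in for pronunciation.

```python
import re

# Assumed letter classes; real phonology works on sounds, not letters.
VOWELS = "aeiou"
LIQUIDS = "lr"

# C = one non-liquid consonant letter, R = a liquid, V = one or more vowel letters,
# then a final liquid (doubled letters such as "ll" are allowed).
CRVR = re.compile(
    rf"^[^{VOWELS}{LIQUIDS}]"
    rf"[{LIQUIDS}]"
    rf"[{VOWELS}]+"
    rf"[{LIQUIDS}]+$"
)

def matches_crvr(word: str) -> bool:
    """Return True if the spelling of `word` fits the naive CRVR template."""
    return CRVR.match(word.lower()) is not None

for w in ["klol", "prar", "slur", "blur", "trill", "drill"]:
    print(w, matches_crvr(w))
```

Under these assumptions all six words fit the template, the bad ones and the fine ones alike, so the template on its own can't be what rules out "klol" and "prar".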

For all you local folks, I'll be giving a talk about my dissertation on November 5th at 4:00–5:00 in Ballantine Hall 011. For those who've heard me give talks about it before, not much has changed since NLCS 2013. But the majority of current CL/NLP, PL, and logic folks haven't seen the talk, so do feel free to stop by.

Abstract: Many natural languages allow scrambling of constituents, or so-called "free word order". However, most syntactic formalisms are designed for English first and foremost. They assume that word order is rigidly fixed, and consequently these formalisms cannot handle languages like Latin, German, Russian, or Japanese. In this talk I introduce a new calculus —the chiastic lambda-calculus— which allows us to capture both the freedoms and the restrictions of constituent scrambling in Japanese. In addition to capturing these syntactic facts about free word order, the chiastic lambda-calculus also captures semantic issues that arise in Japanese verbal morphology. Moreover, chiastic lambda-calculus can be used to capture numerous non-linguistic phenomena, such as: justifying notational shorthands in category theory, providing a strong type theory for programming languages with keyword-arguments, and exploring metatheoretical issues around the duality between procedures and values.

Edit 2014.11.05: The slides from the talk are now up.

Next month I'll be giving a talk at the NLCS workshop, on the chiastic lambda-calculi I first presented at NASSLLI 2010 (slides[1]). After working out some of the metatheory for one of my quals, I gave more recent talks at our local PL Wonks and CLingDing seminars (slides). The NASSLLI talk was more about the linguistic motivations and the general idea, whereas the PLWonks/CLingDing talks were more about the formal properties of the calculus itself. For NLCS I hope to combine these threads a bit better— which has always been the challenge with this work.

NLCS is collocated with this year's LICS (and MFPS and CSF). I'll also be around for LICS itself, and in town for MFPS though probably not attending. So if you're around, feel free to stop by and chat.

[1] N.B., the NASSLLI syntax is a bit different than the newer version: square brackets were used instead of angle brackets (the latter were chosen because they typeset better in general); juxtaposition was just juxtaposition rather than being made explicit; and the left- vs right-chiastic distinction was called chi vs ksi (however, it turns out that ksi already has an important meaning in type theory).

Edit 2013.07.02: the slides are available here.

Other than my research assistantship, I've been taking some cool classes. Larry Moss is teaching a course on category theory for coalgebra (yes, that Larry; I realized last xmas when my copy arrived). While I have a decent background in CT from being an experienced Haskell hacker and looking into things in that direction, it's nice to see it presented in the classroom. Also, we're using Adámek's Joy of Cats which gives a very different presentation than other books I've read (e.g., Pierce) since it's focused on concrete categories from mathematics (topology, group theory, Banach spaces, etc) instead of the CCC focus common in computer science.

Sandra's teaching a course on NLP for understudied and low-resource languages. As you may have discerned from my previous post, agglutinative languages and low-resource languages are the ones I'm particularly interested in. Both because they are understudied and therefore there is much new research to be done, but also because of political reasons (alas, Mike seems to have taken down the original manifesto). We've already read a bunch of great papers, and my term paper will be working on an extension of a book that was published less than a year ago; and I should be done in time to submit it to ACL this year, which would be awesome.

My last class is in historical linguistics. I never got to take one during my undergrad, which is why I signed up for it. Matt offered one my senior year, but I was one of only two people who signed up for it, so it was cancelled. It used to be that people equated linguistics with historical, though that has been outmoded for quite some time. Unfortunately it seems that the field hasn't progressed much since then, however. Oh wells, the class is full of amusing anecdotes about language change, and the prof is very keen to impress upon us the (radically modern) polysynchronic approach to language change, as opposed to taking large diachronic leaps or focusing on historical reconstruction. And I'm rather keen on polysynchrony.

Less than a fortnight past one of my academic heroes passed on. Language Log has a good summary of the highlights of his life and some touching stories of bygone eras. I haven't a lot to add to Dan Everett's treatment, but I thought I'd send it along for those who don't read LL.

My love for bringing linguistics and anthropology into the discourse of other fields, and much of my philosophical perspective on the need for integrating formalism and functionalism, both stem from Lévi-Strauss and his work. Too rarely do people cross the borders in academia, and too rarely do they try to integrate opposing theories rather than choosing a side. One of the greatest of our clan is fallen.

It would seem over the last year or two my blog has lapsed from obscurity into death. Not being one to let things rest, I figure this horse still has some beating left in it. About, what, a month ago I handed in the final project for my MSE and so I am now a masterful computer scientist. This means, in short, that I now know enough to bore even other computer scientists on at least one topic.

The funny thing is that both topics of my project (category theory and unification) are topics I knew essentially nothing about when I transferred to JHU from PSU a year ago. Of course, now I know enough to consider myself a researcher in both fields, and hence know more than all but my peers within the field. I know enough to feel I know so little only because I have a stack of theses on my desk that I haven't finished reading yet. I'm thinking I should finish reading those before recasting my project into a submission to a conference/journal. Since the project is more in the vein of figuring out how a specific language should work, rather than general theoretical work, I'm not sure exactly how that casting into publishable form should go; it seems too... particular to be worth publishing. But then maybe I'm just succumbing to the academic demon that tells me my work is obvious to everyone since it is to me.

One thing that still disappoints me is that, much as I do indeed love programming languages and type theory, when I transferred here my goal was to move away from programming languages and more towards computational linguistics. (If I were to stick with PL, I could have been working with the eminent Mark Jones or Tim Sheard back at PSU.) To be fair, I've also learned an enormous amount about computational linguistics, but I worry that my final project does not convey that learning to the admission committees for the PhD programs I'll be applying to over the next few months. Another problem that has me worried about those applications is, once again, in the demesne of internecine politics. For those who aren't aware, years ago a line was drawn in the dirt between computationally-oriented linguists and linguistically-oriented computer scientists, and over the years that line has evolved into trenches and concertina wire. To be fair, the concertina seems to have been taken down over the last decade, though there are still bundles of it lying around for the unwary (such as myself) to stumble into. There are individuals on both sides who are willing to reach across the divide, but from what I've seen the division is still ingrained for the majority of both camps.

My ultimate interests lie precisely along that division, but given the choice between the two I'd rather be thrown in with the linguists. On the CS side of things, what interests me most has always been the math: type theory, automata theory, etc. These are foundational to all of CS and so everyone at least dabbles, but the NLP and MT folks (in the States, less so in Europe) seem to focus instead on probabilistic models for natural language. I don't like statistics. I can do them, but I'm not fond of them. Back in my undergraduate days this is part of why I loved anthropology but couldn't stand sociology (again, barring the exceptional individual who crosses state lines). While in some sense stats are math too, they're an entirely different kind of math than the discrete and algebraic structures that entertain me. I can talk categories and grammars and algebra and models and logic, but the terminology and symbology of stats are Greek to me. Tied in somehow with the probabilistic models is a general tendency towards topics like data mining, information extraction, and text classification. And while I enjoy machine learning, once again, I prefer artificial intelligence. And none of these tendencies strikes me as meaningfully linguistic.

More than the baroque obfuscatory traditions of their terminology, my distaste for statistics is more a symptom than a cause. A unifying theme among all these different axes —computational linguistics vs NLP, anthropology vs sociology, mathematics vs statistics, AI vs machine learning — is that I prefer deep theoretical explanations of the universe over attempts to model observations about the universe. Sociology can tell you that some trend exists in a population, but it can make no predictions about an individual's behavior. Machine learning can generate correct classifications, but it rarely explains anything about category boundaries or human learning. An n-gram language model for machine translation can generate output that looks at least passingly like the language, but it can't generalize to new lexemes or to complex dependencies.

My latest pleasure reading is Karen Armstrong's The Battle for God: A history of fundamentalism. In the first few chapters Armstrong presents a religious lens on the history of the late-fifteenth through nineteenth centuries. Towards the beginning of this history the concepts of mythos and logos are considered complementary forces each with separate spheres of prevalence. However, as Western culture is constructed over these centuries, logos becomes ascendant and mythos is cast aside and denigrated as falsity and nonsense. Her thesis is that this division is the origin of fundamentalist movements in the three branches of the Abrahamic tradition. It's an excellent book and you should read it, but I mention it more because it seems to me that my academic interests have a similar formulation.

One of the reasons I've been recalcitrant about joining the ranks of computer scientists is that, while I love the domain, I've always been skeptical of the people. When you take a group of students from the humanities they're often vibrant and interesting; multifaceted, whether you like them or not. But when you take a group of students from engineering and mathematical sciences, there tends to be a certain... soullessness that's common there. Some of this can be attributed to purely financial concerns: students go into engineering to make money, not because they love it; students go into humanities to do something interesting before becoming a bartender. When pitting workplace drudgery against passionate curiosity, it's no wonder the personalities are different. But I think there's a deeper difference. The mathematical sciences place a very high premium on logos and have little if any room for mythos, whereas the humanities place great importance on mythos (yet they still rely on logos as a complementary force). In the open source movement, the jargon file, and other esoterica we can see that geeks have undeniably constructed countless mythoi. And yet the average computer geek is an entirely different beast than the average computer scientist or electrical engineer. I love computer geeks like I love humanists and humanitarians, so they're not the ones I'm skeptical of, though they seem to be sparse in academia.

I've always felt that it is important to have Renaissance men and women, and that modern science's focus on hyperspecialization is an impediment to the advancement of knowledge. This is one of the reasons I love systems theory (at least as Martin Zwick teaches it). While I think it's an orthogonal consideration, this breadth seems to be somewhat at odds with logocentric (pure) computer science. The disciplines that welcome diversity —artificial intelligence/life, cognitive science, systems theory, computational linguistics— seem to constantly become marginalized, even within the multidisciplinary spectrum of linguistics, computer science, et al. Non-coincidentally these are the same disciplines I'm most attracted to. It seems to me that the Renaissance spirit requires the complementary fusion of mythos and logos, which is why it's so rare in logocentric Western society.

As the good Tom Waits would say, I want to pull on your coat about something. As I've been revamping my cv and hunting for advisors for the next round of phd applications, I've begun once again lamenting the fragmentation of my field. I suppose I should tell you what my field is but, y'see, that's where all the problems lie: there's no such field. As diverse and Renaissance as my interests are, they're all three sides of the same coin: language, sociality, and intelligence.

So, first things first. Evidently language is a diverse topic, but I mean to focus on formal and theoretical matters, the quintessence of what makes what we call "language". The early work of Chomsky to the contrary, there's an unfortunate —though entirely understandable— break between the study of formal languages and natural languages. On the natural side I'm interested in morphology and its interfaces with other components of language (morphophonology, morphosyntax & scrambling, morphosemantics & nuance). On the formal side I'm interested in the design of programming languages, ontologies, and interfaces. And on the middle side I'm interested in grammar formalisms like TAG and CCG as well as the automata theory that drives these and parsers and machine translation.

Sociality is also a diverse topic, without even accounting for the fact that I'm abusing the term to cover both the structure of societies and the interactions within and between them. Here too there's an unfortunate —though entirely understandable— break between the humanities and the sciences. In the humanities I'm interested in anthropology, gender/sexuality studies, performativity, the body as media, urban neo-tribalism, and online communities. More scientifically I'm interested in nonlinear systems theory, information theory, chaos theory, catastrophe theory, scale-free networks, and theoretical genetics. And again, on the middle side there are issues of sociolinguistics: code switching, emotional particles, uses of prosody, politeness and group-formation; and evolution: both evolutionary computation, and also cultural and linguistic evolution.

And as you may no doubt be gathering, studies of intelligence too are vast and harshly divided— between wetware and hardware, or between cognition and computation if you prefer. Language is often pegged as a fundamental component to humanity's ability for higher thought, and yet even despite this the majority of linguistic formalisms neglect questions of how cognitively realistic they are as models of actual human linguistic performance. Over on the side of artificial intelligence and artificial life there's a rift between those studying complexity, adaptation, and emergence vs those trying to hammer thought and knowledge into the rigid formalisms of logic and probability. Sandwiched between these conflicts are the war-torn battle grounds of machine translation, language learning, and language acquisition.

So how many fields are involved in this tripartite Janus of interfaces, systems, and agency? To make a short list: linguistics, mathematics, computer science, cultural anthropology, gender/queer/feminist studies, women's lit, systems science/systems theory, cognitive science, social psychology, computational biology, artificial intelligence/artificial life/machine learning, and given the vagaries of universities often electrical engineering and philosophy for good measure. How many is that? Too goddamned many, that's how many. And to top it off, all of them are interdisciplinary to boot. Now you may be saying to yourself that I'm trying too hard to unify too many disparate discourses, and perhaps it's true, but there is a cohesion there which should be evident by the extent to which each of those many fields crosscut these three seemingly simple categories.

Systems theory gets it right when they say that the current state of science is burdened by its focus on fundamentalism.

There's been some furor over this piece recently. And after composing a rather long reply on the matter, I figured it's time to turn it into a full rant. Yes, I think English needs a spelling reform. No, I do not mean that fine example of, er, jernulizm. I mean a real reform. One that has a snowflake's chance in hell of actually happening.

Of all the various "attempts" to reform English spelling, both the serious and the humorous, all have two fatal flaws. First is that they all try to take things as far as they can possibly go thereby making the spelling resemble current English as little as possible. Second is that none of them are done by linguists who'd know what the hell they're doing. A reasonable reform is a reform that makes as few changes as necessary to reach its goal. That's what reform means. Making drastic changes is called revolution. And for the record, no, I am not the one to devise such a reform, I only know just enough about English phonetics to get by.

There was a site I found a long while back (which, alas, I seem unable to locate presently) which had a simple program. All this program would do is take in a list of spelling-to-pronunciation rules and compare them against a dictionary. The author initially devised the program as a tool for generating fictional languages. He found however that, as memory serves, something like a hundred rules cover over 95% of the English language and all its "irregularities". This would seem to support the intuition, held by many native speakers who oppose spelling reform, that English spelling is not so far gone as to require twenty years of reform. The reform I would suggest would not go so far as to try to break those hundred rules down into a much smaller number, but rather would only seek out the rogue 5% and change them to follow the rules, maybe simplifying a few esoteric rules along the way.
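To make the idea concrete, here is a minimal sketch of what such a program might look like. This is not the original author's code, which I can't locate. The rule list, the toy dictionary, the ad hoc pronunciation symbols, and the names `pronounce` and `coverage` are all made up for illustration. I'm also assuming the rules are tried in order at each position, so digraphs must be listed before single letters.

```python
# Hypothetical toy rules: (spelling, sound), tried in order at each position.
RULES = [
    ("ph", "f"), ("sh", "S"),               # digraphs first
    ("f", "f"), ("i", "I"), ("p", "p"), ("n", "n"), ("a", "a"),
    ("g", "g"), ("r", "r"), ("s", "s"), ("h", "h"),
    ("t", "t"), ("o", "o"), ("u", "u"),
]

# Hypothetical toy dictionary: word -> pronunciation in the same ad hoc symbols.
DICTIONARY = {
    "fish": "fIS", "ship": "SIp", "graph": "graf",
    "pin": "pIn", "fan": "fan", "tough": "tVf",
}

def pronounce(word, rules):
    """Apply the first matching rule at each position; None if no rule fits."""
    sounds, i = [], 0
    while i < len(word):
        for spelling, sound in rules:
            if word.startswith(spelling, i):
                sounds.append(sound)
                i += len(spelling)
                break
        else:
            return None  # ran out of rules for this position
    return "".join(sounds)

def coverage(dictionary, rules):
    """Return the fraction of words the rules get right, plus the rogue words."""
    rogue = [w for w, p in dictionary.items() if pronounce(w, rules) != p]
    return 1 - len(rogue) / len(dictionary), rogue

fraction, rogue = coverage(DICTIONARY, RULES)
print(f"{fraction:.0%} covered; rogue words: {rogue}")
```

On this toy data the only rogue word is "tough". A minimal reform in the sense above would respell the rogue list rather than touch the rules.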

The big problem with such a minimalistic reform is that it wouldn't change things (well, that is the point). What I mean is, what they teach as "English" in schools has at best a remote link to the language that is being spoken in ever further corners of the globe. Even if a mere 60 or 80 rules covered every single word in the English language, it is unlikely that English teachers would inform their students of that, or if they taught them at all they'd only teach the first 15. The difficulty of the English language is not so much that it is irregular (though it is), it's that we refuse to teach it to anyone [1].

Most languages seek to primarily encode pronunciation in their spelling systems which is why they have "easier" spelling. When words are borrowed into the language, by the time they're accepted as "part of the language" instead of just being a loan word, their spelling is reformed to match the rules of the language. English however tends to prefer encoding etymology when it has to choose between that and pronunciation. When English is taught it should be required to teach the basic etymological history of the language (i.e. point out Old English, French, Latin, and Greek words). Knowing the origins of words helps a lot for narrowing down the possible spellings. There's no need to unify F and PH if you know when to expect which.

Other than teaching the fact that English really is pattern-full and has an actual history to why things are spelled the way they are, there are two big areas where I think English needs to be reformed. First is to actually admit to the quantity of vowels we have instead of blithely pretending there're only five. (For the curious, there are approximately ten for which cf. the near-complete minimal pairs: heed, hid, heyed, head, had, who'd, HUD, Hod, hawed. There's some contention about whether schwa (which most vowels destress to) and the mid-back-lax vowel should be considered the same or not, and some dialects don't distinguish the mid-back-tense and mid-back-lax vowels, but all the same: Five is a lie.) Having admitted this, the spellings for these vowels should be reformed so it's obvious how to pronounce them. I'm not saying there can be only one spelling for each sound, just that each spelling should have only one pronunciation. The majority of languages have between four and seven vowels, real vowels that is. One of the major obstacles to foreigners learning English is learning to deal with all those extra vowels. That we can't seem to keep the spellings straight only makes it all the harder.

Second, morphological spellings need to be cleaned up. Anecdotally, I think that morphological spelling is the real area that gets people caught up, not spelling of the basic words. One example of what I'm talking about is the subject-substantive ending on verbs (i.e. verb becomes noun of person who does the verb). Most of the time this is -er, but sometimes it's -or. Another example is the "able" ending. Generally it's -able, but sometimes it's -ible, and sometimes it's -eable. Sure, there's a pattern to which is correct, but it's obscure and not particularly reliable. Rules about doubling consonants when adding suffixes are yet another example; relatively rule-driven, but with many senseless exceptions. Perhaps most importantly, few if any of these differences are pronounced any differently in the spoken language. These are the sorts of things that I think need to be cleaned up.

A challenge is that when altering spellings to better conform to pronunciation one must take into account the issue of dialects. Given the full breadth of differences between the Englishes (American, British, Australian, Kiwi,...) a reformed English would need to pick one of them just to narrow the scope enough that an equitable solution could be found. If we picked American, even there the breadth of dialects is amazing. Which is the reason why the reform could only be done by linguists who are quite knowledgeable about English's phonology and dialects. Anyone else would only look at their own pronunciation and come up with some incomprehensible system that would make us weep for the easy days of l33t and gyaru-moji.


So I've finally figured out where another of my spelling/pronunciation variants comes from ("towards", or any other "X-wards"; as opposed to the "-ward" variants). According to Jeremy Smith's American-British/British-American dictionary it's a Britishism. (As for "acrosst", "amongst", and the like the jury's still out. Jeremy Smith lists them as uncommon Americanisms but who knows.)

End transmission.

As I mentioned in my last post I spent much of yesterday in an extended web of internet spidering. Below are some of the paths this vagary followed.

Geeks are renowned for their... peculiar social skills. And even within the realm of geekdom, some are considered particularly lacking. Michael Suileabhain-Wilson wrote an opinion piece discussing Five Geek Social Fallacies that lead to some of the more egregious examples of this.

In a public service announcement on LiveJournal, cerebrate mentioned that "mb" is an abbreviation for millibit, which sparked a discussion about whether such a thing was even possible. That in turn prompted another discussion by lederhosen on information entropy, which I found surprisingly interesting. I say surprising because I frequently have little interest in higher-level (read: post-calculus, particularly logical) mathematics because it tends to be overly omphaloskeptic and offers very little of use to the non-theoretical (read: real) world. I'm not sure what it was particularly that I found so interesting about it, but it was... pleasant. If it strikes your fancy, Lederhosen also posted some followup links to Wikipedia and Shannon's seminal paper (Shannon being the one who came up with the whole idea).

One possible reason I found it interesting has to do with another post by lederhosen regarding a linguistic side of the information theory branch of mathematics. In that post he lays out an example of designing a language of semaphores which I found particularly interesting for (a) its resemblance to "designing" the phonetics of a natural language and (b) how it seems to imply an emergent linguistic component to information.
